Webhook-First Email Waiting Patterns for LLM Agents

LLM agents are good at deciding what to do next. They are not good at waiting for an unpredictable email while holding context, retrying safely, and resisting malicious content inside a message.

That is why email-dependent agent workflows need a webhook-first waiting pattern. Instead of asking the model to refresh an inbox, sleep for 30 seconds, or inspect raw HTML, your system should treat inbound email as an external event. The agent requests an inbox and a wait condition, the runtime pauses the task, and a signed webhook resumes the workflow when the right message arrives.

For signup verification, OTP login, password reset, onboarding checks, QA flows, and client operations, this pattern makes email automation faster, safer, and easier to debug.

The core idea: waiting is a runtime responsibility, not an LLM responsibility

A common anti-pattern is giving an agent a tool like check_inbox() and letting it decide when to call it again. That looks simple, but it creates several problems:

The agent may poll too often and waste tokens or API calls.
The agent may stop too early because it assumes the email failed.
The agent may trigger duplicate resend loops.
The agent may read more email content than it needs.
The agent may be influenced by hostile text inside the email body.

A webhook-first design moves the wait loop out of the model and into deterministic infrastructure. The LLM can request a goal, such as “wait for a verification code from this signup attempt,” but the runtime owns the timing, matching, deduplication, and security checks.

In practice, the pattern looks like this:

1. Create a disposable inbox through an API.
2. Store a wait record with the inbox ID, deadline, expected sender or subject, and correlation data.
3. Trigger the external action that sends the email.
4. Receive the email through a signed webhook as structured JSON.
5. Match, dedupe, and extract only the artifact the agent needs.
6. Resume the agent with a typed result, such as an OTP or verified magic link.
7. Fall back to polling only if the webhook path does not complete before the deadline.

This is the difference between “an LLM checking mail” and “an agent runtime waiting for a verified event.”

Why webhook-first is the right default for email waiting

Polling is useful, but it should not be the primary control plane for agent workflows. When an email is delivered as a webhook, your system can react immediately without keeping the LLM active, running a browser loop, or sleeping inside a test.

Webhook-first email waiting is especially valuable when agents run in parallel. In a CI suite or a multi-agent workflow, dozens or hundreds of inboxes may be active at once. A polling-heavy design multiplies requests by the number of active waits. A webhook-driven design only does work when a message arrives.

Concern	Polling-first behavior	Webhook-first behavior
Latency	Depends on interval and backoff	Message can be processed as soon as it arrives
Token usage	Agents may repeatedly ask for status	Agent can pause until the runtime has a result
Parallelism	More active waits mean more polling	More active waits do not require constant checks
Security	Easy to expose raw inbox content to the model	Webhook handler can verify and minimize before resume
Debugging	Failures hide inside repeated checks	Each delivery can be logged as an event
Reliability	Needs careful cursor and timeout handling	Still needs dedupe, but arrival is event-driven

The best production pattern is not “webhooks only.” It is webhooks first, with a bounded polling fallback. Webhooks provide responsiveness. Polling provides recovery if a webhook endpoint is unavailable, delayed, or misconfigured.

A practical state machine for LLM email waits

Email waiting becomes easier to reason about when you model it as a state machine. The agent should not manage these states directly. Your orchestrator, worker, or agent runtime should.

State	Meaning	What can happen next
`created`	Inbox and wait record exist	Trigger the action that sends email
`waiting`	Runtime is waiting for a matching email	Webhook arrival, polling fallback, timeout
`received`	A candidate message arrived	Verify, dedupe, match, extract
`completed`	Required artifact was extracted	Resume the agent with a typed result
`timed_out`	Deadline passed without a match	Resume the agent with a controlled failure
`closed`	Inbox is no longer needed	Cleanup or retention policy applies

This state machine prevents the most common LLM-agent failure mode: ambiguous waiting. The model should not wonder whether to retry, resend, refresh, or continue. It should receive one of a few deterministic outcomes.
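
One way to keep these transitions out of the model's hands is a small lookup table that the runtime consults before it changes a wait's status. The sketch below is only an illustration: the `TRANSITIONS` map and the `transition` helper are hypothetical names, and the allowed edges (including sending a non-matching candidate back to `waiting`) are inferred from the table above rather than taken from any specific API.

```javascript
// Allowed transitions for a wait record, inferred from the state table.
// Names are illustrative, not part of any provider's API.
const TRANSITIONS = new Map([
  ["created", ["waiting"]],
  ["waiting", ["received", "timed_out"]],
  // A candidate that fails matching or extraction returns the wait to "waiting".
  ["received", ["completed", "waiting"]],
  ["completed", ["closed"]],
  ["timed_out", ["closed"]],
  ["closed", []]
]);

// Returns a copy of the wait in the next state, or throws on an illegal move.
function transition(wait, nextStatus) {
  const allowed = TRANSITIONS.get(wait.status) || [];
  if (!allowed.includes(nextStatus)) {
    throw new Error(`Illegal transition: ${wait.status} -> ${nextStatus}`);
  }
  return { ...wait, status: nextStatus };
}
```

Because illegal moves throw, a bug such as completing a wait that already timed out surfaces in the runtime's logs instead of reaching the agent as a confusing second result.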

For example, the agent-facing result can be intentionally small:

{
  "status": "completed",
  "wait_id": "wait_123",
  "artifact_type": "otp",
  "otp": "482913",
  "received_at": "2026-05-28T21:11:07Z"
}

Or, when nothing arrives:

{
  "status": "timed_out",
  "wait_id": "wait_123",
  "reason": "No matching verification email arrived before the deadline."
}

The model gets the outcome, not the entire inbox.
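
A simple way to enforce that boundary is to build the agent-facing result from an explicit list of fields instead of passing the internal record through. The sketch below assumes a hypothetical internal wait record that carries `status`, `artifact`, `received_at`, and possibly the raw message; `toAgentResult` is an illustrative name.

```javascript
// Projects an internal wait record onto the small agent-facing result.
// The internal record shape is assumed for illustration. Only the fields
// listed here can ever reach the model.
function toAgentResult(wait) {
  if (wait.status === "completed") {
    const type = wait.artifact.type; // for example "otp" or "url"
    return {
      status: "completed",
      wait_id: wait.wait_id,
      artifact_type: type,
      [type]: wait.artifact.value,
      received_at: wait.received_at
    };
  }
  if (wait.status === "timed_out") {
    return {
      status: "timed_out",
      wait_id: wait.wait_id,
      reason: "No matching verification email arrived before the deadline."
    };
  }
  // Non-terminal states have nothing to report to the agent yet.
  throw new Error(`Wait ${wait.wait_id} is still ${wait.status}`);
}
```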

Pattern 1: create an inbox and wait record before triggering email

The most reliable email wait begins before the email is sent. Create the disposable inbox first, persist the inbox_id, then trigger the system that sends the message.

Do not store only the email address. Store an inbox descriptor that your runtime can use later:

{
  "wait_id": "wait_signup_789",
  "inbox_id": "inbox_456",
  "email": "[email protected]",
  "purpose": "signup_verification",
  "attempt_id": "attempt_001",
  "status": "waiting",
  "deadline_at": "2026-05-28T21:16:07Z"
}

The exact fields depend on your system, but the important point is that the address is not the only handle. The wait belongs to a specific inbox and a specific attempt.
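
As a rough sketch of that ordering, the helper below builds the wait record from an inbox descriptor that already exists. It assumes the descriptor has `inbox_id` and `email` fields as in the example above; `buildWaitRecord` and its parameters are hypothetical names. The record starts in `created`, following the state table, and would move to `waiting` once the triggering action has been sent.

```javascript
// Builds the wait record that is persisted BEFORE the email is triggered.
// `inbox` is assumed to be the { inbox_id, email } descriptor returned by
// whatever inbox-creation call you use.
function buildWaitRecord({ waitId, inbox, purpose, attemptId, timeoutSeconds, now = new Date() }) {
  if (!inbox || !inbox.inbox_id) {
    throw new Error("Create the inbox before building the wait record");
  }
  if (!(timeoutSeconds > 0)) {
    throw new Error("timeoutSeconds must be a positive number");
  }
  // The deadline is fixed up front so the webhook path and the poller agree on it.
  const deadline = new Date(now.getTime() + timeoutSeconds * 1000);
  return {
    wait_id: waitId,
    inbox_id: inbox.inbox_id,
    email: inbox.email,
    purpose,
    attempt_id: attemptId,
    status: "created",
    deadline_at: deadline.toISOString()
  };
}
```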

With Mailhook, this maps naturally to programmable temporary inboxes created via API. Mailhook provides disposable inbox creation, structured JSON email output, real-time webhook notifications, polling access, shared domains, custom domain support, signed payloads, and batch processing capabilities. For exact integration details, use the canonical Mailhook llms.txt reference.

Pattern 2: use webhook handlers as event intake, not business logic

A webhook endpoint should do as little synchronous work as possible. Its job is to verify the request, persist or enqueue the event, and acknowledge quickly.

A good webhook handler does not call the LLM directly. It also does not perform complex agent actions inline. Instead, it turns an inbound email into a durable event that a worker can process.

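// readRawBody, verifyWebhookSignature, and enqueueEmailDelivery are helpers you
// provide. verifyWebhookSignature is assumed to throw on an invalid signature.
async function emailWebhookHandler(req) {
  // Read the exact bytes that were signed; do not parse before verifying.
  const rawBody = await readRawBody(req);

  try {
    verifyWebhookSignature({
      rawBody,
      headers: req.headers
    });
  } catch (err) {
    // Unverified requests never reach parsing, matching, or the agent.
    return { status: 401 };
  }

  let event;
  try {
    event = JSON.parse(rawBody);
  } catch (err) {
    return { status: 400 };
  }

  // Persist a durable event for the worker, then acknowledge quickly.
  await enqueueEmailDelivery({
    deliveryId: event.delivery_id,
    inboxId: event.inbox_id,
    receivedAt: event.received_at,
    rawEvent: event
  });

  return { status: 204 };
}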

That worker can then run deterministic logic: dedupe the delivery, find active waits for the inbox, apply matchers, extract artifacts, and complete the wait.

This separation matters for agent safety. The webhook handler is security-critical infrastructure. The LLM is a decision layer. Mixing them makes retries, timeouts, and prompt-injection defenses harder to audit.

Pattern 3: match narrowly, then extract minimally

LLM agents should not read an email to “figure out” whether it is relevant. Your system should narrow the candidate set first.

Good matchers use stable signals that are known before the email arrives. Depending on the workflow, that might include the inbox ID, attempt ID, expected sender domain, expected subject fragment, recipient address, correlation token, or message timing.
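
A matcher built from those signals can be a plain function with no model involvement. The sketch below is one possible shape, not a provider schema: the event fields (`inbox_id`, `from`, `subject`, `received_at`) and the matcher fields (`inboxId`, `senderDomain`, `subjectIncludes`, `notBefore`) are assumptions made for illustration.

```javascript
// Pulls the domain out of "Name <user@host>" or "user@host".
function senderDomain(from) {
  const m = /@([^@\s<>]+)>?\s*$/.exec(String(from || ""));
  return m ? m[1].toLowerCase() : null;
}

// Deterministic matcher: every configured signal must agree.
// Note that the From header is sender-controlled, so this narrows the
// candidate set but does not authenticate the sender.
function matches(matcher, event) {
  if (matcher.inboxId && event.inbox_id !== matcher.inboxId) return false;

  if (matcher.senderDomain &&
      senderDomain(event.from) !== matcher.senderDomain.toLowerCase()) {
    return false;
  }

  if (matcher.subjectIncludes) {
    const subject = String(event.subject || "").toLowerCase();
    if (!subject.includes(matcher.subjectIncludes.toLowerCase())) return false;
  }

  if (matcher.notBefore) {
    // Reject messages that predate this attempt (or have no valid timestamp).
    const receivedMs = Date.parse(event.received_at);
    if (!(receivedMs >= Date.parse(matcher.notBefore))) return false;
  }

  return true;
}
```

Comparing the whole domain, rather than checking whether the address merely contains it, keeps a lookalike such as `acme.example.evil.test` from matching `acme.example`.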

After a message matches, extract the smallest useful artifact. For verification flows, that usually means an OTP, a magic link, or a confirmation URL. For QA flows, it might be a subject assertion or a normalized text snippet.

Workflow	Agent needs	Agent usually does not need
OTP login	Code, expiration hint if available	Full HTML, tracking pixels, all headers
Magic-link login	Validated URL on an allowed host	Raw MIME, unrelated links
Signup verification	Confirmation artifact and message ID	Full mailbox history
Client operations	A typed reply classification or extracted field	Untrusted formatting or hidden HTML
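
To make "extract minimally" concrete, here is a sketch of two extractors that work on the plain-text body only. The six-digit OTP format and the function names are assumptions for illustration; the URL check relies on the standard `URL` parser so that the host comparison is not fooled by userinfo or lookalike prefixes.

```javascript
// Extracts a standalone 6-digit code. The 6-digit format is an assumption;
// adjust the pattern to the codes your sender actually issues.
function extractOtp(text) {
  const m = /(?<!\d)\d{6}(?!\d)/.exec(String(text || ""));
  return m ? { type: "otp", value: m[0] } : null;
}

// Returns the first https link whose host is on the allowlist, else null.
function extractAllowedUrl(text, allowedHosts) {
  const candidates = String(text || "").match(/https?:\/\/[^\s<>"')]+/g) || [];
  for (const candidate of candidates) {
    let url;
    try {
      // Drop trailing sentence punctuation before parsing.
      url = new URL(candidate.replace(/[.,;:!?]+$/, ""));
    } catch {
      continue; // not a parseable URL
    }
    if (url.protocol === "https:" && allowedHosts.includes(url.hostname)) {
      return { type: "url", value: url.href };
    }
  }
  return null;
}
```

Both functions return either a small typed artifact or `null`, so the worker can treat a missing artifact as a structured outcome rather than asking the model to interpret the message.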

This is also where you can integrate email into larger business workflows. For example, an agent involved in onboarding or outbound operations, including workflows supported by B2B customer acquisition systems, may need to wait for confirmation emails or replies. The same rule applies: match the message deterministically, extract only the operational artifact, and resume the agent with a small typed result.

Pattern 4: make completion idempotent

Webhook systems must assume duplicates. SMTP delivery, provider retries, webhook retries, queue retries, and worker restarts can all cause the same email or delivery to be seen more than once.

The fix is not “hope duplicates do not happen.” The fix is layered idempotency.

Use separate keys for separate problems:

Layer	Suggested key	Purpose
Delivery	`delivery_id`	Prevent processing the same webhook delivery twice
Message	`message_id` plus `inbox_id`	Prevent treating the same email as a new message
Artifact	Hash of OTP or link plus `wait_id`	Prevent consuming the same verification artifact twice
Wait	`wait_id` state transition	Prevent completing or timing out the same wait twice

The worker should be safe to retry. If it crashes after storing a message but before resuming the agent, the next run should complete the same wait, not create a second action.

A simple worker flow looks like this:

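async function processEmailDelivery(event) {
  // Skip deliveries that were already fully processed.
  if (await seenDelivery(event.delivery_id)) return;

  const waits = await findActiveWaits({ inboxId: event.inbox_id });

  for (const wait of waits) {
    if (!matches(wait.matcher, event)) continue;

    const artifact = extractArtifact(wait.purpose, event);
    if (!artifact) continue;

    // Artifact-level key: the same OTP or link must not be consumed twice.
    const artifactKey = hash(`${wait.wait_id}:${artifact.type}:${artifact.value}`);
    if (await seenArtifact(artifactKey)) continue;

    // completeWaitOnce is assumed to be a compare-and-set on the wait state
    // that also records artifactKey, so a retried run cannot complete twice.
    await completeWaitOnce({
      waitId: wait.wait_id,
      artifactKey,
      artifact,
      messageId: event.message_id,
      deliveryId: event.delivery_id
    });
  }

  // Mark the delivery as processed only after the work is done, so a crash
  // mid-way leads to a safe retry instead of a silently dropped email.
  await recordDelivery(event.delivery_id);
}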

The model never has to solve duplicate delivery. It only receives one completed result.
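
The guarantee that a wait completes only once rests on the final state change being a compare-and-set. The in-memory sketch below illustrates the semantics under that assumption; a real system would typically use a conditional database update (for example, updating the row only where the status is still `waiting`). The `waits` map and `timeOutWaitOnce` are hypothetical, and this `completeWaitOnce` is a synchronous stand-in for the one called in the worker.

```javascript
// In-memory stand-in for a wait table. Illustrative only.
const waits = new Map([
  ["wait_123", { wait_id: "wait_123", status: "waiting" }]
]);

// Moves a wait out of "waiting" exactly once. Returns true for the single
// caller that wins, false for every later or competing caller.
function transitionOnce(waitId, nextStatus, extra = {}) {
  const wait = waits.get(waitId);
  if (!wait || wait.status !== "waiting") return false;
  waits.set(waitId, { ...wait, ...extra, status: nextStatus });
  return true;
}

function completeWaitOnce({ waitId, artifact, messageId, deliveryId }) {
  return transitionOnce(waitId, "completed", {
    artifact,
    message_id: messageId,
    delivery_id: deliveryId
  });
}

function timeOutWaitOnce(waitId) {
  return transitionOnce(waitId, "timed_out");
}
```

Only the caller that receives `true` should resume the agent. A duplicate delivery, a late poller, or a timeout that fires after completion all get `false` and do nothing further.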

Pattern 5: keep polling as a bounded fallback

Even if webhooks are your primary path, polling still belongs in the design. The key is to make polling a runtime fallback, not an agent habit.

A good fallback poller has a deadline, a cursor or seen-message set, exponential backoff, and the same matcher and dedupe logic as the webhook worker. It should not be a separate code path with different matching behavior.

Polling rule	Why it matters
Use the same wait record	Webhook and poller should race safely toward one completion
Use a deadline	Prevents infinite waits and runaway agent tasks
Track seen messages	Prevents duplicate processing during retries
Apply the same matchers	Avoids webhook/polling disagreement
Complete with compare-and-set semantics	Only one path should win the wait

The fallback poller can run on a schedule, or it can be invoked near the deadline if no webhook event arrived. For high-volume systems, batch polling can reduce overhead when many waits need a recovery check.
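
To show what "bounded" can mean in practice, the sketch below computes when a fallback poller would check an inbox: exponential backoff, capped at a maximum delay, and never scheduled past the deadline. The function name and parameters are hypothetical, and the numbers in the usage note are illustrative rather than recommended values.

```javascript
// Computes poll times as millisecond offsets from the start of the wait.
// Delays grow by `factor`, are capped at maxDelayMs, and stop at deadlineMs.
function pollSchedule({ firstDelayMs, factor, maxDelayMs, deadlineMs }) {
  if (!(firstDelayMs > 0) || !(maxDelayMs > 0) || !(factor >= 1)) {
    throw new Error("Delays must be positive and factor must be at least 1");
  }
  const offsets = [];
  let delay = Math.min(firstDelayMs, maxDelayMs);
  let at = delay;
  while (at <= deadlineMs) {
    offsets.push(at);
    delay = Math.min(delay * factor, maxDelayMs);
    at += delay;
  }
  return offsets;
}
```

With a first delay of 1 second, a factor of 2, an 8 second cap, and a 30 second deadline, this yields checks at 1, 3, 7, 15, and 23 seconds. The schedule is fixed by configuration, so the agent never decides when to look again.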

Security guardrails for webhook-first agent email

Inbound email is untrusted input. It can contain malicious links, prompt-injection text, deceptive HTML, spoofed display names, and misleading headers. A webhook-first architecture gives you a place to apply security controls before the LLM sees anything.

At minimum, apply these controls:

Verify signed webhook payloads before parsing or processing the JSON.
Enforce timestamp freshness and replay detection when signing metadata supports it.
Treat sender-controlled fields, including subject, from name, and body, as untrusted.
Prefer text or normalized structured fields over rendered HTML.
Validate extracted URLs against an allowlist before giving them to an agent or browser tool.
Pass only minimal artifacts to the LLM, not the full raw message.

The most important rule is ordering: verify first, parse second, act last. If the payload signature fails, the event should not reach the matching pipeline or the agent.
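
The sketch below illustrates that ordering with Node's built-in `crypto` module. The signing scheme is an assumption made for the example (a hex HMAC-SHA256 over the timestamp, a dot, and the raw body, with a Unix timestamp in seconds); your provider's actual scheme, header names, and tolerance may differ, so check its documentation. `verifyThenParse` is a hypothetical name.

```javascript
const { createHmac, timingSafeEqual } = require("crypto");

// Assumed scheme: signature = hex(HMAC-SHA256(secret, `${timestamp}.${rawBody}`)).
// This is a common pattern, not a statement about any specific provider.
function verifyThenParse({ rawBody, signature, timestamp, secret,
                           nowMs = Date.now(), toleranceMs = 5 * 60 * 1000 }) {
  // 1. Verify: recompute the HMAC over the exact bytes received.
  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`)
    .digest();
  const given = Buffer.from(String(signature), "hex");
  // timingSafeEqual requires equal lengths, so check length first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) {
    throw new Error("invalid_signature");
  }

  // 2. Freshness: reject stale timestamps to limit replay.
  const sentMs = Number(timestamp) * 1000;
  if (!Number.isFinite(sentMs) || Math.abs(nowMs - sentMs) > toleranceMs) {
    throw new Error("stale_timestamp");
  }

  // 3. Parse only after both checks pass.
  return JSON.parse(rawBody);
}
```

Including the timestamp in the signed string means an attacker cannot replay an old body with a fresh timestamp, and the constant-time comparison avoids leaking how many signature bytes matched.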

Designing the agent tool contract

The cleanest LLM interface is not read_email(). It is a small set of purpose-built tools that hide the waiting mechanics.

A minimal contract can look like this:

type StartEmailWaitInput = {
  purpose: "signup_verification" | "otp_login" | "password_reset";
  expectedHost?: string;
  timeoutSeconds: number;
};

type StartEmailWaitOutput = {
  waitId: string;
  inboxId: string;
  email: string;
  status: "waiting";
};

type EmailWaitResult =
  | { status: "completed"; artifactType: "otp"; otp: string }
  | { status: "completed"; artifactType: "url"; url: string }
  | { status: "timed_out"; reason: string };

The agent can request the wait and then continue only when the orchestrator returns the result. In a durable workflow engine, this may be implemented as a paused task. In a chat-based agent, it may be implemented as a tool call that returns waiting, followed later by an event that re-enters the agent loop.

What you should avoid is letting the agent decide the timing policy. Timeouts, resend budgets, poll intervals, and dedupe rules should be configuration, not model behavior.
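
One way to keep timing policy in configuration is to treat the agent's requested timeout as a hint that the runtime clamps against per-purpose limits. The sketch below assumes hypothetical policy values and names (`WAIT_POLICY`, `resolveWaitPolicy`); the numbers are placeholders, not recommendations.

```javascript
// Timing policy keyed by purpose. All numbers are illustrative placeholders.
const WAIT_POLICY = {
  signup_verification: { maxTimeoutSeconds: 300, resendBudget: 1 },
  otp_login: { maxTimeoutSeconds: 120, resendBudget: 2 },
  password_reset: { maxTimeoutSeconds: 300, resendBudget: 1 }
};

// Turns an agent request into the effective settings the runtime will use.
function resolveWaitPolicy({ purpose, timeoutSeconds }) {
  if (!Object.prototype.hasOwnProperty.call(WAIT_POLICY, purpose)) {
    throw new Error(`Unknown wait purpose: ${purpose}`);
  }
  const policy = WAIT_POLICY[purpose];
  // A missing or invalid request falls back to the configured maximum.
  const requested = Number.isFinite(timeoutSeconds) && timeoutSeconds > 0
    ? timeoutSeconds
    : policy.maxTimeoutSeconds;
  return {
    purpose,
    timeoutSeconds: Math.min(requested, policy.maxTimeoutSeconds),
    resendBudget: policy.resendBudget
  };
}
```

If the model asks for a 15 minute OTP wait, the runtime quietly applies the configured 2 minute ceiling, and the resend budget never appears in the tool input at all.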

Observability: what to log when an email wait fails

Email failures are often timing failures. Without the right IDs, they are painful to debug. Log identifiers and states, not full sensitive message bodies.

Useful fields include wait_id, inbox_id, attempt_id, delivery_id, message_id, wait status, matcher version, received timestamp, deadline timestamp, and extraction result. If the message arrived but did not match, log the reason as a structured enum, such as sender_mismatch, subject_mismatch, artifact_not_found, or deadline_passed.

For LLM-agent workflows, also log the agent run ID and the tool call ID. That lets you answer the important question: did the model make the wrong decision, or did the email event never satisfy the deterministic wait condition?
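
A log entry along these lines can be assembled from identifiers alone. The sketch below uses the field names from the paragraphs above and copies fields explicitly so that subject and body text cannot end up in logs by accident. `buildWaitLogEntry`, its parameters, and the `"ok"` extraction value are hypothetical choices for illustration.

```javascript
// Closed set of mismatch reasons so dashboards and alerts can group on them.
const MISMATCH_REASONS = [
  "sender_mismatch",
  "subject_mismatch",
  "artifact_not_found",
  "deadline_passed"
];

// Builds a log entry from identifiers and states only. Fields are copied one
// by one, so message content on `event` is never included.
function buildWaitLogEntry({ wait, event = null, agentRunId, toolCallId,
                             matcherVersion, reason = null }) {
  if (reason !== null && !MISMATCH_REASONS.includes(reason)) {
    throw new Error(`Unknown mismatch reason: ${reason}`);
  }
  return {
    wait_id: wait.wait_id,
    inbox_id: wait.inbox_id,
    attempt_id: wait.attempt_id,
    wait_status: wait.status,
    deadline_at: wait.deadline_at,
    delivery_id: event ? event.delivery_id : null,
    message_id: event ? event.message_id : null,
    received_at: event ? event.received_at : null,
    matcher_version: matcherVersion,
    extraction_result: reason === null ? "ok" : reason,
    agent_run_id: agentRunId,
    tool_call_id: toolCallId
  };
}
```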

Where Mailhook fits

Mailhook is built for this style of email automation. Instead of giving an agent a human mailbox, you can create programmable disposable inboxes through an API and receive inbound emails as structured JSON. Real-time webhooks let your runtime react to email arrival, while the polling API gives you a fallback path for recovery. Signed payloads help you verify that webhook events came from the expected source, and shared or custom domains let you choose the right domain strategy for your environment.

This makes Mailhook a practical fit for LLM agents, QA automation, signup verification flows, and client-operation workflows where email must be treated as data, not as a browser tab.

If you are implementing against Mailhook, start with the llms.txt integration reference so your agent and tooling use the current API contract rather than guessed endpoints or model-generated assumptions.

Frequently Asked Questions

Should LLM agents poll an inbox directly?
Usually no. The agent should request a wait, then the runtime should handle webhook delivery, fallback polling, deadlines, matching, and deduplication. This keeps timing policy deterministic and reduces token waste.

Do webhooks remove the need for polling?
Not completely. Webhooks should be the primary path, but bounded polling is still useful as a recovery mechanism when a webhook endpoint is unavailable, a worker is delayed, or a delivery event needs reconciliation.

What should the agent receive from a verification email?
The agent should receive the smallest typed artifact needed to continue, such as an OTP or validated magic link. Avoid passing raw HTML, full headers, or unrelated message content unless a human debugging workflow requires it.

How do you prevent duplicate email processing?
Use layered idempotency. Track delivery IDs, message IDs, artifact hashes, and wait state transitions separately. The webhook path and polling fallback should complete the same wait record with compare-and-set semantics.

Can this pattern work with custom domains?
Yes. Custom domains are often useful when a third-party app requires allowlisting, environment separation, or stronger domain control. Mailhook supports custom domain workflows as well as instant shared domains.

Build webhook-first email waits with Mailhook

If your LLM agents need to verify accounts, receive OTPs, test signup flows, or handle email-driven operations, do not make the model babysit a mailbox. Give it a deterministic wait tool backed by disposable inboxes, structured JSON emails, signed webhooks, and polling fallback.

Start with Mailhook, create programmable temp inboxes through the API, and use the signed webhook-first pattern to resume agents only when the right email arrives. No credit card is required to get started.