LLM agents are good at deciding what to do next. They are not good at waiting for an unpredictable email while holding context, retrying safely, and resisting malicious content inside a message.
That is why email-dependent agent workflows need a webhook-first waiting pattern. Instead of asking the model to refresh an inbox, sleep for 30 seconds, or inspect raw HTML, your system should treat inbound email as an external event. The agent requests an inbox and a wait condition, the runtime pauses the task, and a signed webhook resumes the workflow when the right message arrives.
For signup verification, OTP login, password reset, onboarding checks, QA flows, and client operations, this model makes email automation faster, safer, and easier to debug.
The core idea: waiting is a runtime responsibility, not an LLM responsibility
A common anti-pattern is giving an agent a tool like check_inbox() and letting it decide when to call it again. That looks simple, but it creates several problems:
- The agent may poll too often and waste tokens or API calls.
- The agent may stop too early because it assumes the email failed.
- The agent may trigger duplicate resend loops.
- The agent may read more email content than it needs.
- The agent may be influenced by hostile text inside the email body.
A webhook-first design moves the wait loop out of the model and into deterministic infrastructure. The LLM can request a goal, such as “wait for a verification code from this signup attempt,” but the runtime owns the timing, matching, deduplication, and security checks.
In practice, the pattern looks like this:
- Create a disposable inbox through an API.
- Store a wait record with the inbox ID, deadline, expected sender or subject, and correlation data.
- Trigger the external action that sends the email.
- Receive the email through a signed webhook as structured JSON.
- Match, dedupe, and extract only the artifact the agent needs.
- Resume the agent with a typed result, such as an OTP or verified magic link.
- Fall back to polling only if the webhook path does not complete before the deadline.
This is the difference between “an LLM checking mail” and “an agent runtime waiting for a verified event.”
Why webhook-first is the right default for email waiting
Polling is useful, but it should not be the primary control plane for agent workflows. When an email is delivered as a webhook, your system can react immediately without keeping the LLM active, running a browser loop, or sleeping inside a test.
Webhook-first email waiting is especially valuable when agents run in parallel. In a CI suite or a multi-agent workflow, dozens or hundreds of inboxes may be active at once. A polling-heavy design multiplies requests by the number of active waits. A webhook-driven design only does work when a message arrives.
| Concern | Polling-first behavior | Webhook-first behavior |
|---|---|---|
| Latency | Depends on interval and backoff | Message can be processed as soon as it arrives |
| Token usage | Agents may repeatedly ask for status | Agent can pause until the runtime has a result |
| Parallelism | More active waits means more polling | More active waits do not require constant checks |
| Security | Easy to expose raw inbox content to the model | Webhook handler can verify and minimize before resume |
| Debugging | Failures hide inside repeated checks | Each delivery can be logged as an event |
| Reliability | Needs careful cursor and timeout handling | Still needs dedupe, but arrival is event-driven |
The best production pattern is not “webhooks only.” It is webhooks first, with a bounded polling fallback. Webhooks provide responsiveness. Polling provides recovery if a webhook endpoint is unavailable, delayed, or misconfigured.
A practical state machine for LLM email waits
Email waiting becomes easier to reason about when you model it as a state machine. The agent should not manage these states directly. Your orchestrator, worker, or agent runtime should.
| State | Meaning | What can happen next |
|---|---|---|
| created | Inbox and wait record exist | Trigger the action that sends email |
| waiting | Runtime is waiting for a matching email | Webhook arrival, polling fallback, timeout |
| received | A candidate message arrived | Verify, dedupe, match, extract |
| completed | Required artifact was extracted | Resume the agent with a typed result |
| timed_out | Deadline passed without a match | Resume the agent with a controlled failure |
| closed | Inbox is no longer needed | Cleanup or retention policy applies |
This state machine prevents the most common LLM-agent failure mode: ambiguous waiting. The model should not wonder whether to retry, resend, refresh, or continue. It should receive one of a few deterministic outcomes.
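One way to keep these transitions out of the model's hands is a small transition table that the runtime consults before changing a wait's state. The sketch below is illustrative only: the state names come from the table above, while `ALLOWED_TRANSITIONS` and `transition` are hypothetical names, and the exact edges (for example, letting `received` fall back to `waiting` when a candidate does not match) are assumptions to adapt.

```typescript
// States taken from the state table above.
type WaitState =
  | "created"
  | "waiting"
  | "received"
  | "completed"
  | "timed_out"
  | "closed";

// Assumed edges; adjust them to your own runtime's rules.
const ALLOWED_TRANSITIONS: Record<WaitState, WaitState[]> = {
  created: ["waiting"],
  waiting: ["received", "timed_out"],
  // A candidate that fails matching sends the wait back to "waiting".
  received: ["completed", "waiting", "timed_out"],
  completed: ["closed"],
  timed_out: ["closed"],
  closed: [],
};

// Returns the new state, or throws if the runtime attempts an illegal move.
function transition(from: WaitState, to: WaitState): WaitState {
  if (!ALLOWED_TRANSITIONS[from].includes(to)) {
    throw new Error(`Illegal wait transition: ${from} -> ${to}`);
  }
  return to;
}
```

Because every state change goes through one function, an attempt to reopen a completed wait fails loudly instead of silently producing a second result.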
For example, the agent-facing result can be intentionally small:
{
  "status": "completed",
  "wait_id": "wait_123",
  "artifact_type": "otp",
  "otp": "482913",
  "received_at": "2026-05-28T21:11:07Z"
}
Or, when nothing arrives:
{
  "status": "timed_out",
  "wait_id": "wait_123",
  "reason": "No matching verification email arrived before the deadline."
}
The model gets the outcome, not the entire inbox.
Pattern 1: create an inbox and wait record before triggering email
The most reliable email wait begins before the email is sent. Create the disposable inbox first, persist the inbox_id, then trigger the system that sends the message.
Do not store only the email address. Store an inbox descriptor that your runtime can use later:
{
  "wait_id": "wait_signup_789",
  "inbox_id": "inbox_456",
  "email": "[email protected]",
  "purpose": "signup_verification",
  "attempt_id": "attempt_001",
  "status": "waiting",
  "deadline_at": "2026-05-28T21:16:07Z"
}
The exact fields depend on your system, but the important point is that the address is not the only handle. The wait belongs to a specific inbox and a specific attempt.
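A minimal sketch of that ordering follows, assuming the inbox has already been created through your provider's API. The field names mirror the descriptor above; `waitStore` is an in-memory stand-in for a durable store, and `startWait` is a hypothetical helper, not part of any real SDK.

```typescript
type WaitRecord = {
  wait_id: string;
  inbox_id: string;
  email: string;
  purpose: string;
  attempt_id: string;
  status: "waiting";
  deadline_at: string; // ISO 8601 timestamp
};

// In-memory stand-in for a durable store (assumption for illustration).
const waitStore = new Map<string, WaitRecord>();

// Persists the wait record BEFORE the caller triggers the email-sending action.
function startWait(
  inbox: { inbox_id: string; email: string },
  purpose: string,
  attemptId: string,
  timeoutSeconds: number,
  now: Date = new Date(),
): WaitRecord {
  const record: WaitRecord = {
    wait_id: `wait_${attemptId}`,
    inbox_id: inbox.inbox_id,
    email: inbox.email,
    purpose,
    attempt_id: attemptId,
    status: "waiting",
    // The deadline is fixed at creation time, not decided later by the model.
    deadline_at: new Date(now.getTime() + timeoutSeconds * 1000).toISOString(),
  };
  waitStore.set(record.wait_id, record);
  return record;
}
```

Only after `startWait` returns would the runtime submit the signup form with `record.email`, so an email that arrives immediately still finds a wait to match against.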
With Mailhook, this maps naturally to programmable temporary inboxes created via API. Mailhook provides disposable inbox creation, structured JSON email output, real-time webhook notifications, polling access, shared domains, custom domain support, signed payloads, and batch processing capabilities. For exact integration details, use the canonical Mailhook llms.txt reference.
Pattern 2: use webhook handlers as event intake, not business logic
A webhook endpoint should do as little synchronous work as possible. Its job is to verify the request, persist or enqueue the event, and acknowledge quickly.
A good webhook handler does not call the LLM directly. It also does not perform complex agent actions inline. Instead, it turns an inbound email into a durable event that a worker can process.
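async function emailWebhookHandler(req) {
  // Read the raw body first: signature checks typically need the exact bytes.
  const rawBody = await readRawBody(req);

  // Verify before parsing. This helper is expected to throw on a bad signature.
  verifyWebhookSignature({
    rawBody,
    headers: req.headers
  });

  const event = JSON.parse(rawBody);

  // Persist or enqueue the event for a worker. Do not call the LLM here.
  await enqueueEmailDelivery({
    deliveryId: event.delivery_id,
    inboxId: event.inbox_id,
    receivedAt: event.received_at,
    rawEvent: event
  });

  // Acknowledge quickly with an empty success response.
  return { status: 204 };
}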
That worker can then run deterministic logic: dedupe the delivery, find active waits for the inbox, apply matchers, extract artifacts, and complete the wait.
This separation matters for agent safety. The webhook handler is security-critical infrastructure. The LLM is a decision layer. Mixing them makes retries, timeouts, and prompt-injection defenses harder to audit.
Pattern 3: match narrowly, then extract minimally
LLM agents should not read an email to “figure out” whether it is relevant. Your system should narrow the candidate set first.
Good matchers use stable signals that are known before the email arrives. Depending on the workflow, that might include the inbox ID, attempt ID, expected sender domain, expected subject fragment, recipient address, correlation token, or message timing.
After a message matches, extract the smallest useful artifact. For verification flows, that usually means an OTP, a magic link, or a confirmation URL. For QA flows, it might be a subject assertion or a normalized text snippet.
| Workflow | Agent needs | Agent usually does not need |
|---|---|---|
| OTP login | Code, expiration hint if available | Full HTML, tracking pixels, all headers |
| Magic-link login | Validated URL on an allowed host | Raw MIME, unrelated links |
| Signup verification | Confirmation artifact and message ID | Full mailbox history |
| Client operations | A typed reply classification or extracted field | Untrusted formatting or hidden HTML |
This is also where email fits into larger business workflows. For example, an agent handling onboarding or outbound operations, including workflows supported by B2B customer acquisition systems, may need to wait for confirmation emails or replies. The same rule applies: match the message deterministically, extract only the operational artifact, and resume the agent with a small typed result.
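As a rough illustration of "match narrowly, then extract minimally," the sketch below checks a few stable signals and then pulls out a single OTP. The field names (`inbox_id`, `from`, `subject`, `text`), the six-digit code format, and the helper names `matchesWait` and `extractOtp` are all assumptions; a real payload schema and code format may differ.

```typescript
type InboundEmail = {
  inbox_id: string;
  from: string;    // assumed to be a bare address; untrusted
  subject: string; // untrusted
  text: string;    // plain-text body; untrusted
};

type Matcher = {
  inboxId: string;
  senderDomain: string;
  subjectIncludes: string;
};

// Narrowing check only: the From address can be spoofed, so this is a filter,
// not proof of who sent the message.
function matchesWait(matcher: Matcher, email: InboundEmail): boolean {
  const domain = (email.from.split("@").pop() ?? "").toLowerCase();
  return (
    email.inbox_id === matcher.inboxId &&
    domain === matcher.senderDomain.toLowerCase() &&
    email.subject.toLowerCase().includes(matcher.subjectIncludes.toLowerCase())
  );
}

// Returns the code only when exactly one distinct standalone 6-digit number
// appears; ambiguous or missing codes yield null instead of a guess.
function extractOtp(text: string): string | null {
  const codes = text.match(/\b\d{6}\b/g) ?? [];
  const unique = Array.from(new Set(codes));
  const first = unique[0];
  return unique.length === 1 && first !== undefined ? first : null;
}
```

Returning `null` on ambiguity keeps the decision deterministic: the runtime can log the miss and keep waiting, rather than asking the model to pick between candidates.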
Pattern 4: make completion idempotent
Webhook systems must assume duplicates. SMTP delivery, provider retries, webhook retries, queue retries, and worker restarts can all cause the same email or delivery to be seen more than once.
The fix is not “hope duplicates do not happen.” The fix is layered idempotency.
Use separate keys for separate problems:
| Layer | Suggested key | Purpose |
|---|---|---|
| Delivery | delivery_id | Prevent processing the same webhook delivery twice |
| Message | message_id plus inbox_id | Prevent treating the same email as a new message |
| Artifact | Hash of OTP or link plus wait_id | Prevent consuming the same verification artifact twice |
| Wait | wait_id state transition | Prevent completing or timing out the same wait twice |
The worker should be safe to retry. If it crashes after storing a message but before resuming the agent, the next run should complete the same wait, not create a second action.
A simple worker flow looks like this:
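async function processEmailDelivery(event) {
  // Skip deliveries that were already fully processed.
  if (await seenDelivery(event.delivery_id)) return;

  const waits = await findActiveWaits({ inboxId: event.inbox_id });

  for (const wait of waits) {
    if (!matches(wait.matcher, event)) continue;

    // Extract only the artifact this wait needs (OTP, link, and so on).
    const artifact = extractArtifact(wait.purpose, event);
    if (!artifact) continue;

    // Artifact-level key: the same code or link completes a given wait only once.
    const artifactKey = hash(`${wait.wait_id}:${artifact.type}:${artifact.value}`);
    if (await seenArtifact(artifactKey)) continue;

    // completeWaitOnce is expected to be atomic and idempotent, and to record
    // artifactKey together with the wait's state transition.
    await completeWaitOnce({
      waitId: wait.wait_id,
      artifact,
      artifactKey,
      messageId: event.message_id,
      deliveryId: event.delivery_id
    });
  }

  // Record the delivery last, so a crash mid-loop leaves it eligible for retry.
  await recordDelivery(event.delivery_id);
}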
The model never has to solve duplicate delivery. It only receives one completed result.
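The wait-level guarantee can be sketched as a compare-and-set on the wait's status. This is an in-memory illustration under stated assumptions: `waitRows` stands in for a durable table, and the bodies of `completeWaitOnce` and `timeOutWaitOnce` are hypothetical. In a database, the same idea is usually a conditional update that only succeeds while the status is still `waiting`.

```typescript
type Artifact = { type: "otp" | "url"; value: string };

type WaitRow = {
  waitId: string;
  status: "waiting" | "completed" | "timed_out";
  artifact?: Artifact;
};

// In-memory stand-in for a durable table (assumption for illustration).
const waitRows = new Map<string, WaitRow>();

// Compare-and-set: succeeds only if the wait is still "waiting".
// Returns true for the single caller that performed the transition.
function completeWaitOnce(input: { waitId: string; artifact: Artifact }): boolean {
  const row = waitRows.get(input.waitId);
  if (row === undefined || row.status !== "waiting") return false;
  row.status = "completed";
  row.artifact = input.artifact;
  return true;
}

// The timeout path uses the same guard, so it cannot overwrite a completion.
function timeOutWaitOnce(waitId: string): boolean {
  const row = waitRows.get(waitId);
  if (row === undefined || row.status !== "waiting") return false;
  row.status = "timed_out";
  return true;
}
```

Only the caller that receives `true` should resume the agent. A retried worker, a duplicate webhook, or a late timeout all receive `false` and do nothing further.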
Pattern 5: keep polling as a bounded fallback
Even if webhooks are your primary path, polling still belongs in the design. The key is to make polling a runtime fallback, not an agent habit.
A good fallback poller has a deadline, a cursor or seen-message set, exponential backoff, and the same matcher and dedupe logic as the webhook worker. It should not be a separate code path with different matching behavior.
| Polling rule | Why it matters |
|---|---|
| Use the same wait record | Webhook and poller should race safely toward one completion |
| Use a deadline | Prevents infinite waits and runaway agent tasks |
| Track seen messages | Prevents duplicate processing during retries |
| Apply the same matchers | Avoids webhook/polling disagreement |
| Complete with compare-and-set semantics | Only one path should win the wait |
The fallback poller can run on a schedule, or it can be invoked near the deadline if no webhook event arrived. For high-volume systems, batch polling can reduce overhead when many waits need a recovery check.
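To make "bounded" concrete, the sketch below computes the sleep intervals a fallback poller might use: exponential backoff with a cap, cut off so the total never exceeds the time left before the deadline. `pollDelays` and its parameters are hypothetical, and the numbers in the usage note are examples rather than recommendations.

```typescript
// Returns the delays (in milliseconds) between fallback checks. Each delay
// doubles up to maxDelayMs, and the sum of all delays never exceeds budgetMs.
function pollDelays(baseDelayMs: number, maxDelayMs: number, budgetMs: number): number[] {
  if (baseDelayMs <= 0 || maxDelayMs <= 0) {
    throw new Error("Delays must be positive");
  }
  const delays: number[] = [];
  let delay = Math.min(baseDelayMs, maxDelayMs);
  let elapsed = 0;
  while (elapsed + delay <= budgetMs) {
    delays.push(delay);
    elapsed += delay;
    delay = Math.min(delay * 2, maxDelayMs);
  }
  return delays;
}
```

With a 1-second base, an 8-second cap, and 30 seconds left before the deadline, `pollDelays(1000, 8000, 30000)` yields `[1000, 2000, 4000, 8000, 8000]`: five checks, after which the runtime times the wait out instead of polling forever.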
Security guardrails for webhook-first agent email
Inbound email is untrusted input. It can contain malicious links, prompt-injection text, deceptive HTML, spoofed display names, and misleading headers. A webhook-first architecture gives you a place to apply security controls before the LLM sees anything.
At minimum, apply these controls:
- Verify signed webhook payloads before parsing or processing the JSON.
- Enforce timestamp freshness and replay detection when signing metadata supports it.
- Treat sender-controlled fields, including subject, from name, and body, as untrusted.
- Prefer text or normalized structured fields over rendered HTML.
- Validate extracted URLs against an allowlist before giving them to an agent or browser tool.
- Pass only minimal artifacts to the LLM, not the full raw message.
The most important rule is ordering: verify first, parse second, act last. If the payload signature fails, the event should not reach the matching pipeline or the agent.
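The URL allowlist control from the list above can be sketched with the standard `URL` parser. `validateMagicLink` is a hypothetical helper, and the policy it encodes (HTTPS only, exact host match, no embedded credentials) is one reasonable assumption, not a complete defense.

```typescript
// Returns the normalized URL if it is HTTPS, has no embedded credentials, and
// its host is exactly on the allowlist. Returns null for anything else.
function validateMagicLink(candidate: string, allowedHosts: string[]): string | null {
  let url: URL;
  try {
    url = new URL(candidate);
  } catch {
    return null; // Not a parseable absolute URL.
  }
  if (url.protocol !== "https:") return null;
  // Reject "https://trusted.example@evil.test/" style userinfo tricks.
  if (url.username !== "" || url.password !== "") return null;
  // Compare the parsed hostname, never a substring of the raw string.
  if (!allowedHosts.includes(url.hostname.toLowerCase())) return null;
  return url.toString();
}
```

Comparing the parsed hostname matters: a link such as `https://app.example.com.evil.test/verify` contains the trusted name as a substring but resolves to a different host, and it is rejected here.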
Designing the agent tool contract
The cleanest LLM interface is not read_email(). It is a small set of purpose-built tools that hide the waiting mechanics.
A minimal contract can look like this:
type StartEmailWaitInput = {
  purpose: "signup_verification" | "otp_login" | "password_reset";
  expectedHost?: string;
  timeoutSeconds: number;
};
type StartEmailWaitOutput = {
  waitId: string;
  inboxId: string;
  email: string;
  status: "waiting";
};
type EmailWaitResult =
  | { status: "completed"; artifactType: "otp"; otp: string }
  | { status: "completed"; artifactType: "url"; url: string }
  | { status: "timed_out"; reason: string };
The agent can request the wait and then continue only when the orchestrator returns the result. In a durable workflow engine, this may be implemented as a paused task. In a chat-based agent, it may be implemented as a tool call that returns waiting, followed later by an event that re-enters the agent loop.
What you should avoid is letting the agent decide the timing policy. Timeouts, resend budgets, poll intervals, and dedupe rules should be configuration, not model behavior.
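One way to express "configuration, not model behavior" is to treat the agent's requested timeout as a hint that the runtime clamps to a per-purpose policy. The policy table, its numeric values, and `resolveTimeoutSeconds` are hypothetical examples, not recommended settings.

```typescript
type Purpose = "signup_verification" | "otp_login" | "password_reset";

type WaitPolicy = {
  minTimeoutSeconds: number;
  maxTimeoutSeconds: number;
  maxResends: number;
};

// Example values only; tune them per workflow.
const WAIT_POLICIES: Record<Purpose, WaitPolicy> = {
  signup_verification: { minTimeoutSeconds: 30, maxTimeoutSeconds: 600, maxResends: 1 },
  otp_login: { minTimeoutSeconds: 30, maxTimeoutSeconds: 300, maxResends: 2 },
  password_reset: { minTimeoutSeconds: 30, maxTimeoutSeconds: 600, maxResends: 1 },
};

// The agent may suggest a timeout, but the runtime has the final say.
function resolveTimeoutSeconds(purpose: Purpose, requestedSeconds: number): number {
  const policy = WAIT_POLICIES[purpose];
  if (!Number.isFinite(requestedSeconds)) return policy.maxTimeoutSeconds;
  return Math.min(
    Math.max(requestedSeconds, policy.minTimeoutSeconds),
    policy.maxTimeoutSeconds,
  );
}
```

If the model asks for a one-hour OTP wait, the runtime quietly shortens it to the configured maximum, and the policy table remains the single place where timing decisions are reviewed.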
Observability: what to log when an email wait fails
Email failures are often timing failures. Without the right IDs, they are painful to debug. Log identifiers and states, not full sensitive message bodies.
Useful fields include wait_id, inbox_id, attempt_id, delivery_id, message_id, wait status, matcher version, received timestamp, deadline timestamp, and extraction result. If the message arrived but did not match, log the reason as a structured enum, such as sender_mismatch, subject_mismatch, artifact_not_found, or deadline_passed.
For LLM-agent workflows, also log the agent run ID and the tool call ID. That lets you answer the important question: did the model make the wrong decision, or did the email event never satisfy the deterministic wait condition?
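A structured mismatch reason can be computed by the same deterministic checks that decide a match. The sketch below is an assumption-laden illustration: the reason names come from the list above, while the `Expectation` and `Candidate` shapes, the order of checks, and the helper names are hypothetical.

```typescript
type MismatchReason =
  | "sender_mismatch"
  | "subject_mismatch"
  | "artifact_not_found"
  | "deadline_passed";

type Expectation = { senderDomain: string; subjectIncludes: string; deadlineAt: string };
type Candidate = { from: string; subject: string; receivedAt: string; artifactFound: boolean };

// Returns null when the candidate satisfies the wait, else the first failing reason.
function mismatchReason(expected: Expectation, candidate: Candidate): MismatchReason | null {
  if (Date.parse(candidate.receivedAt) > Date.parse(expected.deadlineAt)) return "deadline_passed";
  if (!candidate.from.toLowerCase().endsWith(`@${expected.senderDomain.toLowerCase()}`)) {
    return "sender_mismatch";
  }
  if (!candidate.subject.toLowerCase().includes(expected.subjectIncludes.toLowerCase())) {
    return "subject_mismatch";
  }
  if (!candidate.artifactFound) return "artifact_not_found";
  return null;
}

// Builds a log entry from identifiers and the outcome only; no message body.
function waitLogEntry(
  ids: { wait_id: string; inbox_id: string; message_id: string; agent_run_id: string },
  reason: MismatchReason | null,
) {
  return { ...ids, outcome: reason ?? "matched" };
}
```

An entry such as `{ wait_id, inbox_id, message_id, agent_run_id, outcome: "subject_mismatch" }` answers most "why did the wait time out?" questions without storing the email itself.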
Where Mailhook fits
Mailhook is built for this style of email automation. Instead of giving an agent a human mailbox, you can create programmable disposable inboxes through an API and receive inbound emails as structured JSON. Real-time webhooks let your runtime react to email arrival, while the polling API gives you a fallback path for recovery. Signed payloads help you verify that webhook events came from the expected source, and shared or custom domains let you choose the right domain strategy for your environment.
This makes Mailhook a practical fit for LLM agents, QA automation, signup verification flows, and client-operation workflows where email must be treated as data, not as a browser tab.
If you are implementing against Mailhook, start with the llms.txt integration reference so your agent and tooling use the current API contract rather than guessed endpoints or model-generated assumptions.
Frequently Asked Questions
Should LLM agents poll an inbox directly?
Usually no. The agent should request a wait, then the runtime should handle webhook delivery, fallback polling, deadlines, matching, and deduplication. This keeps timing policy deterministic and reduces token waste.
Do webhooks remove the need for polling?
Not completely. Webhooks should be the primary path, but bounded polling is still useful as a recovery mechanism when a webhook endpoint is unavailable, a worker is delayed, or a delivery event needs reconciliation.
What should the agent receive from a verification email?
The agent should receive the smallest typed artifact needed to continue, such as an OTP or validated magic link. Avoid passing raw HTML, full headers, or unrelated message content unless a human debugging workflow requires it.
How do you prevent duplicate email processing?
Use layered idempotency. Track delivery IDs, message IDs, artifact hashes, and wait state transitions separately. The webhook path and polling fallback should complete the same wait record with compare-and-set semantics.
Can this pattern work with custom domains?
Yes. Custom domains are often useful when a third-party app requires allowlisting, environment separation, or stronger domain control. Mailhook supports custom domain workflows as well as instant shared domains.
Build webhook-first email waits with Mailhook
If your LLM agents need to verify accounts, receive OTPs, test signup flows, or handle email-driven operations, do not make the model babysit a mailbox. Give it a deterministic wait tool backed by disposable inboxes, structured JSON emails, signed webhooks, and polling fallback.
Start with Mailhook, create programmable temp inboxes through the API, and use the signed webhook-first pattern to resume agents only when the right email arrives. No credit card is required to get started.