Email is one of the last “human-first” protocols that modern systems still depend on. If you’re building CI tests, QA automation, or LLM agents that must wait for a verification link or OTP, you quickly discover a painful truth: an email receive API is an eventually consistent event stream. Messages can arrive late, out of order, or more than once.
Reliability comes from accepting those semantics and designing around them: webhook delivery for speed, polling for recovery, and dedupe everywhere so retries do not become double-processing.
The reliability model: treat inbound email like an event stream
A robust email receive API design usually has three distinct concepts:
- Inbox: a short-lived container that isolates traffic for one workflow (for example, one signup attempt).
- Message: the normalized representation of an email (headers, bodies, attachments), often derived from raw RFC 5322 content.
- Delivery event: the provider’s attempt to notify you (webhook call) or your attempt to fetch (poll) that message.
Even if a provider exposes only “list messages”, the system behaves like an event stream under the hood: there is ingestion, normalization, storage, and delivery.
Two consequences matter for reliability:
- At-least-once is normal: webhook retries, queue retries, and client retries create duplicates.
- Ordering is not guaranteed: SMTP, provider pipelines, and internal retries can reorder arrival.
If your consumer assumes exactly-once and in-order delivery, you will get flakes.
If you want a quick reference for how Mailhook models this (disposable inboxes, JSON output, webhooks, polling), use the canonical integration contract in llms.txt.
Webhooks: best latency, most failure modes
Webhooks are the right default when you care about speed and scale. They also move failure handling into your infrastructure, so you have to be intentional.
Common webhook failure modes
- Slow handler causes retries: your endpoint times out, provider retries, you process twice.
- Transient outages: deploys, regional issues, DNS hiccups.
- Replay or spoofing attempts: an attacker (or a misconfigured internal system) replays old payloads.
- Downstream partial failure: you accept the webhook but crash while processing.
Webhook reliability checklist
Design your webhook consumer as a thin, idempotent ingestion layer:
- Verify authenticity before parsing. For example, verify a signature over the raw request body, then parse JSON.
- Ack fast, process async. Respond 2xx as soon as you have durably recorded the event (for example, queued it).
- Make processing idempotent. Dedupe using stable identifiers (more on this below).
- Add replay protection. Reject deliveries whose timestamp falls outside a tolerance window, and dedupe on the delivery identifier.
If you want a general reference for webhook operational patterns, Stripe’s webhook docs are a good baseline (even though this is not email-specific): webhook best practices.
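The "verify before parsing" and "timestamp tolerance" items in the checklist can be sketched as follows. This is a minimal illustration, not any provider's actual scheme: the HMAC-SHA256 construction over `timestamp.raw_body`, the 300-second window, and the function name `verify_webhook` are all assumptions. Check your provider's documentation for the real header names and signing format.

```python
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_SECONDS = 300  # assumed replay window; tune to your provider's guidance


def verify_webhook(
    secret: bytes,
    raw_body: bytes,
    timestamp: str,
    signature_hex: str,
    now: Optional[float] = None,
) -> bool:
    """Return True only if the timestamp is fresh and the signature matches.

    Assumes (hypothetically) that the provider signs b"<timestamp>.<raw body>"
    with HMAC-SHA256 and sends the hex digest alongside the timestamp.
    """
    now = time.time() if now is None else now
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    # Reject stale or future-dated deliveries before doing any other work.
    if abs(now - sent_at) > TOLERANCE_SECONDS:
        return False
    signed_payload = timestamp.encode("utf-8") + b"." + raw_body
    expected = hmac.new(secret, signed_payload, hashlib.sha256).hexdigest()
    # Constant-time comparison; compare bytes so odd input cannot raise.
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))
```

Only after this returns True should the handler parse the JSON body, record the delivery identifier, and respond 2xx.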

Polling: simpler surface area, better recovery, easy to misuse
Polling is attractive because it is straightforward and debuggable. It is also easy to implement badly.
When polling is the right choice
- You cannot expose a public webhook endpoint.
- Your environment is short-lived (some CI runners).
- You need a deterministic “wait up to N seconds” primitive.
- You want a backup path when webhooks fail.
Polling reliability rules
A production-ready poller should have:
- A clear deadline (overall timeout), not an infinite loop.
- Backoff with jitter to avoid thundering herds.
- A cursor or “seen set” so repeated list calls do not reprocess old messages.
- A narrow matcher (filter by inbox, expected sender, subject intent, or correlation token), not “latest email wins.”
Polling also benefits from a “two clocks” approach:
- A per-request timeout (network, server).
- An overall workflow budget (for example, 60 seconds to receive the email).
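A poller that follows these rules might look like the sketch below. The names `wait_for_message`, `list_messages`, and `matches`, and the `message_id` field, are hypothetical; the per-request timeout is assumed to be enforced inside whatever `list_messages` calls (for example, an HTTP client timeout).

```python
import random
import time
from typing import Callable, Iterable, Optional


def wait_for_message(
    list_messages: Callable[[], Iterable[dict]],  # hypothetical: one list call, with its own request timeout
    matches: Callable[[dict], bool],              # narrow matcher, not "latest email wins"
    deadline_seconds: float = 60.0,
    base_delay: float = 0.5,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[dict]:
    """Poll until a matching message appears or the overall budget is spent."""
    deadline = clock() + deadline_seconds
    seen = set()  # message IDs already inspected, so repeats are skipped
    attempt = 0
    while True:
        for message in list_messages():
            if message["message_id"] in seen:
                continue
            seen.add(message["message_id"])
            if matches(message):
                return message
        remaining = deadline - clock()
        if remaining <= 0:
            return None  # clear deadline instead of an infinite loop
        # Exponential backoff with full jitter, capped at max_delay.
        cap = min(max_delay, base_delay * (2 ** min(attempt, 10)))
        sleep(min(random.uniform(0, cap), remaining))
        attempt += 1
```

Injecting `sleep` and `clock` is a design choice that keeps the loop testable without real waiting.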
If you need background context on email structure (why normalizing to JSON matters), the canonical format for email messages is defined in RFC 5322.
Webhook-first, polling fallback: the reliability sweet spot
The most reliable pattern for an email receive API consumer is:
- Use webhooks for low-latency arrival.
- Use polling to reconcile missed deliveries, delayed messages, or webhook outages.
- Feed both paths into the same dedupe and storage layer.
A practical workflow (provider-agnostic) looks like this:
1) Provision an isolated inbox resource
Return an object that includes both:
- the email address to send to
- the inbox identifier you will read from later
This prevents cross-test collisions and makes retries safe.
2) Start two waiters with one deadline
- Webhook listener records delivery events.
- Poll loop runs only until the deadline, then stops.
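One way to run both waiters under a single deadline is to block on a queue that the webhook handler fills, and poll once whenever the queue stays quiet. This is a sketch under assumptions: `wait_with_fallback`, `poll_once`, and `matches` are hypothetical names, and the webhook handler is assumed to put already-verified messages onto `webhook_events`.

```python
import queue
import time
from typing import Callable, Iterable, Optional


def wait_with_fallback(
    webhook_events: "queue.Queue[dict]",      # filled by your webhook handler (assumed)
    poll_once: Callable[[], Iterable[dict]],  # hypothetical single list call
    matches: Callable[[dict], bool],
    deadline_seconds: float = 60.0,
    poll_interval: float = 2.0,
) -> Optional[dict]:
    """Return the first matching message from either path, or None at the deadline."""
    deadline = time.monotonic() + deadline_seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        try:
            # Block on the webhook queue, but never past the shared deadline.
            candidates = [webhook_events.get(timeout=min(poll_interval, remaining))]
        except queue.Empty:
            # No webhook arrived within the interval: reconcile by polling once.
            candidates = list(poll_once())
        for message in candidates:
            if matches(message):
                return message
```

Both paths return the same message shape, so the caller can feed the result into one dedupe and storage layer.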
3) Select the intended message deterministically
Selection rules should be explicit and testable:
- correct inbox_id
- correlation token match (header, local-part encoding, or metadata)
- “received_at” within the attempt window
4) Extract the minimal artifact
For automation and agents, the artifact is usually:
- OTP (one-time passcode)
- verification URL (magic link)
Treat everything else as untrusted or unnecessary.
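A deterministic extractor for these two artifacts could be as small as the sketch below. It assumes a six-digit numeric OTP and an https link in the plain-text body; both patterns and all function names are illustrative and should be adapted to the emails your system actually sends.

```python
import hashlib
import re
from typing import Optional

# Assumed formats: exactly six digits not adjacent to other digits, and an https URL.
OTP_PATTERN = re.compile(r"(?<!\d)(\d{6})(?!\d)")
URL_PATTERN = re.compile(r"https://[^\s<>\"')]+")


def extract_otp(text_body: str) -> Optional[str]:
    """Return the first standalone six-digit code, or None."""
    match = OTP_PATTERN.search(text_body)
    return match.group(1) if match else None


def extract_verification_url(text_body: str) -> Optional[str]:
    """Return the first https URL, trimmed of trailing sentence punctuation."""
    match = URL_PATTERN.search(text_body)
    return match.group(0).rstrip(".,;") if match else None


def artifact_hash(value: str) -> str:
    """Stable key for artifact-level dedupe (SHA-256 of the extracted value)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()
```

Keeping OTP and URL extraction as separate functions means the caller states which artifact it expects, so a digit run inside a link is not mistaken for a code.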
5) Expire the inbox
Short lifetimes reduce risk, reduce noise, and simplify debugging.
Dedupe: do it at multiple layers, not just once
Teams often add dedupe as a single “if we’ve seen this message_id, skip” check. That helps, but it does not cover webhook retries, resend flows, or multiple emails that contain the same OTP.
A reliable system dedupes at multiple layers.
| Layer | What can duplicate? | Example dedupe key | Why it matters |
|---|---|---|---|
| Delivery | Webhook call retries | delivery_id (or a hash of raw request body) | Prevents double-ingest when your endpoint is slow or down |
| Message | Same message stored/returned twice | message_id (provider-stable), or normalized Message-ID | Prevents double-processing when listing/polling repeats |
| Artifact | Same OTP/link appears multiple times | artifact_hash (for example, hash of OTP or URL) | Prevents “verify twice” and flaky assertions |
| Attempt | Your workflow retries the whole step | attempt_id (generated by you) | Prevents retry storms from causing side effects |
A dedupe design that survives retries
You want idempotency to be enforced by your storage constraints, not by “best effort” code paths.
A simple approach:
- Store deliveries with a unique constraint on delivery_id.
- Store messages with a unique constraint on (inbox_id, message_id).
- Store artifacts with a unique constraint on (attempt_id, artifact_hash).
Then your handler can be written as “upsert and continue,” instead of “if-else spaghetti.”
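As a minimal sketch of constraint-enforced idempotency, the example below uses SQLite from the Python standard library. The table layout and the helper names `open_store` and `record_once` are assumptions for illustration; the same idea carries over to any database that supports unique constraints.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS deliveries (
    delivery_id TEXT PRIMARY KEY
);
CREATE TABLE IF NOT EXISTS messages (
    inbox_id   TEXT NOT NULL,
    message_id TEXT NOT NULL,
    UNIQUE (inbox_id, message_id)
);
CREATE TABLE IF NOT EXISTS artifacts (
    attempt_id    TEXT NOT NULL,
    artifact_hash TEXT NOT NULL,
    UNIQUE (attempt_id, artifact_hash)
);
"""

# Fixed statements per layer; the database, not the handler, rejects duplicates.
_INSERTS = {
    "delivery": "INSERT OR IGNORE INTO deliveries (delivery_id) VALUES (?)",
    "message": "INSERT OR IGNORE INTO messages (inbox_id, message_id) VALUES (?, ?)",
    "artifact": "INSERT OR IGNORE INTO artifacts (attempt_id, artifact_hash) VALUES (?, ?)",
}


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_once(conn: sqlite3.Connection, kind: str, *key: str) -> bool:
    """Insert the key; return True if it was new, False if it was a duplicate."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(_INSERTS[kind], key)
    return cursor.rowcount == 1
```

A handler can then call `record_once(conn, "delivery", delivery_id)` and simply return early when it gets False.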
Dedupe pitfalls to avoid
- Using timestamps as primary keys. Two messages can share the same second, clocks drift, and ordering is not guaranteed.
- Deduping only by subject. Subjects are not unique and templates change.
- Assuming resend means new intent. Many systems resend the same link or code.
Observability: the metrics that actually catch email flakes
Email reliability issues often look like “it just didn’t arrive.” In practice, you need to instrument the pipeline to see where time was lost or duplicates were introduced.
Track these as first-class metrics:
- Time to first message: from inbox creation to matched message.
- Webhook delivery success rate: 2xx vs non-2xx responses.
- Webhook retry count: signals slow handlers or outages.
- Polling request count per attempt: signals webhook gaps or overly aggressive polling.
- Duplicate rate: how often dedupe rules fired at each layer.
And log stable identifiers (not raw email content):
| Field | Why log it |
|---|---|
| attempt_id | Correlates retries and test runs |
| inbox_id | Proves isolation and routing |
| delivery_id | Debugs webhook retries and replay |
| message_id | Debugs message duplication and ordering |
| received_at | Helps reason about windows and delays |
LLM agent reliability and safety: reduce the surface area
For LLM agents, reliability and security are connected. Email content is untrusted input, and it is a common carrier for prompt injection, malicious links, and confusing formatting.
Practical guardrails:
- Expose a minimized JSON view to the model (OTP or a single URL, plus a few trusted IDs).
- Never require the model to “read HTML.” Prefer text/plain extraction and deterministic parsing.
- Constrain link handling. If you extract a verification URL, validate the host allowlist and forbid redirects.
- Prevent resend loops. Give the agent a strict retry budget and dedupe at the artifact layer.
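The link-handling guardrail can be sketched as a small pre-flight check that runs before any URL reaches the agent or an HTTP client. The function name and the example hosts are hypothetical; the check covers scheme, embedded credentials, and an exact host match. Refusing redirects is a separate step that belongs in whatever client fetches the link.

```python
from urllib.parse import urlsplit


def is_allowed_verification_url(url: str, allowed_hosts: frozenset) -> bool:
    """Accept only https URLs whose exact host is on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL (for example, a broken IPv6 literal)
    if parts.scheme != "https":
        return False
    # Reject user:pass@host forms, which are a common way to disguise the real host.
    if parts.username or parts.password:
        return False
    host = (parts.hostname or "").lower()
    # Exact match only: "app.example.com.evil.test" must not pass.
    return host in allowed_hosts
```

For example, with `allowed_hosts = frozenset({"app.example.com"})`, a link to `https://app.example.com/verify?t=1` passes, while `https://app.example.com@evil.test/` does not.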
Mailhook is explicitly built for this “email as a tool” model: disposable inboxes created via API, emails delivered as structured JSON, plus both webhooks and polling for deterministic waits. (See the exact contract in llms.txt.)
Webhooks vs polling: choosing defaults
You rarely need to choose only one. Still, it helps to set an opinionated default.
| Dimension | Webhooks | Polling |
|---|---|---|
| Latency | Best | Depends on interval |
| Operational complexity | Higher (public endpoint, verification) | Lower |
| Cost at scale | Usually lower | Can get expensive/noisy |
| Failure recovery | Needs replay/retry handling | Naturally re-reads state |
| Fit for CI | Great if reachable | Great if not reachable |
Recommendation for most teams: webhook-first, polling fallback, with a shared dedupe store.
Frequently Asked Questions
Do I really need dedupe if I use webhooks? Yes. Most webhook systems are at-least-once by design. Retries are normal, so your consumer must be idempotent.
What’s a reasonable timeout for waiting on a verification email? Use an overall deadline (often 30 to 90 seconds depending on your system and environment) and stop polling after that budget. Avoid fixed sleeps.
Can polling replace webhooks entirely? It can, especially in environments where webhooks are hard. But polling should still use cursors or “seen” tracking, backoff, and strict deadlines.
How should I verify webhook authenticity for an email receive API? Verify a signature over the raw request body, enforce a timestamp tolerance window, and dedupe by a delivery identifier to prevent replay.
How do I make email safe for LLM agents? Treat email as hostile input, do deterministic extraction (OTP or allowlisted URL), minimize what the model sees, and keep side-effect tools behind idempotency keys.
Build a reliability-first email receive pipeline with Mailhook
If you’re implementing an email receive API flow for agents, QA, or signup verification, the fastest path to reliability is starting with primitives that already match the event-stream reality.
Mailhook provides:
- Programmable disposable inboxes via API
- Emails delivered as structured JSON
- Real-time webhook notifications (with signed payloads)
- Polling endpoints for fallback and reconciliation
- Shared domains plus custom domain support
Get the canonical integration details from Mailhook’s llms.txt, then explore the platform at Mailhook.