Email Receive API Reliability: Webhooks, Polling, and Dedupe

Email is one of the last “human-first” protocols that modern systems still depend on. If you’re building CI tests, QA automation, or LLM agents that must wait for a verification link or OTP, you quickly discover a painful truth: an email receive API is an eventually consistent event stream. Messages can arrive late, out of order, or more than once.

Reliability comes from accepting those semantics and designing around them: webhook delivery for speed, polling for recovery, and dedupe everywhere so retries do not become double-processing.

The reliability model: treat inbound email like an event stream

A robust email receive API design usually has three distinct concepts:

Inbox: a short-lived container that isolates traffic for one workflow (for example, one signup attempt).
Message: the normalized representation of an email (headers, bodies, attachments), often derived from raw RFC 5322 content.
Delivery event: the provider’s attempt to notify you (webhook call) or your attempt to fetch (poll) that message.

Even if a provider exposes only “list messages”, the system behaves like an event stream under the hood: there is ingestion, normalization, storage, and delivery.

Two consequences matter for reliability:

At-least-once is normal: webhook retries, queue retries, and client retries create duplicates.
Ordering is not guaranteed: SMTP, provider pipelines, and internal retries can reorder arrival.

If your consumer assumes exactly-once and in-order delivery, you will get flakes.

If you want a quick reference for how Mailhook models this (disposable inboxes, JSON output, webhooks, polling), use the canonical integration contract in llms.txt.

Webhooks: best latency, most failure modes

Webhooks are the right default when you care about speed and scale. They also move failure handling into your infrastructure, so you have to be intentional.

Common webhook failure modes

Slow handler causes retries: your endpoint times out, provider retries, you process twice.
Transient outages: deploys, regional issues, DNS hiccups.
Replay or spoofing attempts: an attacker (or a misconfigured internal system) replays old payloads.
Downstream partial failure: you accept the webhook but crash while processing.

Webhook reliability checklist

Design your webhook consumer as a thin, idempotent ingestion layer:

Verify authenticity before parsing. For example, verify a signature over the raw request body, then parse JSON.
Ack fast, process async. Respond 2xx as soon as you have durably recorded the event (for example, queued it).
Make processing idempotent. Dedupe using stable identifiers (more on this below).
Add replay protection. Use a delivery identifier plus a timestamp tolerance window.
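The four checks can be sketched in a few lines of standard-library Python. This assumes a hypothetical signing scheme: the provider sends a Unix `timestamp`, a `delivery_id`, and a hex HMAC-SHA256 over `"{timestamp}.{raw_body}"`. The scheme and parameter names are illustrative only; the real header names and signing format for Mailhook are in llms.txt.

```python
import hashlib
import hmac
import json
import time
from typing import Optional


class WebhookRejected(Exception):
    """The request failed authenticity checks; respond with a 4xx."""


def verify_webhook(
    raw_body: bytes,
    signature_hex: str,
    timestamp: str,
    delivery_id: str,
    secret: bytes,
    seen_delivery_ids: set,
    tolerance_seconds: int = 300,
    now: Optional[float] = None,
) -> Optional[dict]:
    """Verify, then parse. Returns None for an already-seen delivery.

    Hypothetical scheme: signature = hex(HMAC-SHA256(secret, b"{timestamp}.{body}")).
    """
    now = time.time() if now is None else now

    # 1) Timestamp tolerance window: reject stale or future-dated requests.
    try:
        sent_at = int(timestamp)
    except ValueError:
        raise WebhookRejected("malformed timestamp")
    if abs(now - sent_at) > tolerance_seconds:
        raise WebhookRejected("timestamp outside tolerance window")

    # 2) Signature over the *raw* bytes; re-serialized JSON would not match.
    expected = hmac.new(secret, timestamp.encode() + b"." + raw_body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), signature_hex.encode()):  # constant-time
        raise WebhookRejected("signature mismatch")

    # 3) Replay/retry protection by delivery ID. A duplicate is not an error:
    #    ack it with 2xx so the provider stops retrying, but do no work.
    #    (In production, use a unique constraint in a shared store, not a set.)
    if delivery_id in seen_delivery_ids:
        return None
    seen_delivery_ids.add(delivery_id)

    # Only now is it safe to parse the body.
    return json.loads(raw_body)
```

The dedupe check deliberately runs after the signature check, so unauthenticated requests can never mark a delivery ID as seen.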

For a general reference on webhook operational patterns, Stripe's webhook documentation (particularly its webhook best practices guide) is a good baseline, even though it is not email-specific.

Figure: Reference architecture. An inbound email provider sends a signed webhook to your endpoint, which quickly enqueues the payload; a worker then normalizes and deduplicates messages in a database. A separate polling worker calls the provider API as a fallback and feeds the same dedupe store.
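The "ack fast, process async" half of the architecture described above can be sketched with the standard library. Here `queue.Queue` and an in-memory dict stand in for a durable queue and a database table (an assumption for brevity: an in-process queue does not survive a crash, so a real deployment needs durable storage before returning 2xx). `handle_webhook`, `process`, and `worker` are hypothetical names.

```python
import queue
import threading

# Stand-ins: in production these would be a durable queue and a database table.
work_queue: "queue.Queue[dict]" = queue.Queue()
messages_by_key: dict = {}
store_lock = threading.Lock()


def handle_webhook(payload: dict) -> int:
    """Thin ingestion layer: record the (already verified) event, then ack."""
    work_queue.put(payload)
    return 200  # 2xx as soon as the event is recorded; no slow work here


def process(payload: dict) -> None:
    """Slow path: normalize and dedupe by (inbox_id, message_id)."""
    key = (payload["inbox_id"], payload["message_id"])
    with store_lock:
        # setdefault is an idempotent "insert if absent": retries are no-ops.
        messages_by_key.setdefault(key, payload)


def worker(stop: threading.Event) -> None:
    """Drain the queue until asked to stop and nothing is left."""
    while not stop.is_set() or not work_queue.empty():
        try:
            payload = work_queue.get(timeout=0.1)
        except queue.Empty:
            continue
        try:
            process(payload)
        finally:
            work_queue.task_done()
```

A provider retry of the same payload is acknowledged twice but stored once, which is exactly the behavior the handler timeout scenario needs.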

Polling: simpler surface area, better recovery, easy to misuse

Polling is attractive because it is straightforward and debuggable. It is also easy to implement badly.

When polling is the right choice

You cannot expose a public webhook endpoint.
Your environment is short-lived (some CI runners).
You need a deterministic “wait up to N seconds” primitive.
You want a backup path when webhooks fail.

Polling reliability rules

A production-ready poller should have:

A clear deadline (overall timeout), not an infinite loop.
Backoff with jitter to avoid thundering herds.
A cursor or “seen set” so repeated list calls do not reprocess old messages.
A narrow matcher (filter by inbox, expected sender, subject intent, or correlation token), not “latest email wins.”
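A minimal poller that follows all four rules. `list_messages` is a hypothetical callable standing in for the provider's list endpoint (assumed to return dicts with a `message_id`); it, the clock, and `sleep` are injected so the loop can be tested without a network. The backoff uses "full jitter" (a random delay between zero and the capped exponential value), which is one common choice, not the only one.

```python
import random
import time
from typing import Callable, Iterable, Optional


def poll_for_message(
    list_messages: Callable[[], Iterable[dict]],
    matches: Callable[[dict], bool],
    deadline_seconds: float = 60.0,
    base_delay: float = 0.5,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[dict]:
    """Poll until a matching message arrives or the overall deadline passes."""
    deadline = clock() + deadline_seconds  # clear deadline, no infinite loop
    seen: set = set()  # message_ids already examined; never re-evaluated
    attempt = 0
    while clock() < deadline:
        for msg in list_messages():
            if msg["message_id"] in seen:
                continue
            seen.add(msg["message_id"])
            if matches(msg):  # narrow matcher, not "latest email wins"
                return msg
        # Exponential backoff with full jitter, capped at max_delay.
        delay = random.uniform(0, min(max_delay, base_delay * (2 ** attempt)))
        attempt += 1
        # Never sleep past the deadline.
        sleep(max(0.0, min(delay, deadline - clock())))
    return None
```

Returning `None` on timeout (instead of raising) lets the caller decide whether to fail the test, retry the whole attempt, or fall back to another path.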

Polling also benefits from a “two clocks” approach:

A per-request timeout (network, server).
An overall workflow budget (for example, 60 seconds to receive the email).
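The two clocks combine naturally: each request's timeout is the smaller of the per-request cap and whatever remains of the workflow budget. A sketch with a hypothetical `Budget` helper; the returned value is meant to be passed to whatever HTTP client you use (for example, the `timeout` argument of `urllib.request.urlopen`).

```python
import time
from typing import Callable, Optional


class Budget:
    """Overall workflow budget (clock 2) that caps every request timeout (clock 1)."""

    def __init__(self, total_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._deadline = clock() + total_seconds

    def remaining(self) -> float:
        return max(0.0, self._deadline - self._clock())

    def request_timeout(self, per_request_cap: float) -> Optional[float]:
        """Timeout for the next request, or None if the budget is spent."""
        left = self.remaining()
        if left <= 0:
            return None
        return min(per_request_cap, left)
```

In a poll loop, `None` from `request_timeout` is the signal to stop and report a timeout rather than issue another request.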

If you need background context on email structure (why normalizing to JSON matters), the canonical format for email messages is defined in RFC 5322.

Webhook-first, polling fallback: the reliability sweet spot

The most reliable pattern for an email receive API consumer is:

Use webhooks for low-latency arrival.
Use polling to reconcile missed deliveries, delayed messages, or webhook outages.
Feed both paths into the same dedupe and storage layer.

A practical workflow (provider-agnostic) looks like this:

1) Provision an isolated inbox resource

Return an object that includes both:

the email address to send to
the inbox identifier you will read from later

This prevents cross-test collisions and makes retries safe.
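A sketch of the handle this step might return. `InboxHandle` and `provision_inbox` are hypothetical names, and `create_inbox` stands in for your provider's inbox-creation call (for Mailhook, see llms.txt); it is assumed to return an `inbox_id` and an `address`. Generating an `attempt_id` at the same time gives every later log line and dedupe key a stable correlation value.

```python
import uuid
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class InboxHandle:
    attempt_id: str  # generated by you, one per workflow attempt
    inbox_id: str    # what you read from later (polling, webhook routing)
    address: str     # what you hand to the system under test


def provision_inbox(create_inbox: Callable[[], dict]) -> InboxHandle:
    """Create one isolated inbox per attempt; never share it across tests."""
    created = create_inbox()  # hypothetical provider call
    return InboxHandle(
        attempt_id=uuid.uuid4().hex,
        inbox_id=created["inbox_id"],
        address=created["address"],
    )
```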

2) Start two waiters with one deadline

Webhook listener records delivery events.
Poll loop runs only until the deadline, then stops.
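One way to wire both waiters to a single deadline is to let the webhook consumer and the poll loop push candidate messages into the same queue, then return the first one that passes the matcher. This is a sketch using threads and `queue.Queue`; `MessageWaiter`, `on_webhook_message`, and `poll_once` are hypothetical hooks into your own webhook consumer and provider client.

```python
import queue
import threading
import time
from typing import Callable, Iterable, Optional


class MessageWaiter:
    """Webhook path and polling path feed one queue; one deadline bounds both.

    The deadline clock starts when the waiter is created (for example, right
    after the inbox is provisioned).
    """

    def __init__(self, deadline_seconds: float):
        self._candidates: "queue.Queue[dict]" = queue.Queue()
        self._deadline = time.monotonic() + deadline_seconds
        self._done = threading.Event()

    def on_webhook_message(self, msg: dict) -> None:
        """Call this from your (already verified) webhook consumer."""
        self._candidates.put(msg)

    def _poll_loop(self, poll_once: Callable[[], Iterable[dict]], interval: float) -> None:
        # Poll only until the deadline or until a match is found, then stop.
        while not self._done.is_set() and time.monotonic() < self._deadline:
            for msg in poll_once():
                self._candidates.put(msg)
            self._done.wait(interval)  # sleeps, but wakes early once done is set

    def wait(
        self,
        poll_once: Callable[[], Iterable[dict]],
        matches: Callable[[dict], bool],
        poll_interval: float = 1.0,
    ) -> Optional[dict]:
        poller = threading.Thread(
            target=self._poll_loop, args=(poll_once, poll_interval), daemon=True
        )
        poller.start()
        try:
            while True:
                remaining = self._deadline - time.monotonic()
                if remaining <= 0:
                    return None
                try:
                    msg = self._candidates.get(timeout=remaining)
                except queue.Empty:
                    return None
                # Both paths may deliver the same message; the first match wins
                # and later copies are simply never read.
                if matches(msg):
                    return msg
        finally:
            self._done.set()  # stop the poller whether we matched or timed out
            poller.join()
```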

3) Select the intended message deterministically

Selection rules should be explicit and testable:

correct inbox_id
correlation token match (header, local-part encoding, or metadata)
“received_at” within the attempt window
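Written as a pure function, the selection rules become easy to unit test. This sketch assumes each message dict has `inbox_id`, `message_id`, `to`, a `headers` dict, and a timezone-aware `received_at`; the `X-Correlation-Id` header is a hypothetical carrier for the token, with the address local part as a fallback. Picking the earliest match is a policy choice: if your system invalidates older codes on resend, pick the latest instead.

```python
from datetime import datetime
from typing import List, Optional


def select_message(
    messages: List[dict],
    inbox_id: str,
    correlation_token: str,
    window_start: datetime,
    window_end: datetime,
) -> Optional[dict]:
    """Apply explicit selection rules; return the earliest match or None."""
    candidates = []
    for m in messages:
        # Rule 1: correct inbox.
        if m["inbox_id"] != inbox_id:
            continue
        # Rule 2: correlation token (hypothetical header, else address local part).
        header_token = m.get("headers", {}).get("X-Correlation-Id")
        local_part = m.get("to", "").split("@", 1)[0]
        if header_token != correlation_token and correlation_token not in local_part:
            continue
        # Rule 3: received within this attempt's window.
        if not (window_start <= m["received_at"] <= window_end):
            continue
        candidates.append(m)
    # Deterministic tie-break: earliest received_at, then message_id.
    candidates.sort(key=lambda m: (m["received_at"], m["message_id"]))
    return candidates[0] if candidates else None
```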

4) Extract the minimal artifact

For automation and agents, the artifact is usually:

OTP (one-time passcode)
verification URL (magic link)

Treat everything else as untrusted or unnecessary.
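Deterministic extraction can be a pair of regular expressions over the text/plain body. This sketch assumes a standalone 6-digit OTP and an `https` link whose URL contains a hint such as `verify`; both formats are assumptions to adjust to your own templates, and the extracted URL should still pass a host allowlist before anything acts on it.

```python
import re
from typing import Optional

# Assumed formats: a standalone 6-digit code and an https URL.
OTP_RE = re.compile(r"(?<!\d)(\d{6})(?!\d)")
URL_RE = re.compile(r"https://[^\s<>\"')\]]+")


def extract_artifact(text_plain: str, url_hint: str = "verify") -> dict:
    """Return only the minimal artifact; ignore everything else in the email."""
    otp_match = OTP_RE.search(text_plain)
    # Strip trailing sentence punctuation, then keep only hinted URLs
    # (footers often contain unrelated links such as unsubscribe).
    urls = [u.rstrip(".,;:!?") for u in URL_RE.findall(text_plain)]
    urls = [u for u in urls if url_hint in u]
    otp: Optional[str] = otp_match.group(1) if otp_match else None
    # Exactly one distinct candidate, or nothing: ambiguity is not guessed away.
    url: Optional[str] = urls[0] if len(set(urls)) == 1 else None
    return {"otp": otp, "url": url}
```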

5) Expire the inbox

Short lifetimes reduce risk, reduce noise, and simplify debugging.

Dedupe: do it at multiple layers, not just once

Teams often add dedupe as a single “if we’ve seen this message_id, skip” check. That helps, but it does not cover webhook retries, resend flows, or multiple emails that contain the same OTP.

A reliable system dedupes at multiple layers.

Layer	What can duplicate?	Example dedupe key	Why it matters
Delivery	Webhook call retries	`delivery_id` (or a hash of raw request body)	Prevents double-ingest when your endpoint is slow or down
Message	Same message stored/returned twice	`message_id` (provider-stable), or normalized `Message-ID`	Prevents double-processing when listing/polling repeats
Artifact	Same OTP/link appears multiple times	`artifact_hash` (for example, hash of OTP or URL)	Prevents “verify twice” and flaky assertions
Attempt	Your workflow retries the whole step	`attempt_id` (generated by you)	Prevents retry storms from causing side effects
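A sketch of how the keys in the table might be derived. The normalization choices (trimming whitespace and angle brackets from `Message-ID`, trimming whitespace from artifacts) are assumptions; what matters is that each key is a pure function of stable inputs and never of arrival time.

```python
import hashlib
from typing import Optional


def delivery_key(delivery_id: Optional[str], raw_body: bytes) -> str:
    # Prefer the provider's delivery ID; fall back to a hash of the raw body.
    return delivery_id or "body:" + hashlib.sha256(raw_body).hexdigest()


def message_key(inbox_id: str, message_id: Optional[str], header_message_id: Optional[str]) -> tuple:
    # Provider-stable ID first; otherwise the RFC 5322 Message-ID without <>.
    mid = message_id or (header_message_id or "").strip().strip("<>")
    if not mid:
        raise ValueError("no stable message identifier")
    return (inbox_id, mid)


def artifact_key(attempt_id: str, artifact: str) -> tuple:
    # Hash the OTP or URL so the key is fixed-size and the raw value is not stored.
    normalized = artifact.strip()
    return (attempt_id, hashlib.sha256(normalized.encode()).hexdigest())
```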

A dedupe design that survives retries

You want idempotency to be enforced by your storage constraints, not by “best effort” code paths.

A simple approach:

Store deliveries with a unique constraint on delivery_id.
Store messages with a unique constraint on (inbox_id, message_id).
Store artifacts with a unique constraint on (attempt_id, artifact_hash).

Then your handler can be written as “upsert and continue,” instead of “if-else spaghetti.”
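With SQLite (the standard library `sqlite3` module), the three constraints and an "upsert and continue" handler might look like this. Table and column names are illustrative; `INSERT OR IGNORE` is SQLite syntax, and PostgreSQL's equivalent is `INSERT ... ON CONFLICT DO NOTHING`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS deliveries (
    delivery_id TEXT PRIMARY KEY
);
CREATE TABLE IF NOT EXISTS messages (
    inbox_id   TEXT NOT NULL,
    message_id TEXT NOT NULL,
    subject    TEXT,
    UNIQUE (inbox_id, message_id)
);
CREATE TABLE IF NOT EXISTS artifacts (
    attempt_id    TEXT NOT NULL,
    artifact_hash TEXT NOT NULL,
    UNIQUE (attempt_id, artifact_hash)
);
"""


def insert_once(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Insert a row; return True only if this call created it."""
    cur = conn.execute(sql, params)
    return cur.rowcount == 1  # 0 means a unique constraint swallowed a duplicate


def ingest(conn, delivery_id, inbox_id, message_id, subject, attempt_id, artifact_hash) -> bool:
    """Every layer is idempotent; returns True only if the artifact is new work."""
    with conn:  # one transaction for all three layers
        insert_once(conn, "INSERT OR IGNORE INTO deliveries VALUES (?)", (delivery_id,))
        insert_once(
            conn,
            "INSERT OR IGNORE INTO messages (inbox_id, message_id, subject) VALUES (?, ?, ?)",
            (inbox_id, message_id, subject),
        )
        return insert_once(
            conn,
            "INSERT OR IGNORE INTO artifacts (attempt_id, artifact_hash) VALUES (?, ?)",
            (attempt_id, artifact_hash),
        )
```

A resend that carries the same OTP creates a new delivery and a new message row, but `ingest` returns `False` because the artifact already exists for that attempt, so nothing gets verified twice.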

Dedupe pitfalls to avoid

Using timestamps as primary keys. Two messages can share the same second, clocks drift, and ordering is not guaranteed.
Deduping only by subject. Subjects are not unique and templates change.
Assuming resend means new intent. Many systems resend the same link or code.

Observability: the metrics that actually catch email flakes

Email reliability issues often look like “it just didn’t arrive.” In practice, you need to instrument the pipeline to see where time was lost or duplicates were introduced.

Track these as first-class metrics:

Time to first message: from inbox creation to matched message.
Webhook delivery success rate: 2xx vs non-2xx responses.
Webhook retry count: signals slow handlers or outages.
Polling request count per attempt: signals webhook gaps or overly aggressive polling.
Duplicate rate: how often dedupe rules fired at each layer.

And log stable identifiers (not raw email content):

Field	Why log it
`attempt_id`	Correlates retries and test runs
`inbox_id`	Proves isolation and routing
`delivery_id`	Debugs webhook retries and replay
`message_id`	Debugs message duplication and ordering
`received_at`	Helps reason about windows and delays
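A sketch of a structured log line that carries only those identifiers, using the standard `logging` and `json` modules. The allowlist mirrors the table above; `log_event` and the event names are hypothetical. Any other keyword (a body, a subject, an OTP) is silently dropped, so raw content cannot leak into logs by accident.

```python
import json
import logging
from datetime import datetime

logger = logging.getLogger("email_receive")

ALLOWED_FIELDS = ("attempt_id", "inbox_id", "delivery_id", "message_id", "received_at")


def log_event(event: str, **fields) -> str:
    """Emit one JSON log line with stable IDs only; other fields are dropped."""
    record = {"event": event}
    for name in ALLOWED_FIELDS:
        value = fields.get(name)
        if isinstance(value, datetime):
            value = value.isoformat()
        if value is not None:
            record[name] = value
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line
```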

LLM agent reliability and safety: reduce the surface area

For LLM agents, reliability and security are connected. Email content is untrusted input, and it is a common carrier for prompt injection, malicious links, and confusing formatting.

Practical guardrails:

Expose a minimized JSON view to the model (OTP or a single URL, plus a few trusted IDs).
Never require the model to “read HTML.” Prefer text/plain extraction and deterministic parsing.
Constrain link handling. If you extract a verification URL, validate its host against an allowlist and forbid redirects.
Prevent resend loops. Give the agent a strict retry budget and dedupe at the artifact layer.
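The first and third guardrails can be enforced in plain code before anything reaches the model. A sketch using `urllib.parse.urlsplit`; the allowlist contents and the shape of the minimized view are assumptions. Forbidding redirects happens later, when the URL is actually fetched (for example, `requests.get(url, allow_redirects=False)`), and is not shown here.

```python
import json
from typing import Optional
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"app.example.test"}  # assumption: your product's verification host(s)


def validate_verification_url(url: str) -> Optional[str]:
    """Return the URL only if it is https on an allowlisted host, else None."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return None
    # .hostname is lowercased and excludes userinfo and port, which defeats
    # "https://trusted.host@evil.host/" style tricks.
    if parts.hostname not in ALLOWED_HOSTS:
        return None
    return url


def minimized_view(inbox_id: str, message_id: str, otp: Optional[str], url: Optional[str]) -> str:
    """The only JSON the model sees: trusted IDs plus one validated artifact."""
    return json.dumps({
        "inbox_id": inbox_id,
        "message_id": message_id,
        "otp": otp,
        "verification_url": validate_verification_url(url) if url else None,
    })
```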

Mailhook is explicitly built for this “email as a tool” model: disposable inboxes created via API, emails delivered as structured JSON, plus both webhooks and polling for deterministic waits. (See the exact contract in llms.txt.)

Webhooks vs polling: choosing defaults

You rarely need to choose only one. Still, it helps to set an opinionated default.

Dimension	Webhooks	Polling
Latency	Best	Depends on interval
Operational complexity	Higher (public endpoint, verification)	Lower
Cost at scale	Usually lower	Can get expensive/noisy
Failure recovery	Needs replay/retry handling	Naturally re-reads state
Fit for CI	Great if reachable	Great if not reachable

Recommendation for most teams: webhook-first, polling fallback, with a shared dedupe store.

Frequently Asked Questions

Do I really need dedupe if I use webhooks? Yes. Most webhook systems are at-least-once by design. Retries are normal, so your consumer must be idempotent.

What’s a reasonable timeout for waiting on a verification email? Use an overall deadline (often 30 to 90 seconds depending on your system and environment) and stop polling after that budget. Avoid fixed sleeps.

Can polling replace webhooks entirely? It can, especially in environments where webhooks are hard. But polling should still use cursors or “seen” tracking, backoff, and strict deadlines.

How should I verify webhook authenticity for an email receive API? Verify a signature over the raw request body, enforce a timestamp tolerance window, and dedupe by a delivery identifier to prevent replay.

How do I make email safe for LLM agents? Treat email as hostile input, do deterministic extraction (OTP or allowlisted URL), minimize what the model sees, and keep side-effect tools behind idempotency keys.

Build a reliability-first email receive pipeline with Mailhook

If you’re implementing an email receive API flow for agents, QA, or signup verification, the fastest path to reliability is starting with primitives that already match the event-stream reality.

Mailhook provides:

Programmable disposable inboxes via API
Emails delivered as structured JSON
Real-time webhook notifications (with signed payloads)
Polling endpoints for fallback and reconciliation
Shared domains plus custom domain support

Get the canonical integration details from Mailhook’s llms.txt, then explore the platform at Mailhook.