Email verification looks simple on a whiteboard: generate a token, send an email, click a link, mark the user verified.
In production, and especially in CI or agent-driven workflows, the “email step” becomes an integration boundary with retries, duplicates, delays, and adversarial input. If you want your email verification API to be reliable, you need an end-to-end contract that explicitly states what the system guarantees, what it only attempts, and how clients should behave when reality deviates.
This guide focuses on that contract and the failure modes it must cover. If you are implementing with Mailhook, the canonical, up-to-date feature and API reference is the project’s llms.txt (always treat that as the source of truth for integration details).
What “email verification API” should mean (for automation and agents)
For engineering teams, “email verification API” often gets conflated with “API that checks an email address exists.” This article is about a different problem:
- You have a system that sends a verification email (OTP or magic link).
- You need programmatic proof that the correct email arrived.
- You need deterministic extraction of the verification artifact.
- You need retry-safe, parallel-safe behavior.
That last point is where most implementations fail. A contract that does not address retries and duplicates is incomplete.
The end-to-end contract: define resources, IDs, and semantics
A reliable email verification flow spans multiple systems:
- Your application (token generation, resend logic)
- Your mail sending infrastructure (ESP, SMTP relays)
- An inbound email receiver (an inbox API, IMAP mailbox, or custom SMTP)
- Your verification consumer (test harness, QA automation, or an LLM agent tool)
The contract must specify the “shape” of the interaction across those boundaries.
1) Resource model: inbox-first beats address-only
If the only thing your “email verification API” returns is a string email address, clients cannot reliably:
- isolate parallel runs,
- avoid reading stale messages,
- correlate a specific attempt,
- enforce TTL and cleanup.
A stronger contract returns an inbox descriptor (provider-agnostic concept):
- `email` (the address you can send to)
- `inbox_id` (a stable handle to fetch messages for that address)
- `created_at`
- `expires_at` (or a TTL)
This lets clients use deterministic retrieval: “read messages for inbox X,” not “search a shared mailbox for something that looks right.”
Mailhook is built around this inbox-first approach (programmable disposable inboxes, messages delivered as JSON). See the Mailhook llms.txt for the exact contract.
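As a minimal sketch of the inbox descriptor idea, the shape below mirrors the four fields listed above. The interface name, the helper functions, and the `/inboxes/.../messages` path are illustrative assumptions, not any provider's actual schema or endpoints.

```typescript
// Hypothetical inbox descriptor; field names follow the list above, not a specific provider's schema.
interface InboxDescriptor {
  email: string;      // address the application sends the verification email to
  inbox_id: string;   // stable handle used for all message retrieval
  created_at: string; // ISO 8601 timestamp
  expires_at: string; // ISO 8601 timestamp (a TTL could be used instead)
}

// True once the inbox has reached its expiry; expired inboxes should not be read from.
function isExpired(inbox: InboxDescriptor, nowMs: number = Date.now()): boolean {
  return nowMs >= Date.parse(inbox.expires_at);
}

// Retrieval is keyed by inbox_id, never by searching a shared mailbox for an address.
function messagesPath(inbox: InboxDescriptor): string {
  return `/inboxes/${encodeURIComponent(inbox.inbox_id)}/messages`; // illustrative path only
}

const exampleInbox: InboxDescriptor = {
  email: "run-42@example.test",
  inbox_id: "inb_123",
  created_at: "2024-01-01T00:00:00Z",
  expires_at: "2024-01-01T00:15:00Z",
};
```

Because every read goes through `inbox_id`, two parallel runs can never see each other's messages, even if a shared domain is used for the addresses.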
2) Identity and dedupe: distinguish message ID vs delivery ID
Your contract should differentiate:
- Message identity: a stable identifier for the email content (often aligned with the RFC 5322 `Message-ID` header, but do not assume uniqueness across all systems).
- Delivery identity: a stable identifier for a delivery event to your webhook or API client (critical because webhooks are typically at-least-once).
Why it matters:
- A single email can be delivered multiple times (retries).
- Two distinct emails can be “equivalent” from a test’s point of view (multiple resends with the same OTP format).
- Your consumer must dedupe at the correct layer.
A good email verification contract explicitly provides or enables:
- message-level dedupe (avoid re-processing the same email)
- delivery-level dedupe (avoid re-processing the same webhook delivery)
- artifact-level dedupe (avoid “double clicking” the same magic link or re-submitting the same OTP)
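The three dedupe layers above can be sketched as a single classifier. This is an in-memory illustration under assumed field names (`deliveryId`, `messageId`, `artifact`); a real consumer would back each set with a durable store and a unique constraint.

```typescript
// Hypothetical event shape; a real provider's field names may differ.
interface DeliveryEvent {
  deliveryId: string; // stable across retries of the same webhook delivery
  messageId: string;  // identifies the email itself
  artifact: string;   // extracted OTP or verification URL
}

type Outcome = "duplicate_delivery" | "duplicate_message" | "duplicate_artifact" | "process";

class LayeredDeduper {
  private deliveries = new Set<string>();
  private messages = new Set<string>();
  private artifacts = new Set<string>();

  // Checks delivery, then message, then artifact, recording each key as it passes.
  classify(ev: DeliveryEvent): Outcome {
    if (this.deliveries.has(ev.deliveryId)) return "duplicate_delivery";
    this.deliveries.add(ev.deliveryId);
    if (this.messages.has(ev.messageId)) return "duplicate_message";
    this.messages.add(ev.messageId);
    if (this.artifacts.has(ev.artifact)) return "duplicate_artifact";
    this.artifacts.add(ev.artifact);
    return "process";
  }
}
```

Only events classified as `process` should trigger a verification action; the other outcomes are normal and should be acknowledged without side effects.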
3) Waiting semantics: no sleeps, only deadlines
Most flaky verification tests fail because they do some variation of “sleep 10 seconds, then check the inbox once.”
A robust contract instead defines waiting semantics:
- A deadline-based wait (overall time budget)
- Polling or long-polling semantics when webhooks are unavailable
- A clear “not found yet” state (not an error)
Even if you use webhooks, you still need a fallback plan: production networks fail, CI environments drop inbound requests, and webhook endpoints have deploy windows.
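A deadline-based wait might look like the following sketch. The function name, the option names, and the 5-second backoff cap are assumptions chosen for illustration; the clock and sleep are injectable so the behavior can be tested without real delays.

```typescript
// A tagged union makes "not found yet" a normal result rather than an exception.
type WaitResult<T> = { status: "found"; value: T } | { status: "timeout" };

interface WaitOptions {
  deadlineMs: number;                    // absolute time budget (epoch ms)
  intervalMs: number;                    // initial delay between checks
  now?: () => number;                    // injectable clock for tests
  sleep?: (ms: number) => Promise<void>; // injectable sleep for tests
}

async function waitUntilDeadline<T>(
  check: () => Promise<T | undefined>,   // resolves undefined while nothing has arrived
  opts: WaitOptions,
): Promise<WaitResult<T>> {
  const now = opts.now ?? Date.now;
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let delay = opts.intervalMs;
  for (;;) {
    const value = await check();
    if (value !== undefined) return { status: "found", value };
    const remaining = opts.deadlineMs - now();
    if (remaining <= 0) return { status: "timeout" };
    await sleep(Math.min(delay, remaining)); // never sleep past the deadline
    delay = Math.min(delay * 2, 5_000);      // bounded exponential backoff
  }
}
```

The same function can wrap either a lookup in a webhook-fed datastore or a direct polling call, which keeps the waiting semantics identical across both delivery paths.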
4) Delivery semantics: webhook-first, polling fallback
A verification-capable inbound email API should support:
- Webhooks: for low-latency, scalable, event-driven consumption
- Polling API: as a deterministic fallback and for environments where inbound webhooks are hard
But the contract must include the hard parts:
- Webhooks are typically at-least-once (duplicates are normal).
- Webhooks can arrive out of order.
- Polling must specify cursor semantics and timeouts to prevent thundering herds.
Mailhook supports real-time webhooks and polling, and also supports signed payloads (useful for authenticity). Confirm details in the llms.txt.
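To make the webhook authenticity requirement concrete, here is a sketch of signature verification with a timestamp tolerance. The scheme is an assumption for illustration only (hex HMAC-SHA256 over `timestamp.rawBody` with a shared secret); it is not Mailhook's documented scheme, so check the provider's specification for the real header names and algorithm.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme: signature = hex(HMAC-SHA256(secret, `${timestamp}.${rawBody}`)).
function sign(secret: string, timestamp: number, rawBody: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${rawBody}`).digest("hex");
}

interface SignedRequest {
  rawBody: string;   // exact body as received, before any JSON parsing
  timestamp: number; // epoch seconds claimed by the sender
  signature: string; // hex digest claimed by the sender
}

function verifyWebhook(
  req: SignedRequest,
  secret: string,
  nowSeconds: number,
  toleranceSeconds = 300,
): boolean {
  // Reject stale or future-dated requests to bound the replay window.
  if (Math.abs(nowSeconds - req.timestamp) > toleranceSeconds) return false;
  const expected = Buffer.from(sign(secret, req.timestamp, req.rawBody), "hex");
  const received = Buffer.from(req.signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

A timestamp tolerance only narrows the replay window; within that window, a delivery-level dedupe key is still needed to reject a replayed request.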

5) Content contract: treat email as hostile input, expose minimal artifacts
For verification, you rarely need the full HTML email body. You need the smallest artifact that proves verification is possible:
- OTP code
- verification URL (magic link)
Your contract should define:
- a normalized JSON message form (headers, timestamps, routing info)
- a safe artifact extraction approach (prefer `text/plain` when possible)
- a minimized “agent view” if LLM agents will touch the content
If you allow LLM agents to see raw HTML, you are increasing risk of prompt injection and unsafe tool use. A safer contract is: “tool extracts OTP or whitelisted URL, agent receives only that.”
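The “tool extracts, agent receives only the artifact” idea could be sketched as follows. The 6-digit OTP format, the `app.example.com` allowlist entry, and the function names are assumptions for illustration; adapt them to the emails your application actually sends.

```typescript
type Artifact =
  | { type: "otp"; value: string }
  | { type: "verification_url"; value: string };

// Assumptions for this sketch: links must be https on an allowlisted host, OTPs are exactly 6 digits.
const ALLOWED_HOSTS = new Set(["app.example.com"]);

function isAllowedUrl(candidate: string): boolean {
  try {
    const url = new URL(candidate);
    return url.protocol === "https:" && ALLOWED_HOSTS.has(url.hostname);
  } catch {
    return false; // not a parseable absolute URL
  }
}

// Operates on the text/plain part only; callers pass the returned artifact to the agent and nothing else.
function extractArtifact(textPlain: string): Artifact | undefined {
  const urlPattern = /https?:\/\/[^\s<>"')]+/g;
  let match: RegExpExecArray | null;
  while ((match = urlPattern.exec(textPlain)) !== null) {
    if (isAllowedUrl(match[0])) return { type: "verification_url", value: match[0] };
  }
  // Strip URLs before looking for an OTP so digits inside a rejected link are never returned.
  const otp = textPlain.replace(urlPattern, " ").match(/\b\d{6}\b/);
  return otp ? { type: "otp", value: otp[0] } : undefined;
}
```

Returning `undefined` when nothing passes validation is deliberate: the caller can treat it as “no usable artifact” instead of falling back to raw content.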
6) Lifecycle contract: TTL, drain window, and cleanup
Verification inboxes should be disposable. The contract should state:
- TTL defaults and ability to configure expiration
- how late arrivals are handled (a drain window model is common)
- deletion semantics (immediate delete vs tombstone)
This is both a reliability concern (avoid stale message selection) and a security/privacy concern (minimize retained secrets).
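One way to express a drain window is as a small state function. The three phase names and the rule that late messages are accepted but flagged are assumptions of this sketch, not a description of any provider's behavior.

```typescript
type InboxPhase = "active" | "draining" | "closed";

interface Lifecycle {
  expiresAtMs: number;   // end of TTL: no new verification attempts should target this inbox
  drainWindowMs: number; // grace period in which late arrivals are still accepted
}

// Assumed model: active until expiry, then a drain window, then closed.
function phaseAt(lifecycle: Lifecycle, nowMs: number): InboxPhase {
  if (nowMs < lifecycle.expiresAtMs) return "active";
  if (nowMs < lifecycle.expiresAtMs + lifecycle.drainWindowMs) return "draining";
  return "closed";
}

// Late messages are kept during the drain window but flagged, so clients do not select them by accident.
function acceptMessage(lifecycle: Lifecycle, receivedAtMs: number): { accept: boolean; late: boolean } {
  const phase = phaseAt(lifecycle, receivedAtMs);
  return { accept: phase !== "closed", late: phase === "draining" };
}
```

Whatever model you choose, the contract should state it explicitly so that a message arriving after expiry has one defined outcome.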
Failure modes: what breaks, what it looks like, and what the contract must do
Below is a practical failure-mode map you can use in design reviews.
Failure modes across layers
| Layer | Failure mode | Symptom | Contract requirement (mitigation) |
|---|---|---|---|
| App | Resend behavior changes | Multiple emails arrive, test picks wrong one | Attempt-scoped correlation token and message matchers, artifact-level idempotency |
| App | Token TTL mismatch | Link/OTP is expired when used | Contracted time budget, explicit resend policy, log token issuance time |
| SMTP/ESP | Delivery delay / greylisting | Email arrives late or not within fixed sleep | Deadline-based wait, webhook-first with polling fallback |
| Domain/DNS | MX misconfiguration (custom domain) | No emails ever arrive | Contracted “smoke test” and domain routing validation steps |
| Inbound provider | Duplicate ingestion | Same message appears multiple times | Message + delivery IDs, deterministic dedupe |
| Webhook | Retries on 5xx/timeouts | Duplicate webhook deliveries | Signed payload verification + delivery dedupe key |
| Webhook | Spoofing / replay | Fake verification artifact arrives | Signature over raw body, timestamp tolerance, replay detection |
| Polling | Cursor bugs / non-monotonic ordering | Missing or repeated messages in list | Opaque cursor semantics, seen-ID set, bounded backoff |
| Parsing | HTML structure changes | Regex fails, can’t find OTP/link | Parse structured JSON fields, prefer text/plain, layered extraction |
| Agent | Prompt injection via email content | Agent takes unintended action | Minimized agent view, strict tool surface, URL allowlist |
The 7 failure modes to explicitly test in CI
If you only test “happy path,” your contract is unproven. Add tests that simulate:
- Duplicate delivery of the same webhook payload
- Out-of-order webhook arrival
- Polling fallback (webhook intentionally disabled)
- Two verification emails in the same inbox (resend)
- Late arrival (message arrives near deadline)
- Email format drift (extra HTML, different subject)
- Replay attempt (same delivery resent later)
These tests drive better contracts because they force you to encode expectations.
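As one example of encoding such an expectation, the resend case (two verification emails in the same inbox) can be pinned down with an explicit selection rule. The sketch below assumes the application embeds the attempt ID in the plain-text body; the message shape and function name are hypothetical.

```typescript
interface ReceivedMessage {
  messageId: string;
  receivedAtMs: number;
  subject: string;
  text: string;
}

// Selects the newest message carrying this attempt's correlation token.
// Assumption: the application embeds the attempt ID somewhere in the plain-text body.
function selectForAttempt(messages: ReceivedMessage[], attemptId: string): ReceivedMessage | undefined {
  return messages
    .filter((m) => m.text.includes(attemptId)) // filter returns a copy, so the sort below leaves the input untouched
    .sort((a, b) => b.receivedAtMs - a.receivedAtMs)[0];
}
```

Matching on the correlation token rather than the subject also covers the format-drift case, since a reworded subject line does not change the selection.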
A practical end-to-end contract (provider-agnostic)
Use this as a reference when designing your integration.
Inbox provisioning contract
Input: optional metadata for correlation (run ID, attempt ID), optional domain choice.
Output:
- `inbox_id`
- `email`
- `created_at`, `expires_at`
Client obligations:
- Create one inbox per attempt (not per test suite, not per environment).
- Store `inbox_id` as the primary handle.
Message delivery contract
Webhook event (if enabled):
- Contains normalized message JSON
- Contains `delivery_id`
- Is signed (recommended); the client verifies the signature and replay window
Polling API (fallback):
- List messages for `inbox_id` with cursor pagination
- Support “wait until deadline” behavior in the client
Provider obligations:
- Webhooks are at-least-once
- Polling is eventually consistent
Client obligations:
- Must be idempotent on `delivery_id`
- Must be idempotent on the extracted artifact
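For the polling side, a client can combine an opaque cursor with a seen-ID set so that repeated or overlapping pages are harmless. The page shape (`messages`, `nextCursor`) and the `maxPages` guard are assumptions of this sketch rather than a real provider's response format.

```typescript
interface Page {
  messages: { messageId: string }[];
  nextCursor: string | null; // opaque: the client never inspects or constructs it
}

type ListPage = (cursor: string | null) => Promise<Page>;

// Walks the pages in order and returns only message IDs not seen before.
async function listNewMessages(
  listPage: ListPage,
  seen: Set<string>,
  maxPages = 50, // guard against a cursor that never terminates
): Promise<string[]> {
  const fresh: string[] = [];
  let cursor: string | null = null;
  for (let i = 0; i < maxPages; i++) {
    const page: Page = await listPage(cursor);
    for (const m of page.messages) {
      if (!seen.has(m.messageId)) {
        seen.add(m.messageId);
        fresh.push(m.messageId);
      }
    }
    if (page.nextCursor === null) break;
    cursor = page.nextCursor;
  }
  return fresh;
}
```

Because the seen set is owned by the client, eventual consistency on the provider side can cause a message to appear late but never to be processed twice.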
Artifact extraction contract
Output:
- `artifact_type`: `otp` or `verification_url`
- `artifact_value`: the OTP string or URL
- `artifact_hash`: stable hash for consume-once enforcement
Rules:
- Prefer deterministic extraction that does not depend on brittle HTML
- Validate URLs before using them (scheme, host allowlist, no open redirects if you follow them)
For URL security and URI parsing rules, RFC 3986 is the baseline reference.
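A stable `artifact_hash` and consume-once check could be sketched like this. Hashing the type together with the value using SHA-256 is a design choice assumed here, and the in-memory set stands in for a durable store with a unique constraint.

```typescript
import { createHash } from "node:crypto";

// Hash over type and value, so an OTP and a URL with the same text produce different keys.
function artifactHash(type: "otp" | "verification_url", value: string): string {
  return createHash("sha256").update(`${type}\n${value}`).digest("hex");
}

// In-memory stand-in for a durable store with a unique constraint on the hash.
class ConsumeOnce {
  private consumed = new Set<string>();

  // Returns true only for the first caller to claim a given hash.
  tryConsume(hash: string): boolean {
    if (this.consumed.has(hash)) return false;
    this.consumed.add(hash);
    return true;
  }
}
```

Storing the hash instead of the raw OTP or link also keeps the secret itself out of your idempotency table.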
Reference implementation sketch (client-side)
Below is intentionally provider-agnostic pseudocode. The key is the behavior, not the endpoint naming.
```typescript
type Inbox = { inboxId: string; email: string; expiresAt: string };

type VerificationArtifact =
  | { type: "otp"; value: string; hash: string }
  | { type: "verification_url"; value: string; hash: string };

async function verifyEmailFlow(): Promise<VerificationArtifact> {
  const attemptId = crypto.randomUUID();
  const inbox: Inbox = await createInbox({ attemptId, ttlSeconds: 900 });
  await triggerSignUp({ email: inbox.email, attemptId });

  const deadlineMs = Date.now() + 60_000;

  // Prefer webhook-driven ingestion into your datastore.
  // If no webhook event arrives, poll by inboxId until the deadline.
  const message = await waitForMessage({
    inboxId: inbox.inboxId,
    deadlineMs,
    matcher: {
      // Keep matchers narrow and deterministic.
      // Example: subject contains "Verify" and recipient matches inbox.email.
    },
  });

  const artifact = extractVerificationArtifact({ message });

  // Consume-once semantics.
  // Use artifact.hash as an idempotency key in your own database.
  await markArtifactConsumed({ attemptId, artifactHash: artifact.hash });
  return artifact;
}
```
What matters here:
- A unique inbox per attempt
- A deadline-based wait
- Narrow matchers
- Artifact-level idempotency
Where Mailhook fits (without guessing features)
Mailhook provides the building blocks that make the above contract practical:
- Programmable disposable inbox creation via API
- Received emails delivered as structured JSON
- Real-time webhook notifications
- Polling API for email retrieval
- Signed payloads (useful for webhook authenticity)
- Shared domains for instant starts, plus custom domain support
- Batch email processing
For the exact API shape, payload fields, signature scheme, and current behavior, use the canonical spec: Mailhook llms.txt.
If you want a fast starting point, you can also explore the product overview at Mailhook.
Frequently Asked Questions
What is an email verification API in the context of CI and AI agents? It is an API-driven workflow that provisions an inbox, receives the verification email as machine-readable data (JSON), and supports deterministic waiting plus safe extraction of OTPs or verification links.
Why is “webhook-first, polling fallback” part of the contract? Because webhooks provide low latency and scale, but polling provides deterministic recovery when webhooks are unreachable, delayed, or misconfigured.
What is the most common failure mode in verification email automation? Inbox reuse. Reusing an inbox across retries or parallel runs causes stale selection, duplicates, and races. The simplest fix is one inbox per attempt.
Is DKIM or “email signed by” enough to trust webhook events? No. DKIM relates to the email message itself, not the authenticity of the HTTP webhook request carrying your JSON payload. You still need webhook signature verification and replay defenses.
How do I make verification safe for LLM agents? Do not expose raw HTML by default. Extract a minimal artifact (OTP or a strictly validated URL), constrain tool actions, and treat email content as hostile input.
Build a verification contract your tests (and agents) can actually trust
If your current email verification tests are flaky, slow, or unsafe for autonomous agents, it is almost always a contract problem: unclear IDs, undefined retry semantics, weak correlation, and no idempotency.
Mailhook is designed for this inbox-first, JSON-first model. Use the canonical integration reference at mailhook.co/llms.txt, then try provisioning disposable inboxes and consuming verification emails in a deterministic, retry-safe way at mailhook.co.