Email OTP verification is one of those flows that “works fine” until you put it in CI, run tests in parallel, or let an LLM agent drive it. Then the usual failure modes show up fast: shared inbox collisions, fixed sleeps, duplicate emails, retries that resend codes, and brittle HTML scraping.
Temp email verification only becomes reliable when you treat the verification email as a deterministic event stream tied to a specific, short-lived inbox, not as “some random address” you hope you can read later.
This guide lays out a deterministic OTP flow you can reuse in E2E tests, QA automation, and agent toolchains, with concrete design rules for waiting, deduping, extraction, and security.
What “temp email verification” should mean (for OTPs)
For OTP verification, the goal is not “receive an email” in general. The goal is:
- Provision an inbox that is isolated to one attempt.
- Trigger exactly one verification email for that attempt.
- Wait for arrival using explicit time budgets, not `sleep(10_000)`.
- Parse the message as structured data, extract only the OTP (or verification URL), then proceed.
- Make the whole thing safe to retry.
In practice, you want an inbox API that models the inbox as a first-class resource, so you can deterministically read “messages for this attempt” without scanning a shared mailbox.
Mailhook is built around that model: create disposable inboxes via API, receive emails as structured JSON, and consume delivery via real-time webhooks or a polling API. For exact integration details, use the canonical spec: mailhook.co/llms.txt.
The five invariants of a deterministic OTP flow
If you adopt only one thing from this article, adopt these invariants. They are the difference between flaky and deterministic.
Isolation: one inbox per attempt
OTP emails are inherently attempt-scoped. If you reuse an inbox across attempts (or across parallel CI jobs), you create ambiguity.
Rule: create a new disposable inbox for each verification attempt, not per test suite, not per environment, not per user.
Isolation eliminates the two most common bugs:
- A test reads the OTP from a previous run.
- Two parallel runs race and consume each other’s codes.
Deterministic waiting: webhook-first, polling fallback
OTP arrival is asynchronous and can be delayed.
Rule: treat email arrival as an event. Prefer webhooks for low latency, but implement polling as a fallback so your flow is resilient to transient webhook delivery issues.
If you only poll, you either over-poll (costly) or under-poll (slow). If you only use webhooks, a single networking misconfiguration can make the whole flow fail hard.
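A minimal sketch of webhook-first waiting with a polling fallback follows. It assumes two hypothetical inputs you wire up yourself: a `webhook` promise that your webhook handler resolves when a matching message arrives, and a `poll` function that makes one polling call and returns `null` when nothing matches yet. `InboundMessage` and its fields are placeholders, not Mailhook's schema.

```typescript
// Placeholder message shape; real field names come from your provider's JSON.
type InboundMessage = { message_id: string; subject: string; text: string };

// Resolve with the first message from either source, or reject at the deadline.
// Polling starts after a grace period so the webhook handles the happy path.
async function waitWebhookFirst(opts: {
  webhook: Promise<InboundMessage>;            // resolved by your webhook handler (hypothetical)
  poll: () => Promise<InboundMessage | null>;  // one polling call, null if nothing yet (hypothetical)
  pollStartDelayMs: number;
  pollIntervalMs: number;
  deadlineMs: number;
}): Promise<InboundMessage> {
  const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));
  let done = false;

  const polling = (async (): Promise<InboundMessage> => {
    await sleep(opts.pollStartDelayMs);
    while (!done) {
      const msg = await opts.poll();
      if (msg) return msg;
      await sleep(opts.pollIntervalMs);
    }
    // The race is already settled; stay pending instead of resolving twice.
    return new Promise<InboundMessage>(() => {});
  })();

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("Timed out waiting for email")), opts.deadlineMs);
  });

  try {
    return await Promise.race([opts.webhook, polling, timeout]);
  } finally {
    done = true;        // stop the polling loop
    clearTimeout(timer); // release the deadline timer
  }
}
```

Starting the poller after a short grace period keeps the normal path on webhooks while still bounding how long a lost webhook delivery can stall an attempt.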
Correlation: narrow matchers, not “latest email wins”
Even with inbox isolation, retries and provider behavior can create duplicates. Make your selection deterministic by matching on intent.
Examples of good match keys:
- Expected sender domain
- Subject prefix or template identifier
- Presence of an OTP marker in `text/plain`
- A correlation token you control (for example, a custom header your app adds)
Idempotency: safe retries without double-consuming
In real systems, duplicates happen: provider retries, webhook retries, and your own test reruns.
Rule: processing should be idempotent at the level you care about.
For OTP flows, idempotency usually means:
- Message-level dedupe (same message processed once)
- Artifact-level dedupe (same OTP link or code consumed once)
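The two dedupe levels can be sketched as a small class. This is an in-memory illustration only; `OtpDeduper` and its method names are hypothetical, and a real pipeline would persist these keys (for example, behind a unique constraint) so dedupe survives restarts.

```typescript
class OtpDeduper {
  private seenMessages = new Set<string>();
  private consumedArtifacts = new Set<string>();

  // Message-level: true only the first time a message id is seen.
  firstTimeMessage(messageId: string): boolean {
    if (this.seenMessages.has(messageId)) return false;
    this.seenMessages.add(messageId);
    return true;
  }

  // Artifact-level: true only the first time a code or link is consumed for an
  // attempt, even if it arrived in two different messages (e.g. a resend).
  consumeArtifact(attemptId: string, artifact: string): boolean {
    const key = `${attemptId}:${artifact}`;
    if (this.consumedArtifacts.has(key)) return false;
    this.consumedArtifacts.add(key);
    return true;
  }
}
```

Both checks are needed: a provider retry produces the same message id (caught at message level), while a duplicate send can produce a new message id carrying the same code (caught only at artifact level).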
Minimal extraction: give your code (or agent) only the OTP
Treat inbound email as untrusted input.
Rule: extract the smallest artifact that advances the workflow, typically the OTP digits or a single verification URL, and avoid passing raw HTML to agents.
This improves reliability (less parsing surface) and reduces risk (prompt injection, malicious links, tracking pixels).
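For link-based verification, "minimal extraction" might look like the sketch below: scan `text/plain` for HTTPS URLs, keep only those on an allowlisted host and path, and return a result only when exactly one distinct URL survives. The function name, the host allowlist, and the `/verify` path prefix are illustrative assumptions, not part of any provider's API.

```typescript
// Return the single verification URL in untrusted text, or null if there is
// zero or more than one candidate after validation.
function extractVerificationUrl(
  text: string,
  allowedHosts: string[],
  pathPrefix = "/verify", // hypothetical path; use whatever your app sends
): string | null {
  const candidates = text.match(/https:\/\/[^\s<>"')]+/g) ?? [];
  const valid = new Set<string>();
  for (const raw of candidates) {
    let url: URL;
    try {
      url = new URL(raw);
    } catch {
      continue; // not a parseable URL
    }
    if (url.protocol !== "https:") continue;
    if (!allowedHosts.includes(url.hostname)) continue; // reject lookalike or tracking hosts
    if (!url.pathname.startsWith(pathPrefix)) continue;
    valid.add(url.toString());
  }
  // Exactly one distinct URL is the only unambiguous outcome.
  return valid.size === 1 ? Array.from(valid)[0] : null;
}
```

Only the returned string should reach an agent; the email body itself stays out of its context.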
Reference architecture: the deterministic OTP harness
Here is the core idea: build a small “OTP harness” with a stable interface, then reuse it everywhere (Playwright, Cypress, backend integration tests, agent tools).

Step A: Provision an inbox (and keep both email and inbox_id)
Your system under test needs an email address, but your harness needs an inbox handle.
So your create step should return an object like:
- `email` (the address to type into the UI or send to your API)
- `inbox_id` (the handle you wait on)
- `expires_at` (so you can clean up correctly)
With Mailhook, disposable inbox creation is done via API, and you can use instant shared domains or a custom domain, depending on your environment. Use the canonical contract for fields and endpoints: mailhook.co/llms.txt.
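A sketch of the provisioning step, assuming a hypothetical `InboxProvider` interface that you back with your provider's real create-inbox call. The useful part is the guard: refuse to start an attempt whose inbox would expire before the wait budget runs out, since an OTP landing after expiry looks exactly like "no email sent".

```typescript
// Field names mirror the list above; they are not a specific provider schema.
type InboxHandle = { email: string; inbox_id: string; expires_at?: string };

// Hypothetical provider interface: plug in your real create-inbox call.
interface InboxProvider {
  createInbox(): Promise<InboxHandle>;
}

// Milliseconds of life left, or Infinity if the provider returned no expiry.
function remainingTtlMs(inbox: InboxHandle, now: number = Date.now()): number {
  if (!inbox.expires_at) return Infinity;
  const t = Date.parse(inbox.expires_at);
  if (Number.isNaN(t)) throw new Error(`Unparseable expires_at: ${inbox.expires_at}`);
  return t - now;
}

// Create a fresh inbox for one attempt and fail fast if it cannot outlive the wait.
async function provisionAttempt(provider: InboxProvider, waitBudgetMs: number): Promise<InboxHandle> {
  const inbox = await provider.createInbox();
  if (remainingTtlMs(inbox) < waitBudgetMs) {
    throw new Error(`Inbox ${inbox.inbox_id} expires before the ${waitBudgetMs} ms wait budget`);
  }
  return inbox;
}
```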
Step B: Trigger the OTP email (exactly once per attempt)
Your harness should call your app to start verification. Typical triggers:
- Sign up
- Email sign-in
- Password reset
- “Verify your email” flow
The key is that this trigger is attempt-scoped. If a retry happens, you should treat it as a new attempt with a new inbox (or apply strict resend budgets).
Step C: Wait deterministically for the matching message
Design your wait as a deadline-based loop, not as a fixed sleep.
A practical waiting policy:
- Total deadline: 60 to 120 seconds (depends on environment)
- Poll interval: exponential backoff with jitter
- Stop conditions: the first message that matches intent, or deadline exceeded
If you have webhooks, you can shorten the happy path significantly, but you still want a polling fallback.
Mailhook supports both real-time webhook notifications and a polling API, plus signed payloads for webhook security.
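The poll interval from the policy above can be computed with a small helper. This sketch uses exponential growth capped at `maxMs` with "equal jitter" (half the cap fixed, half random), and clamps the sleep so it never runs past the overall deadline. The function name and the 250 ms / 4 s defaults are illustrative choices, not provider recommendations.

```typescript
// Delay before the next poll. `attempt` is 0 for the first retry.
function nextPollDelayMs(
  attempt: number,
  remainingMs: number,          // time left before the overall deadline
  baseMs = 250,
  maxMs = 4000,
  random: () => number = Math.random,
): number {
  const cap = Math.min(maxMs, baseMs * 2 ** attempt); // exponential, capped
  const half = cap / 2;
  const jittered = Math.floor(half + random() * half); // equal jitter in [cap/2, cap)
  return Math.max(0, Math.min(jittered, remainingMs)); // never sleep past the deadline
}
```

Injecting `random` keeps the schedule testable; jitter matters mostly when many parallel CI jobs poll the same provider at once.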
Step D: Extract OTP from structured JSON (prefer text/plain)
Do not scrape HTML if you can avoid it.
A robust OTP extraction approach:
- Prefer `text/plain` content
- Use a conservative regex for OTPs (and validate length)
- If multiple codes exist, pick deterministically (for example, the last code in the body, or the message with the latest `received_at`)
Keep the output minimal: return `{ otp, message_id, received_at }` to the caller.
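Those extraction rules might be sketched as follows, assuming a hypothetical `ParsedEmail` shape with `text` (the `text/plain` part) and `html` fields; check your provider's JSON for the real names. HTML is only a fallback and is stripped to text, never rendered, and the match requires exactly `digits` digits so longer numbers such as order IDs are ignored.

```typescript
type ParsedEmail = {
  message_id: string;
  received_at: string;
  text?: string; // text/plain part (assumed field name)
  html?: string; // text/html part, fallback only (assumed field name)
};

type OtpArtifact = { otp: string; message_id: string; received_at: string };

// Crude tag stripping for the fallback path; the result is never rendered.
function htmlToText(html: string): string {
  return html.replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, " ").replace(/<[^>]+>/g, " ");
}

function extractOtp(msg: ParsedEmail, digits = 6): OtpArtifact {
  const body = msg.text && msg.text.trim() !== "" ? msg.text : htmlToText(msg.html ?? "");
  // Exactly `digits` digits, not part of a longer number.
  const re = new RegExp(`(?<!\\d)\\d{${digits}}(?!\\d)`, "g");
  const codes = body.match(re) ?? [];
  if (codes.length === 0) throw new Error(`No ${digits}-digit OTP in message ${msg.message_id}`);
  // Deterministic tie-break: the last code in the body.
  return { otp: codes[codes.length - 1], message_id: msg.message_id, received_at: msg.received_at };
}
```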
Step E: Submit OTP and assert success
Submit the code, then assert the post-condition:
- User session exists
- Email marked verified
- OTP invalidated (submitting the same code again fails)
Finally, let the inbox expire (or explicitly clean up if your provider supports lifecycle control). In any case, treat inbox TTL as part of your integration design, not as an afterthought.
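One way to make cleanup part of the design is to scope each attempt to its inbox with `try`/`finally`. This sketch assumes a hypothetical `InboxLifecycle` interface whose `deleteInbox` method is optional, because not every provider exposes lifecycle control; when it is absent, TTL expiry does the cleanup.

```typescript
type InboxHandle = { email: string; inbox_id: string; expires_at?: string };

// Hypothetical provider interface; deleteInbox is optional on purpose.
interface InboxLifecycle {
  createInbox(): Promise<InboxHandle>;
  deleteInbox?(inboxId: string): Promise<void>;
}

// Run one verification attempt on a fresh inbox and always try to clean up.
// Cleanup errors are reported but never mask the attempt's own error.
async function withAttemptInbox<T>(
  provider: InboxLifecycle,
  attempt: (inbox: InboxHandle) => Promise<T>,
  onCleanupError: (err: unknown) => void = (err) => console.warn("inbox cleanup failed", err),
): Promise<T> {
  const inbox = await provider.createInbox();
  try {
    return await attempt(inbox);
  } finally {
    if (provider.deleteInbox) {
      try {
        await provider.deleteInbox(inbox.inbox_id);
      } catch (err) {
        onCleanupError(err);
      }
    }
  }
}
```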
Failure modes and deterministic fixes
Most OTP flakiness is predictable. Here is a quick mapping you can use in code reviews.
| Failure mode | What it looks like | Deterministic fix |
|---|---|---|
| Shared inbox collision | OTP belongs to another test run | Inbox-per-attempt isolation |
| Fixed sleep | Sometimes too short, sometimes slow | Deadline-based wait with webhook-first, polling fallback |
| Duplicate deliveries | Same email processed twice | Message-level and artifact-level dedupe |
| Template drift | Parsing breaks when email copy changes | Assert intent via stable fields, extract from text/plain |
| Resend loop | Agent keeps clicking “resend code” | Budgets and tool constraints, one inbox per attempt |
| Webhook spoofing | Fake payloads enter your pipeline | Verify signed payloads, reject on signature failure |
A provider-agnostic OTP wait function (pseudocode)
The point of this snippet is the structure: isolate, wait with deadlines, narrow match, dedupe, extract minimal artifact.
Adjust the API calls to your provider. For Mailhook-specific request/response fields and signature headers, use: mailhook.co/llms.txt.
```ts
type EmailWithInbox = {
  email: string;
  inbox_id: string;
  expires_at?: string;
};

type VerificationArtifact = {
  otp: string;
  message_id: string;
  received_at: string;
};

// Return the last standalone 6-digit code in the body (deterministic tie-break).
function extractOtpFromText(text: string): string {
  const matches = text.match(/\b\d{6}\b/g) ?? [];
  if (matches.length === 0) throw new Error("OTP not found");
  return matches[matches.length - 1];
}

async function waitForOtp(params: {
  inbox: EmailWithInbox;
  deadlineMs: number;
  // Provider-specific: fetch messages for one inbox, resuming from `cursor`.
  poll: (inbox_id: string, cursor?: string) => Promise<{ messages: any[]; next_cursor?: string }>;
  // Intent check: sender, subject prefix, correlation header, etc.
  matcher: (msg: any) => boolean;
}): Promise<VerificationArtifact> {
  const started = Date.now();
  let cursor: string | undefined;
  let attempt = 0;
  const seenMessageIds = new Set<string>(); // message-level dedupe

  while (Date.now() - started < params.deadlineMs) {
    const batch = await params.poll(params.inbox.inbox_id, cursor);
    // Keep the previous cursor if none is returned, instead of rescanning from the start.
    cursor = batch.next_cursor ?? cursor;

    for (const msg of batch.messages) {
      const messageId = String(msg.message_id ?? msg.id);
      if (seenMessageIds.has(messageId)) continue;
      seenMessageIds.add(messageId);
      if (!params.matcher(msg)) continue;

      // Prefer text/plain; the field name varies by provider.
      const text = String(msg.text ?? msg.text_plain ?? "");
      return {
        otp: extractOtpFromText(text),
        message_id: messageId,
        received_at: String(msg.received_at ?? msg.created_at ?? ""),
      };
    }

    // Exponential backoff (250 ms doubling, capped at 2 s) with jitter,
    // never sleeping past the deadline.
    const remaining = params.deadlineMs - (Date.now() - started);
    if (remaining <= 0) break;
    const cap = Math.min(2000, 250 * 2 ** attempt);
    attempt += 1;
    const delay = Math.min(remaining, cap / 2 + Math.random() * (cap / 2));
    await new Promise((r) => setTimeout(r, delay));
  }
  throw new Error("Timed out waiting for OTP email");
}
```
Choosing a good matcher
Matchers should be strict enough to avoid false positives, but not so strict that a small copy change breaks them.
Good matcher examples:
- Sender allowlist and subject prefix
- Presence of a stable phrase around the code in `text/plain`
- Header value you control (best option when feasible)
Avoid matchers like “the latest email” or “any email containing a number”. Those will eventually break.
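A matcher combining those signals might look like the sketch below. `MatchableMessage`, the lower-cased `headers` map, and the `X-Test-Attempt` header in the usage are all hypothetical; adapt them to your provider's JSON and to whatever correlation header your app actually sets.

```typescript
type MatchableMessage = {
  from: string;                     // e.g. "Acme <no-reply@auth.acme.test>"
  subject: string;
  text?: string;
  headers?: Record<string, string>; // assumed to use lower-cased header names
};

// Sender-domain allowlist + subject prefix + (optionally) an exact header value.
function buildOtpMatcher(opts: {
  allowedSenderDomains: string[];
  subjectPrefix: string;
  correlationHeader?: { name: string; value: string };
}): (msg: MatchableMessage) => boolean {
  const allowed = opts.allowedSenderDomains.map((d) => d.toLowerCase());
  return (msg) => {
    // Accept both "Name <addr>" and bare "addr" forms.
    const addr = /<([^>]+)>/.exec(msg.from)?.[1] ?? msg.from;
    const domain = addr.slice(addr.lastIndexOf("@") + 1).trim().toLowerCase();
    if (!allowed.includes(domain)) return false;
    if (!msg.subject.startsWith(opts.subjectPrefix)) return false;
    if (opts.correlationHeader) {
      const v = msg.headers?.[opts.correlationHeader.name.toLowerCase()];
      if (v !== opts.correlationHeader.value) return false;
    }
    return true;
  };
}
```

The subject check is a prefix rather than an exact match so that copy changes after the prefix do not break it, while the header check stays exact because you control its value.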
Webhook hardening (especially important for agents)
If you ingest emails via webhooks, treat the webhook boundary like any other public ingress.
Key practices:
- Verify signatures over the raw request body (fail closed)
- Enforce a timestamp tolerance to reduce replay risk
- Deduplicate deliveries (store a delivery ID or compute a stable hash)
- Keep webhook handlers fast, acknowledge quickly, enqueue processing
Mailhook supports signed payloads for webhook security. For the exact verification algorithm and header names, follow mailhook.co/llms.txt.
If you want background on why DKIM “email signed by” is not the same as webhook payload authenticity, see Mailhook’s engineering write-up: Email Signed By: Verify Webhook Payload Authenticity.
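The shape of a fail-closed check is sketched below using Node's `crypto` module. The signing scheme here is an assumption for illustration only, namely hex `HMAC-SHA256(secret, timestamp + "." + rawBody)` with a Unix-seconds timestamp; it is not Mailhook's documented algorithm, and the real header names and construction are in llms.txt.

```typescript
import { createHmac, timingSafeEqual } from "crypto";

// ASSUMED scheme (illustrative, not Mailhook's): hex(HMAC-SHA256(secret, `${timestamp}.${rawBody}`)).
function verifyWebhook(opts: {
  rawBody: string;      // exact body received, before JSON.parse
  timestamp: string;    // from a provider-specific timestamp header
  signatureHex: string; // from a provider-specific signature header
  secret: string;
  toleranceSec?: number;
  nowSec?: number;
}): boolean {
  const tolerance = opts.toleranceSec ?? 300;
  const now = opts.nowSec ?? Math.floor(Date.now() / 1000);
  const ts = Number(opts.timestamp);
  // Fail closed on garbled, stale, or future-dated timestamps (replay window).
  if (!Number.isFinite(ts) || Math.abs(now - ts) > tolerance) return false;

  const expected = createHmac("sha256", opts.secret)
    .update(`${opts.timestamp}.${opts.rawBody}`)
    .digest();
  // Invalid hex decodes to a shorter buffer, which the length check rejects.
  const given = Buffer.from(opts.signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Signing over the raw body matters: re-serializing parsed JSON can reorder keys or change whitespace and break verification.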
Preventing resend loops and “bot loops” in OTP verification
OTP UX often includes “resend code”. In automation, that button is a foot-gun.
Deterministic policies that stop loops:
- Give each attempt a strict resend budget (for example, one resend)
- If you resend, rotate inboxes (new inbox per resend attempt)
- Add an overall time budget, then fail with actionable logs
This matters even more with LLM agents, because they may overfit on “try again” and spam resends.
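These policies can be enforced in code rather than in a prompt. The hypothetical `ResendBudget` below allows at most `maxResends` resends inside one overall time budget and returns a human-readable reason when it refuses, which can go straight into the failure log. The clock is injectable for testing.

```typescript
class ResendBudget {
  private resends = 0;
  private readonly startedAt: number;
  private readonly maxResends: number;
  private readonly totalBudgetMs: number;
  private readonly now: () => number;

  constructor(maxResends: number, totalBudgetMs: number, now: () => number = Date.now) {
    this.maxResends = maxResends;
    this.totalBudgetMs = totalBudgetMs;
    this.now = now;
    this.startedAt = now();
  }

  // null means "resend allowed" (and records it); a string is the refusal reason.
  tryResend(): string | null {
    const elapsed = this.now() - this.startedAt;
    if (elapsed >= this.totalBudgetMs) return `time budget exhausted after ${elapsed} ms`;
    if (this.resends >= this.maxResends) return `resend budget exhausted (${this.maxResends} used)`;
    this.resends += 1;
    return null;
  }
}
```

When `tryResend()` returns `null`, the caller provisions a new inbox before clicking "resend"; when it returns a string, the tool fails the attempt instead of letting an agent try again.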
Observability: what to log so failures are actionable
When OTP verification fails in CI, you want to know whether it was:
- No email sent
- Email sent but delayed
- Email received but not matched
- Email matched but OTP extraction failed
- OTP submitted but rejected
Log identifiers, not entire emails:
- `inbox_id`
- `email`
- `message_id`
- webhook delivery ID (if applicable)
- `received_at`
- extracted artifact hash (not the OTP itself, if you want to minimize sensitive logs)
If your provider returns structured JSON, store that JSON as a CI artifact for debugging, but consider retention and access controls.
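A log record built from identifiers only might look like the sketch below. The stage names are our own labels for the five failure cases listed above, and `runKey` is a hypothetical per-run secret. It uses a keyed hash (HMAC) instead of a plain hash because a 6-digit code has only a million possible values, so an unkeyed SHA-256 of it can be reversed by brute force.

```typescript
import { createHmac } from "crypto";

// Labels for the failure cases listed above, plus success.
type OtpStage = "no_email" | "delayed" | "not_matched" | "extraction_failed" | "rejected" | "ok";

function otpLogRecord(
  f: {
    stage: OtpStage;
    inbox_id: string;
    email: string;
    message_id?: string;
    webhook_delivery_id?: string;
    received_at?: string;
    artifact?: string; // the OTP or link; only its keyed hash is logged
  },
  runKey: string, // hypothetical per-run secret, e.g. generated per CI job
): Record<string, string> {
  const rec: Record<string, string> = { stage: f.stage, inbox_id: f.inbox_id, email: f.email };
  if (f.message_id) rec.message_id = f.message_id;
  if (f.webhook_delivery_id) rec.webhook_delivery_id = f.webhook_delivery_id;
  if (f.received_at) rec.received_at = f.received_at;
  if (f.artifact) {
    // Short HMAC prefix: lets you compare artifacts across log lines within a run.
    rec.artifact_hmac = createHmac("sha256", runKey).update(f.artifact).digest("hex").slice(0, 16);
  }
  return rec;
}
```

Emitting `JSON.stringify(otpLogRecord(...))` at each stage makes it clear from CI logs alone which of the five failure modes occurred.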
When to use shared domains vs custom domains
For temp email verification, domain choice is often an operational decision:
- Shared domains are great for quick setup and internal CI.
- Custom domains are helpful when you need allowlisting, stronger environment separation, or enterprise constraints.
Mailhook supports both instant shared domains and custom domains, so you can start fast and migrate without rewriting your harness.
Frequently Asked Questions
What is temp email verification? Temp email verification is verifying an email address using a short-lived, disposable inbox. For OTP flows, it means provisioning an inbox per attempt, waiting deterministically, extracting the OTP, and completing verification without shared mailbox access.
Why does OTP testing get flaky in CI? Common causes include shared inbox collisions, fixed sleeps, delivery delays, duplicate emails from retries, and brittle parsing of HTML templates. Isolation plus deadline-based waits eliminate most flakiness.
Should I use webhooks or polling to receive verification emails? Use webhooks as the default for low latency and efficiency, and keep polling as a fallback so your flow survives transient webhook failures. A hybrid approach is the most reliable.
Is it safe to let an LLM agent read verification emails? It can be, if you treat inbound email as untrusted input, verify webhook authenticity, avoid rendering HTML, validate links, and expose only minimal extracted artifacts (like the OTP) to the agent.
Where can I find Mailhook’s exact API contract? Mailhook publishes a canonical, machine-readable integration reference at mailhook.co/llms.txt.
Build a deterministic OTP flow with Mailhook
If you want temp email verification that is parallel-safe and agent-friendly, Mailhook gives you the primitives you need: disposable inbox creation via API, emails delivered as structured JSON, webhook notifications with signed payloads, and a polling API as a fallback.
Start from the canonical integration reference, then wire it into your OTP harness: Mailhook llms.txt. You can also explore the product at mailhook.co.