Temp Email Verification: A Deterministic Flow for OTPs

Email OTP verification is one of those flows that “works fine” until you put it in CI, run tests in parallel, or let an LLM agent drive it. Then the usual failure modes show up fast: shared inbox collisions, fixed sleeps, duplicate emails, retries that resend codes, and brittle HTML scraping.

Temp email verification only becomes reliable when you treat verification emails as a deterministic event stream tied to a specific, short-lived inbox, not as “some random address” you hope you can read later.

This guide lays out a deterministic OTP flow you can reuse in E2E tests, QA automation, and agent toolchains, with concrete design rules for waiting, deduping, extraction, and security.

What “temp email verification” should mean (for OTPs)

For OTP verification, the goal is not “receive an email” in general. The goal is:

Provision an inbox that is isolated to one attempt.
Trigger exactly one verification email for that attempt.
Wait for arrival using explicit time budgets, not sleep(10_000).
Parse the message as structured data, extract only the OTP (or verification URL), then proceed.
Make the whole thing safe to retry.

In practice, you want an inbox API that models the inbox as a first-class resource, so you can deterministically read “messages for this attempt” without scanning a shared mailbox.

Mailhook is built around that model: create disposable inboxes via API, receive emails as structured JSON, and consume delivery via real-time webhooks or a polling API. For exact integration details, use the canonical spec: mailhook.co/llms.txt.

The five invariants of a deterministic OTP flow

If you adopt only one thing from this article, adopt these invariants. They are the difference between flaky and deterministic.

Isolation: one inbox per attempt

OTP emails are inherently attempt-scoped. If you reuse an inbox across attempts (or across parallel CI jobs), you create ambiguity.

Rule: create a new disposable inbox for each verification attempt, not per test suite, not per environment, not per user.

Isolation eliminates the two most common bugs:

A test reads the OTP from a previous run.
Two parallel runs race and consume each other’s codes.

Deterministic waiting: webhook-first, polling fallback

OTP arrival is asynchronous and can be delayed.

Rule: treat email arrival as an event. Prefer webhooks for low latency, but implement polling as a fallback so your flow is resilient to transient webhook delivery issues.

If you only poll, you either over-poll (costly) or under-poll (slow). If you only use webhooks, a single networking misconfiguration can make the whole flow fail hard.
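To make the hybrid concrete, here is a minimal sketch that races a webhook arrival against a polling loop under one deadline. It assumes your webhook handler exposes arrival as a promise (webhookArrival) and that pollOnce wraps a single provider polling request; both are hypothetical hooks you wire up yourself, not Mailhook APIs.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Sketch: webhook-first, polling fallback, one shared deadline.
// webhookArrival and pollOnce are hypothetical hooks, not a provider API.
async function waitHybrid<T>(opts: {
  webhookArrival: Promise<T>;        // resolves when your webhook handler sees a matching message
  pollOnce: () => Promise<T | null>; // one polling request; null means "nothing matching yet"
  deadlineMs: number;
  pollIntervalMs: number;
}): Promise<T> {
  const deadline = Date.now() + opts.deadlineMs;
  let settled = false;

  // A failed webhook path must not fail the wait: map its rejection to a promise
  // that never settles, so polling (or the timeout) decides the outcome.
  const webhook = opts.webhookArrival.then(
    (value) => value,
    () => new Promise<T>(() => {})
  );

  const polling = (async (): Promise<T> => {
    while (!settled && Date.now() < deadline) {
      const found = await opts.pollOnce();
      if (found !== null) return found;
      await sleep(Math.max(0, Math.min(opts.pollIntervalMs, deadline - Date.now())));
    }
    throw new Error("Timed out waiting for message (polling)");
  })();

  // Hard cap so a silent webhook cannot stretch the wait past the budget.
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("Timed out waiting for message")), opts.deadlineMs);
  });

  try {
    return await Promise.race([webhook, polling, timeout]);
  } finally {
    settled = true; // the polling loop exits after its current iteration
    clearTimeout(timer);
  }
}
```

The webhook branch gives low latency on the happy path, while the polling branch keeps the flow alive when webhook delivery is misconfigured or delayed.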

Correlation: narrow matchers, not “latest email wins”

Even with inbox isolation, retries and provider behavior can create duplicates. Make your selection deterministic by matching on intent.

Examples of good match keys:

Expected sender domain
Subject prefix or template identifier
Presence of an OTP marker in text/plain
A correlation token you control (for example, a custom header your app adds)
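A correlation token is the strongest of these keys because you generate it. The sketch below mints one token per attempt and checks it on inbound messages. It assumes your app copies the token into a custom header named x-test-correlation-id (a hypothetical name) and that your provider's JSON exposes headers as a name-to-value map; check the real message shape against your provider's contract.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical header name: your app must be changed to set it on verification emails.
const CORRELATION_HEADER = "x-test-correlation-id";

// Assumed message shape; map from your provider's JSON.
type CorrelatedMessage = {
  headers?: Record<string, string>;
};

// One fresh token per verification attempt.
function newCorrelationToken(): string {
  return randomUUID();
}

function matchesCorrelation(msg: CorrelatedMessage, token: string): boolean {
  // Header names are case-insensitive, so normalize before comparing.
  for (const [name, value] of Object.entries(msg.headers ?? {})) {
    if (name.toLowerCase() === CORRELATION_HEADER) return value.trim() === token;
  }
  return false;
}
```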

Idempotency: safe retries without double-consuming

In real systems, duplicates happen: provider retries, webhook retries, and your own test reruns.

Rule: make processing idempotent at every level where a duplicate would cause harm.

For OTP flows, idempotency usually means:

Message-level dedupe (same message processed once)
Artifact-level dedupe (same verification link or code consumed once)
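Both layers fit in a few lines. This is a minimal sketch that keeps state in memory, which is an assumption: if several workers or webhook replicas process deliveries, the same checks need a shared store such as a database table with a unique key.

```typescript
// Sketch: message-level and artifact-level dedupe for one OTP flow.
class OtpDeduper {
  private seenMessages = new Set<string>();
  private consumedArtifacts = new Set<string>();

  // Message level: false means this message id was already processed.
  claimMessage(messageId: string): boolean {
    if (this.seenMessages.has(messageId)) return false;
    this.seenMessages.add(messageId);
    return true;
  }

  // Artifact level: the same code can arrive in two different messages
  // (for example a provider retry), so key on inbox + artifact, not message id.
  claimArtifact(inboxId: string, artifact: string): boolean {
    const key = `${inboxId}:${artifact}`;
    if (this.consumedArtifacts.has(key)) return false;
    this.consumedArtifacts.add(key);
    return true;
  }
}
```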

Minimal extraction: give your code (or agent) only the OTP

Treat inbound email as untrusted input.

Rule: extract the smallest artifact that advances the workflow, typically the OTP digits or a single verification URL, and avoid passing raw HTML to agents.

This improves reliability (less parsing surface) and reduces risk (prompt injection, malicious links, tracking pixels).
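A minimal sketch of that reduction, assuming the message body is already available as plain text and that verification links must point at one known host (allowedHost, an illustrative parameter). It returns either a six-digit code or exactly one allowlisted https link, and nothing else, so an agent never sees the raw message.

```typescript
type MinimalArtifact =
  | { kind: "otp"; value: string }
  | { kind: "url"; value: string };

function toMinimalArtifact(text: string, allowedHost: string): MinimalArtifact | null {
  // Look for codes outside links, so digits inside a URL query are not mistaken for an OTP.
  const withoutLinks = text.replace(/https?:\/\/\S+/g, " ");
  const codes = withoutLinks.match(/\b\d{6}\b/g);
  if (codes && codes.length > 0) return { kind: "otp", value: codes[codes.length - 1] };

  // Otherwise accept exactly one https link on the expected host; anything else is rejected.
  const links = (text.match(/https:\/\/[^\s<>"')]+/g) ?? []).filter((raw) => {
    try {
      return new URL(raw).hostname === allowedHost;
    } catch {
      return false; // unparsable URL
    }
  });
  const unique = Array.from(new Set(links));
  return unique.length === 1 ? { kind: "url", value: unique[0] } : null;
}
```

Returning null for zero or several candidate links is deliberate: ambiguity is treated as a failure to investigate, not something for the agent to resolve.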

Reference architecture: the deterministic OTP harness

Here is the core idea: build a small “OTP harness” with a stable interface, then reuse it everywhere (Playwright, Cypress, backend integration tests, agent tools).
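One way to express that stable interface is to inject every dependency and keep the orchestration fixed. The sketch below uses hypothetical names throughout (OtpHarnessDeps, verifyEmailOnce); it is not a Mailhook or test-framework API, only a shape that Steps A to E can plug into.

```typescript
type Inbox = { email: string; inbox_id: string; expires_at?: string };
type OtpResult = { otp: string; message_id: string; received_at: string };

// Hypothetical dependency contract; each method maps to one step below.
interface OtpHarnessDeps {
  createInbox(): Promise<Inbox>;                           // Step A
  triggerVerification(email: string): Promise<void>;       // Step B (your app)
  waitForOtp(inbox: Inbox): Promise<OtpResult>;            // Steps C and D
  submitOtp(email: string, otp: string): Promise<boolean>; // Step E
}

async function verifyEmailOnce(deps: OtpHarnessDeps): Promise<{ inbox: Inbox; result: OtpResult }> {
  const inbox = await deps.createInbox();        // fresh inbox for this attempt only
  await deps.triggerVerification(inbox.email);   // exactly one trigger per attempt
  const result = await deps.waitForOtp(inbox);
  const accepted = await deps.submitOtp(inbox.email, result.otp);
  if (!accepted) {
    // Identifiers only, so the failure is actionable without logging the email body.
    throw new Error(`OTP rejected (inbox_id=${inbox.inbox_id}, message_id=${result.message_id})`);
  }
  return { inbox, result };
}
```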

Figure: deterministic OTP verification in five steps. Create a disposable inbox (email + inbox_id), trigger signup/login, wait for delivery (webhook-first with polling fallback), parse JSON and extract the OTP, submit the OTP and expire the inbox.

Step A: Provision an inbox (and keep both email and inbox_id)

Your system under test needs an email address, but your harness needs an inbox handle.

So your create step should return an object like:

email (the address to type into the UI or send to your API)
inbox_id (the handle you wait on)
expires_at (so you can clean up correctly)
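It pays to validate that object once, right after creation, instead of discovering a missing field mid-test. This sketch assumes the three fields listed above arrive as strings; the exact Mailhook response shape is defined in mailhook.co/llms.txt, so treat the field mapping as an assumption.

```typescript
type InboxHandle = { email: string; inbox_id: string; expires_at?: Date };

// Fail fast on a malformed create-inbox response.
function parseInboxHandle(raw: unknown): InboxHandle {
  const r = (raw ?? {}) as Record<string, unknown>;
  const email = r.email;
  const inboxId = r.inbox_id;
  const expiresAt = r.expires_at;

  if (typeof email !== "string" || !email.includes("@")) throw new Error("missing or invalid email");
  if (typeof inboxId !== "string" || inboxId.length === 0) throw new Error("missing inbox_id");

  let expires: Date | undefined;
  if (typeof expiresAt === "string") {
    expires = new Date(expiresAt);
    if (Number.isNaN(expires.getTime())) throw new Error("invalid expires_at");
  }
  return { email, inbox_id: inboxId, expires_at: expires };
}

// True if the inbox will stay alive for at least budgetMs (your wait deadline).
function outlivesBudget(inbox: InboxHandle, budgetMs: number, now: number = Date.now()): boolean {
  return inbox.expires_at === undefined || inbox.expires_at.getTime() - now >= budgetMs;
}
```

Checking outlivesBudget before triggering the email avoids a confusing failure where the inbox expires while the harness is still waiting.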

With Mailhook, disposable inbox creation is done via API, and you can use instant shared domains or custom domain support depending on your environment. Use the canonical contract for fields and endpoints: mailhook.co/llms.txt.

Step B: Trigger the OTP email (exactly once per attempt)

Your harness should call your app to start verification. Typical triggers:

Sign up
Email sign-in
Password reset
“Verify your email” flow

The key is that this trigger is attempt-scoped. If a retry happens, you should treat it as a new attempt with a new inbox (or apply strict resend budgets).

Step C: Wait deterministically for the matching message

Design your wait as a deadline-based loop, not as a fixed sleep.

A practical waiting policy:

Total deadline: 60 to 120 seconds (depends on environment)
Poll interval: exponential backoff with jitter
Stop conditions: the first message that matches intent, or deadline exceeded
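The backoff part of that policy can be a small pure function. This sketch uses exponential growth with "equal jitter" (half fixed, half random) so the delay never drops to zero, and clamps every delay to the time left before the deadline. The base and cap values are illustrative defaults, not recommendations from any provider.

```typescript
// Exponential backoff with equal jitter, clamped to the remaining deadline.
function nextPollDelayMs(
  attempt: number,      // 0 for the first wait, 1 for the second, ...
  remainingMs: number,  // time left before the overall deadline
  baseMs = 250,
  capMs = 4000,
  random: () => number = Math.random // injectable for deterministic tests
): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  const jittered = Math.floor(exp / 2 + random() * (exp / 2)); // uniform in [exp/2, exp)
  return Math.max(0, Math.min(jittered, remainingMs));
}
```

In a wait loop you would call it as nextPollDelayMs(attempt++, deadline - Date.now()) after each empty poll, and stop once the deadline has passed.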

If you have webhooks, you can shorten the happy path significantly, but you still want a polling fallback.

Mailhook supports both real-time webhook notifications and a polling API, plus signed payloads for webhook security.

Step D: Extract OTP from structured JSON (prefer text/plain)

Do not scrape HTML if you can avoid it.

A robust OTP extraction approach:

Prefer text/plain content
Use a conservative regex for OTPs (and validate length)
If multiple codes exist, pick deterministically (for example, the last code in the body, or the message with the latest received_at)

Keep the output minimal: return { otp, message_id, received_at } to the caller.
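The selection rules can be encoded so the result never depends on the order in which messages were returned. This sketch assumes each candidate carries message_id, received_at (ISO 8601), and an optional text_plain field; map those from your provider's JSON.

```typescript
type CandidateMessage = { message_id: string; received_at: string; text_plain?: string };
type Artifact = { otp: string; message_id: string; received_at: string };

const OTP_PATTERN = /\b\d{6}\b/g; // conservative: exactly six digits as a whole word

function selectOtp(messages: CandidateMessage[]): Artifact {
  // Latest received_at first; ties break on message_id so input order never matters.
  const ordered = [...messages].sort(
    (a, b) =>
      Date.parse(b.received_at) - Date.parse(a.received_at) ||
      a.message_id.localeCompare(b.message_id)
  );
  for (const msg of ordered) {
    const codes = (msg.text_plain ?? "").match(OTP_PATTERN) ?? [];
    if (codes.length > 0) {
      // Last code in the body wins.
      return { otp: codes[codes.length - 1], message_id: msg.message_id, received_at: msg.received_at };
    }
  }
  throw new Error("No message with an OTP in text/plain");
}
```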

Step E: Submit OTP and assert success

Submit the code, then assert the post-condition:

User session exists
Email marked verified
OTP invalidated (submitting the same code again fails)
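These post-conditions can be checked in one helper. The AppProbe interface below is hypothetical: it stands for whatever test-only hooks your system under test exposes (API calls, database queries, or UI assertions). The replay check is what proves the code was actually invalidated.

```typescript
// Hypothetical test-only view of the system under test, not a library API.
interface AppProbe {
  submitOtp(email: string, otp: string): Promise<boolean>;
  hasSession(email: string): Promise<boolean>;
  isEmailVerified(email: string): Promise<boolean>;
}

async function assertVerified(app: AppProbe, email: string, otp: string): Promise<void> {
  if (!(await app.submitOtp(email, otp))) throw new Error("OTP rejected on first use");
  if (!(await app.hasSession(email))) throw new Error("No session after verification");
  if (!(await app.isEmailVerified(email))) throw new Error("Email not marked verified");
  // Replaying the same code must fail; otherwise the OTP was not invalidated.
  if (await app.submitOtp(email, otp)) throw new Error("OTP accepted twice");
}
```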

Finally, let the inbox expire (or explicitly clean up if your provider supports lifecycle control). In any case, treat inbox TTL as part of your integration design, not as an afterthought.
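Scoping the inbox to a try/finally block makes that lifecycle explicit. In this sketch deleteInbox is optional and hypothetical: pass it only if your provider exposes lifecycle control; without it, the inbox is simply left to expire at its TTL.

```typescript
// Sketch: one inbox per attempt, with cleanup that runs even when the body fails.
async function withAttemptInbox<I extends { inbox_id: string }, R>(
  createInbox: () => Promise<I>,
  body: (inbox: I) => Promise<R>,
  deleteInbox?: (inboxId: string) => Promise<void> // hypothetical lifecycle hook
): Promise<R> {
  const inbox = await createInbox();
  try {
    return await body(inbox);
  } finally {
    if (deleteInbox) {
      // A cleanup error should not mask the test result, so it is logged, not rethrown.
      await deleteInbox(inbox.inbox_id).catch((err) =>
        console.warn(`cleanup failed for ${inbox.inbox_id}`, err)
      );
    }
  }
}
```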

Failure modes and deterministic fixes

Most OTP flakiness is predictable. Here is a quick mapping you can use in code reviews.

Failure mode	What it looks like	Deterministic fix
Shared inbox collision	OTP belongs to another test run	Inbox-per-attempt isolation
Fixed sleep	Sometimes too short, sometimes slow	Deadline-based wait with webhook-first, polling fallback
Duplicate deliveries	Same email processed twice	Message-level and artifact-level dedupe
Template drift	Parsing breaks when email copy changes	Assert intent via stable fields, extract from text/plain
Resend loop	Agent keeps clicking “resend code”	Budgets and tool constraints, one inbox per attempt
Webhook spoofing	Fake payloads enter your pipeline	Verify signed payloads, reject on signature failure

A provider-agnostic OTP wait function (TypeScript sketch)

The point of this snippet is the structure: isolate, wait with deadlines, narrow match, dedupe, extract minimal artifact.

Adjust the API calls to your provider. For Mailhook-specific request/response fields and signature headers, use: mailhook.co/llms.txt.

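type EmailWithInbox = {
  email: string;
  inbox_id: string;
  expires_at?: string;
};

type VerificationArtifact = {
  otp: string;
  message_id: string;
  received_at: string;
};

// Loose message shape: field names vary by provider, so common aliases are accepted.
type InboundMessage = {
  message_id?: string;
  id?: string;
  text?: string;
  text_plain?: string;
  received_at?: string;
  created_at?: string;
};

type PollFn = (
  inbox_id: string,
  cursor?: string
) => Promise<{ messages: InboundMessage[]; next_cursor?: string }>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Conservative extraction: six digits as a whole word; the last occurrence wins.
function extractOtpFromText(text: string): string | null {
  const matches = text.match(/\b\d{6}\b/g);
  return matches ? matches[matches.length - 1] : null;
}

async function waitForOtp(params: {
  inbox: EmailWithInbox;
  deadlineMs: number;
  poll: PollFn;
  matcher: (msg: InboundMessage) => boolean;
}): Promise<VerificationArtifact> {
  const deadline = Date.now() + params.deadlineMs;
  let cursor: string | undefined;
  let attempt = 0;
  const seenMessageIds = new Set<string>();

  while (Date.now() < deadline) {
    const batch = await params.poll(params.inbox.inbox_id, cursor);
    // Keep the previous cursor if the provider did not return a new one.
    cursor = batch.next_cursor ?? cursor;

    for (const msg of batch.messages) {
      const messageId = msg.message_id ?? msg.id;
      // Message-level dedupe; skip id-less messages instead of colliding on "undefined".
      if (!messageId || seenMessageIds.has(messageId)) continue;
      seenMessageIds.add(messageId);

      if (!params.matcher(msg)) continue;

      // Prefer text/plain. A matched message without a code is a hard failure, not a retry.
      const otp = extractOtpFromText(msg.text_plain ?? msg.text ?? "");
      if (otp === null) {
        throw new Error(`Matched message ${messageId} but found no OTP`);
      }

      return {
        otp,
        message_id: messageId,
        received_at: msg.received_at ?? msg.created_at ?? ""
      };
    }

    // Exponential backoff with jitter, capped at 2 s and never sleeping past the deadline.
    const base = Math.min(2000, 250 * 2 ** attempt);
    attempt += 1;
    const delay = Math.floor(base / 2 + Math.random() * (base / 2));
    await sleep(Math.max(0, Math.min(delay, deadline - Date.now())));
  }

  throw new Error(`Timed out waiting for OTP email (inbox_id=${params.inbox.inbox_id})`);
}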

Choosing a good matcher

Matchers should be strict enough to avoid false positives, but not so strict that a small copy change breaks them.

Good matcher examples:

Sender allowlist and subject prefix
Presence of a stable phrase around the code in text/plain
Header value you control (best option when feasible)

Avoid matchers like “the latest email” or “any email containing a number”. Those will eventually break.
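A composite matcher that combines the first two examples (plus a stable phrase) might look like this. The field names from, subject, and text_plain, and the example sender, subject, and phrase values, are assumptions; replace them with your provider's fields and your own templates.

```typescript
type MatchableMessage = { from?: string; subject?: string; text_plain?: string };

function buildOtpMatcher(rules: {
  senderDomains: string[]; // allowlist, e.g. ["mail.example.com"]
  subjectPrefix: string;   // e.g. "Your verification code"
  anchorPhrase: RegExp;    // stable phrase around the code; no g flag, so .test() stays stateless
}): (msg: MatchableMessage) => boolean {
  const domains = rules.senderDomains.map((d) => d.toLowerCase());
  return (msg) => {
    // Take the domain after the last "@", dropping a trailing ">" from "Name <addr>" forms.
    const from = (msg.from ?? "").toLowerCase().replace(/>\s*$/, "");
    const domain = from.slice(from.lastIndexOf("@") + 1);
    if (!from.includes("@") || !domains.includes(domain)) return false;
    if (!(msg.subject ?? "").startsWith(rules.subjectPrefix)) return false;
    return rules.anchorPhrase.test(msg.text_plain ?? "");
  };
}
```

Each rule is loose on its own (a prefix, not an exact subject), but requiring all three keeps false positives rare.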

Webhook hardening (especially important for agents)

If you ingest emails via webhooks, treat the webhook boundary like any other public ingress.

Key practices:

Verify signatures over the raw request body (fail closed)
Enforce a timestamp tolerance to reduce replay risk
Deduplicate deliveries (store a delivery ID or compute a stable hash)
Keep webhook handlers fast: acknowledge quickly, then enqueue processing
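The first two practices can be sketched with Node's built-in crypto module. The signing scheme here (HMAC-SHA256 over the timestamp, a dot, and the raw body, hex-encoded) is an assumption chosen for illustration; Mailhook's actual algorithm and header names are specified in mailhook.co/llms.txt.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme: hex(HMAC-SHA256(secret, `${timestamp}.${rawBody}`)).
function verifyWebhook(opts: {
  rawBody: string;      // exact body received, before any JSON parsing
  timestamp: string;    // unix seconds, from a (hypothetical) timestamp header
  signatureHex: string; // from a (hypothetical) signature header
  secret: string;
  toleranceSec?: number;
  nowSec?: number;      // injectable for tests
}): boolean {
  const tolerance = opts.toleranceSec ?? 300;
  const now = opts.nowSec ?? Math.floor(Date.now() / 1000);
  const ts = Number(opts.timestamp);
  // Fail closed on a malformed or stale timestamp (replay protection).
  if (!Number.isFinite(ts) || Math.abs(now - ts) > tolerance) return false;

  const expected = createHmac("sha256", opts.secret)
    .update(`${opts.timestamp}.${opts.rawBody}`)
    .digest();
  const provided = Buffer.from(opts.signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return provided.length === expected.length && timingSafeEqual(provided, expected);
}
```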

Mailhook supports signed payloads for webhook security. For the exact verification algorithm and header names, follow mailhook.co/llms.txt.

If you want background on why DKIM “email signed by” is not the same as webhook payload authenticity, see Mailhook’s engineering write-up: Email Signed By: Verify Webhook Payload Authenticity.

Preventing resend loops and “bot loops” in OTP verification

OTP UX often includes “resend code”. In automation, that button is a foot-gun.

Deterministic policies that stop loops:

Give each attempt a strict resend budget (for example, one resend)
If you resend, rotate inboxes (new inbox per resend attempt)
Add an overall time budget, then fail with actionable logs

This matters even more with LLM agents, because they may overfit on “try again” and spam resends.
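A small guard object can enforce all three policies, and for agents it can be the only resend path the tool exposes. This is a sketch: the default limits (one resend, 120 seconds) are examples, and rotateInbox is a hypothetical callback that provisions a fresh inbox and re-triggers the email.

```typescript
class ResendBudget {
  private resends = 0;
  private readonly startedAt: number;

  constructor(
    private readonly maxResends = 1,
    private readonly totalBudgetMs = 120_000,
    private readonly now: () => number = Date.now // injectable clock for tests
  ) {
    this.startedAt = now();
  }

  // Returns the new inbox for the resend, or throws with a log-friendly reason.
  async resend<I>(rotateInbox: () => Promise<I>): Promise<I> {
    if (this.resends >= this.maxResends) {
      throw new Error(`Resend budget exhausted (${this.resends}/${this.maxResends})`);
    }
    const elapsed = this.now() - this.startedAt;
    if (elapsed > this.totalBudgetMs) {
      throw new Error(`Time budget exceeded after ${elapsed} ms`);
    }
    this.resends += 1;
    return rotateInbox(); // new inbox per resend, never the old one
  }
}
```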

Observability: what to log so failures are actionable

When OTP verification fails in CI, you want to know whether it was:

No email sent
Email sent but delayed
Email received but not matched
Email matched but OTP extraction failed
OTP submitted but rejected

Log identifiers, not entire emails:

inbox_id
email
message_id
webhook delivery ID (if applicable)
received_at
extracted artifact hash (not the OTP itself, if you want to minimize sensitive logs)
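One caveat on hashing: a plain SHA-256 of a six-digit code can be reversed by trying all one million values, so a keyed HMAC is the safer choice when logs are widely readable. The record shape and stage names below are assumptions that mirror the five failure buckets above, not a standard format.

```typescript
import { createHmac } from "node:crypto";

type OtpStage = "not_sent" | "delayed" | "not_matched" | "extraction_failed" | "rejected" | "ok";

type OtpAttemptLog = {
  stage: OtpStage;
  inbox_id: string;
  email: string;
  message_id?: string;
  delivery_id?: string;
  received_at?: string;
  artifact_hmac?: string; // keyed hash of the OTP, never the OTP itself
};

function buildAttemptLog(
  base: Omit<OtpAttemptLog, "artifact_hmac">,
  artifact?: { otp: string; key: string } // key should not be stored alongside the logs
): OtpAttemptLog {
  const record: OtpAttemptLog = { ...base };
  if (artifact) {
    record.artifact_hmac = createHmac("sha256", artifact.key).update(artifact.otp).digest("hex");
  }
  return record;
}
```

The same key lets you confirm later whether two attempts consumed the same code without ever writing the code itself.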

If your provider returns structured JSON, store that JSON as a CI artifact for debugging, but consider retention and access controls.

When to use shared domains vs custom domains

For temp email verification, domain choice is often an operational decision:

Shared domains are great for quick setup and internal CI.
Custom domains are helpful when you need allowlisting, stronger environment separation, or enterprise constraints.

Mailhook supports instant shared domains and custom domain support, so you can start fast and migrate without rewriting your harness.

Frequently Asked Questions

What is temp email verification? Temp email verification is verifying an email address using a short-lived, disposable inbox. For OTP flows, it means provisioning an inbox per attempt, waiting deterministically, extracting the OTP, and completing verification without shared mailbox access.

Why does OTP testing get flaky in CI? Common causes include shared inbox collisions, fixed sleeps, delivery delays, duplicate emails from retries, and brittle parsing of HTML templates. Isolation plus deadline-based waits eliminate most flakiness.

Should I use webhooks or polling to receive verification emails? Use webhooks as the default for low latency and efficiency, and keep polling as a fallback so your flow survives transient webhook failures. A hybrid approach is the most reliable.

Is it safe to let an LLM agent read verification emails? It can be, if you treat inbound email as untrusted input, verify webhook authenticity, avoid rendering HTML, validate links, and expose only minimal extracted artifacts (like the OTP) to the agent.

Where can I find Mailhook’s exact API contract? Mailhook publishes a canonical, machine-readable integration reference at mailhook.co/llms.txt.

Build a deterministic OTP flow with Mailhook

If you want temp email verification that is parallel-safe and agent-friendly, Mailhook gives you the primitives you need: disposable inbox creation via API, emails delivered as structured JSON, webhook notifications with signed payloads, and a polling API as a fallback.

Start from the canonical integration reference, then wire it into your OTP harness: Mailhook llms.txt. You can also explore the product at mailhook.co.