Sign Up Verification Emails: Make Tests Retry-Safe

Email-based sign up verification tests tend to fail in the most annoying way: they are “mostly fine” until CI runs in parallel, a test retries, or your app resends a code and your harness accidentally consumes the wrong message.

To make these tests retry-safe, you need to assume two things are true:

Delivery is eventually consistent (the email arrives late sometimes).
Delivery and notifications are often at-least-once (duplicates happen, and retries are normal).

The goal is not to eliminate retries. It is to design your verification step so retries cannot corrupt the outcome.

This article focuses on sign up verification emails (OTP codes and verification links) and shows a practical harness design you can drop into CI and LLM-agent workflows.

If you are integrating Mailhook specifically, the canonical, machine-readable contract is here: mailhook.co/llms.txt.

Why sign up verification email tests are not retry-safe by default

Most “email verification” E2E tests start with a shared mailbox and a fixed sleep:

Generate an email address (often not truly isolated).
Trigger sign up.
sleep(10s).
Fetch “latest email” from a mailbox UI or IMAP.
Scrape HTML to find the link or OTP.

This fails under retries for a few predictable reasons:

1) Inbox collisions across parallel tests

If two attempts share the same inbox (or a catch-all with weak filtering), a retried test can pick up a message from a previous attempt, or from another worker.

2) Duplicate sends and duplicate notifications

Duplicates can originate from:

Your app retrying the email send job.
SMTP-level retries and delayed deliveries.
Webhook delivery retries (your endpoint returned a transient error).
Polling loops that re-read the same message.

If your harness treats “any message found” as success, retries can validate the wrong artifact.

3) Resend loops in automation (especially with agents)

A flaky wait causes the harness to click “Resend code”, which sends another email, which increases ambiguity, which triggers more resends. This can spiral into a bot loop.

4) Brittle parsing

Scraping HTML anchors or relying on a single regex for OTP extraction makes retries worse: a template tweak breaks extraction, and a harness that responds to any failure by resending turns a parsing bug into more emails.

The retry-safe contract: four invariants

Retry-safe sign up verification is mostly about enforcing a small set of invariants that stay true even if the test framework retries.

Invariant	What it means in practice	What it prevents
Isolation	Create a fresh, disposable inbox per attempt	Collisions across workers and retries
Deterministic waiting	Wait with a deadline, webhook-first when available, polling as fallback	Fixed sleeps, random timing failures
Strong correlation	Match only messages intended for this attempt	“Latest email wins” bugs
Idempotent consumption	Process the verification artifact exactly once (or safely multiple times)	Double-clicking links, double-submits, duplicate processing

A good mental model is: your test should behave correctly if it is restarted at any line.

A reference “retry-safe” flow for verification emails

Here is a reference flow that works whether you are using a disposable inbox API, your own inbound pipeline, or a provider.

Figure: retry-safe sign up verification flow. Create disposable inbox (attempt_id), trigger sign up, wait for email (webhook first, polling fallback), select matching message, extract OTP or verification link, submit verification, store JSON artifact, expire inbox.

Step 1: Create an inbox per attempt (not per suite)

The unit of isolation should be the attempt, not the entire test file. In Playwright, for example, retries can re-run a single test. Treat each retry as a new attempt with a new inbox.

If your framework supports retries, configure them explicitly and design for it. (Playwright docs: test retries.)

Practical rule:

New attempt = new inbox = new email address

This one rule eliminates most collision bugs.
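
One way to make the rule concrete is to derive the attempt identity from the test plus its retry index, and create the inbox from that. The sketch below is illustrative only: `attemptIdFor`, `InboxClient`, and the in-memory `makeFakeClient` are hypothetical names, not any provider's real API, and the address format is an assumption.

```typescript
// Hypothetical shape of a disposable inbox; field names are assumptions.
type AttemptInbox = { inbox_id: string; email: string; attempt_id: string };

// Hypothetical client interface: swap in your provider's real call.
interface InboxClient {
  createInbox(metadata: { attempt_id: string }): AttemptInbox;
}

// Build an attempt id that changes on every retry of the same test.
function attemptIdFor(testId: string, retryIndex: number): string {
  return `${testId}-r${retryIndex}`;
}

// In-memory stand-in so the sketch runs without a network.
function makeFakeClient(domain: string): InboxClient {
  let counter = 0;
  return {
    createInbox(metadata) {
      counter += 1;
      return {
        inbox_id: `inbox_${counter}`,
        email: `${metadata.attempt_id}.${counter}@${domain}`,
        attempt_id: metadata.attempt_id,
      };
    },
  };
}

const client = makeFakeClient("inbox.example");
const first = client.createInbox({ attempt_id: attemptIdFor("signup-verify", 0) });
const retry = client.createInbox({ attempt_id: attemptIdFor("signup-verify", 1) });
console.log(first.email); // signup-verify-r0.1@inbox.example
console.log(retry.email); // signup-verify-r1.2@inbox.example
```

In Playwright, the retry index is exposed as `testInfo.retry` (zero on the first run), so a fixture can pass it straight into something like `attemptIdFor`.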

Step 2: Trigger sign up using that address

Use the generated email address when submitting your sign up form. Optionally add correlation on your side:

Put an attempt_id in the local-part if you control addressing.
Add an X-Correlation-Id header if you control the sending service.
Include an attempt token in the subject line if headers are not accessible.

Do not rely on “To:” matching alone if your flow can send multiple message types (welcome email, marketing email, verification email) to the same address.
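
As a rough illustration of correlation beyond the recipient address, the matcher below accepts a message only if it is addressed to this attempt and carries the attempt id, either in an `X-Correlation-Id` header or in the subject. The `InboundMessage` shape and the lower-cased header map are assumptions; adapt them to what your provider actually returns.

```typescript
// Hypothetical parsed-message shape; header names are assumed to be lower-cased.
type InboundMessage = {
  to: string;
  subject: string;
  headers: Record<string, string | undefined>;
};

type AttemptExpectation = { address: string; attemptId: string };

// A message belongs to this attempt only if the recipient matches AND it
// carries the attempt id in a correlation header or, failing that, the subject.
function belongsToAttempt(msg: InboundMessage, expected: AttemptExpectation): boolean {
  if (msg.to.toLowerCase() !== expected.address.toLowerCase()) return false;
  const header = msg.headers["x-correlation-id"];
  if (header !== undefined) return header === expected.attemptId;
  return msg.subject.includes(expected.attemptId);
}

const expectation: AttemptExpectation = {
  address: "signup-verify-r1@inbox.example",
  attemptId: "signup-verify-r1",
};

const verification: InboundMessage = {
  to: "signup-verify-r1@inbox.example",
  subject: "Verify your email",
  headers: { "x-correlation-id": "signup-verify-r1" },
};

const welcome: InboundMessage = {
  to: "signup-verify-r1@inbox.example",
  subject: "Welcome aboard!",
  headers: {},
};

console.log(belongsToAttempt(verification, expectation)); // true
console.log(belongsToAttempt(welcome, expectation)); // false
```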

Step 3: Wait deterministically (deadline-based, not sleep-based)

Use a deadline (for example 60 to 120 seconds) and a loop that waits for:

A matching verification email
With the expected sender
With a received timestamp after the inbox was created

If you have webhooks, they should be the fast path. Polling is the safety net.
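
A minimal sketch of the deadline-based wait, assuming a hypothetical `fetchMessages` function that returns whatever is currently in the inbox. The clock and sleep are injectable so the wait can be tested without real delays; all names here are illustrative.

```typescript
type ReceivedMessage = { id: string; from: string; received_at: string };

type WaitOptions = {
  fetchMessages: () => Promise<ReceivedMessage[]>; // hypothetical inbox read
  expectedSender: string;
  inboxCreatedAt: string; // ISO timestamp
  deadlineMs: number; // absolute deadline on the `now()` clock
  pollEveryMs: number;
  now?: () => number;
  sleep?: (ms: number) => Promise<void>;
};

async function waitForVerificationEmail(opts: WaitOptions): Promise<ReceivedMessage> {
  const now = opts.now ?? (() => Date.now());
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const createdAt = Date.parse(opts.inboxCreatedAt);

  while (true) {
    const messages = await opts.fetchMessages();
    // Accept only the expected sender, and only mail received after inbox creation.
    const match = messages.find(
      (m) => m.from === opts.expectedSender && Date.parse(m.received_at) >= createdAt,
    );
    if (match) return match;

    // Stop when another poll interval would not fit before the deadline.
    if (now() + opts.pollEveryMs > opts.deadlineMs) {
      throw new Error("Deadline exceeded waiting for verification email");
    }
    await sleep(opts.pollEveryMs);
  }
}
```

The important property is that the only way out of the loop is a matching message or an explicit deadline error, never "whatever arrived after N seconds".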

Step 4: Extract the minimal artifact, then verify

Your harness should extract only what it needs:

OTP code, or
A single verification URL

Avoid giving a raw HTML email body to an LLM agent. Treat inbound email as untrusted input.
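
Here is a sketch of what "minimal artifact" extraction can look like when working from the text/plain body. It assumes 6-digit OTPs and verification links under `/verify` on a single known host; both are placeholders for your own template, and `extractArtifact` is a hypothetical helper name.

```typescript
type VerificationArtifact =
  | { kind: "otp"; value: string }
  | { kind: "link"; value: string };

// Assumptions for illustration: 6-digit OTPs, https links without trailing punctuation.
const OTP_PATTERN = /\b(\d{6})\b/;
const LINK_PATTERN = /https:\/\/[^\s<>"']+/g;

function extractArtifact(textBody: string, allowedHost: string): VerificationArtifact | null {
  // Prefer a verification link on the expected host; ignore every other link.
  for (const candidate of textBody.match(LINK_PATTERN) ?? []) {
    try {
      const url = new URL(candidate);
      if (url.hostname === allowedHost && url.pathname.startsWith("/verify")) {
        return { kind: "link", value: url.toString() };
      }
    } catch {
      // Not a parseable URL; keep looking.
    }
  }
  // Otherwise fall back to the first 6-digit code in the body.
  const code = textBody.match(OTP_PATTERN)?.[1];
  return code ? { kind: "otp", value: code } : null;
}

console.log(extractArtifact("Your code is 482913. It expires in 10 minutes.", "yourapp.example"));
// { kind: 'otp', value: '482913' }
console.log(extractArtifact("Confirm here: https://yourapp.example/verify?token=abc\nThanks", "yourapp.example"));
// { kind: 'link', value: 'https://yourapp.example/verify?token=abc' }
```

Returning `null` instead of a best guess matters: it lets the caller fail fast on a template change rather than submit something that merely looks like a code.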

Step 5: Make consumption idempotent

Two separate idempotency layers matter:

Harness idempotency: Do not process the same message or artifact twice.
Application idempotency: Your verification endpoint should handle double-submits safely (common with retries and back button behavior).

Even if your app is perfect, your harness should still be defensive because duplicates are normal.
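
The two layers can be sketched together as a small "consume once" wrapper. This is an assumption-heavy illustration: `makeConsumeOnce` is a hypothetical helper, the in-memory `Set` stands in for whatever per-attempt store you use, and the `SubmitResult` values are an invented shape for your app's verification response.

```typescript
// Hypothetical result shape for your app's verification endpoint.
type SubmitResult = "verified" | "already_verified" | "rejected";
type ConsumeOutcome = "submitted" | "skipped_duplicate";

function makeConsumeOnce(submit: (artifact: string) => Promise<SubmitResult>) {
  const consumed = new Set<string>();

  return async function consume(artifact: string): Promise<ConsumeOutcome> {
    // Harness idempotency: never submit the same artifact twice.
    if (consumed.has(artifact)) return "skipped_duplicate";
    consumed.add(artifact); // claim before awaiting so concurrent duplicates are skipped

    let result: SubmitResult;
    try {
      result = await submit(artifact);
    } catch (err) {
      consumed.delete(artifact); // transport error: release the claim so a retry can proceed
      throw err;
    }

    // Application idempotency: "already_verified" counts as success, not failure.
    if (result === "rejected") throw new Error("Verification artifact was rejected");
    return "submitted";
  };
}
```

Treating `already_verified` as success is the design choice that keeps a retried attempt from failing just because an earlier attempt's submit actually went through.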

Dedupe keys: what to dedupe, and where

A common mistake is deduping on only one identifier. Retry-safe flows usually need multiple dedupe keys because duplicates can be introduced at multiple layers.

Layer	What can duplicate	Good dedupe key examples	Where to enforce
Delivery	Webhook delivery retries	`delivery_id` (provider) or signature timestamp + nonce	Webhook handler store (idempotent upsert)
Message	Same email stored or fetched twice	`message_id` (RFC Message-ID when available) or provider message UID	Message persistence and polling loop
Artifact	Same OTP or same verification link appears twice	`artifact_hash` (hash of OTP or URL)	Verification harness before submit
Attempt	Test retry runs the same scenario again	`attempt_id`	Test runner fixture + inbox-per-attempt

Design rule: dedupe as close to ingestion as possible, and also dedupe again right before consuming the artifact.
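
To show how the layers in the table fit together, here is a small in-memory sketch. `makeDedupeStore` and `artifactHash` are hypothetical helpers; a real harness would likely back the store with a database or cache keyed by attempt, and the key values below are made up.

```typescript
import { createHash } from "node:crypto";

type DedupeLayer = "delivery" | "message" | "artifact";

// Hash the artifact so raw OTPs and tokens are not stored as dedupe keys.
function artifactHash(artifact: string): string {
  return createHash("sha256").update(artifact, "utf8").digest("hex");
}

function makeDedupeStore() {
  const seen: Record<DedupeLayer, Set<string>> = {
    delivery: new Set<string>(),
    message: new Set<string>(),
    artifact: new Set<string>(),
  };
  return {
    // Returns true the first time a key is seen on a layer, false afterwards.
    firstSeen(layer: DedupeLayer, key: string): boolean {
      if (seen[layer].has(key)) return false;
      seen[layer].add(key);
      return true;
    },
  };
}

const store = makeDedupeStore();
console.log(store.firstSeen("delivery", "dlv_1")); // true
console.log(store.firstSeen("delivery", "dlv_1")); // false: webhook retry
console.log(store.firstSeen("message", "<a@yourapp.example>")); // true
console.log(store.firstSeen("message", "<b@yourapp.example>")); // true: a different message
console.log(store.firstSeen("artifact", artifactHash("482913"))); // true
console.log(store.firstSeen("artifact", artifactHash("482913"))); // false: same OTP in a second email
```

Note how the second message passes message-level dedupe but would still be stopped at the artifact layer if it carries the same OTP, which is why one key is rarely enough.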

Resend logic that does not create bot loops

Retry safety is not just about reading email. It is also about what you do when you do not see it.

A safe resend policy:

Resend only when the wait hits a meaningful checkpoint (for example, 30 seconds with no matching message).
Use a hard budget (for example, max 1 or 2 resends).
Never resend because parsing failed. Parsing failures should fail fast and store the email JSON as an artifact.

This prevents a common failure mode: a template change breaks parsing, the harness “fixes” it by resending, and you get multiple emails that make matching even harder.
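
The policy above is small enough to express as a single guard. The sketch below is one possible shape, with hypothetical names (`makeResendPolicy`, `claimResend`) and example numbers taken from the list: a 30-second checkpoint and a budget of one resend.

```typescript
type ResendReason = "no_message_yet" | "parse_failed";

function makeResendPolicy(opts: { checkpointMs: number; maxResends: number }) {
  let resendsUsed = 0;
  let lastCheckpointMs = 0; // elapsed time at the start, or at the last resend

  return {
    // Returns true (and spends budget) only when a resend is allowed right now.
    claimResend(elapsedMs: number, reason: ResendReason): boolean {
      if (reason === "parse_failed") return false; // parsing bugs never justify a resend
      if (resendsUsed >= opts.maxResends) return false; // hard budget
      if (elapsedMs - lastCheckpointMs < opts.checkpointMs) return false; // too early
      resendsUsed += 1;
      lastCheckpointMs = elapsedMs;
      return true;
    },
  };
}

const policy = makeResendPolicy({ checkpointMs: 30_000, maxResends: 1 });
console.log(policy.claimResend(10_000, "no_message_yet")); // false: checkpoint not reached
console.log(policy.claimResend(30_000, "parse_failed")); // false: never resend on parse failure
console.log(policy.claimResend(30_000, "no_message_yet")); // true: first and only resend
console.log(policy.claimResend(70_000, "no_message_yet")); // false: budget exhausted
```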

A minimal retry-safe harness (pseudocode)

Below is a provider-agnostic sketch. It assumes you can:

Create a disposable inbox
Wait for a matching message (webhook-first is ideal, polling fallback is fine)
Receive emails as structured JSON

type EmailWithInbox = {
  inbox_id: string;
  email: string;
  created_at: string; // ISO
  expires_at?: string; // ISO
};

type EmailMessage = {
  message_id?: string;
  received_at: string; // ISO
  from?: { address?: string };
  subject?: string;
  text?: string;
  html?: string;
};

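// Placeholder helpers: these stand in for your provider client and app driver.
// They are declared (not implemented) so the sketch type-checks.
declare function createInbox(opts: { metadata?: Record<string, string> }): Promise<EmailWithInbox>;
declare function triggerSignup(opts: { email: string }): Promise<void>;
declare function waitForNextMessage(opts: {
  inbox_id: string;
  timeout_ms: number;
  match: { subject_includes?: string; from_domain?: string };
}): Promise<EmailMessage | null>;
declare function clickResendVerificationEmail(): Promise<void>;
declare function extractVerificationArtifact(text: string, subject: string): string | null;
declare function submitVerificationArtifact(artifact: string): Promise<void>;
declare function sha256(input: string): string;

async function runSignupVerificationAttempt(attemptId: string): Promise<void> {
  const inbox: EmailWithInbox = await createInbox({
    // pass metadata if your provider supports it
    metadata: { attempt_id: attemptId }
  });

  await triggerSignup({ email: inbox.email });

  const deadlineMs = Date.now() + 90_000;
  let resendBudget = 1;

  const seenMessageIds = new Set<string>();
  const seenArtifactHashes = new Set<string>();

  while (Date.now() < deadlineMs) {
    // Blocks for up to timeout_ms, so the loop does not spin.
    const msg: EmailMessage | null = await waitForNextMessage({
      inbox_id: inbox.inbox_id,
      timeout_ms: 10_000,
      match: {
        // keep matchers narrow
        subject_includes: "Verify",
        from_domain: "yourapp.example"
      }
    });

    if (!msg) {
      if (resendBudget > 0 && Date.now() + 30_000 > deadlineMs) {
        // optional: resend at most once, and only in the last 30 seconds before the deadline
        await clickResendVerificationEmail();
        resendBudget -= 1;
      }
      continue;
    }

    // Ignore anything received before this attempt's inbox existed.
    if (Date.parse(msg.received_at) < Date.parse(inbox.created_at)) continue;

    // Message-level dedupe; fall back to a composite key when Message-ID is missing.
    const msgKey = msg.message_id ?? `${msg.received_at}:${msg.subject ?? ""}`;
    if (seenMessageIds.has(msgKey)) continue;
    seenMessageIds.add(msgKey);

    const artifact = extractVerificationArtifact(msg.text ?? "", msg.subject ?? "");
    // Parsing failures fail fast; they never trigger a resend.
    if (artifact === null) throw new Error("Could not extract a verification artifact");

    // Artifact-level dedupe right before consuming.
    const artifactHash = sha256(artifact);
    if (seenArtifactHashes.has(artifactHash)) continue;
    seenArtifactHashes.add(artifactHash);

    await submitVerificationArtifact(artifact);
    return;
  }

  throw new Error("Timed out waiting for verification email");
}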

Notes:

This loop is retry-safe because it tolerates duplicates, avoids “latest email”, and limits resends.
The matcher should be as narrow as your product allows. If you can add a correlation header, do it.

Webhook-first is ideal, but polling can still be retry-safe

Webhooks reduce latency and avoid expensive polling loops, but they must be designed for retries:

Verify authenticity (signatures) before processing.
Make the webhook handler idempotent.
Acknowledge quickly, process async if needed.
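
As a sketch of those three points, the handler below checks an HMAC-SHA256 signature over the raw body, dedupes on a delivery id, and returns immediately. The signing scheme, the `delivery_id` field, and the function names are assumptions for illustration; use the scheme your provider actually documents.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme: hex-encoded HMAC-SHA256 of the raw request body with a shared secret.
function signBody(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

function isAuthentic(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = Buffer.from(signBody(rawBody, secret), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual requires equal lengths, so check that first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

type WebhookOutcome = "rejected" | "duplicate" | "accepted";

// In-memory stand-in for an idempotent upsert keyed by delivery id.
const processedDeliveries = new Set<string>();

function handleWebhook(rawBody: string, signatureHex: string, secret: string): WebhookOutcome {
  // 1. Verify authenticity before parsing or trusting anything in the body.
  if (!isAuthentic(rawBody, signatureHex, secret)) return "rejected";

  // 2. Idempotency: a retried delivery is acknowledged but not processed again.
  const payload = JSON.parse(rawBody) as { delivery_id: string };
  if (processedDeliveries.has(payload.delivery_id)) return "duplicate";
  processedDeliveries.add(payload.delivery_id);

  // 3. Acknowledge quickly: enqueue real work here instead of doing it inline.
  return "accepted";
}
```

A duplicate should still get a success response at the HTTP layer; otherwise the sender keeps retrying a delivery you have already handled.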

If you poll, do it deterministically:

Use a cursor or “seen IDs” strategy.
Use exponential backoff.
Enforce an overall deadline.
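
A compact sketch of those three rules in one loop, assuming a hypothetical `fetchPage(cursor)` call that returns only messages newer than the cursor. The clock and sleep are injected so the backoff schedule is easy to test; none of these names come from a real API.

```typescript
type PolledMessage = { id: string; subject: string };
type Page = { messages: PolledMessage[]; nextCursor: string | null };

type PollOptions = {
  fetchPage: (cursor: string | null) => Promise<Page>; // hypothetical list endpoint
  isMatch: (m: PolledMessage) => boolean;
  deadlineMs: number; // absolute deadline on the `now()` clock
  initialDelayMs: number;
  maxDelayMs: number;
  now: () => number;
  sleep: (ms: number) => Promise<void>;
};

async function pollForMatch(opts: PollOptions): Promise<PolledMessage | null> {
  let cursor: string | null = null;
  let delayMs = opts.initialDelayMs;

  while (opts.now() < opts.deadlineMs) {
    const page: Page = await opts.fetchPage(cursor);
    const match = page.messages.find(opts.isMatch);
    if (match) return match;

    // Advance the cursor so non-matching messages are never re-read.
    if (page.nextCursor !== null) cursor = page.nextCursor;

    // Exponential backoff, capped, and never sleeping past the overall deadline.
    const remainingMs = opts.deadlineMs - opts.now();
    if (remainingMs <= 0) break;
    await opts.sleep(Math.min(delayMs, remainingMs));
    delayMs = Math.min(delayMs * 2, opts.maxDelayMs);
  }
  return null;
}
```

With a 500 ms initial delay, a 4-second cap, and a 10-second deadline, an empty inbox produces sleeps of 500, 1000, 2000, 4000, and finally 2500 ms before the function gives up.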

If you want a deeper polling design, Mailhook’s engineering guidance on cursors, timeouts, and dedupe is a good reference: Pull Email with Polling: Cursors, Timeouts, and Dedupe.

Security guardrails (especially for LLM agents)

Retry safety often breaks when teams add agents that can “helpfully” take extra actions. Keep the tool interface constrained:

Prefer text/plain extraction over rendering HTML.
Validate verification URLs before visiting (allowlist hostname, block open redirects, avoid SSRF).
Treat inbound email as hostile input, including prompt injection attempts.
Verify webhook payload signatures when receiving email as JSON over HTTP.
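
For the URL guardrail specifically, a validation step might look like the sketch below. It is deliberately strict and makes assumptions you should tune: an exact-match hostname allowlist, HTTPS only, no embedded credentials, and a made-up list of redirect-style query parameters to refuse. It does not replace network-level SSRF protections.

```typescript
// Assumed list of query parameters that commonly carry a redirect target.
const REDIRECT_PARAMS = ["redirect", "redirect_uri", "next", "return_to", "url"];

function parseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null;
  }
}

function isSafeVerificationUrl(raw: string, allowedHosts: ReadonlySet<string>): boolean {
  const url = parseUrl(raw);
  if (url === null) return false;
  if (url.protocol !== "https:") return false;
  // Reject user:pass@host tricks that make a URL look like it targets an allowed host.
  if (url.username !== "" || url.password !== "") return false;
  // Exact hostname match: suffix or substring checks are easy to bypass.
  if (!allowedHosts.has(url.hostname)) return false;
  // Refuse links that could bounce the agent to an arbitrary destination.
  return !REDIRECT_PARAMS.some((name) => url.searchParams.has(name));
}

const allowed = new Set(["yourapp.example"]);
console.log(isSafeVerificationUrl("https://yourapp.example/verify?token=abc", allowed)); // true
console.log(isSafeVerificationUrl("http://yourapp.example/verify?token=abc", allowed)); // false
console.log(isSafeVerificationUrl("https://yourapp.example@evil.test/verify", allowed)); // false
console.log(isSafeVerificationUrl("https://yourapp.example/verify?next=https://evil.test", allowed)); // false
```

Running this check inside the tool the agent calls, rather than asking the agent to apply it, keeps the decision out of reach of anything written in the email body.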

For webhook authenticity in particular, it is worth separating “email authenticity” (DKIM) from “webhook payload authenticity” (signature over the raw request body). Mailhook covers this threat model here: Email Signed By: Verify Webhook Payload Authenticity.

How Mailhook helps make sign up verification retry-safe

Mailhook is built around the primitives that make retries boring:

Create disposable inboxes via API
Receive emails as structured JSON
Get real-time webhook notifications (with signed payloads)
Use a polling API as fallback
Choose shared domains for quick starts, or custom domains for control

If you are implementing the patterns in this post with Mailhook, start with the canonical integration reference: mailhook.co/llms.txt.

Frequently Asked Questions

What does “retry-safe” mean for sign up verification emails? It means your test can be retried (by the runner, CI, or your harness) without consuming the wrong email, double-verifying, or entering resend loops.

Should I create one inbox per test run or one inbox per retry attempt? For sign up verification, prefer one inbox per attempt. Retries are precisely where inbox collisions happen.

How do I handle duplicate verification emails? Deduplicate at multiple layers: delivery (webhook retries), message (same message read twice), and artifact (same OTP or link) before submitting.

Is polling acceptable in CI, or do I need webhooks? Polling is acceptable if it is deadline-based, deduped, and cursor-aware. Webhook-first plus polling fallback is typically the most reliable.

How can I keep LLM agents from doing unsafe things with email content? Give agents a constrained tool that returns only a minimal artifact (OTP or a validated URL), avoid raw HTML, and verify webhook signatures before processing.

Build retry-safe verification email tests with Mailhook

If your sign up verification tests are flaky because of shared inboxes, duplicates, or retries, Mailhook provides the core building blocks to make the email step deterministic: disposable inboxes via API, email-as-JSON, webhook notifications (with signed payloads), and polling fallback.

Get the exact integration contract here: Mailhook llms.txt, then start at mailhook.co when you are ready to wire it into your CI or agent toolchain.