Email-based sign up verification tests tend to fail in the most annoying way: they are “mostly fine” until CI runs in parallel, a test retries, or your app resends a code and your harness accidentally consumes the wrong message.
To make these tests retry-safe, you need to assume two things are true:
- Delivery is eventually consistent (the email arrives late sometimes).
- Delivery and notifications are often at-least-once (duplicates happen, and retries are normal).
The goal is not to eliminate retries. It is to design your verification step so retries cannot corrupt the outcome.
This article focuses on sign up verification emails (OTP codes and verification links) and shows a practical harness design you can drop into CI and LLM-agent workflows.
If you are integrating Mailhook specifically, the canonical, machine-readable contract is here: mailhook.co/llms.txt.
Why sign up verification email tests are not retry-safe by default
Most “email verification” E2E tests start with a shared mailbox and a fixed sleep:
- Generate an email address (often not truly isolated).
- Trigger sign up.
- `sleep(10s)`.
- Fetch “latest email” from a mailbox UI or IMAP.
- Scrape HTML to find the link or OTP.
This fails under retries for a few predictable reasons:
1) Inbox collisions across parallel tests
If two attempts share the same inbox (or a catch-all with weak filtering), a retried test can pick up a message from a previous attempt, or from another worker.
2) Duplicate sends and duplicate notifications
Duplicates can originate from:
- Your app retrying the email send job.
- SMTP-level retries and delayed deliveries.
- Webhook delivery retries (your endpoint returned a transient error).
- Polling loops that re-read the same message.
If your harness treats “any message found” as success, retries can validate the wrong artifact.
3) Resend loops in automation (especially with agents)
A flaky wait causes the harness to click “Resend code”, which sends another email, which increases ambiguity, which triggers more resends. This can spiral into a bot loop.
4) Brittle parsing
Scraping HTML anchors or relying on a single regex for OTP extraction makes retries worse, because a template tweak turns into “resend” behavior.
The retry-safe contract: four invariants
Retry-safe sign up verification is mostly about enforcing a small set of invariants that stay true even if the test framework retries.
| Invariant | What it means in practice | What it prevents |
|---|---|---|
| Isolation | Create a fresh, disposable inbox per attempt | Collisions across workers and retries |
| Deterministic waiting | Wait with a deadline, webhook-first when available, polling as fallback | Fixed sleeps, random timing failures |
| Strong correlation | Match only messages intended for this attempt | “Latest email wins” bugs |
| Idempotent consumption | Process the verification artifact exactly once (or safely multiple times) | Double-clicking links, double-submits, duplicate processing |
A good mental model is: your test should behave correctly if it is restarted at any line.
A reference “retry-safe” flow for verification emails
Here is a reference flow that works whether you are using a disposable inbox API, your own inbound pipeline, or a provider.

Step 1: Create an inbox per attempt (not per suite)
The unit of isolation should be the attempt, not the entire test file. In Playwright, for example, retries can re-run a single test. Treat each retry as a new attempt with a new inbox.
If your framework supports retries, configure them explicitly and design for them. (Playwright docs: test retries.)
Practical rule:
- New attempt = new inbox = new email address
This one rule eliminates most collision bugs.
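One way to apply the rule is to derive an attempt key from the test identity plus the retry index, and to create the inbox inside the attempt rather than in suite-level setup. The following is a minimal sketch, not a definitive implementation: `makeAttemptId`, `inboxForAttempt`, and the `Inbox` shape are hypothetical names, and `createInbox` is an assumed client function that the caller passes in. The retry index would come from the runner (in Playwright, `testInfo.retry`).

```typescript
// Hypothetical inbox shape; adapt it to whatever your provider returns.
type Inbox = { inbox_id: string; email: string };

// Build a key that is unique per (test, retry) pair.
// `retryIndex` comes from the test runner, e.g. Playwright's testInfo.retry.
function makeAttemptId(testId: string, retryIndex: number): string {
  const slug = testId
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
  return `${slug}-r${retryIndex}`;
}

// Create the inbox inside the attempt, never in suite-level setup.
// `createInbox` is an assumed client function supplied by the caller.
async function inboxForAttempt(
  createInbox: (meta: { attempt_id: string }) => Promise<Inbox>,
  testId: string,
  retryIndex: number
): Promise<{ attemptId: string; inbox: Inbox }> {
  const attemptId = makeAttemptId(testId, retryIndex);
  const inbox = await createInbox({ attempt_id: attemptId });
  return { attemptId, inbox };
}
```

If several CI runs can execute the same test at the same time, adding a run identifier to the key keeps attempt IDs distinct across runs as well.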
Step 2: Trigger sign up using that address
Use the generated email address when submitting your sign up form. Optionally add correlation on your side:
- Put an `attempt_id` in the local-part if you control addressing.
- Add an `X-Correlation-Id` header if you control the sending service.
- Include an attempt token in the subject line if headers are not accessible.
Do not rely on “To:” matching alone if your flow can send multiple message types (welcome email, marketing email, verification email) to the same address.
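The three correlation options can be checked in order of strength. This sketch makes several assumptions that are not part of any provider contract: headers arrive as a map with lowercase keys, the local-part uses plus-addressing (`signup+<attempt_id>@...`), and the subject token is wrapped in square brackets. `InboundMessage` and `belongsToAttempt` are illustrative names.

```typescript
// Assumed message shape: headers as a lowercase-keyed map, plus to/subject.
type InboundMessage = {
  to: string;
  subject?: string;
  headers?: Record<string, string>;
};

// Returns true only if the message carries this attempt's correlation token.
function belongsToAttempt(msg: InboundMessage, attemptId: string): boolean {
  // 1. Strongest signal: a header set by the sending service.
  const header = msg.headers?.["x-correlation-id"];
  if (header !== undefined) return header === attemptId;

  // 2. Plus-addressed local-part, e.g. signup+<attemptId>@example.test
  const localPart = msg.to.split("@")[0] ?? "";
  const plusAt = localPart.indexOf("+");
  if (plusAt !== -1) return localPart.slice(plusAt + 1) === attemptId;

  // 3. Fallback: a bracketed token in the subject line.
  return (msg.subject ?? "").includes(`[${attemptId}]`);
}
```

Correlation only says which attempt a message belongs to. Telling a verification email apart from a welcome email still needs a sender or subject check.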
Step 3: Wait deterministically (deadline-based, not sleep-based)
Use a deadline (for example 60 to 120 seconds) and a loop that waits for:
- A matching verification email
- With the expected sender
- With a received timestamp after the inbox was created
If you have webhooks, they should be the fast path. Polling is the safety net.
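The three matching conditions and the deadline can be expressed as two small pure functions, which keeps the waiting loop itself trivial. This is a sketch under assumptions: `ReceivedMessage`, `isCandidate`, and `nextTimeoutMs` are hypothetical names, and timestamps are assumed to be ISO 8601 strings.

```typescript
type ReceivedMessage = {
  received_at: string; // ISO 8601
  from?: { address?: string };
  subject?: string;
};

// True when the message satisfies all three conditions: it looks like the
// verification email, comes from the expected sender, and arrived after
// the inbox was created.
function isCandidate(
  msg: ReceivedMessage,
  opts: { inboxCreatedAt: string; expectedSender: string; subjectIncludes: string }
): boolean {
  const sender = (msg.from?.address ?? "").toLowerCase();
  // An unparseable timestamp yields NaN, which makes this comparison false.
  const fresh = Date.parse(msg.received_at) >= Date.parse(opts.inboxCreatedAt);
  return (
    fresh &&
    sender === opts.expectedSender.toLowerCase() &&
    (msg.subject ?? "").includes(opts.subjectIncludes)
  );
}

// Clamp each wait slice so the loop never overshoots the overall deadline.
function nextTimeoutMs(deadlineMs: number, nowMs: number, sliceMs = 10_000): number {
  return Math.max(0, Math.min(sliceMs, deadlineMs - nowMs));
}
```

A webhook handler and a polling loop can both call `isCandidate`, so the fast path and the fallback apply the same acceptance rule.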
Step 4: Extract the minimal artifact, then verify
Your harness should extract only what it needs:
- OTP code, or
- A single verification URL
Avoid giving a raw HTML email body to an LLM agent. Treat inbound email as untrusted input.
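A minimal extraction sketch might look like the following. It assumes 6-digit OTPs and verification links under `https://yourapp.example/verify`; both are placeholders to adjust for your product. `Artifact`, `onlyOne`, and `extractArtifactFromText` are hypothetical names. The function reads only the `text/plain` body and throws on anything ambiguous rather than guessing.

```typescript
type Artifact =
  | { kind: "otp"; value: string }
  | { kind: "link"; value: string };

// Returns the single distinct value, null if there is none, and throws if
// the email contains more than one distinct candidate.
function onlyOne(values: string[], label: string): string | null {
  const unique = Array.from(new Set(values));
  if (unique.length > 1) {
    throw new Error(`Ambiguous email: multiple ${label} candidates`);
  }
  return unique[0] ?? null;
}

// Extract exactly one artifact from the plain-text body, or throw.
// Assumptions: links live under https://yourapp.example/verify and OTPs
// are exactly 6 digits.
function extractArtifactFromText(text: string): Artifact {
  // Links are checked first because a link token may itself contain digits.
  const links = text.match(/https:\/\/yourapp\.example\/verify\?[^\s<>"']+/g) ?? [];
  const link = onlyOne(links, "link");
  if (link !== null) return { kind: "link", value: link };

  const otp = onlyOne(text.match(/\b\d{6}\b/g) ?? [], "OTP");
  if (otp !== null) return { kind: "otp", value: otp };

  throw new Error("No verification artifact found");
}
```

Returning a small tagged value like this means an agent or test step never has to see the rest of the email.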
Step 5: Make consumption idempotent
Two separate idempotency layers matter:
- Harness idempotency: Do not process the same message or artifact twice.
- Application idempotency: Your verification endpoint should handle double-submits safely (common with retries and back button behavior).
Even if your app is perfect, your harness should still be defensive because duplicates are normal.
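On the application side, idempotency usually means that a second submit of the same token is a harmless no-op instead of an error or a second state change. The sketch below uses an in-memory map as a stand-in for the application's token table. `VerificationStore` and its methods are illustrative names, not a real API.

```typescript
type VerifyResult = "verified" | "already_verified" | "invalid";

// In-memory stand-in for the application's token table (illustrative only).
class VerificationStore {
  private pending = new Map<string, string>(); // token -> userId
  private consumed = new Map<string, string>(); // token -> userId

  issue(token: string, userId: string): void {
    this.pending.set(token, userId);
  }

  // Safe to call twice with the same token: the second call changes nothing.
  verify(token: string): VerifyResult {
    if (this.consumed.has(token)) return "already_verified";
    const userId = this.pending.get(token);
    if (userId === undefined) return "invalid";
    this.pending.delete(token);
    this.consumed.set(token, userId);
    return "verified";
  }
}
```

A test can then treat both `"verified"` and `"already_verified"` as success, which is what makes a retried submit safe.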
Dedupe keys: what to dedupe, and where
A common mistake is deduping on only one identifier. Retry-safe flows usually need multiple dedupe keys because duplicates can be introduced at multiple layers.
| Layer | What can duplicate | Good dedupe key examples | Where to enforce |
|---|---|---|---|
| Delivery | Webhook delivery retries | `delivery_id` (provider) or signature timestamp + nonce | Webhook handler store (idempotent upsert) |
| Message | Same email stored or fetched twice | `message_id` (RFC Message-ID when available) or provider message UID | Message persistence and polling loop |
| Artifact | Same OTP or same verification link appears twice | `artifact_hash` (hash of OTP or URL) | Verification harness before submit |
| Attempt | Test retry runs the same scenario again | `attempt_id` | Test runner fixture + inbox-per-attempt |
Design rule: dedupe as close to ingestion as possible, and also dedupe again right before consuming the artifact.
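A single store with one namespace per layer is enough to enforce the first three rows of the table in a test harness. The attempt layer is handled by inbox-per-attempt rather than by a key lookup. This is a sketch with hypothetical names (`DedupeStore`, `artifactHash`), using an in-memory set where a real pipeline would likely use a database upsert.

```typescript
import { createHash } from "node:crypto";

type Layer = "delivery" | "message" | "artifact";

// Hash the artifact so raw OTPs and tokens never end up in logs or stores.
function artifactHash(artifact: string): string {
  return createHash("sha256").update(artifact, "utf8").digest("hex");
}

// One namespace per layer so keys from different layers cannot collide.
class DedupeStore {
  private seen = new Set<string>();

  // Returns true the first time a (layer, key) pair is offered, false after.
  firstTime(layer: Layer, key: string): boolean {
    const scoped = `${layer}:${key}`;
    if (this.seen.has(scoped)) return false;
    this.seen.add(scoped);
    return true;
  }
}
```

In use, the webhook handler would call `firstTime("delivery", ...)` at ingestion, and the harness would call `firstTime("artifact", artifactHash(...))` immediately before submitting.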
Resend logic that does not create bot loops
Retry safety is not just about reading email. It is also about what you do when you do not see it.
A safe resend policy:
- Resend only when the wait hits a meaningful checkpoint (for example, 30 seconds with no matching message).
- Use a hard budget (for example, max 1 or 2 resends).
- Never resend because parsing failed. Parsing failures should fail fast and store the email JSON as an artifact.
This prevents a common failure mode: a template change breaks parsing, the harness “fixes” it by resending, and you get multiple emails that make matching even harder.
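The policy fits in one pure decision function, which makes it easy to unit test separately from the email plumbing. The names (`WaitState`, `decideNextStep`) and constants below are illustrative. The 30-second checkpoint and the budget of one resend are taken from the examples above, not from any fixed standard.

```typescript
type WaitState = {
  elapsedMs: number; // time since sign up was triggered
  resendsUsed: number; // resends already issued in this attempt
  lastOutcome: "no_message" | "parse_failed";
};

type Decision = "keep_waiting" | "resend" | "fail_fast";

const RESEND_CHECKPOINT_MS = 30_000;
const MAX_RESENDS = 1;

function decideNextStep(state: WaitState): Decision {
  // A parsing failure is a test failure, never a reason to resend.
  if (state.lastOutcome === "parse_failed") return "fail_fast";
  // Hard budget: once it is spent, only the overall deadline can end the wait.
  if (state.resendsUsed >= MAX_RESENDS) return "keep_waiting";
  // Each resend gets its own checkpoint window.
  const nextCheckpoint = RESEND_CHECKPOINT_MS * (state.resendsUsed + 1);
  return state.elapsedMs >= nextCheckpoint ? "resend" : "keep_waiting";
}
```

Because the function has no side effects, an agent-driven harness can be restricted to calling it rather than deciding on its own when to click "Resend code".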
A minimal retry-safe harness (pseudocode)
Below is a provider-agnostic sketch. It assumes you can:
- Create a disposable inbox
- Wait for a matching message (webhook-first is ideal, polling fallback is fine)
- Receive emails as structured JSON
```typescript
// Provider-agnostic sketch. createInbox, triggerSignup, waitForNextMessage,
// clickResendVerificationEmail, extractVerificationArtifact, sha256, and
// submitVerificationArtifact are helpers you supply for your own stack.

type EmailWithInbox = {
  inbox_id: string;
  email: string;
  created_at: string; // ISO
  expires_at?: string; // ISO
};

type EmailMessage = {
  message_id?: string;
  received_at: string; // ISO
  from?: { address?: string };
  subject?: string;
  text?: string;
  html?: string;
};

async function runSignupVerificationAttempt(attemptId: string) {
  // Isolation: one fresh inbox per attempt.
  const inbox: EmailWithInbox = await createInbox({
    // pass metadata if your provider supports it
    metadata: { attempt_id: attemptId }
  });

  await triggerSignup({ email: inbox.email });

  const deadlineMs = Date.now() + 90_000;
  let resendBudget = 1;
  const seenMessageIds = new Set<string>();
  const seenArtifactHashes = new Set<string>();

  while (Date.now() < deadlineMs) {
    const msg: EmailMessage | null = await waitForNextMessage({
      inbox_id: inbox.inbox_id,
      // never wait past the overall deadline
      timeout_ms: Math.max(0, Math.min(10_000, deadlineMs - Date.now())),
      match: {
        // keep matchers narrow
        subject_includes: "Verify",
        from_domain: "yourapp.example"
      }
    });

    if (!msg) {
      if (resendBudget > 0 && Date.now() + 30_000 > deadlineMs) {
        // optional: resend only once, and only when less than 30 s remain
        await clickResendVerificationEmail();
        resendBudget -= 1;
      }
      continue;
    }

    // Message-level dedupe: fall back to a composite key if there is no Message-ID.
    const msgKey = msg.message_id ?? `${msg.received_at}:${msg.subject ?? ""}`;
    if (seenMessageIds.has(msgKey)) continue;
    seenMessageIds.add(msgKey);

    // The extractor should throw on a parse failure so the attempt fails fast
    // instead of falling through to a resend.
    const artifact = extractVerificationArtifact(msg.text ?? "", msg.subject ?? "");

    // Artifact-level dedupe: never submit the same OTP or link twice.
    const artifactHash = sha256(artifact);
    if (seenArtifactHashes.has(artifactHash)) continue;
    seenArtifactHashes.add(artifactHash);

    await submitVerificationArtifact(artifact);
    return;
  }

  throw new Error("Timed out waiting for verification email");
}
```
Notes:
- This loop is retry-safe because it tolerates duplicates, avoids “latest email”, and limits resends.
- The matcher should be as narrow as your product allows. If you can add a correlation header, do it.
Webhook-first is ideal, but polling can still be retry-safe
Webhooks reduce latency and avoid expensive polling loops, but they must be designed for retries:
- Verify authenticity (signatures) before processing.
- Make the webhook handler idempotent.
- Acknowledge quickly, process async if needed.
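The three webhook rules can be combined in a small handler. The signature scheme here is an assumption for illustration only: a hex-encoded HMAC-SHA256 of the raw request body, computed with a shared secret. Your provider's actual scheme and header name may differ, so check its documentation. `isAuthentic`, `handleWebhook`, and the in-memory `seenDeliveries` set are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: the provider sends hex(HMAC-SHA256(secret, rawBody)) in a header.
function isAuthentic(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const given = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}

type WebhookResult = { status: 200 | 401; processed: boolean };

// Idempotent handler: a redelivered event is acknowledged but not reprocessed.
function handleWebhook(
  rawBody: string,
  signatureHex: string,
  secret: string,
  seenDeliveries: Set<string>,
  enqueue: (payload: unknown) => void
): WebhookResult {
  // 1. Verify authenticity over the raw body before parsing anything.
  if (!isAuthentic(rawBody, signatureHex, secret)) {
    return { status: 401, processed: false };
  }
  // 2. Dedupe on the delivery identifier.
  const payload = JSON.parse(rawBody) as { delivery_id: string };
  if (seenDeliveries.has(payload.delivery_id)) {
    return { status: 200, processed: false };
  }
  seenDeliveries.add(payload.delivery_id);
  // 3. Hand off to async processing and acknowledge immediately.
  enqueue(payload);
  return { status: 200, processed: true };
}
```

Returning 200 for a duplicate matters: answering with an error would only trigger yet another retry of an event that was already handled.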
If you poll, do it deterministically:
- Use a cursor or “seen IDs” strategy.
- Use exponential backoff.
- Enforce an overall deadline.
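The polling rules above can be sketched as two helpers: one that precomputes a backoff schedule bounded by the overall deadline, and one that filters a fetched batch down to unseen messages. The names and default values (`backoffSchedule`, `takeUnseen`, 500 ms base, 8 s cap) are illustrative assumptions.

```typescript
// Exponential backoff delays, truncated so their sum never exceeds totalMs.
function backoffSchedule(
  totalMs: number,
  baseMs = 500,
  factor = 2,
  capMs = 8_000
): number[] {
  if (baseMs <= 0 || factor < 1 || capMs <= 0) {
    throw new Error("baseMs and capMs must be positive, factor must be >= 1");
  }
  const delays: number[] = [];
  let remaining = totalMs;
  let delay = baseMs;
  while (remaining > 0) {
    const step = Math.min(delay, capMs, remaining);
    delays.push(step);
    remaining -= step;
    delay *= factor;
  }
  return delays;
}

type Polled = { id: string; received_at: string };

// "Seen IDs" strategy: return only messages not seen before, and record them.
function takeUnseen(batch: Polled[], seenIds: Set<string>): Polled[] {
  const fresh: Polled[] = [];
  for (const m of batch) {
    if (seenIds.has(m.id)) continue; // also drops duplicates within one batch
    seenIds.add(m.id);
    fresh.push(m);
  }
  return fresh;
}
```

A polling loop would sleep for each delay in turn, fetch a batch, and pass it through `takeUnseen`. When the schedule is exhausted, the deadline has been reached and the attempt should fail.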
If you want a deeper polling design, Mailhook’s engineering guidance on cursors, timeouts, and dedupe is a good reference: Pull Email with Polling: Cursors, Timeouts, and Dedupe.
Security guardrails (especially for LLM agents)
Retry safety often breaks when teams add agents that can “helpfully” take extra actions. Keep the tool interface constrained:
- Prefer `text/plain` extraction over rendering HTML.
- Validate verification URLs before visiting (allowlist hostname, block open redirects, avoid SSRF).
- Treat inbound email as hostile input, including prompt injection attempts.
- Verify webhook payload signatures when receiving email as JSON over HTTP.
For webhook authenticity in particular, it is worth separating “email authenticity” (DKIM) from “webhook payload authenticity” (signature over the raw request body). Mailhook covers this threat model here: Email Signed By: Verify Webhook Payload Authenticity.
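For the URL guardrail, a strict validator that runs before any link is visited might look like the sketch below. The allowlisted hostname and the list of redirect parameter names are placeholders, and `validateVerificationUrl` is a hypothetical helper. Rejecting every redirect-style parameter may be stricter than your app needs.

```typescript
const ALLOWED_HOSTS = new Set(["yourapp.example"]); // placeholder hostname
// Query parameter names often used for redirects; extend for your app.
const REDIRECT_PARAMS = ["redirect", "redirect_uri", "next", "return_to", "url"];

// Returns the parsed URL if it passes every check, otherwise throws.
function validateVerificationUrl(raw: string): URL {
  let url: URL;
  try {
    url = new URL(raw);
  } catch (err) {
    throw new Error("Not a valid absolute URL");
  }
  if (url.protocol !== "https:") {
    throw new Error("Only https links are allowed");
  }
  // Exact hostname match; this also rejects user@host tricks and raw IPs.
  if (!ALLOWED_HOSTS.has(url.hostname)) {
    throw new Error(`Host not allowlisted: ${url.hostname}`);
  }
  if (url.username !== "" || url.password !== "") {
    throw new Error("Credentials in URL are not allowed");
  }
  for (const name of REDIRECT_PARAMS) {
    if (url.searchParams.has(name)) {
      throw new Error(`Redirect parameter not allowed: ${name}`);
    }
  }
  return url;
}
```

Exposing only the validated URL to an agent tool, and never the raw email body, keeps untrusted content out of the agent's decision loop.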
How Mailhook helps make sign up verification retry-safe
Mailhook is built around the primitives that make retries boring:
- Create disposable inboxes via API
- Receive emails as structured JSON
- Get real-time webhook notifications (with signed payloads)
- Use a polling API as fallback
- Choose shared domains for quick starts, or custom domains for control
If you are implementing the patterns in this post with Mailhook, start with the canonical integration reference: mailhook.co/llms.txt.
Frequently Asked Questions
What does “retry-safe” mean for sign up verification emails?
It means your test can be retried (by the runner, CI, or your harness) without consuming the wrong email, double-verifying, or entering resend loops.
Should I create one inbox per test run or one inbox per retry attempt?
For sign up verification, prefer one inbox per attempt. Retries are precisely where inbox collisions happen.
How do I handle duplicate verification emails?
Deduplicate at multiple layers: delivery (webhook retries), message (same message read twice), and artifact (same OTP or link) before submitting.
Is polling acceptable in CI, or do I need webhooks?
Polling is acceptable if it is deadline-based, deduped, and cursor-aware. Webhook-first plus polling fallback is typically the most reliable.
How can I keep LLM agents from doing unsafe things with email content?
Give agents a constrained tool that returns only a minimal artifact (OTP or a validated URL), avoid raw HTML, and verify webhook signatures before processing.
Build retry-safe verification email tests with Mailhook
If your sign up verification tests are flaky because of shared inboxes, duplicates, or retries, Mailhook provides the core building blocks to make the email step deterministic: disposable inboxes via API, email-as-JSON, webhook notifications (with signed payloads), and polling fallback.
Get the exact integration contract here: Mailhook llms.txt, then start at mailhook.co when you are ready to wire it into your CI or agent toolchain.