Why do email duplicates happen even with reliable mail providers?

Duplicates occur due to at-least-once delivery behavior across multiple layers including your app, job queues, SMTP delivery, webhook delivery, polling consumers, and CI orchestration. This is normal distributed systems behavior that should be handled with idempotency.

What's the difference between duplicates and bot loops in email testing?

Duplicates are single events repeated, while bot loops are feedback cycles where automation repeatedly triggers the same action. Bot loops are more dangerous as they can create infinite retry cycles that consume resources and trigger rate limits.

How should I implement webhook idempotency for email testing?

Verify signed payloads, implement timestamp tolerance for replay protection, and store event IDs to prevent duplicate processing. Always return 2xx only after durable writes and handle the same event arriving multiple times gracefully.

What's the recommended approach for LLM agents handling signup emails?

Use constrained tools with explicit budgets rather than giving agents generic email parsing abilities. Implement tools like create_signup_attempt(), wait_for_signup_email(), extract_verification_artifact(), and redeem_artifact_once() with built-in safeguards.

Sign Up Email Testing: Stop Duplicates and Bot Loops

Signup flows look simple until you automate them. Then you discover a frustrating reality: the sign up email is the noisiest part of the pipeline. Messages arrive late, arrive twice, or arrive after your test already moved on. If you add LLM agents on top, you can also get “bot loops”, where an agent re-triggers signup or replays a verification link until rate limits or lockouts kick in.

This guide focuses on two reliability killers in sign up email testing:

Duplicates (same email event processed multiple times)
Bot loops (automation repeatedly triggering the same email, or repeatedly consuming the same email)

The goal is not to “make the test pass once”. It is to make the email step deterministic, idempotent, and safe to retry.

Why duplicates happen in sign up email testing (it is not just your mail provider)

Duplicates typically come from at-least-once behavior somewhere in the chain. It helps to name the layer, so you can dedupe at the right boundary.

Where duplicates are born	Common cause	What it looks like in tests	Best fix
Your app	“Resend verification email” triggered twice, retries without idempotency, double form submits	Two emails with different tokens	Add an idempotency key per signup attempt, enforce one active token
Your job queue	Worker retries without a dedupe key	Same template, same token sent twice	Make the send job idempotent (attempt_id)
SMTP delivery path	Greylisting, transient failures, upstream retries	Two near-identical messages, possibly same `Message-ID`	Deduplicate by a stable message identifier and artifact
Webhook delivery	Your endpoint times out, provider retries	Same message delivered multiple times	Verify signatures and implement webhook idempotency
Polling consumer	Cursor bugs, eventual consistency, fetching “latest” repeatedly	Same message processed on every poll	Use a cursor or store “seen message ids”
CI / agent orchestration	Test retries rerun the same logical attempt	More emails than expected, flaky assertions	Isolate inbox per attempt, correlate run ids

A key takeaway: you cannot reliably “prevent” duplicates in distributed systems. You can only design the system so that duplicates are harmless.

Why bot loops happen (and why they are worse than duplicates)

A duplicate is one event repeated. A bot loop is a feedback cycle.

Common loops in signup automation:

Retry loop: the agent times out waiting for the email, retries the signup, triggering another email, then repeats.
Replay loop: the agent receives the verification email, clicks the magic link, gets an error, and clicks again indefinitely.
Parser loop: the agent fails to extract the OTP, asks for resend, and keeps accumulating emails while still reading the oldest one.
Webhook replay loop (security + reliability): if you do not verify signed webhook payloads and enforce a timestamp tolerance, a captured payload can be replayed and cause repeated processing.

The fix is to treat signup verification like a small state machine with budgets:

A single attempt id
A single inbox scope
A bounded wait
A single consume of the verification artifact
A hard stop when budgets are exceeded

[Figure: flow diagram. A signup attempt creates a disposable inbox, triggers an email send, receives an email event (webhook or polling), extracts a verification artifact once, and marks it consumed to prevent duplicates and retry loops.]
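The budgeted state machine can be sketched as a small object that every step has to pass through. The class name `SignupAttempt`, the state names, and the default limits are illustrative assumptions, not part of any library or of Mailhook.

```python
import time
import uuid
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when an attempt runs out of resends, time, or redemptions."""


@dataclass
class SignupAttempt:
    # Limits are placeholder defaults for illustration, not recommendations.
    max_resends: int = 1
    max_wait_seconds: float = 60.0
    max_redemptions: int = 1
    attempt_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    started_at: float = field(default_factory=time.monotonic)
    resends: int = 0
    redemptions: int = 0
    state: str = "created"  # created -> waiting -> redeemed | failed

    def deadline_passed(self) -> bool:
        """True once the bounded wait for this attempt is over."""
        return time.monotonic() - self.started_at > self.max_wait_seconds

    def record_resend(self) -> None:
        """Count a resend, or hard-stop if the resend budget is spent."""
        if self.resends >= self.max_resends:
            self.state = "failed"
            raise BudgetExceeded("resend budget exhausted")
        self.resends += 1
        self.state = "waiting"

    def record_redemption(self) -> None:
        """Count a redemption, or hard-stop on a second consume."""
        if self.redemptions >= self.max_redemptions:
            self.state = "failed"
            raise BudgetExceeded("artifact already redeemed for this attempt")
        self.redemptions += 1
        self.state = "redeemed"
```

Because the counters live on the attempt rather than in the caller, a retrying test runner or agent cannot exceed a budget by simply calling the step again.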

The deterministic pattern: inbox-per-attempt plus idempotent consume

If you are still using shared inboxes (or plus-addressing into one mailbox), you are fighting the wrong battle. The clean pattern for sign up email testing is:

Create a fresh disposable inbox per signup attempt
Send the signup verification email to that address
Wait deterministically (webhook-first, polling fallback)
Extract a minimal artifact (OTP or URL)
Consume it exactly once

Mailhook is designed for this style of automation: you create disposable inboxes via API and receive inbound messages as structured JSON, delivered via real-time webhooks and/or retrieved via polling. For exact endpoints and payload fields, use the canonical reference at Mailhook llms.txt.

Dedupe correctly: pick the right keys (message id vs artifact id)

To stop duplicates, you need a stable key for “this email event” and a stable key for “this verification action”. They are not always the same.

Recommended dedupe keys

Dedupe scope	What you are preventing	Suggested key	Notes
Message-level	Processing the same email more than once	Provider message id (preferred), or normalized `Message-ID` header	RFC 5322 defines `Message-ID`, but uniqueness is not guaranteed in practice; treat it as best-effort
Artifact-level	Clicking the same verification link twice, or reusing an OTP	Hash of extracted artifact (OTP value, token, or canonicalized URL)	Canonicalize URL (strip tracking params) before hashing
Attempt-level	Creating multiple “active” attempts that race	`attempt_id` you generate before sending email	Store this in your DB and logs
Webhook delivery	Running your webhook handler twice	`delivery_id` or message id from payload	Return 2xx only after durable write

If you can only implement one thing: artifact-level idempotency. Even if you receive three emails, only the first artifact should be consumed.
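A minimal sketch of an artifact-level key, assuming the artifact is either an OTP string or an `http(s)` verification URL. The function names and the `TRACKING_PARAMS` list are hypothetical; adjust the list to whatever your sender actually appends.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of tracking parameters to ignore when comparing links.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}


def canonicalize_url(url: str) -> str:
    """Lowercase scheme and host, drop tracking params and fragment, sort the query."""
    parts = urlsplit(url.strip())
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    )
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, urlencode(query), "")
    )


def artifact_hash(artifact: str) -> str:
    """Hash an OTP or verification URL into a stable idempotency key."""
    value = artifact.strip()
    if value.lower().startswith(("http://", "https://")):
        value = canonicalize_url(value)
    return hashlib.sha256(value.encode("utf-8")).hexdigest()
```

Two emails that carry the same token but different tracking parameters then map to the same key, so the second one is recognized as already handled.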

Webhooks: assume at-least-once delivery and build idempotency in

Webhook retries are normal, not exceptional. Providers retry when:

Your endpoint times out
You return a non-2xx
Your load balancer closes the connection

So your webhook handler must be:

Authenticated (verify signed payloads)
Replay-resistant (timestamp tolerance, nonce if available)
Idempotent (same event can arrive twice)

Mailhook supports signed payloads for security, which lets you verify that the webhook really came from Mailhook and was not altered. Follow the verification procedure described in llms.txt.
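To illustrate what "authenticated and replay-resistant" can look like, here is a generic HMAC sketch. It assumes the sender signs `timestamp + "." + body` with HMAC-SHA256 and sends a hex digest; that layout, the 300-second window, and the function name are assumptions for illustration and are not Mailhook's documented scheme, which llms.txt defines.

```python
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_SECONDS = 300  # assumed replay window


def verify_signature(
    secret: bytes,
    body: bytes,
    timestamp: str,
    signature_hex: str,
    now: Optional[float] = None,
) -> bool:
    """Return True only if the signature matches and the timestamp is recent."""
    current = time.time() if now is None else now
    try:
        sent_at = float(timestamp)
    except ValueError:
        return False
    if abs(current - sent_at) > TOLERANCE_SECONDS:
        return False  # outside the window: treat as a possible replay
    # Hypothetical signed string: "<timestamp>.<raw body>"
    expected = hmac.new(
        secret, timestamp.encode("utf-8") + b"." + body, hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))
```

Verify against the raw request bytes, before any JSON parsing or re-serialization, since re-encoding can change the bytes that were signed.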

Minimal webhook handler shape (pseudocode)

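handleWebhook(request):
  payload = request.body
  if not verify_signature(request.headers, payload):
    return 401

  event_id = payload.event_id OR payload.message.id

  # Atomic insert backed by a unique key on event_id; a no-op if the row exists
  db.insert_if_absent("webhook_events", {event_id, status: "received", received_at: now()})

  if db.get("webhook_events", event_id).status == "enqueued":
    return 200   # duplicate delivery, already handed off

  enqueue("process_message", {message_id: payload.message.id, inbox_id: payload.inbox.id})
  db.update("webhook_events", event_id, {status: "enqueued"})

  return 200

Design note: write the idempotency record first, then enqueue, and mark the record as enqueued only after the enqueue succeeds. If the enqueue fails, the provider's retry finds a record that is still "received" and enqueues again. A bare "exists" check would instead acknowledge the retry with a 200 and silently drop the message. The job can still be enqueued twice if the final update fails, so the downstream processor must be idempotent as well.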

For general webhook retry behavior and signature verification patterns, Stripe’s webhook docs are a good reference model, even if you are not using Stripe: webhook best practices.

Polling: stop “latest message wins” bugs with cursors and time budgets

Polling is a perfectly valid fallback, but “fetch latest and parse” is a common source of duplicates and bot loops.

A safer polling contract:

Poll until a deadline
Filter narrowly (recipient + attempt correlation)
Track a cursor or store processed message ids
Select the first message that matches the attempt, not “whatever arrived most recently”

Minimal polling loop (pseudocode)

waitForSignupEmail(inbox_id, attempt_id, deadline):
  seen = set()

  while now() < deadline:
    messages = api.list_messages(inbox_id)

    for m in messages:
      if m.id in seen:
        continue
      seen.add(m.id)

      if not matches_attempt(m, attempt_id):
        continue

      artifact = extract_verification_artifact(m)
      return {message_id: m.id, artifact}

    sleep(backoff())

  throw Timeout("No matching signup email")

This single change, “remember what you already looked at”, prevents a surprising amount of flakiness.
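The same contract as a runnable Python sketch. The listing call and the matcher are passed in as callables because their real shape depends on your client, and the assumption that each message is a dict with an `"id"` key is illustrative only.

```python
import time
from typing import Callable, Iterable


def wait_for_signup_email(
    list_messages: Callable[[], Iterable[dict]],
    matches_attempt: Callable[[dict], bool],
    timeout_seconds: float = 60.0,
    initial_delay: float = 0.5,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until a matching message appears or the deadline passes."""
    deadline = time.monotonic() + timeout_seconds
    seen = set()  # message ids already inspected
    delay = initial_delay
    while True:
        for message in list_messages():
            if message["id"] in seen:
                continue  # inspected on an earlier poll
            seen.add(message["id"])
            if matches_attempt(message):
                return message
        if time.monotonic() + delay > deadline:
            raise TimeoutError("No matching signup email")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

Injecting `sleep` keeps the loop testable without real waiting, and `time.monotonic()` keeps the deadline immune to wall-clock adjustments.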

Correlation: make the right email easy to identify

Duplicates get dangerous when you cannot tell which email belongs to which attempt.

Correlation options, from strongest to weakest:

Inbox isolation: one disposable inbox per attempt (best)
Explicit attempt token in the email content: include attempt_id in the template (works well for internal systems)
Custom header: add X-Correlation-Id: <attempt_id> when sending
Subject tags: helpful, but easiest to break with localization or template changes

If you control the sender, a custom header is usually the cleanest, because it avoids brittle HTML parsing. If you do not control the sender (third-party SaaS), inbox isolation and narrow matchers are your best tools.
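When you do control the sender, the custom-header approach might look like the sketch below, which uses only Python's standard `email` package. The addresses, subject, and function names are placeholders, and the matcher assumes you have the raw message bytes; with a structured JSON payload you would read the header from the corresponding field instead.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

CORRELATION_HEADER = "X-Correlation-Id"  # header name suggested in the list above


def build_signup_email(to_addr: str, attempt_id: str, verify_url: str) -> bytes:
    """Build a verification email that carries the attempt id in a header."""
    msg = EmailMessage()
    msg["From"] = "noreply@example.test"  # placeholder sender
    msg["To"] = to_addr
    msg["Subject"] = "Verify your email"
    msg[CORRELATION_HEADER] = attempt_id
    msg.set_content(f"Confirm your account: {verify_url}\n")
    return msg.as_bytes()


def matches_attempt(raw: bytes, attempt_id: str) -> bool:
    """Match on the header only, so template or locale changes cannot break it."""
    msg = message_from_bytes(raw, policy=policy.default)
    return msg.get(CORRELATION_HEADER, "").strip() == attempt_id
```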

For a deep dive on which headers are worth trusting, see the RFC that defines the message format: RFC 5322.

“Consume once” rules that stop replay loops

Once you extract a verification link or OTP, your automation must treat it like a one-time capability.

Implement these rules:

Store a consumed marker keyed by artifact_hash
Do not click a link or submit an OTP twice, even if the UI says “try again”
If redemption fails, stop and surface a debuggable error (do not retry blindly)

A simple database table is enough:

Column	Purpose
`artifact_hash`	Idempotency key, prevents double-consume
`attempt_id`	Links consume back to the run
`consumed_at`	Debuggability and audit
`result`	Success, already_used, expired, invalid

This is how you turn a potentially unbounded loop into a finite workflow.
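A minimal sketch of that table using SQLite, where the primary key on `artifact_hash` does the idempotency work. The table name `consumed_artifacts` and the helper names are assumptions; any store with an atomic "insert if absent" gives the same guarantee.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the consume-once table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS consumed_artifacts (
               artifact_hash TEXT PRIMARY KEY,
               attempt_id    TEXT NOT NULL,
               consumed_at   TEXT NOT NULL,
               result        TEXT
           )"""
    )
    return conn


def claim_artifact(conn: sqlite3.Connection, artifact_hash: str, attempt_id: str) -> bool:
    """Return True only for the first caller to claim this artifact."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO consumed_artifacts "
            "(artifact_hash, attempt_id, consumed_at) VALUES (?, ?, ?)",
            (artifact_hash, attempt_id, datetime.now(timezone.utc).isoformat()),
        )
    return cur.rowcount == 1  # 0 means the row already existed


def record_result(conn: sqlite3.Connection, artifact_hash: str, result: str) -> None:
    """Store the outcome: success, already_used, expired, or invalid."""
    with conn:
        conn.execute(
            "UPDATE consumed_artifacts SET result = ? WHERE artifact_hash = ?",
            (result, artifact_hash),
        )
```

Call `claim_artifact` before redeeming and proceed only when it returns True; a False return means another run, or an earlier retry of this one, already owns the artifact.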

LLM agents: prevent “autonomous resend” behavior with tool constraints

LLM agents are great at improvising, which is exactly what you do not want in auth flows.

If an agent is allowed to:

trigger signup
request resend
read emails
click links

then a small parsing glitch can cause it to spam resend and produce a self-sustaining loop.

The fix is to give the agent constrained tools and explicit budgets:

create_signup_attempt() returns {attempt_id, email, inbox_id, expires_at}
wait_for_signup_email(attempt_id) returns a single message or timeout
extract_verification_artifact(message) returns a single URL or OTP
redeem_artifact_once(attempt_id, artifact) enforces idempotency and returns a final status

Do not give the agent a generic “open browser and click anything in the email HTML” instruction. Prefer text extraction from structured JSON fields, then validate the URL against an allowlist before any navigation.
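A sketch of what the extraction tool could enforce before the agent ever sees a link. The host `app.example.test`, the six-digit OTP pattern, and the return shape are assumptions for illustration; the input is the plain-text body taken from the structured message.

```python
import re
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"app.example.test"}  # hypothetical; list your own verification hosts
URL_PATTERN = re.compile(r"https://[^\s<>\"']+")
OTP_PATTERN = re.compile(r"\b\d{6}\b")  # assumes 6-digit codes


def is_allowed_url(url: str) -> bool:
    """Accept only https URLs whose exact hostname is on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


def extract_verification_artifact(text_body: str) -> dict:
    """Return exactly one artifact from the plain-text body, or raise."""
    urls = sorted({u for u in URL_PATTERN.findall(text_body) if is_allowed_url(u)})
    if len(urls) == 1:
        return {"kind": "url", "value": urls[0]}
    if len(urls) > 1:
        raise ValueError("ambiguous: more than one allowlisted URL")
    otps = set(OTP_PATTERN.findall(text_body))
    if len(otps) == 1:
        return {"kind": "otp", "value": otps.pop()}
    raise ValueError("expected exactly one verification artifact")
```

Raising on ambiguity instead of guessing is deliberate: the agent gets a clear failure to report rather than a reason to improvise a resend.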

Observability: log the identifiers that make duplicates explainable

When a signup test fails, you want to answer these questions in one minute:

Which attempt was this?
Which inbox was used?
How many messages arrived, and when?
Which message was selected?
Which artifact was extracted?
Was the artifact consumed before?

A practical logging schema:

attempt_id
inbox_id
message_id
artifact_hash
delivery_method (webhook or polling)
latency_ms (send to receive)

If you use Mailhook, you can build this without parsing raw MIME, because messages are delivered as structured JSON and can be processed deterministically (see llms.txt for the canonical contract).

A short checklist to stop duplicates and bot loops

Use this as a pre-merge gate for email-dependent signup tests:

Use inbox-per-attempt, not shared inboxes
Wait via webhook-first, keep polling as fallback
Implement webhook idempotency and verify signed payloads
Implement artifact-level consume-once semantics
Add budgets (max resends, max wait time, max redemption attempts)
Log attempt_id, inbox_id, message_id, and artifact_hash

Where Mailhook fits

If your current approach depends on scraping a shared mailbox UI or parsing unpredictable HTML emails, duplicates and loops are almost guaranteed over time.

Mailhook provides the primitives that make signup automation boring again:

Create disposable inboxes via API
Receive emails as structured JSON
Get real-time webhook notifications (with signed payloads)
Use polling as a fallback retrieval path
Scale with batch processing, shared domains, or custom domain support

To integrate against the real API semantics and payload fields, start with Mailhook llms.txt, then explore the product at Mailhook.