Temp Inbox Email Strategy: One Inbox per Attempt, Always

Email-dependent automation fails in a very specific way: it works perfectly until you add retries, parallelism, or an LLM agent that can take actions more than once. Then it becomes a swamp of stale verification links, duplicated OTPs, “wrong email” races, and non-reproducible flakes.

The simplest strategy that fixes most of this is also the most boring:

Create one temp inbox email per attempt, always.

Not per test suite. Not per build. Not “per user” with plus tags. Per attempt.

This post explains what “attempt” means in practice, why inbox reuse is the root cause of most email flakiness, and how to implement a per-attempt inbox policy with deterministic waiting, dedupe, and safe extraction for LLM agents.

If you want the canonical integration contract for Mailhook’s API and payload shapes, use the project’s reference file: mailhook.co/llms.txt.

What “one inbox per attempt” actually means

An attempt is a single execution of a workflow that may be repeated without you explicitly planning for it.

Examples of attempts:

A CI job retry (same commit, new run)
A flaky E2E test that your runner auto-retries
A background worker that replays a job after a timeout
An LLM agent tool call that can be re-issued after a transient error
A user-flow test that resends a verification email after “didn’t receive it”

A per-attempt strategy means:

Every attempt provisions a fresh, isolated inbox (and therefore a fresh temp inbox email address).
The attempt only reads from that inbox.
The attempt closes the inbox (or lets it expire) after it extracts the one artifact it needs.

This is not about anonymity or “burner email” behavior. It is about determinism.

Why reusing a temp inbox email breaks under retries and parallelism

Reusing an inbox introduces ambiguity that your code cannot reliably resolve later.

Failure mode 1: stale message selection

If you reuse an inbox and trigger “send verification email” twice (because of retries, resends, or agent loops), you will see multiple similar messages. Many implementations then pick:

“the first message that matches,” which can be stale
“the last message,” which can be a different attempt
“the message with the newest timestamp,” which breaks with clock skew and provider delays

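A fresh inbox per attempt turns message selection from a search problem into a near-certainty: every message in the inbox was triggered by this attempt, so there is nothing stale or foreign to choose between.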
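Because isolation removes foreign messages, selection can be strict instead of heuristic. The sketch below collapses duplicate deliveries of the same message and then requires exactly one distinct match, failing loudly otherwise. The `InboxMessage` shape is a simplified assumption for illustration, not Mailhook's payload schema.

```typescript
// Simplified message shape (an assumption for illustration, not a provider schema).
interface InboxMessage {
  message_id: string;
  from: string;
  subject: string;
}

// Strict selection for a per-attempt inbox: duplicates of the same message_id
// collapse to one, and more than one *distinct* match is treated as a bug
// rather than guessed around with "first", "last", or "newest".
function selectSingleMessage(
  messages: InboxMessage[],
  matches: (m: InboxMessage) => boolean,
): InboxMessage | null {
  const distinct = new Map<string, InboxMessage>();
  for (const m of messages) {
    if (matches(m) && !distinct.has(m.message_id)) {
      distinct.set(m.message_id, m);
    }
  }
  if (distinct.size > 1) {
    const ids = Array.from(distinct.keys()).join(", ");
    throw new Error(`ambiguous inbox: ${distinct.size} distinct matching messages (${ids})`);
  }
  // Zero matches means "keep waiting"; exactly one means "done".
  return distinct.size === 1 ? Array.from(distinct.values())[0] : null;
}
```

In a shared inbox the `distinct.size > 1` branch fires constantly; in a per-attempt inbox it signals a real defect, such as the application sending the email twice.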

Failure mode 2: duplicates are normal, not exceptional

Any system that uses webhooks, queues, and retries tends toward at-least-once delivery. That means duplicates happen even when nothing is “wrong” (network retries, handler timeouts, replay for safety).

If two attempts share an inbox, duplicates across attempts are indistinguishable from duplicates within one attempt.

Failure mode 3: parallel CI races

Two tests running simultaneously that share an inbox are guaranteed to race eventually. You can add correlation tokens, but now you are debugging string matching instead of building stable infrastructure.

Failure mode 4: cleanup and retention become risky

If an inbox is shared, you cannot safely delete messages without risking another attempt that is still using it. So you keep everything longer, which increases the chance of stale selection, increases PII exposure, and makes debugging worse.

The identifiers you need (attempt, inbox, message, delivery, artifact)

A per-attempt inbox policy gets even stronger when you name the layers explicitly.

Here is a practical vocabulary that works well for CI harnesses and agent tools:

| Layer | What it represents | Why it exists | Typical uniqueness scope |
| --- | --- | --- | --- |
| `attempt_id` | One execution of the workflow | The unit you retry | Unique per retry/run |
| `inbox_id` | The isolated container for inbound mail | The isolation boundary | Unique per attempt |
| `message_id` | The email message identity | Stable-ish message identity | Unique per message |
| `delivery_id` | The delivery event to your system | Webhook/poll dedupe | Unique per delivery attempt |
| `artifact` | OTP or verification URL you extract | What you actually need | Unique per intent |

Two key takeaways:

You dedupe deliveries and processing, not just messages.
You assert on artifacts, not HTML.
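A minimal in-memory sketch of the two dedupe layers, assuming each delivery carries `delivery_id`, `inbox_id`, and `message_id` (names follow the table above; check your provider's payload for the real fields). The first check drops replays of the same delivery event; the second drops the same message arriving through a second path, such as a webhook and a poll.

```typescript
// Hypothetical delivery envelope: one webhook call or one poll result.
interface Delivery {
  delivery_id: string;
  inbox_id: string;
  message_id: string;
}

class DedupeLedger {
  private seenDeliveries = new Set<string>();
  private processedMessages = new Set<string>();

  // True only the first time a given message is seen in a given inbox,
  // no matter how many deliveries carry it.
  shouldProcess(d: Delivery): boolean {
    if (this.seenDeliveries.has(d.delivery_id)) return false; // replayed delivery
    this.seenDeliveries.add(d.delivery_id);

    const messageKey = `${d.inbox_id}:${d.message_id}`;
    if (this.processedMessages.has(messageKey)) return false; // same message, new delivery
    this.processedMessages.add(messageKey);
    return true;
  }
}
```

Production code would persist both sets, for example as tables with unique constraints, so dedupe survives worker restarts.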

For more on structuring emails as machine-readable records, see Mailhook’s JSON-oriented approach in Temp Email API: Receive and Parse Emails as JSON.

The reference workflow: provision, trigger, wait, extract, expire

The per-attempt inbox strategy is easiest to enforce when you treat email receipt as a small state machine.

Figure: an attempt creates a new inbox, triggers an email send, waits for arrival (webhook first, polling fallback), extracts a single artifact (OTP or verification link), then expires the inbox.

1) Provision an inbox at the start of the attempt

At attempt start, create a disposable inbox via API and store:

attempt_id
inbox_id
the generated email address
expires_at (or equivalent TTL)
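One cheap check worth running right after provisioning: confirm the inbox will outlive the attempt's wait budget plus any drain window, so a slow email never lands in an already-expired inbox. This sketch assumes `expires_at` is an ISO 8601 timestamp; that format, and the record shape, are assumptions to confirm against your provider's docs.

```typescript
// Hypothetical provisioning record; field names mirror the list above.
interface ProvisionedInbox {
  attempt_id: string;
  inbox_id: string;
  email: string;
  expires_at: string; // assumed ISO 8601, e.g. "2024-01-01T00:05:00Z"
}

// Throws if the inbox would expire before the wait deadline plus drain window.
function assertTtlCoversAttempt(
  inbox: ProvisionedInbox,
  waitBudgetMs: number,
  drainMs: number,
  nowMs: number = Date.now(),
): void {
  const expiresMs = Date.parse(inbox.expires_at);
  if (Number.isNaN(expiresMs)) {
    throw new Error(`inbox ${inbox.inbox_id}: unparseable expires_at "${inbox.expires_at}"`);
  }
  const neededUntil = nowMs + waitBudgetMs + drainMs;
  if (expiresMs < neededUntil) {
    throw new Error(
      `inbox ${inbox.inbox_id} expires ${neededUntil - expiresMs} ms too early for attempt ${inbox.attempt_id}`,
    );
  }
}
```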

Mailhook is built for this pattern: you can create disposable inboxes programmatically and receive messages as structured JSON via webhooks or polling. Start at Mailhook and use the canonical spec at mailhook.co/llms.txt.

2) Trigger the outbound email with strong correlation

Even with inbox isolation, correlation is still useful for debugging and safety:

Include attempt_id in your application logs
If you control the sender, add a correlation header you generate (for example X-Correlation-Id: attempt_id)
If you do not control the sender, scope your matcher using stable fields (recipient, subject prefix, known sender domain)

With inbox isolation in place, correlation becomes a guardrail rather than the primary selection mechanism.
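A sketch of correlation as a guardrail: a header you can pass to a sender you control, and a check over stable fields that must all pass before a message is accepted. The `ParsedMessage` shape (including lower-cased header names) is a hypothetical stand-in for your provider's JSON.

```typescript
// Hypothetical parsed-message fields; adapt to your provider's JSON.
interface ParsedMessage {
  to: string[];
  from: string; // bare address or "Name <addr>"
  subject: string;
  headers: Record<string, string>; // header names assumed lower-cased
}

interface Expectations {
  attemptId: string;
  recipient: string;
  subjectPrefix: string;
  senderDomain: string;
}

// Headers to pass to a sender you control.
function correlationHeaders(attemptId: string): Record<string, string> {
  return { "X-Correlation-Id": attemptId };
}

// Every check must pass. The correlation header is only checked when present,
// because senders you do not control will not set it.
function passesGuardrail(msg: ParsedMessage, exp: Expectations): boolean {
  const recipientOk = msg.to.some((a) => a.toLowerCase() === exp.recipient.toLowerCase());
  const subjectOk = msg.subject.startsWith(exp.subjectPrefix);
  // Take the part after the last "@" and strip a trailing ">" from "Name <addr>".
  const senderDomain = (msg.from.split("@").pop() ?? "").replace(/>$/, "").trim().toLowerCase();
  const senderOk = senderDomain === exp.senderDomain.toLowerCase();
  const corr = msg.headers["x-correlation-id"];
  const corrOk = corr === undefined || corr === exp.attemptId;
  return recipientOk && subjectOk && senderOk && corrOk;
}
```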

3) Wait deterministically (webhook first, polling fallback)

The reliable default is:

Use a webhook to get low-latency arrival when possible.
Keep a polling loop as a fallback for when webhooks fail (misconfiguration, transient outages, CI network restrictions).

This hybrid pattern is covered in depth here: Temp Email Receive: Webhook-First, Polling Fallback.

Important waiting rules:

Prefer a deadline-based wait over fixed sleeps.
Prefer an explicit timeout budget (for example 60 to 120 seconds) over “wait forever.”
Treat “no email received” as an actionable failure with logs attached (inbox_id, attempt_id, timestamps).
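A deadline-based waiter following those rules, sketched under assumptions: `poll` is a function you supply that returns the matching message or `null`, and `webhookArrival` is an optional promise your (verified) webhook handler resolves for this inbox. Neither is a Mailhook API. The webhook path wakes the loop early; polling still runs if the webhook never fires or fails.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// A promise that never settles, used when there is no webhook path to wait on.
const never = () => new Promise<void>(() => {});

async function waitForMessage<T>(opts: {
  poll: () => Promise<T | null>; // hypothetical: fetch this inbox's matching message, or null
  webhookArrival?: Promise<T>; // hypothetical: resolved by your webhook handler
  deadlineMs: number;
  pollIntervalMs: number;
  context: string; // e.g. "attempt=... inbox=..." for the failure message
}): Promise<T> {
  const deadline = Date.now() + opts.deadlineMs;
  const state: { webhookMsg: T | null } = { webhookMsg: null };
  // Settles when the webhook delivers; a webhook failure falls back to polling only.
  const webhookSignal: Promise<void> = opts.webhookArrival
    ? opts.webhookArrival.then((m) => {
        state.webhookMsg = m;
      }, never)
    : never();

  for (;;) {
    if (state.webhookMsg !== null) return state.webhookMsg;
    const polled = await opts.poll();
    if (polled !== null) return polled;
    const remaining = deadline - Date.now();
    if (remaining <= 0) break;
    // Sleep until the next poll, but wake immediately if the webhook fires first.
    await Promise.race([sleep(Math.min(opts.pollIntervalMs, remaining)), webhookSignal]);
  }
  if (state.webhookMsg !== null) return state.webhookMsg;
  throw new Error(`no email within ${opts.deadlineMs} ms (${opts.context})`);
}
```

The thrown error carries the attempt and inbox identifiers, which is what turns a timeout into an actionable failure.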

4) Extract the minimal artifact you need

Most flows only need one of these:

an OTP
a magic link / verification URL
a password reset link

Your automation should extract the artifact deterministically from the JSON representation, ideally from text/plain when available, and avoid rendering HTML.
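A sketch of deterministic extraction from the text/plain body. It assumes OTPs are standalone 6 to 8 digit tokens and that the caller knows which host the verification link should point at; both are assumptions to tighten against your real templates. It returns exactly one artifact or throws, and never touches HTML.

```typescript
// Returns the single artifact of the requested kind from a text/plain body, or throws.
function extractArtifact(text: string, want: "otp" | "url", linkHost?: string): string {
  if (want === "url") {
    const hostOf = (u: string): string | null => {
      try {
        return new URL(u).hostname;
      } catch {
        return null;
      }
    };
    const found: string[] = text.match(/https:\/\/[^\s<>"']+/g) ?? [];
    const urls = found
      .map((u) => u.replace(/[.,;:!?)\]]+$/, "")) // strip trailing sentence punctuation
      .filter((u) => linkHost === undefined || hostOf(u) === linkHost);
    const distinct = Array.from(new Set(urls));
    if (distinct.length !== 1) throw new Error(`expected exactly one link, found ${distinct.length}`);
    return distinct[0];
  }
  // Assumes OTPs are standalone 6 to 8 digit tokens; 8-digit dates would also match.
  const found: string[] = text.match(/\b\d{6,8}\b/g) ?? [];
  const otps = Array.from(new Set(found));
  if (otps.length !== 1) throw new Error(`expected exactly one OTP, found ${otps.length}`);
  return otps[0];
}
```

Filtering by `linkHost` matters because real verification emails usually also carry unsubscribe and footer links.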

This matters even more for LLM agents. Emails are untrusted input and can contain prompt injection, malicious links, and confusing UI content. A minimized, machine-readable view is safer than “show the agent the entire email.”

If your pipeline includes an LLM, the security mindset and parsing rules are worth reviewing in Security Emails: How to Parse Safely in LLM Pipelines.

5) Consume once (idempotency at the artifact layer)

A common mistake is to make the “wait for email” step idempotent but the “use the OTP/link” step non-idempotent.

Instead, treat the artifact as the unit of consumption:

Compute an artifact_key (for example, a hash of the OTP or the canonicalized URL)
Store artifact_key with attempt_id
If you see the same artifact_key again, do not re-submit it

This prevents resend loops and “double-click” behavior from agents.
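A minimal sketch of artifact-level idempotency. It computes `artifact_key` as a SHA-256 of a canonical form (for URLs, the WHATWG serialization with the fragment removed) and records it with `attempt_id`. The in-memory `Set` is a hypothetical stand-in for durable storage, and this synchronous `consumeOnce` is an illustration, not the placeholder from any provider SDK.

```typescript
import { createHash } from "node:crypto";

// artifact_key: SHA-256 of a canonical form, so trivially different spellings
// of the same link collapse to one key.
function artifactKey(artifact: string): string {
  let canonical = artifact.trim();
  if (/^https?:\/\//i.test(canonical)) {
    const url = new URL(canonical);
    url.hash = ""; // fragments are never sent to the server
    canonical = url.toString(); // WHATWG URL lower-cases the host and drops default ports
  }
  return createHash("sha256").update(canonical).digest("hex");
}

// Hypothetical in-memory store of (attempt_id, artifact_key) pairs. In CI or
// production, use durable storage with a unique constraint on that pair.
const consumed = new Set<string>();

// True if the caller should submit the artifact now, false if this attempt
// already consumed it (resend loop, agent "double-click").
function consumeOnce(args: { attemptId: string; artifact: string }): boolean {
  const key = `${args.attemptId}:${artifactKey(args.artifact)}`;
  if (consumed.has(key)) return false;
  consumed.add(key);
  return true;
}
```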

6) Expire the inbox (with a drain window)

After extracting what you need, end the inbox lifecycle:

If your provider supports explicit expiration, expire it.
Otherwise rely on short TTLs.

In high-throughput systems, it helps to have a brief “drain window” to record late arrivals for debugging without keeping the inbox active for long. The underlying idea is: active for the attempt, then draining briefly, then closed.
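The active, draining, closed lifecycle can be modeled as a small pure function in your harness. This is harness-side bookkeeping under assumed inputs (millisecond timestamps for expiry and extraction), not a provider feature: messages arriving while draining are recorded for debugging but never acted on.

```typescript
type InboxPhase = "active" | "draining" | "closed";

// Active until the artifact is extracted (or the TTL ends), then draining for
// drainMs, then closed.
function inboxPhase(args: {
  nowMs: number;
  expiresAtMs: number;
  extractedAtMs?: number;
  drainMs: number;
}): InboxPhase {
  const { nowMs, expiresAtMs, extractedAtMs, drainMs } = args;
  const activeUntil = extractedAtMs !== undefined ? Math.min(extractedAtMs, expiresAtMs) : expiresAtMs;
  if (nowMs < activeUntil) return "active";
  if (nowMs < activeUntil + drainMs) return "draining";
  return "closed";
}

// What the harness does with a message that arrives in each phase.
function handleArrival(phase: InboxPhase): "process" | "record_only" | "drop" {
  if (phase === "active") return "process";
  return phase === "draining" ? "record_only" : "drop";
}
```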

A practical harness pattern (provider-agnostic)

Below is a provider-agnostic sketch you can adapt. It assumes a temp inbox email provider that supports inbox creation, message listing, and optionally webhooks.

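```typescript
import { randomUUID } from "node:crypto";

// Placeholder provider calls with hypothetical signatures. Replace them with
// your provider's client; for Mailhook, see mailhook.co/llms.txt.
declare function createInbox(opts: {
  metadata: Record<string, string>;
  ttlSeconds: number;
}): Promise<{ inbox_id: string; email: string; expires_at: string }>;
declare function waitForMessage(opts: {
  inboxId: string;
  deadlineMs: number;
  matcher: { to: string; subjectIncludes: string };
}): Promise<unknown>;
declare function extractOtpOrLink(msg: unknown): string;
declare function consumeOnce(opts: { attemptId: string; artifact: string }): Promise<void>;
declare function expireInbox(opts: { inboxId: string }): Promise<void>;

type AttemptContext = {
  attemptId: string;
  inboxId: string;
  emailAddress: string;
  expiresAt: string;
};

async function runAttempt(
  sendVerification: (email: string) => Promise<void>,
): Promise<string> {
  const attemptId = randomUUID();

  // 1) Create isolated inbox per attempt
  const inbox = await createInbox({
    metadata: { attemptId },
    ttlSeconds: 300,
  });

  const ctx: AttemptContext = {
    attemptId,
    inboxId: inbox.inbox_id,
    emailAddress: inbox.email,
    expiresAt: inbox.expires_at,
  };

  try {
    // 2) Trigger outbound email
    await sendVerification(ctx.emailAddress);

    // 3) Wait with a deadline (webhook-first, polling fallback)
    const msg = await waitForMessage({
      inboxId: ctx.inboxId,
      deadlineMs: 90_000,
      matcher: {
        // keep matchers narrow, even with isolation
        to: ctx.emailAddress,
        subjectIncludes: "Verify",
      },
    });

    // 4) Extract minimal artifact
    const artifact = extractOtpOrLink(msg);

    // 5) Artifact-level idempotency
    await consumeOnce({ attemptId: ctx.attemptId, artifact });

    return artifact;
  } finally {
    // 6) Expire inbox even when sending, waiting, or extraction throws,
    // so a failed attempt never leaves a live inbox behind
    await expireInbox({ inboxId: ctx.inboxId });
  }
}
```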

Notes:

Function names are placeholders. For Mailhook-specific endpoints and payload fields, refer to mailhook.co/llms.txt.
The matcher is still present, but it is now a safety check rather than a fragile “find the needle in the inbox haystack” operation.

How this strategy changes your debugging story

Per-attempt inboxes do something subtle but powerful: they make failures reproducible.

When an attempt fails, you can attach the inbox’s JSON message payloads to CI artifacts keyed by attempt_id and inbox_id. Now you can answer:

Did the email arrive?
If not, did the system send it?
If it arrived, did it match the attempt?
If it matched, did artifact extraction succeed?

This is much harder when a single inbox is shared across attempts, because its message history keeps changing underneath you.
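Those four questions can be answered mechanically from a per-attempt record. The record shape below is hypothetical: it assumes your harness logs whether the application sent the email and stores the inbox's message payloads, matcher result, and extracted artifact next to `attempt_id` and `inbox_id`.

```typescript
// Hypothetical debug record written to CI artifacts for each attempt.
interface AttemptDebugRecord {
  attempt_id: string;
  inbox_id: string;
  send_logged: boolean; // did the application log that it sent the email?
  messages: { message_id: string; subject: string }[]; // JSON payloads from the inbox
  matched_message_id?: string; // set if the matcher accepted a message
  artifact?: string; // set if extraction succeeded
}

// Walks the four questions in order and reports the first one that fails.
function diagnose(r: AttemptDebugRecord): string {
  const where = `attempt=${r.attempt_id} inbox=${r.inbox_id}`;
  if (r.messages.length === 0) {
    return r.send_logged
      ? `${where}: sent but never arrived (delivery or address problem)`
      : `${where}: never sent (trigger problem)`;
  }
  if (r.matched_message_id === undefined) {
    return `${where}: ${r.messages.length} message(s) arrived but none matched (check matcher)`;
  }
  if (r.artifact === undefined) {
    return `${where}: matched ${r.matched_message_id} but extraction failed (check template or parser)`;
  }
  return `${where}: ok`;
}
```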

Per-attempt inboxes for LLM agents (extra guardrails)

If an LLM agent is involved, inbox-per-attempt is necessary but not sufficient. Add three more constraints.

Keep the tool surface small

Expose a small tool contract to the agent:

create inbox
wait for message
extract artifact
expire inbox

Do not expose “list all messages across inboxes” to an agent unless you are comfortable with it browsing large volumes of untrusted content.
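One way to express that four-tool surface is as JSON-Schema-style tool definitions, the general shape most function-calling LLM APIs accept, plus an authorization check the harness runs before executing any call. Tool names and parameters here are hypothetical; map them onto your harness and onto the endpoints described in mailhook.co/llms.txt.

```typescript
// JSON-Schema-style tool definition.
interface ToolDef {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: "string" | "integer"; description: string }>;
    required: string[];
  };
}

// Builds a tool whose parameters are all required.
function tool(name: string, description: string, props: ToolDef["parameters"]["properties"]): ToolDef {
  return { name, description, parameters: { type: "object", properties: props, required: Object.keys(props) } };
}

const inboxIdParam = { type: "string" as const, description: "inbox_id returned by create_inbox" };

// The agent's entire email surface. There is deliberately no tool that lists
// messages across inboxes.
const emailTools: ToolDef[] = [
  tool("create_inbox", "Create a fresh inbox for this attempt; returns inbox_id and email.", {
    attempt_id: { type: "string", description: "Current attempt id" },
  }),
  tool("wait_for_message", "Wait until a message arrives in this inbox or the deadline passes.", {
    inbox_id: inboxIdParam,
    timeout_seconds: { type: "integer", description: "Wait budget, for example 90" },
  }),
  tool("extract_artifact", "Return only the OTP or verification URL, never the raw email.", {
    inbox_id: inboxIdParam,
    kind: { type: "string", description: "Either 'otp' or 'url'" },
  }),
  tool("expire_inbox", "Expire this inbox once the artifact is extracted.", { inbox_id: inboxIdParam }),
];

// Allow only listed tools, at most one inbox per attempt, and only this attempt's inbox.
function authorizeToolCall(name: string, args: { inbox_id?: string }, attemptInboxId: string | null): boolean {
  if (!emailTools.some((t) => t.name === name)) return false;
  if (name === "create_inbox") return attemptInboxId === null;
  return attemptInboxId !== null && args.inbox_id === attemptInboxId;
}
```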

Verify webhook authenticity when using push delivery

If you ingest messages via webhook, verify signatures and add replay detection. Mailhook supports signed payloads (see the exact verification details in mailhook.co/llms.txt).

A good rule is: verify first, then parse, then process.
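A generic sketch of "verify first, then parse" under a hypothetical signing scheme: the sender computes HMAC-SHA256 over `timestamp.deliveryId.rawBody` and sends the hex digest, timestamp, and delivery id alongside the body. This is not Mailhook's documented format; take the real header names, signed content, and encoding from mailhook.co/llms.txt. The structure (freshness window, constant-time comparison, replay set, parse last) is the part that carries over.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Replay detection; persist this with a TTL in production.
const seenDeliveryIds = new Set<string>();

function verifyAndParseWebhook(args: {
  rawBody: string; // exact bytes received, before any JSON parsing
  signatureHex: string;
  timestampSec: string;
  deliveryId: string;
  secret: string;
  nowSec?: number;
  toleranceSec?: number;
}): unknown {
  // 1) Freshness: reject old or future-dated requests.
  const now = args.nowSec ?? Math.floor(Date.now() / 1000);
  const ts = Number(args.timestampSec);
  if (!Number.isFinite(ts) || Math.abs(now - ts) > (args.toleranceSec ?? 300)) {
    throw new Error("stale or invalid timestamp");
  }
  // 2) Signature over timestamp, delivery id, and raw body, compared in constant time.
  const expected = createHmac("sha256", args.secret)
    .update(`${args.timestampSec}.${args.deliveryId}.${args.rawBody}`)
    .digest();
  const given = Buffer.from(args.signatureHex, "hex");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) {
    throw new Error("bad signature");
  }
  // 3) Replay: the delivery id is signed, so it cannot be swapped to dodge this check.
  if (seenDeliveryIds.has(args.deliveryId)) throw new Error("replayed delivery");
  seenDeliveryIds.add(args.deliveryId);
  // 4) Only now parse the untrusted body.
  return JSON.parse(args.rawBody);
}
```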

Constrain link handling

If the artifact is a URL, validate it before use:

enforce an allowlist of hostnames
block link-local and private network ranges (SSRF defense)
canonicalize redirects or disallow them

This is especially important when an agent can “click” links programmatically.
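A sketch of those link checks, assuming an explicit hostname allowlist supplied by your harness. It relies on the WHATWG `URL` parser (which lower-cases hosts and normalizes IP literals such as `0x7f000001` to `127.0.0.1`) and Node's `isIP`. It only inspects the URL itself; an allowlisted name could still resolve to a private address, so the HTTP client should apply `isBlockedAddress` to resolved addresses too.

```typescript
import { isIP } from "node:net";

// True for loopback, RFC 1918 private, link-local, CGNAT, "this network",
// and IPv6 loopback, unspecified, v4-mapped, unique-local, and link-local addresses.
function isBlockedAddress(ip: string): boolean {
  const family = isIP(ip);
  if (family === 4) {
    const [a, b] = ip.split(".").map(Number);
    return a === 0 || a === 10 || a === 127 ||
      (a === 100 && b >= 64 && b <= 127) ||
      (a === 169 && b === 254) ||
      (a === 172 && b >= 16 && b <= 31) ||
      (a === 192 && b === 168);
  }
  if (family === 6) {
    const lower = ip.toLowerCase();
    if (lower === "::" || lower === "::1" || lower.startsWith("::ffff:")) return true;
    const first = parseInt(lower.split(":")[0] || "0", 16);
    return (first & 0xfe00) === 0xfc00 || (first & 0xffc0) === 0xfe80; // fc00::/7, fe80::/10
  }
  return false; // not an IP literal
}

function validateArtifactUrl(raw: string, allowedHosts: ReadonlySet<string>): URL {
  const url = new URL(raw); // throws on malformed input
  if (url.protocol !== "https:") throw new Error(`scheme not allowed: ${url.protocol}`);
  if (url.username !== "" || url.password !== "") throw new Error("credentials in URL not allowed");
  const host = url.hostname.replace(/^\[|\]$/g, ""); // strip IPv6 brackets
  if (isBlockedAddress(host)) throw new Error(`blocked address: ${host}`);
  if (!allowedHosts.has(host)) throw new Error(`host not allowlisted: ${host}`);
  return url;
}
```

For redirects, one option is to fetch with `redirect: "manual"` and run every `Location` header through the same validation before following it.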

When “one inbox per attempt” feels expensive

Teams sometimes resist per-attempt inboxes because they worry about:

inbox creation overhead
domain allowlisting complexity
higher object counts to store

In practice:

The overhead is usually lower than the cost of debugging flakes.
You can start on shared domains for speed and move to a custom domain when you need allowlisting or deliverability control.
You can batch process emails when running many attempts in parallel (Mailhook supports batch email processing).

If you are deciding between shared and custom domains, this comparison helps: Custom Email Domains for Testing: Shared vs Dedicated.

A quick policy you can adopt today

If you want to make this operational, turn it into a team policy that code review can enforce:

Each retryable workflow attempt must call create_inbox() at the start.
Attempt logs must include attempt_id and inbox_id.
Waiting must be deadline-based (no fixed sleeps as the primary mechanism).
Webhooks are default, polling exists as a fallback.
Artifact extraction is minimal (OTP/link only), and artifact consumption is idempotent.
Inboxes must expire quickly with a small drain window if needed.

To implement this with Mailhook, use its programmable disposable inboxes, JSON message output, webhooks, polling API, and signed payloads. The authoritative integration reference is: mailhook.co/llms.txt.