Use Disposable Inboxes to Isolate Parallel Test Runs

Parallel CI makes hidden shared state obvious. A signup test that passes locally can fail when eight workers run at once, not because the product is broken, but because every worker is fighting over the same mailbox. One worker reads another worker’s OTP, a retry consumes a stale magic link, or cleanup removes a message that another test has not processed yet.

The clean fix is to treat email as a per-run resource. Instead of sharing one QA inbox, create a disposable inbox for each parallel test run, or more safely, for each test attempt that expects email. Mailhook is built for this pattern: create disposable inboxes via API, receive messages as structured JSON, consume them through real-time webhooks or polling, and keep the email step deterministic for automation and LLM agents. For exact API semantics, use Mailhook’s llms.txt integration reference.

Why shared inboxes break under parallel test runs

Modern test runners are designed to parallelize. Playwright, for example, runs tests in worker processes, and CI systems routinely shard suites across machines. That is great for speed, but it exposes any resource that was quietly global.

Email is one of the worst global resources in a test suite because it is asynchronous, retried by infrastructure, sometimes delayed, and often parsed with loose matchers. A shared mailbox has no built-in ownership model. It can tell you that an email arrived, but not which worker owns it unless you add a reliable routing layer.

Common failure patterns include stale selection, duplicate delivery, wrong-recipient reads, retry collisions, and destructive cleanup. If an automated agent is involved, a shared mailbox also increases the chance that the agent sees unrelated, untrusted email content.

Figure: A CI pipeline split into several parallel lanes, each connected to its own temporary email inbox and its own JSON message event.

The isolation rule: one inbox for the smallest unit of ownership

A disposable inbox gives each parallel execution its own email address and inbox identifier. The practical rule is simple: if two executions can run at the same time or be retried independently, they should not share an inbox.

For critical email flows such as signup verification, password reset, OTP login, and magic-link sign-in, the safest unit is usually one inbox per test attempt. A retry counts as a new attempt, because messages triggered by the first attempt can still arrive late. Reusing the same inbox across retries reintroduces stale message selection.

Inbox scope	Good for	Main risk	Recommendation
Shared QA mailbox	Manual debugging only	Max collisions and stale reads	Avoid for parallel CI
Per CI job	Coarse smoke tests	Workers inside the job can still collide	Use only for non-email assertions
Per worker	Sequential worker-owned flows	Test order can leak state	Acceptable for controlled fixtures
Per test run	Independent tests without retries	Retry can read a prior attempt’s email	Good when retries are disabled
Per test attempt	OTPs, magic links, signup, agents	More inboxes to manage	Best default for reliability

This shifts the task from “find the right message in a mailbox” to “read the message from the right inbox.” That distinction is what makes parallel test runs predictable.

Pass an inbox descriptor, not just an email string

The email address is only one part of the contract. Your harness should carry an inbox descriptor through the flow so every wait, webhook, log, and assertion can reference the same resource.

Field	Why it matters in parallel CI
email	The address used by the app under test
inbox_id	The stable handle used to retrieve messages from the correct inbox
run_id	Links the inbox to the CI pipeline, shard, or test file
attempt_id	Separates retries from earlier attempts
created_at	Helps debug timing and late arrivals
state	Lets cleanup distinguish active, draining, and closed inboxes
correlation_token	Optional extra matcher when your app can echo a test token

Store this descriptor in the test context, not in a global variable. When a test fails, log identifiers like run_id, attempt_id, and inbox_id rather than dumping raw email bodies into CI logs.
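As a minimal sketch, the descriptor can be a small immutable object built once per attempt. The field names come from the table above; the helper names (createInboxDescriptor, withState, describeForLogs) and the example values are illustrative assumptions, not part of any provider API.

```javascript
// Hypothetical helper: builds the descriptor described in the table above.
// Frozen so one test step cannot mutate another step's view of the inbox.
function createInboxDescriptor({ email, inboxId, runId, attemptId, correlationToken = null }) {
  return Object.freeze({
    email,
    inbox_id: inboxId,
    run_id: runId,
    attempt_id: attemptId,
    created_at: new Date().toISOString(),
    state: 'active', // one of: 'active', 'draining', 'closed'
    correlation_token: correlationToken,
  });
}

// State changes return a new frozen copy instead of mutating in place.
function withState(descriptor, state) {
  return Object.freeze({ ...descriptor, state });
}

// Identifiers only, so the line is safe to print in CI logs.
function describeForLogs({ run_id, attempt_id, inbox_id, state }) {
  return JSON.stringify({ run_id, attempt_id, inbox_id, state });
}

const descriptor = createInboxDescriptor({
  email: 'attempt-42@inbox.example.test',
  inboxId: 'inbox_123',
  runId: 'pipeline-9001',
  attemptId: 'pipeline-9001:signup.spec.js:retry-0',
});
console.log(describeForLogs(withState(descriptor, 'draining')));
```

Keeping the log helper separate from the descriptor makes it harder to leak the address or message content into CI output by accident.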

A parallel-safe workflow for disposable inboxes

A reliable workflow has five phases. The important detail is that inbox creation happens before the application sends the email, and every later operation is scoped to that inbox.

Provision the inbox: Create a disposable inbox through your email API and store the returned email address and inbox identifier in the test context.
Trigger the application event: Use the unique email address in the signup, login, password reset, or invite flow under test.
Wait deterministically: Prefer a webhook signal for fast arrival, with a bounded polling fallback for resilience.
Assert on structured JSON: Match by inbox_id first, then sender, subject intent, text content, and extracted artifact.
Consume and clean up: Use the OTP or verification URL once, record the artifact as consumed, and expire or stop using the inbox.

Avoid fixed sleeps. A 10-second sleep can be too short on a slow CI day and too long on a fast one. A deadline-based wait with webhooks or polling is both faster and more reliable.
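A deadline-based wait can be sketched as a polling loop with capped exponential backoff. This is an illustration under assumptions: fetchMessages stands in for whatever call lists messages for one inbox in your provider, and the delay values are arbitrary defaults.

```javascript
// Sleep helper used between polls.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls `fetchMessages` (a hypothetical function that returns an array of
// messages for ONE inbox) until `match` accepts a message or the deadline passes.
async function waitForMessage({
  fetchMessages,
  match,
  deadlineMs,
  initialDelayMs = 250,
  maxDelayMs = 2000,
}) {
  const deadline = Date.now() + deadlineMs;
  let delay = initialDelayMs;

  while (true) {
    const messages = await fetchMessages();
    const found = messages.find(match);
    if (found) return found; // returns as soon as the email is there

    const remaining = deadline - Date.now();
    if (remaining <= 0) {
      throw new Error(`No matching email within ${deadlineMs} ms`);
    }

    // Never sleep past the deadline; back off so slow days do not hammer the API.
    await sleep(Math.min(delay, remaining));
    delay = Math.min(delay * 2, maxDelayMs);
  }
}
```

Unlike a fixed sleep, this returns on the first poll that sees the message and fails with a clear error once the overall budget is spent.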

Provider-neutral implementation sketch

The exact API calls depend on your provider, so treat this as a harness shape rather than copy-paste code. For Mailhook endpoint details and payload fields, refer to the canonical llms.txt file.

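// Harness sketch: `mail`, `app`, `browser`, and the helper functions are
// placeholders supplied by your own test setup, not a specific provider API.
async function runEmailVerificationFlow(testInfo) {
  // One identifier per attempt: the retry index keeps retries in separate inboxes.
  const attemptId = [
    process.env.CI_PIPELINE_ID ?? 'local',
    testInfo.file,
    testInfo.title,
    `worker-${testInfo.workerIndex}`,
    `retry-${testInfo.retry}`
  ].join(':')

  // Provision the inbox before the app sends anything.
  const inbox = await mail.createInbox({
    label: attemptId
  })

  try {
    await app.signUp({
      email: inbox.email,
      name: `qa-${testInfo.workerIndex}`
    })

    // Scoped to this inbox only, with an overall deadline instead of a fixed sleep.
    const message = await waitForInboxMessage({
      inboxId: inbox.id,
      deadlineMs: 60000,
      match: (m) =>
        hasExpectedSender(m) &&
        mentionsVerificationIntent(m) &&
        containsVerificationArtifact(m)
    })

    // Email is untrusted input: validate the link host before navigating.
    const verificationUrl = extractVerificationUrl(message.text)
    await browser.goto(allowlisted(verificationUrl))
    await expectAccountVerified()
  } finally {
    // Runs on success and on failure, so a retry never reuses this inbox.
    await stopUsingInbox(inbox.id)
  }
}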

Two details matter more than the syntax. First, waitForInboxMessage only reads from one inbox. Second, extraction uses the structured message payload, preferably the text body and derived artifacts, rather than scraping a rendered mailbox UI.

Webhook-first, polling fallback in parallel suites

Webhooks are the best default for parallel test runs because they avoid a thundering herd of workers polling every few seconds. A webhook can receive a JSON email event, verify it, route it by inbox_id, and wake the waiting test.

Polling is still useful as a fallback. Network issues, webhook endpoint outages, or local test environments may make push delivery unavailable. The reliable pattern is not “webhook or polling” but “webhook first, with deadline-bounded polling as the safety net.”
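One way to combine the two paths is a small in-process router: the webhook handler calls deliver, the test calls waitFor, and a slow polling loop runs only until either path produces a message. This is a sketch under assumptions: createInboxRouter and pollOnce are hypothetical names, there is one waiter per inbox at a time, and a multi-machine suite would need a shared store instead of in-memory maps.

```javascript
function createInboxRouter() {
  const buffered = new Map(); // inboxId -> messages that arrived before anyone waited
  const waiters = new Map();  // inboxId -> callback of the test currently waiting

  // Called by the webhook handler after the payload has been verified.
  function deliver(inboxId, message) {
    const wake = waiters.get(inboxId);
    if (wake) return wake(message);
    const queue = buffered.get(inboxId) ?? [];
    queue.push(message);
    buffered.set(inboxId, queue);
  }

  // Resolves with the first message for this inbox, from either path.
  // `pollOnce(inboxId)` is a hypothetical fallback returning a message or null.
  function waitFor(inboxId, { pollOnce, deadlineMs, pollIntervalMs = 1000 }) {
    return new Promise((resolve, reject) => {
      let timer = null;
      let settled = false;
      const finish = (settle, value) => {
        if (settled) return;
        settled = true;
        clearTimeout(timer);
        waiters.delete(inboxId);
        settle(value);
      };

      const queue = buffered.get(inboxId);
      if (queue && queue.length > 0) return finish(resolve, queue.shift());

      waiters.set(inboxId, (message) => finish(resolve, message));
      const deadline = Date.now() + deadlineMs;

      const poll = async () => {
        let message = null;
        try {
          message = await pollOnce(inboxId);
        } catch {
          // Transient polling error: keep trying until the deadline.
        }
        if (settled) return; // the webhook won while we were polling
        if (message) return finish(resolve, message);
        if (Date.now() >= deadline) {
          return finish(reject, new Error(`Timed out waiting for inbox ${inboxId}`));
        }
        timer = setTimeout(poll, pollIntervalMs);
      };
      timer = setTimeout(poll, pollIntervalMs);
    });
  }

  return { deliver, waitFor };
}
```

Because the first poll is delayed by one interval, a healthy webhook path usually resolves the wait before any polling request is made.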

Delivery mode	Best role	Parallelism concern	Guardrail
Webhook	Primary low-latency signal	Spoofing, replay, duplicate delivery	Verify signed payloads, dedupe, route by inbox_id
Polling	Fallback and local development	Excess load and repeated reads	Use per-inbox cursors, backoff, and an overall deadline
Batch processing	High-throughput suites	Overmatching across many messages	Partition by inbox_id and process idempotently

With Mailhook, you can receive email through real-time webhooks or a polling API, and signed payloads help verify webhook authenticity before processing. Your handler should acknowledge quickly, store the normalized JSON, and process assertions asynchronously when possible.
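The verify-then-route step might look like the sketch below. The signature scheme here is an assumption made for illustration (hex-encoded HMAC-SHA256 over the raw request body with a shared secret), as are the inbox_id payload field and the handleWebhook shape; check the provider's reference for the real header name and algorithm before relying on it.

```javascript
const crypto = require('crypto');

// ASSUMPTION: signature = hex(HMAC-SHA256(secret, rawBody)).
// The real scheme is provider-specific; adjust to match its documentation.
function isValidSignature(rawBody, signatureHex, secret) {
  if (typeof signatureHex !== 'string') return false;
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && crypto.timingSafeEqual(received, expected);
}

// Verify BEFORE parsing, then hand the event to whatever routes by inbox.
function handleWebhook({ rawBody, signature, secret, onMessage }) {
  if (!isValidSignature(rawBody, signature, secret)) {
    return { status: 401 };
  }
  const event = JSON.parse(rawBody);
  onMessage(event.inbox_id, event); // `inbox_id` is an assumed payload field
  return { status: 200 }; // acknowledge quickly; heavy work happens elsewhere
}
```

Verifying against the raw body, not a re-serialized object, matters because any change in whitespace or key order would alter the digest.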

Dedupe is still required, even with isolated inboxes

Disposable inboxes remove cross-worker collisions, but they do not remove the need for idempotency. Email infrastructure and webhooks are commonly designed around at-least-once delivery, which means your code should tolerate duplicates.

Separate dedupe into four layers. At the delivery layer, dedupe webhook attempts so the same HTTP delivery is not processed twice. At the message layer, dedupe the same email if it is fetched by both webhook and polling. At the artifact layer, dedupe the same OTP or verification URL so it is consumed once. At the attempt layer, key each artifact type by attempt_id so a retry never acts on an artifact from an earlier attempt.

Layer	Example key	What it prevents
Delivery	delivery_id or webhook event id	Reprocessing the same webhook attempt
Message	inbox_id plus message_id	Processing the same email from webhook and polling
Artifact	inbox_id plus artifact hash	Submitting the same OTP or link twice
Attempt	attempt_id plus artifact type	Retry confusion and stale verification

The goal is not to make duplicates impossible. The goal is to make duplicates harmless.
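The delivery, message, and artifact rows of the table can be sketched with a single keyed set. The event field names (delivery_id, inbox_id, message_id, otp) and the function names are assumptions for illustration, and the in-memory Set would need to become a shared store if several processes handle the same inbox.

```javascript
const crypto = require('crypto');

// Returns a function that is true the first time a key is seen, false after.
function createDeduper() {
  const seen = new Set();
  return function firstTime(layer, ...parts) {
    const key = [layer, ...parts].join('|');
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  };
}

// Store a hash rather than the raw OTP or URL in the dedupe key.
const artifactHash = (artifact) =>
  crypto.createHash('sha256').update(artifact).digest('hex');

// `event` may come from a webhook (has delivery_id) or from polling (does not).
function processEmailEvent(firstTime, event, consume) {
  if (event.delivery_id && !firstTime('delivery', event.delivery_id)) {
    return 'duplicate-delivery';
  }
  if (!firstTime('message', event.inbox_id, event.message_id)) {
    return 'duplicate-message';
  }
  if (!firstTime('artifact', event.inbox_id, artifactHash(event.otp))) {
    return 'duplicate-artifact';
  }
  consume(event.otp); // reached at most once per distinct artifact in an inbox
  return 'consumed';
}
```

Each check makes a different kind of repeat harmless, so a webhook redelivery, a poll that refetches the same email, and a resent email carrying the same code all end without a second submission.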

Match narrowly, but let isolation do most of the work

In a shared mailbox, teams often build complicated filters to identify the right email. They match subject lines, timestamps, sender addresses, body text, and sometimes brittle HTML selectors. This complexity is a symptom of missing isolation.

With a disposable inbox per attempt, the first and strongest filter is the inbox_id. After that, use narrow intent checks: expected sender, expected flow type, and the presence of a valid artifact. If your application can include a correlation token in the email body, subject, or metadata, use it as an additional guard, but do not depend on it as the only isolation mechanism.
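A narrow matcher built this way stays short. The sketch below assumes message fields named inbox_id, from, subject, and text, and a hypothetical buildMatcher helper; adapt the names to the payload your provider actually returns.

```javascript
// Builds a predicate for one attempt. Message field names are assumptions.
function buildMatcher({ inboxId, expectedSender, intentPattern, correlationToken = null }) {
  return (message) =>
    // Isolation first: anything from another inbox is rejected outright.
    message.inbox_id === inboxId &&
    message.from.toLowerCase() === expectedSender.toLowerCase() &&
    // Use a pattern without the `g` flag, which would make test() stateful.
    intentPattern.test(message.subject) &&
    // Optional extra guard, never the only isolation mechanism.
    (correlationToken === null || message.text.includes(correlationToken));
}

const isVerificationEmail = buildMatcher({
  inboxId: 'inbox_123',
  expectedSender: 'no-reply@app.example.test',
  intentPattern: /verify your email/i,
  correlationToken: 'tok_abc',
});

console.log(
  isVerificationEmail({
    inbox_id: 'inbox_123',
    from: 'No-Reply@app.example.test',
    subject: 'Please verify your email',
    text: 'Ref tok_abc. Open the link to continue.',
  })
);
```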

For OTP extraction, prefer text/plain content when available. For magic links, validate the destination host against an allowlist before navigation. An email is untrusted input, even in test automation.
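Both extraction rules fit in a few lines. This sketch assumes a six-digit numeric OTP and an HTTPS-only link policy, which are choices made for the example; the function names and hosts are hypothetical.

```javascript
// Extracts a standalone six-digit code from the plain-text body, or null.
// ASSUMPTION: OTPs are exactly six digits; adjust the pattern to your app.
function extractOtp(text) {
  const match = /\b(\d{6})\b/.exec(text);
  return match ? match[1] : null;
}

// Returns the first HTTPS link whose host is exactly on the allowlist, or null.
function extractAllowlistedUrl(text, allowedHosts) {
  const candidates = text.match(/https:\/\/[^\s<>"']+/g) ?? [];
  for (const candidate of candidates) {
    let url;
    try {
      url = new URL(candidate);
    } catch {
      continue; // not a parseable URL, skip it
    }
    // Compare the parsed hostname, never a substring of the raw string.
    if (url.protocol === 'https:' && allowedHosts.includes(url.hostname)) {
      return url.href;
    }
  }
  return null;
}

console.log(extractOtp('Your code is 482913. It expires in 10 minutes.'));
console.log(
  extractAllowlistedUrl(
    'Confirm here: https://app.example.test/verify?token=abc',
    ['app.example.test']
  )
);
```

Checking the parsed hostname is what defeats lookalike links such as a trusted name used as a subdomain prefix or placed in the userinfo part of the URL.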

Make failures easy to debug in CI

Isolated inboxes also improve observability. When a parallel run fails, you should be able to answer four questions quickly: which inbox was used, whether the app sent the email, whether the email arrived, and which artifact was extracted.

A useful CI artifact is a redacted JSON record containing the inbox_id, attempt_id, received_at timestamp, sender, subject, message identifiers, and extraction result. Avoid storing full raw HTML or secrets unless your retention and access controls are designed for that data.

Symptom	Likely cause	What to log
Timeout waiting for email	App did not send, routing failed, or provider delay	inbox_id, email, attempt_id, deadline, send timestamp
Wrong OTP submitted	Reused inbox or stale artifact	attempt_id, message_id, artifact hash, consumed_at
Duplicate webhook processing	Missing idempotency key	delivery id, message id, handler status
Works locally, fails in CI	Parallelism or fixed sleeps	worker id, retry index, timing between trigger and receive
Agent took unsafe action	Raw email exposed to model	minimized view, selected artifact, URL validation result

These logs turn email failures from vague flakes into actionable defects.
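A redacted record along the lines described in this section could be produced as follows. The message field names and the toRedactedRecord helper are assumptions; the point is that bodies and raw artifacts are left out by construction instead of being filtered afterwards.

```javascript
const crypto = require('crypto');

// Builds the CI artifact from identifiers and metadata only.
// The body (text/html) and the raw OTP or URL are never copied into the record.
function toRedactedRecord(descriptor, message, artifact) {
  return {
    inbox_id: descriptor.inbox_id,
    attempt_id: descriptor.attempt_id,
    received_at: message.received_at,
    sender: message.from,
    subject: message.subject,
    message_id: message.message_id,
    extraction: artifact
      ? {
          found: true,
          // Truncated hash for correlating log lines. For short numeric codes
          // this is not a secrecy guarantee, so rely on fast code expiry too.
          artifact_hash: crypto.createHash('sha256').update(artifact).digest('hex').slice(0, 12),
        }
      : { found: false },
  };
}

const record = toRedactedRecord(
  { inbox_id: 'inbox_123', attempt_id: 'pipeline-9001:signup.spec.js:retry-0' },
  {
    received_at: '2024-01-01T00:00:00Z',
    from: 'no-reply@app.example.test',
    subject: 'Your sign-in code',
    message_id: 'msg_1',
    text: 'Your code is 482913',
    html: '<p>Your code is 482913</p>',
  },
  '482913'
);
console.log(JSON.stringify(record, null, 2));
```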

LLM agents need inbox isolation even more than tests do

For LLM agents, shared inboxes are not just flaky. They expand the agent’s input surface. If an agent can read unrelated emails, it may see stale instructions, malicious content, or secrets that were never meant for the current task.

Use disposable inboxes as a tool boundary. A safe agent-facing interface can be small: create an inbox, wait for a message in that inbox, extract a typed artifact, and close the inbox. The agent should receive the minimum useful result, such as an OTP or allowlisted verification URL, not an entire mailbox.

Mailhook’s structured JSON email output fits this model because your orchestration layer can filter and minimize the message before exposing anything to the model. Signed webhooks and polling fallback help keep the ingestion layer reliable without asking the agent to reason about mailbox state.
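The small agent-facing interface described in this section might be wrapped like this. Everything here is a sketch under assumptions: provider is a hypothetical adapter exposing createInbox, waitForMessage, and closeInbox, and the six-digit OTP pattern is an example policy. The orchestration layer reads the message; the model only ever sees the typed result.

```javascript
// `provider` is a hypothetical adapter around your real email API client.
function createEmailTool(provider) {
  const owned = new Set(); // inbox ids created through this tool instance

  // The agent can only touch inboxes it opened for the current task.
  const assertOwned = (inboxId) => {
    if (!owned.has(inboxId)) throw new Error(`Unknown inbox: ${inboxId}`);
  };

  return {
    async openInbox() {
      const inbox = await provider.createInbox();
      owned.add(inbox.id);
      return { inbox_id: inbox.id, email: inbox.email };
    },

    // Returns a typed artifact only; the message body never reaches the agent.
    async waitForOtp(inboxId, deadlineMs) {
      assertOwned(inboxId);
      const message = await provider.waitForMessage(inboxId, deadlineMs);
      const match = message ? /\b(\d{6})\b/.exec(message.text) : null;
      return match ? { type: 'otp', value: match[1] } : { type: 'none' };
    },

    async closeInbox(inboxId) {
      assertOwned(inboxId);
      owned.delete(inboxId);
      await provider.closeInbox(inboxId);
    },
  };
}
```

Because the tool returns `{ type, value }` and nothing else, instructions embedded in an email body have no path into the model's context.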

Where Mailhook fits

Mailhook provides programmable, disposable email inboxes through a RESTful API. Received emails can be delivered as structured JSON, pushed through real-time webhooks, or retrieved through polling. The platform also supports instant shared domains, custom domain support, signed payloads for security, and batch email processing.

For parallel test runs, those primitives map directly to the isolation pattern:

Create a disposable inbox for each test attempt that needs email.
Use the returned address in the app under test and keep the inbox_id in your test context.
Receive email as JSON through webhooks, with polling as a fallback.
Verify signed webhook payloads before processing.
Dedupe messages and extracted artifacts so retries are safe.
Use shared domains for quick setup or custom domains when you need allowlisting, governance, or environment separation.

If you are designing an agent or test harness, keep the integration provider-neutral at the interface level, then use Mailhook’s llms.txt as the implementation reference for Mailhook-specific details.

Frequently Asked Questions

Is one disposable inbox per CI worker enough? Sometimes, but it is not the safest default. If a worker runs multiple email-dependent tests or retries a failed test, messages can still collide. One inbox per test attempt gives the strongest isolation.

Should I use disposable inboxes for every test? No. For validation-only unit tests that should never receive mail, use reserved non-routable domains, such as names under the .test or .invalid top-level domains reserved by RFC 2606. Use disposable inboxes for end-to-end flows where the application must send and your test must receive a real email.

Are webhooks better than polling for parallel test runs? Webhooks are usually better as the primary path because they reduce latency and avoid many workers polling at once. Polling is still valuable as a fallback when webhook delivery is unavailable or delayed.

How do custom domains help with isolated test runs? Custom domains can help when your application or identity provider requires allowlisted domains, when you need environment-specific routing, or when you want stronger governance. For fast setup, shared domains are often enough.

Can LLM agents safely use disposable inboxes? Yes, if the inbox is isolated per task, webhook payloads are verified, message content is treated as untrusted, and the agent only receives a minimized artifact such as an OTP or validated link.

Make parallel email tests boring

Email-dependent tests should not require a human mailbox, global cleanup scripts, or lucky timing. By creating disposable inboxes for each parallel test attempt, you isolate state, simplify matching, and make CI failures easier to explain.

Start with Mailhook to create disposable inboxes via API and receive emails as JSON, with webhooks, polling, signed payloads, shared domains, and custom domain support available for automation workflows. For implementation details, keep the Mailhook llms.txt reference close to your test harness.