How to Route Emails to the Right Test Inbox

Email routing failures rarely look like routing failures at first. A signup test times out. A password reset flow reads yesterday’s OTP. An LLM agent follows the wrong magic link. In many of these cases, the email did arrive, but the automation could not prove which test inbox owned it.

The fix is to treat routing as a first-class contract. A test email should not disappear into a shared mailbox where code searches by subject line and timestamp. It should be addressed to a specific, disposable inbox resource, delivered as structured data, matched with narrow rules, and consumed once.

For CI, QA automation, and agent workflows, the rule is simple: route by inbox resource first, then match on message content.

Define the right test inbox before you send anything

A test inbox is not just an email address. The address is the public delivery coordinate. The inbox is the resource your test harness or agent can read from.

A reliable harness should create or reserve the inbox before triggering the product action that sends the email. Store the returned descriptor in your test context, then pass the email address into the system under test. From that point onward, every wait, webhook handler, poller, matcher, and assertion should use the inbox identity as the primary key.

Field	Why it matters
`run_id`	Ties the inbox to one CI run, job, or agent session
`test_id`	Identifies the scenario, such as signup, login, or password reset
`attempt_id`	Separates retries so stale emails cannot satisfy new attempts
`inbox_id`	The stable resource your automation reads from
`email`	The routable address used by the application under test
`created_at` and `expires_at`	Bound message selection and cleanup windows

If your test only stores the email string, you will eventually rebuild inbox identity with fragile clues: local-part patterns, timestamps, subject lines, or sender addresses. Those clues are useful as secondary matchers, not as the routing foundation.
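As a minimal sketch, the descriptor can be built once from the provider response and stored unchanged for the rest of the attempt. The helper name `buildInboxDescriptor` is hypothetical, and the response fields (`inbox_id`, `email`, `created_at`, `expires_at`) are assumed to match the table above; adapt them to your provider's actual response.

```javascript
// Hypothetical helper: builds the descriptor a harness stores per attempt.
// Field names on `inbox` are assumptions; adapt them to your provider's response.
function buildInboxDescriptor(ctx, inbox) {
  const required = ['inbox_id', 'email', 'created_at', 'expires_at']
  for (const key of required) {
    if (!inbox[key]) throw new Error(`inbox response is missing ${key}`)
  }
  // Frozen so later steps cannot silently swap the inbox identity.
  return Object.freeze({
    runId: ctx.runId,
    testId: ctx.testId,
    attemptId: ctx.attemptId,
    inboxId: inbox.inbox_id,
    email: inbox.email,
    createdAt: new Date(inbox.created_at).toISOString(),
    expiresAt: new Date(inbox.expires_at).toISOString()
  })
}
```

Failing fast on a missing field keeps a half-formed descriptor from reaching the matcher, where it would be much harder to diagnose.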

Understand the three layers of email routing

Email routing has three layers, and each can fail independently. At the SMTP layer, mail is delivered to the envelope recipient (the RCPT TO address), which may differ from the visible To header. The distinction is part of SMTP itself, as described in RFC 5321. Automation that routes only from visible headers can misclassify forwarded, BCC, alias, or rewritten messages.

Layer	Routing question	Common failure	Guardrail
Domain routing	Does this domain accept mail for testing?	MX records point to the wrong provider, or the sender blocks the domain	Use a verified shared domain or a correctly configured custom subdomain
Recipient mapping	Which inbox owns this recipient address?	Two tests share an address, or a catch-all swallows a typo	Create one disposable inbox per attempt or use strict recipient keys
Delivery to code	Which consumer receives the message event or JSON record?	A webhook handler routes by header instead of inbox ID	Verify payloads, dedupe deliveries, and enqueue by `inbox_id`

A correct route means all three layers agree: the domain accepts the email, the recipient maps to the intended inbox, and your automation consumes the message from that inbox only.
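The recipient-mapping layer can be sketched as a lookup keyed by the envelope recipient rather than the visible header. Everything here is illustrative: `resolveOwningInbox`, the `envelopeTo` field, and the `Map`-based table are assumptions, and lowercasing the address assumes your test domain treats local-parts case-insensitively.

```javascript
// Hypothetical lookup: maps an SMTP envelope recipient to the inbox that owns it.
// `envelopeTo` is an assumed field name, not a provider contract.
function resolveOwningInbox(message, recipientTable) {
  // Use the envelope recipient only; never fall back to the visible To header.
  const recipient = String(message.envelopeTo || '').trim().toLowerCase()
  if (!recipient) return { inboxId: null, reason: 'no envelope recipient exposed' }

  const inboxId = recipientTable.get(recipient)
  if (!inboxId) return { inboxId: null, reason: `no inbox owns ${recipient}` }

  return { inboxId, reason: 'matched envelope recipient' }
}
```

Returning a reason alongside the result makes unowned recipients visible in logs instead of letting them fall through to a guess.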

Use one inbox per test attempt

The most dependable pattern is one inbox per attempt. An attempt is a single execution of one email-dependent step. If CI retries the step, create a new inbox. If an LLM agent restarts a verification action, create a new inbox. If the same test runs in 20 parallel workers, each worker gets its own inbox.

This avoids the two hardest routing bugs: stale selection and parallel races. Stale selection happens when a new attempt accidentally reads an old email. Parallel races happen when two tests share a mailbox and one test consumes the other’s message.

Here is provider-agnostic pseudocode for the pattern. Adapt the exact request and response fields to your inbox provider’s API contract. For Mailhook, use the canonical Mailhook llms.txt integration reference for current endpoint details.

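// Provider-agnostic pseudocode: testInboxProvider, app, and the helper
// functions are placeholders for your own harness code.
async function runSignupAttempt(ctx) {
  // Create a fresh inbox for this attempt before triggering any email.
  const inbox = await testInboxProvider.createInbox()

  // Persist the descriptor so every later step is keyed by inbox ID.
  recordInboxDescriptor({
    runId: ctx.runId,
    testId: 'signup-verification',
    attemptId: ctx.attemptId,
    inboxId: inbox.inbox_id,
    email: inbox.email,
    createdAt: inbox.created_at,
    expiresAt: inbox.expires_at
  })

  // Trigger the product action with the inbox's routable address.
  await app.signup({ email: inbox.email })

  // Wait on this inbox only, with an explicit deadline and a narrow matcher.
  const message = await waitForEmail({
    inboxId: inbox.inbox_id,
    deadlineMs: 90000,
    match: msg => isExpectedSignupEmail(msg, ctx)
  })

  if (!message) {
    throw new Error('verification email not received')
  }

  // Extract the OTP or link, then mark it consumed so a retry cannot reuse it.
  const artifact = extractVerificationArtifact(message)
  await consumeArtifactOnce({ attemptId: ctx.attemptId, artifact })

  return artifact
}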

Notice what the code does not do. It does not search a shared mailbox for the newest matching subject. It does not assume the To header is enough. It does not let a retry reuse the previous inbox.
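One possible shape for the `waitForEmail` helper used above is a deadline-bounded poll. This is a sketch under assumptions: `listMessages` is a hypothetical function you supply that resolves to the messages currently in one inbox, and it is passed in explicitly here (unlike the call above) so the helper stays self-contained.

```javascript
// Sketch of a polling waitForEmail. `listMessages` is a hypothetical function:
// given an inbox ID, it resolves to an array of that inbox's messages.
async function waitForEmail({ inboxId, deadlineMs, match, listMessages, intervalMs = 1000 }) {
  const deadline = Date.now() + deadlineMs
  while (true) {
    // Only this inbox is ever queried; there is no cross-inbox search.
    const messages = await listMessages(inboxId)
    const found = messages.find(match)
    if (found) return found
    // Stop if another sleep would overshoot the deadline.
    if (Date.now() + intervalMs > deadline) return null
    await new Promise(resolve => setTimeout(resolve, intervalMs))
  }
}
```

Returning `null` on timeout, rather than throwing, lets the caller decide how to report the failure, which matches the `if (!message)` check in the attempt function.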

Choose the right recipient mapping pattern

There are several ways to map an incoming recipient address to a logical test inbox. The best choice depends on scale, domain control, and compatibility with the system you are testing.

Pattern	Best use case	Routing strength	Risk to manage
API-created disposable inbox	CI, QA automation, LLM agents, signup verification	High, because the provider creates an inbox and address together	Store the descriptor and expire it deliberately
Encoded local-part	Custom-domain automation where the recipient can contain a compact routing key	High when the encoding is strict and collision-resistant	Keep length, allowed characters, and normalization rules conservative
Alias table	Legacy systems that require predictable recipient names	Medium to high, depending on table consistency	Alias lifecycle and cleanup must stay in sync with tests
Constrained catch-all	Exploratory testing or migrations from older routing designs	Medium, useful as a fallback	Typos and unexpected recipients can enter the pipeline
Plus-addressing on a shared mailbox	Lightweight manual testing	Low for parallel CI	Many providers normalize or expose plus tags differently, and messages are not isolated

For automated test inboxes, API-created disposable inboxes should be the default. Encoded local-parts and aliases are useful when you operate a custom domain and need address shapes that match product or vendor constraints. Catch-all routing should be constrained with strict local-part validation and aggressive logging.
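For the encoded local-part pattern, a strict grammar might look like the sketch below. The `t-<run>-<attempt>-<nonce>` layout, the length caps, and both function names are assumptions chosen for illustration; the caps keep the local-part well under the 64-character limit in RFC 5321.

```javascript
// Illustrative grammar: t-<run>-<attempt>-<8 hex chars>, lowercase only.
// The prefix, separators, and lengths are assumptions chosen for this sketch.
const LOCAL_PART = /^t-([a-z0-9]{1,16})-([a-z0-9]{1,16})-([0-9a-f]{8})$/

function encodeLocalPart(runId, attemptId, nonce) {
  const localPart = `t-${runId}-${attemptId}-${nonce}`
  if (!LOCAL_PART.test(localPart)) throw new Error(`invalid routing key: ${localPart}`)
  return localPart
}

function parseRecipient(address, expectedDomain) {
  const at = address.lastIndexOf('@')
  if (at < 1) return null
  const domain = address.slice(at + 1).toLowerCase()
  if (domain !== expectedDomain) return null
  const match = LOCAL_PART.exec(address.slice(0, at).toLowerCase())
  if (!match) return null // reject and log instead of guessing
  return { runId: match[1], attemptId: match[2], nonce: match[3] }
}
```

Because the same pattern validates on the way out and parses on the way in, a typo or an unexpected recipient fails closed instead of mapping to the nearest inbox.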

If you want a deeper comparison of keys, aliases, and catch-all routing, see Mailhook’s guide to routing emails to inboxes.

Match messages inside the inbox, not across all mail

After routing gets the email into the correct inbox, your matcher still needs to pick the correct message. This matters because senders retry, users click resend, SMTP can duplicate deliveries, and webhook systems often use at-least-once delivery.

Use a layered matcher. Start with the inbox, then narrow by recipient and intent. Only then inspect body text or artifacts.

Signal	Trust level	Use it for
`inbox_id`	Highest	Primary routing boundary
Provider delivery ID	High	Webhook dedupe and replay protection
Provider message ID	High	Message-level idempotency
Envelope recipient	High when exposed by provider	Confirming actual SMTP recipient
Visible `To` header	Medium to low	Debugging and secondary confirmation
Sender domain	Medium	Intent matching after inbox isolation
Subject line	Medium to low	Template hint, not the primary key
OTP or magic link	High after extraction	Final artifact, consumed once

A good matcher usually asks: is this message in the expected inbox, received after the attempt started, from the expected sender or flow, and does it contain the expected artifact type? A bad matcher asks: is this the latest email with a subject containing “verify”?
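Those four questions translate into a short layered check. The sketch below assumes message fields named `inboxId`, `receivedAt`, `from` (a bare address), and `text`, plus a hypothetical `expected` object; none of these names come from a specific provider.

```javascript
// Layered matcher sketch. Message field names and the `expected` shape are
// assumptions for illustration; `from` is assumed to be a bare address.
function isExpectedMessage(msg, expected) {
  // 1. Inbox boundary: anything outside the expected inbox is rejected outright.
  if (msg.inboxId !== expected.inboxId) return false

  // 2. Freshness: missing or earlier timestamps fail, so stale mail cannot match.
  const received = Date.parse(msg.receivedAt)
  if (!(received >= Date.parse(expected.startedAt))) return false

  // 3. Intent: the sender domain must be the one this flow sends from.
  const senderDomain = String(msg.from || '').split('@').pop().toLowerCase()
  if (senderDomain !== expected.senderDomain) return false

  // 4. Artifact: only now inspect the body for the expected artifact type.
  return expected.artifactPattern.test(String(msg.text || ''))
}
```

Pass a non-global regular expression as `artifactPattern`; a pattern with the `g` flag keeps state between calls to `test` and would give inconsistent results.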

Route webhook events by inbox ID

Webhooks are ideal for low-latency email testing because your code is notified as soon as the message arrives. They also introduce a new routing surface: your webhook receiver.

A safe webhook receiver should verify the signed payload before processing, dedupe repeated deliveries, and enqueue work by inbox_id. It should not trust unverified headers, and it should not route by visible To unless that value has already been normalized and tied to the provider’s recipient mapping.

The usual flow is straightforward. Receive the raw HTTP request, verify the signature over the raw body, reject stale timestamps if your provider includes them, record the delivery ID for replay protection, then enqueue a small job keyed by inbox ID. Acknowledge quickly and process the JSON message asynchronously.
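The flow above can be sketched with Node's built-in `crypto` module. The signing scheme is an assumption, not Mailhook's documented format: this example pretends the provider signs `timestamp.rawBody` with HMAC-SHA256, sends a hex digest, and uses a Unix timestamp in seconds. The payload fields `delivery_id`, `inbox_id`, and `message_id` are likewise illustrative.

```javascript
const { createHmac, timingSafeEqual } = require('node:crypto')

// Assumed scheme for this sketch: the provider signs `${timestamp}.${rawBody}`
// with HMAC-SHA256 and sends a hex digest. Your provider's scheme may differ.
function verifySignature({ rawBody, timestamp, signatureHex, secret, nowMs = Date.now(), toleranceMs = 300000 }) {
  const ageMs = Math.abs(nowMs - Number(timestamp) * 1000)
  if (!(ageMs <= toleranceMs)) return false // stale or malformed timestamp
  const expected = createHmac('sha256', secret).update(`${timestamp}.${rawBody}`).digest()
  const given = Buffer.from(String(signatureHex), 'hex')
  // Length check first: timingSafeEqual throws on buffers of unequal length.
  return given.length === expected.length && timingSafeEqual(given, expected)
}

// In-memory for the sketch; parallel CI workers would need a shared store.
const seenDeliveries = new Set()

function handleWebhook(request, secret, enqueue) {
  if (!verifySignature({ ...request, secret })) return { status: 401 }
  const event = JSON.parse(request.rawBody) // parse only after verification
  if (seenDeliveries.has(event.delivery_id)) return { status: 200, duplicate: true }
  seenDeliveries.add(event.delivery_id)
  enqueue({ inboxId: event.inbox_id, messageId: event.message_id }) // keyed by inbox ID
  return { status: 200, duplicate: false }
}
```

A duplicate delivery still returns 200 so the provider stops retrying, but nothing is enqueued a second time.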

Mailhook supports real-time webhook notifications and signed payloads, which makes this pattern practical for CI runners and agent backends. If your environment cannot accept inbound webhooks, use a polling API as a fallback with the same inbox ID and matcher rules. For webhook hardening details, read the Mailhook guide on verifying webhook payload authenticity.

Decide whether shared or custom domains change the route

Domain choice affects the first routing layer, not the whole test design.

Shared provider domains are useful when you want to start quickly. The provider gives you routable addresses, and your test harness can focus on creating inboxes, receiving JSON, and asserting artifacts.

Custom domains or subdomains are useful when your organization needs allowlisting, environment separation, domain-level auditability, or compatibility with third-party systems that distrust generic temp domains. A common layout is one subdomain per environment, such as ci.example.com or staging-mail.example.com, with MX records pointed to the inbound provider.

Regardless of domain strategy, keep domain selection in configuration. Your agent or test should not need different routing logic for shared versus custom domains. It should still create or select an inbox, use the returned email address, wait on the inbox ID, and consume the artifact once.
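One way to keep domain selection in configuration is a small per-environment table. The config shape, the `inboxRequestFor` helper, and the idea of passing a `domain` parameter when creating an inbox are all assumptions for this sketch; check your provider's reference for how a custom domain is actually selected.

```javascript
// Hypothetical config shape: the domain is data, the routing logic is shared.
const DOMAIN_CONFIG = {
  local: { kind: 'shared' },
  ci: { kind: 'custom', domain: 'ci.example.com' },
  staging: { kind: 'custom', domain: 'staging-mail.example.com' }
}

function inboxRequestFor(environment) {
  const config = DOMAIN_CONFIG[environment]
  if (!config) throw new Error(`no mail domain configured for ${environment}`)
  // Shared domains need no extra parameter in this sketch; custom ones name the domain.
  return config.kind === 'custom' ? { domain: config.domain } : {}
}
```

The test code that creates the inbox, waits on its ID, and consumes the artifact stays identical across environments; only this lookup changes.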

Mailhook offers both instant shared domains and custom domains, so teams can start with shared domains and move specific suites or integrations to owned domains when operational needs appear.

Common misroutes and deterministic fixes

When email tests flake, inspect routing before adding longer sleeps. Longer sleeps hide races. They do not fix ownership.

Symptom	Likely cause	Deterministic fix
Test reads an old OTP	Inbox reused across attempts	Create a fresh inbox per retry and reject messages received before `created_at`
Two parallel tests read the same email	Shared mailbox or shared recipient	Use one disposable inbox per worker, test, or attempt
Email appears in logs but not in the expected inbox	Header recipient differs from envelope recipient	Route by provider recipient mapping and inspect envelope data if available
Webhook processes the same message twice	At-least-once delivery or retry	Dedupe by delivery ID and message ID
Catch-all domain captures unexpected addresses	Typo or overly broad local-part policy	Enforce a strict recipient grammar and log rejects
LLM agent clicks the wrong link	Agent saw too much raw email or multiple candidates	Expose only a minimized artifact view and require inbox-scoped selection
Custom domain works locally but not in CI	Environment uses a different domain or stale DNS	Treat domain as config and add a domain smoke test

A useful debugging log includes run_id, test_id, attempt_id, inbox_id, recipient, provider message ID, delivery ID, sender, received timestamp, and artifact hash. It should not include full OTPs, secrets, or unnecessary raw body content unless your retention policy explicitly allows it.
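A log record along those lines might be assembled as below. The function name, the message field names, and the `logKey` parameter are assumptions. The sketch uses a keyed HMAC rather than a plain hash because a six-digit OTP has so few possible values that an unkeyed hash could be reversed by brute force.

```javascript
const { createHmac } = require('node:crypto')

// Builds a log record that identifies the artifact without revealing it.
// Message field names and `logKey` are assumptions for this sketch.
function routingLogRecord(descriptor, message, artifact, logKey) {
  return {
    run_id: descriptor.runId,
    test_id: descriptor.testId,
    attempt_id: descriptor.attemptId,
    inbox_id: descriptor.inboxId,
    recipient: message.recipient,
    message_id: message.messageId,
    delivery_id: message.deliveryId,
    sender: message.from,
    received_at: message.receivedAt,
    // Keyed digest prefix: enough to correlate log lines, never the OTP itself.
    artifact_hash: createHmac('sha256', logKey).update(String(artifact)).digest('hex').slice(0, 12)
  }
}
```

Two log lines with the same `artifact_hash` point at the same OTP or link, which is usually all a debugging session needs.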

Keep LLM agents out of the routing business

LLM agents should not be asked to browse a mailbox and decide which message looks right. That is a prompt problem waiting to become a security problem.

Give the agent small deterministic tools instead:

`create_test_inbox` returns an email address and inbox ID.
`wait_for_message` accepts an inbox ID, intent, and deadline.
`extract_verification_artifact` returns only the OTP or approved URL.
`expire_test_inbox` closes or schedules cleanup after the attempt.

The model can decide when to request an inbox or submit a code, but your tool layer should enforce routing, deadlines, dedupe, signature verification, and URL validation. Inbound email is untrusted input. A malicious or unexpected email can contain prompt injection, tracking links, misleading text, or HTML designed for humans. Agents should see the smallest safe JSON view needed to complete the task.

This is where structured JSON emails matter. Instead of asking an agent to scrape a rendered email, normalize the message, extract typed artifacts, and pass only the relevant fields into the agent step.
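A minimized artifact view could be produced by a tool-layer function like the following sketch. The function name, the `text` field, the six-digit OTP format, and the host allowlist are assumptions for illustration; the point is that the agent receives a small typed result and never the raw body.

```javascript
// Sketch of an extraction tool for agents. The message shape, the 6-digit OTP
// format, and the allowlist are assumptions for illustration.
function extractArtifactForAgent(message, { kind, allowedHosts = [] }) {
  const text = String(message.text || '')

  if (kind === 'otp') {
    const otp = text.match(/\b\d{6}\b/)
    return otp ? { type: 'otp', value: otp[0] } : { type: 'none' }
  }

  // For links, accept only HTTPS URLs whose host is explicitly allowlisted.
  for (const candidate of text.match(/https:\/\/[^\s<>"')]+/g) || []) {
    let url
    try { url = new URL(candidate) } catch { continue }
    if (allowedHosts.includes(url.hostname)) return { type: 'url', value: url.href }
  }
  return { type: 'none' } // the agent never sees the raw body
}
```

Even if an email contains injected instructions or an unexpected link, the only values that can reach the model are an OTP-shaped string or a URL on a host you approved.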

How Mailhook fits the routing pattern

Mailhook is built around programmable temp inboxes for automation. You can create disposable inboxes via API, receive emails as structured JSON, and consume arrivals through real-time webhooks or polling. For routing correctness, that means your test harness can work with inbox resources rather than shared human mailboxes.

A typical Mailhook-style flow looks like this: create a disposable inbox through the REST API, pass the returned email address into the system under test, receive the message as structured JSON, verify signed webhook payloads when using webhooks, match within the inbox, extract the OTP or magic link, and clean up according to your lifecycle policy.

For larger suites, batch email processing can help reduce operational overhead. For domain strategy, instant shared domains are useful for fast setup, while custom domains help with allowlisting and environment-specific routing.

For exact API shapes, supported fields, and integration semantics, use the canonical Mailhook llms.txt file. It is the best reference when wiring Mailhook into agents, CI harnesses, or QA automation.

Frequently Asked Questions

What is the safest way to route emails to the right test inbox?
Create one disposable inbox per test attempt and treat the returned inbox_id as the primary routing key. Use the email address only as the transport address passed into the product flow.

Should I use plus-addressing for test inbox routing?
Plus-addressing can help with light manual correlation, but it is not strong isolation. For parallel CI, retries, and LLM agents, use API-created inboxes or a strict custom-domain routing scheme.

Can one inbox handle many test emails?
It can, but it increases matcher complexity and race risk. Use one inbox per attempt for verification flows, OTPs, magic links, and any test that may run in parallel.

Do I need a custom domain for reliable routing?
Not always. Shared domains are often enough for fast prototyping and CI. Use a custom domain or subdomain when you need allowlisting, environment separation, auditability, or stronger domain control.

Should webhook handlers route by the To header?
No. The visible To header is useful for debugging, but it is not the safest primary key. Route by provider-attested inbox identity and recipient mapping, then use headers as secondary evidence.

How do I prevent an LLM agent from selecting the wrong email?
Keep routing outside the prompt. Give the agent a tool that waits on a specific inbox ID and returns only a minimized artifact, such as an OTP or approved verification link.

Route by inbox, not by guesswork

If your email tests depend on shared mailboxes, broad subject searches, or fixed sleeps, routing bugs are only a matter of time. The durable pattern is to create an isolated inbox for each attempt, receive structured JSON, verify webhooks, dedupe deliveries, and consume the extracted artifact once.

Mailhook provides the core primitives for that pattern: programmable disposable inboxes, JSON email output, webhooks, polling, signed payloads, shared domains, and custom domain support. Start with the Mailhook API reference for agents and developers, or create test inboxes with Mailhook and replace mailbox guesswork with deterministic routing. No credit card required.