Why do email tests fail even when addresses look correct?

Tests often fail due to routing mismatches where the generated address differs from what the inbound system routes on, or due to email normalization that changes the address after it's created.

Is plus addressing (user+tag@domain) safe for automated tests?

Plus addressing is not universally supported and may be stripped or rejected by some email validators, making it unreliable for automated testing across different systems.

What's the best way to isolate emails in parallel test runs?

Use disposable inbox APIs that create unique inboxes per test run rather than shared mailboxes with tagged addresses, ensuring complete isolation and preventing message collisions.

How should I wait for emails in automated tests?

Use webhook-first delivery with polling as a fallback, implement deterministic waits with deadlines, and avoid fixed sleep timers that make tests unreliable.

Customise Email Address for Test Flows Without Breaking Routing

When you customise email address formats for QA, signup verification, or LLM-agent-driven test flows, the temptation is to keep appending tags until you have something readable like signup+staging+run-1842@…. That often works right up until it doesn’t.

Routing failures in email-driven tests usually come from a simple mismatch: the address you generate is not the address your inbound system can deterministically route, isolate, and observe. The fix is not “try a different random inbox”; it is to design a routing-safe customization scheme.

This guide focuses on how to customise test email addresses without breaking routing, especially under retries, parallel CI, and automated agents.

What you can (and cannot) safely customise in an email address

An email address looks simple, but in automation there are three separate levers people confuse:

Display name (UI-only): “Test User” <user@domain>
Header recipient (what the email says): To: user@domain
Envelope recipient (what SMTP actually routes on): RCPT TO:<user@domain>

For test flows, you generally must assume:

Your app will store and later compare the address string (so the exact local-part matters).
Your receiving system routes based on the envelope recipient.
Email providers and libraries may normalize or rewrite certain forms.

So, the goal is: customise the address in a way that remains routable, stable, and uniquely attributable to one test attempt.

The routing invariants your test harness should enforce

Whether you run your own inbound pipeline or use an inbox API, reliable routing depends on a few invariants.

Invariant 1: The domain must be deliverable (MX is not optional)

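If you want to actually receive mail, the domain needs MX records that point at your inbound mail server. This is SMTP 101, but it’s where “custom domains” often fail. (Strictly, SMTP falls back to a domain’s address records when no MX exists, but treat that as an accident, not a configuration.) If you’re validating format only (no delivery), use reserved domains such as example.com instead.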

A good mental model is: Domain routing is DNS, not your test code.
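One way to act on this is a preflight check that fails the suite before any test sends mail. The sketch below assumes a Node.js runtime and uses its built-in `resolveMx`; the function name `assertDeliverable` and the injectable `resolver` parameter are illustrative choices, not part of any library.

```typescript
import { resolveMx } from "node:dns/promises";

// Shape returned by Node's dns.promises.resolveMx.
type MxRecord = { exchange: string; priority: number };
type MxResolver = (domain: string) => Promise<MxRecord[]>;

// Returns the domain's MX hosts sorted by preference (lowest priority value first).
// Throws when the lookup fails or returns nothing, so the suite fails fast.
// `resolver` is injectable (an assumption of this sketch) so the check can be unit tested.
async function assertDeliverable(
  domain: string,
  resolver: MxResolver = resolveMx,
): Promise<string[]> {
  let records: MxRecord[];
  try {
    records = await resolver(domain);
  } catch (err) {
    throw new Error(`MX lookup failed for ${domain}: ${String(err)}`);
  }
  if (records.length === 0) {
    throw new Error(`No MX records for ${domain}`);
  }
  return [...records]
    .sort((a, b) => a.priority - b.priority)
    .map((r) => r.exchange);
}
```

Running this once in CI setup turns a silent “email never arrived” timeout into an immediate, readable DNS error.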

Invariant 2: One address must map to one isolated inbox context

Shared mailboxes create collisions. For CI and agents, you want inbox-per-run or inbox-per-attempt isolation.

If you only “customise” addresses by adding tags but still land them in a shared mailbox, routing might succeed, but your tests will flake on message selection.

Invariant 3: Customization must not depend on provider-specific quirks

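Some addressing tricks work in Gmail and fail elsewhere. Gmail, for example, ignores dots in the local-part and accepts +tag suffixes, but other providers and validators may not. If your flow touches third-party SaaS, your addressing strategy needs to be conservative.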

The safest “customization” is one that stays within widely accepted local-part rules and avoids provider-specific normalization.

Invariant 4: Waiting must be deterministic (no fixed sleeps)

Even with perfect routing, tests fail when they assume delivery timing. Use explicit waits with deadlines.

A common production-grade pattern is:

Webhook-first arrival
Polling as a fallback
Idempotent consumption and dedupe
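The pattern above can be sketched as a single wait function. This is a minimal illustration, not Mailhook’s API: `subscribe` and `poll` are hypothetical callbacks you would wire to your webhook receiver and your polling client, and the `InboundMessage` shape is assumed.

```typescript
type InboundMessage = { id: string; subject: string };

type WaitOptions = {
  deadlineMs: number; // total time budget for the wait
  pollIntervalMs: number; // gap between fallback polls
  poll: () => Promise<InboundMessage[]>; // hypothetical polling call
  // Hypothetical webhook hook: registers a listener, returns an unsubscribe function.
  subscribe: (deliver: (m: InboundMessage) => void) => () => void;
  matches: (m: InboundMessage) => boolean;
};

function waitForMessage(opts: WaitOptions): Promise<InboundMessage> {
  return new Promise<InboundMessage>((resolve, reject) => {
    const seen = new Set<string>(); // dedupe across webhook and polling
    let settled = false;
    let pollTimer: ReturnType<typeof setTimeout> | undefined;
    let unsubscribe: () => void = () => {};

    // Settle exactly once and release every timer and listener.
    const finish = (settle: () => void): void => {
      if (settled) return;
      settled = true;
      clearTimeout(deadlineTimer);
      clearTimeout(pollTimer);
      unsubscribe();
      settle();
    };

    // Idempotent consumption: each message id is evaluated at most once.
    const consider = (m: InboundMessage): void => {
      if (seen.has(m.id)) return;
      seen.add(m.id);
      if (opts.matches(m)) finish(() => resolve(m));
    };

    const deadlineTimer = setTimeout(
      () => finish(() => reject(new Error(`No matching email within ${opts.deadlineMs} ms`))),
      opts.deadlineMs,
    );

    unsubscribe = opts.subscribe(consider);
    if (settled) unsubscribe(); // the listener delivered synchronously

    const pollOnce = async (): Promise<void> => {
      if (settled) return;
      try {
        for (const m of await opts.poll()) consider(m);
      } catch {
        // Transient polling errors are tolerated; the deadline bounds the wait.
      }
      if (!settled) pollTimer = setTimeout(() => void pollOnce(), opts.pollIntervalMs);
    };
    void pollOnce();
  });
}
```

The deadline, not a sleep, decides when the test gives up, and a message that arrives through both channels is only evaluated once.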

Mailhook is built around this automation-friendly model (programmable disposable inboxes, structured JSON emails, real-time webhooks, and polling fallback). The canonical, machine-readable integration reference is llms.txt.

[Diagram: email routing layers for test flows. DNS MX records route mail to an inbound email API provider, which maps recipient addresses to isolated inbox IDs, then delivers messages to code via signed webhooks with polling fallback.]

Address customization strategies (and how they break)

Here are the most common ways teams customise email addresses for tests, and what to watch for.

Strategy	Example	What it’s good for	Common failure mode	Routing-safe tip
Plus addressing (subaddressing)	`user+run123@domain`	Quick uniqueness on providers that support it	Not universally supported, sometimes stripped or rejected by SaaS validators	Use only when you control both sender and validator behavior
Provider aliases	`alias@domain`	Human-readable labels	Requires stateful alias management	Keep aliases ephemeral and scoped to a single run
Catch-all domain	`[email protected]`	Fast setup, infinite addresses	Hard to isolate, easy to collide without strong correlation	Enforce a strict local-part schema and isolate per attempt
Encoded local-parts (stateless keys)	`mh_v1_k9f3…@test.example.com`	Deterministic routing at scale	Poor readability unless you add structure	Add a version prefix and checksum, keep chars conservative
Disposable inboxes via API	API returns an address plus an inbox handle	Parallel-safe automation, strong observability	Tests break if you treat it like a human mailbox	Store `(email, inbox_id)` together and consume via JSON/webhooks

The key theme: if your customization creates ambiguity, you will eventually select the wrong email.

A routing-safe “customise email address” schema for tests

When you control the receiving domain (recommended for serious CI and third-party allowlisting), use a schema that is:

Deterministic
Collision-resistant
Parseable
Conservative with characters

Recommended local-part structure

A practical pattern for automated flows is:

Prefix for purpose and version
Run identifier (stable within one CI job)
Attempt identifier (changes on retry)
Short random or monotonic nonce

Example:

[email protected]

Notes:

Keep characters to a-z, 0-9, and a few separators like . and _.
Avoid relying on + unless you know every system in the path accepts it.
Keep length in check. SMTP caps the local-part at 64 octets (RFC 5321), and very long local-parts can break downstream validators.
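A schema like this is easiest to keep honest when one function builds it and another parses it. The sketch below uses a hypothetical layout (`purpose_vN.rRUN.aATTEMPT.nNONCE`) chosen only for illustration; the separators, field order, and function names are assumptions, not a standard.

```typescript
const MAX_LOCAL_PART = 64; // RFC 5321 limit on local-part length, in octets
const SAFE_TOKEN = /^[a-z0-9]+$/; // conservative: lowercase letters and digits only

type RoutingKey = {
  purpose: string; // e.g. "signup"
  version: number; // schema version, so old addresses stay parseable
  runId: string; // stable within one CI job
  attempt: number; // changes on retry
  nonce: string; // short random or monotonic suffix
};

// Builds a local-part from the key, rejecting anything outside the conservative alphabet.
function buildLocalPart(k: RoutingKey): string {
  for (const token of [k.purpose, k.runId, k.nonce]) {
    if (!SAFE_TOKEN.test(token)) throw new Error(`Unsafe token: ${token}`);
  }
  for (const n of [k.version, k.attempt]) {
    if (!Number.isInteger(n) || n < 0) throw new Error(`Invalid number: ${n}`);
  }
  const local = `${k.purpose}_v${k.version}.r${k.runId}.a${k.attempt}.n${k.nonce}`;
  if (local.length > MAX_LOCAL_PART) throw new Error("Local-part exceeds 64 characters");
  return local;
}

// Parses a local-part back into its key, or returns null if it does not follow the schema.
function parseLocalPart(local: string): RoutingKey | null {
  const m = /^([a-z0-9]+)_v(\d+)\.r([a-z0-9]+)\.a(\d+)\.n([a-z0-9]+)$/.exec(local);
  if (!m) return null;
  return {
    purpose: m[1],
    version: Number(m[2]),
    runId: m[3],
    attempt: Number(m[4]),
    nonce: m[5],
  };
}
```

Because the tokens exclude the separators, parsing is unambiguous, and anything that fails to parse can be rejected at the inbound edge instead of landing in the wrong test.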

Don’t confuse “readable” with “routable”

If you need a human-readable label for debugging, log it separately. Your routing key should optimize for determinism.

A good rule is: routing keys belong in the local-part, human context belongs in logs and metadata.

The hidden routing footgun: changing the address after you hand it out

A common failure pattern in signup tests:

Test creates user+run123@domain.
App normalizes or rewrites it (for example, stripping tags, lowercasing, or applying a “canonical email” rule).
Verification email gets sent to the normalized version.
Your harness waits on the original address and never receives it.

Fix this at the system boundary:

If your product normalizes emails, make the normalization explicit and test it.
If you need tags for correlation, place correlation into a form your product will preserve.
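To make the normalization explicit, a harness can run every generated address through the same rule the product applies and refuse addresses that would change. The sketch below uses a hypothetical `canonicalize` rule (strip `+tag`, lowercase everything) as a stand-in; substitute whatever your application actually does.

```typescript
// Hypothetical stand-in for a product's "canonical email" rule:
// drop any +tag suffix and lowercase both local-part and domain.
function canonicalize(address: string): string {
  const at = address.lastIndexOf("@");
  if (at <= 0) throw new Error(`Not an address: ${address}`);
  const local = address.slice(0, at);
  const domain = address.slice(at + 1);
  const plus = local.indexOf("+");
  const base = plus === -1 ? local : local.slice(0, plus);
  return `${base.toLowerCase()}@${domain.toLowerCase()}`;
}

// Fails fast when the address the harness waits on is not the address
// the app will actually send to after normalization.
function assertSurvivesNormalization(
  address: string,
  normalize: (a: string) => string = canonicalize,
): void {
  const normalized = normalize(address);
  if (normalized !== address) {
    throw new Error(`Address changes under normalization: "${address}" becomes "${normalized}"`);
  }
}
```

Calling `assertSurvivesNormalization` when the address is generated moves the failure from a timeout at the end of the test to a clear error at the start.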

For many teams, the easiest way to avoid this entire class of bugs is an inbox-first model where you treat the inbox handle (not the string address) as the primary identifier.

A deterministic test flow that survives retries and parallel CI

A robust email-dependent test flow typically has five steps:

1) Provision a fresh inbox per attempt

Instead of “customising” one permanent address forever, create a disposable inbox per attempt and treat it as a test resource.

With Mailhook, you can programmatically create disposable inboxes and receive inbound messages as structured JSON, delivered via webhooks or retrieved via polling. For exact endpoints and fields, refer to Mailhook’s llms.txt.

2) Use the returned address exactly as-is

Don’t rewrite it. Don’t append tags. If you need correlation, store it next to the inbox ID.

3) Wait webhook-first, poll as a fallback

Webhooks reduce latency and improve scalability. Polling is your safety net for transient webhook delivery failures.

4) Match narrowly and extract minimally

Select the message you want based on stable matchers (recipient, subject intent, timestamps, a correlation header you control). Extract only the artifact you need (OTP or verification URL) and keep the raw email out of agent prompts.
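A narrow matcher plus a minimal extractor might look like the following sketch. The `ParsedEmail` shape, the six-digit OTP format, and the `allowedHost` check are assumptions for illustration; adapt them to your templates and to the JSON your inbox API actually returns.

```typescript
// Assumed shape of a parsed inbound email; real field names depend on your inbox API.
type ParsedEmail = { to: string; subject: string; text: string };

type Artifact = { otp?: string; verifyUrl?: string };

// Returns only the OTP and/or verification link, or null if the message
// is not the one this test attempt is waiting for.
function extractArtifact(
  msg: ParsedEmail,
  expectedTo: string,
  allowedHost: string,
): Artifact | null {
  // Match narrowly: exact recipient and subject intent.
  if (msg.to !== expectedTo) return null;
  if (!/verify/i.test(msg.subject)) return null;

  // Assumes a six-digit numeric code in the plain-text body.
  const otp = /\b(\d{6})\b/.exec(msg.text)?.[1];

  // Accept only https links on the expected host; email content is untrusted input.
  let verifyUrl: string | undefined;
  for (const candidate of msg.text.match(/https:\/\/[^\s<>"]+/g) ?? []) {
    try {
      const url = new URL(candidate);
      if (url.hostname === allowedHost) {
        verifyUrl = url.toString();
        break;
      }
    } catch {
      // Not a parseable URL; ignore it.
    }
  }

  if (!otp && !verifyUrl) return null;
  return { otp, verifyUrl };
}
```

Only the returned `Artifact` needs to reach an agent or a later test step; the raw body stays inside the harness.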

5) Expire and clean up

Disposable resources should die. Otherwise “temporary inboxes” become another shared mailbox.
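One way to guarantee cleanup is to scope the inbox to a callback, so deletion runs even when the test body throws. The `InboxClient` interface below is hypothetical; map `create` and `remove` onto whatever your inbox API provides.

```typescript
type Inbox = { email: string; inboxId: string };

// Hypothetical client interface; the method names are assumptions, not a real SDK.
type InboxClient = {
  create(ttlSeconds: number): Promise<Inbox>;
  remove(inboxId: string): Promise<void>;
};

// Provisions an inbox, runs the test body, and always attempts cleanup afterwards.
async function withDisposableInbox<T>(
  client: InboxClient,
  ttlSeconds: number,
  body: (inbox: Inbox) => Promise<T>,
): Promise<T> {
  const inbox = await client.create(ttlSeconds);
  try {
    return await body(inbox);
  } finally {
    // Best-effort delete: a cleanup failure should not mask the test result,
    // and the TTL acts as the backstop if this call fails.
    await client.remove(inbox.inboxId).catch(() => {});
  }
}
```

Pairing an explicit delete with a TTL means a crashed CI job still cannot leave a long-lived inbox behind.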

Example: correlation without breaking routing

Here’s a simple approach that keeps routing stable:

Store correlation in your system under test: run_id, attempt_id
Store the email resource returned by the inbox API: email, inbox_id
Keep your test logic keyed by inbox_id, not “whatever local-part string we generated”

Pseudo-code sketch:

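// Hypothetical wrappers around your inbox API client. The names and fields
// here are illustrative; see https://mailhook.co/llms.txt for the real contract.
declare function createDisposableInbox(opts: {
  ttlSeconds: number;
}): Promise<{ email: string; inbox_id: string }>;

declare function pollInboxForMessage(opts: {
  inboxId: string;
  timeoutMs: number;
  match: { to: string; subjectContains: string };
}): Promise<unknown>;

type TestEmailTarget = {
  email: string;
  inboxId: string;
  runId: string;
  attemptId: string;
};

async function provisionTarget(runId: string, attemptId: string): Promise<TestEmailTarget> {
  // See Mailhook's canonical contract for exact request/response fields:
  // https://mailhook.co/llms.txt
  const inbox = await createDisposableInbox({ ttlSeconds: 600 });

  // Keep the address and the inbox handle together with the run metadata.
  return {
    email: inbox.email,
    inboxId: inbox.inbox_id,
    runId,
    attemptId,
  };
}

async function waitForVerificationEmail(target: TestEmailTarget) {
  // Prefer webhook delivery in production.
  // Polling, shown here, is the fallback; the timeout keeps the wait deterministic.
  return await pollInboxForMessage({
    inboxId: target.inboxId,
    timeoutMs: 30_000,
    match: {
      // Keep matchers narrow and deterministic
      to: target.email,
      subjectContains: "Verify",
    },
  });
}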

This gives you a clean separation:

Customization (your run metadata)
Routing (the inbox resource)

Agent-specific warning: don’t “improve” email content to make tests pass

If LLM agents are involved (generating inputs, triaging failures, or extracting artifacts), keep the system deterministic:

Use stable templates and stable matchers
Prefer text/plain extraction when possible
Treat inbound email content as untrusted input

Avoid pulling in tooling that encourages evasive or adversarial behavior. For example, sites marketing so-called “humanizers” or AI-detection evasion tools may be tempting in content workflows, but they add risk and noise, and they are orthogonal to the real engineering problem here: routing and deterministic consumption.

Where Mailhook fits when you need customizable test emails

Mailhook is useful when you want a programmable, automation-first inbox model instead of shared mailboxes or unsafe public “random inbox” sites:

Create disposable inboxes via API
Receive emails as structured JSON
Get real-time webhook notifications (with signed payloads for authenticity)
Use polling APIs as a fallback
Support shared domains for fast start, and custom domains when you need allowlisting and deliverability control
Batch process emails when you have high throughput

If you’re building agent tools, the most important design choice is to expose a narrow interface like “provision inbox” and “wait for message”, then pass only minimal extracted artifacts downstream.

For the exact API contract and fields, start with Mailhook’s llms.txt.

A quick pre-merge checklist for routing-safe customization

Before you ship a “custom email address” change in tests, validate these points in code review:

Your address format is conservative (chars, length) and does not assume Gmail-specific behavior.
The address you hand to the app is the address your inbound system will route on (no silent normalization mismatch).
You isolate per run or per attempt (no shared mailbox).
Your wait logic has deadlines and a deterministic selection rule.
Webhooks are verified (signature over the raw body) and handlers are idempotent.
You extract and store only what you need (OTP/link), and clean up the inbox lifecycle.
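The webhook item on this checklist is the one most often implemented incorrectly, usually by verifying a re-serialized body instead of the raw bytes. The sketch below assumes an HMAC-SHA256 signature sent as a hex string and a `message_id` field in the payload; both are assumptions for illustration, so check your provider’s contract (for Mailhook, llms.txt) for the real scheme.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verifies an HMAC-SHA256 signature over the raw request body.
// The hex encoding is an assumption of this sketch.
function verifySignature(rawBody: Buffer, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// In-memory dedupe for illustration; a real handler would use durable storage.
const processed = new Set<string>();

type WebhookResult = "processed" | "duplicate" | "rejected";

// Verify first, parse second, and treat redelivery of the same message as a no-op.
function handleWebhook(rawBody: Buffer, signatureHex: string, secret: string): WebhookResult {
  if (!verifySignature(rawBody, signatureHex, secret)) return "rejected";
  // `message_id` is a hypothetical field name for the provider's unique message identifier.
  const event = JSON.parse(rawBody.toString("utf8")) as { message_id: string };
  if (processed.has(event.message_id)) return "duplicate";
  processed.add(event.message_id);
  return "processed";
}
```

Parsing happens only after verification, so unsigned or tampered payloads never reach `JSON.parse`, and a retried delivery cannot trigger the same side effect twice.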

If you do all of the above, you can customise email address schemes for test flows without breaking routing, and your CI and agent runs stay parallel-safe and debuggable.

If you want a ready-made inbox-first primitive for this, explore Mailhook at mailhook.co and use the integration contract at llms.txt as your source of truth.