When you “customise email address” formats for QA, signup verification, or LLM-agent driven test flows, the temptation is to keep appending tags until you have something readable like signup+staging+run-1842@…. That often works right up until it doesn’t.
Routing failures in email-driven tests usually come from a simple mismatch: the address you generate is not the address your inbound system can deterministically route, isolate, and observe. The fix is not “try a different random inbox”; it is to design a routing-safe customization scheme.
This guide focuses on how to customise test email addresses without breaking routing, especially under retries, parallel CI, and automated agents.
What you can (and cannot) safely customise in an email address
An email address looks simple, but in automation there are three separate levers people confuse:
- Display name (UI-only): `“Test User” <user@domain>`
- Header recipient (what the email says): `To: user@domain`
- Envelope recipient (what SMTP actually routes on): `RCPT TO:<user@domain>`
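To keep the display name out of routing and comparison logic, a harness can split a header-style mailbox into its parts first. This is a minimal sketch: `splitMailbox` is a hypothetical helper that handles only the simple `Name <addr>`, `"Name" <addr>`, and bare-address forms, not the full RFC 5322 grammar.

```typescript
// Hypothetical helper: separates the UI-only display name from the address.
// Assumption: input is one of `Name <addr>`, `"Name" <addr>`, `<addr>`, or `addr`.
type Mailbox = { displayName: string | null; address: string };

function splitMailbox(input: string): Mailbox {
  const trimmed = input.trim();
  const match = /^(.*?)\s*<([^<>\s]+)>$/.exec(trimmed);
  if (!match) {
    // No angle brackets: treat the whole string as the address.
    return { displayName: null, address: trimmed };
  }
  // Strip optional surrounding straight quotes from the display name.
  const label = match[1].trim().replace(/^"(.*)"$/, "$1");
  return { displayName: label.length > 0 ? label : null, address: match[2] };
}
```

Only the `address` field should ever be stored, compared, or routed on; the display name is safe to vary freely.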
For test flows, you generally must assume:
- Your app will store and later compare the address string (so the exact local-part matters).
- Your receiving system routes based on the envelope recipient.
- Email providers and libraries may normalize or rewrite certain forms.
So, the goal is: customise the address in a way that remains routable, stable, and uniquely attributable to one test attempt.
The routing invariants your test harness should enforce
Whether you run your own inbound pipeline or use an inbox API, reliable routing depends on a few invariants.
Invariant 1: The domain must be deliverable (MX is not optional)
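If you want to actually receive mail, the domain must publish correct MX records (this is SMTP 101, but it’s where “custom domains” often fail). Strictly speaking, SMTP falls back to the domain's address record when no MX exists, but a test harness should not rely on that. If you’re validating format only (no delivery), use reserved example domains such as example.com or a name under .test instead.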
A good mental model is: Domain routing is DNS, not your test code.
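A harness can check this invariant once, before any test sends mail. The sketch below is illustrative: `assertDeliverable` is a hypothetical name, and the resolver is injected so the check can be unit-tested. In Node.js, `resolveMx` from `dns/promises` returns records of this shape and could be passed in directly.

```typescript
// Shape of one MX answer (matches what Node's dns/promises resolveMx returns).
type MxRecord = { exchange: string; priority: number };
type MxResolver = (domain: string) => Promise<MxRecord[]>;

// Hypothetical preflight: fail fast when the test domain cannot receive mail.
async function assertDeliverable(domain: string, resolveMx: MxResolver): Promise<MxRecord[]> {
  let records: MxRecord[];
  try {
    records = await resolveMx(domain);
  } catch (err) {
    throw new Error(`MX lookup failed for ${domain}: ${String(err)}`);
  }
  // A "null MX" (RFC 7505) is published as "." and means "accepts no mail".
  // Assumption: a resolver may surface it as "." or as an empty string.
  const usable = records.filter((r) => r.exchange !== "" && r.exchange !== ".");
  if (usable.length === 0) {
    throw new Error(`No usable MX records for ${domain}`);
  }
  // Lower priority value means a more preferred exchange.
  return [...usable].sort((a, b) => a.priority - b.priority);
}
```

Running this in CI setup rather than inside each test keeps DNS problems from showing up as flaky email timeouts.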
Invariant 2: One address must map to one isolated inbox context
Shared mailboxes create collisions. For CI and agents, you want inbox-per-run or inbox-per-attempt isolation.
If you only “customise” addresses by adding tags but still land them in a shared mailbox, routing might succeed, but your tests will flake on message selection.
Invariant 3: Customization must not depend on provider-specific quirks
Some addressing tricks work in Gmail and fail elsewhere. If your flow touches third-party SaaS, your addressing strategy needs to be conservative.
The safest “customization” is one that stays within widely accepted local-part rules and avoids provider-specific normalization.
Invariant 4: Waiting must be deterministic (no fixed sleeps)
Even with perfect routing, tests fail when they assume delivery timing. Use explicit waits with deadlines.
A common production-grade pattern is:
- Webhook-first arrival
- Polling as a fallback
- Idempotent consumption and dedupe
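The polling half of that pattern can be sketched as a loop with a hard deadline and a set of already-consumed message IDs. Everything here is an assumption for illustration: the `InboundMessage` shape, the `fetchMessages` callback, and the function name are hypothetical, not part of any specific inbox API.

```typescript
// Hypothetical message shape and fetcher; a real client will have more fields.
type InboundMessage = { id: string; subject: string };
type FetchMessages = () => Promise<InboundMessage[]>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until a not-yet-consumed message appears or the deadline passes.
async function waitForNewMessage(
  fetchMessages: FetchMessages,
  seenIds: Set<string>,
  opts: { timeoutMs: number; intervalMs: number },
): Promise<InboundMessage> {
  const deadline = Date.now() + opts.timeoutMs;
  while (true) {
    const messages = await fetchMessages();
    const fresh = messages.find((m) => !seenIds.has(m.id));
    if (fresh) {
      // Record the ID so a retry or duplicate delivery is never consumed twice.
      seenIds.add(fresh.id);
      return fresh;
    }
    const remaining = deadline - Date.now();
    if (remaining <= 0) {
      throw new Error(`No new message within ${opts.timeoutMs} ms`);
    }
    // Never sleep past the deadline.
    await sleep(Math.min(opts.intervalMs, remaining));
  }
}
```

The explicit deadline replaces fixed sleeps, and the `seenIds` set is what makes consumption idempotent.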
Mailhook is built around this automation-friendly model (programmable disposable inboxes, structured JSON emails, real-time webhooks, and polling fallback). The canonical, machine-readable integration reference is llms.txt.

Address customization strategies (and how they break)
Here are the most common ways teams customise email addresses for tests, and what to watch for.
| Strategy | Example | What it’s good for | Common failure mode | Routing-safe tip |
|---|---|---|---|---|
| Plus addressing (subaddressing) | user+run123@domain | Quick uniqueness on providers that support it | Not universally supported, sometimes stripped or rejected by SaaS validators | Use only when you control both sender and validator behavior |
| Provider aliases | alias@domain | Human-readable labels | Requires stateful alias management | Keep aliases ephemeral and scoped to a single run |
| Catch-all domain | [email protected] | Fast setup, infinite addresses | Hard to isolate, easy to collide without strong correlation | Enforce a strict local-part schema and isolate per attempt |
| Encoded local-parts (stateless keys) | mh_v1_k9f3…@test.example.com | Deterministic routing at scale | Poor readability unless you add structure | Add a version prefix and checksum, keep chars conservative |
| Disposable inboxes via API | API returns an address plus an inbox handle | Parallel-safe automation, strong observability | Tests break if you treat it like a human mailbox | Store (email, inbox_id) together and consume via JSON/webhooks |
The key theme: if your customization creates ambiguity, you will eventually select the wrong email.
A routing-safe “customise email address” schema for tests
When you control the receiving domain (recommended for serious CI and third-party allowlisting), use a schema that is:
- Deterministic
- Collision-resistant
- Parseable
- Conservative with characters
Recommended local-part structure
A practical pattern for automated flows is:
- Prefix for purpose and version
- Run identifier (stable within one CI job)
- Attempt identifier (changes on retry)
- Short random or monotonic nonce
Example:
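A minimal TypeScript sketch of one such schema follows. The field order, the `_v`, `.r`, `.a`, and `.n` markers, and the helper names are illustrative assumptions, not a format that Mailhook or any provider prescribes.

```typescript
// Hypothetical routing key: purpose + version, run, attempt, nonce.
type RoutingKey = {
  purpose: string;
  version: number;
  runId: string;
  attempt: number;
  nonce: string;
};

const FIELD = /^[a-z0-9]+$/;

function buildLocalPart(key: RoutingKey): string {
  for (const field of [key.purpose, key.runId, key.nonce]) {
    if (!FIELD.test(field)) throw new Error(`Field must match [a-z0-9]+: "${field}"`);
  }
  for (const n of [key.version, key.attempt]) {
    if (!Number.isInteger(n) || n < 1) throw new Error(`Expected a positive integer, got ${n}`);
  }
  return `${key.purpose}_v${key.version}.r${key.runId}.a${key.attempt}.n${key.nonce}`;
}

// Returns null for anything that does not follow the schema exactly.
function parseLocalPart(localPart: string): RoutingKey | null {
  const m = /^([a-z0-9]+)_v(\d+)\.r([a-z0-9]+)\.a(\d+)\.n([a-z0-9]+)$/.exec(localPart);
  if (!m) return null;
  return {
    purpose: m[1],
    version: Number(m[2]),
    runId: m[3],
    attempt: Number(m[4]),
    nonce: m[5],
  };
}
```

With these assumptions, a signup test on run 1842, attempt 2, nonce `k9f3` yields the local-part `signup_v1.r1842.a2.nk9f3`, and the parser recovers the same fields on the receiving side.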
Notes:
- Keep characters to `a-z`, `0-9`, and a few separators like `.` and `_`.
- Avoid relying on `+` unless you know every system in the path accepts it.
- Keep length in check. Very long local-parts can break downstream validators.
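These notes can be turned into a small lint that runs before an address is handed to the app. This is a sketch under stated assumptions: the allowed character set mirrors the notes above, the 64-character cap is the SMTP limit on local-part length (RFC 5321), and `localPartProblems` is a hypothetical name.

```typescript
// RFC 5321 caps the local-part at 64 octets; for ASCII input that equals 64 characters.
const MAX_LOCAL_PART_LENGTH = 64;

// Returns a list of human-readable problems; an empty list means "conservative enough".
function localPartProblems(localPart: string): string[] {
  const problems: string[] = [];
  if (localPart.length === 0) {
    problems.push("local-part is empty");
  }
  if (localPart.length > MAX_LOCAL_PART_LENGTH) {
    problems.push(`local-part is longer than ${MAX_LOCAL_PART_LENGTH} characters`);
  }
  if (!/^[a-z0-9._]*$/.test(localPart)) {
    problems.push("local-part contains characters outside a-z, 0-9, '.', '_'");
  }
  // Dots may not lead, trail, or repeat in an unquoted local-part.
  if (/^\.|\.$|\.\./.test(localPart)) {
    problems.push("local-part has a leading, trailing, or doubled dot");
  }
  return problems;
}
```

Returning a list instead of a boolean makes CI failures self-explanatory when a generated address is rejected.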
Don’t confuse “readable” with “routable”
If you need a human-readable label for debugging, log it separately. Your routing key should optimize for determinism.
A good rule is: routing keys belong in the local-part, human context belongs in logs and metadata.
The hidden routing footgun: changing the address after you hand it out
A common failure pattern in signup tests:
- Test creates `user+run123@domain`.
- App normalizes or rewrites it (for example, stripping tags, lowercasing, or applying a “canonical email” rule).
- Verification email gets sent to the normalized version.
- Your harness waits on the original address and never receives it.
Fix this at the system boundary:
- If your product normalizes emails, make the normalization explicit and test it.
- If you need tags for correlation, place correlation into a form your product will preserve.
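One way to make the normalization explicit is to run the generated address through the same rule the product applies and assert that nothing changes. In this sketch `canonicalize` is a hypothetical stand-in for an app's canonical-email rule (lowercase, strip `+tag`); substitute the real rule from your own codebase.

```typescript
// Hypothetical stand-in for a product's "canonical email" rule.
function canonicalize(address: string): string {
  const at = address.lastIndexOf("@");
  if (at < 1) throw new Error(`Not an email address: ${address}`);
  // Lowercase, then drop everything from the first "+" in the local-part.
  const local = address.slice(0, at).toLowerCase().split("+")[0];
  const domain = address.slice(at + 1).toLowerCase();
  return `${local}@${domain}`;
}

// True when the app will send mail to exactly the string the harness waits on.
function survivesNormalization(
  address: string,
  normalize: (address: string) => string,
): boolean {
  return normalize(address) === address;
}
```

Under this rule `user+run123@example.com` does not survive, because the app would mail `user@example.com`, while `user.run123@example.com` passes unchanged.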
For many teams, the easiest way to avoid this entire class of bugs is an inbox-first model where you treat the inbox handle (not the string address) as the primary identifier.
A deterministic test flow that survives retries and parallel CI
A robust email-dependent test flow typically has five steps:
1) Provision a fresh inbox per attempt
Instead of “customising” one permanent address forever, create a disposable inbox per attempt and treat it as a test resource.
With Mailhook, you can programmatically create disposable inboxes and receive inbound messages as structured JSON, delivered via webhooks or retrieved via polling. For exact endpoints and fields, refer to Mailhook’s llms.txt.
2) Use the returned address exactly as-is
Don’t rewrite it. Don’t append tags. If you need correlation, store it next to the inbox ID.
3) Wait webhook-first, poll as a fallback
Webhooks reduce latency and improve scalability. Polling is your safety net for transient webhook delivery failures.
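A sketch of how the two channels could be combined: wait on a promise that the webhook handler resolves, and only start polling once a grace period has passed. All names and options here (`waitWebhookFirst`, `graceMs`, `pollOnce`) are hypothetical and not taken from any specific API.

```typescript
type Outcome<T> = { kind: "message"; value: T } | { kind: "tick" };

// Resolves with whichever channel delivers first; rejects at the deadline.
async function waitWebhookFirst<T>(
  webhookArrival: Promise<T>,
  pollOnce: () => Promise<T | null>,
  opts: { graceMs: number; intervalMs: number; timeoutMs: number },
): Promise<T> {
  const deadline = Date.now() + opts.timeoutMs;
  const arrived = webhookArrival.then(
    (value): Outcome<T> => ({ kind: "message", value }),
  );
  // First wait is the webhook's head start; later waits are poll intervals.
  let waitMs = Math.min(opts.graceMs, opts.timeoutMs);

  while (true) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const tick = new Promise<Outcome<T>>((resolve) => {
      timer = setTimeout(() => resolve({ kind: "tick" }), waitMs);
    });
    const outcome = await Promise.race([arrived, tick]);
    clearTimeout(timer); // avoid leaving a stray timer when the webhook wins
    if (outcome.kind === "message") return outcome.value;

    // Webhook has not fired yet: fall back to one poll.
    const polled = await pollOnce();
    if (polled !== null) return polled;

    const remaining = deadline - Date.now();
    if (remaining <= 0) {
      throw new Error(`No message within ${opts.timeoutMs} ms`);
    }
    waitMs = Math.min(opts.intervalMs, remaining);
  }
}
```

The webhook promise stays in every race, so a late webhook still wins over the next poll interval.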
4) Match narrowly and extract minimally
Select the message you want based on stable matchers (recipient, subject intent, timestamps, a correlation header you control). Extract only the artifact you need (OTP or verification URL) and keep the raw email out of agent prompts.
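Minimal extraction might look like the sketch below. It assumes a six-digit numeric OTP and an `https` verification link on a host you name explicitly; both are illustrative assumptions about the email template, and the function names are hypothetical.

```typescript
// Assumption: the OTP is exactly six digits standing on its own.
function extractOtp(text: string): string | null {
  const m = /\b(\d{6})\b/.exec(text);
  return m ? m[1] : null;
}

// Returns the first https link whose host matches exactly, or null.
function extractVerificationUrl(text: string, allowedHost: string): string | null {
  const candidates = text.match(/https:\/\/[^\s<>"')]+/g) ?? [];
  for (const candidate of candidates) {
    // Drop sentence punctuation that got glued to the end of the link.
    const cleaned = candidate.replace(/[.,;:!?]+$/, "");
    try {
      const url = new URL(cleaned);
      // Exact host match: inbound email is untrusted input.
      if (url.hostname === allowedHost) return url.href;
    } catch {
      // Not a parseable URL; ignore and keep looking.
    }
  }
  return null;
}
```

Only the returned string (OTP or URL) needs to travel downstream, which keeps raw email bodies out of agent prompts.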
5) Expire and clean up
Disposable resources should die. Otherwise “temporary inboxes” become another shared mailbox.
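A `try`/`finally` wrapper is one way to tie the inbox lifetime to the test attempt. The `InboxClient` interface below is hypothetical, and whether a given inbox API offers explicit deletion is an assumption; if it does not, the TTL set at creation is the only cleanup mechanism.

```typescript
// Hypothetical client interface; real field and method names will differ.
type Inbox = { inboxId: string; email: string };
type InboxClient = {
  create: (ttlSeconds: number) => Promise<Inbox>;
  remove: (inboxId: string) => Promise<void>;
};

// Creates an inbox, runs the test body, and always attempts cleanup.
async function withDisposableInbox<T>(
  client: InboxClient,
  ttlSeconds: number,
  body: (inbox: Inbox) => Promise<T>,
): Promise<T> {
  const inbox = await client.create(ttlSeconds);
  try {
    return await body(inbox);
  } finally {
    // Best-effort cleanup: a failed delete must not mask the test result.
    // The TTL remains the backstop if this call fails.
    await client.remove(inbox.inboxId).catch(() => undefined);
  }
}
```

Because cleanup runs in `finally`, a failing assertion inside `body` still releases the inbox.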
Example: correlation without breaking routing
Here’s a simple approach that keeps routing stable:
- Store correlation in your system under test: `run_id`, `attempt_id`
- Store the email resource returned by the inbox API: `email`, `inbox_id`
- Keep your test logic keyed by `inbox_id`, not “whatever local-part string we generated”
Pseudo-code sketch:
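```typescript
// Hypothetical client helpers, declared here only so the sketch type-checks.
// Their names and fields are assumptions. See Mailhook's canonical contract
// for the exact request/response fields: https://mailhook.co/llms.txt
declare function createDisposableInbox(opts: {
  ttlSeconds: number;
}): Promise<{ email: string; inbox_id: string }>;
declare function pollInboxForMessage(opts: {
  inboxId: string;
  timeoutMs: number;
  match: { to: string; subjectContains: string };
}): Promise<unknown>;

type TestEmailTarget = {
  email: string;
  inboxId: string;
  runId: string;
  attemptId: string;
};

async function provisionTarget(runId: string, attemptId: string): Promise<TestEmailTarget> {
  const inbox = await createDisposableInbox({ ttlSeconds: 600 });
  // Keep run metadata next to the inbox handle; never encode it into the address.
  return {
    email: inbox.email,
    inboxId: inbox.inbox_id,
    runId,
    attemptId,
  };
}

async function waitForVerificationEmail(target: TestEmailTarget) {
  // Prefer webhook delivery in production.
  // Polling can be your fallback for deterministic waiting.
  return await pollInboxForMessage({
    inboxId: target.inboxId,
    timeoutMs: 30_000,
    match: {
      // Keep matchers narrow and deterministic
      to: target.email,
      subjectContains: "Verify",
    },
  });
}
```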
This gives you a clean separation:
- Customization (your run metadata)
- Routing (the inbox resource)
Agent-specific warning: don’t “improve” email content to make tests pass
If LLM agents are involved (generating inputs, triaging failures, or extracting artifacts), keep the system deterministic:
- Use stable templates and stable matchers
- Prefer `text/plain` extraction when possible
- Treat inbound email content as untrusted input
Avoid pulling in tooling that encourages evasive or adversarial behavior. For example, sites marketing so-called “humanizers” or AI-detection evasion tools may be tempting in content workflows, but they add risk and noise, and they are orthogonal to the real engineering problem here: routing and deterministic consumption.
Where Mailhook fits when you need customizable test emails
Mailhook is useful when you want a programmable, automation-first inbox model instead of shared mailboxes or unsafe public “random inbox” sites:
- Create disposable inboxes via API
- Receive emails as structured JSON
- Get real-time webhook notifications (with signed payloads for authenticity)
- Use polling APIs as a fallback
- Support shared domains for fast start, and custom domains when you need allowlisting and deliverability control
- Batch process emails when you have high throughput
If you’re building agent tools, the most important design choice is to expose a narrow interface like “provision inbox” and “wait for message”, then pass only minimal extracted artifacts downstream.
For the exact API contract and fields, start with Mailhook’s llms.txt.
A quick pre-merge checklist for routing-safe customization
Before you ship a “custom email address” change in tests, validate these points in code review:
- Your address format is conservative (chars, length) and does not assume Gmail-specific behavior.
- The address you hand to the app is the address your inbound system will route on (no silent normalization mismatch).
- You isolate per run or per attempt (no shared mailbox).
- Your wait logic has deadlines and a deterministic selection rule.
- Webhooks are verified (signature over the raw body) and handlers are idempotent.
- You extract and store only what you need (OTP/link), and clean up the inbox lifecycle.
If you do all of the above, you can customise email address schemes for test flows without breaking routing, and your CI and agent runs stay parallel-safe and debuggable.
If you want a ready-made inbox-first primitive for this, explore Mailhook at mailhook.co and use the integration contract at llms.txt as your source of truth.