Setup Email Address for QA: A Reliable Checklist

Email is one of the most failure-prone dependencies in QA. A test can be correct and still flake because the inbox is shared, the message arrives late, a retry produces duplicates, or your parser breaks when the HTML template changes.

This checklist focuses on setting up an email address for QA in a way that stays deterministic under retries, parallel CI, and even LLM-driven test agents.

Before you “set up an email address”, decide what you are actually testing

A reliable setup starts by classifying the test. Different goals require different email strategies.

If you only need to validate your email validator, you should not be receiving real mail at all. If you need to prove the full pipeline (your app sends, the provider accepts, the user can verify), then you need a routable inbox and deterministic retrieval.

Here is a practical decision table you can use when scoping QA work:

QA goal	What “email setup” means	Recommended approach	Common pitfall
Validate address syntax and normalization	No real delivery	Reserved example domains like `example.com` and parser-based validation	Accidentally treating syntax tests as deliverability tests
Local dev, fast feedback	Capture outbound mail locally	Local SMTP capture tool (dev-only)	Tests pass locally but fail in CI because delivery semantics differ
CI/E2E signup verification (OTP, magic link)	Deterministic, parallel-safe inbound inbox	Disposable inbox per run or per attempt	Shared mailbox collisions, fixed sleeps, fragile HTML parsing
Vendor allowlisting, enterprise staging	Stable domain control	Custom domain or subdomain routed to an inbound provider	Using a shared domain that cannot be allowlisted

For email address parsing, and for why regex validation breaks in edge cases, RFC 5322 is the underlying reference point (even if most products choose a stricter subset).

The reliable QA email setup checklist

Use the sections below as a build-and-review list. If you can check each item, email stops being the flaky part of your suite.

1) Use “inbox per run” (or “inbox per attempt”), not “one test mailbox forever”

The single biggest reliability upgrade is isolating inboxes. A disposable inbox gives you a stable handle to poll, or to receive webhooks for, so you never have to scan a shared mailbox.

Checklist:

Every test run creates a fresh inbox identifier (or every attempt for retry-safe verification flows).
The test stores run_id and inbox_id together so logs are actionable.
You never re-use an inbox across parallel jobs.

Why this matters: retries and parallelism are normal in 2026. Your email harness has to be idempotent and collision-free by design.
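A minimal sketch of the per-run isolation idea, assuming a hypothetical `provision_inbox` callable that stands in for whatever create-inbox call your provider exposes. The `RunInbox` record, the fake provisioner, and the `inbox.example.com` domain are illustrative names, not part of any real API.

```python
import uuid
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RunInbox:
    """Pairs a test run (and attempt) with the inbox created for it."""
    run_id: str
    attempt: int
    inbox_id: str
    email: str


def new_run_inbox(
    provision_inbox: Callable[[], tuple[str, str]],
    run_id: str,
    attempt: int = 1,
) -> RunInbox:
    """Provision a fresh inbox for this run/attempt; never reuse an earlier one."""
    inbox_id, email = provision_inbox()
    return RunInbox(run_id=run_id, attempt=attempt, inbox_id=inbox_id, email=email)


def fake_provision_inbox() -> tuple[str, str]:
    """Hypothetical stand-in for a provider call: returns (inbox_id, address)."""
    inbox_id = uuid.uuid4().hex
    return inbox_id, f"{inbox_id}@inbox.example.com"
```

Storing `run_id` and `inbox_id` in one record means any log line that prints the record is enough to find the right inbox later. A retry simply calls `new_run_inbox` again with a higher `attempt`.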

2) Make the address scheme deterministic and correlate it to the run

Even with inbox isolation, you want correlation signals so you can match the right email quickly and debug failures without opening HTML.

Checklist:

Add a correlation token to the user identity you create (for example in the username), and expect it in the email subject or body when possible.
If you control the sender, add a dedicated header such as X-Correlation-Id.
Match on stable attributes, not on full HTML.

If you are building an LLM-agent flow, treat correlation as a guardrail: it narrows what the agent is allowed to accept as “the right message”.
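As an illustration of matching on stable attributes, here is a small sketch. The `Message` shape is an assumption (real providers return their own JSON structure), and the matcher only looks at a correlation header, the subject, and the plain-text body.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical normalized message shape; real provider payloads will differ."""
    subject: str
    text: str
    headers: dict[str, str] = field(default_factory=dict)


def matches_run(msg: Message, correlation_id: str) -> bool:
    """Accept a message only if it carries this run's correlation token."""
    # Header names are case-insensitive, so normalize before the lookup.
    headers = {name.lower(): value for name, value in msg.headers.items()}
    if headers.get("x-correlation-id") == correlation_id:
        return True
    # Fall back to stable text attributes, never to the full HTML.
    return correlation_id in msg.subject or correlation_id in msg.text
```

Because the fallback is a substring check, the token should be long and random (for example a UUID) so that one run's token can never be a prefix of another's.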

3) Prefer webhooks for arrival, keep polling as a fallback

Webhooks make “wait for email” event-driven instead of sleep-based. Polling is still useful as a fallback when a webhook fails, gets delayed, or your CI runner cannot accept inbound requests.

Checklist:

Webhook handler is idempotent and can safely receive duplicates.
Polling loop uses a clear time budget and does not hammer the API.
Your code can switch between webhook-first and polling-only based on environment.

A simple, provider-agnostic waiting contract looks like this:

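import time

def wait_for_verification(inbox_id, budget_s=90.0, backoff_s=1.0, max_backoff_s=10.0):
    # try_get_matching_message() and extract_minimal_artifact() stand in for
    # your provider client and your parser.
    deadline = time.monotonic() + budget_s  # explicit time budget
    while time.monotonic() < deadline:
        msg = try_get_matching_message(inbox_id=inbox_id, matcher={"kind": "verification"})
        if msg:
            return extract_minimal_artifact(msg)  # OTP or URL
        time.sleep(backoff_s)
        backoff_s = min(backoff_s * 2, max_backoff_s)  # back off instead of hammering the API
    raise TimeoutError("verification email not received within budget")

inbox = provision_inbox()  # fresh inbox for this run
trigger_email_send(to=inbox.email)
artifact = wait_for_verification(inbox.id)
assert artifact, "expected an OTP or verification URL"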

The key is the contract: explicit budget, narrow matcher, minimal extraction, and no fixed sleeps.
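The first checklist item, an idempotent webhook handler, can be sketched as follows. The `delivery_id` field name and the in-memory set are assumptions for illustration. A real harness would read the identifier from its provider's payload schema and might keep seen IDs in shared storage.

```python
class DeliveryDeduper:
    """Remembers delivery IDs so a repeated webhook is processed at most once."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def first_time(self, delivery_id: str) -> bool:
        """Return True only the first time a given delivery ID is observed."""
        if delivery_id in self._seen:
            return False
        self._seen.add(delivery_id)
        return True


def handle_webhook(event: dict, deduper: DeliveryDeduper, received: list) -> str:
    """Record an event once; acknowledge duplicates without side effects."""
    # "delivery_id" is an assumed field name, not a documented one.
    if not deduper.first_time(event["delivery_id"]):
        return "duplicate"
    received.append(event)
    return "accepted"
```

Returning a normal acknowledgement for duplicates, instead of an error, keeps the sender from retrying the same delivery yet again.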

4) Parse email as data, avoid brittle HTML scraping

QA failures often come from treating HTML templates as stable APIs. They are not.

Checklist:

Prefer text/plain when extracting OTPs or links.
Extract only what you need (the OTP or verification URL), not the entire message.
Keep raw email available for debugging, but do not make your test assertions depend on raw formatting.

If you do extract a URL from email, validate it before any agent or test runner “opens” it.
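One way to extract from text/plain is with Python's standard `email` package, sketched below. The six-digit OTP pattern and the https-only URL pattern are assumptions about your templates, so adjust both to what your app actually sends.

```python
import re
from email import message_from_string, policy
from typing import Optional

OTP_RE = re.compile(r"\b\d{6}\b")            # assumes a 6-digit numeric OTP
URL_RE = re.compile(r"https://[^\s<>\"']+")  # assumes verification links use https


def plain_text_body(raw: str) -> str:
    """Return the text/plain body of a raw message, or '' if there is none."""
    msg = message_from_string(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""


def extract_otp(raw: str) -> Optional[str]:
    """Return the first OTP-shaped token in the plain-text body, if any."""
    match = OTP_RE.search(plain_text_body(raw))
    return match.group(0) if match else None


def extract_url(raw: str) -> Optional[str]:
    """Return the first https URL in the plain-text body, if any."""
    match = URL_RE.search(plain_text_body(raw))
    return match.group(0) if match else None
```

The two extractors are kept separate so a test asks for exactly the artifact it expects. The URL pattern is deliberately simple and may capture trailing punctuation, which is one more reason to validate the result before opening it.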

5) Treat inbound email as untrusted input (especially with LLM agents)

Email content can be attacker-controlled in many systems (support inboxes, invite flows, forwarded mail, even signup fields that echo in templates). For agent workflows, this becomes a prompt-injection risk.

Checklist:

Never render HTML in an agent context.
Validate extracted links against an allowlist of hostnames, and block redirects if your threat model requires it.
Minimize what you pass to the agent: pass only the artifact and a small set of metadata.

OWASP’s guidance on SSRF is a good reference when you are validating verification links: OWASP SSRF.
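A hostname allowlist check might look like the sketch below. The hostnames are placeholders, and the rule set (https only, exact host match, no embedded credentials) is one reasonable reading of the checklist, not a complete SSRF defense.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the hosts your app really links to.
ALLOWED_HOSTS = frozenset({"app.example.com", "staging.example.com"})


def is_allowed_link(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for https links whose exact host is on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL
    if parts.scheme != "https":
        return False
    # Reject embedded credentials such as https://app.example.com@evil.test/
    if parts.username is not None or parts.password is not None:
        return False
    host = parts.hostname or ""
    return host in allowed_hosts
```

Exact matching is intentional: a suffix or substring check would accept hosts like `app.example.com.evil.test`. This check covers only the link as written. If redirects matter in your threat model, the HTTP client that opens the link also has to refuse or re-validate them.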

6) Verify webhook authenticity (DKIM is not webhook security)

Even if an email client shows “signed by”, that is about email authenticity (DKIM), not about the authenticity of an HTTP webhook posting JSON to your endpoint.

Checklist:

Verify webhook signatures over the raw request body.
Enforce timestamp tolerance.
Add replay detection keyed by a delivery identifier.

If you need a deeper threat model and review checklist for webhook authenticity, see Mailhook’s write-up: Email Signed By: Verify Webhook Payload Authenticity.
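The three checks can be combined as in the sketch below. The signing scheme here (HMAC-SHA256 over `timestamp + "." + raw body`, hex-encoded) is an assumed, commonly used pattern and not a description of any specific provider's format, so take header names and the exact signed string from your provider's documentation.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_webhook(
    raw_body: bytes,
    timestamp: str,
    signature_hex: str,
    delivery_id: str,
    secret: bytes,
    seen_ids: set,
    tolerance_s: int = 300,
    now: Optional[float] = None,
) -> bool:
    """Check freshness, signature, and replay for one webhook delivery."""
    current = time.time() if now is None else now
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    if abs(current - sent_at) > tolerance_s:
        return False  # outside the timestamp tolerance
    # Sign the raw bytes exactly as received, before any JSON parsing.
    signed = timestamp.encode() + b"." + raw_body
    expected = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), signature_hex.encode()):
        return False  # constant-time comparison failed
    if delivery_id in seen_ids:
        return False  # replay of an already accepted delivery
    seen_ids.add(delivery_id)
    return True
```

The delivery ID is recorded only after the signature passes, so unauthenticated requests cannot fill the replay store. In a real scheme the delivery ID should itself be covered by the signature, for example by living inside the signed body.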

7) Choose a domain strategy that matches your QA environment

Domain choice affects deliverability, allowlisting, and noise.

Checklist:

Start with a shared domain when you want zero DNS work and fast setup.
Move to a custom domain or dedicated subdomain when you need allowlisting, isolation, or governance.
Use separate subdomains per environment (dev, staging, CI) if you operate at scale.

Mailhook covers the shared vs custom trade-offs in detail here: Email Domains for Testing: Shared vs Custom.
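If you go the per-environment subdomain route, the mapping can live in one small place in the harness. This is only a sketch: the subdomains under `example.com` are placeholders, and the `run-<id>@` address shape assumes your inbound routing accepts arbitrary local parts on those domains.

```python
# Hypothetical per-environment subdomains under a domain you control.
QA_DOMAINS = {
    "dev": "dev.qa.example.com",
    "staging": "staging.qa.example.com",
    "ci": "ci.qa.example.com",
}


def address_for(environment: str, run_id: str) -> str:
    """Build a run-scoped address on the subdomain for this environment."""
    # A KeyError on an unknown environment is intentional: fail loudly
    # instead of silently sending mail to the wrong domain.
    domain = QA_DOMAINS[environment]
    return f"run-{run_id}@{domain}"
```

Keeping environments on separate subdomains lets a vendor allowlist staging without also allowlisting CI noise.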

8) Build observability into the harness, not just the app

When email fails, you want to know where it failed: your app did not send, the provider did not ingest, routing mismatched, your matcher missed, or the artifact extraction broke.

Checklist:

Log run_id, inbox_id, message identifiers, and timestamps.
Keep a single “email wait” span in your traces with a hard timeout.
Record counts of duplicates and retries; they are early indicators of flaky infrastructure.
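A lightweight way to get one structured record per email wait is a context manager around the wait call, sketched below with the standard `logging` module. The field names are illustrative. The timeout itself is still enforced by the wait loop, and this wrapper only records the budget, the outcome, and the elapsed time.

```python
import json
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("qa.email")


@contextmanager
def email_wait_span(run_id: str, inbox_id: str, timeout_s: float):
    """Emit one structured log record per email wait, whatever the outcome."""
    record = {
        "event": "email_wait",
        "run_id": run_id,
        "inbox_id": inbox_id,
        "timeout_s": timeout_s,
        "duplicates": 0,
        "outcome": "ok",
    }
    started = time.monotonic()
    try:
        yield record  # callers may add message_id or bump "duplicates"
    except Exception as exc:
        record["outcome"] = type(exc).__name__
        raise
    finally:
        record["elapsed_s"] = round(time.monotonic() - started, 3)
        logger.info(json.dumps(record, sort_keys=True))
```

Wrapping the wait as `with email_wait_span(run_id, inbox_id, 90.0) as span:` means a timeout shows up as `"outcome": "TimeoutError"` next to the run and inbox identifiers, which is usually enough to tell a missing send from a missed match.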

Figure: A simple flow diagram showing a QA test runner provisioning a disposable inbox, triggering an app to send a verification email, receiving a webhook event with JSON, extracting an OTP or magic link, then completing the test.

9) Define retention and cleanup rules

Disposable inboxes are a reliability tool and a data minimization tool. QA systems that “keep everything forever” tend to leak secrets into logs and storage.

Checklist:

Use short TTLs for inboxes created for CI.
Keep only what you need for debugging (for example raw source for a short window).
Redact tokens and links in logs.
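Redaction can be a small function applied before anything is logged. The patterns below are assumptions (any http or https URL, and numeric codes of six to eight digits), so widen or narrow them to match the secrets your emails carry.

```python
import re

URL_RE = re.compile(r"https?://\S+")
OTP_RE = re.compile(r"\b\d{6,8}\b")  # assumes numeric OTPs of 6 to 8 digits


def redact(text: str) -> str:
    """Replace links and OTP-shaped numbers before the text reaches a log."""
    # URLs go first so digits inside a link are removed along with the link.
    text = URL_RE.sub("[REDACTED_URL]", text)
    return OTP_RE.sub("[REDACTED_OTP]", text)
```

This trades some debuggability for safety: any six-to-eight digit number is masked, whether or not it was a real code.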

A practical implementation path using Mailhook

If your intent is CI-safe email verification (signup, password reset, magic links, inbound workflows), Mailhook is designed around the inbox-first model:

Create disposable inboxes via API
Receive emails as structured JSON
Get real-time webhook notifications (with signed payloads)
Use polling as a fallback
Use shared domains instantly, or bring a custom domain when you need allowlisting and tighter control

For exact endpoints, payload schemas, and integration details, use the canonical reference: Mailhook llms.txt.

A simple way to integrate is to wrap your provider behind a tiny interface in your test codebase (for example provisionInbox(), waitForMessage(), extractVerificationArtifact()), then use that interface both in classic E2E tests and in agent tools.
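The interface idea can be sketched in Python with a `Protocol` and an in-memory fake. The method names mirror the ones mentioned above in snake_case. The message shape (a dict with `otp` or `url` keys) and the fake itself are assumptions for illustration, not any provider's API.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Inbox:
    id: str
    email: str


class EmailHarness(Protocol):
    """What tests and agent tools depend on, instead of a provider SDK."""

    def provision_inbox(self) -> Inbox: ...
    def wait_for_message(self, inbox_id: str, timeout_s: float) -> Optional[dict]: ...
    def extract_verification_artifact(self, message: dict) -> Optional[str]: ...


class InMemoryHarness:
    """Fake implementation for unit-testing the harness logic itself."""

    def __init__(self) -> None:
        self._messages: dict[str, list[dict]] = {}
        self._counter = 0

    def provision_inbox(self) -> Inbox:
        self._counter += 1
        inbox_id = f"inbox-{self._counter}"
        self._messages[inbox_id] = []
        return Inbox(id=inbox_id, email=f"{inbox_id}@inbox.example.com")

    def deliver(self, inbox_id: str, message: dict) -> None:
        """Test helper that simulates an arriving email."""
        self._messages[inbox_id].append(message)

    def wait_for_message(self, inbox_id: str, timeout_s: float) -> Optional[dict]:
        # The fake never blocks: it returns the oldest queued message, if any.
        queue = self._messages.get(inbox_id, [])
        return queue.pop(0) if queue else None

    def extract_verification_artifact(self, message: dict) -> Optional[str]:
        return message.get("otp") or message.get("url")
```

A provider-backed class with the same three methods can then be swapped in for CI, while agent tools only ever see `extract_verification_artifact`'s return value.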

Frequently Asked Questions

What’s the best way to set up an email address for QA tests?
The most reliable approach is a disposable inbox per test run (or per attempt) with deterministic waiting (webhook-first, polling fallback) and minimal artifact extraction (OTP or verification URL).

Why do my email-based tests pass locally but fail in CI?
Local setups often use different delivery semantics (local SMTP capture, no spam filtering, no parallelism). CI adds retries, concurrency, and network variability, which exposes shared inbox collisions and fixed-sleep waits.

Should I use a shared domain or a custom domain for QA email?
Use a shared domain for fast setup and low ops. Use a custom domain or subdomain when you need vendor allowlisting, isolation, or better governance over environments.

How do I make an LLM agent safely read verification emails?
Do not give the agent raw HTML. Verify webhook signatures, extract only the minimal artifact needed (OTP or URL), validate links against an allowlist, and enforce time and retry budgets.

Make your QA email setup deterministic with Mailhook

If you are tired of flaky “check your inbox” steps, Mailhook gives you programmable disposable inboxes that deliver inbound email as JSON, with webhook notifications and polling fallback.

Get started at Mailhook
Use the canonical integration contract: mailhook.co/llms.txt

