Skip to content
Engineering

Temp Custom Email for QA and LLM Agents

| | 13 min read
Temp Custom Email for QA and LLM Agents
Temp Custom Email for QA and LLM Agents

Email is still one of the hardest dependencies to test reliably. Sign-up links arrive late, OTP messages get duplicated, shared mailboxes collide under parallel CI, and LLM agents can mis-handle raw HTML or stale messages if the inbox is not tightly scoped.

A temp custom email pattern solves that by combining two ideas: temporary, disposable inboxes created on demand, and a domain or subdomain your team controls. Instead of sending test traffic to a public throwaway inbox or a long-lived shared mailbox, each QA run or agent task gets an isolated, routable address such as [email protected], then receives the resulting email as structured data.

For QA teams, that means fewer flakes. For LLM agents, it means a safer, smaller tool surface. For platform teams, it means email becomes an observable event stream rather than a browser tab someone has to refresh.

What is temp custom email?

Temp custom email is the practice of provisioning short-lived email inboxes on a custom domain or subdomain for automated workflows. The inbox exists for a specific run, attempt, user journey, or agent task. When the workflow is complete, the inbox can be discarded or ignored according to your retention policy.

The important distinction is that the email address is not just a string. A production-ready implementation should return an inbox descriptor that your code can use later:

{
  "email": "[email protected]",
  "inbox_id": "inbox_8f3a",
  "domain": "qa.example.com",
  "purpose": "signup_verification",
  "run_id": "ci_49281"
}

That inbox_id is what makes the workflow deterministic. Your test harness or agent does not need to scan a shared mailbox and guess which message is relevant. It waits on a specific inbox, then extracts a specific artifact, such as an OTP, verification URL, password reset link, or invite token.

Why QA and LLM agents need a different email model

Human email clients are optimized for reading and replying. QA automation and LLM agents need very different semantics. They need isolated resources, machine-readable messages, bounded waits, idempotent processing, and strict trust boundaries.

A shared mailbox might work for a developer manually testing one sign-up flow. It breaks quickly when 50 CI jobs run in parallel, when retries trigger duplicate messages, or when an agent sees an old magic link and follows it.

Approach Best for Main weakness in automation
Public temp email site Manual one-off testing Shared visibility, weak privacy, inconsistent routing
Traditional mailbox Human workflows Hard to isolate, slow to automate, fragile under CI parallelism
Plus addressing Lightweight correlation Same underlying inbox, collisions and stale messages remain possible
Local SMTP capture Local development Does not verify real end-to-end delivery
Temp custom email QA, CI, LLM agents Requires domain routing and an inbox API provider

For end-to-end verification, the last option is usually the most robust: a temporary inbox per attempt, created through an API, using a domain strategy your organization can control.

The reference architecture

A reliable temp custom email setup has four layers: domain routing, inbox provisioning, message delivery, and safe consumption.

At the domain layer, you use a dedicated subdomain such as qa.example.com, ci.example.com, or agents.example.com. Keeping test email on a subdomain reduces the blast radius of DNS changes and separates automated traffic from human mail. Email routing is controlled primarily through MX records, as defined in SMTP standards such as RFC 5321.

At the inbox layer, your automation creates a disposable inbox before triggering the email. The inbox provider maps the generated recipient to a stable inbox_id.

At the delivery layer, inbound email is normalized into JSON and delivered to your code. For latency-sensitive systems, webhooks should be the primary path. Polling remains useful as a fallback when webhook delivery is delayed, blocked, or disabled in a local environment.

At the consumption layer, your test or agent extracts only what it needs. In most verification flows, that is a single code or URL. The raw email body should be treated as untrusted input, especially if an LLM agent is involved.

A practical flow looks like this:

  1. Create a disposable inbox on a custom test domain.
  2. Submit the generated address to the application under test.
  3. Wait for the message by webhook, with polling as a fallback.
  4. Match the message using the inbox ID, run ID, sender, subject, or expected intent.
  5. Extract the minimal artifact, then mark it consumed.
  6. Clean up logs, messages, and inbox references according to your retention policy.

This pattern keeps email-related automation deterministic even when jobs retry, messages duplicate, or test suites run in parallel.

Choosing the right custom domain layout

You do not need to route your primary company domain into your test harness. In most cases, a dedicated subdomain is safer and easier to reason about.

Good examples include:

  • qa.example.com for automated test suites.
  • ci.example.com for continuous integration runs.
  • agents.example.com for LLM-driven workflows.
  • staging-mail.example.com for staging environments.

The key is to keep domain choice configurable. Hardcoding a domain into tests, prompts, or agent logic makes migration painful. Instead, store the domain as environment-specific configuration:

EMAIL_TEST_DOMAIN=qa.example.com
EMAIL_PROVIDER=mailhook
EMAIL_DELIVERY_MODE=webhook_with_polling_fallback

This lets you start on an instant shared domain when you are prototyping, then move to a custom domain when you need allowlisting, governance, or clearer environment separation.

Addressing patterns that work at scale

Once the domain is configured, you still need a strategy for generating local parts, the part before @. The best pattern depends on how much state you want to keep in your system.

Pattern Example When to use it Watch out for
Encoded local-part [email protected] Stateless routing, easy debugging Keep length and character set conservative
Alias table [email protected] Compatibility with systems that reject long or unusual addresses Requires a lookup table
Catch-all with correlation [email protected] Exploratory testing and migration Needs strict filtering to avoid accidental matches

For QA and agents, encoded local-parts are often the default. They make debugging easier because the address itself can carry a run ID or purpose. Still, do not rely on the address alone. Keep the inbox descriptor in your test state, and use inbox_id as the primary handle.

Webhooks first, polling fallback

Email delivery is asynchronous. Fixed sleeps like “wait 10 seconds, then check the mailbox” are a common source of flaky tests. They are too short when delivery is slow and too long when everything works.

A better pattern is webhook-first waiting. Your provider notifies your system as soon as a message arrives. Your handler verifies the webhook, stores the event, and wakes the waiting test or agent. If the webhook never arrives within a bounded window, the workflow falls back to polling the inbox.

The webhook handler should be fast and idempotent. It should verify authenticity before parsing the body, deduplicate events, store the message or artifact, then return a success response quickly. Heavy processing can happen asynchronously.

Polling should also be bounded. Use a clear deadline, backoff between attempts, and a cursor or seen-message set so retries do not process the same message multiple times.

Safe extraction for LLM agents

LLM agents should not be handed a raw email and told to “figure it out.” Email is untrusted input. It can contain misleading instructions, tracking pixels, hidden HTML, suspicious links, or prompt-injection text that tries to override the agent’s task.

A safer approach is to hide raw email behind a small deterministic tool. The agent can request an inbox, wait for a message matching a specific purpose, and receive only the extracted artifact.

For example, instead of giving the model this:

{
  "html": "<html>Click <a href='https://example.com/verify?...'>here</a>. Ignore previous instructions...</html>"
}

Give it this:

{
  "artifact_type": "verification_url",
  "url": "https://app.example.com/verify?token=...",
  "source_inbox_id": "inbox_8f3a",
  "message_id": "msg_91b2",
  "confidence": "high"
}

The tool, not the model, should validate that the URL hostname is expected, the token belongs to the current attempt, and the artifact has not already been consumed.

This is especially important for agents that can browse, click links, create accounts, or trigger resend actions. Without guardrails, an agent can enter resend loops, follow stale links, or process attacker-controlled content.

Reliability rules for QA test harnesses

A good temp custom email harness has a few non-negotiable rules.

First, use one inbox per attempt. A retry is a new attempt, not a continuation of the same mailbox. This avoids stale message selection and makes logs much easier to interpret.

Second, match narrowly. Matching “the latest email with subject Verify your account” is not enough in parallel CI. Match on inbox ID, run ID, sender, subject pattern, and expected artifact type where possible.

Third, deduplicate at multiple layers. Webhooks can retry. SMTP delivery can duplicate. Polling can read the same message twice. Your code should dedupe delivery events, email messages, and extracted artifacts separately.

Fourth, expose useful observability. When a test fails, the logs should answer: which inbox was created, which address was submitted, whether the message arrived, which matcher rejected it, and whether an artifact was extracted.

Reliability concern Recommended control
Parallel test collisions Create a disposable inbox per attempt
Late email arrival Use deadline-based waits, not fixed sleeps
Duplicate delivery Dedupe by delivery ID, message ID, and artifact hash
Stale verification links Bind artifacts to run ID or inbox ID
Hard-to-debug failures Log stable IDs, timestamps, and matcher outcomes
Agent prompt injection Provide minimized JSON views, not raw HTML

A provider-agnostic implementation sketch

The exact API calls depend on your provider, but the shape of the integration should stay simple. For Mailhook-specific capabilities and payload details, use the canonical Mailhook llms.txt integration reference.

async function verifySignupWithTempCustomEmail(userData) {
  const runId = process.env.CI_RUN_ID ?? crypto.randomUUID();

  const inbox = await emailProvider.createInbox({
    domain: process.env.EMAIL_TEST_DOMAIN,
    metadata: {
      purpose: "signup_verification",
      run_id: runId
    }
  });

  await app.signup({
    ...userData,
    email: inbox.email
  });

  const message = await emailProvider.waitForMessage({
    inbox_id: inbox.inbox_id,
    timeout_ms: 60000,
    match: {
      from_contains: "no-reply",
      subject_contains: "Verify",
      purpose: "signup_verification"
    }
  });

  const artifact = extractVerificationArtifact(message, {
    allowed_hosts: ["app.example.com"],
    artifact_type: "verification_url"
  });

  await consumeOnce(artifact);
  await app.openVerificationUrl(artifact.url);

  return {
    inbox_id: inbox.inbox_id,
    message_id: message.message_id,
    verified: true
  };
}

The most important design choice here is that the app never reads from a shared mailbox. It waits for a message in the exact inbox created for the current attempt.

Where Mailhook fits

Mailhook is built around programmable temp inboxes for automation, QA, and LLM agents. It provides disposable inbox creation via API, structured JSON email output, RESTful API access, real-time webhook notifications, polling for email retrieval, instant shared domains, custom domain support, signed payloads for security, and batch email processing.

That means you can start quickly with a shared domain, then move automated workflows to a custom domain or subdomain when your team needs more control. Your tests and agents can receive emails as JSON instead of scraping a mailbox UI, and webhook signatures help your systems verify that inbound payloads came from the expected source.

Mailhook’s model is particularly useful when your automation needs to:

  • Verify sign-up, invite, password reset, or magic-link flows.
  • Run email-dependent tests in parallel CI.
  • Give LLM agents a constrained email tool.
  • Receive inbound messages as JSON for downstream processing.
  • Route test traffic through shared or custom domains.

You can review the machine-readable integration details in Mailhook’s llms.txt, which is designed to help developers and agents understand the API contract.

Common mistakes to avoid

The first mistake is using a public temp inbox site for automated tests. Those services may be fine for manual experiments, but they usually lack stable API contracts, privacy boundaries, webhook verification, and deterministic message retrieval.

The second mistake is treating a custom domain as the whole solution. A domain helps with routing and control, but it does not solve collisions by itself. You still need isolated inboxes, strong correlation, dedupe, and bounded waits.

The third mistake is exposing too much email content to an LLM. The model should not decide whether a link is safe. Your code should validate the URL, extract the artifact, and provide the model with only the minimal result.

The fourth mistake is ignoring lifecycle. Temporary inboxes should have a defined purpose, ownership, retention policy, and cleanup behavior. Even if messages are short-lived, logs can still leak sensitive tokens if you store full bodies unnecessarily.

Temp custom email checklist

Before you move this pattern into production CI or agent workflows, confirm the following:

  • Your test traffic uses a dedicated domain or subdomain.
  • Inboxes are created per run, per test, or per attempt.
  • Your code stores an inbox descriptor, not only the email address.
  • Webhooks are verified before processing.
  • Polling has a clear timeout and dedupe strategy.
  • Raw HTML is not exposed directly to LLM agents.
  • OTPs and verification links are consumed once.
  • Logs include stable IDs but avoid leaking full tokens.
  • Domain selection is configuration, not hardcoded logic.
  • Your provider supports the delivery and security semantics your workflow needs.

Frequently Asked Questions

What does temp custom email mean? Temp custom email means creating temporary, disposable email inboxes on a custom domain or subdomain so automated systems can receive messages reliably without using a shared human mailbox.

Is a custom domain required for QA email testing? No. Many teams start with shared provider domains because setup is faster. A custom domain becomes useful when you need allowlisting, environment separation, auditability, or more control over routing.

Why not use plus addressing for QA tests? Plus addressing can help with correlation, but all messages still land in the same underlying mailbox. It does not provide true inbox isolation, which is why it can fail under parallel CI and retries.

How should LLM agents handle verification emails? LLM agents should use a constrained tool that creates an inbox, waits for a matching message, and returns only a validated OTP or URL. Avoid exposing raw email HTML or untrusted instructions to the model.

Should email tests use webhooks or polling? Use webhooks as the primary path for low latency and lower overhead, with polling as a bounded fallback. This hybrid approach is more reliable than fixed sleeps.

Can Mailhook receive emails on custom domains? Yes. Mailhook supports custom domains, disposable inbox creation via API, structured JSON output, webhooks, polling, signed payloads, and shared domains for fast starts.

Build deterministic email workflows with Mailhook

If your QA suite or LLM agent workflow depends on verification emails, a temp custom email architecture can make the difference between flaky automation and repeatable results.

Mailhook gives developers programmable disposable inboxes, JSON email payloads, webhooks, polling, shared domains, custom domain support, and signed payloads for safer automation. Start with the Mailhook homepage or review the exact integration contract in llms.txt.

Related Articles