Instant Email Address Setup for CI and Agent Workflows

An instant email address is useful in CI only when it behaves like an API resource, not like a random mailbox someone checks manually. For signup tests, passwordless login, OTP validation, vendor onboarding, and LLM agent workflows, the address must be created on demand, routed reliably, observed by code, and cleaned up when the run ends.

That is the core shift: stop treating email as a human inbox, and start treating it as a short-lived event stream. With Mailhook, you can create disposable inboxes via API, receive emails as structured JSON, wait for messages with webhooks or polling, and choose between instant shared domains and custom domains when a workflow needs more control over the receiving domain. For exact integration details, keep the Mailhook llms.txt reference close at hand.

What an instant email address setup needs to solve

A traditional shared mailbox is usually the root cause of flaky email tests. Multiple CI jobs compete for the same messages, retries pick up stale verification links, agents see more context than they need, and debugging requires logging into a mailbox after the test has already failed.

A reliable instant email address setup for CI and agent workflows should provide five properties:

Provisioning on demand: The pipeline or agent can create a fresh address when it needs one.
Isolation: Each test attempt, agent session, or workflow gets its own inbox boundary.
Machine-readable messages: Code receives normalized JSON instead of scraping rendered HTML.
Deterministic waiting: Webhooks and polling replace fixed sleeps.
Lifecycle control: The inbox can expire or be cleaned up after the workflow is complete.

Workflow problem	Fragile approach	Reliable approach
Parallel CI jobs	One shared mailbox	Disposable inbox per run or attempt
OTP and magic links	Scrape newest email globally	Match inside a specific inbox
LLM agent input	Give the model raw email HTML	Return only the extracted artifact
Debugging failures	Search a mailbox manually	Log inbox IDs, message IDs, and matcher results
Cleanup	Leave messages forever	Use TTLs, expiry, and retention rules

This matters even more for AI agents. An LLM agent should not browse a mailbox the way a human does. It should call a small tool, receive a constrained response, and act only on the artifact the workflow requested, such as an OTP or a verification link.

Reference architecture: CI runner or agent, inbox API, JSON email

A practical architecture has four moving parts.

The CI runner or agent creates an inbox through an API. The system under test sends email to the generated address. Mailhook receives and normalizes the message into structured JSON. Your harness receives that message through a webhook, or fetches it with polling if the webhook path is unavailable.

The important detail is that your code stores an inbox descriptor, not just the email address. The descriptor should include the generated email, the provider inbox identifier, timestamps or TTL information if available, and your own run or attempt identifier.

Descriptor field	Why it matters
`email`	The address passed into the signup, login, or verification flow
`inbox_id`	The stable handle used to retrieve messages from the right inbox
`domain`	Useful when switching between shared and custom domains
`run_id`	Connects the inbox to the CI job or agent session
`attempt_id`	Separates retries from the original failed attempt
`expires_at`	Helps cleanup jobs avoid retaining inboxes longer than needed
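One way to hold this descriptor in a TypeScript harness is sketched below, assuming snake_case field names that mirror the table. `makeDescriptor` and `isExpired` are hypothetical helpers for your own code, not part of a Mailhook SDK.

```typescript
// Descriptor the harness stores for every inbox it creates.
// Field names follow the table above; this is not a Mailhook SDK type.
interface InboxDescriptor {
  email: string;       // address passed into the product flow
  inbox_id: string;    // provider handle used to fetch messages
  domain: string;      // shared or custom receiving domain
  run_id: string;      // CI job or agent session
  attempt_id: string;  // distinguishes retries
  expires_at?: string; // ISO-8601 timestamp, if the provider returns one
}

// Hypothetical helper: build a descriptor, deriving the domain from the address.
function makeDescriptor(
  email: string,
  inboxId: string,
  runId: string,
  attemptId: string,
  expiresAt?: string,
): InboxDescriptor {
  const at = email.lastIndexOf("@");
  if (at < 1) throw new Error(`not an email address: ${email}`);
  return {
    email,
    inbox_id: inboxId,
    domain: email.slice(at + 1).toLowerCase(),
    run_id: runId,
    attempt_id: attemptId,
    expires_at: expiresAt,
  };
}

// Cleanup jobs can use this to find inboxes past their expiry.
// Descriptors without expires_at are treated as not expired.
function isExpired(d: InboxDescriptor, now: Date = new Date()): boolean {
  if (!d.expires_at) return false;
  return Date.parse(d.expires_at) <= now.getTime();
}
```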

Mail delivery itself still follows standard SMTP routing concepts, including MX records and envelope recipients, as described in RFC 5321. The difference is that an API inbox provider abstracts most of that routing into a programmable interface your automation can use.

Build the CI harness around the inbox lifecycle

The safest setup is to make email creation part of the test fixture, not an afterthought inside the assertion. When a CI job starts a workflow that will send email, it should create a new disposable inbox first, pass the generated address into the product flow, then wait for a matching message with a clear deadline.

For most verification tests, prefer one inbox per attempt. If a test retries, create a new inbox rather than reusing the previous one. That removes ambiguity around stale messages and makes failures easier to inspect.
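A retry wrapper can enforce the one-inbox-per-attempt rule so individual tests cannot forget it. This is a sketch: `createInbox`, `expireInbox`, and the `runId-aN` attempt ID format are assumptions standing in for your provider client and naming scheme.

```typescript
// Hypothetical shapes; replace with your provider client.
interface Inbox { email: string; inboxId: string }
type CreateInbox = (meta: { runId: string; attemptId: string }) => Promise<Inbox>;
type ExpireInbox = (inboxId: string) => Promise<void>;
type Attempt<T> = (inbox: Inbox, attemptId: string) => Promise<T>;

// Runs `attempt` up to `maxAttempts` times. Every attempt gets a brand-new
// inbox, so a retry can never pick up a message sent to an earlier attempt.
async function withFreshInboxPerAttempt<T>(
  runId: string,
  maxAttempts: number,
  createInbox: CreateInbox,
  expireInbox: ExpireInbox,
  attempt: Attempt<T>,
): Promise<T> {
  if (maxAttempts < 1) throw new Error("maxAttempts must be >= 1");
  let lastError: unknown;
  for (let n = 1; n <= maxAttempts; n++) {
    const attemptId = `${runId}-a${n}`;
    const inbox = await createInbox({ runId, attemptId });
    try {
      return await attempt(inbox, attemptId);
    } catch (err) {
      lastError = err; // keep the most recent failure for the final report
    } finally {
      await expireInbox(inbox.inboxId); // this inbox is never reused
    }
  }
  throw lastError;
}
```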

Here is provider-neutral pseudocode that shows the shape of the harness. Check the Mailhook llms.txt reference for exact REST semantics, payloads, and authentication details.

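```typescript
// Pseudocode, not a generated SDK. `email`, `app`, `browser`, and
// `extractAllowedLink` stand in for your own harness helpers.
type InboxDescriptor = {
  email: string;
  inboxId: string;
  expiresAt?: string;
};

type AttemptContext = {
  runId: string;
  attemptId: string;
};

async function runSignupVerificationAttempt(ctx: AttemptContext): Promise<void> {
  // Create: one fresh inbox per attempt, tagged with run and attempt IDs.
  const inbox: InboxDescriptor = await email.createInbox({
    purpose: "signup-verification",
    ttlSeconds: 600,
    metadata: {
      runId: ctx.runId,
      attemptId: ctx.attemptId
    }
  });

  try {
    // Trigger: pass the generated address into the product flow.
    await app.startSignup({
      email: inbox.email,
      correlationId: ctx.attemptId
    });

    // Wait: bounded wait for a matching message inside this inbox only.
    const message = await email.waitForMessage({
      inboxId: inbox.inboxId,
      timeoutMs: 90_000,
      matcher: {
        subjectIncludes: "Verify",
        fromDomain: "example-app.com"
      }
    });

    // Extract: accept only a link on the expected host and path.
    const verificationUrl = extractAllowedLink(message.json, {
      host: "app.example.com",
      pathPrefix: "/verify"
    });

    // Consume: follow the allowlisted link.
    await browser.goto(verificationUrl);
  } finally {
    // Expire: runs even when the wait times out or an assertion fails.
    await email.expireInbox({ inboxId: inbox.inboxId });
  }
}
```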

The core idea is simple: create, trigger, wait, extract, consume, expire. Your CI system should attach the normalized message JSON or a sanitized summary as a test artifact when a run fails. Avoid logging full raw email bodies or secrets unless your retention policy explicitly allows it.

Use webhooks first, then polling as a safety net

For CI and agent workflows, waiting is often where flakiness enters. A fixed sleep is both slow and unreliable. A five-second sleep fails when delivery takes six seconds, while a sixty-second sleep wastes time when delivery takes one second.

A better pattern is webhook-first delivery with bounded polling fallback. Mailhook supports real-time webhook notifications and a polling API, which lets you combine low latency with deterministic recovery.
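A sketch of the hybrid wait in TypeScript, assuming a `webhookStore` that your verified webhook handler fills and a `poll` function wrapping the provider's polling call. Both names, the message shape, and the default intervals are assumptions for illustration; take the real polling endpoint and response format from Mailhook's llms.txt.

```typescript
// Minimal message shape assumed for this sketch (not Mailhook's JSON schema).
interface EmailMessage {
  id: string;
  inboxId: string;
  subject: string;
  from: string;
}

// Filled by the verified webhook handler, keyed by inbox ID.
// Process memory is enough for a sketch; real harnesses would use shared storage.
const webhookStore = new Map<string, EmailMessage[]>();

interface WaitOptions {
  timeoutMs: number;       // hard deadline for the whole wait
  webhookGraceMs?: number; // how long to rely on webhooks alone before polling
  initialDelayMs?: number; // first sleep between checks
  maxDelayMs?: number;     // backoff cap
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// `poll` is a stand-in for your provider's "list messages for inbox" call.
async function waitForMessage(
  inboxId: string,
  matches: (m: EmailMessage) => boolean,
  poll: (inboxId: string) => Promise<EmailMessage[]>,
  { timeoutMs, webhookGraceMs = 5_000, initialDelayMs = 250, maxDelayMs = 5_000 }: WaitOptions,
): Promise<EmailMessage> {
  const start = Date.now();
  const deadline = start + timeoutMs;
  const seen = new Set<string>(); // message IDs already evaluated (dedupe)
  let delay = initialDelayMs;

  for (;;) {
    const candidates = [...(webhookStore.get(inboxId) ?? [])];
    // Polling fallback starts once the webhook grace period has passed.
    if (Date.now() - start >= webhookGraceMs) candidates.push(...(await poll(inboxId)));

    for (const m of candidates) {
      if (seen.has(m.id)) continue;
      seen.add(m.id);
      if (m.inboxId === inboxId && matches(m)) return m;
    }

    const remaining = deadline - Date.now();
    if (remaining <= 0) throw new Error(`no matching message in ${inboxId} after ${timeoutMs} ms`);
    await sleep(Math.min(delay, remaining));
    delay = Math.min(delay * 2, maxDelayMs); // exponential backoff
  }
}
```

Because the loop checks once more after the final sleep, a message that lands just before the deadline is still picked up before the wait fails.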

Delivery method	Best use	Implementation rule
Webhook	Low-latency CI and event-driven agent runs	Verify the signed payload before parsing or processing
Polling	Fallback, local development, or blocked inbound webhooks	Use deadlines, cursors or seen IDs, and backoff
Hybrid	Production CI and autonomous agents	Accept webhook events, then poll if no event arrives before the deadline

Webhook handlers should be fast. Verify the signature, record the delivery event, enqueue processing, and acknowledge quickly. If a webhook is missed, delayed, or rejected, the polling path can still retrieve the message before the workflow times out.
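The handler order above can be sketched as follows, assuming an HMAC-SHA256 signature over the raw body, hex-encoded, plus a per-delivery ID. That signing scheme is an assumption for illustration; take the real header names, encoding, and any timestamp check from Mailhook's llms.txt before relying on it.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// ASSUMPTION: HMAC-SHA256 over the raw body, hex-encoded.
// Mailhook's actual signature format may differ.
function verifyWebhookSignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

const seenDeliveries = new Set<string>();
const queue: unknown[] = []; // heavy processing happens off this queue, not in the handler

// Returns an HTTP status: verify, parse, dedupe, enqueue, acknowledge.
function handleWebhook(rawBody: string, signatureHex: string, deliveryId: string, secret: string): number {
  if (!verifyWebhookSignature(rawBody, signatureHex, secret)) return 401; // reject before parsing
  let payload: unknown;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    return 400;
  }
  if (seenDeliveries.has(deliveryId)) return 200; // retried delivery: ack, do nothing
  seenDeliveries.add(deliveryId);
  queue.push(payload);
  return 200;
}
```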

For deeper implementation guidance on polling semantics, cursors, and dedupe, see Pull Email with Polling: Cursors, Timeouts, and Dedupe. For webhook verification order, see Signed Webhooks for Email: What to Verify First.

Keep LLM agents behind a small email tool contract

Agents should not receive open-ended mailbox access. Email is untrusted input. It can contain misleading text, tracking URLs, HTML, hidden instructions, and links that should never be followed automatically.

Instead, expose a small set of deterministic tools. The LLM asks for a task-level outcome, and your tool code enforces the security and matching rules.

Agent tool	Input	Output	Guardrail
`create_instant_email_address`	Purpose, TTL, domain preference	Email address and inbox ID	The agent does not choose arbitrary routing rules
`wait_for_email_artifact`	Inbox ID, matcher, timeout	OTP, magic link, or typed artifact	The model does not receive raw HTML by default
`expire_email_inbox`	Inbox ID	Expiry status	The session cannot reuse stale inboxes
`get_email_debug_summary`	Inbox ID and run ID	Sanitized message metadata	Secrets and full bodies stay out of prompts

This tool boundary is what makes email safe for agent workflows. The LLM can coordinate the task, but code handles signature verification, URL allowlisting, OTP extraction, deduplication, and lifecycle cleanup.
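A tool such as `wait_for_email_artifact` can return a typed artifact instead of the message. The sketch below assumes a normalized message with a `text` field for the text/plain part (the real JSON field names may differ) and an OTP of 4 to 8 digits; `extractOtpArtifact` is a hypothetical helper.

```typescript
// Assumed subset of the normalized JSON message; real field names may differ.
interface NormalizedEmail {
  subject: string;
  from: string;
  text?: string; // text/plain body, if present
  html?: string;
}

// What the agent is allowed to see: a typed artifact, never the body.
type EmailArtifact =
  | { kind: "otp"; value: string }
  | { kind: "not_found"; reason: string };

// Reads only the text/plain part. The 4-8 digit range is an assumption;
// tighten it to your application's real OTP format.
function extractOtpArtifact(msg: NormalizedEmail): EmailArtifact {
  if (!msg.text) return { kind: "not_found", reason: "no text/plain part" };
  const codes = [...new Set(msg.text.match(/\b\d{4,8}\b/g) ?? [])];
  if (codes.length !== 1) {
    return { kind: "not_found", reason: `expected exactly one code, found ${codes.length}` };
  }
  return { kind: "otp", value: codes[0] };
}
```

Returning `not_found` when the text contains several candidate codes is deliberate: refusing an ambiguous message is safer than letting the agent guess.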

For link-following workflows, validate that extracted URLs belong to expected hosts and paths before returning them to the agent. If your tool fetches URLs server-side, apply SSRF protections and network egress controls. The OWASP SSRF Prevention Cheat Sheet is a useful reference for that layer.
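A minimal allowlist check can rely on the standard WHATWG `URL` parser, which also resolves `..` segments before the path comparison. The rule shape mirrors the `host` and `pathPrefix` options in the earlier pseudocode, but `allowlistedUrl` itself is a hypothetical helper and the hosts are placeholders.

```typescript
interface LinkRule {
  host: string;       // exact hostname, e.g. "app.example.com"
  pathPrefix: string; // e.g. "/verify"
}

// True if `path` is the prefix itself or sits under it ("/verify-evil" does not).
function underPrefix(path: string, prefix: string): boolean {
  return path === prefix || path.startsWith(prefix.endsWith("/") ? prefix : prefix + "/");
}

// Returns the normalized URL only if it is https and matches a rule; otherwise null.
function allowlistedUrl(candidate: string, rules: LinkRule[]): string | null {
  let url: URL;
  try {
    url = new URL(candidate);
  } catch {
    return null; // not an absolute URL
  }
  if (url.protocol !== "https:") return null;
  if (url.username || url.password || url.port !== "") return null; // no credentials or odd ports
  const ok = rules.some(
    (r) => url.hostname === r.host.toLowerCase() && underPrefix(url.pathname, r.pathPrefix),
  );
  return ok ? url.toString() : null;
}
```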

Choose shared domains for speed, custom domains for control

The fastest way to start is with instant shared domains. They remove DNS work from the first integration and are ideal for prototypes, internal CI, and agent experiments where the receiving domain does not need to be allowlisted.

Custom domains are useful when your system under test or a third-party service enforces domain allowlists, environment separation, or governance rules. In that case, you typically route a dedicated subdomain to the inbox provider and keep the domain choice configurable in CI.
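Keeping the domain choice in configuration can be as small as the following sketch. The environment variable name `INBOX_CUSTOM_DOMAIN` and the two-mode shape are assumptions; map the result onto whatever domain parameter the inbox API actually accepts.

```typescript
// Domain choice as configuration, not code. Names here are hypothetical.
type DomainConfig =
  | { mode: "shared" }                  // let the provider assign an instant shared domain
  | { mode: "custom"; domain: string }; // e.g. a dedicated CI subdomain

function domainConfigFromEnv(env: Record<string, string | undefined>): DomainConfig {
  const custom = env.INBOX_CUSTOM_DOMAIN?.trim().toLowerCase();
  if (!custom) return { mode: "shared" };
  // Cheap sanity check: dot-separated labels of letters, digits, and hyphens.
  if (!/^[a-z0-9-]+(\.[a-z0-9-]+)+$/.test(custom)) {
    throw new Error(`INBOX_CUSTOM_DOMAIN is not a hostname: ${custom}`);
  }
  return { mode: "custom", domain: custom };
}
```

In a Node harness you would typically call it as `domainConfigFromEnv(process.env)`, so switching a pipeline to a custom subdomain is a CI variable change rather than a code change.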

Domain option	Use it when	Trade-off
Instant shared domain	You need a working address immediately	Less control over domain identity
Custom subdomain	You need allowlisting, isolation, or environment labels	Requires DNS setup and verification
Multiple domains	You run high-volume or environment-specific workflows	Requires stronger configuration discipline

If you are planning a custom-domain setup, the detailed walkthrough in Custom Domains for Temp Inboxes: What to Set Up covers the DNS and routing considerations in more depth.
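A CI preflight step can catch a broken custom-domain route before tests start failing on timeouts. This sketch uses Node's `resolveMx` from `node:dns/promises`; `provider.example` stands in for whatever MX host your inbox provider documents, and requiring every MX record to point at the provider is a deliberately strict assumption.

```typescript
import { resolveMx } from "node:dns/promises";

// Pure check: every MX record must point at the provider's mail host.
// A stray backup MX could otherwise route test mail somewhere else.
function mxRoutesTo(records: { exchange: string; priority: number }[], expectedSuffix: string): boolean {
  if (records.length === 0) return false;
  const suffix = expectedSuffix.toLowerCase().replace(/\.$/, "");
  return records.every((r) => {
    const host = r.exchange.toLowerCase().replace(/\.$/, "");
    return host === suffix || host.endsWith("." + suffix);
  });
}

// Network wrapper for a CI preflight job.
async function checkCustomDomain(subdomain: string, expectedSuffix: string): Promise<boolean> {
  try {
    return mxRoutesTo(await resolveMx(subdomain), expectedSuffix);
  } catch {
    return false; // NXDOMAIN, no MX records, or resolver failure
  }
}
```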

Reliability rules that prevent CI flakes

Once inbox creation is automated, most failures come from matching, duplication, or unsafe retries. The goal is not just to receive an email. The goal is to receive the right email once, extract the right artifact, and make the test outcome repeatable.

Use these rules as your baseline:

Create a fresh disposable inbox for each state-changing attempt.
Match inside the inbox first, then narrow by subject, sender, timestamp, and correlation data.
Prefer text/plain or structured fields when extracting OTPs and links.
Treat webhook delivery as at-least-once, and make processing idempotent.
Deduplicate at the delivery, message, and artifact layers.
Keep raw email out of LLM prompts unless a human-approved debugging mode is active.
Expire or clean up inboxes after the test, with a short drain window for late arrivals.
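The three dedupe layers in the list above can live in one small object per attempt. Key formats are assumptions of this sketch: use whichever delivery and message identifiers your provider's payload actually exposes.

```typescript
// One instance per attempt. Each method returns true only the first time
// a key is seen at that layer, which makes at-least-once delivery safe.
class EmailDeduper {
  private deliveries = new Set<string>(); // webhook delivery IDs (provider retries)
  private messages = new Set<string>();   // message IDs (webhook and polling overlap)
  private artifacts = new Set<string>();  // "kind:value" (resends carrying the same code)

  firstDelivery(deliveryId: string): boolean {
    return EmailDeduper.firstTime(this.deliveries, deliveryId);
  }

  firstMessage(messageId: string): boolean {
    return EmailDeduper.firstTime(this.messages, messageId);
  }

  firstArtifact(kind: string, value: string): boolean {
    return EmailDeduper.firstTime(this.artifacts, `${kind}:${value}`);
  }

  private static firstTime(set: Set<string>, key: string): boolean {
    if (set.has(key)) return false;
    set.add(key);
    return true;
  }
}
```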

The phrase “latest matching message” is not enough if you are searching across a shared mailbox. It becomes much safer when “latest” is scoped to the inbox created for the current attempt.
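Scoping "latest" can be sketched like this, assuming each message carries its inbox ID and an ISO-8601 `receivedAt` timestamp (hypothetical field names): filter to the attempt's inbox and the trigger time first, then sort.

```typescript
interface ReceivedMessage {
  id: string;
  inboxId: string;
  from: string;
  subject: string;
  receivedAt: string; // ISO-8601
}

// "Latest" only means something inside one attempt's inbox, after the trigger.
function latestMatch(
  messages: ReceivedMessage[],
  inboxId: string,
  notBefore: Date,
  matches: (m: ReceivedMessage) => boolean,
): ReceivedMessage | undefined {
  return messages
    .filter((m) => m.inboxId === inboxId)                           // scope to this inbox
    .filter((m) => Date.parse(m.receivedAt) >= notBefore.getTime()) // drop anything before the trigger
    .filter(matches)                                                // subject, sender, correlation data
    .sort((a, b) => Date.parse(b.receivedAt) - Date.parse(a.receivedAt))[0];
}
```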

Scale the pattern for many tests or many agents

At small scale, a single CI job might create one inbox, wait for one message, and expire it. At larger scale, hundreds of test workers or agent sessions may create addresses concurrently.

The scalable version still uses the same primitives, but centralizes control. A test fixture or agent tool service should own inbox creation, domain selection, webhook ingestion, polling fallback, and cleanup. That keeps individual tests and prompts simple.

Mailhook supports batch email processing, which is useful when many messages arrive across many disposable inboxes. Pair that with your own concurrency budgets so CI does not overload the system under test by triggering too many verification emails at once.
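A concurrency budget does not need a library. The helper below is a hypothetical sketch that caps how many tasks, such as signup triggers, run at once; it involves no Mailhook API.

```typescript
// Runs async tasks with at most `limit` in flight and returns results in input order.
async function runWithConcurrency<T>(tasks: Array<() => Promise<T>>, limit: number): Promise<T[]> {
  if (limit < 1) throw new Error("limit must be >= 1");
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker claims the next task index until none remain.
  // `next++` is safe because JavaScript runs this code on a single thread.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, () => worker()));
  return results;
}
```

If one task rejects, `Promise.all` rejects with that error while the remaining workers keep draining the queue; wrap tasks in your own error handling if you need every result.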

For high-throughput workflows, track queue depth and time-to-first-message rather than only pass or fail. If arrival times drift upward, you can adjust timeouts, reduce parallelism, or inspect third-party sender delays before the suite starts flaking.

Observability: log identifiers, not secrets

Email-driven failures are painful when the only visible output is “verification code not found.” Your harness should log the identifiers needed to reconstruct the event path without exposing sensitive content.

Log field	Why it helps	Safe logging note
`run_id`	Connects email events to the CI job	Safe
`attempt_id`	Separates retries	Safe
`inbox_id`	Identifies the isolated mailbox	Usually safe, treat as internal
`email_domain`	Shows shared vs custom routing	Safe
`message_id`	Helps dedupe and debug selection	Usually safe
`delivery_id`	Helps trace webhook retries	Usually safe
`received_at`	Exposes delivery latency	Safe
`artifact_hash`	Confirms OTP or link extraction without logging the secret	Prefer hash over raw value
`matcher_result`	Explains why a message was accepted or rejected	Avoid full body excerpts

This style of logging works well for both CI and agents. The human debugging the run can see what happened, while the model receives only the minimal artifact required for the next action.
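A log line built from these fields might look like the sketch below. The keyed hash is an assumption of this example: a plain SHA-256 of a six-digit OTP can be reversed by trying all million codes, so the sketch uses HMAC with a CI secret (`logKey`, hypothetical) and keeps the first 16 hex characters.

```typescript
import { createHmac } from "node:crypto";

// Fields follow the logging table above; values are never raw bodies or secrets.
interface EmailLogRecord {
  run_id: string;
  attempt_id: string;
  inbox_id: string;
  email_domain: string;
  message_id?: string;
  delivery_id?: string;
  received_at?: string;
  artifact_hash?: string;
  matcher_result: "accepted" | "rejected" | "timeout";
}

// Keyed, truncated hash: lets you confirm which OTP or link was used
// without writing the artifact itself to logs.
function artifactHash(artifact: string, logKey: string): string {
  return createHmac("sha256", logKey).update(artifact).digest("hex").slice(0, 16);
}

// One JSON line per email event.
function logEmailEvent(rec: EmailLogRecord): string {
  return JSON.stringify(rec);
}
```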

How Mailhook maps to this setup

Mailhook is built around programmable temporary inboxes for automated systems, including LLM agents and QA automation. The important pieces line up directly with the setup described above.

Requirement	Mailhook capability
Create addresses on demand	Disposable inbox creation via API
Start quickly	Instant shared domains and no credit card required
Use controlled domains	Custom domain support
Consume messages as data	Structured JSON email output
Receive messages quickly	Real-time webhook notifications
Recover deterministically	Polling API for emails
Secure webhook ingestion	Signed payloads
Handle larger workflows	Batch email processing

The exact endpoint behavior, request shapes, webhook signature details, and machine-readable integration notes should be taken from Mailhook’s llms.txt. That reference is especially useful when adding Mailhook as a tool for an LLM runtime, because it gives agents and developers a canonical contract to follow.

Setup checklist

Before you ship the workflow, confirm the following:

CI creates an inbox before triggering the email-sending action.
The test stores the inbox ID, not only the address.
Retries create fresh inboxes or use a clearly isolated attempt boundary.
Webhook payloads are verified before JSON parsing or processing.
Polling has a total deadline, backoff, and dedupe logic.
Artifact extraction returns only the OTP, verification URL, or typed result needed.
URLs are allowlisted before a browser, test runner, or agent follows them.
Logs include run and message identifiers, but not raw secrets.
Inboxes expire or are cleaned up according to a retention policy.
Domain choice is configuration, so you can move from shared to custom domains without rewriting agent logic.

Frequently Asked Questions

What is an instant email address in CI? It is an email address created programmatically at test time, usually backed by a disposable inbox. In CI, the address should be tied to an inbox ID so code can retrieve the correct messages without using a shared mailbox.

Should I create one inbox per CI run or per test attempt? For email that changes application state, such as signup verification or passwordless login, one inbox per attempt is the safest default. It prevents stale links, duplicate OTPs, and retry collisions.

Are webhooks better than polling for agent workflows? Webhooks are better for low-latency event delivery, while polling is useful as a deterministic fallback. A hybrid setup gives agents and CI jobs both speed and reliability.

Can an LLM agent read the full email? It usually should not. Treat inbound email as untrusted input. Give the agent a minimized artifact, such as an OTP or allowlisted verification link, and keep raw content inside trusted tool code.

When do I need a custom domain? Use a custom domain or subdomain when the system under test requires allowlisting, when you need environment separation, or when domain identity is part of the workflow. Use shared domains when speed and simplicity matter most.

Where can I find Mailhook’s exact API contract? Use the Mailhook llms.txt file as the canonical machine-readable reference for integration details, especially when wiring Mailhook into LLM tools or automated CI harnesses.

Turn email into a deterministic CI and agent primitive

If your tests or agents depend on verification emails, do not make them log into a mailbox, sleep blindly, or parse raw HTML. Create a disposable inbox, receive structured JSON, verify delivery safely, extract the minimum artifact, and expire the inbox when the workflow ends.

Mailhook provides the primitives for that setup: API-created disposable inboxes, JSON email output, webhooks, polling, signed payloads, shared domains, custom domains, and batch processing. Start with an instant email address on a shared domain, then move to custom domains when your CI or agent workflows need more control.