
Build Email Verification Flows Without Real User Inboxes


Real user inboxes are a bad dependency for email verification flows. They are slow to automate, hard to isolate, risky for privacy, and almost impossible to make deterministic when CI, QA, or an LLM agent is running multiple attempts in parallel.

The better pattern is not to skip verification. It is to verify through a synthetic, programmable inbox that behaves like a real recipient from your application’s point of view, while giving your automation a clean API contract: create an inbox, receive the email as JSON, extract the OTP or magic link, and complete the flow.

That shift turns email verification from “log into a mailbox and hope the right message is there” into a controlled system boundary.

What “without real user inboxes” actually means

Building email verification flows without real user inboxes does not mean bypassing your product’s verification logic. Your application should still generate and send a real verification email through the same code path your users rely on.

The difference is the recipient. Instead of sending to a human-owned Gmail, Outlook, or Workspace account, or to a shared QA mailbox, the flow sends to a disposable email address created specifically for that test, agent task, signup attempt, or client operation.

A good synthetic verification flow still validates the important things:

  • Your app creates the pending user or session correctly.
  • Your email provider receives the request to send a verification email.
  • The inbound email reaches a routable address.
  • The verification artifact, such as an OTP or magic link, can be extracted safely.
  • The artifact completes the same verification endpoint a user would complete.
  • The inbox and test identity are cleaned up afterward.

This is especially useful for AI agents and LLM workflows. Agents should not be given credentials to a real mailbox, a browser session with personal email, or broad access to raw email history. They need a narrow tool that can wait for the specific verification email and return the minimum artifact required to continue.

Why real user inboxes break automated verification

Human inboxes are optimized for people, not automation. They contain old messages, promotions, forwarded mail, threaded conversations, anti-bot prompts, unpredictable UI changes, and authentication flows that are unrelated to the test.

For email verification, those properties create avoidable failure modes.

| Real inbox failure mode | What happens in practice | Better alternative |
| --- | --- | --- |
| Shared mailbox collisions | Two runs read the same OTP or magic link | Create one disposable inbox per attempt |
| Stale message selection | Automation grabs yesterday’s verification email | Match by inbox ID, timestamp, and correlation token |
| UI-driven scraping | Browser selectors break when the mailbox UI changes | Consume structured JSON emails via API |
| Human data exposure | Tests or agents see unrelated personal messages | Use isolated temporary inboxes |
| Authentication friction | MFA, device checks, and login challenges fail CI | Use API-based inbox access |
| Poor cleanup | Test messages accumulate indefinitely | Set lifecycle rules and expire inboxes |

This matters even more when autonomous agents are involved. If an LLM can inspect a real mailbox, unrelated messages may influence its behavior. If it can click arbitrary links, it may follow malicious or irrelevant content. Synthetic inboxes let you make the email surface small, typed, auditable, and temporary.

The core architecture: verification as an event-driven inbox flow

A robust verification flow has four resources: a synthetic user, a disposable inbox, an inbound message, and a verification artifact. The artifact is the only thing your test or agent usually needs.

The high-level flow looks like this:

  1. Create a disposable inbox through an API.
  2. Register or initiate verification using the generated email address.
  3. Receive the email through a webhook or polling API.
  4. Parse the message as structured JSON, not a rendered mailbox page.
  5. Extract the OTP or magic link using narrow rules.
  6. Submit the artifact to your app’s verification endpoint.
  7. Expire the inbox and clean up the synthetic user.

Mailhook is built for this model: it provides programmable disposable inboxes via API, structured JSON email output, real-time webhooks, polling access, signed payloads, shared domains, custom domain support, and batch email processing. For exact integration details, use the canonical Mailhook llms.txt reference, which is designed to be readable by developers and LLM agents.

[Diagram: the application sends a verification email to a disposable API inbox, the inbox delivers structured JSON to the automation, and the automation submits the OTP or magic link back to the application.]

Design the inbox as a first-class resource

The most common mistake is treating the email address as just a string. For reliable automation, the address should be part of an inbox descriptor that your system stores for the duration of the attempt.

A practical descriptor includes fields like:

| Field | Purpose |
| --- | --- |
| inbox_id | Stable handle used to retrieve and correlate messages |
| email | Address passed into your application under test |
| created_at | Helps debug timing and stale messages |
| expires_at | Defines the inbox lifecycle and cleanup window |
| run_id or attempt_id | Ties the inbox to a CI job, agent task, or signup attempt |
| domain_strategy | Records whether the address used a shared or custom domain |

This descriptor should be passed through your test harness or agent toolchain. Do not rely on global mailboxes, global search, or subject-only matching. The inbox itself is the primary boundary.
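A minimal sketch of that pattern, assuming a hypothetical provider response shape (the field names `inbox_id`, `email`, `created_at`, and `expires_at` are illustrative and should be mapped to whatever your inbox API actually returns):

```javascript
// Sketch: wrap inbox creation so the full descriptor, not just the
// address string, travels through the harness. The response field
// names are assumptions; adapt them to your provider.
function buildInboxDescriptor(apiResponse, runId, domainStrategy) {
  return {
    inbox_id: apiResponse.inbox_id,       // stable handle for retrieval
    email: apiResponse.email,             // address given to the app under test
    created_at: apiResponse.created_at,   // used to reject stale messages
    expires_at: apiResponse.expires_at,   // cleanup deadline
    run_id: runId,                        // ties the inbox to this CI job or agent task
    domain_strategy: domainStrategy       // "shared" or "custom"
  };
}
```

The descriptor, not the bare address, is what the rest of the harness should pass around.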

Choose the right domain strategy

For early development, shared provider domains are usually the fastest way to get started. They avoid DNS work and make it easy to create disposable inboxes immediately.

For staging, enterprise integrations, or systems that require allowlisting, a custom domain or subdomain is often better. A custom domain gives you more control over routing, separation, and policy. For example, you might use a dedicated subdomain for verification testing instead of your primary production domain.

| Scenario | Recommended domain approach | Why |
| --- | --- | --- |
| Local development | Shared domain or local capture | Fast setup and low operational overhead |
| CI verification tests | Shared domain or dedicated test subdomain | Deterministic inbox creation and parallel safety |
| Enterprise staging | Custom subdomain | Easier allowlisting and environment separation |
| LLM-agent workflows | Shared or custom, behind a narrow tool | Keeps domain choice out of agent reasoning |
| Deliverability experiments | Controlled sender and recipient domains | Lets you test authentication and routing intentionally |

The key is to keep the domain as configuration. Your verification harness should not care whether the address came from a shared domain today and a custom domain later.
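One way to keep that separation is a small config shim. The environment variable names here (`DOMAIN_STRATEGY`, `VERIFY_DOMAIN`) are hypothetical; the point is that the harness only sees the returned options:

```javascript
// Sketch: domain choice lives in configuration, not in test logic.
// Variable names are illustrative, not a specific provider's contract.
function inboxCreateOptions(env) {
  if (env.DOMAIN_STRATEGY === "custom") {
    return { domain: env.VERIFY_DOMAIN }; // e.g. verify.staging.example.com
  }
  return {}; // let the provider pick a shared domain
}
```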

Receive emails through webhooks, with polling as a fallback

Verification flows should not use fixed sleeps. Waiting 30 seconds and then checking a mailbox may pass on a quiet day and fail under load. Instead, use an event-driven wait.

A webhook-first design gives your automation low-latency delivery when the message arrives. A polling fallback gives you resilience if a webhook endpoint is temporarily unavailable, your local tunnel drops, or a test worker restarts.

The basic rule is simple: webhooks are the primary arrival signal, polling is the recovery path.

For webhook consumers, verify the request before processing it. Signed payloads help ensure the inbound JSON was delivered by the expected provider and was not tampered with in transit. Your handler should acknowledge quickly, deduplicate deliveries, and process the message asynchronously when possible.

For polling consumers, use deadlines rather than endless loops. A verification step should have a clear time budget, a cursor or seen-message set, and narrow matchers so it does not accidentally consume an unrelated message.
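A deadline-bounded polling loop can look like this. The `inboxApi.listMessages` client and message shape are assumptions standing in for your provider's polling endpoint:

```javascript
// Sketch: poll with a hard deadline and a seen-message set, so the
// loop always terminates and never reprocesses a message.
async function pollForMessage(inboxApi, inboxId, matches, { timeoutMs = 60000, intervalMs = 2000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  const seen = new Set();
  while (Date.now() < deadline) {
    const messages = await inboxApi.listMessages({ inbox_id: inboxId });
    for (const msg of messages) {
      if (seen.has(msg.message_id)) continue; // skip already-inspected mail
      seen.add(msg.message_id);
      if (matches(msg)) return msg;           // narrow matcher decides
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`No matching message in inbox ${inboxId} within ${timeoutMs}ms`);
}
```

Failing with an explicit timeout error is deliberate: a missing verification email should surface as a loud failure, not a silent retry.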

Extract only the verification artifact

Once the email arrives, avoid handing the whole message to a test runner or LLM unless you truly need it. Most verification flows need exactly one of two artifacts: an OTP code or a magic link.

For OTPs, prefer a deterministic extraction pipeline over a single broad regex. Look at the intended sender, subject, text body, and expected code shape. Then return the code and the message metadata needed for auditability.
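A sketch of that pipeline, assuming a 6-digit code shape and a JSON message with `from`, `subject`, and `text` fields (adjust both assumptions to your app and provider):

```javascript
// Sketch: deterministic OTP extraction. Sender, subject, and code shape
// are all checked before any value is returned. The 6-digit shape is an
// assumption about this app's codes.
function extractOtp(message, { expectedSender, subjectContains }) {
  if (!message.from || !message.from.includes(expectedSender)) {
    throw new Error(`Unexpected sender: ${message.from}`);
  }
  if (!message.subject || !message.subject.includes(subjectContains)) {
    throw new Error(`Unexpected subject: ${message.subject}`);
  }
  const codes = message.text.match(/\b\d{6}\b/g) || [];
  if (codes.length !== 1) {
    // Fail loudly on zero or multiple candidates instead of guessing.
    throw new Error(`Expected exactly one 6-digit code, found ${codes.length}`);
  }
  return {
    type: "otp",
    value: codes[0],
    message_id: message.message_id,
    received_at: message.received_at
  };
}
```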

For magic links, validate the link before using it. Check the host, scheme, path, and expected environment. Do not let an agent click arbitrary URLs from an email. In many systems, the safest pattern is for code to extract and validate the link, then pass only the approved URL or token to the next step.
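That validation step can be a small allowlist gate. The host and path prefix below are illustrative values, not a required convention:

```javascript
// Sketch: validate a magic link against an allowlist before anyone,
// human or agent, is allowed to follow it.
function approveMagicLink(rawUrl, { allowedHosts, allowedPathPrefix }) {
  const url = new URL(rawUrl); // throws on malformed URLs
  if (url.protocol !== "https:") {
    throw new Error(`Rejected scheme: ${url.protocol}`);
  }
  if (!allowedHosts.includes(url.hostname)) {
    throw new Error(`Rejected host: ${url.hostname}`);
  }
  if (!url.pathname.startsWith(allowedPathPrefix)) {
    throw new Error(`Rejected path: ${url.pathname}`);
  }
  return { type: "verification_url", value: url.toString() };
}
```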

A minimal artifact object can look like this:

{
  "type": "otp",
  "value": "123456",
  "inbox_id": "inb_...",
  "message_id": "msg_...",
  "received_at": "2026-05-14T21:11:13Z"
}

For a magic link, the type might be verification_url and the value should be a URL that has already passed allowlist checks.

A provider-neutral verification harness

The following pseudocode shows the shape of a verification flow without depending on a real user inbox. Adapt the endpoint names to your provider and application.

async function verifySyntheticUser(app, inboxApi, userFactory) {
  // One correlation ID per attempt ties the inbox, user, and logs together.
  const attemptId = crypto.randomUUID();

  // 1. Create a disposable inbox dedicated to this attempt.
  const inbox = await inboxApi.createInbox({
    purpose: "signup_verification",
    metadata: { attempt_id: attemptId }
  });

  // 2. Create the pending user with the generated address.
  const user = await userFactory.createPendingUser({
    email: inbox.email,
    attempt_id: attemptId
  });

  // 3. Trigger the real verification email through the product code path.
  await app.requestSignupVerification({
    user_id: user.id,
    email: inbox.email
  });

  // 4. Wait for arrival with narrow matchers instead of a fixed sleep.
  const message = await inboxApi.waitForMessage({
    inbox_id: inbox.inbox_id,
    timeout_ms: 60000,
    match: {
      to: inbox.email,
      subject_contains: "Verify",
      after: inbox.created_at
    }
  });

  // 5. Extract and validate the OTP or link before using it.
  const artifact = extractVerificationArtifact(message, {
    expected_host: "app.example.com",
    expected_type: "otp_or_link"
  });

  // 6. Complete verification through the same endpoint a user would hit.
  await app.submitVerification({
    user_id: user.id,
    artifact: artifact.value
  });

  // 7. Expire the inbox so no test data lingers after the attempt.
  await inboxApi.expireInbox({ inbox_id: inbox.inbox_id });

  return {
    user_id: user.id,
    inbox_id: inbox.inbox_id,
    message_id: message.message_id,
    verified: true
  };
}

In production-grade test infrastructure, wrap this with idempotency keys. If the same attempt retries, it should not create duplicate users, consume the same OTP twice, or process the same webhook delivery twice.
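A minimal idempotency wrapper keyed on the attempt ID might look like this. The in-memory `Map` stands in for a durable store; a real harness would persist results in a database:

```javascript
// Sketch: run a verification attempt at most once per attempt ID.
// Storing the pending promise means concurrent callers share one run.
function makeIdempotent(store) {
  return async function runOnce(attemptId, fn) {
    if (store.has(attemptId)) return store.get(attemptId); // replay prior result
    const pending = fn();                                  // start exactly once
    store.set(attemptId, pending);
    try {
      const result = await pending;
      store.set(attemptId, Promise.resolve(result));
      return result;
    } catch (err) {
      store.delete(attemptId); // allow a clean retry after a failure
      throw err;
    }
  };
}
```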

LLM-agent safe verification tools

If an LLM agent needs to complete a signup, passwordless login, or client onboarding step, do not give it a general-purpose mailbox tool. Give it a narrow verification tool.

A safe tool contract might expose only these operations:

  • create_verification_inbox(purpose, ttl) returns an email address and inbox ID.
  • wait_for_verification_artifact(inbox_id, expected_sender, expected_type) returns an OTP or approved URL.
  • expire_verification_inbox(inbox_id) closes the inbox when the task is done.

The agent does not need raw MIME, full HTML, unrelated messages, mailbox credentials, or access to historical email. It needs a typed artifact and enough metadata to explain what happened.
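The contract above can be expressed as JSON-schema style tool definitions for an agent framework. These names and schemas are illustrative, not a specific framework's API:

```javascript
// Sketch: the narrow verification tool surface as declarative tool
// definitions. An agent sees only these three operations.
const verificationTools = [
  {
    name: "create_verification_inbox",
    description: "Create a disposable inbox for one verification attempt.",
    parameters: {
      type: "object",
      properties: {
        purpose: { type: "string" },
        ttl_seconds: { type: "integer", maximum: 3600 }
      },
      required: ["purpose"]
    }
  },
  {
    name: "wait_for_verification_artifact",
    description: "Wait for the verification email and return only the OTP or approved URL.",
    parameters: {
      type: "object",
      properties: {
        inbox_id: { type: "string" },
        expected_sender: { type: "string" },
        expected_type: { type: "string", enum: ["otp", "verification_url"] }
      },
      required: ["inbox_id", "expected_sender", "expected_type"]
    }
  },
  {
    name: "expire_verification_inbox",
    description: "Close the inbox once the task is done.",
    parameters: {
      type: "object",
      properties: { inbox_id: { type: "string" } },
      required: ["inbox_id"]
    }
  }
];
```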

For AI agents, also add hard constraints outside the model:

  • Enforce a maximum wait time.
  • Limit resend attempts.
  • Validate magic-link domains in code.
  • Verify webhook signatures before messages enter the tool context.
  • Store message IDs and artifact hashes for deduplication.
  • Return minimal text to the model to reduce prompt-injection risk.

This pattern gives the agent the ability to complete verification while keeping the trust boundary in deterministic code.

Common mistakes when replacing real inboxes

Moving away from real user inboxes is straightforward, but a few mistakes can reintroduce flakiness.

| Mistake | Why it fails | Fix |
| --- | --- | --- |
| Reusing one inbox for many attempts | Old messages and parallel runs collide | Create a new inbox per attempt |
| Matching only by subject | Many verification emails share the same subject | Match by inbox, recipient, timestamp, sender, and purpose |
| Rendering HTML to extract links | HTML changes and can contain unsafe content | Prefer structured JSON and text content |
| Letting agents inspect full emails | Untrusted content can steer the model | Return only minimized artifacts |
| Ignoring duplicate deliveries | Webhooks and mail delivery can retry | Deduplicate by delivery, message, and artifact |
| Never expiring inboxes | Test data accumulates and becomes noisy | Use TTLs and cleanup jobs |
| Skipping webhook verification | Attackers or bugs can spoof inbound events | Verify signed payloads before parsing |

A good verification harness should fail loudly when the expected email does not arrive, when more than one valid artifact is found, or when the artifact points to an unexpected domain.

Where Mailhook fits

Mailhook gives developers and agents the primitives needed to build verification flows without real user inboxes:

  • Disposable inbox creation via API for per-attempt isolation.
  • Structured JSON email output so automation does not scrape mailbox UI.
  • Real-time webhook notifications for low-latency message arrival.
  • Polling API support for fallback waits and recovery paths.
  • Signed payloads for webhook authenticity checks.
  • Shared domains for fast setup and custom domain support for controlled environments.
  • Batch email processing for higher-volume workflows.

The result is a verification flow that still tests the real product behavior, but removes the fragile human-mailbox dependency. If you are wiring this into an agent framework, CI runner, or QA harness, start from the Mailhook llms.txt integration reference so both developers and LLM tools use the same contract.

FAQ

Can I test email verification without sending a real email?
You can unit test token generation without sending email, but end-to-end verification should send a real message to a controlled inbox. Disposable API inboxes let you test the real path without using a real user mailbox.

Should I create one inbox per user or one inbox per attempt?
For CI, QA, and LLM agents, one inbox per attempt is usually safer. It prevents stale messages, retry collisions, and parallel test races.

Are disposable inboxes safe for production users?
They are best used for automation, QA, staging, synthetic users, and controlled workflows. Do not replace real user ownership checks with disposable addresses for actual customer identity unless that is an intentional product policy.

What is better for verification emails, webhooks or polling?
Use webhooks as the primary path because they are fast and event-driven. Keep polling as a fallback for retries, local development, and recovery from missed webhook deliveries.

How do I keep LLM agents from reading unsafe email content?
Put email handling behind a narrow tool. Verify inbound webhooks, parse messages as JSON, extract only the OTP or approved verification URL, and return that minimized artifact to the agent.

Build verification flows your automation can trust

Real user inboxes are the wrong abstraction for automated email verification. A disposable inbox API gives you isolation, deterministic retrieval, structured JSON, and a safer interface for LLM agents.

With Mailhook, you can create programmable temporary inboxes, receive verification emails as JSON, handle arrival through webhooks or polling, and keep the entire flow testable without exposing real mailbox data. Review the Mailhook llms.txt reference to connect the exact API contract to your CI, QA, or agent workflow.
