Setup an Email Flow for Automation: Provision, Wait, Extract


Email is still the default “out-of-band” channel for signups, password resets, invite links, and one-time passcodes, which makes it one of the most common sources of flaky automation. The core problem is not generating an address; it is building a deterministic flow that survives retries, parallel runs, and hostile content.

A reliable email flow for automation comes down to three stages:

  • Provision an isolated inbox resource (not just a string address)
  • Wait for delivery with explicit semantics (webhook-first, polling fallback)
  • Extract a minimal artifact (OTP, magic link, attachment) as structured data

Below is a practical blueprint you can lift into CI harnesses and LLM agent toolchains.

The “Provision, Wait, Extract” mental model

Think of inbound email as an event stream attached to a short-lived resource.

  • Provision gives you isolation and a stable handle.
  • Wait turns nondeterministic delivery time into a bounded, observable operation.
  • Extract prevents brittle HTML scraping and reduces the chance your agent follows malicious instructions.

This model also makes ownership clear: your application sends an email, your automation consumes a specific inbox, and your workflow pulls out one narrow piece of truth.

[Figure: three-step flow diagram labeled Provision, Wait, Extract. Provision creates a disposable inbox and returns an email address plus inbox_id. Wait uses webhook delivery as the primary path and polling as a fallback path with a deadline. Extract pulls an OTP or verification link from structured JSON and passes only that artifact onward.]

Step 1: Provision an inbox resource (and treat it like infrastructure)

If your automation starts with “generate an email address,” you are already exposed to collisions and ambiguous reads. Prefer provisioning an inbox and returning a descriptor object that includes both the address and an inbox identifier.

What to store from provisioning:

  • email: the address you hand to the system under test
  • inbox_id: the handle you will read from
  • attempt_id (your own): a correlation ID for the run/test/agent attempt
  • expires_at (if provided by your inbox provider): so cleanup is not optional

Domain choice: shared now, custom later

Most teams start with a provider’s shared domains because it is fast. You switch to a custom domain (or subdomain) when you need allowlisting, environment isolation, or deliverability control.

One practical rule: keep the domain strategy configurable so you can migrate without rewriting the wait/extract code.
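
One way to keep the domain strategy configurable is to resolve it from the environment in a single place. The sketch below is illustrative only: the variable name EMAIL_FLOW_DOMAIN and the DomainConfig type are assumptions made for this example, not part of any provider's API.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DomainConfig:
    mode: str               # "shared" or "custom"
    domain: Optional[str]   # only set when mode == "custom"


def load_domain_config(env: Optional[Mapping[str, str]] = None) -> DomainConfig:
    """Resolve the domain strategy from configuration.

    EMAIL_FLOW_DOMAIN is a hypothetical variable name: when it is unset or
    empty, fall back to the provider's shared domains.
    """
    env = os.environ if env is None else env
    domain = env.get("EMAIL_FLOW_DOMAIN", "").strip().lower()
    if domain:
        return DomainConfig(mode="custom", domain=domain)
    return DomainConfig(mode="shared", domain=None)
```

Because only the provisioning step reads this value, moving from shared to custom domains should not touch the wait or extract code.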

A provisioning contract you can reuse

Even if you swap providers, keep your internal interface stable:

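// Descriptor returned by provisioning; persist it for the whole attempt.
export type ProvisionedInbox = {
  email: string;      // address handed to the system under test
  inboxId: string;    // handle all reads are scoped to
  attemptId: string;  // your own correlation ID for the run
  expiresAt?: string; // expiry timestamp, if the provider returns one
};

export interface EmailFlowProvider {
  provisionInbox(input: { attemptId: string }): Promise<ProvisionedInbox>;
}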

That interface becomes your test fixture, your agent tool, and your production automation primitive.

Step 2: Wait with explicit semantics (webhook-first, polling fallback)

Waiting is where most “email automation” breaks:

  • fixed sleeps that are either too short (flakes) or too long (slow pipelines)
  • polling loops without deadlines (hung runs)
  • webhook handlers that are not idempotent (duplicate processing)

A robust waiting strategy has two layers:

  • Webhook-first for low latency and better parallel safety
  • Polling fallback to recover from webhook outages, misconfig, or transient network issues

Webhook-first: verify, ack fast, process async

If you receive email events via webhooks, treat the HTTP request as untrusted input.

Minimum expectations for a production-grade handler:

  • Verify authenticity (for example, signed payloads when your provider supports them)
  • Enforce replay protection (timestamp tolerance plus a dedupe key)
  • Ack quickly, then push the event into a queue for parsing and extraction

This is especially important for agent workflows where “prompt injection by email” is a real operational risk.
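
The verification and replay checks above can be sketched as follows. This is a minimal example under stated assumptions: the signing scheme (HMAC-SHA256 over "timestamp.body"), the 300-second tolerance, and the in-memory dedupe set are all hypothetical, so check your provider's documentation for the real header names and scheme. It also assumes event_id is read from the signed body, not from an unsigned header.

```python
import hashlib
import hmac
import time

TOLERANCE_SECONDS = 300   # assumed replay window
_seen_event_ids = set()   # in production, use a shared store with a TTL


def sign(secret: bytes, timestamp: str, body: bytes) -> str:
    """Assumed scheme: hex HMAC-SHA256 over b"<timestamp>.<raw body>"."""
    return hmac.new(secret, timestamp.encode() + b"." + body, hashlib.sha256).hexdigest()


def verify_webhook(secret, timestamp, signature, event_id, body, now=None) -> bool:
    """Return True only for authentic, fresh, not-yet-seen events."""
    now = time.time() if now is None else now
    try:
        sent_at = int(timestamp)
    except (TypeError, ValueError):
        return False
    # Reject stale or far-future timestamps.
    if abs(now - sent_at) > TOLERANCE_SECONDS:
        return False
    # Constant-time comparison of the expected and received signatures.
    expected = sign(secret, timestamp, body)
    if not hmac.compare_digest(expected.encode(), str(signature).encode()):
        return False
    # Dedupe key: the same event is accepted at most once.
    if event_id in _seen_event_ids:
        return False
    _seen_event_ids.add(event_id)
    return True
```

A handler built on this would return a 2xx response as soon as verify_webhook passes and the event is enqueued, leaving parsing and extraction to a worker.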

Polling fallback: bounded time, cursoring, and dedupe

Polling does not need to be fancy, but it does need to be correct. That means:

  • an overall deadline (for example, 60 seconds)
  • a short interval with backoff
  • a way to avoid reprocessing the same message (cursor or seen-IDs)

Provider-agnostic polling sketch:

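import time

class Timeout(Exception):
    """Raised when no matching message arrives before the deadline."""

def wait_for_message(list_messages, match_fn, deadline_seconds=60):
    # Monotonic clock: immune to wall-clock adjustments during the wait.
    deadline = time.monotonic() + deadline_seconds
    seen = set()  # message IDs already evaluated, so they are not reprocessed
    backoff = 0.5

    while True:
        msgs = list_messages()  # should be scoped to a single inboxId
        for m in msgs:
            msg_id = m.get("message_id") or m.get("id")
            if msg_id and msg_id in seen:
                continue
            if msg_id:
                seen.add(msg_id)

            if match_fn(m):
                return m

        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise Timeout("email wait exceeded deadline")

        # Never sleep past the deadline; back off up to 3 seconds between polls.
        time.sleep(min(backoff, remaining))
        backoff = min(backoff * 1.5, 3.0)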

The critical part is not the loop but the constraint: poll a single, isolated inbox and stop after a deadline.
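
The match_fn argument is where you narrow "any message in this inbox" down to "the message this attempt expects". The sketch below is one possible matcher. The field names "to" and "subject" are assumptions about the message JSON, and make_matcher is a hypothetical helper name.

```python
def make_matcher(expected_to, subject_contains):
    """Build a match_fn that checks the recipient and a subject substring."""
    expected_to = expected_to.strip().lower()
    needle = subject_contains.lower()

    def match(message):
        # "to" may be a single string or a list of strings, depending on the provider.
        to_field = message.get("to") or []
        recipients = [to_field] if isinstance(to_field, str) else list(to_field)
        recipients = [r.strip().lower() for r in recipients]
        subject = (message.get("subject") or "").lower()
        return expected_to in recipients and needle in subject

    return match
```

You would pass the result as match_fn to wait_for_message, for example make_matcher(inbox_email, "verify"), so that an unrelated message in the same inbox does not end the wait early.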

Step 3: Extract a minimal artifact from structured email

Once you have “a message,” resist the temptation to hand the entire email body to an agent or to parse HTML with fragile selectors.

Instead:

  • Prefer structured JSON output for stable fields (from, to, subject, timestamps, message_id)
  • Prefer text/plain when you must parse content
  • Extract a single artifact that your workflow needs, then discard the rest

Typical artifacts:

  • OTP (numeric code)
  • verification URL (magic link)
  • attachment (PDF, CSV)

Safe extraction rules that hold up in 2026

Email is a hostile medium. Treat extracted artifacts as untrusted until validated.

If you extract a link:

  • Enforce an allowlist of hosts you expect
  • Reject non-HTTPS
  • Block link-local and private IP ranges to reduce SSRF exposure
  • Consider checking for open redirects before handing the URL to a browser automation step
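
A minimal sketch of the first three link rules, using only the Python standard library. The allowlisted host app.example.com is a placeholder. The check inspects the URL string only: it does not resolve DNS, so an allowlisted name that resolves to a private address is not caught here, and it does not detect open redirects.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_HOSTS = frozenset({"app.example.com"})  # hypothetical allowlist


def is_safe_link(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True if the URL is HTTPS, allowlisted, and not a private or link-local IP."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        _ = parts.port  # accessing .port raises ValueError for a malformed port
    except ValueError:
        return False
    if parts.scheme != "https":
        return False
    # Embedded credentials are a common way to disguise the real host.
    if parts.username or parts.password:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # not an IP literal
    if ip is not None and (ip.is_private or ip.is_link_local or ip.is_loopback):
        return False
    return host in allowed_hosts
```

Only a URL that passes this check would be handed to the browser automation step; anything else should fail the attempt and not be followed.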

If you extract an OTP:

  • Validate length and charset
  • Bind the OTP to the attempt (store attempt_id plus an artifact hash)
  • Use consume-once semantics so retries do not double-submit
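
The three OTP rules can be combined in a small ledger. This is a sketch under assumptions: a six-digit numeric code, an in-memory set instead of a durable store, and a hypothetical OtpLedger class name.

```python
import hashlib
import re

OTP_PATTERN = re.compile(r"[0-9]{6}")  # assumed format: exactly six ASCII digits


class OtpLedger:
    """Tracks which (attempt_id, OTP hash) pairs have already been consumed."""

    def __init__(self):
        self._consumed = set()

    def consume(self, attempt_id, otp):
        """Return True the first time an OTP is consumed for an attempt, else False."""
        if not OTP_PATTERN.fullmatch(otp):
            raise ValueError("OTP failed length/charset validation")
        # Store a hash, not the code itself, so the ledger holds no secrets.
        digest = hashlib.sha256(otp.encode()).hexdigest()
        key = (attempt_id, digest)
        if key in self._consumed:
            return False
        self._consumed.add(key)
        return True
```

The caller submits the code only when consume returns True, so a retried step sees False and skips the second submission.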

A minimal JSON shape for extraction

Your extraction code should be able to work with a compact, stable representation.

| Field | Why it matters | Used for |
| --- | --- | --- |
| inbox_id | Ensures isolation | Scoping reads and audits |
| message_id | Stable identity | Idempotency and dedupe |
| received_at | Ordering and deadlines | Selecting “latest matching” safely |
| subject | Lightweight matcher | Filtering before body parsing |
| text | Safer than HTML | OTP/link extraction |
| artifacts (derived) | Downstream contract | Pass to agent/test steps |

If your provider delivers emails already normalized as JSON, extraction becomes deterministic and easier to test.
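
As an illustration of extraction against that shape, the sketch below pulls a single OTP out of the text field and returns only stable IDs plus the artifact. The six-digit format and the extract_otp name are assumptions made for this example.

```python
import re

# Exactly six ASCII digits that are not part of a longer digit run (assumed OTP format).
OTP_IN_TEXT = re.compile(r"(?<![0-9])[0-9]{6}(?![0-9])")


def extract_otp(message):
    """Return a minimal artifact record, or raise if the OTP is missing or ambiguous."""
    text = message.get("text") or ""
    codes = set(OTP_IN_TEXT.findall(text))
    if len(codes) != 1:
        raise ValueError(f"expected exactly one OTP candidate, found {len(codes)}")
    return {
        "inbox_id": message.get("inbox_id"),
        "message_id": message.get("message_id"),
        "artifacts": {"otp": codes.pop()},
    }
```

Failing on zero or multiple candidates is deliberate: guessing between two codes tends to produce the intermittent failures this flow is meant to remove.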

Failure modes, and what each stage should guarantee

This table is a useful design review tool. If a failure mode is not addressed in the stage where it belongs, it will surface as flakes later.

| Stage | Guarantee you want | Common failure mode | Mitigation |
| --- | --- | --- | --- |
| Provision | Isolation per attempt | Cross-test collisions | Inbox-per-attempt, store attempt_id |
| Wait | Bounded, observable arrival | Fixed sleeps, hanging runs | Webhook-first, polling fallback, deadlines |
| Extract | Minimal, deterministic output | HTML drift, injection | JSON-first, text/plain, minimal artifact |
| All | Retry-safe processing | Duplicate deliveries | Idempotency keys at message and artifact layers |

Where Mailhook fits (and how to keep it agent-friendly)

Mailhook is built for exactly this resource-based model: you can create disposable inboxes via API, receive inbound emails as structured JSON, and consume deliveries via real-time webhooks (with signed payloads) or via a polling API when you need a fallback. It also supports shared domains for quick starts and custom domains when you need control.

For the canonical integration contract and up-to-date endpoint details, use the project’s llms.txt: mailhook.co/llms.txt.

A practical “tool surface” for an LLM agent stays small:

  • provision_inbox(attempt_id) -> { email, inbox_id }
  • wait_for_email(inbox_id, matcher, deadline) -> message_json
  • extract_artifact(message_json, kind=otp|link) -> { artifact }
  • expire_inbox(inbox_id)

Keeping the tool surface narrow is a security feature: it limits what the model can do if it receives a malicious email.
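
One way to enforce that narrow surface is an explicit allowlist in the dispatcher that sits between the model and your code. The sketch below is provider-agnostic: dispatch_tool and the tools mapping are hypothetical names, and the handlers themselves are assumed to be supplied by your harness.

```python
ALLOWED_TOOLS = frozenset(
    {"provision_inbox", "wait_for_email", "extract_artifact", "expire_inbox"}
)


def dispatch_tool(tools, name, **kwargs):
    """Call a tool handler only if its name is on the fixed allowlist."""
    if name not in ALLOWED_TOOLS:
        # Anything the model invents, or an email tells it to call, is refused here.
        raise PermissionError(f"tool not allowed: {name}")
    handler = tools.get(name)
    if handler is None:
        raise LookupError(f"tool not registered: {name}")
    return handler(**kwargs)
```

The allowlist lives in code, not in the prompt, so message content cannot widen it.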

If you are automating onboarding or verification flows for regulated organizations, minimizing exposed content and retention matters even more. For example, workflows that touch client communications in legal contexts (think firms like Henlin Gibson Henlin) benefit from extracting only the required artifact and storing stable IDs for auditability, rather than persisting full message bodies.

Implementation tips that save hours in CI and agent runs

Make the inbox lifecycle explicit

Even if your provider supports automatic expiry, your code should act as if cleanup is part of correctness:

  • record when an inbox was provisioned
  • stop waiting after a deadline
  • expire or stop using the inbox after success
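
A context manager is one way to make that lifecycle hard to skip. In this sketch, provision and expire are callables you supply (for example, thin wrappers around your provider client). Their shapes, and the provisioned_at field, are assumptions for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def inbox_lifecycle(provision, expire, attempt_id):
    """Provision an inbox for one attempt and always expire it afterwards."""
    inbox = dict(provision(attempt_id))    # copy so the caller's object is not mutated
    inbox["provisioned_at"] = time.time()  # recorded for logs and wait accounting
    try:
        yield inbox
    finally:
        # Runs on success, on failure, and on timeout alike.
        expire(inbox["inbox_id"])
```

A test would then wrap its whole flow in "with inbox_lifecycle(...) as inbox:", and cleanup happens even when the wait step raises.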

Log identifiers, not content

For debuggability without leaking secrets, log:

  • attempt_id, inbox_id, message_id
  • timestamps and wait durations
  • which matcher selected the message

Avoid logging full bodies by default, especially in shared CI logs.
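
A small filter in front of the logger keeps this rule from depending on reviewer discipline. The field list below mirrors the bullets above; the logger name and the received_at, matcher, and wait_seconds keys are assumptions for this sketch.

```python
import logging

logger = logging.getLogger("email_flow")  # hypothetical logger name

# Only identifiers and timings are allowed into logs; bodies and subjects are not.
SAFE_FIELDS = ("attempt_id", "inbox_id", "message_id", "received_at", "matcher", "wait_seconds")


def safe_log_fields(event):
    """Return a copy of the event containing only allowlisted keys."""
    return {key: event[key] for key in SAFE_FIELDS if key in event}


def log_event(event):
    """Log an email-flow event without message content."""
    logger.info("email_flow_event %s", safe_log_fields(event))
```

Because the filter is an allowlist, a new field added to the message JSON later stays out of the logs until someone adds it to SAFE_FIELDS.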

Batch when the workflow is high-volume

If you run many parallel attempts, optimize by batching reads and processing events asynchronously. Mailhook supports batch email processing, which can help when you are draining many inboxes or doing large verification runs.

A concise “done” checklist

Your email automation flow is production-ready when:

  • Provision returns an inbox descriptor (email plus inbox handle)
  • Wait is webhook-first, with polling fallback and an overall deadline
  • Webhooks are verified (signatures, replay checks) before processing
  • Extraction returns a minimal artifact (OTP/link) and validates it
  • Processing is idempotent (message-level and artifact-level)
  • Cleanup is explicit (expiry, retention rules, and safe logging)

If you implement these guarantees, email stops being a flaky side channel and becomes a predictable automation primitive.
