Engineering

Email Infrastructure for AI Agents: Events, Idempotency, TTLs

11 min read

If you are building AI agents that can sign up for services, verify accounts, reset passwords, or complete onboarding flows, email is not “just another integration.” It is part of your runtime infrastructure. It has latency, retries, duplicates, and hostile input risks. And unlike most APIs, email arrives through a delivery pipeline you do not control.

This post breaks email infrastructure for AI agents into three primitives you can design and reason about:

  • Events (how mail arrives, how you model delivery, and how you make it observable)
  • Idempotency (how you survive retries and duplicates without agent loops)
  • TTLs (how you bound state, cost, and risk with explicit lifecycles)

Along the way, we will point to concrete patterns you can implement whether you use Mailhook or roll your own inbound email stack.

Email infrastructure for agents is an event system, not a mailbox

Most teams start by treating email like a human UI: “create an address, wait, open the inbox, read the latest message.” That framing breaks as soon as you introduce:

  • Parallel agent runs (multiple attempts at the same task)
  • Retries at every layer (SMTP, provider ingestion, webhooks, your queue, your worker)
  • Untrusted content (prompt injection, malicious links, spoofed headers)

For agents, email is better modeled as an event stream attached to a short-lived resource.

A practical resource vocabulary looks like this:

| Resource | What it represents | Why agents care |
| --- | --- | --- |
| Inbox | An isolated container for a single attempt or run | Prevents collisions and makes selection deterministic |
| Message | A normalized email record (headers, bodies, attachments) | Lets you process email as data (JSON), not HTML |
| Delivery event | A provider delivery attempt to you (webhook) or your pull retrieval | Explains duplicates, retries, and ordering |
| Artifact | The minimal thing the agent needs (OTP, magic link) | Shrinks the prompt surface and reduces risk |
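As a rough sketch, this vocabulary can be captured as data types. The field names below are illustrative assumptions, not any specific provider's schema:

```typescript
// Illustrative types for the four resources; field names are assumptions.
interface Inbox {
  inbox_id: string;
  address: string;
  expires_at: string; // ISO 8601 TTL boundary
}

interface Message {
  message_id: string; // stable across delivery retries of the same email
  inbox_id: string;
  from: string;
  subject: string;
  text_body: string;
}

interface DeliveryEvent {
  delivery_id: string; // unique per delivery attempt
  message_id: string;
  received_at: string;
}

interface Artifact {
  artifact_hash: string;
  kind: "otp" | "verification_url";
  value: string;
}

// Example of lifecycle logic hanging off these types
function isExpired(inbox: Inbox, now: Date): boolean {
  return now.getTime() >= new Date(inbox.expires_at).getTime();
}
```

Having distinct types for messages and delivery events is what makes the dedupe patterns later in this post expressible at all.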

Mailhook’s product is built around these primitives: you create disposable inboxes via API, receive inbound emails as structured JSON, and consume them via real-time webhooks (with signed payloads) or a polling API. The canonical integration contract for agents and automation is documented in llms.txt.

Events: define arrival semantics before you write agent logic

A reliable agent does not “check the inbox.” It waits for an event with clear semantics.

Prefer push delivery, but design for at-least-once

Webhooks are the natural fit for event delivery because they are low latency and avoid polling costs. The catch is that webhooks are almost always at-least-once:

  • Providers retry on timeouts or non-2xx responses
  • Your gateway may retry requests upstream
  • Your own handler might crash after partially processing

So the correct mental model is: “I will receive duplicates, and I will receive retries.”

A good webhook handler therefore:

  • Verifies authenticity (signature, timestamp tolerance)
  • Writes an idempotent record keyed by stable IDs
  • Acknowledges quickly (2xx)
  • Defers heavy processing to async workers

If you want a deeper checklist for webhook authenticity and replay defense, see Mailhook’s guidance on verifying signed webhook payloads.
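A minimal verification sketch, assuming an HMAC-SHA256 scheme that signs `timestamp.body`. Header names and the exact signed-payload layout vary by provider, so treat this as a shape, not a spec:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const TOLERANCE_SECONDS = 300; // reject signatures older than 5 minutes (replay defense)

function verifyWebhook(
  rawBody: string,
  signatureHeader: string, // hex-encoded HMAC from the provider (assumed format)
  timestampHeader: string, // unix seconds, included in the signed input
  secret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000)
): boolean {
  const ts = Number(timestampHeader);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > TOLERANCE_SECONDS) {
    return false; // stale or malformed timestamp: likely a replay
  }
  const expected = createHmac("sha256", secret)
    .update(`${timestampHeader}.${rawBody}`)
    .digest();
  const received = Buffer.from(signatureHeader, "hex");
  // Constant-time compare so attackers cannot learn matching prefixes
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

Note that verification runs against the raw request body, before any JSON parsing: parse-then-verify opens you to canonicalization bugs.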

Add a polling fallback for determinism

Even in an event-first design, polling is an essential fallback for:

  • Misconfigured webhooks
  • Temporary downstream outages
  • Networks that block inbound webhook traffic

The key is to make polling deterministic by tying it to an inbox identifier and using cursors, deadlines, and dedupe. (If you implement this, avoid “sleep 10 seconds then fetch latest.” Use a deadline and stop conditions.)

Mailhook supports both webhooks and polling so you can build a hybrid receive path that is resilient in CI and agent runs.
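A deadline-driven polling loop might look like the sketch below. `fetchPage` is a hypothetical stand-in for your provider's list endpoint, and the cursor shape is an assumption:

```typescript
// Assumed page shape: items plus an opaque cursor for the next page.
interface PollResult<T> { items: T[]; nextCursor: string | null }

async function pollUntil<T>(
  fetchPage: (cursor: string | null) => Promise<PollResult<T>>,
  matches: (item: T) => boolean,
  { deadlineMs, intervalMs }: { deadlineMs: number; intervalMs: number }
): Promise<T | null> {
  const deadline = Date.now() + deadlineMs;
  let cursor: string | null = null;
  while (Date.now() < deadline) {
    const page = await fetchPage(cursor);
    for (const item of page.items) {
      if (matches(item)) return item; // stop condition: found the expected message
    }
    cursor = page.nextCursor ?? cursor; // advance only when the server hands us a new cursor
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  return null; // explicit timeout: the caller fails the attempt instead of retrying blindly
}
```

The important properties are the hard deadline and the explicit `matches` predicate tied to the inbox, rather than "fetch latest and hope."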

[Diagram: an AI agent runner creates a disposable inbox, triggers an app flow, receives email events via webhook into an event handler (with a polling API fallback), and returns a minimal extracted artifact (OTP or verification link) to the agent.]

Idempotency: the difference between “works in dev” and “safe for agents”

Idempotency is not one decision. For email-driven automation you need it at multiple layers, because duplicates can be introduced at multiple layers.

Layer 1: inbox provisioning idempotency

Agents and CI runners retry. If “create inbox” is not idempotent, you can leak inboxes and create hard-to-debug races.

Pattern:

  • If your system may call “create inbox” twice for the same attempt, include a client-generated idempotency key (for example: attempt_id).
  • Store the mapping attempt_id -> inbox_id so retries return the same inbox descriptor.

If you do not implement provisioning idempotency, the next best alternative is to design your orchestrator so that attempt IDs are minted once and passed through the workflow.
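A minimal sketch of the attempt-to-inbox mapping, using an in-memory map as a stand-in for a table with a UNIQUE constraint on attempt_id (`createInboxUpstream` is a hypothetical provisioning call):

```typescript
// Stand-in for a DB table with UNIQUE(attempt_id); not production storage.
const inboxByAttempt = new Map<string, string>();

async function getOrCreateInbox(
  attemptId: string,
  createInboxUpstream: () => Promise<string>
): Promise<string> {
  const existing = inboxByAttempt.get(attemptId);
  if (existing) return existing; // retry path: same inbox descriptor every time

  const inboxId = await createInboxUpstream();
  // With a real DB: INSERT ... ON CONFLICT DO NOTHING, then re-read the row,
  // so concurrent retries converge on a single inbox.
  if (!inboxByAttempt.has(attemptId)) inboxByAttempt.set(attemptId, inboxId);
  return inboxByAttempt.get(attemptId)!;
}
```

Either way, the invariant is the same: one attempt, one inbox, no matter how many times "create" is called.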

Layer 2: webhook delivery idempotency

Your webhook receiver should treat each delivery as an event with its own identifier. Your storage should enforce a uniqueness constraint so processing is naturally idempotent.

Pattern:

  • Persist inbound messages with stable IDs (message-level)
  • Persist deliveries separately (delivery-level)
  • Enforce uniqueness on the delivery identifier

Even if you do not expose “delivery IDs” to the agent, your infrastructure should have them for dedupe and observability.

A simple rule of thumb:

  • Message ID answers: “what email was this?”
  • Delivery ID answers: “which delivery attempt is this webhook?”

Layer 3: artifact consumption idempotency (the agent-facing one)

This is where agent systems often fail.

An agent that receives the same OTP email twice can:

  • Submit twice and lock an account
  • Resend verification repeatedly (bot loop)
  • Consume a stale link after a retry

Instead of “process message,” define “consume artifact” as the idempotent operation.

Pattern:

  • Extract the artifact deterministically (OTP or URL)
  • Compute an artifact_hash from the extracted value and context
  • Store artifact_hash in a uniqueness-constrained table
  • If already consumed, return the previous result (or a safe no-op)

This makes your system robust even if the email arrives twice, the webhook retries, and the agent repeats the tool call.

A compact idempotency map you can implement

| Layer | Idempotency key | Stored where | Failure it prevents |
| --- | --- | --- | --- |
| Provisioning | attempt_id | Inbox table | Duplicate inboxes, leaked state |
| Delivery | delivery_id | Delivery table | Webhook retry double-processing |
| Message | message_id | Message table | Duplicate message ingestion |
| Artifact | artifact_hash | Artifact table | OTP submitted twice, link clicked twice |

If you want to see how inbox-first APIs usually expose these concepts as endpoints and semantics, Mailhook’s blog post on read email API semantics is a good reference.

TTLs: lifecycle is a feature, not a cleanup job

Agents create state aggressively. If you do not have explicit TTLs, you will eventually accumulate:

  • Old inboxes with sensitive content
  • Confusing old messages that match loose selectors
  • Higher storage cost and slower queries

A disposable inbox should have an explicit lifecycle with an expiration time you can reason about.

Think in states: Active, Draining, Closed

A practical lifecycle model:

  • Active: inbox receives mail and emits events
  • Draining: inbox is no longer used for new work, but you accept late arrivals for a short grace period
  • Closed: inbox is sealed and eligible for deletion (or tombstoned)

Why “draining” matters: SMTP and provider pipelines can delay delivery. If you hard-delete instantly at TTL, you create flakiness that agents will “solve” by retrying, which amplifies load.

Mailhook covers this concept in depth in its guide to TTLs, cleanup, and drain windows.

TTL defaults (pragmatic starting points)

Your TTL should be a function of user experience and expected latency, not a random constant. Here is a sensible starting table for automation and agent runs:

| Flow type | Suggested inbox TTL | Suggested drain window | Notes |
| --- | --- | --- | --- |
| OTP verification | 10 to 20 minutes | 2 to 5 minutes | Short-lived artifacts, avoid reuse |
| Magic link login | 15 to 30 minutes | 5 minutes | Links often expire quickly |
| Password reset | 30 to 60 minutes | 10 minutes | Reset links can have longer validity |
| Third-party vendor onboarding | 1 to 4 hours | 15 minutes | Expect slow delivery and retries |

These are not universal truths. Measure your real arrival latency and tune TTLs accordingly.
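One way to turn measurement into a TTL is to scale the observed p99 arrival latency and clamp the result. The 3x multiplier and the clamp bounds below are assumptions to tune, not recommendations:

```typescript
// Derive a TTL from measured arrival latency instead of a constant.
// The multiplier and min/max clamps are illustrative starting points.
function suggestTtlSeconds(
  p99ArrivalSeconds: number,
  { min = 600, max = 3600 }: { min?: number; max?: number } = {}
): number {
  const raw = Math.ceil(p99ArrivalSeconds * 3); // headroom over the slow tail
  return Math.min(max, Math.max(min, raw));
}
```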

TTLs as a safety boundary for LLMs

TTLs are also security controls:

  • Less time for someone to exploit a leaked address
  • Less time for delayed malicious content to arrive
  • Less sensitive data stored long-term

When agents are involved, treat inbound email as untrusted input and keep retention intentionally short.

Security considerations specific to AI agents

Email is a high-risk input for autonomous systems because it mixes content, links, and implied instructions.

Three practical guardrails that work well in production:

  • Minimize the agent view: do not pass full HTML to the model. Pass the smallest extracted artifact and provider-attested metadata.
  • Constrain link handling: only allow the agent to open URLs that match allowlisted domains and safe paths, and block open redirects.
  • Verify webhook payloads: signature verification and replay detection should happen before any parsing or extraction.
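The link-handling guardrail can be sketched as a conservative URL gate. The allowlisted hosts and blocked query keys below are examples; your real policy should come from the flows you actually automate:

```typescript
// Example allowlist; replace with the domains your flows legitimately use.
const ALLOWED_HOSTS = new Set(["app.example.com", "auth.example.com"]);

function isSafeAgentUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "https:") return false;
  if (!ALLOWED_HOSTS.has(url.hostname)) return false;
  // Reject common open-redirect carriers in query strings
  for (const key of ["url", "redirect", "next", "return_to"]) {
    if (url.searchParams.has(key)) return false;
  }
  return true;
}
```

Fail closed: anything the gate cannot positively classify as safe is never handed to the agent.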

If you are comparing solutions that give each agent an email identity (not just disposable inboxes), it is worth reading this independent review of MailMolt, which discusses isolation, monitoring, and prompt injection risks: MailMolt Review: The AI Agent Email Identity Tool Nobody Has Written About Yet.

A reference flow: event-first, idempotent, TTL-bounded

Below is a provider-agnostic sketch you can adapt. It focuses on the three primitives from this post.

Webhook ingestion (idempotent by default)

You want a fast webhook handler that only authenticates and persists.

// Pseudocode: fast path that only authenticates and persists
function handleInboundEmailWebhook(req) {
  verifySignature(req.rawBody, req.headers) // fail closed on any mismatch

  const event = parseJson(req.rawBody) // parse the same bytes you verified
  const { delivery_id, message_id, inbox_id, received_at } = event

  db.transaction(() => {
    // Unique constraint on delivery_id makes webhook retries a no-op
    db.insertInto("deliveries")
      .values({ delivery_id, message_id, inbox_id, received_at })
      .onConflictDoNothing("delivery_id")

    // Unique constraint on message_id dedupes re-delivered emails
    db.insertInto("messages")
      .values({ message_id, inbox_id, normalized_json: event })
      .onConflictDoNothing("message_id")
  })

  // Acknowledge immediately; heavy processing belongs in async workers
  return { status: 200 }
}

Artifact extraction (agent tool surface)

Keep the tool surface narrow:

  • Input: inbox_id, attempt_id
  • Output: otp or verification_url and a few stable IDs

Extraction should be idempotent:

// Pseudocode: idempotent extraction keyed by artifact_hash
function extractVerificationArtifact({ inbox_id, attempt_id }) {
  const msg = db.queryLatestMatchingMessage({ inbox_id, purpose: "verify" })
  const artifact = deriveArtifact(msg) // deterministic extraction

  const artifact_hash = hash(attempt_id + ":" + artifact.value)

  // First caller wins; retries hit the unique constraint and fall through
  db.insertInto("artifacts")
    .values({ artifact_hash, inbox_id, message_id: msg.message_id, artifact })
    .onConflictDoNothing("artifact_hash")

  // Always return the stored row, so repeat calls get the original result
  return db.getArtifactByHash(artifact_hash)
}

Lifecycle enforcement (TTL plus drain)

Model lifecycle explicitly and enforce it in code paths:

  • Reject new work on closed inboxes
  • Allow reads during draining
  • Garbage collect after closed plus retention

This is what turns “cleanup” into a predictable part of the system.
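One way to sketch those checks, assuming millisecond timestamps; the TTL and drain values you plug in are the knobs from the table earlier:

```typescript
// Explicit lifecycle states enforced at the read/write boundary.
type InboxState = "active" | "draining" | "closed";

function inboxState(
  createdAt: number,
  now: number,
  { ttlMs, drainMs }: { ttlMs: number; drainMs: number }
): InboxState {
  if (now < createdAt + ttlMs) return "active";
  if (now < createdAt + ttlMs + drainMs) return "draining";
  return "closed";
}

function canAcceptNewWork(state: InboxState): boolean {
  return state === "active"; // reject new work once draining begins
}

function canRead(state: InboxState): boolean {
  return state !== "closed"; // late arrivals stay readable during the drain window
}
```

Deriving the state from timestamps (rather than mutating a status column) keeps the checks cheap and makes garbage collection a pure query over closed-plus-retention.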

Where Mailhook fits (without changing your architecture)

Mailhook is designed to be the inbound email layer behind the patterns described above:

  • Create disposable inboxes via API
  • Receive normalized email as structured JSON
  • Use real-time webhooks (with signed payloads) and a polling API for fallback
  • Use shared domains for fast start, or custom domain support when you need allowlisting and control
  • Batch processing support for higher throughput workflows

If you are implementing agent tools, start from the canonical contract in mailhook.co/llms.txt. It is the fastest way to align your tool calls and data model with the platform’s supported semantics.

Frequently Asked Questions

What is “email infrastructure” for AI agents, exactly? It is the set of primitives that make email reliable and safe for automation: event delivery (webhooks or polling), stable identifiers, dedupe and idempotency, and explicit lifecycle controls (TTLs, drain windows).

Why do AI agents need idempotency more than traditional services? Agents retry autonomously, can call tools repeatedly, and can loop when they receive ambiguous results. Without idempotency at the artifact level (OTP, verification URL), an agent can double-submit, resend, or lock accounts.

How do TTLs reduce flakiness in CI and agent runs? TTLs prevent inbox reuse and stale message selection, while drain windows absorb late deliveries. Together they make “the right message” deterministic under retries.

Should I use polling or webhooks for agent email? Prefer webhooks for event delivery, but keep polling as a deterministic fallback. A hybrid approach is usually the most resilient.

Build agent-friendly email flows with Mailhook

If you want to stop treating email as a fragile UI and start treating it as reliable infrastructure, Mailhook provides the primitives you need: disposable inboxes, JSON-first messages, webhook events with signed payloads, and polling for fallback.

Get the exact integration contract in llms.txt, then explore Mailhook at mailhook.co.
