Disposable inboxes solve one problem (reliable, isolated email receipt for tests and agents), but they create another: lifecycle management. If you do not manage inbox lifecycle deliberately, you end up with flaky runs (late emails landing in the “wrong” attempt), slow CI (poll loops that never stop), and a quiet pileup of retained messages you no longer need.
This post is a practical, engineering-first guide to three lifecycle primitives you can standardize across your stack:
- TTLs (time to live) so an inbox has an explicit “active until” boundary.
- Drain windows so you can safely handle late arrivals and provider retries without re-opening the workflow.
- Cleanup so you delete or tombstone inboxes and messages predictably, with good auditability.
When you implement these well, disposable inboxes become a deterministic resource your CI and LLM agents can use like any other tool: provision, consume, finalize, forget.
The lifecycle vocabulary (and why each piece exists)
Most teams conflate “expiry” and “cleanup”. Separate the concepts and your system gets simpler.
TTL: how long an inbox is allowed to be “active”
Inbox TTL is the interval during which you expect the workflow to complete, for example, “wait for the verification email for up to 90 seconds.” TTL is not about how long SMTP might retry; it is about how long your workflow stays open.
A good TTL policy is:
- Short by default (seconds to minutes) for OTPs and magic links.
- Explicitly tied to the attempt (inbox-per-attempt or inbox-per-run).
- Enforced by code (deadlines, not sleeps).
Drain window: accept late arrivals, but do not re-open the workflow
A drain window is a short period after the TTL where you still ingest messages that arrive late (or are re-delivered), but you treat them as:
- Debug evidence,
- Dedup candidates,
- Or inputs for idempotent completion, if the attempt already succeeded.
This matters because email delivery is not perfectly punctual. SMTP retries, greylisting, and queue delays exist. The retry model is part of the SMTP ecosystem (see RFC 5321 for delivery semantics and retry behavior).
Drain windows let you be strict (attempt is closed) without being blind (you still collect late signals).
Cleanup: removing data and resources on a schedule
Cleanup is the process that:
- Deletes messages you no longer need.
- Deletes or expires inboxes you no longer need.
- Writes a final “tombstone” record so a late webhook or polling client does not resurrect state.
In practice, cleanup is where you enforce privacy and cost controls.
Tombstone: the “do not process again” marker
A tombstone is a small record that outlives the inbox TTL and message retention. It exists to answer the question:
“If I receive an event for inbox X later, should I process it?”
The answer is almost always “no”, but you still want that decision to be deterministic.
A simple inbox state machine you can implement today
Treat inboxes like a tiny state machine with monotonic transitions. This is the backbone of reliable lifecycle enforcement.
| State | Time bound | What you allow | What you reject | Typical storage |
|---|---|---|---|---|
| Active | created_at to active_until (TTL) | Webhook/poll receipt, matching, artifact extraction | Nothing yet, but you still dedupe | Full normalized JSON, minimal artifacts |
| Draining | active_until to drain_until | Ingest and dedupe late messages, attach for debug | Triggering new sends for this attempt, agent re-processing | Often store normalized JSON, but do not advance workflow state |
| Closed | After drain_until | Idempotent “already done” responses, safe 404-style semantics | Any processing that could mutate the attempt | Tombstone + minimal metadata |
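The table above can be sketched as a small state function plus a monotonic guard. This is a minimal illustration, not a reference implementation: the names InboxState, state_at, and advance are hypothetical, and treating the boundary instants as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone
from enum import IntEnum


class InboxState(IntEnum):
    # Integer order encodes the only legal direction of travel.
    ACTIVE = 1
    DRAINING = 2
    CLOSED = 3


def state_at(now: datetime, active_until: datetime, drain_until: datetime) -> InboxState:
    """State implied by the clock alone (boundaries inclusive, by assumption)."""
    if now <= active_until:
        return InboxState.ACTIVE
    if now <= drain_until:
        return InboxState.DRAINING
    return InboxState.CLOSED


def advance(current: InboxState, now: datetime,
            active_until: datetime, drain_until: datetime) -> InboxState:
    """Never move backwards, even if a stale writer or skewed clock disagrees."""
    return max(current, state_at(now, active_until, drain_until))


# Example: a 90 second TTL followed by a 10 minute drain window.
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
active_until = t0 + timedelta(seconds=90)
drain_until = active_until + timedelta(minutes=10)
late = t0 + timedelta(seconds=120)
print(advance(InboxState.ACTIVE, late, active_until, drain_until).name)
```

Taking the maximum of the stored state and the clock-derived state is what makes the transition monotonic: a Closed inbox stays Closed even if a caller passes an earlier timestamp.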

Why this matters for LLM agents
Agents are great at “trying again”, which is exactly what you do not want after TTL. A state machine provides a hard boundary: the agent can request status, but cannot accidentally re-run a verification loop by re-consuming old mail.
Picking TTLs and drain windows (pragmatic defaults)
The “right” values depend on your system, but you can standardize a default policy and override per workflow.
| Workflow type | Inbox TTL (active) | Drain window | Message retention (debug) | Notes |
|---|---|---|---|---|
| OTP / sign-up verification in CI | 60 to 180 seconds | 5 to 15 minutes | 1 to 24 hours | Short TTL keeps tests fast, drain captures late mail and dupes for debugging |
| Password reset / magic link in CI | 2 to 5 minutes | 15 to 30 minutes | 24 hours | Longer TTL if the app under test has slower async jobs |
| LLM agent “wait for email” tool | 2 to 10 minutes | 30 to 60 minutes | 1 to 7 days | Agents may be interrupted, drain avoids surprises after resumption |
| Human-in-the-loop staging ops | 15 to 60 minutes | 1 to 6 hours | 7 to 30 days | More slack, but be strict about retention and access |
Two rules keep you honest:
- TTL is your workflow SLA, not the email network’s maximum retry horizon.
- Drain window is for late arrivals and duplicates, not for extending the attempt.
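One way to standardize defaults with per-workflow overrides is a small policy table in code. The sketch below is illustrative: LifecyclePolicy, DEFAULTS, policy_for, and the workflow keys are hypothetical names, and the specific values are simply picks from inside the ranges in the table above.

```python
from dataclasses import dataclass, replace
from datetime import timedelta


@dataclass(frozen=True)
class LifecyclePolicy:
    ttl: timedelta        # how long the attempt stays open
    drain: timedelta      # how long late mail is still ingested
    retention: timedelta  # how long message content is kept for debug


# Illustrative picks from within the suggested ranges, not recommendations.
DEFAULTS = {
    "otp_ci": LifecyclePolicy(timedelta(seconds=90), timedelta(minutes=10), timedelta(hours=24)),
    "magic_link_ci": LifecyclePolicy(timedelta(minutes=3), timedelta(minutes=20), timedelta(hours=24)),
    "agent_wait": LifecyclePolicy(timedelta(minutes=5), timedelta(minutes=45), timedelta(days=3)),
    "human_staging": LifecyclePolicy(timedelta(minutes=30), timedelta(hours=2), timedelta(days=14)),
}


def policy_for(workflow: str, **overrides) -> LifecyclePolicy:
    """Return the default policy for a workflow, with optional field overrides."""
    return replace(DEFAULTS[workflow], **overrides)


# A slow app under test gets a longer TTL but keeps the default drain window.
slow_otp = policy_for("otp_ci", ttl=timedelta(seconds=180))
print(slow_otp.ttl, slow_otp.drain)
```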
The “Inbox Lifecycle Controller” pattern
If you have more than one workflow, centralize lifecycle logic. You want one component that defines how inboxes are created, waited on, drained, and finalized.
A minimal controller typically does four things:
1) Provision with lifecycle metadata
When you create an inbox via an API provider, store lifecycle fields alongside your attempt:
- inbox_id
- email
- created_at
- active_until
- drain_until
- state
If your provider returns expiry metadata (many inbox-first APIs do), persist it, but still compute your own workflow deadlines. Treat provider expiry as a safety net, not your only control.
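A minimal sketch of that provisioning step, assuming the provider call has already returned an inbox ID and address. InboxRecord, build_record, and provider_expires_at are hypothetical names; the only idea taken from the text is that you compute your own deadlines and use provider expiry as a cap.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class InboxRecord:
    inbox_id: str
    email: str
    attempt_id: str
    created_at: datetime
    active_until: datetime
    drain_until: datetime
    state: str = "Active"
    provider_expires_at: Optional[datetime] = None


def build_record(inbox_id: str, email: str, attempt_id: str,
                 ttl: timedelta, drain: timedelta,
                 provider_expires_at: Optional[datetime] = None,
                 now: Optional[datetime] = None) -> InboxRecord:
    """Compute workflow deadlines locally; provider expiry only caps them."""
    created_at = now or datetime.now(timezone.utc)
    active_until = created_at + ttl
    drain_until = active_until + drain
    if provider_expires_at is not None:
        # Safety net: never plan to read from an inbox the provider has expired.
        drain_until = min(drain_until, provider_expires_at)
        active_until = min(active_until, drain_until)
    return InboxRecord(inbox_id, email, attempt_id, created_at,
                       active_until, drain_until,
                       provider_expires_at=provider_expires_at)


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
record = build_record("inb_1", "a@example.test", "att_1",
                      ttl=timedelta(seconds=90), drain=timedelta(minutes=10), now=start)
print(record.active_until.isoformat(), record.drain_until.isoformat())
```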
2) Receive email (webhook-first, polling as a fallback)
For deterministic systems, you typically want:
- Webhooks for low latency and event-driven ingestion.
- Polling for resilience when webhooks fail, or when running locally.
If you use Mailhook, you can receive emails as structured JSON, via real-time webhooks or a polling API, and you can validate authenticity using signed payloads. Use the canonical integration contract here: llms.txt.
Related reading (if you are implementing the hybrid pattern): Webhook-first, polling fallback.
3) Enforce deadlines in the waiter, not in the test
Your “wait for email” helper should take a deadline and stop. Never sprinkle sleep(10) in tests or agent loops.
A good waiter contract:
- Stops at active_until.
- Returns a clear timeout result with debug metadata.
- Optionally continues ingestion into a drain queue until drain_until.
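A deadline-driven waiter along these lines might look like the sketch below. The names wait_for_email and WaitResult, and the fetch and matches callables, are hypothetical stand-ins for whatever polling call and matcher you use; here active_until is a value on the same scale as the injected clock. The pause between polls is always capped by the time remaining, so the helper cannot overrun the deadline.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class WaitResult:
    message: Optional[dict]
    timed_out: bool
    polls: int             # debug metadata: how many fetches were made
    waited_seconds: float  # debug metadata: how long the waiter ran


def wait_for_email(fetch: Callable[[], Iterable[dict]],
                   matches: Callable[[dict], bool],
                   active_until: float,
                   interval: float = 1.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> WaitResult:
    """Poll until a matching message arrives or the deadline passes."""
    start = clock()
    polls = 0
    while True:
        polls += 1
        for message in fetch():
            if matches(message):
                return WaitResult(message, False, polls, clock() - start)
        remaining = active_until - clock()
        if remaining <= 0:
            return WaitResult(None, True, polls, clock() - start)
        # Bounded pause: never sleep past the deadline.
        sleep(min(interval, remaining))


# Usage with a fake clock so the example runs instantly.
fake_now = [0.0]
result = wait_for_email(
    fetch=lambda: [{"subject": "Your code"}] if fake_now[0] >= 2 else [],
    matches=lambda m: "code" in m["subject"],
    active_until=5.0,
    clock=lambda: fake_now[0],
    sleep=lambda s: fake_now.__setitem__(0, fake_now[0] + s),
)
print(result.timed_out, result.polls)
```

Injecting clock and sleep keeps the helper testable without real waiting, which matters when the waiter itself is part of a CI harness.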
4) Finalize and cleanup deterministically
Finalization is a state transition plus optional retention decisions:
- Store the minimal artifact you need (OTP, verification URL, message_id) for audit.
- Drop or redact the rest.
- Mark inbox state Draining, then Closed.
Cleanup strategies that do not fall apart at scale
Cleanup is easy to describe and easy to get subtly wrong.
Strategy A: Time-based garbage collection (GC)
Run a periodic job that:
- Finds inbox records with drain_until < now and state != Closed.
- Marks them Closed.
- Deletes messages older than your retention policy.
This is the simplest approach and works well if your database can handle scans by time index.
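As a rough sketch, the periodic job can be two statements in one transaction. Everything here is an assumption made for illustration: SQLite as the store, the inboxes and messages table layout, integer epoch-second timestamps, and the run_gc name.

```python
import sqlite3

# Hypothetical minimal schema; timestamps are integer epoch seconds.
GC_SCHEMA = """
CREATE TABLE inboxes (inbox_id TEXT PRIMARY KEY, state TEXT NOT NULL,
                      drain_until INTEGER NOT NULL, closed_at INTEGER);
CREATE INDEX idx_inboxes_drain_until ON inboxes (drain_until);
CREATE TABLE messages (id INTEGER PRIMARY KEY, inbox_id TEXT NOT NULL,
                       received_at INTEGER NOT NULL);
CREATE INDEX idx_messages_received_at ON messages (received_at);
"""


def run_gc(conn: sqlite3.Connection, now: int, retention_seconds: int) -> tuple:
    """Close inboxes past drain_until and delete messages past retention.

    Safe to re-run: a second pass finds nothing left to change.
    """
    with conn:  # commit on success, roll back on error
        closed = conn.execute(
            "UPDATE inboxes SET state = 'Closed', closed_at = ? "
            "WHERE drain_until < ? AND state != 'Closed'",
            (now, now),
        ).rowcount
        deleted = conn.execute(
            "DELETE FROM messages WHERE received_at < ?",
            (now - retention_seconds,),
        ).rowcount
    return closed, deleted


conn = sqlite3.connect(":memory:")
conn.executescript(GC_SCHEMA)
conn.execute("INSERT INTO inboxes VALUES ('inb_old', 'Draining', 200, NULL)")
conn.execute("INSERT INTO inboxes VALUES ('inb_new', 'Active', 5000, NULL)")
conn.execute("INSERT INTO messages (inbox_id, received_at) VALUES ('inb_old', 100)")
conn.execute("INSERT INTO messages (inbox_id, received_at) VALUES ('inb_new', 900)")
print(run_gc(conn, now=1000, retention_seconds=500))
```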
Strategy B: Event-driven cleanup with delayed jobs
When you create an inbox record, enqueue two delayed jobs:
- One at active_until (transition to Draining).
- One at drain_until (transition to Closed, trigger deletion).
This reduces scanning, but you must handle job retries and idempotency.
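The idempotency requirement can be sketched independently of any particular job queue. In this illustration the record is a plain dict with numeric timestamps, and jobs_for and run_job are hypothetical names; a real queue would call run_job from its worker and re-enqueue on "retry_later".

```python
ORDER = {"Active": 0, "Draining": 1, "Closed": 2}


def jobs_for(record: dict) -> list:
    """The two delayed jobs to enqueue at creation: (run_at, inbox_id, target)."""
    return [
        (record["active_until"], record["inbox_id"], "Draining"),
        (record["drain_until"], record["inbox_id"], "Closed"),
    ]


def run_job(record: dict, target: str, now: float) -> str:
    """Apply a transition at most once; retries and out-of-order runs are no-ops."""
    if ORDER[record["state"]] >= ORDER[target]:
        return "noop"  # already at or past the target state
    deadline = record["active_until"] if target == "Draining" else record["drain_until"]
    if now < deadline:
        return "retry_later"  # fired early, e.g. after a clock adjustment
    record["state"] = target
    if target == "Closed":
        record["closed_at"] = now
    return "applied"


record = {"inbox_id": "inb_1", "state": "Active", "active_until": 100.0, "drain_until": 700.0}
print(jobs_for(record))
print(run_job(record, "Draining", 100.0), run_job(record, "Draining", 101.0))
```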
Strategy C: Opportunistic cleanup on access
Whenever a client requests inbox status or polls messages:
- If now > active_until, refuse to advance the workflow.
- If now > drain_until, return a tombstone response.
This helps when background jobs lag, but it should complement, not replace, GC.
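A sketch of the access-time check, assuming a dict-shaped record with numeric timestamps. The handle_poll name and the response shape (status, can_advance, messages) are hypothetical; the point is only that the deadline comparison happens on every read, whether or not background jobs have run.

```python
from typing import Callable, List


def handle_poll(record: dict, now: float, fetch_messages: Callable[[], List[dict]]) -> dict:
    """Answer a status/poll request, enforcing deadlines on the way through."""
    if now > record["drain_until"]:
        if record["state"] != "Closed":
            # Background cleanup lagged; close on access.
            record["state"] = "Closed"
            record["closed_at"] = now
        return {"status": "closed", "inbox_id": record["inbox_id"],
                "can_advance": False, "messages": []}
    if now > record["active_until"]:
        if record["state"] == "Active":
            record["state"] = "Draining"
        # Late mail is still ingested elsewhere, but callers cannot act on it.
        return {"status": "draining", "inbox_id": record["inbox_id"],
                "can_advance": False, "messages": []}
    return {"status": "active", "inbox_id": record["inbox_id"],
            "can_advance": True, "messages": fetch_messages()}


record = {"inbox_id": "inb_1", "state": "Active", "active_until": 100.0, "drain_until": 700.0}
print(handle_poll(record, 50.0, lambda: [{"id": "msg_1"}])["status"])
print(handle_poll(record, 150.0, lambda: [{"id": "msg_1"}])["status"])
```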
Drain windows in practice: what you actually do during draining
“Draining” should be boring. You want to capture evidence and prevent mutation.
During drain:
- Ingest webhooks and poll results normally, but route messages to a “late mail” path.
- Deduplicate aggressively (delivery-level and message-level), because this is where duplicates show up.
- Do not re-run business actions (do not click links, do not submit OTPs again).
- Attach late messages as debugging artifacts to CI runs or incident tickets.
If you need to compute “did we already complete this attempt?”, your invariant is:
- The attempt completion should be idempotent, and keyed by an artifact-level identifier (for example, an OTP hash or a verification link token), not by “first email seen.”
For more on dedupe and stable records, see: Dedup, normalize, store.
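The artifact-level invariant can be sketched as follows. This assumes an in-memory dict as the completion store and uses a SHA-256 hash of the OTP or link token as the key; artifact_key, complete_attempt, and the return strings are hypothetical.

```python
import hashlib


def artifact_key(artifact: str) -> str:
    """Stable identifier for an OTP or link token; avoids storing the raw value."""
    return hashlib.sha256(artifact.encode("utf-8")).hexdigest()


def complete_attempt(completions: dict, attempt_id: str, artifact: str) -> str:
    """Record completion once per attempt, keyed by the artifact that completed it."""
    key = artifact_key(artifact)
    if attempt_id not in completions:
        completions[attempt_id] = key
        return "completed"
    if completions[attempt_id] == key:
        # Duplicate delivery of the same OTP or link: acknowledge, change nothing.
        return "already_completed"
    # A different artifact arrived after completion: keep it as evidence only.
    return "ignored_late_artifact"


completions = {}
print(complete_attempt(completions, "att_1", "493021"))
print(complete_attempt(completions, "att_1", "493021"))
print(complete_attempt(completions, "att_1", "118877"))
```

Because the key comes from the artifact rather than from arrival order, a redelivered email carrying the same OTP is recognized as a duplicate regardless of which copy was seen first.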
Security and agent-safety constraints (lifecycle edition)
Lifecycle controls are also security controls.
Do not let an agent read an inbox forever
If an agent can keep checking an inbox indefinitely, it can:
- Trigger resend loops,
- Consume unintended messages,
- Or be prompt-injected by late, hostile content.
Deadlines and drain windows are the mechanical fix.
Verify webhook authenticity before you store anything
If you accept webhooks, verify signatures on the raw request body and fail closed. Mailhook supports signed payloads, which you should validate before persisting or dispatching downstream.
Background: Verify webhook payload authenticity.
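The fail-closed shape of that check is sketched below. The signing scheme is an assumption made purely for illustration (HMAC-SHA256 over the raw body, hex-encoded, with a shared secret); the actual header name, algorithm, and encoding for Mailhook should be taken from its llms.txt contract. verify_signature and handle_webhook are hypothetical names.

```python
import hashlib
import hmac
import json
from typing import Callable, Optional


def verify_signature(raw_body: bytes, signature_hex: Optional[str], secret: bytes) -> bool:
    """Constant-time comparison against an HMAC of the raw, unparsed body.

    Assumed scheme: hex-encoded HMAC-SHA256. Check the provider's contract.
    """
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), (signature_hex or "").encode())


def handle_webhook(raw_body: bytes, signature_hex: Optional[str], secret: bytes,
                   persist: Callable[[dict], None]) -> int:
    """Return an HTTP-style status; nothing is parsed or stored before verification."""
    if not verify_signature(raw_body, signature_hex, secret):
        return 401  # fail closed: missing or wrong signature
    persist(json.loads(raw_body))
    return 200


secret = b"example-shared-secret"
body = b'{"inbox_id": "inb_1"}'
good_signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
stored = []
print(handle_webhook(body, good_signature, secret, stored.append))
print(handle_webhook(body, "not-a-signature", secret, stored.append))
```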
Keep tombstones small, but keep them long enough
A tombstone record should include:
- inbox_id
- state = Closed
- closed_at
- Optional reason codes (timeout, success, cancelled)
- Optional dedupe keys you still need to reject replay
Do not keep full message bodies in tombstones.
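A tombstone with exactly those fields might be modeled like this. The Tombstone class, the should_process helper, and the in-memory lookup are hypothetical; in practice the lookup would be a keyed read against whatever store outlives your message retention.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Tombstone:
    inbox_id: str
    closed_at: float
    state: str = "Closed"
    reason: Optional[str] = None               # e.g. "timeout", "success", "cancelled"
    dedupe_keys: FrozenSet[str] = frozenset()  # only keys still needed to reject replay


def should_process(event: dict, tombstones: Dict[str, Tombstone]) -> bool:
    """Deterministic answer for late events: a tombstoned inbox is never processed."""
    return event["inbox_id"] not in tombstones


tombstones = {"inb_1": Tombstone("inb_1", closed_at=700.0, reason="timeout")}
print(should_process({"inbox_id": "inb_1"}, tombstones))
print(should_process({"inbox_id": "inb_2"}, tombstones))
```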
A concrete data model for lifecycle management
You can implement the controller with a single table (or collection) and indexes on the time and attempt fields.
| Field | Type | Purpose |
|---|---|---|
| inbox_id | string | Provider handle for retrieval and correlation |
| email | string | The routable address you used |
| attempt_id | string | Your workflow attempt identifier |
| state | enum | Active, Draining, Closed |
| active_until | timestamp | TTL boundary |
| drain_until | timestamp | Drain boundary |
| created_at | timestamp | Observability and debugging |
| closed_at | timestamp nullable | Tombstone time |
Index active_until, drain_until, and attempt_id. Most cleanup becomes “find by time, update state, delete messages.”
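For concreteness, here is one way the table and its three indexes could be declared, using SQLite through Python's standard library. The choice of SQLite, the integer epoch-second timestamps, the CHECK constraint, and the index names are assumptions for this sketch; the columns mirror the field list above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE inboxes (
    inbox_id     TEXT PRIMARY KEY,
    email        TEXT NOT NULL,
    attempt_id   TEXT NOT NULL,
    state        TEXT NOT NULL CHECK (state IN ('Active', 'Draining', 'Closed')),
    active_until INTEGER NOT NULL,  -- epoch seconds (assumed representation)
    drain_until  INTEGER NOT NULL,
    created_at   INTEGER NOT NULL,
    closed_at    INTEGER            -- NULL until the inbox is closed
);
CREATE INDEX idx_inboxes_active_until ON inboxes (active_until);
CREATE INDEX idx_inboxes_drain_until  ON inboxes (drain_until);
CREATE INDEX idx_inboxes_attempt_id   ON inboxes (attempt_id);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute(
    "INSERT INTO inboxes VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("inb_1", "a@example.test", "att_1", "Active", 1090, 1690, 1000, None),
)

# "Find by time": inboxes whose drain window has ended but are not yet closed.
due = db.execute(
    "SELECT inbox_id FROM inboxes WHERE drain_until < ? AND state != 'Closed'",
    (2000,),
).fetchall()
print(due)
```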
How Mailhook fits into this pattern
Mailhook gives you the inbox primitives you need to implement lifecycle controls cleanly:
- Create disposable inboxes via API.
- Receive inbound email as structured JSON.
- Get real-time webhook notifications (with signed payloads).
- Use polling as a fallback (and for drain/backfill paths).
- Batch email processing for higher-throughput pipelines.
Your controller still owns workflow deadlines (TTL) and what you retain, but the ingestion and JSON normalization become straightforward.
If you are wiring this into an agent tool, start with the canonical contract: Mailhook llms.txt.
Frequently Asked Questions
What is the difference between an inbox TTL and message retention? Inbox TTL is the workflow deadline for “this attempt is allowed to complete.” Message retention is how long you keep email content for debugging, audit, or replay. They should be configured independently.
How long should a drain window be? Long enough to capture late deliveries and duplicates (often 5 to 30 minutes for CI-style verification flows), but short enough that you do not accidentally extend the workflow.
Should my agent keep polling during the drain window? Usually no. Let your ingestion layer keep collecting late arrivals, but prevent the agent from taking actions after active_until.
Do I need to delete inboxes, or can I just let them expire? Letting them expire is fine if you also enforce deadlines in your own code and have a cleanup policy for stored messages and metadata. A tombstone record is still helpful for deterministic “closed” behavior.
How do I prevent late emails from affecting retries in CI? Use inbox-per-attempt, enforce active_until in the waiter, and make completion idempotent at the artifact level. Late messages should be stored for debug during draining, not used to mutate state.
Build a deterministic inbox lifecycle with Mailhook
If you are building agent tools, QA automation, or verification harnesses, treat inboxes as short-lived resources with explicit TTLs, drain windows, and cleanup. Mailhook provides programmable disposable inboxes, JSON email output, signed webhooks, and polling fallback so you can implement this pattern without running mail servers.
Get the exact API semantics here: Mailhook llms.txt, then explore the platform at Mailhook.