Manage Inbox Lifecycle: TTLs, Cleanup, and Drain Windows

Disposable inboxes solve one problem (reliable, isolated email receipt for tests and agents), but they create another: lifecycle management. If you do not manage inbox lifecycle deliberately, you end up with flaky runs (late emails landing in the “wrong” attempt), slow CI (poll loops that never stop), and a quiet pileup of retained messages you no longer need.

This post is a practical, engineering-first guide to three lifecycle primitives you can standardize across your stack:

TTLs (time to live) so an inbox has an explicit “active until” boundary.
Drain windows so you can safely handle late arrivals and provider retries without re-opening the workflow.
Cleanup so you delete or tombstone inboxes and messages predictably, with good auditability.

When you implement these well, disposable inboxes become a deterministic resource your CI and LLM agents can use like any other tool: provision, consume, finalize, forget.

The lifecycle vocabulary (and why each piece exists)

Most teams conflate “expiry” and “cleanup”. Separate the concepts and your system gets simpler.

TTL: how long an inbox is allowed to be “active”

Inbox TTL is the interval during which you expect the workflow to complete, for example “wait for the verification email for up to 90 seconds.” TTL is not about how long SMTP might retry; it is about how long your workflow stays open.

A good TTL policy is:

Short by default (seconds to minutes) for OTPs and magic links.
Explicitly tied to the attempt (inbox-per-attempt or inbox-per-run).
Enforced by code (deadlines, not sleeps).

Drain window: accept late arrivals, but do not re-open the workflow

A drain window is a short period after the TTL where you still ingest messages that arrive late (or are re-delivered), but you treat them as:

Debug evidence,
Dedup candidates,
Or inputs for idempotent completion, if the attempt already succeeded.

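This matters because email delivery is not perfectly punctual. SMTP retries, greylisting, and queue delays exist. Retries are part of the SMTP delivery model: RFC 5321 (section 4.5.4.1) suggests retry intervals of at least 30 minutes and a give-up time of at least 4 to 5 days, so a message can legitimately arrive long after your workflow deadline.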

Drain windows let you be strict (attempt is closed) without being blind (you still collect late signals).

Cleanup: removing data and resources on a schedule

Cleanup is the process that:

Deletes messages you no longer need.
Deletes or expires inboxes you no longer need.
Writes a final “tombstone” record so a late webhook or polling client does not resurrect state.

In practice, cleanup is where you enforce privacy and cost controls.

Tombstone: the “do not process again” marker

A tombstone is a small record that outlives the inbox TTL and message retention. It exists to answer the question:

“If I receive an event for inbox X later, should I process it?”

The answer is almost always “no”, but you still want that decision to be deterministic.

A simple inbox state machine you can implement today

Treat inboxes like a tiny state machine with monotonic transitions. This is the backbone of reliable lifecycle enforcement.

State	Time bound	What you allow	What you reject	Typical storage
Active	`created_at` to `active_until` (TTL)	Webhook/poll receipt, matching, artifact extraction	Nothing yet, but you still dedupe	Full normalized JSON, minimal artifacts
Draining	`active_until` to `drain_until`	Ingest and dedupe late messages, attach for debug	Triggering new sends for this attempt, agent re-processing	Often store normalized JSON, but do not advance workflow state
Closed	After `drain_until`	Idempotent “already done” responses, safe 404-style semantics	Any processing that could mutate the attempt	Tombstone + minimal metadata
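
The table above can be sketched as a small, clock-driven state machine. This is a minimal illustration, not a prescribed implementation: the names `InboxState`, `state_at`, and `advance` are hypothetical, and it assumes timezone-aware UTC timestamps.

```python
from datetime import datetime, timedelta, timezone
from enum import IntEnum


class InboxState(IntEnum):
    # Integer ordering encodes the only legal direction of travel.
    ACTIVE = 0
    DRAINING = 1
    CLOSED = 2


def state_at(now, active_until, drain_until):
    """Derive the state from the clock and the two deadlines alone."""
    if now < active_until:
        return InboxState.ACTIVE
    if now < drain_until:
        return InboxState.DRAINING
    return InboxState.CLOSED


def advance(current, now, active_until, drain_until):
    """Monotonic transition: the stored state never moves backwards,
    even if the clock skews or an inbox was closed early."""
    return max(current, state_at(now, active_until, drain_until))


# Example deadlines: 90 second TTL, 10 minute drain window.
created_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
active_until = created_at + timedelta(seconds=90)
drain_until = active_until + timedelta(minutes=10)
```

Taking the maximum of the stored state and the clock-derived state is what makes the transitions monotonic: an inbox that was closed early stays closed, no matter what the clock says.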

Figure: a simple inbox lifecycle diagram with four labeled boxes connected left to right: Create Inbox, Active (TTL), Drain Window, Cleanup and Tombstone. Each box has a small note underneath: receive via webhooks/polling, accept late arrivals and dedupe, delete messages, keep tombstone metadata.

Why this matters for LLM agents

Agents are great at “trying again”, which is exactly what you do not want after TTL. A state machine provides a hard boundary: the agent can request status, but cannot accidentally re-run a verification loop by re-consuming old mail.

Picking TTLs and drain windows (pragmatic defaults)

The “right” values depend on your system, but you can standardize a default policy and override per workflow.

Workflow type	Inbox TTL (active)	Drain window	Message retention (debug)	Notes
OTP / sign-up verification in CI	60 to 180 seconds	5 to 15 minutes	1 to 24 hours	Short TTL keeps tests fast, drain captures late mail and dupes for debugging
Password reset / magic link in CI	2 to 5 minutes	15 to 30 minutes	24 hours	Longer TTL if the app under test has slower async jobs
LLM agent “wait for email” tool	2 to 10 minutes	30 to 60 minutes	1 to 7 days	Agents may be interrupted, drain avoids surprises after resumption
Human-in-the-loop staging ops	15 to 60 minutes	1 to 6 hours	7 to 30 days	More slack, but be strict about retention and access

Two rules keep you honest:

TTL is your workflow SLA, not the email network’s maximum retry horizon.
Drain window is for late arrivals and duplicates, not for extending the attempt.
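
One way to standardize a default policy with per-workflow overrides is a small policy table in code. The sketch below is illustrative only: `LifecyclePolicy`, `DEFAULTS`, and `policy_for` are hypothetical names, and the concrete values are arbitrary picks from inside the ranges in the table above, not recommendations.

```python
from dataclasses import dataclass, replace
from datetime import timedelta


@dataclass(frozen=True)
class LifecyclePolicy:
    ttl: timedelta        # how long the attempt may stay active
    drain: timedelta      # how long late mail is still ingested
    retention: timedelta  # how long message content is kept for debugging


# Assumed defaults, chosen from within the ranges listed in the table.
DEFAULTS = {
    "otp_ci": LifecyclePolicy(
        timedelta(seconds=90), timedelta(minutes=10), timedelta(hours=24)),
    "magic_link_ci": LifecyclePolicy(
        timedelta(minutes=3), timedelta(minutes=20), timedelta(hours=24)),
    "agent_wait": LifecyclePolicy(
        timedelta(minutes=5), timedelta(minutes=45), timedelta(days=3)),
    "staging_ops": LifecyclePolicy(
        timedelta(minutes=30), timedelta(hours=2), timedelta(days=14)),
}


def policy_for(workflow, **overrides):
    """Return the default policy for a workflow, with optional overrides."""
    return replace(DEFAULTS[workflow], **overrides)
```

For example, `policy_for("otp_ci", ttl=timedelta(seconds=120))` lengthens the TTL for one slow suite while leaving the drain window and retention at their defaults.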

The “Inbox Lifecycle Controller” pattern

If you have more than one workflow, centralize lifecycle logic. You want one component that defines how inboxes are created, waited on, drained, and finalized.

A minimal controller typically does four things:

1) Provision with lifecycle metadata

When you create an inbox via an API provider, store lifecycle fields alongside your attempt:

inbox_id
email
created_at
active_until
drain_until
state

If your provider returns expiry metadata (many inbox-first APIs do), persist it, but still compute your own workflow deadlines. Treat provider expiry as a safety net, not your only control.
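
A minimal sketch of provisioning with lifecycle metadata might look like the following. Everything here is an assumption for illustration: `InboxRecord`, `provision`, and the `provider_expires_at` parameter are hypothetical, and the inbox itself is assumed to have been created already through your provider's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class InboxRecord:
    inbox_id: str
    email: str
    attempt_id: str
    created_at: datetime
    active_until: datetime
    drain_until: datetime
    state: str = "active"


def provision(inbox_id: str, email: str, attempt_id: str,
              ttl: timedelta, drain: timedelta,
              now: Optional[datetime] = None,
              provider_expires_at: Optional[datetime] = None) -> InboxRecord:
    """Compute workflow deadlines at creation time and persist them
    alongside the attempt."""
    now = now or datetime.now(timezone.utc)
    active_until = now + ttl
    drain_until = active_until + drain
    if provider_expires_at is not None:
        # Provider expiry is a safety net: never plan to read past it.
        active_until = min(active_until, provider_expires_at)
        drain_until = min(drain_until, provider_expires_at)
    return InboxRecord(inbox_id, email, attempt_id,
                       now, active_until, drain_until)
```

Computing both deadlines once, at creation, means every later component (waiter, webhook handler, cleanup job) reads the same boundaries instead of re-deriving them.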

2) Receive email (webhook-first, polling as a fallback)

For deterministic systems, you typically want:

Webhooks for low latency and event-driven ingestion.
Polling for resilience when webhooks fail, or when running locally.

If you use Mailhook, you can receive emails as structured JSON, via real-time webhooks or a polling API, and you can validate authenticity using signed payloads. Use the canonical integration contract here: llms.txt.

Related reading (if you are implementing the hybrid pattern): Webhook-first, polling fallback.

3) Enforce deadlines in the waiter, not in the test

Your “wait for email” helper should take a deadline and stop. Never sprinkle sleep(10) in tests or agent loops.

A good waiter contract:

Stops at active_until.
Returns a clear timeout result with debug metadata.
Optionally continues ingestion into a drain queue until drain_until.
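
The waiter contract above could be sketched as follows. This is a hedged example rather than a provider integration: `fetch` and `matches` are hypothetical callables you would supply (for instance, a polling call and a subject matcher), messages are assumed to be dicts with an `id` field, and `active_until` is assumed to be expressed on the same monotonic clock as `clock()`.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class WaitResult:
    status: str                      # "matched" or "timeout"
    message: Optional[dict] = None
    polls: int = 0                   # debug metadata: how many polls ran
    seen_ids: list = field(default_factory=list)  # debug: what arrived


def wait_for_email(fetch: Callable[[], Iterable[dict]],
                   matches: Callable[[dict], bool],
                   active_until: float,
                   poll_interval: float = 1.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> WaitResult:
    """Poll until a message matches or the deadline passes, then stop."""
    polls = 0
    seen_ids = []
    while True:
        polls += 1
        for msg in fetch():
            msg_id = msg.get("id")
            if msg_id not in seen_ids:
                seen_ids.append(msg_id)
            if matches(msg):
                return WaitResult("matched", msg, polls, seen_ids)
        remaining = active_until - clock()
        if remaining <= 0:
            # Hard stop at the deadline, with evidence for debugging.
            return WaitResult("timeout", None, polls, seen_ids)
        # Never sleep past the deadline.
        sleep(min(poll_interval, remaining))
```

Injecting `clock` and `sleep` keeps the helper testable without real waiting, and the timeout result carries enough metadata to tell "nothing arrived" apart from "something arrived but did not match."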

4) Finalize and cleanup deterministically

Finalization is a state transition plus optional retention decisions:

Store the minimal artifact you need (OTP, verification URL, message_id) for audit.
Drop or redact the rest.
Transition the inbox state to Draining, then to Closed.

Cleanup strategies that do not fall apart at scale

Cleanup is easy to describe and easy to get subtly wrong.

Strategy A: Time-based garbage collection (GC)

Run a periodic job that:

Finds inbox records with drain_until < now and state != Closed.
Marks them Closed.
Deletes messages older than your retention policy.

This is the simplest approach and works well as long as the time columns are indexed, so the periodic scan stays cheap.
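
As a rough sketch of such a job, here is a GC pass against SQLite. The table layout, column names, and the `run_gc` function are assumptions made for the example (timestamps are stored as epoch seconds); adapt them to your own store.

```python
import sqlite3

# Hypothetical minimal schema, just enough to demonstrate the GC pass.
SCHEMA = """
CREATE TABLE inboxes (
    inbox_id    TEXT PRIMARY KEY,
    state       TEXT NOT NULL,
    drain_until REAL NOT NULL,
    closed_at   REAL
);
CREATE TABLE messages (
    message_id  TEXT PRIMARY KEY,
    inbox_id    TEXT NOT NULL,
    received_at REAL NOT NULL
);
CREATE INDEX idx_inboxes_drain_until ON inboxes (drain_until);
CREATE INDEX idx_messages_received_at ON messages (received_at);
"""


def run_gc(conn, now, retention_seconds):
    """Close inboxes past drain_until and delete messages past retention.

    Returns (inboxes_closed, messages_deleted). Safe to run repeatedly.
    """
    with conn:  # one transaction: commit on success, roll back on error
        closed = conn.execute(
            "UPDATE inboxes SET state = 'closed', closed_at = ? "
            "WHERE drain_until < ? AND state != 'closed'",
            (now, now),
        ).rowcount
        deleted = conn.execute(
            "DELETE FROM messages WHERE received_at < ?",
            (now - retention_seconds,),
        ).rowcount
    return closed, deleted
```

Note that closing an inbox and deleting its messages are driven by different clocks here (`drain_until` versus retention), which matches the idea that TTL and retention are configured independently.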

Strategy B: Event-driven cleanup with delayed jobs

When you create an inbox record, enqueue two delayed jobs:

One at active_until (transition to Draining).
One at drain_until (transition to Closed, trigger deletion).

This reduces scanning, but you must handle job retries and idempotency.
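
A job handler that tolerates retries, duplicates, and early firing might be sketched like this. The record shape and the `apply_transition` name are hypothetical; the queue itself is out of scope and assumed to redeliver jobs that report `"too_early"`.

```python
# Ordering of states; a job may only move a record forward.
ORDER = {"active": 0, "draining": 1, "closed": 2}


def apply_transition(record, target, now):
    """Apply a delayed-job transition idempotently.

    Returns "applied", "noop" (duplicate or stale job), or "too_early"
    (the job fired before its deadline and should be retried later).
    """
    if ORDER[target] <= ORDER[record["state"]]:
        return "noop"
    deadline = (record["active_until"] if target == "draining"
                else record["drain_until"])
    if now < deadline:
        return "too_early"
    # A "closed" job may arrive while the record is still "active" if the
    # "draining" job was lost; moving straight to closed is still forward.
    record["state"] = target
    if target == "closed":
        record["closed_at"] = now
    return "applied"
```

Because the handler compares against the stored state before acting, running the same job twice, or running the jobs out of order, cannot move an inbox backwards.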

Strategy C: Opportunistic cleanup on access

Whenever a client requests inbox status or polls messages:

If now > active_until, refuse to advance the workflow.
If now > drain_until, return a tombstone response.

This helps when background jobs lag, but it should complement, not replace, GC.
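
On the read path, the same checks can be applied lazily. The sketch below assumes a dict-shaped record with epoch-second deadlines; `read_inbox` and the response fields (`can_advance`, `tombstone`) are hypothetical names chosen for the example.

```python
def read_inbox(record, messages, now):
    """Serve a status/poll request, enforcing deadlines opportunistically."""
    if record["state"] == "closed" or now > record["drain_until"]:
        # Past the drain boundary: lazily close and answer with a tombstone.
        record["state"] = "closed"
        if record.get("closed_at") is None:
            record["closed_at"] = now
        return {"state": "closed", "tombstone": True,
                "can_advance": False, "messages": []}
    if record["state"] == "draining" or now > record["active_until"]:
        # Past the TTL: late mail is visible for debugging only.
        record["state"] = "draining"
        return {"state": "draining", "tombstone": False,
                "can_advance": False, "messages": list(messages)}
    return {"state": "active", "tombstone": False,
            "can_advance": True, "messages": list(messages)}
```

Callers gate every workflow-mutating action on `can_advance`, so a lagging background job never turns into a re-opened attempt.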

Drain windows in practice: what you actually do during draining

“Draining” should be boring. You want to capture evidence and prevent mutation.

During drain:

Ingest webhooks and poll results normally, but route messages to a “late mail” path.
Deduplicate aggressively (delivery-level and message-level), because this is where duplicates show up.
Do not re-run business actions (do not click links, do not submit OTPs again).
Attach late messages as debugging artifacts to CI runs or incident tickets.

If you need to compute “did we already complete this attempt?”, your invariant is:

The attempt completion should be idempotent, and keyed by an artifact-level identifier (for example, an OTP hash or a verification link token), not by “first email seen.”
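
One possible shape for that invariant, using a hashed artifact as the idempotency key. The in-memory `CompletionStore` and its return values are assumptions for illustration; in a real system the map would live in your database behind a unique constraint.

```python
import hashlib


def artifact_key(attempt_id, artifact):
    """Hash the artifact (OTP, link token) so the raw value is not stored."""
    return hashlib.sha256(f"{attempt_id}:{artifact}".encode()).hexdigest()


class CompletionStore:
    """In-memory stand-in for a table keyed by attempt_id."""

    def __init__(self):
        self._done = {}  # attempt_id -> artifact key that completed it

    def complete(self, attempt_id, artifact):
        key = artifact_key(attempt_id, artifact)
        existing = self._done.get(attempt_id)
        if existing is None:
            self._done[attempt_id] = key
            return "completed"
        if existing == key:
            # Same artifact seen again (duplicate delivery): safe no-op.
            return "duplicate"
        # A different artifact for an already-completed attempt (for
        # example a late resend): keep it as evidence, do not mutate state.
        return "rejected"
```

With this shape, a duplicate delivery of the same OTP is a no-op, and a late email carrying a different OTP cannot overwrite the result of an attempt that already finished.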

For more on dedupe and stable records, see: Dedup, normalize, store.

Security and agent-safety constraints (lifecycle edition)

Lifecycle controls are also security controls.

Do not let an agent read an inbox forever

If an agent can keep checking an inbox indefinitely, it can:

Trigger resend loops,
Consume unintended messages,
Or be prompt-injected by late, hostile content.

Deadlines and drain windows are the mechanical fix.

Verify webhook authenticity before you store anything

If you accept webhooks, verify signatures on the raw request body and fail closed. Mailhook supports signed payloads, which you should validate before persisting or dispatching downstream.
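
The exact signing scheme is defined by the provider, so check its documentation before copying anything. Purely as an illustration, the sketch below assumes an HMAC-SHA256 hex digest of the raw body computed with a shared secret; the header handling, `verify_signature`, and `handle_webhook` are hypothetical and not Mailhook's documented API.

```python
import hashlib
import hmac
import json


def verify_signature(raw_body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Recompute the HMAC over the raw bytes and compare in constant time.

    ASSUMPTION: the provider signs with HMAC-SHA256 and sends a hex digest.
    """
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def handle_webhook(raw_body: bytes, signature_hex, secret: bytes, store) -> int:
    """Fail closed: nothing is parsed or stored unless the signature checks out."""
    if not signature_hex or not verify_signature(raw_body, signature_hex, secret):
        return 401
    store(json.loads(raw_body))  # parse only after verification
    return 200
```

Verifying against the raw bytes matters: re-serializing parsed JSON can change whitespace or key order and invalidate an otherwise correct signature.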

Background: Verify webhook payload authenticity.

Keep tombstones small, but keep them long enough

A tombstone record should include:

inbox_id
state = Closed
closed_at
Optional reason codes (timeout, success, cancelled)
Optional dedupe keys you still need to reject replay

Do not keep full message bodies in tombstones.
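
A tombstone built from the fields above could look like this. The function names and the dict-based record are hypothetical; the point is that the tombstone is constructed from an allowlist of fields rather than by copying the full record.

```python
ALLOWED_REASONS = {"timeout", "success", "cancelled"}


def to_tombstone(record, closed_at, reason, dedupe_keys=()):
    """Reduce a full inbox record to the minimal 'do not process' marker."""
    if reason not in ALLOWED_REASONS:
        raise ValueError(f"unknown reason code: {reason}")
    # Build from an allowlist so message bodies can never leak in.
    return {
        "inbox_id": record["inbox_id"],
        "state": "closed",
        "closed_at": closed_at,
        "reason": reason,
        "dedupe_keys": sorted(dedupe_keys),
    }


def should_process(event, tombstones):
    """Deterministic answer for late events: tombstoned inboxes are skipped."""
    return event["inbox_id"] not in tombstones
```

Here `tombstones` is assumed to be a mapping from `inbox_id` to tombstone, so the late-event check is a single key lookup.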

A concrete data model for lifecycle management

You can implement the controller with a single table (or collection) and a few well-chosen indexes.

Field	Type	Purpose
`inbox_id`	string	Provider handle for retrieval and correlation
`email`	string	The routable address you used
`attempt_id`	string	Your workflow attempt identifier
`state`	enum	Active, Draining, Closed
`active_until`	timestamp	TTL boundary
`drain_until`	timestamp	Drain boundary
`created_at`	timestamp	Observability and debugging
`closed_at`	timestamp nullable	Tombstone time

Index active_until, drain_until, and attempt_id. Most cleanup becomes “find by time, update state, delete messages.”
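
As one concrete rendering of this model, here is the table expressed as a SQLite schema. The table name, index names, and the choice to store timestamps as ISO 8601 UTC strings (which sort correctly as text when the format is uniform) are assumptions made for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE inbox_lifecycle (
    inbox_id     TEXT PRIMARY KEY,
    email        TEXT NOT NULL,
    attempt_id   TEXT NOT NULL,
    state        TEXT NOT NULL
                 CHECK (state IN ('active', 'draining', 'closed')),
    active_until TEXT NOT NULL,  -- TTL boundary
    drain_until  TEXT NOT NULL,  -- drain boundary
    created_at   TEXT NOT NULL,
    closed_at    TEXT            -- NULL until the inbox is tombstoned
);
CREATE INDEX idx_lifecycle_active_until ON inbox_lifecycle (active_until);
CREATE INDEX idx_lifecycle_drain_until ON inbox_lifecycle (drain_until);
CREATE INDEX idx_lifecycle_attempt_id ON inbox_lifecycle (attempt_id);
"""


def open_store(path=":memory:"):
    """Open (or create) the lifecycle store and apply the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The `CHECK` constraint keeps unknown states out of the table at write time, so cleanup queries only ever have to reason about the three states in the state machine.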

How Mailhook fits into this pattern

Mailhook gives you the inbox primitives you need to implement lifecycle controls cleanly:

Create disposable inboxes via API.
Receive inbound email as structured JSON.
Get real-time webhook notifications (with signed payloads).
Use polling as a fallback (and for drain/backfill paths).
Batch email processing for higher-throughput pipelines.

Your controller still owns workflow deadlines (TTL) and what you retain, but the ingestion and JSON normalization become straightforward.

If you are wiring this into an agent tool, start with the canonical contract: Mailhook llms.txt.

Frequently Asked Questions

What is the difference between an inbox TTL and message retention? Inbox TTL is the workflow deadline for “this attempt is allowed to complete.” Message retention is how long you keep email content for debugging, audit, or replay. They should be configured independently.

How long should a drain window be? Long enough to capture late deliveries and duplicates (often 5 to 30 minutes for CI-style verification flows), but short enough that you do not accidentally extend the workflow.

Should my agent keep polling during the drain window? Usually no. Let your ingestion layer keep collecting late arrivals, but prevent the agent from taking actions after active_until.

Do I need to delete inboxes, or can I just let them expire? Letting them expire is fine if you also enforce deadlines in your own code and have a cleanup policy for stored messages and metadata. A tombstone record is still helpful for deterministic “closed” behavior.

How do I prevent late emails from affecting retries in CI? Use inbox-per-attempt, enforce active_until in the waiter, and make completion idempotent at the artifact level. Late messages should be stored for debug during draining, not used to mutate state.

Build a deterministic inbox lifecycle with Mailhook

If you are building agent tools, QA automation, or verification harnesses, treat inboxes as short-lived resources with explicit TTLs, drain windows, and cleanup. Mailhook provides programmable disposable inboxes, JSON email output, signed webhooks, and polling fallback so you can implement this pattern without running mail servers.

Get the exact API semantics here: Mailhook llms.txt, then explore the platform at Mailhook.