Temporary inboxes make email automation deterministic, but only when their lifetime is explicit. If an inbox lives too long, stale messages can be selected by the wrong test run or agent. If it expires too aggressively, legitimate verification emails arrive after the workflow has already failed. Good inbox TTLs and cleanup rules sit between those two failure modes.
For CI, QA automation, signup verification, and LLM agents, the goal is not simply to “delete old inboxes.” The goal is to define a lifecycle contract: when an inbox can receive mail, how long code should wait, what happens to late messages, and what data remains for debugging after cleanup.
Mailhook provides the inbox layer for this pattern: programmable disposable inboxes via API, structured JSON email output, REST access, webhooks, polling, shared domains, custom domain support, signed payloads, and batch-oriented workflows. For exact integration semantics, keep the Mailhook llms.txt reference close to your implementation.

TTL is not one clock, it is several clocks
A common mistake is treating “TTL” as a single number. In email automation, several time windows matter, and they should not all be equal.
| Clock | What it controls | Typical owner | Why it matters |
|---|---|---|---|
| Wait deadline | How long a test, job, or agent waits for a matching message | Test harness or agent tool | Prevents fixed sleeps and endless loops |
| Active inbox TTL | How long the temporary inbox is considered valid for the workflow | Inbox controller | Reduces stale messages and orphaned resources |
| Drain window | How long late or duplicate deliveries are tolerated after success or failure | Ingestion pipeline | Handles SMTP delay, webhook retries, and resend behavior |
| Raw message retention | How long full email content is kept for debugging | Data policy | Limits privacy and compliance risk |
| Tombstone retention | How long minimal metadata is kept after cleanup | Dedupe and audit layer | Prevents reprocessing and preserves traceability |
The wait deadline should usually be the shortest. A CI test should not wait 30 minutes just because the inbox TTL is 30 minutes. The active inbox TTL is a maximum lease, not the normal wait budget.
For example, a signup verification test might wait 90 seconds for a message, keep the inbox active for 10 minutes, tolerate a short drain window for late duplicates, retain normalized JSON for the CI artifact window, and keep only a tombstone after raw content is purged.
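The separate clocks can be written down as one policy object and sanity-checked before any inbox is created. The sketch below is illustrative only: the field names, `validateClockPolicy`, and the drain, retention, and tombstone values are assumptions, while the 90-second wait and 10-minute active TTL come from the example above.

```javascript
// Hypothetical policy shape; every field name is an assumption for illustration.
const signupTestPolicy = {
  waitSeconds: 90,            // wait deadline for the consumer
  activeSeconds: 600,         // active inbox TTL (maximum lease)
  drainSeconds: 120,          // tolerance for late or duplicate deliveries (assumed value)
  rawRetentionSeconds: 86400, // full message content kept for debugging (assumed value)
  tombstoneSeconds: 604800,   // minimal metadata kept for dedupe and audit (assumed value)
};

// Returns a list of problems; an empty list means the clocks are ordered sensibly.
function validateClockPolicy(policy) {
  const problems = [];
  for (const [name, value] of Object.entries(policy)) {
    if (!Number.isFinite(value) || value <= 0) {
      problems.push(`${name} must be a positive number of seconds`);
    }
  }
  // The wait deadline is a budget inside the lease, never longer than it.
  if (policy.waitSeconds > policy.activeSeconds) {
    problems.push("waitSeconds should not exceed activeSeconds");
  }
  // Tombstones are meant to outlive the raw content they describe.
  if (policy.rawRetentionSeconds > policy.tombstoneSeconds) {
    problems.push("rawRetentionSeconds should not exceed tombstoneSeconds");
  }
  return problems;
}
```

Running a check like this at startup means a misconfigured policy fails fast instead of surfacing later as a flaky test.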
Start with the workflow, not the provider default
Inbox TTLs should come from the business or automation workflow. A password reset email, a one-time passcode, a vendor integration confirmation, and an LLM agent task all have different tolerance for delay.
Use these values as starting points, then tune them with delivery telemetry from your own system.
| Workflow | Wait deadline | Active inbox TTL | Cleanup trigger | Retention guidance |
|---|---|---|---|---|
| CI signup or OTP test | 60 to 180 seconds | 10 to 30 minutes | Success, terminal failure, or job end | Keep minimal JSON and logs long enough to debug failed CI runs |
| Local QA or manual staging test | 2 to 10 minutes | 30 to 60 minutes | End of manual session | Retain only what the team needs for debugging |
| Third-party SaaS integration test | 2 to 15 minutes | 1 to 24 hours | Integration attempt completed or abandoned | Keep structured records for incident review, purge raw content sooner when possible |
| LLM agent verification task | 60 to 300 seconds | Task budget plus small buffer | Successful artifact extraction or agent budget exhaustion | Expose minimal fields to the model, retain audit metadata outside the prompt |
| High-volume batch verification | Per item deadline based on SLA | Batch duration plus drain buffer | Batch completed and queue drained | Store dedupe keys and summary outcomes, minimize message bodies |
The important distinction is that cleanup should be tied to state, not only to time. If a verification code has been consumed successfully, there is usually no reason to keep the inbox active until the maximum TTL. Close or retire it early, then keep only the records you need for idempotency and debugging.
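One way to express "state first, time as a fallback" is a small decision function that a cleanup worker calls for each lease. This is a sketch under assumptions: `cleanupReason` is a hypothetical helper, and `state` and `active_until` (stored here as epoch milliseconds) are assumed to live on your own lease record.

```javascript
// States that end a workflow attempt and make early cleanup safe.
const TERMINAL_STATES = new Set(["consumed", "failed"]);

// Decide whether a lease should be retired now, and why.
// Returns a reason string, or null if the inbox is still in use.
function cleanupReason(lease, nowMs) {
  if (TERMINAL_STATES.has(lease.state)) {
    return `state:${lease.state}`; // retire early; do not wait for the TTL
  }
  if (nowMs >= lease.active_until) {
    return "ttl_expired"; // time-based fallback for stuck or orphaned leases
  }
  return null;
}
```

Returning the reason rather than a boolean keeps the audit trail honest: you can tell later whether an inbox was closed because the workflow finished or because the maximum lease ran out.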
A practical inbox lease model
The cleanest implementation is to treat every temporary inbox as a lease. The lease is created for a specific workflow attempt and contains enough metadata to enforce cleanup later.
A useful lease record includes:
| Field | Purpose |
|---|---|
| `inbox_id` | Stable handle for API calls, webhooks, polling, and logs |
| `email` | Address passed to the application or third-party service |
| `owner_type` | CI job, QA session, agent run, batch job, or client operation |
| `owner_id` | Run ID, attempt ID, tenant ID, or agent task ID |
| `state` | Created, active, consumed, failed, draining, closed, or purged |
| `wait_until` | Deadline for the consumer waiting for a message |
| `active_until` | Maximum valid lifetime of the inbox for this attempt |
| `cleanup_after` | When the cleanup worker should close or retire the inbox |
| `raw_retention_until` | When raw bodies and full content should be removed |
| `tombstone_until` | When minimal metadata can be removed |
This is the same reason inbox-first APIs are more reliable than bare disposable email addresses. A bare address gives you a string. A lease gives you operational semantics.
When using a temporary email API such as Mailhook, your automation should store the inbox descriptor returned by the API, not just the email address. That descriptor is what lets you route webhooks by inbox_id, poll the right inbox, dedupe correctly, and clean up the right resource.
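As a rough illustration, the lease record can be derived from the inbox descriptor plus a policy at creation time. Everything here is an assumption made for the sketch: the `createLeaseRecord` helper, the policy field names, the use of epoch milliseconds, and the idea that the descriptor exposes `inbox_id` and `email`. Check your provider's documentation for the real response shape.

```javascript
// Build a lease record from an inbox descriptor, an owner, and a clock policy.
// All timestamps are epoch milliseconds so they can be compared directly.
function createLeaseRecord(descriptor, owner, policy, nowMs = Date.now()) {
  const at = (seconds) => nowMs + seconds * 1000;
  return {
    inbox_id: descriptor.inbox_id,
    email: descriptor.email,
    owner_type: owner.type,
    owner_id: owner.id,
    state: "active",
    wait_until: at(policy.waitSeconds),
    active_until: at(policy.activeSeconds),
    // The cleanup worker acts after the lease plus the drain window.
    cleanup_after: at(policy.activeSeconds + policy.drainSeconds),
    raw_retention_until: at(policy.rawRetentionSeconds),
    tombstone_until: at(policy.tombstoneSeconds),
  };
}

// Example usage with made-up identifiers and values.
const exampleLease = createLeaseRecord(
  { inbox_id: "inb_123", email: "run-1@example.test" },
  { type: "ci_test", id: "run_1" },
  { waitSeconds: 120, activeSeconds: 900, drainSeconds: 60, rawRetentionSeconds: 3600, tombstoneSeconds: 86400 }
);
console.log(exampleLease.state, exampleLease.inbox_id);
```

Computing every deadline once, at creation, means later components only compare timestamps and never need to know which policy produced them.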
Cleanup rules that prevent flakes
A good cleanup policy is boring, deterministic, and easy to audit. It should not depend on whether a developer remembered to call cleanup in one specific test file.
Use these rules as a baseline:
- Close on success: Once the expected OTP, magic link, or verification artifact has been consumed, mark the inbox as consumed and stop waiting for more messages.
- Close on terminal failure: If the workflow times out, the CI job fails, or the agent task exhausts its budget, mark the inbox failed and schedule cleanup.
- Sweep orphaned inboxes: A background worker should close active inboxes whose `active_until` has passed, even if the original worker crashed.
- Keep tombstones before purging everything: Retain minimal metadata such as `inbox_id`, `owner_id`, message IDs, artifact hashes, state, and timestamps for a short dedupe and audit window.
- Purge raw content separately: Full message bodies, HTML, attachments, and headers may deserve a shorter retention window than metadata.
- Never re-open by default: If a late message arrives after cleanup, record it as late or ignored rather than reviving a finished workflow.
That last rule is important for LLM agents. Agents can retry, ask for resends, or call tools repeatedly. If your inbox layer silently reopens closed inboxes, the agent can drift into a loop where old verification emails affect new actions.
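Two of these rules, the orphan sweep and "never re-open by default", are easy to sketch against an in-memory store. The `Map` of leases, the helper names, and the `closed_reason` and `late_messages` fields are all hypothetical; a real controller would back this with a database.

```javascript
// Close leases whose active_until has passed, even if the original worker
// crashed. Returns the swept inbox ids so the caller can log or count them.
function sweepOrphanedLeases(leases, nowMs) {
  const swept = [];
  for (const lease of leases.values()) {
    if (lease.state === "active" && nowMs >= lease.active_until) {
      lease.state = "closed";
      lease.closed_reason = "orphan_sweep";
      swept.push(lease.inbox_id);
    }
  }
  return swept;
}

// Record a message that arrives for a finished lease without reviving it.
function recordLateMessage(lease, messageId, nowMs) {
  if (lease.state === "active") {
    throw new Error("lease is still active; route this message to the consumer");
  }
  lease.late_messages = lease.late_messages ?? [];
  lease.late_messages.push({ message_id: messageId, seen_at: nowMs });
  return lease.state; // unchanged: the lease is never re-opened here
}
```

The count of ids returned by the sweeper is a useful health signal: if it is regularly non-zero, normal workflow completion is not cleaning up after itself.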
Webhook and polling behavior after expiry
Temporary inbox cleanup must account for delivery mechanics. Email and webhook systems are typically at-least-once. That means duplicate deliveries and late events are normal, not exceptional.
If you use real-time webhooks, verify the signed payload before doing anything else. Then check whether the inbox is still active, consumed, draining, or closed. A closed inbox should not trigger a new verification attempt. It can be acknowledged and recorded as a late delivery, which avoids unnecessary webhook retries.
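The order of checks described above (signature, then dedupe, then lifecycle state) might look like the sketch below. It assumes an HMAC-SHA256 signature over the raw request body, delivered as a hex string. That scheme is an assumption made for illustration, not Mailhook's documented format, so confirm the real header and algorithm in the provider's reference. `classifyDelivery` and its parameters are hypothetical names.

```javascript
const crypto = require("node:crypto");

// Assumed scheme: hex-encoded HMAC-SHA256 of the raw body with a shared secret.
function isSignatureValid(rawBody, signatureHex, secret) {
  const expected = crypto.createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length &&
    crypto.timingSafeEqual(received, expected);
}

// Decide what to do with one webhook delivery.
// `seenDeliveryIds` is a Set used for dedupe; `lease` is your own lease record.
function classifyDelivery({ rawBody, signatureHex, secret, lease, deliveryId, seenDeliveryIds }) {
  if (!isSignatureValid(rawBody, signatureHex, secret)) return "reject";
  if (seenDeliveryIds.has(deliveryId)) return "duplicate";
  seenDeliveryIds.add(deliveryId);
  // Anything that is not active is acknowledged and recorded, never re-opened.
  if (!lease || lease.state !== "active") return "late";
  return "process";
}
```

In this sketch the handler would answer "duplicate" and "late" with a success status so the sender stops retrying, and reserve an error status for "reject".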
If you use polling as a fallback, the poller should stop at wait_until, not at active_until. The active TTL is a lifecycle boundary for the inbox. The wait deadline is the workflow boundary for the consumer.
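A poller can enforce that boundary by asking, before every sleep, how long it may wait without overshooting `wait_until`. The helper below is a sketch: `nextPollDelayMs` is a hypothetical name, and the exponential backoff with a 500 ms base and 5 s cap is an illustrative choice rather than a recommendation from any provider.

```javascript
// Returns the delay before the next poll in milliseconds,
// or null once wait_until has passed and polling should stop.
function nextPollDelayMs(attempt, nowMs, waitUntilMs, baseMs = 500, capMs = 5000) {
  const remaining = waitUntilMs - nowMs;
  if (remaining <= 0) return null; // workflow deadline reached
  const backoff = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.min(backoff, remaining); // never sleep past the deadline
}
```

Because the function takes the current time as an argument, it can be unit tested without real timers, and `active_until` never appears in the polling path at all.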
A robust consumer usually follows this shape:
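```javascript
// Helper names (createTemporaryInboxLease, waitForMatchingMessage, and so on)
// are placeholders for your own inbox controller, not provider API calls.
async function runEmailVerificationAttempt(ctx) {
  // Lease one inbox per attempt: 120 s wait budget, 900 s maximum lifetime.
  const lease = await createTemporaryInboxLease({
    owner_type: "ci_test",
    owner_id: ctx.runId,
    wait_seconds: 120,
    active_seconds: 900,
  });

  try {
    await triggerVerificationEmail(lease.email);

    // Wait only until the workflow deadline, not until the inbox expires.
    const message = await waitForMatchingMessage({
      inbox_id: lease.inbox_id,
      until: lease.wait_until,
      matcher: { purpose: "signup_verification", run_id: ctx.runId },
    });

    const artifact = extractVerificationArtifact(message);
    await consumeArtifactOnce({ artifact, inbox_id: lease.inbox_id });
    await markInboxConsumed(lease.inbox_id);
    return artifact;
  } finally {
    // Schedule cleanup on success and on failure alike.
    await scheduleCleanup(lease.inbox_id);
  }
}
```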
The exact API calls depend on your provider and integration design. With Mailhook, check the machine-readable integration details in llms.txt, then combine API-created disposable inboxes with structured JSON email output, signed webhooks, or polling fallback.
Cleanup for structured JSON emails
A major advantage of receiving structured JSON emails is that cleanup can be more precise. You do not need to keep raw MIME or rendered HTML forever just to know whether a test passed.
A practical storage model separates data into three layers:
| Layer | Example content | Cleanup rule |
|---|---|---|
| Raw source | Original message, full headers, HTML, attachments | Keep briefly for debugging, then purge aggressively |
| Normalized JSON | Parsed sender, recipient, subject, text body, timestamps, provider IDs | Keep for CI artifacts or incident review according to policy |
| Derived artifact | OTP, magic link host, artifact hash, consumption state | Keep long enough for idempotency and audit, avoid storing secrets longer than needed |
For automated tests, you often need the derived artifact only until the assertion completes. For compliance-sensitive environments, store a hash or redacted representation rather than the OTP or full link.
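A redacted record of that kind could be produced as shown below. This is a sketch under assumptions: the `artifact` shape (`inbox_id`, `message_id`, `type`, `value`), the `toArtifactRecord` helper, and the `hashKey` parameter are all hypothetical. It uses a keyed HMAC rather than a bare hash because a 6-digit OTP has only one million possible values, so an unkeyed SHA-256 could be reversed by brute force.

```javascript
const crypto = require("node:crypto");

// Reduce a derived artifact to a record that is safer to retain.
// The secret value itself is never copied into the record.
function toArtifactRecord(artifact, hashKey) {
  const record = {
    inbox_id: artifact.inbox_id,
    message_id: artifact.message_id,
    artifact_type: artifact.type,
    artifact_hash: crypto
      .createHmac("sha256", hashKey)
      .update(artifact.value)
      .digest("hex"),
  };
  if (artifact.type === "magic_link") {
    // Keep only the host for debugging, never the full link with its token.
    record.link_host = new URL(artifact.value).host;
  }
  return record;
}
```

The hash is still enough for idempotency: two deliveries carrying the same OTP or link produce the same `artifact_hash` under the same key.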
For LLM agents, the minimized view matters even more. The model rarely needs the full email. It usually needs a typed result such as:
```json
{
  "inbox_id": "inb_123",
  "message_id": "msg_456",
  "artifact_type": "otp",
  "otp": "123456",
  "received_at": "2026-05-16T21:10:00Z"
}
```
Do not expose raw HTML, tracking links, untrusted instructions, or unrelated email content to the agent unless there is a strong reason. Treat inbound email as untrusted input.
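One simple way to enforce that is an allowlist projection applied before anything reaches the model. The sketch below assumes a hypothetical `toAgentView` helper and a default field list taken from the typed result above; fields not on the list are dropped rather than redacted.

```javascript
// Fields the model may see; everything else stays outside the prompt.
const DEFAULT_VISIBLE_FIELDS = ["inbox_id", "message_id", "artifact_type", "otp", "received_at"];

// Copy only allowlisted fields from a stored message record.
function toAgentView(record, visibleFields = DEFAULT_VISIBLE_FIELDS) {
  const view = {};
  for (const field of visibleFields) {
    if (Object.hasOwn(record, field)) {
      view[field] = record[field];
    }
  }
  return view;
}
```

An allowlist fails closed: if the stored record later gains a new field such as raw HTML, the agent does not see it until someone deliberately adds it to the list.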
TTL rules for LLM agents
LLM agents add a lifecycle problem that normal CI does not have: autonomy. A test runner fails when the deadline passes. An agent may try alternative actions, request another code, or repeat a tool call unless the tool contract prevents it.
Your agent-facing tool should encode limits directly:
- `max_wait_seconds` caps each wait operation.
- `max_ttl_seconds` caps how long a created inbox can remain active.
- `max_resend_attempts` prevents bot loops.
- `close_after_success` makes cleanup automatic after artifact extraction.
- `allowed_link_hosts` prevents arbitrary URL following.
- `visible_fields` restricts what email data the model can see.
These limits should be enforced by code, not described only in the system prompt. The prompt can tell the agent what is allowed, but the API should make unsafe behavior impossible or at least bounded.
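Enforcement in code can be as small as a few guard functions inside the tool layer. The sketch below covers three of the limits; the helper names and the specific values (other than the 300-second wait ceiling from the workflow table) are assumptions for illustration.

```javascript
// Hypothetical limits object for one agent-facing email tool.
const AGENT_LIMITS = {
  max_wait_seconds: 300,
  max_resend_attempts: 2,                   // assumed value
  allowed_link_hosts: ["app.example.test"], // assumed value
};

// Clamp whatever wait the agent asks for; invalid input gets the maximum.
function clampWaitSeconds(requested, limits = AGENT_LIMITS) {
  if (!Number.isFinite(requested) || requested <= 0) return limits.max_wait_seconds;
  return Math.min(requested, limits.max_wait_seconds);
}

// True while the agent still has resend budget left.
function canResend(resendsSoFar, limits = AGENT_LIMITS) {
  return resendsSoFar < limits.max_resend_attempts;
}

// Only HTTPS links on an allowlisted host may be followed.
function isLinkAllowed(link, limits = AGENT_LIMITS) {
  let url;
  try {
    url = new URL(link);
  } catch {
    return false; // not a parseable absolute URL
  }
  return url.protocol === "https:" && limits.allowed_link_hosts.includes(url.hostname);
}
```

Because the tool clamps and refuses on its own, a prompt that drifts or is manipulated by email content still cannot wait longer, resend more often, or follow links outside these bounds.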
Observability: how to know your TTLs are right
TTL tuning should be based on delivery data. If your inboxes routinely receive valid messages after the wait deadline, your waits are too short or the upstream email sender is slow. If inboxes remain active long after success, your cleanup is too lazy.
Track these metrics:
| Metric | What it tells you |
|---|---|
| `email_wait_duration_seconds` | How long successful workflows wait before the matching message arrives |
| `email_wait_timeout_total` | How often no matching message arrives before the deadline |
| `inbox_orphaned_total` | How often cleanup depends on the sweeper rather than normal workflow completion |
| `late_message_total` | How many messages arrive after consumption, failure, or closure |
| `duplicate_delivery_total` | Whether webhook retries or polling overlap are causing repeated processing |
| `raw_message_purge_lag_seconds` | Whether sensitive content is being retained longer than policy allows |
Also log stable identifiers, not entire emails. Useful debug fields include inbox_id, owner_id, message_id, delivery_id, state transitions, timestamps, and artifact hashes. Avoid logging full OTPs, magic links, raw HTML, or personal data unless your debugging process truly requires them.
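As one possible way to turn wait-duration samples into a tuned deadline, you can take a high percentile of successful waits and add headroom. The sketch below uses a nearest-rank percentile; the helper names, the 99th percentile, and the 1.5x headroom are illustrative assumptions, not recommendations from the metrics themselves.

```javascript
// Nearest-rank percentile over observed wait durations, in seconds.
function percentile(values, p) {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Suggest a wait deadline: a high percentile of real waits plus headroom.
function suggestWaitDeadlineSeconds(durations, { p = 99, headroom = 1.5 } = {}) {
  return Math.ceil(percentile(durations, p) * headroom);
}
```

A suggestion like this is only as good as the sample: durations from runs that timed out are missing by definition, so compare the result against the timeout and late-message counters before lowering a deadline.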
Common TTL and cleanup mistakes
| Mistake | Better rule |
|---|---|
| Reusing one inbox across retries | Create one disposable inbox per attempt or per clearly bounded workflow |
| Waiting until the inbox TTL expires | Use a shorter wait deadline and fail deterministically |
| Deleting everything immediately | Keep tombstone metadata long enough for dedupe and debugging |
| Keeping raw email forever | Retain raw content briefly, then keep redacted JSON or metadata |
| Letting agents request unlimited resends | Enforce resend budgets and close inboxes after success |
| Processing webhooks without checking state | Verify signature, dedupe, then check lifecycle state before processing |
The best cleanup policy is one your team can reason about during an incident. If a CI run failed, you should be able to answer: which inbox was used, when it was created, what message arrived, whether the artifact was consumed, and when the inbox was cleaned up.
How Mailhook fits into the pattern
Mailhook is designed for developers and agent builders who need email as an automation primitive rather than a human mailbox. You can create disposable inboxes via API, receive emails as structured JSON, consume messages through real-time webhooks or polling, use shared domains for quick starts, and bring custom domains when your workflows require more control.
For lifecycle design, place Mailhook inside a small inbox controller in your own system. That controller owns your workflow-specific rules: wait deadlines, active TTLs, cleanup triggers, retention windows, and agent limits. Mailhook supplies the programmable inbox and machine-readable email layer, while your controller applies the policy that matches your CI, QA, or LLM agent workflow.
If you are designing the integration for an agent, use the Mailhook llms.txt reference as the canonical machine-readable guide, and expose only narrow tools such as create_inbox, wait_for_message, extract_verification_artifact, and close_or_cleanup_inbox.
Frequently Asked Questions
What is a good default TTL for temporary inboxes?
For CI verification flows, a practical starting point is a short wait deadline of 60 to 180 seconds and an active inbox TTL of 10 to 30 minutes. Tune both values with real delivery data.
Should cleanup happen immediately after receiving the email?
Usually cleanup should happen after the required artifact has been consumed, not merely after any email arrives. This prevents closing the inbox because of an unrelated or duplicate message.
How is inbox TTL different from email retention?
Inbox TTL controls whether the inbox remains valid for the workflow. Retention controls how long raw messages, structured JSON, derived artifacts, and metadata are stored after the workflow ends.
What should happen to emails that arrive after the inbox is closed?
Treat them as late arrivals. Verify and dedupe the event, record minimal metadata if useful, but do not re-open the workflow by default.
Do LLM agents need different cleanup rules?
Yes. Agents need stricter limits because they can retry or request new messages autonomously. Enforce max wait times, max TTLs, resend budgets, link constraints, and minimal message exposure in code.
Build disposable inbox lifecycles that stay deterministic
Email automation becomes reliable when inboxes are created, consumed, and cleaned up under an explicit policy. TTLs define the lease, cleanup rules prevent stale state, and structured JSON keeps tests and agents away from brittle mailbox scraping.
If you are building CI verification, QA automation, or LLM-agent email workflows, try Mailhook to create programmable temp inboxes via API and receive emails as JSON through webhooks or polling. Start with the Mailhook llms.txt integration reference, then encode your TTL, cleanup, and retention rules in a small controller your whole automation stack can trust.