Inbox TTLs and Cleanup Rules for Email Automation

Temporary inboxes make email automation deterministic, but only when their lifetime is explicit. If an inbox lives too long, stale messages can be selected by the wrong test run or agent. If it expires too aggressively, legitimate verification emails arrive after the workflow has already failed. Good inbox TTLs and cleanup rules sit between those two failure modes.

For CI, QA automation, signup verification, and LLM agents, the goal is not simply to “delete old inboxes.” The goal is to define a lifecycle contract: when an inbox can receive mail, how long code should wait, what happens to late messages, and what data remains for debugging after cleanup.

Mailhook provides the inbox layer for this pattern: programmable disposable inboxes via API, structured JSON email output, REST access, webhooks, polling, shared domains, custom domain support, signed payloads, and batch-oriented workflows. For exact integration semantics, keep the Mailhook llms.txt reference close to your implementation.

[Figure: lifecycle timeline for a temporary email inbox with four labeled stages (created, active, cleanup, and retained metadata). A shorter wait deadline sits inside the active period, and a separate retention period covers debug records.]

TTL is not one clock; it is several clocks

A common mistake is treating “TTL” as a single number. In email automation, several time windows matter, and they should not all be equal.

Clock	What it controls	Typical owner	Why it matters
Wait deadline	How long a test, job, or agent waits for a matching message	Test harness or agent tool	Prevents fixed sleeps and endless loops
Active inbox TTL	How long the temporary inbox is considered valid for the workflow	Inbox controller	Reduces stale messages and orphaned resources
Drain window	How long late or duplicate deliveries are tolerated after success or failure	Ingestion pipeline	Handles SMTP delay, webhook retries, and resend behavior
Raw message retention	How long full email content is kept for debugging	Data policy	Limits privacy and compliance risk
Tombstone retention	How long minimal metadata is kept after cleanup	Dedupe and audit layer	Prevents reprocessing and preserves traceability

The wait deadline should usually be the shortest. A CI test should not wait 30 minutes just because the inbox TTL is 30 minutes. The active inbox TTL is a maximum lease, not the normal wait budget.

For example, a signup verification test might wait 90 seconds for a message, keep the inbox active for 10 minutes, tolerate a short drain window for late duplicates, retain normalized JSON for the CI artifact window, and keep only a tombstone after raw content is purged.
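
The separate clocks can be written down as one policy object and checked for ordering before any inbox is created. The following is a minimal sketch, not a provider API: the field names (`waitSeconds`, `activeSeconds`, `drainSeconds`, `rawRetentionSeconds`, `tombstoneSeconds`) are hypothetical, and the drain and retention values are assumed for illustration. Only the 90-second wait and 10-minute active TTL come from the example above.

```javascript
// Hypothetical policy shape; field names are illustrative, not a provider API.
const signupPolicy = {
  waitSeconds: 90,            // how long the test waits for a matching message
  activeSeconds: 600,         // maximum inbox lease (10 minutes)
  drainSeconds: 60,           // assumed tolerance for late or duplicate deliveries
  rawRetentionSeconds: 86400, // assumed: raw content kept for one day
  tombstoneSeconds: 604800    // assumed: minimal metadata kept for seven days
}

// Returns a list of problems; an empty list means the clocks are consistent.
function validateTtlPolicy(policy) {
  const problems = []
  for (const [name, value] of Object.entries(policy)) {
    if (!Number.isFinite(value) || value < 0) {
      problems.push(`${name} must be a non-negative number`)
    }
  }
  // The wait deadline is a budget inside the lease, never longer than it.
  if (policy.waitSeconds > policy.activeSeconds) {
    problems.push("waitSeconds must not exceed activeSeconds")
  }
  // Tombstones are what remains after raw content is purged.
  if (policy.rawRetentionSeconds > policy.tombstoneSeconds) {
    problems.push("rawRetentionSeconds must not exceed tombstoneSeconds")
  }
  return problems
}
```

Running this check at startup or in CI catches the most common misconfiguration, a wait budget that quietly grows to match the inbox TTL.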

Start with the workflow, not the provider default

Inbox TTLs should come from the business or automation workflow. A password reset email, a one-time passcode, a vendor integration confirmation, and an LLM agent task all have different tolerance for delay.

Use these values as starting points, then tune them with delivery telemetry from your own system.

Workflow	Wait deadline	Active inbox TTL	Cleanup trigger	Retention guidance
CI signup or OTP test	60 to 180 seconds	10 to 30 minutes	Success, terminal failure, or job end	Keep minimal JSON and logs long enough to debug failed CI runs
Local QA or manual staging test	2 to 10 minutes	30 to 60 minutes	End of manual session	Retain only what the team needs for debugging
Third-party SaaS integration test	2 to 15 minutes	1 to 24 hours	Integration attempt completed or abandoned	Keep structured records for incident review, purge raw content sooner when possible
LLM agent verification task	60 to 300 seconds	Task budget plus small buffer	Successful artifact extraction or agent budget exhaustion	Expose minimal fields to the model, retain audit metadata outside the prompt
High-volume batch verification	Per item deadline based on SLA	Batch duration plus drain buffer	Batch completed and queue drained	Store dedupe keys and summary outcomes, minimize message bodies

The important distinction is that cleanup should be tied to state, not only to time. If a verification code has been consumed successfully, there is usually no reason to keep the inbox active until the maximum TTL. Close or retire it early, then keep only the records you need for idempotency and debugging.

A practical inbox lease model

The cleanest implementation is to treat every temporary inbox as a lease. The lease is created for a specific workflow attempt and contains enough metadata to enforce cleanup later.

A useful lease record includes:

Field	Purpose
`inbox_id`	Stable handle for API calls, webhooks, polling, and logs
`email`	Address passed to the application or third-party service
`owner_type`	CI job, QA session, agent run, batch job, or client operation
`owner_id`	Run ID, attempt ID, tenant ID, or agent task ID
`state`	Created, active, consumed, failed, draining, closed, or purged
`wait_until`	Deadline for the consumer waiting for a message
`active_until`	Maximum valid lifetime of the inbox for this attempt
`cleanup_after`	When the cleanup worker should close or retire the inbox
`raw_retention_until`	When raw bodies and full content should be removed
`tombstone_until`	When minimal metadata can be removed

This is the same reason inbox-first APIs are more reliable than bare disposable email addresses. A bare address gives you a string. A lease gives you operational semantics.

When using a temporary email API such as Mailhook, your automation should store the inbox descriptor returned by the API, not just the email address. That descriptor is what lets you route webhooks by `inbox_id`, poll the right inbox, dedupe correctly, and clean up the right resource.
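
As an illustration of the lease record, the sketch below derives every deadline from a single creation time. It assumes the inbox descriptor exposes `inbox_id` and `email` (check your provider's actual response shape), and it makes two further assumptions of its own: cleanup is scheduled at the end of the active TTL plus a drain window, and both retention windows are measured from that cleanup time. `buildLease` and the policy field names are hypothetical.

```javascript
// Builds a lease record from an inbox descriptor, an owner, and a TTL policy.
// All names here are illustrative; none of them are provider API calls.
function buildLease(descriptor, owner, policy, now = new Date()) {
  // Convert "seconds from creation" into an ISO 8601 timestamp.
  const at = (seconds) => new Date(now.getTime() + seconds * 1000).toISOString()
  const cleanupOffset = policy.activeSeconds + policy.drainSeconds
  return {
    inbox_id: descriptor.inbox_id,
    email: descriptor.email,
    owner_type: owner.type,
    owner_id: owner.id,
    state: "created",
    wait_until: at(policy.waitSeconds),
    active_until: at(policy.activeSeconds),
    cleanup_after: at(cleanupOffset),
    raw_retention_until: at(cleanupOffset + policy.rawRetentionSeconds),
    tombstone_until: at(cleanupOffset + policy.tombstoneSeconds)
  }
}

const lease = buildLease(
  { inbox_id: "inb_123", email: "run-42@example.test" },
  { type: "ci_test", id: "run-42" },
  { waitSeconds: 120, activeSeconds: 900, drainSeconds: 60, rawRetentionSeconds: 86400, tombstoneSeconds: 604800 },
  new Date("2026-05-16T21:00:00Z")
)
// lease.wait_until    is "2026-05-16T21:02:00.000Z"
// lease.active_until  is "2026-05-16T21:15:00.000Z"
// lease.cleanup_after is "2026-05-16T21:16:00.000Z"
```

Storing absolute timestamps rather than durations means a cleanup worker can act on the record without knowing which policy created it.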

Cleanup rules that prevent flakes

A good cleanup policy is boring, deterministic, and easy to audit. It should not depend on whether a developer remembered to call cleanup in one specific test file.

Use these rules as a baseline:

Close on success: Once the expected OTP, magic link, or verification artifact has been consumed, mark the inbox as consumed and stop waiting for more messages.
Close on terminal failure: If the workflow times out, the CI job fails, or the agent task exhausts its budget, mark the inbox failed and schedule cleanup.
Sweep orphaned inboxes: A background worker should close active inboxes whose `active_until` has passed, even if the original worker crashed.
Keep tombstones before purging everything: Retain minimal metadata such as `inbox_id`, `owner_id`, message IDs, artifact hashes, state, and timestamps for a short dedupe and audit window.
Purge raw content separately: Full message bodies, HTML, attachments, and headers may deserve a shorter retention window than metadata.
Never re-open by default: If a late message arrives after cleanup, record it as late or ignored rather than reviving a finished workflow.

That last rule is important for LLM agents. Agents can retry, ask for resends, or call tools repeatedly. If your inbox layer silently reopens closed inboxes, the agent can drift into a loop where old verification emails affect new actions.
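
One way to make these rules hard to violate is a small state machine with no edge back to an active state. The sketch below is an assumption-laden illustration: the state names follow the lease record above, but the specific transition table, the choice to route orphans through `failed`, and the function names (`transition`, `sweepOrphans`, `classifyDelivery`) are hypothetical.

```javascript
// Allowed lifecycle transitions. There is deliberately no edge back to "active".
const TRANSITIONS = {
  created: ["active", "failed"],
  active: ["consumed", "failed"],
  consumed: ["draining", "closed"],
  failed: ["draining", "closed"],
  draining: ["closed"],
  closed: ["purged"],
  purged: []
}

// Returns a new lease in the next state, or throws if the move is not allowed.
function transition(lease, nextState) {
  const allowed = TRANSITIONS[lease.state] ?? []
  if (!allowed.includes(nextState)) {
    throw new Error(`Illegal transition ${lease.state} -> ${nextState}`)
  }
  return { ...lease, state: nextState }
}

// Sweeper pass: fail any lease still open after active_until, whether or not
// the worker that created it is still alive.
function sweepOrphans(leases, now = new Date()) {
  return leases.map((lease) => {
    const open = lease.state === "created" || lease.state === "active"
    const expired = new Date(lease.active_until) <= now
    return open && expired ? transition(lease, "failed") : lease
  })
}

// A late message never revives a finished lease; it is only classified.
function classifyDelivery(lease) {
  return lease.state === "active" ? "process" : "late"
}
```

Because `transition` rejects any move from `closed` back to `active`, a retrying agent or a delayed webhook cannot reopen a finished workflow by accident.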

Webhook and polling behavior after expiry

Temporary inbox cleanup must account for delivery mechanics. Email and webhook systems are typically at-least-once. That means duplicate deliveries and late events are normal, not exceptional.

If you use real-time webhooks, verify the signed payload before doing anything else. Then check whether the inbox is still active, consumed, draining, or closed. A closed inbox should not trigger a new verification attempt. It can be acknowledged and recorded as a late delivery, which avoids unnecessary webhook retries.
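
The order of checks can be sketched as a pure decision function. This example assumes the provider signs the raw request body with HMAC-SHA256 and sends a hex digest; that scheme, the `decideWebhookAction` name, and its parameters are assumptions for illustration, so confirm the real signing format in your provider's documentation. `leaseState` comes from your own lease store and `seenDeliveryIds` is a `Set` used for dedupe.

```javascript
const crypto = require("crypto")

// Assumed scheme: hex-encoded HMAC-SHA256 over the raw request body.
function isValidSignature(rawBody, signatureHex, secret) {
  if (typeof signatureHex !== "string") return false
  const expected = crypto.createHmac("sha256", secret).update(rawBody).digest()
  const received = Buffer.from(signatureHex, "hex")
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && crypto.timingSafeEqual(received, expected)
}

// Decides what to do with one delivery: verify, dedupe, then check lifecycle state.
function decideWebhookAction({ rawBody, signatureHex, secret, leaseState, deliveryId, seenDeliveryIds }) {
  if (!isValidSignature(rawBody, signatureHex, secret)) return "reject"
  if (seenDeliveryIds.has(deliveryId)) return "ack_duplicate"
  seenDeliveryIds.add(deliveryId)
  // Anything other than an active lease is acknowledged and recorded as late.
  return leaseState === "active" ? "process" : "ack_late"
}
```

Returning an acknowledgement for late and duplicate deliveries, rather than an error, keeps the sender from retrying events that will never be processed.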

If you use polling as a fallback, the poller should stop at `wait_until`, not at `active_until`. The active TTL is a lifecycle boundary for the inbox. The wait deadline is the workflow boundary for the consumer.
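
A deadline-bound poller might look like the sketch below. `pollUntilDeadline` and its parameters are hypothetical: `fetchMessages` stands in for whatever call lists an inbox's messages in your integration, and `matches` is your own predicate. The two-second default interval is an arbitrary starting value.

```javascript
// Polls until a matching message arrives or wait_until passes.
// fetchMessages(inboxId) is supplied by the caller and should resolve to an
// array of parsed message objects.
async function pollUntilDeadline({ lease, fetchMessages, matches, intervalMs = 2000, now = () => Date.now() }) {
  const deadline = new Date(lease.wait_until).getTime() // not active_until
  while (now() < deadline) {
    const messages = await fetchMessages(lease.inbox_id)
    const hit = messages.find(matches)
    if (hit) return hit
    // Sleep for the interval, but never past the deadline.
    const remaining = deadline - now()
    if (remaining <= 0) break
    await new Promise((resolve) => setTimeout(resolve, Math.min(intervalMs, remaining)))
  }
  return null // the caller treats null as a deterministic timeout
}
```

Returning `null` at the deadline, instead of sleeping for a fixed period, gives the caller one clear timeout path to mark the inbox failed and schedule cleanup.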

A robust consumer usually follows this shape:

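// The helpers called here (createTemporaryInboxLease, waitForMatchingMessage,
// and so on) are placeholders for your own inbox controller, not provider API calls.
async function runEmailVerificationAttempt(ctx) {
  // One lease per attempt: a short wait budget inside a longer maximum lifetime.
  const lease = await createTemporaryInboxLease({
    owner_type: "ci_test",
    owner_id: ctx.runId,
    wait_seconds: 120,
    active_seconds: 900
  })

  try {
    await triggerVerificationEmail(lease.email)

    // Wait only until wait_until, never until the inbox TTL.
    // Assumed to throw if no matching message arrives in time.
    const message = await waitForMatchingMessage({
      inbox_id: lease.inbox_id,
      until: lease.wait_until,
      matcher: { purpose: "signup_verification", run_id: ctx.runId }
    })

    const artifact = extractVerificationArtifact(message)
    await consumeArtifactOnce({ artifact, inbox_id: lease.inbox_id })

    await markInboxConsumed(lease.inbox_id)
    return artifact
  } catch (err) {
    // Terminal failure: record it so late messages cannot revive this attempt.
    // markInboxFailed is a hypothetical counterpart to markInboxConsumed.
    await markInboxFailed(lease.inbox_id)
    throw err
  } finally {
    // Cleanup is scheduled on every path, success or failure.
    await scheduleCleanup(lease.inbox_id)
  }
}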

The exact API calls depend on your provider and integration design. With Mailhook, check the machine-readable integration details in llms.txt, then combine API-created disposable inboxes with structured JSON email output, signed webhooks, or polling fallback.

Cleanup for structured JSON emails

A major advantage of receiving structured JSON emails is that cleanup can be more precise. You do not need to keep raw MIME or rendered HTML forever just to know whether a test passed.

A practical storage model separates data into three layers:

Layer	Example content	Cleanup rule
Raw source	Original message, full headers, HTML, attachments	Keep briefly for debugging, then purge aggressively
Normalized JSON	Parsed sender, recipient, subject, text body, timestamps, provider IDs	Keep for CI artifacts or incident review according to policy
Derived artifact	OTP, magic link host, artifact hash, consumption state	Keep long enough for idempotency and audit, avoid storing secrets longer than needed

For automated tests, you often need the derived artifact only until the assertion completes. For compliance-sensitive environments, store a hash or redacted representation rather than the OTP or full link.
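
A redacted representation can be produced at the moment the artifact is consumed. The sketch below is one possible approach, not a prescribed format: `toRetainedArtifact`, the `hashKey` parameter, and the record fields are hypothetical. It uses a keyed HMAC rather than a plain hash because a six-digit OTP has only a million possible values, so an unkeyed hash could be reversed by trying them all.

```javascript
const crypto = require("crypto")

// Replaces a secret artifact with a record that is safer to retain.
// hashKey is a server-side secret that never leaves your controller.
function toRetainedArtifact({ inbox_id, message_id, artifact_type, value }, hashKey) {
  const artifact_hash = crypto
    .createHmac("sha256", hashKey)
    .update(`${inbox_id}:${value}`)
    .digest("hex")
  const record = { inbox_id, message_id, artifact_type, artifact_hash }
  if (artifact_type === "magic_link") {
    // Keep only the host; the path and query usually carry the token.
    record.link_host = new URL(value).host
  }
  return record
}
```

The hash still supports idempotency checks (the same artifact in the same inbox yields the same value) without keeping the OTP or full link in storage.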

For LLM agents, the minimized view matters even more. The model rarely needs the full email. It usually needs a typed result such as:

{
  "inbox_id": "inb_123",
  "message_id": "msg_456",
  "artifact_type": "otp",
  "otp": "123456",
  "received_at": "2026-05-16T21:10:00Z"
}

Do not expose raw HTML, tracking links, untrusted instructions, or unrelated email content to the agent unless there is a strong reason. Treat inbound email as untrusted input.
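
A simple way to enforce this is an allowlist projection applied before anything reaches the model. In this sketch, `toAgentView` and `DEFAULT_VISIBLE_FIELDS` are hypothetical names, and the default list mirrors the typed result shown above.

```javascript
// Fields the model may see by default; everything else is dropped.
const DEFAULT_VISIBLE_FIELDS = ["inbox_id", "message_id", "artifact_type", "otp", "received_at"]

// Copies only allowlisted fields, so new or unexpected fields stay hidden
// unless someone adds them to the list on purpose.
function toAgentView(record, visibleFields = DEFAULT_VISIBLE_FIELDS) {
  const view = {}
  for (const field of visibleFields) {
    if (Object.prototype.hasOwnProperty.call(record, field)) {
      view[field] = record[field]
    }
  }
  return view
}
```

An allowlist fails closed: if the upstream message format gains a field such as raw HTML, the agent still sees only what was explicitly permitted.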

TTL rules for LLM agents

LLM agents add a lifecycle problem that normal CI does not have: autonomy. A test runner fails when the deadline passes. An agent may try alternative actions, request another code, or repeat a tool call unless the tool contract prevents it.

Your agent-facing tool should encode limits directly:

`max_wait_seconds` caps each wait operation.
`max_ttl_seconds` caps how long a created inbox can remain active.
`max_resend_attempts` prevents bot loops.
`close_after_success` makes cleanup automatic after artifact extraction.
`allowed_link_hosts` prevents arbitrary URL following.
`visible_fields` restricts what email data the model can see.

These limits should be enforced by code, not described only in the system prompt. The prompt can tell the agent what is allowed, but the API should make unsafe behavior impossible or at least bounded.
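
Three of these limits can be sketched as plain functions that sit between the agent and the inbox layer. Everything here is illustrative: the `LIMITS` values, the helper names, and the extra requirement that followed links use `https:` are assumptions, not part of any provider contract.

```javascript
// Hypothetical limits; the numbers are placeholders to tune per workflow.
const LIMITS = {
  max_wait_seconds: 300,
  max_resend_attempts: 2,
  allowed_link_hosts: ["app.example.test"]
}

// Clamp whatever the agent asks for to the configured ceiling.
function clampWaitSeconds(requested, limits = LIMITS) {
  if (!Number.isFinite(requested) || requested <= 0) return limits.max_wait_seconds
  return Math.min(requested, limits.max_wait_seconds)
}

// Tracks resend attempts per inbox and refuses once the budget is spent.
function createResendBudget(limits = LIMITS) {
  const used = new Map()
  return function tryResend(inboxId) {
    const count = used.get(inboxId) ?? 0
    if (count >= limits.max_resend_attempts) return false
    used.set(inboxId, count + 1)
    return true
  }
}

// Only https links on an allowlisted host may be followed.
function isLinkAllowed(link, limits = LIMITS) {
  try {
    const url = new URL(link)
    return url.protocol === "https:" && limits.allowed_link_hosts.includes(url.hostname)
  } catch {
    return false // not a parseable URL
  }
}
```

Because the tool returns a clamped value or a refusal, the agent's behavior stays bounded even if the prompt is ignored or overridden by content in an email.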

Observability: how to know your TTLs are right

TTL tuning should be based on delivery data. If your inboxes routinely receive valid messages after the wait deadline, your waits are too short or the upstream email sender is slow. If inboxes remain active long after success, your cleanup is too lazy.

Track these metrics:

Metric	What it tells you
`email_wait_duration_seconds`	How long successful workflows wait before the matching message arrives
`email_wait_timeout_total`	How often no matching message arrives before the deadline
`inbox_orphaned_total`	How often cleanup depends on the sweeper rather than normal workflow completion
`late_message_total`	How many messages arrive after consumption, failure, or closure
`duplicate_delivery_total`	Whether webhook retries or polling overlap are causing repeated processing
`raw_message_purge_lag_seconds`	Whether sensitive content is being retained longer than policy allows

Also log stable identifiers, not entire emails. Useful debug fields include `inbox_id`, `owner_id`, `message_id`, `delivery_id`, state transitions, timestamps, and artifact hashes. Avoid logging full OTPs, magic links, raw HTML, or personal data unless your debugging process truly requires them.
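
As one way to turn delivery data into a setting, the sketch below derives a wait deadline from recorded `email_wait_duration_seconds` samples. The approach (99th percentile by nearest rank, multiplied by a headroom factor, then clamped) and all default values are assumptions chosen for illustration, not a recommendation from any provider.

```javascript
// Nearest-rank percentile over observed wait durations, in seconds.
function percentile(values, p) {
  if (values.length === 0) throw new Error("no samples")
  const sorted = [...values].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length)
  return sorted[Math.max(rank, 1) - 1]
}

// Suggests a wait deadline: p99 of successful waits plus headroom, kept
// within a floor and ceiling so one slow delivery cannot inflate it unbounded.
function suggestWaitDeadline(durations, { headroom = 1.5, min = 60, max = 300 } = {}) {
  const suggested = Math.ceil(percentile(durations, 99) * headroom)
  return Math.min(Math.max(suggested, min), max)
}
```

For example, samples of 4, 6, 9, 12, and 70 seconds give a 99th percentile of 70 and a suggested deadline of 105 seconds, while a sender that always delivers within a few seconds stays at the 60-second floor.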

Common TTL and cleanup mistakes

Mistake	Better rule
Reusing one inbox across retries	Create one disposable inbox per attempt or per clearly bounded workflow
Waiting until the inbox TTL expires	Use a shorter wait deadline and fail deterministically
Deleting everything immediately	Keep tombstone metadata long enough for dedupe and debugging
Keeping raw email forever	Retain raw content briefly, then keep redacted JSON or metadata
Letting agents request unlimited resends	Enforce resend budgets and close inboxes after success
Processing webhooks without checking state	Verify signature, dedupe, then check lifecycle state before processing

The best cleanup policy is one your team can reason about during an incident. If a CI run failed, you should be able to answer: which inbox was used, when it was created, what message arrived, whether the artifact was consumed, and when the inbox was cleaned up.

How Mailhook fits into the pattern

Mailhook is designed for developers and agent builders who need email as an automation primitive rather than a human mailbox. You can create disposable inboxes via API, receive emails as structured JSON, consume messages through real-time webhooks or polling, use shared domains for quick starts, and bring custom domains when your workflows require more control.

For lifecycle design, place Mailhook inside a small inbox controller in your own system. That controller owns your workflow-specific rules: wait deadlines, active TTLs, cleanup triggers, retention windows, and agent limits. Mailhook supplies the programmable inbox and machine-readable email layer, while your controller applies the policy that matches your CI, QA, or LLM agent workflow.

If you are designing the integration for an agent, use the Mailhook llms.txt reference as the canonical machine-readable guide, and expose only narrow tools such as `create_inbox`, `wait_for_message`, `extract_verification_artifact`, and `close_or_cleanup_inbox`.

Frequently Asked Questions

What is a good default TTL for temporary inboxes?

For CI verification flows, a practical starting point is a short wait deadline of 60 to 180 seconds and an active inbox TTL of 10 to 30 minutes. Tune both values with real delivery data.

Should cleanup happen immediately after receiving the email?

Usually cleanup should happen after the required artifact has been consumed, not merely after any email arrives. This prevents closing the inbox because of an unrelated or duplicate message.

How is inbox TTL different from email retention?

Inbox TTL controls whether the inbox remains valid for the workflow. Retention controls how long raw messages, structured JSON, derived artifacts, and metadata are stored after the workflow ends.

What should happen to emails that arrive after the inbox is closed?

Treat them as late arrivals. Verify and dedupe the event, record minimal metadata if useful, but do not re-open the workflow by default.

Do LLM agents need different cleanup rules?

Yes. Agents need stricter limits because they can retry or request new messages autonomously. Enforce max wait times, max TTLs, resend budgets, link constraints, and minimal message exposure in code.

Build disposable inbox lifecycles that stay deterministic

Email automation becomes reliable when inboxes are created, consumed, and cleaned up under an explicit policy. TTLs define the lease, cleanup rules prevent stale state, and structured JSON keeps tests and agents away from brittle mailbox scraping.

If you are building CI verification, QA automation, or LLM-agent email workflows, try Mailhook to create programmable temp inboxes via API and receive emails as JSON through webhooks or polling. Start with the Mailhook llms.txt integration reference, then encode your TTL, cleanup, and retention rules in a small controller your whole automation stack can trust.