Email is still the most common “out-of-band” channel in modern product flows: sign-up verification, OTP codes, password resets, invite links, SSO provisioning, billing notices, and operational notifications. For AI agents and automated QA, the bottleneck is rarely “receiving email” and almost always reading email deterministically: knowing which inbox to read, when a message is complete, how to dedupe retries, and how to extract a small artifact without scraping fragile HTML.
A solid read email API is therefore less about one endpoint that returns a blob, and more about a coherent set of endpoints plus semantics that make automation safe and retryable.
If you are implementing or evaluating an email-to-API layer, use this as a checklist. If you are integrating with Mailhook specifically, the canonical contract is in Mailhook llms.txt.
Start with the resource model: inbox-first, not account-first
For automation, “account-centric” email APIs (login, folders, labels, mailbox state) introduce persistent identity and long-lived state you do not want in CI or agent toolchains.
The cleaner abstraction is:
- Inbox: an isolated container with a lifecycle (created, active, expires).
- Message: a normalized record derived from a raw RFC 5322 email.
- Delivery: an event describing that a message was delivered to your system (webhook delivery attempts, polling reads, retries).
This maps to how email actually arrives: SMTP delivers a raw message (defined by RFC 5322) that may be multipart MIME (see RFC 2045), then your system parses and normalizes it.
Why “inbox-first” is the winning default for agents and CI
Agents and test runners need these invariants:
- Isolation: no cross-talk between parallel runs.
- Deterministic waiting: explicit time budgets, not fixed sleeps.
- Idempotent consumption: safe under retries and duplicate deliveries.
- Machine-readable output: JSON fields, not rendered HTML.
An inbox-first API makes those invariants straightforward.
The endpoint set you actually need
Below is a provider-agnostic endpoint map. Names vary across vendors, but the capabilities are the important part.
| Capability | Typical endpoint shape | What it must guarantee | Why it matters for automation |
|---|---|---|---|
| Create inbox | POST /inboxes |
Returns an address plus an inbox identifier and lifecycle fields | Prevents collisions and makes reads target a specific container |
| Get inbox metadata | GET /inboxes/{inbox_id} |
Lifecycle, expiry, status | Enables time budgeting, cleanup, and debugging |
| Expire or delete inbox |
POST /inboxes/{inbox_id}:expire or DELETE /inboxes/{inbox_id}
|
Clear semantics (immediate vs graceful) | Keeps test data bounded and avoids late-arrival confusion |
| List messages | GET /inboxes/{inbox_id}/messages?cursor=…&limit=… |
Stable pagination and dedupe-friendly IDs | Allows reliable polling and batch processing |
| Get message | GET /messages/{message_id} |
Full normalized JSON, optional raw source access | Enables deterministic parsing and reprocessing |
| Wait (polling convenience) | GET /inboxes/{inbox_id}/messages:wait?timeout=… |
Deadline-based wait, not sleep-based | Simplifies clients and reduces flakiness |
| Webhook delivery | POST https://your-app/webhooks/email |
At-least-once delivery, signed payload, retry policy | Low latency and scalable ingestion |
| Batch retrieval |
POST /messages:batchGet or GET /messages?ids=…
|
Bounded payload size, partial failure semantics | High-throughput CI and backfills |
Two notes that save teams months of pain:
- You need both webhooks and polling, even if you prefer one. Webhooks give latency and scale, polling is the universal fallback when webhooks fail due to firewall rules, downtime, or misconfiguration.
- “List messages” is not enough unless its pagination semantics are explicit and dedupe-friendly.

Inbox lifecycle semantics (the part most APIs under-specify)
A read email API lives or dies on lifecycle design.
Create: return an EmailWithInbox descriptor
The create response should give you more than an email string. The client needs a stable handle and timing.
Minimum fields that unlock deterministic behavior:
-
email(routable recipient) -
inbox_id(the primary handle you read from) created_at-
expires_at(oractive_until) - Optional:
webhook_url(if the provider supports per-inbox hooks)
If you only return an address, clients inevitably start “searching” globally for the right message, which breaks under concurrency.
Expire: immediate vs graceful close
Expiry is not a boolean, it is a contract.
Common production-friendly model:
- Active: new messages are accepted.
- Draining: new messages may still arrive due to SMTP delays, but the inbox is slated for closure.
- Closed: no new messages, and reads may be limited or rejected.
If you support a drain window, document it. If you do not, document that expiry is immediate. Either way, clients need to know what to expect.
Delete: who owns retention?
If an API supports deletion, specify whether deletion:
- Removes message content immediately.
- Leaves a tombstone to prevent re-creation collisions.
- Changes webhook behavior (for example, dropping deliveries vs delivering a final “closed” event).
For QA and agents, predictable cleanup is a feature, not an afterthought.
Message retrieval semantics: IDs, ordering, and pagination
Stable identifiers: message_id vs delivery_id
A robust read email API distinguishes:
-
message_id: the canonical message record in your system. -
delivery_id: a unique identifier for a delivery attempt (especially for webhook retries).
Webhooks are typically at-least-once, so retries happen. If you do not have a delivery-level id, consumers are forced to guess.
Practical rule:
- Dedupe webhook processing on
delivery_id. - Dedupe message handling on
message_id. - Dedupe extracted artifacts (OTP, link) on a derived
artifact_hash(consumer-defined if needed).
Ordering: do not promise what SMTP cannot
Unless you fully control upstream sending and transport, do not promise strict ordering. In practice:
- Multiple messages can arrive out of order.
- The same message can be observed through multiple deliveries.
Your API should either:
- Provide a
received_attimestamp and tell clients to sort, or - Provide a monotonic sequence per inbox (harder), or
- Provide opaque cursors and explicitly warn that ordering is best-effort.
Pagination: opaque cursor tokens win
For GET /inboxes/{id}/messages, prefer opaque server cursors over page numbers.
Cursor-based pagination enables:
- Consistent progress even as messages arrive.
- Easier dedupe.
- Better performance than deep offsets.
If you support filters, keep them conservative and deterministic: since, before, limit, and possibly a narrow matcher on subject or from.
For deeper polling mechanics (cursors, deadlines, dedupe), see Mailhook’s related guide: Pull Email with Polling: Cursors, Timeouts, and Dedupe.
Waiting semantics: avoid “sleep(10)” as an API strategy
Teams often implement email tests like this:
await triggerEmail();
await sleep(10_000);
const msg = await listMessages();
It fails under real conditions (delays, retries, rate limits) and wastes time when delivery is fast.
A good read email API supports deadline-based waiting in one of two ways:
Option A: long-poll endpoint
A dedicated wait endpoint can block for up to timeout:
GET /inboxes/{inbox_id}/messages:wait?timeout=25s
Semantics to document:
- The server returns immediately if a matching message exists.
- The server may hold the connection until timeout.
- The response includes either a message (or message summary) or an explicit “no message yet” outcome.
Option B: client-side polling with explicit deadlines
If you only have list endpoints, publish recommended client semantics:
- Per-request timeout (network).
- Overall deadline (workflow).
- Backoff strategy.
- A cursor or “seen set” approach to avoid reprocessing.
Mailhook’s platform guidance generally follows “webhook-first with polling fallback.” A dedicated explanation is in Temp Email Receive: Webhook-First, Polling Fallback.
Webhook semantics: authenticity, retries, and idempotency
A webhook is part of your read email API surface area, so it needs semantics like any other endpoint.
Delivery guarantees: at-least-once by default
Most systems should assume:
- Delivery is at-least-once.
- Retries can occur minutes later.
- Your handler must be idempotent.
Define:
- Retry window (how long retries may happen).
- Backoff schedule (even approximate).
- Whether multiple event types exist (received, parsed, attachment_ready).
Authenticity: sign the payload you send
Email-level authenticity signals (SPF, DKIM, DMARC) do not prove that an HTTP webhook request is authentic.
Your webhook should be verifiable using:
- A signature header computed over the raw request body.
- A timestamp header with an allowed clock skew.
- A delivery identifier for replay detection.
If you want a concrete checklist, see: Email Signed By: Verify Webhook Payload Authenticity.
“Ack fast, process async” is not optional
Webhook handlers should:
- Verify signature.
- Enqueue work.
- Return 2xx quickly.
Long-running parsing or link fetching inside the handler increases retries and duplicate pressure.
JSON output semantics: what “read email” should mean
A read email API is valuable because it normalizes a messy input format into a stable contract.
At minimum, your JSON should separate:
- Identity: message ids, inbox id, received timestamps.
- Routing: envelope recipient, to/cc, from.
-
Content: subject,
textandhtmlbodies, headers. - Artifacts: extracted OTPs and verification links (optional but very useful).
- Provenance: raw source reference, parsing warnings.
A concise schema proposal for agents and QA is covered in Email to JSON: A Minimal Schema for Agents and QA.
Important semantic choices to document
1) text/plain vs text/html precedence: Automation should prefer text/plain when available. If you expose HTML, treat it as untrusted.
2) Header normalization rules: Duplicate headers exist. Folded headers exist. Normalization must be deterministic.
3) Attachment handling: Decide whether attachments are embedded (base64) or referenced by URL. If referenced, specify URL lifetime and auth.
Error semantics and status codes: make failures actionable
Email workflows fail in many ways: the sender never sent, SMTP delayed, inbox expired, webhook rejected, polling rate-limited.
Make errors explicit and machine-actionable.
| Scenario | Recommended HTTP status | Recommended error shape | Client behavior |
|---|---|---|---|
| Inbox not found | 404 | code: inbox_not_found |
Fail fast, likely a bug |
| Inbox expired | 410 |
code: inbox_gone, include expired_at
|
Provision a new inbox, do not retry |
| Poll too frequent | 429 |
code: rate_limited, include retry_after_ms
|
Back off and continue |
| Cursor invalid | 400 | code: invalid_cursor |
Reset cursor, restart list |
| Webhook signature invalid | 401 or 403 | code: invalid_signature |
Reject and alert |
| Message too large / truncated | 413 or 422 | code: message_too_large |
Use raw reference or enforce limits |
For agents, consider returning a safe “summary” even on partial parsing, but ensure it is clearly marked.
Semantics that matter specifically for LLM agents
If you are building a read email API “for agents,” the biggest risk is not parsing, it is tool safety.
Treat inbound email as hostile input
An agent should not be given:
- Raw HTML that can contain prompt injection instructions.
- Arbitrary links without validation.
- Full message bodies when only an OTP is needed.
A better pattern:
- Provide a minimized view (subject, from, received_at, extracted artifacts).
- Keep raw content available for debugging, but behind explicit, audited access.
Link validation should be part of the consumer contract
If you extract verification links, specify that clients should validate:
- Host allowlist.
- HTTPS scheme.
- No open redirects.
- No internal IP targets (SSRF defense).
This is especially important when an LLM is deciding what to click.
Batch semantics: the difference between “works” and “scales”
Once you run parallel CI or agent fleets, you will want batch capabilities.
Batch semantics to get right:
- Bounded batch size limits.
- Partial success with per-item errors.
- Deterministic ordering rules (or explicit “unordered”).
- Idempotency for batch processing.
Mailhook supports batch email processing (see the feature list in Mailhook llms.txt), which is useful when you ingest at high throughput or replay messages into a downstream system.
A practical “minimum viable” read email API contract
If you want a tight spec that covers most real-world automation, implement these primitives:
- Create inbox (returns email + inbox_id + expires_at).
- List messages in inbox (cursor-based).
- Get message by id (normalized JSON; raw source optional).
- Webhook delivery (signed, at-least-once).
- Explicit expiry and cleanup.
Then publish these semantics clearly:
- Isolation expectations (inbox-per-run or inbox-per-attempt).
- Delivery guarantees and dedupe keys.
- Pagination and ordering assumptions.
- Retention and lifecycle rules.
- Security model for webhooks and agent-facing content.

How Mailhook maps to this endpoint and semantics checklist
Mailhook is built around exactly this inbox-first model:
- Disposable inbox creation via API
- Emails delivered as structured JSON
- REST API access
- Real-time webhook notifications
- Polling API as a fallback
- Shared domains for fast setup, plus custom domain support
- Signed payloads for webhook security
- Batch email processing
If you want the exact endpoints, request/response fields, and signature headers, use the canonical integration reference: Mailhook llms.txt.
For implementation patterns that pair well with these semantics (verification flows, retry-safe tests, webhook-first ingestion), these guides are useful next reads:
- Get Email via API: Retrieve Messages as JSON
- Sign Up Verification Emails: Make Tests Retry-Safe
- Open an Email Programmatically: From Raw to JSON
A final design rule: specify semantics before you ship endpoints
Teams can tolerate different URL shapes. They cannot tolerate ambiguous behavior.
Before you expose a read email API to agents or CI, write down:
- What “delivered” means.
- What identifiers are stable.
- How retries behave.
- How clients should wait.
- How to verify authenticity.
Do that, and your “read email API” becomes a dependable primitive that automation can build on, rather than another flaky integration point.