Email is one of the last “human-first” surfaces many systems still depend on. But if you’re building an AI agent, an LLM toolchain, or a QA harness, you eventually need to open an email programmatically, extract just the useful artifacts (OTP, magic link, invoice ID, reset URL), and move on.
The hard part is that email arrives as a messy, decades-old stack of standards: RFC 5322 headers, MIME multipart bodies, odd encodings, and HTML that was never meant to be parsed by tests (or agents). This guide walks through what “raw email” actually is, why it’s tricky, and how to reliably convert it into a JSON shape your automation can trust.
What it means to “open an email” programmatically
When humans “open an email,” the email client quietly does a lot of work:
- Parses the message format (headers plus body)
- Decodes transfer encodings (base64, quoted-printable)
- Picks a body to display (usually text/plain or HTML)
- Unpacks attachments
- Normalizes dates, addresses, and character sets
Programmatically, you need to decide what “open” means for your workflow. For automation, “open” usually means:
- Locate the right message deterministically (no brittle mailbox searches)
- Parse and normalize it into a stable schema
- Extract a small, verifiable artifact (OTP, link, token)
- Log enough to debug failures without leaking sensitive content
A good mental model is: treat email like an untrusted inbound event, not like a document.
Raw email, the formats you actually receive
Most systems ultimately represent an email as a raw RFC 5322 message: a blob of text and bytes composed of headers and a body. If you need the standards references, start with RFC 5322 (message format) and the MIME family, beginning with RFC 2045 (format of message bodies).
A “raw” message typically includes:
- Headers: key/value pairs like From, To, Subject, Date, Message-ID, plus many others
- Body: sometimes plain text, often HTML, frequently multipart with boundaries
- Attachments: represented as MIME parts, commonly base64 encoded
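To make the header-plus-body structure concrete, here is a minimal sketch that parses a raw message with Python's standard-library email package. The sample message, addresses, and Message-ID are made up for illustration.

```python
from email import message_from_bytes, policy

# Hypothetical raw message, roughly as it might arrive over SMTP (CRLF line endings).
RAW = (
    b"From: Example <noreply@example.com>\r\n"
    b"To: user@example.org\r\n"
    b"Subject: Your login code\r\n"
    b"Date: Sun, 01 Feb 2026 20:12:33 +0000\r\n"
    b"Message-ID: <abc123@example.com>\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"  # the first blank line separates headers from the body
    b"Your code is 123456\r\n"
)

# policy.default selects the modern EmailMessage API, which decodes headers for you.
msg = message_from_bytes(RAW, policy=policy.default)

subject = msg["Subject"]        # header lookup is case-insensitive
message_id = msg["Message-ID"]
body = msg.get_content()        # decoded text of this single-part message
```

This only covers the simplest case: a single text/plain part with no attachments.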
MIME is why “just parse the body” fails
If you only ever saw plain text emails, parsing would be easy. In practice:
- Many messages are multipart/alternative (both text/plain and text/html)
- Some are multipart/mixed (body plus attachments)
- Some contain nested multiparts
- Bodies can be encoded (quoted-printable, base64)
- Character sets vary (UTF-8, ISO-8859-1, and more)
This is why regexing HTML or splitting on blank lines becomes fragile quickly.
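The nesting described above is easier to see when you walk the MIME tree. The sketch below builds a sample message with the standard library (the content and filename are invented), parses it back, and lists each part's content type with its depth.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_sample() -> bytes:
    """Build a message with a text body, an HTML alternative, and one attachment."""
    msg = EmailMessage()
    msg["From"] = "noreply@example.com"
    msg["To"] = "user@example.org"
    msg["Subject"] = "Invoice"
    msg.set_content("Your invoice is attached.")
    msg.add_alternative("<p>Your invoice is attached.</p>", subtype="html")
    msg.add_attachment(
        b"%PDF-1.4 fake", maintype="application", subtype="pdf", filename="invoice.pdf"
    )
    return msg.as_bytes()


def content_type_tree(part, depth=0):
    """Return (depth, content_type) pairs for a part and all of its children."""
    rows = [(depth, part.get_content_type())]
    if part.is_multipart():
        for child in part.iter_parts():
            rows.extend(content_type_tree(child, depth + 1))
    return rows


parsed = message_from_bytes(build_sample(), policy=policy.default)
tree = content_type_tree(parsed)
for depth, content_type in tree:
    print("  " * depth + content_type)
```

For this sample the tree is multipart/mixed at the top, containing a multipart/alternative (text/plain and text/html) followed by application/pdf. Splitting on the first blank line would hand you boundary markers and base64 rather than the text you wanted.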
From raw to JSON: a normalization pipeline that holds up in automation
A robust “raw to JSON” pipeline has a few clear stages. This is implementation-agnostic: you can do it with a library in your own service, or consume JSON produced by an inbox API.

Stage 1: Parse structure (headers, MIME tree)
At this stage you want to:
- Parse headers safely (handle folded headers, duplicates)
- Build a MIME tree of parts
- Identify candidate bodies (text/plain, text/html)
- Identify attachments (filename, content-type, size)
Stage 2: Decode and normalize
Normalization is where most automation reliability comes from:
- Decode transfer encodings (quoted-printable, base64)
- Normalize line endings
- Convert text to a consistent Unicode representation
- Parse Date into an ISO timestamp (but keep the raw value for debugging)
- Normalize address fields into structured objects (name, address)
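A few of these normalization steps can be sketched with the standard library alone. The function names and output keys below are illustrative choices, not a fixed contract.

```python
from datetime import timezone
from email.utils import getaddresses, parsedate_to_datetime


def normalize_date(date_raw: str) -> dict:
    """Convert an RFC 5322 date to ISO 8601 in UTC, keeping the raw value."""
    dt = parsedate_to_datetime(date_raw)
    if dt.tzinfo is None:
        # A "-0000" offset yields a naive datetime; this sketch treats it as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    iso = dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
    return {"date_iso": iso, "date_raw": date_raw}


def normalize_addresses(header_values: list[str]) -> list[dict]:
    """Turn address header values into structured {name, address} objects."""
    return [
        {"name": name or None, "address": address}
        for name, address in getaddresses(header_values)
        if address
    ]


def normalize_newlines(text: str) -> str:
    """Collapse CRLF and bare CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

For example, normalize_date("Sun, 01 Feb 2026 15:12:33 -0500") yields a date_iso of "2026-02-01T20:12:33Z" while date_raw keeps the sender's original string.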
Stage 3: Choose and sanitize content
For automation and agents, prefer predictable content:
- Prefer text/plain when available
- Keep HTML, but treat it as secondary (good for rendering, risky for parsing)
- Remove or ignore dangerous elements (scripts, unexpected redirects)
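One way to make the "prefer text/plain" rule explicit is to ask for each body type separately and record which one you treat as primary. This is a sketch using EmailMessage.get_body from the standard library. The returned keys are an assumption for illustration, and it does not sanitize the HTML.

```python
from email.message import EmailMessage


def pick_primary_body(msg: EmailMessage) -> dict:
    """Return both bodies when present and name the one automation should read."""
    plain = msg.get_body(preferencelist=("plain",))
    html = msg.get_body(preferencelist=("html",))
    if plain is not None:
        primary = "text"
    elif html is not None:
        primary = "html"
    else:
        primary = None
    return {
        "text": plain.get_content() if plain is not None else None,
        "html": html.get_content() if html is not None else None,
        "primary": primary,
    }


# Sample multipart/alternative message (content invented for the example).
sample = EmailMessage()
sample.set_content("Your code is 123456")
sample.add_alternative("<p>Your code is <b>123456</b></p>", subtype="html")
bodies = pick_primary_body(sample)

# An HTML-only message falls back to the HTML body.
html_only = EmailMessage()
html_only.set_content("<p>Hello</p>", subtype="html")
fallback = pick_primary_body(html_only)
```

Comparing against None matters here: an email part with no headers is falsy in Python, so a plain truthiness check could skip a valid body.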
Stage 4: Extract automation artifacts
Instead of “understanding the whole email,” extract what your workflow needs:
- Verification links (validated against an allowlist of target hosts)
- OTP candidates (with tight patterns and context checks)
- Key identifiers (order ID, ticket ID)
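As a sketch of "tight patterns and context checks," the extractor below only accepts a six-digit number that follows a keyword such as "code," and it returns nothing when the message contains more than one distinct candidate. The six-digit length, the keyword list, and the ORD- order ID format are all assumptions you would replace with your own.

```python
import re

# Assumption: OTPs are exactly six digits and appear shortly after a keyword.
OTP_PATTERN = re.compile(
    r"\b(?:code|otp|passcode)\b[^\d\n]{0,20}\b(\d{6})\b", re.IGNORECASE
)

# Hypothetical order ID format (for example "ORD-48211"); adjust to your system.
ORDER_ID_PATTERN = re.compile(r"\bORD-\d{4,10}\b")


def extract_otp(text: str):
    """Return the OTP only when exactly one distinct candidate is found."""
    candidates = set(OTP_PATTERN.findall(text))
    return candidates.pop() if len(candidates) == 1 else None


def extract_order_id(text: str):
    """Return the first identifier that matches the assumed order ID format."""
    match = ORDER_ID_PATTERN.search(text)
    return match.group(0) if match else None
```

Refusing ambiguous input is deliberate: a test that fails loudly on two candidate codes is easier to debug than one that silently picks the wrong number.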
Stage 5: Emit JSON with stable fields
Your JSON output should support:
- Deterministic matching (message_id, inbox_id, correlation IDs)
- Simple assertions (subject contains, from domain equals)
- Minimal artifact extraction (otp, verification_url)
- Debuggability (raw headers snapshot, received timestamp)
Here’s a helpful way to think about mapping raw email to JSON fields.
| Raw email element | What it looks like | JSON you want for automation | Why it matters |
|---|---|---|---|
| Message-ID header | Message-ID: <abc@domain> | message_id | Deduplication and idempotency |
| Date header | Date: Tue, 30 Jan... | received_at (ISO), date_raw | Timing assertions, debugging delays |
| From/To | RFC 5322 address forms | from: {name, address}, to: [...] | Reliable sender checks |
| MIME parts | multipart boundaries | text, html, attachments[] | Avoid parsing the wrong part |
| Transfer encoding | base64, quoted-printable | decoded strings and bytes | Prevent garbage output |
| Links in body | HTML anchors, plain URLs | links[] (normalized) | Safer magic-link handling |
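Putting the mapping together, a rough sketch of a raw-to-JSON function might look like the following. It relies only on the standard library, and the field names mirror the table rather than any particular API. Here received_at is assumed to come from your receiving system, and link extraction is left out.

```python
import json
from email import message_from_bytes, policy
from email.utils import getaddresses


def email_to_record(raw: bytes, received_at: str) -> dict:
    """Map a raw message to a small, stable dict (field names are illustrative)."""
    msg = message_from_bytes(raw, policy=policy.default)

    def header(name):
        value = msg[name]
        return str(value) if value is not None else None

    def addresses(name):
        pairs = getaddresses([str(v) for v in msg.get_all(name, [])])
        return [{"name": n or None, "address": a} for n, a in pairs if a]

    plain = msg.get_body(preferencelist=("plain",))
    html = msg.get_body(preferencelist=("html",))
    senders = addresses("From")
    return {
        "message_id": header("Message-ID"),
        "received_at": received_at,  # supplied by the receiving system
        "from": senders[0] if senders else None,
        "to": addresses("To"),
        "subject": header("Subject"),
        "text": plain.get_content() if plain is not None else None,
        "html": html.get_content() if html is not None else None,
        "attachments": [
            {
                "filename": part.get_filename(),
                "content_type": part.get_content_type(),
                "size": len(part.get_payload(decode=True) or b""),
            }
            for part in msg.iter_attachments()
        ],
    }


# Hypothetical raw message used to exercise the mapping.
RAW = (
    b"From: Example <noreply@example.com>\r\n"
    b"To: user@example.org\r\n"
    b"Subject: Your login code\r\n"
    b"Message-ID: <abc123@example.com>\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Your code is 123456\r\n"
)

record = email_to_record(RAW, received_at="2026-02-01T20:12:33Z")
record_json = json.dumps(record, indent=2, sort_keys=True)
```

Sorting keys when serializing keeps the output stable across runs, which makes snapshot-style assertions and log diffs easier to read.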
Gotchas that break naive “open email” implementations
Even mature teams get burned by the same email edge cases. If you’re building a programmatic “open email” path, design for these up front.
Duplicate and folded headers
Headers can legally repeat, and they can be folded across lines. If you naïvely map headers into a dictionary, you may lose data or parse incorrectly.
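The sketch below shows both problems on a made-up message: two Received headers and a Subject folded across two lines. Python's email package handles both, as long as you ask for all values instead of building a plain dictionary.

```python
from email import message_from_bytes, policy

# Hypothetical message with a repeated header and a folded header.
RAW = (
    b"Received: from mx1.example.net by mx2.example.org;"
    b" Sun, 01 Feb 2026 20:12:31 +0000\r\n"
    b"Received: from app.example.com by mx1.example.net;"
    b" Sun, 01 Feb 2026 20:12:30 +0000\r\n"
    b"Subject: Your login code for\r\n"
    b" Example App\r\n"  # leading whitespace marks a folded continuation line
    b"\r\n"
    b"body\r\n"
)

msg = message_from_bytes(RAW, policy=policy.default)

received = msg.get_all("Received", [])  # every occurrence, in order
subject = msg["Subject"]                # unfolded into a single line
naive = dict(msg.items())               # later duplicates overwrite earlier ones
```

Here received holds both hops, while the naive dictionary keeps only one Received value and drops the other.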
Choosing the wrong body
A lot of systems accidentally parse:
- An HTML tracking pixel section instead of the user-visible content
- A footer instead of the OTP line
- A forwarded message inside the email
Prefer text/plain when possible, and be explicit about how you pick the “primary” body.
Encodings and character sets
If you do not consistently decode transfer encoding and charset, you will see:
- Broken Unicode
- Missing punctuation, which can break OTP extraction
- Incorrect comparisons in tests
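Here is a small illustration of the difference between reading a payload as-is and decoding it properly. The sample is a made-up quoted-printable body in ISO-8859-1.

```python
from email import message_from_bytes, policy

# Hypothetical message: quoted-printable transfer encoding, ISO-8859-1 charset.
RAW = (
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Votre code de v=E9rification est : 123456\r\n"
)

msg = message_from_bytes(RAW, policy=policy.default)

undecoded = msg.get_payload()  # still quoted-printable, contains "=E9"
decoded = msg.get_content()    # transfer encoding removed, charset applied
```

The undecoded string still contains the literal "=E9" sequence, so any comparison or pattern that expects the accented word will fail. The decoded string contains the real character.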
Time is not a single field
Email timestamps are messy. The Date header is sender-provided and not always trustworthy. Your receiving system’s timestamp is often more useful for latency and timeouts.
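If you record your own receive time, you can compare it with the sender's Date header instead of trusting either one blindly. This is a minimal sketch. The function name is hypothetical and it assumes your system supplies a timezone-aware receive timestamp.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def delivery_skew_seconds(date_header: str, received_at: datetime) -> float:
    """Seconds between the sender-claimed Date and our own receive time."""
    sent = parsedate_to_datetime(date_header)
    if sent.tzinfo is None:
        # Treat a missing offset as UTC for this sketch.
        sent = sent.replace(tzinfo=timezone.utc)
    return (received_at - sent).total_seconds()


skew = delivery_skew_seconds(
    "Sun, 01 Feb 2026 20:12:30 +0000",
    datetime(2026, 2, 1, 20, 12, 33, tzinfo=timezone.utc),
)
```

A large or negative skew usually points to a sender clock problem or a queueing delay, which is worth logging rather than asserting on.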
HTML parsing is a security boundary
If you run agents against email content, treat HTML as adversarial input. A safe strategy is:
- Extract candidate links, then validate them against allowlists
- Avoid “clicking” unknown URLs in automation
- Keep raw content for audit, but do not feed full HTML into an LLM by default
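The first step, extracting candidate links and validating them, can be sketched with the standard-library HTML parser. The allowlist contents and the HTTPS-only rule are assumptions for the example. Nothing here fetches or follows a URL.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect href values from anchor tags without executing or fetching anything."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def is_allowed(url: str, allowed_hosts: set) -> bool:
    """Accept only HTTPS URLs whose exact hostname is in the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return parts.scheme == "https" and parts.hostname in allowed_hosts


def allowed_links(html: str, allowed_hosts: set) -> list:
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [url for url in collector.links if is_allowed(url, allowed_hosts)]


# Hypothetical HTML body with one legitimate link and three hostile ones.
SAMPLE_HTML = (
    '<a href="https://example.com/verify?token=abc">Verify</a>'
    '<a href="https://example.com.evil.test/verify">Verify</a>'
    '<a href="https://example.com@evil.test/verify">Verify</a>'
    '<a href="javascript:alert(1)">Click</a>'
)
safe = allowed_links(SAMPLE_HTML, {"example.com"})
```

Matching on the parsed hostname, not on a substring of the URL, is what rejects lookalike hosts such as example.com.evil.test and the userinfo trick in the third link.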
For deeper reliability guidance on parsing identifiers like Message-ID and related fields, Mailhook has a separate post focused on header parsing, “Headers Email Guide: What to Parse for Reliability.”
A pragmatic JSON contract for LLM agents
Agents work best with small, structured inputs. Instead of giving an LLM an entire email (especially HTML), provide a compact JSON object that is:
- Deterministic
- Minimal
- Traceable back to the raw message
An example “agent-safe” shape might look like this:
{
  "message_id": "<...>",
  "received_at": "2026-02-01T20:12:33Z",
  "from": {"address": "[email protected]", "name": "Example"},
  "to": [{"address": "[email protected]", "name": null}],
  "subject": "Your login code",
  "text": "Your code is 123456",
  "links": ["https://example.com/verify?token=..."],
  "attachments": [{"filename": "invoice.pdf", "content_type": "application/pdf", "size": 48211}]
}
You can then add a second layer: a tiny extraction object your tests or agent tools actually consume (for example { "otp": "123456" }). This keeps your workflow simple and reduces LLM exposure to hostile content.
Build it yourself vs consume JSON from an inbox API
You have two broad approaches:
- Parse raw emails yourself (via IMAP/POP, direct SMTP ingest, or provider APIs)
- Use a programmable inbox service that gives you structured JSON and deterministic retrieval
Here’s a decision table that tends to match real-world engineering tradeoffs.
| Approach | Best for | Common pain points | Typical outcome |
|---|---|---|---|
| IMAP mailbox scraping | Quick prototypes | Flaky searches, concurrency collisions, slow polling | Breaks in CI and parallel runs |
| Provider APIs (Gmail/Graph) | Internal tooling with accounts | OAuth, quotas, long-lived identities | Works, but heavy operationally |
| Run your own SMTP capture | Local integration tests | Deliverability differences vs real email | Great locally, incomplete in staging |
| Programmable inbox API with JSON output | QA automation, LLM agents, verification flows | Need to integrate another API | Most deterministic for automation |
If your core need is “open an email programmatically and get JSON,” the key property is machine-readable output that doesn’t require HTML scraping.
Using Mailhook to open an email as JSON (webhook-first, polling fallback)
Mailhook is built around programmable disposable inboxes. Instead of creating a full email account, you create an inbox via API, use the generated address in your workflow, then receive messages as structured JSON.
Relevant Mailhook capabilities (from the product description):
- Disposable inbox creation via API
- Structured JSON email output
- RESTful API access
- Real-time webhook notifications
- Polling API for emails
- Signed payloads for security
- Batch email processing
- Shared domains and custom domain support
Because APIs evolve, the source of truth for endpoints and payloads is Mailhook’s implementation reference. Make sure to review llms.txt (https://mailhook.co/llms.txt) before you wire up agent tools or tests.
Reference flow (conceptual)
A reliable automation flow looks like this:
- Create a new inbox for the run (or agent session)
- Trigger the system under test to send an email to that address
- Wait for delivery (prefer webhook, use polling as fallback)
- Consume the JSON payload
- Extract only what you need (OTP/link)
Here is pseudocode that illustrates the shape of the integration without assuming any specific endpoint names:
# Pseudocode: consult https://mailhook.co/llms.txt for exact API fields and routes.
# mailhook, app, extract_otp, and extract_allowed_link are placeholders, not real client names.
inbox = mailhook.create_inbox(
    webhook_url="https://your-service.example/mailhook/webhook"
)
email_address = inbox["address"]
inbox_id = inbox["inbox_id"]
app.trigger_signup(email=email_address)
# Webhook-first: your webhook handler stores the JSON message keyed by inbox_id.
# Polling fallback: wait with timeout and backoff.
message = mailhook.wait_for_message(inbox_id=inbox_id, timeout_seconds=60)
otp = extract_otp(message["text"])
verify_url = extract_allowed_link(message.get("links", []))
assert otp is not None or verify_url is not None
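The polling fallback in the pseudocode above can be sketched as a generic helper that is independent of any particular API. The fetch callable is a stand-in for whatever returns the next message for an inbox, or None when nothing has arrived. The function name, defaults, and injectable sleep and clock are assumptions made so that it can be tested without waiting.

```python
import time
from typing import Callable, Optional


def poll_for_message(
    fetch: Callable[[], Optional[dict]],
    timeout_seconds: float = 60.0,
    initial_delay: float = 0.5,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch() until it returns a message, backing off between attempts."""
    deadline = clock() + timeout_seconds
    delay = initial_delay
    while True:
        message = fetch()
        if message is not None:
            return message
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"no message within {timeout_seconds} seconds")
        sleep(min(delay, remaining))       # never sleep past the deadline
        delay = min(delay * 2, max_delay)  # exponential backoff with a cap


# Usage with a fake fetcher that succeeds on the third attempt.
_responses = iter([None, None, {"text": "Your code is 123456"}])
message = poll_for_message(lambda: next(_responses), sleep=lambda seconds: None)
```

Using a monotonic clock for the deadline avoids surprises if the wall clock is adjusted mid-run, and the explicit TimeoutError gives tests a clear failure instead of a hang.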
Verify webhook signatures
If you accept inbound webhooks, treat them like any other external request:
- Verify the signature (Mailhook supports signed payloads)
- Use idempotency to handle retries
- Store only what you need, for as long as you need it
Again, the exact signing scheme and headers should come from the contract in llms.txt.
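To show the general shape of signature verification and idempotent handling, here is a sketch that assumes an HMAC-SHA256 signature over the raw request body, hex encoded. That scheme, the message_id field used for deduplication, and the class name are assumptions for illustration only. They are not Mailhook's documented contract.

```python
import hashlib
import hmac
import json


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check a hex HMAC-SHA256 signature over the raw body (assumed scheme)."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


class WebhookHandler:
    """Verify, deduplicate, then accept. State is in memory for this sketch."""

    def __init__(self, secret: bytes):
        self.secret = secret
        self.seen_ids = set()

    def handle(self, body: bytes, signature_hex: str) -> str:
        if not verify_signature(self.secret, body, signature_hex):
            return "rejected"
        payload = json.loads(body)
        message_id = payload["message_id"]
        if message_id in self.seen_ids:
            return "duplicate"  # a retry of something already processed
        self.seen_ids.add(message_id)
        return "accepted"


# Usage with a made-up secret and payload.
secret = b"test-secret"
body = b'{"message_id": "<abc123@example.com>"}'
signature = hmac.new(secret, body, hashlib.sha256).hexdigest()

handler = WebhookHandler(secret)
first = handler.handle(body, signature)
second = handler.handle(body, signature)
forged = handler.handle(body, "00" * 32)
```

Verify against the exact bytes you received, before parsing the JSON, because re-serializing the payload can change whitespace and break the signature. In a real service the set of seen IDs would live in shared storage with an expiry.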
Design tips that make email automation boring (in a good way)
The goal is not to “parse email perfectly,” it’s to make your automation predictable.
Prefer isolation and correlation
If multiple test runs or agent sessions share an inbox, you reintroduce the hardest problem: figuring out which message belongs to which run. Isolated inboxes avoid mailbox searching entirely.
Assert on intent, not presentation
HTML changes constantly. Your assertions should target stable properties:
- Sender domain
- Subject intent
- Presence of a single OTP
- A verification link whose host is in an allowlist
Keep the raw message available for debugging
When something fails, you want to know:
- Did the message arrive?
- What headers did it have?
- Did you parse the correct MIME part?
This is where “raw plus normalized JSON” is helpful. The automation runs on normalized fields, while engineers debug with the raw context.
Where this leaves you
To open an email programmatically in 2026, you have two realistic options:
- Become an email parsing expert (RFC 5322, MIME edge cases, encoding quirks, security pitfalls)
- Use an inbox abstraction that already does the normalization and gives you JSON that your tests and agents can consume
If your primary need is agent workflows and QA reliability, the winning strategy is usually: treat email like an event stream, isolate inboxes per run, and consume structured JSON.
If you want to implement this with Mailhook, start with the contract in Mailhook llms.txt and design your tools around deterministic waits (webhook-first, polling fallback) and minimal artifact extraction.