How to Debug Missing Emails in CI Pipelines Fast

A missing email in CI is rarely a single problem. More often, the message was sent to the wrong recipient, routed to the wrong inbox, delivered after the test deadline, accepted by the inbox provider but dropped by your webhook handler, or found by polling but rejected by an overly strict matcher.

The fastest way to debug it is to stop asking “where is the email?” and start proving one checkpoint at a time. In a reliable CI pipeline, every email-dependent test should leave enough evidence to answer three questions within minutes:

Was the exact inbox created for this attempt?
Did the application send to that exact address?
Did the message reach the inbox API, and did the CI consumer process it?

This guide gives you a practical runbook for debugging missing emails in CI pipelines fast, plus the instrumentation to make the next failure self-explanatory.

Start with the email delivery checkpoint map

Before changing timeouts or rerunning the test, map the flow. A CI email test usually crosses six checkpoints:

Checkpoint	What you need to prove	Typical evidence
Inbox provisioned	The test created or reserved an isolated inbox	`inbox_id`, email address, creation time, expiry time
Address used	The app used the exact generated email address	App request log, user record, test input snapshot
Send accepted	The app or email provider accepted the send job	Send event, queue job ID, provider message ID
Domain routable	The recipient domain routes inbound mail correctly	MX lookup, domain config, provider domain status
Message received	The inbox provider received and stored the message	Message JSON, `message_id`, `received_at`, recipient fields
Consumer processed	Webhook or polling found the right message and extracted the artifact	Webhook delivery ID, polling transcript, matcher result, OTP or link extraction log

Debugging is fast when you work through the checkpoints in order, from inbox provisioning to consumer processing. If you skip straight to parsing when the app never sent the email, you waste time. If you keep resending while the webhook handler is rejecting signed payloads, you create duplicates and hide the original failure.

This is also why disposable, API-created inboxes are easier to debug than shared mailboxes. With Mailhook, a test can create a disposable inbox via API, receive emails as structured JSON, consume them through real-time webhooks or a polling API, and correlate everything to a concrete inbox resource. For exact integration fields and API semantics, use the canonical Mailhook llms.txt reference.

The 10-minute triage runbook

When a CI job fails with “email not received,” do not rerun it first. Preserve the failure. A rerun may pass and erase the conditions you need to diagnose.

Minute 0 to 2: freeze the attempt context

Every failure should start with a small evidence bundle. If your test harness does not already emit one, add it before deeper debugging.

Capture these fields:

Field	Why it matters
`ci_run_id` and `attempt_id`	Separates retries and parallel jobs
`test_name`	Identifies the failing flow and expected template
`inbox_id`	Lets you query the correct inbox instead of searching globally
`email_address`	Confirms the exact recipient given to the app
`created_at` and `expires_at`	Reveals expiry and late-arrival issues
`correlation_token`	Helps match the right message when templates are similar
`send_triggered_at`	Establishes when the app should have sent the email
`deadline_at`	Shows whether the test budget was too short

The inbox_id is especially important. A bare email address is not enough in parallel CI because retries can reuse similar addresses, old messages can match broad filters, and multiple workers can race on the same mailbox. An inbox-first model gives you a concrete resource to inspect.
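
A minimal sketch of how a harness might assemble that bundle at the start of each attempt. The function name, its parameters, and the shape of the `inbox` descriptor are assumptions for illustration, not a documented provider response; only the output field names come from the table above.

```javascript
// Hypothetical helper. The inbox descriptor shape
// ({ inbox_id, email_address, created_at, expires_at }) is an assumption.
function buildAttemptContext({ inbox, ciRunId, attemptId, testName, budgetMs, now = new Date() }) {
  const required = ["inbox_id", "email_address", "created_at", "expires_at"];
  const missing = required.filter((key) => !inbox[key]);
  if (missing.length > 0) {
    // Fail early: a bundle without an inbox ID cannot be debugged later.
    throw new Error(`inbox descriptor is missing: ${missing.join(", ")}`);
  }
  return {
    ci_run_id: ciRunId,
    attempt_id: attemptId,
    test_name: testName,
    inbox_id: inbox.inbox_id,
    email_address: inbox.email_address,
    created_at: inbox.created_at,
    expires_at: inbox.expires_at,
    // Unique per attempt, so similar templates cannot be confused.
    correlation_token: `${ciRunId}-${attemptId}`,
    send_triggered_at: null, // filled in when the test triggers the send
    deadline_at: new Date(now.getTime() + budgetMs).toISOString(),
  };
}
```

Writing this object to the CI job log before the send is triggered means the evidence exists even if the test process crashes while waiting.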

Minute 2 to 4: query the inbox API directly

Use polling as a diagnostic tool even if your normal pipeline is webhook-first. This separates “message never arrived” from “message arrived but the webhook path failed.”

If polling finds the message, the sending and inbound routing path worked. Focus on webhook delivery, signature verification, queue processing, matchers, and parsing.

If polling does not find the message, focus on the app send path, the recipient address, domain routing, provider acceptance, and message timing.

A provider-agnostic diagnostic loop looks like this:

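// Resolve after `ms` milliseconds; used to pace the diagnostic poll.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls one inbox until a matching message appears or the deadline passes.
// `mailClient.listMessages` is a provider-agnostic placeholder, not a real SDK call.
async function diagnoseInbox(mailClient, ctx) {
  const started = Date.now();
  const deadlineMs = 60_000;
  // Keyed by message_id so repeated polls do not duplicate evidence.
  const seen = new Map();

  while (Date.now() - started < deadlineMs) {
    const page = await mailClient.listMessages({
      inboxId: ctx.inbox_id,
      since: ctx.created_at
    });

    for (const message of page.messages) {
      seen.set(message.message_id, {
        message_id: message.message_id,
        received_at: message.received_at,
        from: message.from,
        subject: message.subject,
        recipients: message.recipients
      });
    }

    // Skip a signal the context does not define; `includes(undefined)`
    // would otherwise search for the literal string "undefined".
    const match = page.messages.find((message) =>
      (ctx.correlation_token && message.text?.includes(ctx.correlation_token)) ||
      (ctx.expected_subject_hint && message.subject?.includes(ctx.expected_subject_hint))
    );

    if (match) {
      return { status: "received", match, seen: [...seen.values()] };
    }

    await sleep(2000);
  }

  return { status: "not_received_before_deadline", seen: [...seen.values()] };
}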

The point is not to replace your production waiting strategy. The point is to collect hard evidence. For production-grade polling, use cursors, bounded timeouts, and dedupe rules. Mailhook’s guide to polling with cursors, timeouts, and dedupe covers that pattern in more depth.

Minute 4 to 6: confirm the app sent to the exact recipient

Many “missing email” failures are really “wrong recipient” failures. The UI may show one address while the backend sends to another. A test may create a fresh inbox, but the application may reuse an old user record. A signup retry may silently suppress email because the account already exists.

Check the application logs for the envelope recipient, not only the To: header. In SMTP, routing is based on the envelope recipient, while headers are message content. The distinction is part of the SMTP model described in RFC 5321. In automated tests, logging only the visible To: header can mislead you.

Look for these send-path failures:

Symptom	Likely cause	Fast check	Fix
No send event exists	The app never triggered email	Search logs by `attempt_id` or user ID	Assert the send job was enqueued before waiting
Send event uses old address	User fixture reused stale data	Compare app user email to `email_address`	Create a new user per attempt or reset state
Send was suppressed	Rate limit, duplicate account, cooldown, feature flag	Check provider or app suppression reason	Make test accounts unique and log suppression decisions
Send accepted after test timed out	Async queue delay	Compare `send_triggered_at`, send accepted time, and CI deadline	Start waiting after trigger and use a realistic deadline
Header recipient differs from envelope recipient	Template or provider rewriting	Log envelope recipient at send boundary	Match on inbox recipient, not display headers
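
The checks in the table above can be sketched as one classifier that runs against whatever send record your application logs. The `sendEvent` shape (`envelope_recipient`, `header_to`, `accepted_at`, `suppressed_reason`) is a hypothetical log format, and comparing addresses case-insensitively is an assumption that suits most test setups.

```javascript
// Hypothetical classifier: maps an app-side send record to one symptom.
const normalize = (value) => String(value || "").trim().toLowerCase();

function classifySendEvent(ctx, sendEvent) {
  if (!sendEvent) return "no_send_event";
  if (sendEvent.suppressed_reason) return "send_suppressed";

  const expected = normalize(ctx.email_address);
  // Routing follows the envelope recipient, so check it before the header.
  if (normalize(sendEvent.envelope_recipient) !== expected) {
    return "wrong_envelope_recipient";
  }
  if (Date.parse(sendEvent.accepted_at) > Date.parse(ctx.deadline_at)) {
    return "accepted_after_deadline";
  }
  const header = normalize(sendEvent.header_to);
  if (header && header !== expected) return "header_differs_from_envelope";
  return "send_ok";
}
```

A result other than `send_ok` means the problem sits before the inbox, so waiting longer or polling harder is unlikely to help.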

If you use LLM agents to execute signup or login flows, also log the tool calls that submitted the email address. Agents can accidentally paste an old address, include trailing punctuation, or retry a previous step. Keep the model-facing tool contract small and deterministic, for example create_inbox, submit_email, and wait_for_message.

Minute 6 to 7: check custom domain routing

If you use a shared Mailhook domain, you can usually skip DNS debugging. If you use a custom domain, missing messages often come from MX mistakes, DNS propagation, or sending to the wrong subdomain.

Run an MX lookup from the CI environment or a shell with similar DNS behavior:

dig MX inbound.example.test

You are checking that the test domain or subdomain points to the intended inbound provider. Also verify that the test sent to the same domain you configured. It is common to configure ci-mail.example.com and accidentally send to mail-ci.example.com.

For custom-domain test setups, keep the domain as configuration rather than hardcoding it into tests or agent prompts. That makes it easier to switch between shared domains, staging subdomains, and dedicated domains without changing test logic.
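
If you would rather run the MX check from the test harness than from a shell, a sketch along these lines may help. It uses Node's built-in `dns.promises.resolveMx`; the function name, the injectable resolver, and the example hostnames are assumptions for illustration, and the expected exchange should come from your own configuration.

```javascript
const dns = require("node:dns").promises;

// Checks that `domain` has an MX record at or under `expectedExchange`.
// The resolver is injectable so the check can be exercised without network access.
async function checkMx(domain, expectedExchange, resolveMx = (name) => dns.resolveMx(name)) {
  let records;
  try {
    records = await resolveMx(domain);
  } catch (err) {
    return { domain, ok: false, reason: `lookup_failed:${err.code || err.message}`, exchanges: [] };
  }
  const expected = expectedExchange.toLowerCase();
  const exchanges = [...records]
    .sort((a, b) => a.priority - b.priority)
    .map((record) => record.exchange.toLowerCase().replace(/\.$/, ""));
  const ok = exchanges.some((ex) => ex === expected || ex.endsWith(`.${expected}`));
  return { domain, ok, reason: ok ? "mx_matches" : "mx_points_elsewhere", exchanges };
}
```

Calling `checkMx(process.env.TEST_MAIL_DOMAIN, process.env.EXPECTED_MX)` (both variable names are hypothetical) keeps the domain in configuration and puts the lookup result into the same evidence bundle as the rest of the attempt.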

Minute 7 to 8: inspect webhook delivery separately from message receipt

A webhook-first pipeline is the right default for low-latency CI, but webhooks introduce their own failure modes. The key debugging move is to separate inbound receipt from webhook consumption.

If polling shows the message but the test still failed, inspect the webhook path:

Webhook symptom	Likely cause	What to inspect
No webhook request reached your service	URL misconfigured, network block, tunnel down	Webhook endpoint config, ingress logs, CI network rules
Webhook returned non-2xx	Handler crash, schema mismatch, dependency failure	HTTP status, exception logs, payload sample ID
Signature verification failed	Raw body changed before verification, wrong secret, stale timestamp	Raw-body capture, signing headers, secret version
Handler returned 2xx but test did not unblock	Async queue dropped event or wrong correlation key	Queue logs, idempotency key, inbox ID mapping
Duplicate webhook deliveries confused the test	At-least-once delivery without dedupe	Delivery ID dedupe, message ID dedupe, artifact consume-once logic

Treat webhook verification as a gate. Verify first, then parse. If you need implementation details, see Mailhook’s signed webhook verification guide, and use the llms.txt integration reference for the current Mailhook contract.

During debugging, do not disable signature verification permanently. Instead, log a safe verification result, such as signature_valid=false, timestamp_skew_ms=..., and secret_version=.... Never log webhook secrets.
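
As a rough illustration of a safe verification result, the sketch below assumes a generic scheme: an HMAC-SHA256 over `timestamp.rawBody`, hex-encoded. That scheme, the parameter names, and the five-minute tolerance are assumptions, not Mailhook's documented contract, so take the real header names and signing input from the llms.txt reference.

```javascript
const crypto = require("node:crypto");

// Returns only loggable facts about the verification, never the secret.
// ASSUMPTION: signature = hex(HMAC-SHA256(secret, `${timestampMs}.${rawBody}`)).
function verifyWebhook({ rawBody, signatureHex, timestampMs, secret, secretVersion,
                         nowMs = Date.now(), toleranceMs = 5 * 60_000 }) {
  const expected = crypto
    .createHmac("sha256", secret)
    .update(`${timestampMs}.${rawBody}`)
    .digest();
  const received = Buffer.from(String(signatureHex), "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  const signatureValid =
    received.length === expected.length && crypto.timingSafeEqual(received, expected);
  const skewMs = Math.abs(nowMs - timestampMs);
  return {
    signature_valid: signatureValid,
    timestamp_fresh: skewMs <= toleranceMs,
    timestamp_skew_ms: skewMs,
    secret_version: secretVersion,
  };
}
```

The handler should reject the request unless both `signature_valid` and `timestamp_fresh` are true, and it must pass in the raw request bytes rather than a re-serialized JSON body.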

Minute 8 to 10: test the matcher and parser against the actual JSON

If the message exists but your test reports “missing,” the matcher may be too narrow, too broad, or pointed at the wrong field.

Common matcher failures include:

Failure	Example	Better approach
Subject-only matching	Template changed from “Verify your email” to “Your login code”	Match by inbox ID plus correlation token plus sender hint
HTML-only extraction	OTP exists in text, but parser scrapes a changed HTML node	Prefer structured JSON and `text/plain` when possible
Single regex for OTP	Regex captures a support ticket number instead of the code	Score candidates by context, length, and nearby words
Ignoring late messages	Message arrived 5 seconds after deadline	Record `received_at` and classify as late, not missing
Broad shared-inbox search	Test picks another worker’s verification email	Use one disposable inbox per attempt

For CI and agents, parse the email as data. Mailhook returns structured JSON emails, which means your test can assert on fields and extracted artifacts instead of rendering HTML or scraping a human mailbox. This reduces brittle failures and makes artifacts safe to store as CI evidence.
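
To make the "score candidates by context" idea from the table concrete, here is a small sketch that works on the plain-text body. The scoring weights, the 40-character context window, the hint words, and the acceptance threshold are illustrative assumptions that you would tune for your own templates.

```javascript
// Hypothetical OTP extractor: scores every digit run instead of trusting one regex.
function extractOtp(text, { length = 6, hints = ["code", "otp", "verification", "verify"] } = {}) {
  const candidates = [];
  for (const match of text.matchAll(/\b\d{4,8}\b/g)) {
    const value = match[0];
    const start = Math.max(0, match.index - 40);
    const context = text.slice(start, match.index + value.length + 40).toLowerCase();
    let score = 0;
    if (value.length === length) score += 2; // expected code length
    if (hints.some((hint) => context.includes(hint))) score += 2; // nearby words
    if (/ticket|order|invoice|reference/.test(context)) score -= 2; // likely not a code
    candidates.push({ value, score });
  }
  candidates.sort((a, b) => b.score - a.score);
  const best = candidates[0];
  // Return every candidate so a failed extraction can explain itself.
  return { otp: best && best.score >= 3 ? best.value : null, candidates };
}
```

Because the function returns the scored candidates alongside the decision, a failed run can record why each number was rejected instead of reporting only that no code was found.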

Use the “arrived vs processed” split to choose the next fix

Once you run the diagnostic poll, most failures fall into one of four buckets.

Result	Meaning	Next fix
No message in inbox, no app send event	The email was never sent	Fix the app trigger, fixture state, or email queue
No message in inbox, send event exists	Delivery or routing issue	Check recipient, suppression, custom domain MX, sender provider logs
Message in inbox, webhook absent or failed	Ingestion-to-consumer issue	Fix webhook URL, signature verification, handler status, queue processing
Message in inbox, webhook processed, test failed	Matcher or extraction issue	Tighten correlation, update parser, store artifact-level debug output

This split prevents the most common anti-pattern: increasing the CI timeout for every failure. More time only helps if the message arrives late. It does not fix a wrong address, an expired inbox, a failing webhook signature check, or a parser that rejects the actual message.
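
The four buckets can be encoded directly, so the CI log names the bucket rather than a generic timeout. This is a sketch; the function name, the three boolean inputs, and the bucket labels are hypothetical and would be derived from your own send logs, diagnostic poll, and webhook records.

```javascript
// Hypothetical classifier mirroring the "arrived vs processed" table.
function classifyFailure({ messageInInbox, sendEventExists, webhookProcessed }) {
  if (!messageInInbox) {
    return sendEventExists
      ? { bucket: "delivery_or_routing", next: "check recipient, suppression, MX, sender provider logs" }
      : { bucket: "never_sent", next: "fix the app trigger, fixture state, or email queue" };
  }
  if (!webhookProcessed) {
    return { bucket: "webhook_path", next: "fix webhook URL, signature verification, handler status, queue" };
  }
  return { bucket: "matcher_or_extraction", next: "tighten correlation, update parser, store debug output" };
}
```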

Build a CI artifact bundle for every email failure

Fast debugging depends on artifacts. When an email wait fails, attach a compact JSON file to the CI job. Keep it safe, but make it complete enough for diagnosis.

A useful artifact contains:

Inbox descriptor: inbox_id, email address, domain, creation time, expiry time
Attempt metadata: CI run ID, worker ID, test name, retry number, correlation token
Send evidence: app event ID, send accepted time, envelope recipient, suppression reason if present
Receive evidence: messages seen by polling, message IDs, received times, sender, subject, recipient summary
Webhook evidence: delivery IDs, HTTP statuses, verification result, handler correlation key
Matcher evidence: which rules ran, which fields were checked, why candidates were rejected
Timing evidence: trigger time, first poll time, webhook time, deadline, late-arrival classification

Avoid storing raw secrets, full auth links, session tokens, or complete email bodies unless your retention policy allows it. For LLM-agent workflows, store a minimized view such as “OTP extracted,” “verification URL host,” or “candidate rejected because correlation token missing.” Do not hand untrusted raw email content directly to a model.
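
One way to produce such a minimized view is sketched below. The message fields (`message_id`, `received_at`, `from`, `subject`, `text`) follow the diagnostic example earlier in this guide and are assumptions about your provider's JSON. The output keeps presence flags and URL hosts, not the values themselves.

```javascript
// Hypothetical reducer: keeps diagnostic facts, drops codes, tokens, and bodies.
function minimizeMessage(message, ctx) {
  const text = message.text || "";
  const urls = text.match(/https?:\/\/[^\s"'<>]+/g) || [];
  const hosts = urls
    .map((url) => {
      try {
        return new URL(url).host; // host only, never path or query
      } catch {
        return null; // ignore strings that are not parseable URLs
      }
    })
    .filter(Boolean);
  return {
    message_id: message.message_id,
    received_at: message.received_at,
    from: message.from,
    subject: message.subject,
    body_length: text.length,
    correlation_token_present: Boolean(ctx.correlation_token) && text.includes(ctx.correlation_token),
    otp_like_value_present: /\b\d{4,8}\b/.test(text),
    url_hosts: [...new Set(hosts)],
  };
}
```

Subjects can also carry sensitive data in some templates, so drop or hash that field if your retention policy requires it.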

If your team already uses Mailhook, this artifact can be built around the inbox ID and structured JSON message output. You can receive low-latency events through webhooks, fall back to polling for deterministic waits, and use signed payloads to prove the event came from the expected source.

Prevent “missing email” bugs before they happen

The best debugging strategy is a harness that makes failures obvious. For CI pipelines, these invariants remove most ambiguity.

Use one inbox per attempt

One inbox per test attempt is the simplest way to eliminate collisions. Do not reuse a shared mailbox across parallel workers. Do not reuse the same inbox across retries unless you explicitly model the retry relationship and dedupe old messages.

A good attempt descriptor includes the inbox ID, email address, attempt ID, and expiry. Mailhook’s programmable temp inboxes are designed for this style of automation: create the inbox via API, use it for one flow, consume structured JSON, then let the workflow clean up according to your lifecycle policy.

Prefer webhook-first, keep polling as fallback

Webhooks give fast feedback and reduce wasteful polling. Polling gives you deterministic recovery when the webhook consumer is unavailable or when you need to debug the receipt path. The strongest pattern is webhook-first with polling fallback.

In CI, this means the test can unblock when a verified webhook arrives, but it can also poll the inbox until a deadline. Both paths should use the same matcher and idempotency rules so they cannot process the same artifact twice.
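
A compact sketch of that pattern follows. It assumes your webhook handler re-emits verified messages on a Node `EventEmitter` as a `"message"` event and that `poll` is a function you supply that lists the inbox's messages; both are hypothetical wiring, not a provider API. Whichever path finds a match first settles the wait, and the other becomes a no-op.

```javascript
const { EventEmitter } = require("node:events");

// Waits for a matching message via webhook events or polling, whichever is first.
function waitForMessage({ events, poll, matches, timeoutMs, pollEveryMs = 1000 }) {
  return new Promise((resolve) => {
    let settled = false;
    let pollTimer;
    let deadlineTimer;

    const finish = (result) => {
      if (settled) return; // consume once: the losing path is ignored
      settled = true;
      clearInterval(pollTimer);
      clearTimeout(deadlineTimer);
      events.off("message", onWebhook);
      resolve(result);
    };
    const onWebhook = (message) => {
      if (matches(message)) finish({ status: "received", via: "webhook", message });
    };

    events.on("message", onWebhook);
    // Note: a slow poll can overlap the next tick; acceptable for a sketch.
    pollTimer = setInterval(async () => {
      try {
        const match = (await poll()).find((message) => matches(message));
        if (match) finish({ status: "received", via: "polling", message: match });
      } catch (err) {
        // Polling errors are tolerated until the deadline fires.
      }
    }, pollEveryMs);
    deadlineTimer = setTimeout(() => finish({ status: "deadline_exceeded" }), timeoutMs);
  });
}
```

Passing the same `matches` function to both paths is what keeps the webhook and polling results consistent with each other.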

Make timeouts explicit and classify late arrivals

A fixed sleep hides timing information. Replace sleep(30000) with “wait until deadline and record what happened.” If a message arrives after the deadline, mark it as late. Late is different from missing, and it points to different fixes.

For example, a late email may require queue tuning, a larger test budget for a specific flow, or a resend policy. A truly missing email points to routing, send suppression, or inbox lifecycle problems.
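
The late-versus-missing distinction is a small calculation once `received_at` and `deadline_at` are recorded. The sketch below assumes both are ISO 8601 timestamps; the function name and status labels are hypothetical.

```javascript
// Classifies one wait result as on time, late, or missing.
function classifyArrival({ received_at, deadline_at }) {
  if (!received_at) return { status: "missing" };
  const lateByMs = Date.parse(received_at) - Date.parse(deadline_at);
  return lateByMs > 0
    ? { status: "late", late_by_ms: lateByMs }
    : { status: "on_time", late_by_ms: 0 };
}
```

To see late arrivals at all, the harness has to keep checking the inbox briefly after the deadline, for example with one final poll before it writes the evidence bundle.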

Match narrowly, but explain rejections

A good matcher is narrow enough to avoid another worker’s email, but observable enough to explain why it rejected a candidate. Use layered signals: inbox ID, recipient, correlation token, sender hint, subject hint, and expected artifact type.

Do not rely on a single subject line or a single regex. Email templates change. Security footers change. HTML structure changes. Structured JSON plus conservative artifact extraction gives you a more stable base.
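
A layered matcher that explains its rejections might look like the sketch below. It assumes `message.recipients` is an array of address strings and that the test context carries an `expected_sender_hint`; both are hypothetical names, as is the choice of which three signals to check.

```javascript
// Hypothetical matcher: every signal is evaluated and every failure is named.
function evaluateCandidate(message, ctx) {
  const lower = (value) => String(value || "").toLowerCase();
  const checks = {
    recipient: (message.recipients || []).some(
      (recipient) => lower(recipient) === lower(ctx.email_address)
    ),
    correlation_token:
      Boolean(ctx.correlation_token) && String(message.text || "").includes(ctx.correlation_token),
    sender_hint:
      Boolean(ctx.expected_sender_hint) && lower(message.from).includes(lower(ctx.expected_sender_hint)),
  };
  const rejectedBecause = Object.keys(checks).filter((name) => !checks[name]);
  return {
    message_id: message.message_id,
    accepted: rejectedBecause.length === 0,
    rejected_because: rejectedBecause,
  };
}
```

Storing the `rejected_because` list for every candidate turns "no matching email" into a specific statement, such as a message that reached the right inbox from the right sender but lacked the correlation token.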

A practical Mailhook-based debug flow

With Mailhook, the debugging flow can be compact:

Create a disposable inbox for the CI attempt using the API.
Store the returned inbox descriptor in the test context.
Trigger the application email using the returned email address.
Wait webhook-first, verifying signed payloads before processing.
Poll the inbox as a fallback and as a diagnostic path.
Extract only the required artifact, such as an OTP or magic link, from structured JSON.
Attach a minimized evidence bundle if the wait fails.

Mailhook supports the primitives this flow needs: disposable inbox creation via API, structured JSON email output, REST access, real-time webhook notifications, polling, shared domains, custom domain support, signed payloads, and batch email processing. For exact request shapes, webhook signing details, and current API semantics, reference Mailhook’s llms.txt.

Frequently Asked Questions

Why do emails pass locally but go missing in CI? CI adds parallelism, retries, different network paths, colder queues, and stricter time budgets. A local test may reuse a mailbox safely because only one process is running, while CI workers can collide, select stale messages, or time out before an async email job completes.

Should I use webhooks or polling when debugging missing emails? Use both, but for different purposes. Webhooks are best for low-latency delivery in the normal path. Polling is the fastest diagnostic check to prove whether the message reached the inbox provider even if your webhook handler failed.

How long should a CI test wait for an email? There is no universal timeout. Use a deadline based on your app’s send queue and provider behavior, then record whether the message was received before or after the deadline. Avoid fixed sleeps because they hide whether the problem is latency, routing, or processing.

What if the email arrived, but OTP extraction failed? Treat that as a parser or matcher failure, not a missing-email failure. Store the structured message fields, candidate OTPs, rejection reasons, and the final extraction decision. Prefer text content and structured JSON over brittle HTML scraping.

Can an LLM agent debug missing emails safely? Yes, if the agent uses constrained tools and minimized outputs. Let tools create inboxes, wait for messages, and return typed artifacts or diagnostic statuses. Do not expose raw untrusted email HTML directly to the model, and verify signed webhooks before processing events.

Turn missing-email failures into explainable CI evidence

If your CI pipeline still treats email as a shared mailbox and a fixed sleep, missing emails will remain hard to debug. The faster pattern is to give every attempt its own programmable inbox, receive messages as structured JSON, use webhooks with polling fallback, and attach enough evidence to classify the failure immediately.

Mailhook provides those building blocks for developers, QA automation, and LLM-agent workflows: disposable inboxes via API, JSON email output, real-time webhooks, polling, signed payloads, shared domains, and custom domain support. Start with the Mailhook homepage or use the llms.txt integration reference to wire the exact contract into your CI harness.