OpenClaw + Mailhook: Tooling Pattern for Email-Driven Agents

Email is one of the last “human-shaped” integration surfaces that agents still have to touch. Sign-up links, OTPs, magic links, invoices, support notifications, vendor invites, and “click to confirm” flows all arrive as messy MIME blobs that are hard to wait for deterministically and risky to hand to an LLM.

A practical way out is to treat inbound email as an event stream your agent can subscribe to, using two primitives:

An isolated inbox per attempt (so parallel runs never collide)
A machine-readable message contract (so your agent consumes JSON, not HTML)

This post shows a concrete tooling pattern for OpenClaw + Mailhook: OpenClaw orchestrates the agent and tool calls, while Mailhook provides programmable disposable inboxes and delivers received emails as structured JSON.

For the canonical Mailhook API and semantics, rely on the integration reference: mailhook.co/llms.txt.

Why email-driven agents fail in practice

Most agent stacks start with “just generate an email address,” then quickly hit reliability and safety problems:

Mailbox collisions: two runs share an address (or plus-tag) and read each other’s verification emails.
Non-deterministic waiting: fixed sleeps work until they don’t, then CI flakes or the agent loops.
HTML scraping fragility: templates change, links move, and parsers break.
Security exposure: giving the agent full HTML bodies invites prompt injection and link-based attacks.

The key is to turn “email” into a constrained tool interface with explicit lifecycle and waiting semantics.

The OpenClaw + Mailhook pattern (high level)

OpenClaw (as your agent runtime) should not “read email” by logging into a mailbox. Instead, define an email toolset that OpenClaw can call, backed by Mailhook.

The pattern looks like this:

Provision a disposable inbox via API and store (inbox_id, email, expires_at) in the agent run state.
Trigger the external system to send the email (sign-up, password reset, invite, etc.) using that address.
Wait deterministically for arrival (webhook-first if you have infrastructure, polling as fallback).
Extract a minimal artifact (OTP, verification URL), not the entire email.
Expire and clean up the inbox to reduce data retention and future confusion.

[Figure: architecture diagram. An AI agent in an OpenClaw loop calls tools; the Mailhook API creates a disposable inbox; inbound email from an external service flows into Mailhook; a webhook or polling path delivers a JSON message back to the agent’s tool handler.]

Mailhook provides the core primitives needed for this pattern (disposable inbox creation, JSON output, webhook notifications, polling API, signed payloads, shared domains, and custom domain support). OpenClaw provides the control loop and tool boundary.

Define a tool contract that agents can use safely

Instead of a single “check inbox” tool, split responsibilities into small, auditable tools. This reduces prompt injection surface area and makes retries sane.

A minimal contract:

email.create_inbox(ttl_seconds, metadata) -> { inbox_id, email, expires_at }
email.wait_for_message(inbox_id, matcher, deadline_ms) -> { message_id, received_at, artifacts, provenance }
email.extract_verification_artifact(message, policy) -> { type, value }
email.expire_inbox(inbox_id) -> { ok }

You can implement these tools in OpenClaw however your stack prefers. The important part is the semantics:

Inbox-per-attempt: every retry gets a new inbox.
Deadline-based waits: no infinite waits, no fixed sleeps.
Narrow matchers: select the right message deterministically.
Artifact-first extraction: minimize what the model sees.
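
The inbox-per-attempt rule can be enforced in code rather than left to the prompt. The sketch below is one possible shape, not an OpenClaw or Mailhook API: `withInboxPerAttempt` is a hypothetical helper, and `tools` is assumed to be any object exposing `create_inbox` and `expire_inbox` with the contract above. The 600-second TTL is an arbitrary illustration value.

```javascript
// Hypothetical helper: every attempt gets a fresh inbox, and the inbox is
// always expired afterwards, whether the attempt succeeded or failed.
async function withInboxPerAttempt(tools, maxAttempts, attemptFn) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const inbox = await tools.create_inbox({ ttl_seconds: 600, metadata: { attempt } });
    try {
      return await attemptFn(inbox, attempt);
    } catch (err) {
      lastError = err; // remember the failure and retry with a new inbox
    } finally {
      await tools.expire_inbox({ inbox_id: inbox.inbox_id });
    }
  }
  throw lastError;
}
```

Because the inbox is created inside the loop, a retry can never read a message that belonged to an earlier attempt.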

What a “matcher” should look like

A matcher is the difference between “read the newest email” (flaky) and “read the correct email” (deterministic).

Good matcher inputs for agent workflows:

Expected sender (or sender domain)
Expected subject prefix
A correlation token you control (for example in the local-part, or a custom header if you send the email)

Avoid matchers that require rendering HTML or “understanding” a full email thread.
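
As a rough illustration, a matcher can be a plain object checked by deterministic code. Everything here is an assumption made for the example: the message fields (`from`, `to`, `subject`, `received_at`), the matcher keys (`senderDomain`, `subjectPrefix`, `correlationToken`), and the choice to prefer the earliest match. Check the real field names against mailhook.co/llms.txt.

```javascript
// Assumes `from` and `to` are bare addresses such as "no-reply@example.com".
function matchesMessage(message, matcher) {
  const from = (message.from || "").toLowerCase();
  const to = (message.to || "").toLowerCase();
  const subject = message.subject || "";

  if (matcher.senderDomain && !from.endsWith("@" + matcher.senderDomain.toLowerCase())) {
    return false;
  }
  if (matcher.subjectPrefix && !subject.startsWith(matcher.subjectPrefix)) {
    return false;
  }
  if (matcher.correlationToken) {
    // Look for the token only in the local-part of the recipient address.
    const localPart = to.split("@")[0];
    if (!localPart.includes(matcher.correlationToken.toLowerCase())) return false;
  }
  return true;
}

// Pick the earliest matching message so repeated calls give the same answer.
function selectBest(messages, matcher) {
  const matches = messages.filter((m) => matchesMessage(m, matcher));
  matches.sort((a, b) => Date.parse(a.received_at) - Date.parse(b.received_at));
  return matches[0] || null;
}
```

If your sender resends codes and only the latest one is valid, sorting in the opposite direction may suit you better; what matters is that the rule is fixed and lives in code.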

Webhook-first, polling fallback (but implemented as one OpenClaw tool)

In production, webhooks are usually the best default because they reduce latency and cost, and they scale well with parallel runs. Polling is still valuable as a safety net.

A clean OpenClaw pattern is to keep the agent-facing interface stable:

wait_for_message(...) always exists
Internally, it waits on a webhook event if available
If the event does not arrive in time, it falls back to polling until the deadline

This matters because it keeps your agent prompt and plans consistent across environments:

Local dev might be polling-only.
CI might be polling-only.
Staging/prod can be webhook-first.

Mailhook supports both webhook notifications and a polling API, so you can implement this hybrid without changing the agent tool surface. For exact endpoints and payload details, use mailhook.co/llms.txt.
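
The webhook side of that hybrid needs somewhere for a waiting tool call and an incoming webhook request to meet. Below is a minimal in-memory sketch of such a meeting point. `createWebhookBus` and its `publish`/`wait` methods are hypothetical names, not part of Mailhook or OpenClaw, and a multi-process deployment would likely need a shared queue or database instead.

```javascript
// matchFn(message, matcher) decides whether a message satisfies a matcher.
function createWebhookBus(matchFn = () => true) {
  const waiters = new Map();  // inbox_id -> Set of pending waiters
  const buffered = new Map(); // inbox_id -> messages that arrived before any wait

  return {
    // Call from your webhook HTTP handler, after verifying the request.
    publish(inbox_id, message) {
      for (const waiter of waiters.get(inbox_id) || []) {
        if (matchFn(message, waiter.matcher)) {
          waiter.resolve({ message });
          return;
        }
      }
      const list = buffered.get(inbox_id) || [];
      list.push(message);
      buffered.set(inbox_id, list);
    },

    // Resolves with { message }, or with null once deadlineAt has passed.
    wait({ inbox_id, matcher, deadlineAt }) {
      const early = (buffered.get(inbox_id) || []).find((m) => matchFn(m, matcher));
      if (early) return Promise.resolve({ message: early });

      return new Promise((resolve) => {
        const set = waiters.get(inbox_id) || new Set();
        waiters.set(inbox_id, set);
        const waiter = { matcher, resolve: settle };
        const timer = setTimeout(() => settle(null), Math.max(0, deadlineAt - Date.now()));
        set.add(waiter);

        function settle(event) {
          clearTimeout(timer);
          set.delete(waiter);
          resolve(event);
        }
      });
    },
  };
}
```

Buffering matters because the email can arrive before the agent starts waiting; without it, a fast sender would turn into a guaranteed timeout.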

Implementation sketch (OpenClaw tool handlers)

The code below is intentionally provider-agnostic in shape, while pointing you to the Mailhook contract for specifics.

Tool: create_inbox

// Pseudocode. Use the exact Mailhook API described in https://mailhook.co/llms.txt
export async function create_inbox({ ttl_seconds, metadata }) {
  const res = await mailhookCreateInbox({
    ttl_seconds,
    metadata,
  })

  return {
    inbox_id: res.inbox_id,
    email: res.email,
    expires_at: res.expires_at,
  }
}

Tool: wait_for_message (hybrid)

// Pseudocode: webhook-first with polling fallback, both bounded by one deadline.
export async function wait_for_message({ inbox_id, matcher, deadline_ms }) {
  const deadlineAt = Date.now() + deadline_ms

  // 1) Prefer a webhook event if your infrastructure registers one.
  // Give the webhook only part of the budget (half here, an arbitrary split):
  // waiting until deadlineAt would leave no time for the polling fallback.
  const webhookDeadlineAt = Date.now() + Math.floor(deadline_ms / 2)
  const maybeEvent = await webhookBus.wait({ inbox_id, matcher, deadlineAt: webhookDeadlineAt })
  if (maybeEvent) return minimize(maybeEvent.message)

  // 2) Fall back to polling until the overall deadline.
  while (Date.now() < deadlineAt) {
    const page = await mailhookListMessages({ inbox_id, matcher, limit: 10 })
    const msg = selectBest(page.messages, matcher)
    if (msg) return minimize(msg)

    // Never sleep past the deadline.
    await sleep(Math.min(backoffMs(), Math.max(0, deadlineAt - Date.now())))
  }

  throw new Error("timeout_waiting_for_email")
}

function minimize(message) {
  // Critical: reduce prompt injection surface.
  return {
    message_id: message.message_id,
    received_at: message.received_at,
    artifacts: message.artifacts,
    provenance: {
      inbox_id: message.inbox_id,
      // include only what you need for correlation and debugging
    },
  }
}
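
The polling loop above leans on `sleep` and `backoffMs`, which the pseudocode leaves undefined. One possible implementation is exponential backoff with jitter, sketched below. The `createBackoff` factory and its default values (250 ms base, 5 s cap) are illustrative assumptions, not Mailhook recommendations.

```javascript
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Returns a backoffMs() function. Each call doubles the ceiling up to maxMs,
// then picks a delay between half the ceiling and the full ceiling so that
// parallel runs do not poll in lockstep.
function createBackoff({ baseMs = 250, maxMs = 5000, random = Math.random } = {}) {
  let attempt = 0;
  return function backoffMs() {
    const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
    attempt += 1;
    return Math.floor(ceiling / 2 + random() * (ceiling / 2));
  };
}
```

Create one backoff per wait (`const backoffMs = createBackoff()`) so each tool call starts again from the short delay.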

Tool: extract_verification_artifact

If Mailhook already provides extracted artifacts in its JSON representation, prefer those. If you must extract yourself, do it in deterministic code and return a tiny result.

// Pseudocode
export function extract_verification_artifact({ message, policy }) {
  // policy might be: { allow_url_hosts: [...], otp_length: 6 }
  // Choose from message.artifacts first; avoid parsing html.

  const otp = findOtp(message.artifacts, policy)
  if (otp) return { type: "otp", value: otp }

  const url = findVerificationUrl(message.artifacts, policy)
  if (url) return { type: "url", value: url }

  throw new Error("no_verification_artifact_found")
}
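
The helpers `findOtp` and `findVerificationUrl` are not defined above. A minimal sketch follows, assuming a hypothetical artifacts shape of `{ otps: [...], urls: [...] }`; the real structure of Mailhook's JSON may differ, so treat the field names as placeholders.

```javascript
// Returns the first candidate that is exactly policy.otp_length digits, or null.
function findOtp(artifacts, policy) {
  const length = policy.otp_length || 6;
  const pattern = new RegExp(`^\\d{${length}}$`);
  return (artifacts.otps || []).find((candidate) => pattern.test(candidate)) || null;
}

// Returns the first HTTPS URL whose host is on the allowlist, or null.
function findVerificationUrl(artifacts, policy) {
  const allowed = new Set(policy.allow_url_hosts || []);
  for (const candidate of artifacts.urls || []) {
    let url;
    try {
      url = new URL(candidate);
    } catch {
      continue; // skip strings that are not valid absolute URLs
    }
    if (url.protocol === "https:" && allowed.has(url.hostname)) return url.href;
  }
  return null;
}
```

Both helpers return a single small string or `null`, which keeps the tool result tiny regardless of how large the original email was.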

Make it retry-safe: idempotency and dedupe layers

Email systems retry. Webhooks retry. Your agent retries. If you do not design for duplicates, an agent will double-submit OTPs, re-click links, or loop.

Treat dedupe as three distinct layers:

Layer	What can duplicate?	Dedupe key idea	Where to enforce
Delivery	The same message delivered multiple times	`delivery_id` (or equivalent)	Webhook handler / ingestion
Message	Multiple messages that match your broad query	`message_id` and a strict matcher	`wait_for_message` selection
Artifact	The same OTP/link appears again (resends, retries)	`artifact_hash` (hash the extracted OTP/link)	Before “use artifact” step

The agent should only see a stable “artifact” output, and your code should be able to say: “we already consumed this artifact for this attempt.”
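
A small per-attempt ledger is one way to back that statement with code. The sketch below covers the delivery and artifact layers (the message layer is the matcher's job). `createAttemptLedger`, `acceptDelivery`, and `consumeArtifact` are hypothetical names, and the in-memory sets stand in for whatever durable store your runs use.

```javascript
const nodeCrypto = require("node:crypto");

// Hash type and value together so the raw OTP or link never needs to be logged.
function artifactHash(artifact) {
  return nodeCrypto
    .createHash("sha256")
    .update(`${artifact.type}:${artifact.value}`)
    .digest("hex");
}

function createAttemptLedger() {
  const seenDeliveries = new Set();
  const consumedArtifacts = new Set();

  return {
    // Delivery layer: true only the first time a delivery_id is seen.
    acceptDelivery(delivery_id) {
      if (seenDeliveries.has(delivery_id)) return false;
      seenDeliveries.add(delivery_id);
      return true;
    },
    // Artifact layer: true only the first time an artifact is consumed.
    consumeArtifact(artifact) {
      const hash = artifactHash(artifact);
      if (consumedArtifacts.has(hash)) return false;
      consumedArtifacts.add(hash);
      return true;
    },
  };
}
```

Call `consumeArtifact` immediately before the "use artifact" step and skip the step when it returns `false`.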

Security guardrails for email-driven agents

When you connect agents to inboxes, you are connecting them to untrusted input. Make the safe path the default.

Verify webhook authenticity

If you ingest Mailhook messages via webhooks, verify signatures and reject requests that fail verification. Mailhook supports signed payloads (see mailhook.co/llms.txt for canonical fields and verification requirements).
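
To show the general shape of such a check, here is a sketch that assumes a common scheme: a hex-encoded HMAC-SHA256 of the raw request body, keyed with a shared secret. This scheme is an assumption made for illustration only. Mailhook's actual algorithm, header names, and any timestamp or replay rules are defined in mailhook.co/llms.txt and may differ.

```javascript
const nodeCrypto = require("node:crypto");

// ASSUMED scheme: signatureHex = hex(HMAC-SHA256(secret, rawBody)).
// rawBody must be the exact bytes received, before any JSON parsing.
function verifySignature({ rawBody, signatureHex, secret }) {
  if (typeof signatureHex !== "string") return false;
  const expected = nodeCrypto.createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && nodeCrypto.timingSafeEqual(received, expected);
}
```

Whatever the real scheme is, two habits carry over: verify against the raw bytes rather than a re-serialized object, and compare in constant time.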

Minimize what the model can read

Do not give the agent raw HTML or a full MIME body unless you must. Prefer:

text/plain when available
pre-extracted artifacts (OTP, verification URL)
a small set of headers needed for correlation

Constrain link handling

If your agent must open a verification URL:

Allowlist hosts (your app domain, your identity vendor domain)
Block internal IP ranges and non-HTTPS URLs
Follow redirects only within policy
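
These rules can be collapsed into one check that runs before every request, including each redirect hop. The sketch below is deliberately strict and is only one way to do it: `isAllowedUrl` is a hypothetical helper, it rejects every IP-literal host rather than testing individual ranges, and it does not resolve DNS, so a hostname that points at an internal address has to be handled at the network layer.

```javascript
const net = require("node:net");

function isAllowedUrl(candidate, policy) {
  let url;
  try {
    url = new URL(candidate);
  } catch {
    return false; // not a valid absolute URL
  }
  if (url.protocol !== "https:") return false;

  // The URL parser normalizes IPv4 forms such as "2130706433" to dotted
  // notation, and IPv6 hosts keep their brackets, so both are easy to spot.
  const host = url.hostname;
  if (net.isIP(host) !== 0 || host.startsWith("[")) return false;

  return (policy.allow_url_hosts || []).includes(host);
}
```

To follow redirects within policy, one option is to make requests with automatic redirects disabled and pass each `Location` value through the same function before continuing.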

Budget and stop conditions

Email steps are a common source of “agent loops.” Put hard limits in code:

Max inbox creations per run
Max resend attempts
Max wait time per step
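
A sketch of how those limits might look in code, assuming hypothetical helper names (`createBudget`, `clampDeadline`) and budget keys of your own choosing:

```javascript
// Counts spending per kind and throws once a limit is exceeded.
function createBudget(limits) {
  const used = {};
  return {
    spend(kind) {
      if (!(kind in limits)) throw new Error(`unknown_budget:${kind}`);
      used[kind] = (used[kind] || 0) + 1;
      if (used[kind] > limits[kind]) throw new Error(`budget_exceeded:${kind}`);
      return limits[kind] - used[kind]; // remaining allowance
    },
  };
}

// Caps whatever wait the agent asks for at the configured maximum.
function clampDeadline(requestedMs, maxWaitMs) {
  return Math.max(0, Math.min(requestedMs, maxWaitMs));
}
```

Calling `budget.spend("inbox_creations")` inside the `create_inbox` handler means the limit holds even if the model keeps asking for new inboxes.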

When to use shared domains vs custom domains

Mailhook supports instant shared domains and custom domain support.

Shared domains are great for fast setup and ephemeral QA/agent runs.
Custom domains are useful when you need allowlisting, stronger environment isolation, or enterprise compatibility.

If you are choosing a domain strategy for tests and agents, the broader decision trade-offs are covered in Email domains for testing: shared vs custom.

A practical “email-driven agent” workflow example

A simple flow where an agent signs up for a product that sends an OTP:

OpenClaw calls email.create_inbox(ttl_seconds, metadata).
Agent submits the sign-up form with the returned email.
OpenClaw calls email.wait_for_message(inbox_id, matcher, deadline_ms).
OpenClaw calls email.extract_verification_artifact(...) and gets { type: "otp", value: "123456" }.
Agent submits the OTP.
OpenClaw calls email.expire_inbox(inbox_id).

This becomes robust in CI because each attempt gets its own inbox, and the wait is deterministic.
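
The same flow written as code, using stand-in tools so it runs without network access. Everything here is illustrative: `createFakeEmailTools`, `runSignup`, and the `app` object with `submitSignup`/`submitOtp` are hypothetical, and real handlers would call Mailhook as described in mailhook.co/llms.txt.

```javascript
// In-memory stand-ins for the four email tools.
function createFakeEmailTools() {
  const inboxes = new Map();
  return {
    inboxes,
    async create_inbox({ ttl_seconds }) {
      const inbox_id = `inbox_${inboxes.size + 1}`;
      inboxes.set(inbox_id, { expired: false });
      return {
        inbox_id,
        email: `${inbox_id}@example.test`,
        expires_at: new Date(Date.now() + ttl_seconds * 1000).toISOString(),
      };
    },
    async wait_for_message({ inbox_id }) {
      return {
        message_id: "msg_1",
        received_at: new Date().toISOString(),
        artifacts: { otps: ["123456"] }, // hypothetical artifacts shape
        provenance: { inbox_id },
      };
    },
    extract_verification_artifact({ message }) {
      return { type: "otp", value: message.artifacts.otps[0] };
    },
    async expire_inbox({ inbox_id }) {
      inboxes.get(inbox_id).expired = true;
      return { ok: true };
    },
  };
}

async function runSignup(email, app) {
  const inbox = await email.create_inbox({ ttl_seconds: 600, metadata: { flow: "signup" } });
  try {
    await app.submitSignup(inbox.email);
    const message = await email.wait_for_message({
      inbox_id: inbox.inbox_id,
      matcher: { subjectPrefix: "Verify" },
      deadline_ms: 60000,
    });
    const artifact = email.extract_verification_artifact({ message, policy: { otp_length: 6 } });
    await app.submitOtp(artifact.value);
    return artifact;
  } finally {
    // Expire the inbox even when a step above throws.
    await email.expire_inbox({ inbox_id: inbox.inbox_id });
  }
}
```

Putting the expiry in `finally` means a timeout or a failed OTP submission still cleans up the inbox.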

If you want deeper background on the underlying inbox-first model, see AI Mail: How Agents Use Disposable Inboxes via API.

Frequently Asked Questions

What is the main benefit of pairing OpenClaw with Mailhook for agents? You get a clean tool boundary: OpenClaw orchestrates decisions and retries, while Mailhook supplies isolated disposable inboxes and JSON email events that are deterministic and automatable.

Do I need webhooks, or can I start with polling? You can start with polling and still keep deterministic deadlines. A webhook-first design is ideal later, and you can keep the same wait_for_message tool while switching the backend strategy.

Is it safe to show full email content to an LLM agent? It is risky. Prefer a minimized JSON view and extract only the needed artifact (OTP or verification URL). Treat inbound email as hostile input.

Where do I find the exact Mailhook API endpoints and payload fields? Use the canonical integration reference: https://mailhook.co/llms.txt.

Build your first email tool for OpenClaw with Mailhook

Mailhook is designed for programmable disposable inboxes: create inboxes via API, receive emails as structured JSON, and integrate via webhooks or polling. If you are implementing an email-driven agent in OpenClaw, start by wiring the four tools above and keep the interface artifact-first.

Create an inbox and test the flow in minutes at Mailhook, and keep llms.txt open as the source of truth for API behavior and security semantics.