What is the difference between webhooks and polling for email retrieval?

Webhooks provide low-latency, event-driven delivery by calling your endpoint when emails arrive, but they require a publicly accessible endpoint and proper retry handling. With polling, your code repeatedly checks for messages; it is simpler and still works when webhook endpoints are down, at the cost of higher latency and more API calls.

How do you ensure deterministic email retrieval in automation?

Use inbox isolation (one inbox per run), correlation tokens for matching, narrow matchers based on sender and subject, and explicit timeouts instead of fixed sleeps. Avoid matching the 'latest email'; use specific criteria to identify the right message instead.

Why does security matter when LLM agents read emails?

Emails should be treated as untrusted input because they can contain malicious content, forwarded threads, or hostile links. Best practices include verifying webhook signatures, preferring text/plain over HTML, whitelisting extracted content, and giving agents only minimal artifacts such as OTPs rather than full email content.

What should a good email API JSON structure include?

Essential fields include stable message identifiers for deduplication, arrival timestamps, envelope addresses (from/to/cc), subject, normalized headers, text content (preferably text/plain), optional HTML content, attachment metadata, and optionally the raw source for debugging.

Get Email via API: Messages as JSON

Email is one of the last “human-first” interfaces that most software teams still have to automate. If you are building signup verification flows, QA tests, or LLM agents that need to complete real-world tasks, you will eventually need to get email via API, reliably, and in a format that does not require brittle HTML scraping.

A programmable inbox API solves this by converting raw inbound email into structured JSON, then delivering it to your code via webhooks (push) and/or polling (pull). This article explains the core design, what a practical JSON contract should look like, and how to wire it into deterministic, safe automation.

If you are integrating with Mailhook specifically, the canonical machine-readable integration reference is the project's llms.txt: Mailhook API contract (llms.txt).

What “get email via API” actually means (and why JSON is the point)

When developers say “get email”, they often mean one of the following:

Fetching a message from a mailbox (IMAP/POP), then parsing RFC 5322 and MIME yourself.
Receiving a message event (webhook), then storing and parsing it.
Querying an inbox by ID and getting normalized fields and content ready for automation.

For automation and agents, the hard part is rarely SMTP delivery itself. The pain is in everything after delivery:

MIME is multi-part, nested, and full of encoding edge cases.
Headers can be folded, duplicated, and attacker-controlled.
Bodies can be HTML-heavy, tracking-laden, or missing a text/plain part.
Timing is nondeterministic: you cannot “sleep 5 seconds” and expect reliability.

A good inbox API converts raw email (defined by RFC 5322 plus MIME, see RFC 2045) into a stable JSON representation, so your automation can assert on fields, not on fragile rendering.
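To make the raw-to-JSON conversion concrete, here is a minimal sketch using only Python's standard `email` package. The sample message, the addresses, and the output field names are assumptions chosen for illustration; a real provider's schema will differ and must also handle attachments, nested multiparts, and malformed input.

```python
import json
from email import message_from_string, policy

# A tiny single-part message; all values are made up for this example.
RAW = (
    "From: no-reply@example.com\n"
    "To: run-123@inbox.example\n"
    "Subject: Verify your account\n"
    "Message-ID: <abc123@example.com>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Your code is 482913\n"
)


def email_to_json(raw: str) -> dict:
    """Convert a raw RFC 5322 message into a small, flat dict."""
    msg = message_from_string(raw, policy=policy.default)
    # Prefer the text/plain part; get_body returns None if there is none.
    body = msg.get_body(preferencelist=("plain",))
    return {
        "rfc822_message_id": str(msg.get("Message-ID", "")),
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "subject": str(msg.get("Subject", "")),
        "text": body.get_content().strip() if body is not None else "",
    }


parsed = email_to_json(RAW)
print(json.dumps(parsed, indent=2))
```

Once the message is a plain dict, a test can assert on `parsed["subject"]` or search `parsed["text"]` without touching MIME at all.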

The automation-friendly model: inboxes and messages, not long-lived accounts

The key shift is to treat an inbox as a short-lived resource, closer to a message queue than a personal mailbox:

You create an inbox programmatically.
You get a routable email address for that inbox.
Your system sends email to that address.
You retrieve messages as JSON (webhook or polling).
You optionally rotate or expire the inbox to keep isolation tight.

This “inbox-first” approach is what makes parallel CI runs, multi-agent toolchains, and retry-heavy verification flows predictable.
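The lifecycle above can be sketched as a toy in-memory model. This is not a real provider client: the class `InMemoryInboxes`, its method names, and the `inbox.example` domain are all hypothetical, and it exists only to show why one inbox per run keeps parallel runs from seeing each other's mail.

```python
import uuid


class InMemoryInboxes:
    """Toy stand-in for an inbox provider (illustration only)."""

    def __init__(self, domain: str = "inbox.example"):
        self._domain = domain
        self._inboxes = {}  # inbox_id -> {"address": str, "messages": list}

    def create_inbox(self) -> dict:
        # Each inbox gets a unique ID and a matching unique address.
        inbox_id = uuid.uuid4().hex
        address = f"{inbox_id}@{self._domain}"
        self._inboxes[inbox_id] = {"address": address, "messages": []}
        return {"inbox_id": inbox_id, "address": address}

    def deliver(self, address: str, message: dict) -> bool:
        # Route a message to the inbox that owns this address, if any.
        for inbox in self._inboxes.values():
            if inbox["address"] == address:
                inbox["messages"].append(message)
                return True
        return False

    def list_messages(self, inbox_id: str) -> list:
        inbox = self._inboxes.get(inbox_id)
        return list(inbox["messages"]) if inbox else []

    def expire(self, inbox_id: str) -> None:
        # After expiry, mail to the old address is no longer accepted.
        self._inboxes.pop(inbox_id, None)


inboxes = InMemoryInboxes()
run_a = inboxes.create_inbox()
run_b = inboxes.create_inbox()
inboxes.deliver(run_a["address"], {"subject": "Verify your account"})
```

Here `run_b` never sees the message sent to `run_a`, which is the isolation property that makes retries and parallel CI jobs predictable.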

A simple flow diagram showing “Create inbox” producing an email address and inbox_id, then “App sends email” into the inbox, then two retrieval paths: “Webhook delivers JSON event” and “Polling API returns messages JSON”, ending at “Extractor pulls OTP or magic link”.

What should the JSON look like? A practical contract for QA and LLM agents

Different providers expose different schemas, but robust automation generally needs the same conceptual pieces:

Email concept	Why you need it	JSON field examples (illustrative)
Stable message identifier	Deduplication, idempotency	`message_id`, `provider_id`, `rfc822_message_id`
Arrival timestamp	Time budgets, debugging	`received_at`
Envelope addresses	Routing and assertions	`from`, `to`, `cc`, `reply_to`
Subject	Matching and correlation	`subject`
Headers (normalized)	Debugging, correlation, auth signals	`headers` (map or list, normalized)
Text content (prefer `text/plain`)	Safe parsing for agents	`text`, `text_plain`, `body_text`
HTML content (optional, treat carefully)	Human debugging, fallbacks	`html`, `body_html`
Attachments metadata	Security, artifact extraction	`attachments[]` with `filename`, `content_type`, `size`
Raw source (optional but valuable)	Forensics when parsers disagree	`raw` or `source`
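As an illustration of the table, the sketch below shows one possible message payload as a Python dict, plus a small check that the fields an automation depends on are present. The field names are picked from the illustrative column above and every value is invented; this is not any provider's actual schema.

```python
# Fields this hypothetical automation refuses to work without.
REQUIRED_FIELDS = ("message_id", "received_at", "from", "to", "subject", "text")

example_message = {
    "message_id": "msg_01",                      # stable ID for deduplication
    "received_at": "2024-01-01T12:00:00Z",       # arrival timestamp
    "from": "no-reply@example.com",
    "to": ["run-123@inbox.example"],
    "subject": "Verify your account",
    "headers": {"x-correlation-id": "run-123"},  # normalized header map
    "text": "Your code is 482913",               # text/plain body
    "html": None,                                # optional
    "attachments": [],                           # metadata only
}


def missing_fields(message: dict) -> list[str]:
    """Return the required field names that are absent from the message."""
    return [name for name in REQUIRED_FIELDS if name not in message]


print(missing_fields(example_message))
```

Failing fast on a missing field tends to produce a clearer test failure than a `KeyError` deep inside an extractor.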

Two practical recommendations for agent and test environments:

Prefer text/plain for assertions and extraction. HTML is useful for humans, but for automation it is a large, attacker-friendly surface.
Keep the raw source available for debugging. When something breaks in CI, being able to inspect the exact delivered content can save hours.

Mailhook's product positioning is explicitly centered on this: create disposable inboxes via API and receive emails as structured JSON, built for LLM agents and QA automation. For exact field names and payload formats, use the canonical reference: mailhook.co/llms.txt.

Retrieving messages: webhook-first, polling fallback

There are two common ways to “get email via API”, and most production setups use both.

Webhooks (push)

With a webhook, the inbox provider calls your endpoint when a new message arrives. This is ideal when you want fast, event-driven workflows.

Strengths:

Low latency, no polling loops
Natural event stream model
Scales well when many inboxes are active

Operational requirements:

You must implement retries and idempotency
You must verify authenticity (signatures)
You need an endpoint reachable from the public internet (or a secure tunnel)

Mailhook supports real-time webhook notifications and signed payloads for security (verify signatures before trusting content).
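Signature verification usually comes down to recomputing a keyed hash over the raw request body and comparing it in constant time. The sketch below assumes a hex-encoded HMAC-SHA256 over the raw bytes with a shared secret. That scheme is an assumption made for illustration, not Mailhook's documented format, so check the provider's reference for the real algorithm, encoding, and header name.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, raw_body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature (assumed scheme, see lead-in)."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, raw_body: bytes, received: str) -> bool:
    """Compare the expected and received signatures in constant time."""
    expected = sign_payload(secret, raw_body)
    # Encode both sides so compare_digest accepts arbitrary header text.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


secret = b"example-shared-secret"
body = b'{"message_id": "msg_01", "subject": "Verify your account"}'
signature = sign_payload(secret, body)

print(is_valid_signature(secret, body, signature))
print(is_valid_signature(secret, body + b" ", signature))
```

Verify against the exact bytes received on the wire; re-serializing parsed JSON before hashing can change whitespace or key order and break the comparison.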

Polling (pull)

With polling, your code asks the provider for messages in the inbox until a match arrives or a timeout is reached.

Strengths:

Simple to reason about in test runners
Works even if your webhook endpoint is down
A good fallback path for reliability

Costs:

Higher latency (depending on the interval)
More API calls

Mailhook includes a polling API for emails, which is a practical complement to webhooks.

Quick comparison

Mechanism	Best for	Reliability gotcha	Default advice
Webhook	Real-time automation, high concurrency	Duplicate deliveries, replay risk	Verify signatures, dedupe by message ID
Polling	CI tests, fallback paths	Fixed sleeps cause flakes	Poll with a timeout and backoff, match narrowly
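“Poll with a timeout and backoff” can be expressed as a generator of sleep durations that grows geometrically and never exceeds the overall time budget. This is a minimal sketch: the function name and default values are assumptions, and it budgets only the sleeps, not the time spent inside each API call.

```python
def backoff_delays(timeout_s, base_s=0.5, factor=2.0, max_delay_s=5.0):
    """Yield sleep durations whose sum never exceeds timeout_s."""
    elapsed = 0.0
    delay = base_s
    while elapsed < timeout_s:
        # Cap each step by the per-step maximum and by the remaining budget.
        step = min(delay, max_delay_s, timeout_s - elapsed)
        yield step
        elapsed += step
        delay *= factor


delays = list(backoff_delays(10))
print(delays)  # [0.5, 1.0, 2.0, 4.0, 2.5]
```

A polling loop would call the provider, and if nothing matches, sleep for the next value from the generator; when the generator is exhausted, the wait has timed out.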

Deterministic retrieval: match the right message, not the “latest email”

Most flakiness comes from ambiguous matching. If you only “fetch the latest message”, retries, parallel runs, and delayed delivery will bite you.

A deterministic strategy usually includes:

Inbox isolation: one inbox per run, per attempt, or per agent session.
Correlation token: include a run_id or nonce somewhere you can match on.
Narrow matchers: filter by expected sender, expected subject prefix, or a unique token.
Explicit time budget: a single function that waits until a deadline.

For correlation, the most robust option is to add your own header, such as X-Correlation-Id (when you control the sender). If you cannot, use a subject token or a unique link parameter.
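A correlation matcher can be very small. The sketch below assumes the message JSON exposes `headers` as a name-to-value map and `subject` as a string; both are illustrative assumptions, as are the helper names. It checks the custom header first and falls back to a token embedded in the subject.

```python
import secrets


def new_correlation_id() -> str:
    """Generate an unguessable per-run token."""
    return secrets.token_hex(8)


def matches_run(message: dict, correlation_id: str) -> bool:
    """True if the message carries this run's correlation token."""
    headers = message.get("headers") or {}
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {str(name).lower(): str(value) for name, value in headers.items()}
    if normalized.get("x-correlation-id") == correlation_id:
        return True
    # Fallback for senders you do not control: token in the subject line.
    return correlation_id in (message.get("subject") or "")


run_id = new_correlation_id()
by_header = {"subject": "Verify", "headers": {"X-Correlation-Id": run_id}}
by_subject = {"subject": f"Verify your account [{run_id}]", "headers": {}}
unrelated = {"subject": "Verify your account", "headers": {}}

print(matches_run(by_header, run_id), matches_run(by_subject, run_id), matches_run(unrelated, run_id))
```

Because the token is random per run, a stale email from an earlier attempt cannot satisfy the matcher even if its sender and subject prefix look identical.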

Minimal “wait for email” interface (agent-friendly)

Whether you are building a test harness or an LLM tool, the cleanest abstraction is a small, deterministic API surface:

create_inbox() → returns { inbox_id, address }
wait_for_message(inbox_id, matcher, timeout_ms) → returns a single message JSON object
extract_verification_artifact(message) → returns { otp } or { url }

This keeps the LLM or test runner from dealing with mailbox search semantics. It also reduces prompt injection risk because the agent does not need “the whole inbox history”, only a minimal artifact.

Example: retrieving an OTP from JSON (pseudocode)

The code below is intentionally provider-agnostic. It illustrates control flow and safety checks without assuming exact Mailhook endpoints or field names.

Polling-based retrieval

import re
import time

def wait_for_message(fetch_messages, inbox_id, timeout_s=60, poll_interval_s=1.5):
    """Poll until a matching message arrives or the deadline passes."""
    # A monotonic clock is unaffected by wall-clock adjustments during the wait.
    deadline = time.monotonic() + timeout_s
    seen_ids = set()  # IDs already inspected, so each message is checked once

    while True:
        messages = fetch_messages(inbox_id)  # returns a list of message JSON objects

        for msg in messages:
            msg_id = msg.get("message_id") or msg.get("id")
            if msg_id and msg_id in seen_ids:
                continue
            if msg_id:
                seen_ids.add(msg_id)

            subject = (msg.get("subject") or "").lower()
            sender = (msg.get("from") or "").lower()

            # Narrow matcher example
            if "verify" in subject and "no-reply" in sender:
                return msg

        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        # Never sleep past the deadline.
        time.sleep(min(poll_interval_s, remaining))

    raise TimeoutError("No matching email arrived before timeout")


def extract_otp(message_json):
    text = message_json.get("text") or message_json.get("text_plain") or ""
    m = re.search(r"\b(\d{6})\b", text)
    if not m:
        raise ValueError("OTP not found in text body")
    return m.group(1)

Webhook-based retrieval (high level)

In webhook mode, you invert the flow:

Your webhook endpoint receives the message JSON.
You verify the signature.
You write the message to storage keyed by inbox_id.
Your test runner or agent waits on that storage (or an internal queue) for a matching event.

Mailhook supports signed payloads, so verification should be a first-class step. For the exact signature scheme and headers, consult: Mailhook llms.txt.
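The storage-and-wait half of this flow can be sketched with one thread-safe queue per inbox. The class below is hypothetical and in-process only; it assumes the payload carries `inbox_id` and `message_id` fields (illustrative names) and that signature verification has already happened before `on_webhook` is called. A multi-process setup would need a shared store instead.

```python
import queue
import threading
from collections import defaultdict


class MessageStore:
    """In-process buffer between a webhook handler and waiting tests."""

    def __init__(self):
        self._lock = threading.Lock()
        self._queues = defaultdict(queue.Queue)  # inbox_id -> queue of payloads
        self._seen_ids = set()                   # message IDs already stored

    def on_webhook(self, payload: dict) -> bool:
        """Store a verified payload; return False if it is a duplicate."""
        with self._lock:
            if payload["message_id"] in self._seen_ids:
                return False  # provider retry or replay: ignore
            self._seen_ids.add(payload["message_id"])
            inbox_queue = self._queues[payload["inbox_id"]]
        inbox_queue.put(payload)
        return True

    def wait_for_message(self, inbox_id: str, timeout_s: float) -> dict:
        """Block until a message for this inbox arrives or the timeout passes."""
        with self._lock:
            inbox_queue = self._queues[inbox_id]
        try:
            return inbox_queue.get(timeout=timeout_s)
        except queue.Empty:
            raise TimeoutError(f"No message for inbox {inbox_id}") from None


store = MessageStore()
payload = {"inbox_id": "inbox-a", "message_id": "msg_01", "subject": "Verify"}
first = store.on_webhook(payload)
second = store.on_webhook(payload)  # duplicate delivery
received = store.wait_for_message("inbox-a", timeout_s=0.1)
```

Deduplicating at the point of storage means the waiting side never has to reason about provider retries.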

Security and safety: treat email as untrusted input (especially with LLMs)

If an LLM agent is reading emails, you have to assume the content may be hostile. Even verification emails can contain unexpected content, forwarded threads, or malicious links if an attacker can trigger messages into your inbox.

Practical guardrails:

Verify webhook signatures and reject unsigned or invalid payloads.
Do not render HTML in agent pipelines. Prefer text/plain plus strict extraction.
Whitelist what you extract. For magic links, validate the hostname and path against expected patterns before following them.
Minimize retention and logs. Emails often contain secrets, tokens, or PII.
Constrain tool output. Give the agent only the OTP or a validated URL, not the full raw email.
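The link-validation guardrail can be sketched with the standard library's URL parser. The allowed host and path below are placeholders for your own application's values; the point is to compare the parsed hostname and path exactly, rather than searching for a substring in the raw URL.

```python
from urllib.parse import urlsplit

# Placeholder values: replace with the host and paths your app really uses.
ALLOWED_HOSTS = {"app.example.com"}
ALLOWED_PATHS = {"/verify"}


def is_allowed_link(url: str) -> bool:
    """Accept only https links to an expected host and path."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL
    if parts.scheme != "https":
        return False
    # parts.hostname is lowercased and excludes userinfo and port.
    if parts.hostname not in ALLOWED_HOSTS:
        return False
    return parts.path in ALLOWED_PATHS


candidates = [
    "https://app.example.com/verify?token=abc",
    "http://app.example.com/verify?token=abc",
    "https://app.example.com.evil.test/verify",
    "https://app.example.com@evil.test/verify",
    "https://app.example.com/admin",
]
results = [is_allowed_link(url) for url in candidates]
print(results)  # [True, False, False, False, False]
```

The third and fourth candidates show why substring checks fail: both contain the expected host as text, but the parsed hostname is an attacker-controlled domain.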

These controls are not just security hygiene; they also improve reliability, because your pipeline becomes deterministic and less sensitive to template drift.

Where Mailhook fits: disposable inboxes + JSON output + webhooks

If you are implementing “get email via API” for agents or CI, Mailhook is designed around the primitives you would typically build yourself:

Disposable inbox creation via API
Emails delivered as structured JSON
Real-time webhooks (with signed payloads)
A polling API as a fallback path
Shared domains for quick starts and custom domain support for tighter control
Batch email processing for higher-throughput workflows

If you want the exact request/response formats (and you are building tools for LLMs), start from the canonical reference: https://mailhook.co/llms.txt, then connect it to your test harness or agent tools.

Reliability checklist before you ship

Before relying on email retrieval in automation, make sure you can answer “yes” to each of these:

You create a fresh inbox per run, attempt, or session.
Your wait logic is deadline-based, with no fixed sleeps.
You dedupe by message ID (or an equivalent stable identifier).
Your matching rules are narrow and correlation-friendly.
You verify webhook signatures (if you use webhooks).
You extract minimal artifacts (OTP, validated URL) instead of feeding whole emails to the agent.

If you implement these, “get email via API” becomes a predictable building block instead of the flakiest step in your pipeline.

To explore Mailhook's programmable inboxes and JSON email retrieval, see Mailhook and keep the integration contract handy: Mailhook llms.txt.
