04. Webhooks and Callbacks — push the receipt out when the event happens

~16 min read. Polling asks again and again. Webhooks speak when there is actual news.

Built on the ELI5 in 00-eli5.md. The receipt — the message that tells you what became ready — becomes an event payload your system pushes to another system at the right moment.


1) Push versus pull is about who carries the update burden

Pull means the client keeps checking. Push means the server sends the update when it happens. Neither is universally better. The right choice depends on how often state changes, how quickly consumers need to know, and who owns delivery.

Polling example:

  • Client calls GET /orders/4812 every 10 seconds.
  • Most calls return the same status.
  • Traffic grows even when nothing changed.

Webhook example:

  • Client registers a callback URL once.
  • Your system sends an event when order 4812 changes.
  • Traffic appears only on real change.

A compact picture:

Polling:   Client ──▶ check ──▶ Server ──▶ same answer again
Webhook:   Server ──▶ event ──▶ Client endpoint

Worked example with numbers. Suppose 100,000 merchants track payout status. Polling every 30 seconds means over 3,300 requests each second (100,000 / 30), even when almost nothing changed. If only 2,000 payout events actually happen each second, webhooks send about 40 percent fewer requests, and the saving grows as real events get rarer.
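The arithmetic can be checked in a few lines. This is a minimal sketch: the function name and parameters are illustrative, and the rates are the ones from the example above.

```python
def polling_requests_per_second(clients: int, interval_seconds: float) -> float:
    """Average request rate when every client polls once per interval."""
    return clients / interval_seconds


polling_rate = polling_requests_per_second(clients=100_000, interval_seconds=30)
event_rate = 2_000  # payout events per second, from the example above

# Fraction of requests that webhooks avoid compared with polling.
saving = 1 - event_rate / polling_rate

print(f"polling:  {polling_rate:.0f} req/s")
print(f"webhooks: {event_rate} req/s")
print(f"saving:   {saving:.0%}")
```

Lower event_rate and rerun it. The rarer the real changes, the more polling traffic is pure waste.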

Push also shifts responsibility. Now your platform owns delivery logic, retries, signatures, and event records. So what to do? Choose webhooks when timely updates matter and consumers can host reliable endpoints.

2) Registration should be explicit and event contracts should stay boring

A webhook system usually starts with registration. Clients tell you where to send events, which events they want, and sometimes which secret to use.

Example registration request:

POST /v1/webhook-endpoints
Content-Type: application/json

{
  "url": "https://merchant.example.com/hooks/payouts",
  "events": ["payout.paid", "payout.failed"]
}

Good registration records often store:

  • Endpoint URL.
  • Subscribed event types.
  • Secret for signature verification.
  • Status like active or disabled.
  • Delivery metadata such as last success time.
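The fields above can be sketched as a small record type. This is one possible shape, not a prescribed schema: the class name, field names, and the choice to generate the secret server-side are all assumptions.

```python
import secrets
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class WebhookEndpoint:
    url: str
    events: list[str]
    # Assumption: the platform generates the secret. Some let the client supply it.
    secret: str = field(default_factory=lambda: secrets.token_hex(32))
    status: str = "active"  # "active" or "disabled"
    last_success_at: Optional[datetime] = None

    def wants(self, event_type: str) -> bool:
        """True if this endpoint is active and subscribed to the event type."""
        return self.status == "active" and event_type in self.events


endpoint = WebhookEndpoint(
    url="https://merchant.example.com/hooks/payouts",
    events=["payout.paid", "payout.failed"],
)
```

A delivery worker would call endpoint.wants("payout.paid") before queueing anything for this subscriber.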

The event payload is your outgoing receipt. Keep it structured. Keep it versioned. Keep it boring. A webhook body should not require guesswork.

Example event:

{
  "id": "evt_981",
  "type": "payout.paid",
  "createdAt": "2026-05-08T10:15:00Z",
  "data": {
    "payoutId": "po_44",
    "amountPaise": 250000,
    "currency": "INR"
  }
}

Notice the top-level envelope. It carries event id, type, and timestamp. The nested data carries the business object. That split helps routing, logging, and replay tooling.
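Here is a minimal sketch of why the split helps routing. The helper names and the evt_ plus random hex id format are assumptions for illustration. The point is that route() never looks inside data.

```python
import uuid
from datetime import datetime, timezone


def build_event(event_type: str, data: dict) -> dict:
    """Wrap a business object in an envelope with id, type, and timestamp."""
    return {
        "id": f"evt_{uuid.uuid4().hex}",
        "type": event_type,
        "createdAt": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "data": data,
    }


def route(event: dict, handlers: dict) -> str:
    """Pick a handler from the envelope alone, without inspecting data."""
    handler = handlers.get(event["type"])
    if handler is None:
        return "ignored"
    return handler(event["data"])


event = build_event(
    "payout.paid",
    {"payoutId": "po_44", "amountPaise": 250000, "currency": "INR"},
)
handlers = {"payout.paid": lambda data: f"paid:{data['payoutId']}"}
result = route(event, handlers)
```

Logging and replay tools can work the same way, keyed on id and type only.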

3) Delivery must assume failures and retry with backoff

Webhook delivery lives on the open internet. Endpoints time out. DNS breaks. TLS certificates expire. Consumers deploy bugs on Fridays. So yes, retries are mandatory.

A sane delivery path:

Event created
Delivery queue
   ├── attempt 1 ──▶ 200? done
   ├── attempt 2 ──▶ 500? retry later
   ├── attempt 3 ──▶ timeout? retry later
   └── dead-letter or disable after threshold

Backoff prevents storm behavior. Do not retry after 1 second, then 1 second, then 1 second forever. Use increasing delays, for example 1 minute, 5 minutes, 30 minutes, then 2 hours.
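The retry decision can be sketched as one small function over that schedule. Treating every 2xx as success and every other outcome as retryable is an assumption here. Real platforms often handle some 4xx responses differently.

```python
from typing import Optional

# Delay before each retry, in seconds: 1 minute, 5 minutes, 30 minutes, 2 hours.
RETRY_DELAYS_S = [60, 300, 1800, 7200]


def next_step(attempt: int, status_code: Optional[int]) -> tuple[str, Optional[int]]:
    """Decide what to do after delivery attempt number `attempt` (1-based).

    `status_code` is None when the request timed out or could not connect.
    Returns (action, delay_seconds).
    """
    if status_code is not None and 200 <= status_code < 300:
        return ("done", None)
    retries_used = attempt - 1
    if retries_used < len(RETRY_DELAYS_S):
        return ("retry", RETRY_DELAYS_S[retries_used])
    # Schedule exhausted: park the event instead of retrying forever.
    return ("dead_letter", None)
```

With four delays the sender makes at most five attempts, then dead-letters the event.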

Worked example. A merchant endpoint stays down for 20 minutes. With fixed retries every 5 seconds, you send 240 doomed requests. With exponential backoff, you may send only 4 or 5 attempts in that window. That is kinder to both sides.
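Those counts can be reproduced with a short calculation. The backoff parameters are an assumption chosen for illustration: the first retry comes after 1 minute and each delay doubles.

```python
def retry_offsets_fixed(interval_s: int, window_s: int) -> list[int]:
    """Retry times (seconds after the first failure) for a fixed interval."""
    return list(range(interval_s, window_s + 1, interval_s))


def retry_offsets_backoff(first_delay_s: int, factor: int, window_s: int) -> list[int]:
    """Retry times when each delay is `factor` times the previous one."""
    offsets, elapsed, delay = [], 0, first_delay_s
    while elapsed + delay <= window_s:
        elapsed += delay
        offsets.append(elapsed)
        delay *= factor
    return offsets


OUTAGE_S = 20 * 60  # the 20-minute outage from the example

fixed = retry_offsets_fixed(interval_s=5, window_s=OUTAGE_S)
backoff = retry_offsets_backoff(first_delay_s=60, factor=2, window_s=OUTAGE_S)

print(len(fixed), "retries with a fixed 5 second interval")
print(len(backoff), "retries with doubling backoff, at", backoff)
```

The doubling schedule retries at 1, 3, 7, and 15 minutes, so only four requests hit the endpoint while it is down.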

Track every attempt. Store status code, latency, response snippet, and next retry time. If the consumer asks, "Why did we miss event evt_981?" you need an answer, not a shrug.
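A minimal sketch of such an attempt log follows. The record fields mirror the list above. The in-memory list, the function names, and the 200 character snippet limit are assumptions, and a real system would use a database table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class DeliveryAttempt:
    event_id: str
    attempt: int
    status_code: Optional[int]  # None for timeouts or connection errors
    latency_ms: int
    response_snippet: str  # truncated response body
    next_retry_at: Optional[datetime]


def record_attempt(log: list[DeliveryAttempt], event_id: str,
                   status_code: Optional[int], latency_ms: int, body: str,
                   retry_in_s: Optional[int], now: datetime) -> DeliveryAttempt:
    """Append one attempt to the log, keeping only a snippet of the body."""
    attempt_number = 1 + sum(1 for a in log if a.event_id == event_id)
    next_retry = now + timedelta(seconds=retry_in_s) if retry_in_s is not None else None
    entry = DeliveryAttempt(event_id, attempt_number, status_code,
                            latency_ms, body[:200], next_retry)
    log.append(entry)
    return entry


def history(log: list[DeliveryAttempt], event_id: str) -> list[DeliveryAttempt]:
    """Everything recorded for one event, in attempt order."""
    return [a for a in log if a.event_id == event_id]


log: list[DeliveryAttempt] = []
now = datetime(2026, 5, 8, 10, 15, tzinfo=timezone.utc)
record_attempt(log, "evt_981", 500, 812, "Internal Server Error", 60, now)
record_attempt(log, "evt_981", None, 10_000, "", 300, now + timedelta(minutes=1))
```

history(log, "evt_981") now returns both attempts, which is the answer a support engineer needs.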

4) Signatures and idempotency protect both sides from confusion

Never trust a webhook just because it hit the endpoint URL. Attackers can copy JSON shapes, so consumers must verify signatures. The sender signs the raw payload with a shared secret, and the receiver recomputes the signature over the exact bytes it received. If the values do not match, the receiver rejects the request.

Example header pattern:

X-Signature: t=1715163300,v1=4fd7a9...

Verification flow:

┌─────────── sender signs body ───────────┐
│ raw payload + secret ──▶ HMAC digest    │
└─────────────────────────────────────────┘
Receiver computes same digest and compares
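Both sides of that flow can be sketched with Python's standard hmac module. The header layout follows the t=...,v1=... pattern above. Using SHA-256 and signing the timestamp, a dot, and the raw body are assumptions for this sketch, not a statement of any particular platform's scheme.

```python
import hashlib
import hmac


def compute_digest(raw_body: bytes, secret: bytes, timestamp: int) -> str:
    """HMAC-SHA256 over '<timestamp>.<raw body>', hex encoded."""
    signed_payload = f"{timestamp}.".encode() + raw_body
    return hmac.new(secret, signed_payload, hashlib.sha256).hexdigest()


def sign(raw_body: bytes, secret: bytes, timestamp: int) -> str:
    """Sender side: build a header value like 't=<unix time>,v1=<digest>'."""
    return f"t={timestamp},v1={compute_digest(raw_body, secret, timestamp)}"


def verify(raw_body: bytes, secret: bytes, header: str) -> bool:
    """Receiver side: recompute the digest and compare in constant time."""
    try:
        parts = dict(item.split("=", 1) for item in header.split(","))
        timestamp = int(parts["t"])
        received = parts["v1"]
    except (KeyError, ValueError):
        return False  # malformed header
    expected = compute_digest(raw_body, secret, timestamp)
    return hmac.compare_digest(expected.encode(), received.encode())


body = b'{"id":"evt_981","type":"payout.paid"}'
secret = b"example-shared-secret"  # hypothetical value
header = sign(body, secret, 1715163300)
```

The receiver must verify the bytes exactly as received. Parsing the JSON and serializing it again can change whitespace or key order, and then the digest no longer matches.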

Idempotency matters too. Webhook delivery is commonly at-least-once, not exactly-once. So consumers may receive the same receipt twice. They must deduplicate using event id.

Worked example. Your system sends evt_981. Consumer returns 500 after actually saving it. You retry, and they receive evt_981 again. If their handler blindly creates a second payout record, finance reports become nonsense. If they store processed event ids, the second attempt becomes a harmless no-op.
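The duplicate delivery can be sketched in a few lines. The consumer class is hypothetical and keeps its state in memory. A real handler would typically store processed ids in the same database transaction as the business write.

```python
class PayoutConsumer:
    """Hypothetical receiver that applies each event id at most once."""

    def __init__(self) -> None:
        self.processed_event_ids: set[str] = set()
        self.payout_records: list[dict] = []

    def handle(self, event: dict) -> str:
        if event["id"] in self.processed_event_ids:
            return "duplicate"  # acknowledge the delivery, change nothing
        self.payout_records.append(event["data"])
        self.processed_event_ids.add(event["id"])
        return "processed"


consumer = PayoutConsumer()
event = {"id": "evt_981", "type": "payout.paid", "data": {"payoutId": "po_44"}}

first = consumer.handle(event)   # original delivery
second = consumer.handle(event)  # the sender's retry after the 500
```

After both calls there is still exactly one payout record, so the retry did no harm.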

5) Event delivery patterns depend on scale and control needs

Some teams send directly from the app server. That works for low volume. Larger systems separate event creation from delivery, because queues and workers absorb spikes better. Simple, no?

Common patterns:

  • Application writes event to outbox table.
  • Worker reads outbox and pushes to queue.
  • Delivery workers call subscriber endpoints.
  • Failures go to dead-letter storage.
  • Replay tools resend selected events safely.

Worked example with numbers. Suppose checkout traffic spikes to 20,000 events per minute during a sale. If each app request thread tries live delivery, latency and failure coupling become ugly. If the app only records the event, and workers deliver asynchronously, checkout stays fast while webhook throughput scales separately.

This is why outbox patterns exist. You persist the business change and event record together. Then delivery happens reliably afterward. That reduces ghost events, where a callback was sent for data that never committed.
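A minimal outbox sketch with the standard sqlite3 module shows the "persist together" step. The table and column names are assumptions. The key detail is that both inserts share one transaction.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE payouts (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        event_id  TEXT PRIMARY KEY,
        type      TEXT NOT NULL,
        payload   TEXT NOT NULL,
        delivered INTEGER NOT NULL DEFAULT 0
    );
    """
)


def mark_payout_paid(conn: sqlite3.Connection, payout_id: str, event_id: str) -> None:
    """Write the business change and its event in one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("INSERT INTO payouts (id, status) VALUES (?, 'paid')", (payout_id,))
        conn.execute(
            "INSERT INTO outbox (event_id, type, payload) VALUES (?, ?, ?)",
            (event_id, "payout.paid", json.dumps({"payoutId": payout_id})),
        )


def pending_events(conn: sqlite3.Connection) -> list[tuple]:
    """Rows a delivery worker would pick up next."""
    return conn.execute(
        "SELECT event_id, type, payload FROM outbox WHERE delivered = 0 ORDER BY rowid"
    ).fetchall()


mark_payout_paid(conn, "po_44", "evt_981")
print(pending_events(conn))
```

If either insert fails, neither row exists, so a worker can never deliver an event for a payout that did not commit.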

Mini worked example. If a partner wants inventory updates within 3 seconds, polling every minute clearly misses the product need. A webhook or callback is the right push tool there.

Another design note. Timestamp checks help block replay attacks with old signed payloads. Many platforms reject signatures older than 5 minutes.
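The freshness check itself is tiny. In this sketch the function name is hypothetical, and rejecting timestamps too far in the future as well as the past is a design choice, not something the text above requires. The check only helps if the timestamp is covered by the signature, otherwise an attacker can simply change it.

```python
TOLERANCE_SECONDS = 5 * 60  # the 5-minute window mentioned above


def is_fresh(signed_at: int, now: int, tolerance_s: int = TOLERANCE_SECONDS) -> bool:
    """True if the signed timestamp is within the tolerance of the receiver's clock.

    Both arguments are Unix timestamps in seconds. In a real handler `now`
    would come from int(time.time()).
    """
    return abs(now - signed_at) <= tolerance_s


signed_at = 1715163300
print(is_fresh(signed_at, now=signed_at + 120))  # delivered two minutes later
print(is_fresh(signed_at, now=signed_at + 3600))  # replayed an hour later
```

A receiver would run this after the signature verifies and before doing any work with the payload.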

One more operational habit. Expose a delivery log UI or API for consumers. Support teams solve incidents much faster when they can inspect attempt history. Good webhook platforms optimize for failure visibility, not only happy-path emission. That discipline builds trust.


Where this lives in the wild

  • A Stripe developer platform engineer builds signed webhook delivery so merchants learn about payment successes without aggressive polling.
  • A Twilio platform engineer sends status callbacks for messages and calls, then retries safely when customer endpoints fail.
  • A Slack platform engineer relies on event subscriptions so apps react to channel activity in near real time.
  • A Shopify app ecosystem engineer pushes order and fulfillment events to partner apps that cannot keep polling every shop constantly.
  • A Razorpay payouts engineer uses webhook retries and signature verification so merchants receive trustworthy settlement updates.

Pause and recall

  1. When does push beat pull clearly?
  2. Why is exponential backoff better than constant retries?
  3. What does signature verification protect against?
  4. Why must webhook consumers be idempotent?

Interview Q&A

Why are webhooks usually delivered at least once, not exactly once?

Because networks and receivers fail in ambiguous ways, so the sender retries to avoid silent data loss. Deduplication on the consumer side is the practical answer. Common wrong answer to avoid: "Just guarantee exactly once by trusting TCP delivery."

Why wrap webhook payloads in an envelope with id and type?

The envelope supports routing, observability, replay, and deduplication separately from business data. It keeps the event contract operationally usable. Common wrong answer to avoid: "Put only the business object in the body because metadata is clutter."

When should you disable a webhook endpoint temporarily?

Disable or pause it after repeated failures when retries keep hurting both systems, then surface clear recovery actions to the consumer. That prevents endless waste and noisy incident loops. Common wrong answer to avoid: "Retry forever at full speed because delivery must not stop."

Why use an outbox or queue for high-volume webhook systems?

It decouples request success from callback delivery, absorbs spikes, and reduces inconsistent event emission during failures. This makes the system easier to reason about under load. Common wrong answer to avoid: "Direct delivery from request handlers is always simplest and therefore always best."


Apply now (5 min)

Exercise. Design webhook delivery for order.shipped events. Write the registration endpoint, an example payload, one signature header, and a retry schedule for four attempts. Then say how the consumer should handle duplicate deliveries.

Sketch from memory. Draw event creation, queue, delivery worker, and subscriber endpoint. Then mark where signature verification happens, and where failed attempts are recorded.


Bridge. Webhooks answer who gets notified next. Authentication answers who may call anything at all, and what they may touch after entry. → 05-auth-oauth-jwt-sessions.md