05. Sync vs Async Communication — Decide which roads must wait and which should move on¶

~15 min read. The same feature can feel instant or fragile based on how its components talk.

Built on the ELI5 in 00-eli5.md. The road — the path between city zones — now splits into wait-for-reply roads and drop-and-go roads.

First separate the two road types clearly¶

In HLD, communication is not just "service A talks to service B." That is too vague. You must ask one sharper question: does A wait for B before moving ahead? If yes, the road is synchronous. If no, the road is asynchronous. See. This one choice changes latency, coupling, retries, failure blast radius, and user experience.

Synchronous means request and reply are part of one active conversation. REST and gRPC usually sit here. The caller waits, times out, retries, or fails.

Asynchronous means the sender drops work onto a queue or stream and moves on. Message queues and event streams sit here. The receiver can process now, later, or in parallel.

Here is the first picture:

    ┌──────────┐  sync call   ┌────────────┐
    │ Service A│ ───────────→ │ Service B  │
    └──────────┘ ←─────────── │            │
                  waits reply └────────────┘

    ┌──────────┐  async send  ┌────────────┐  consume   ┌────────────┐
    │ Service A│ ───────────→ │ Queue/Log  │ ─────────→ │ Service B  │
    └──────────┘   moves on   └────────────┘            └────────────┘

Now what is the problem? Teams often pick one model by habit. That is lazy architecture. User-facing confirmation, payment authorization, and seat locking usually need sync. Email sending, analytics, thumbnail generation, and fan-out notifications usually fit async. Simple, no?
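The two road types above can be sketched in a few lines of Python. This is only an illustration under assumptions: `service_b`, its 50 ms delay, and the in-process `queue.Queue` standing in for a real queue or log are all hypothetical, not part of any particular system.

```python
import queue
import threading
import time


def service_b(order_id):
    """Stand-in for Service B. The 50 ms sleep is an arbitrary assumption."""
    time.sleep(0.05)
    return f"processed {order_id}"


def sync_call(order_id):
    """Wait-for-reply road: A blocks until B answers."""
    return service_b(order_id)


def async_send(work_queue, order_id):
    """Drop-and-go road: A enqueues the work and returns at once."""
    work_queue.put(order_id)


def run_consumer(work_queue, results):
    """Service B's worker: drains the queue until it sees the None sentinel."""
    while True:
        order_id = work_queue.get()
        if order_id is None:
            break
        results.append(service_b(order_id))


def demo():
    work_queue = queue.Queue()
    results = []
    worker = threading.Thread(target=run_consumer, args=(work_queue, results))
    worker.start()

    start = time.perf_counter()
    reply = sync_call("order-1")          # A waits for B's reply here
    sync_wait = time.perf_counter() - start

    start = time.perf_counter()
    async_send(work_queue, "order-2")     # A hands off and moves on
    async_wait = time.perf_counter() - start

    work_queue.put(None)                  # tell the worker to stop
    worker.join()
    return reply, results, sync_wait, async_wait


if __name__ == "__main__":
    reply, results, sync_wait, async_wait = demo()
    print(f"sync: {reply!r} after {sync_wait * 1000:.0f} ms of waiting")
    print(f"async: handed off in {async_wait * 1000:.2f} ms, later results: {results}")
```

The point to notice is where the caller spends its time: on the sync road it pays for B's work, on the async road it pays only for the handoff.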

When synchronous roads are the right choice¶

Choose sync when the caller truly needs an answer before deciding the next step. The user taps "Pay now" and wants to know success or failure. The app cannot say, "We will think and tell you tomorrow."

Good sync signals:

- user is actively waiting
- the next step depends on immediate result
- low latency is achievable and predictable
- the downstream failure should be visible right now
- the workflow is short and tightly bounded

Sync communication makes reasoning simpler. One request enters, one reply returns. That simplicity helps when correctness matters more than loose decoupling. A checkout path often looks like this:

    ┌────────┐  POST /pay   ┌──────────────┐  auth request  ┌────────────┐
    │ Client │ ───────────→ │ Checkout API │ ─────────────→ │ Payment svc│
    └────────┘    waits     └──────────────┘     waits      └────────────┘

The client waits. The API waits. The payment service replies. Then the order can continue. That makes sense because the answer changes the customer promise.

But sync has a price. Every extra synchronous road adds one more place where latency can stack. Every extra dependency can fail the whole chain. One slow service becomes everyone else's problem.

Worked latency example. Suppose a mobile checkout request does this synchronously:

1. API auth check = 20 ms
2. pricing service call = 35 ms
3. inventory reservation = 45 ms
4. payment authorization = 120 ms
5. coupon validation = 30 ms
6. notification write = 25 ms

Now add them. 20 + 35 = 55 ms. 55 + 45 = 100 ms. 100 + 120 = 220 ms. 220 + 30 = 250 ms. 250 + 25 = 275 ms. Add 40 ms network overhead and serialization cost. 275 + 40 = 315 ms total best-path latency. That may still feel fine.

Now assume payment occasionally spikes to 400 ms. Then total becomes 20 + 35 + 45 + 400 + 30 + 25 + 40 = 595 ms. The whole user experience now feels slow because one dependency stretched.

So what to do? Keep the critical sync path short. Put only must-know-now steps on that road.
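The latency arithmetic above can be checked with a short script. The step names and numbers come from the worked example; the last calculation, which moves the notification write off the critical path while holding the 40 ms overhead constant, is a hypothetical illustration rather than a measured result.

```python
# Step latencies in milliseconds, copied from the worked example.
CHECKOUT_STEPS_MS = {
    "api_auth_check": 20,
    "pricing_service_call": 35,
    "inventory_reservation": 45,
    "payment_authorization": 120,
    "coupon_validation": 30,
    "notification_write": 25,
}
NETWORK_OVERHEAD_MS = 40


def sequential_latency_ms(steps, overhead_ms=NETWORK_OVERHEAD_MS):
    """Total latency when every step runs one after another on the sync path."""
    return sum(steps.values()) + overhead_ms


best_path = sequential_latency_ms(CHECKOUT_STEPS_MS)

# Same chain, but payment authorization spikes to 400 ms.
spiked_steps = dict(CHECKOUT_STEPS_MS, payment_authorization=400)
spike_path = sequential_latency_ms(spiked_steps)

# Hypothetical trim: the notification write becomes an async side effect.
# Overhead is held at 40 ms for simplicity, which is an assumption.
trimmed_steps = {k: v for k, v in spiked_steps.items() if k != "notification_write"}
trimmed_path = sequential_latency_ms(trimmed_steps)

print(best_path, spike_path, trimmed_path)  # 315 595 570
```

Trimming one 25 ms side effect does not fix the payment spike, but it shows the habit: every step removed from the sync road is latency the user never pays.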

When asynchronous roads are the right choice¶

Choose async when the sender does not need immediate completion. The work still matters, but the caller should not wait for the full chain. Classic fits include email, analytics, recommendation refresh, search indexing, invoice PDF generation, and webhook fan-out.

Good async signals:

- work can finish later
- traffic arrives in bursts
- multiple consumers need the same event
- downstream failures should be retried separately
- producer and consumer should scale independently

See the deeper benefit. Async is not only about speed. It is about decoupling. One service can publish an event without knowing which services will care tomorrow. That makes the city's roads more flexible.

But async is not magic. It adds queue depth, duplicate delivery, out-of-order processing, and harder debugging. If you choose async, you must own those realities. A common pattern:

    ┌──────────────┐  OrderPlaced  ┌──────────────┐
    │ Order service│ ────────────→ │ Event stream │
    └──────────────┘               └──────┬───────┘
                                          │
                             ┌────────────┼────────────┐
                             ▼            ▼            ▼
                        ┌─────────┐  ┌─────────┐  ┌──────────┐
                        │ Email   │  │ Billing │  │ Analytics│
                        └─────────┘  └─────────┘  └──────────┘

One event enters. Three consumers react. That is fan-out. If email is down, analytics can still proceed. If analytics lags, billing need not wait.

Worked burst example. Suppose a flash sale creates 60,000 successful orders in 10 minutes.

Step 1: orders per second. 60,000 ÷ 600 = 100 orders per second.

Step 2: downstream fan-out. Each order creates 4 follow-up tasks: email, analytics, inventory feed, loyalty update. 100 × 4 = 400 messages per second.

Step 3: one consumer slows down. Assume email workers can process only 250 messages per second for 5 minutes. Incoming email tasks are 100 per second because only one of the four tasks is email. That is fine. No backlog there. Now assume analytics receives all events plus clickstream and reaches 180 messages per second input, but workers process 120 per second. Backlog growth = 180 - 120 = 60 messages per second. Over 5 minutes, 60 × 300 = 18,000 queued analytics messages.

That sounds scary, but the checkout path stays healthy because analytics sits off the critical path. This is exactly where async wins.
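The burst numbers above follow from one small formula: backlog grows at input rate minus service rate. A minimal sketch, assuming the queue starts empty and both rates stay constant over the window; the drain-time figure at the end assumes input stops entirely after the burst, which the example does not state.

```python
def backlog_after(input_rate, service_rate, seconds):
    """Messages left queued after `seconds`, starting from an empty queue.

    A consumer that keeps up leaves no backlog, so the result never goes negative.
    """
    return max(0, (input_rate - service_rate) * seconds)


BURST_SECONDS = 5 * 60

orders_per_second = 60_000 / (10 * 60)          # Step 1: 100 orders/s
messages_per_second = orders_per_second * 4     # Step 2: 400 messages/s

email_backlog = backlog_after(100, 250, BURST_SECONDS)       # workers keep up
analytics_backlog = backlog_after(180, 120, BURST_SECONDS)   # workers fall behind

# Assumption: once the burst ends, no new analytics input arrives at all.
analytics_drain_seconds = analytics_backlog / 120

print(orders_per_second, messages_per_second)    # 100.0 400.0
print(email_backlog, analytics_backlog)          # 0 18000
print(analytics_drain_seconds)                   # 150.0
```

Under that assumption the 18,000-message backlog clears in about two and a half minutes, and at no point does the checkout request wait for it.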

Request-reply vs fire-and-forget vs event-driven fan-out¶

Now let us separate three patterns that people mix up.

1) Request-reply¶

Caller waits for response. Use when business logic needs the answer immediately. REST and gRPC are common transports here.
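Request-reply always needs a deadline, because a caller that waits forever is worse than one that fails. Below is a rough sketch using only the Python standard library. `call_with_deadline`, `fast_payment_auth`, and `slow_payment_auth` are hypothetical names, the timeout values are arbitrary, and the retry is only safe under the assumption that the downstream call is idempotent.

```python
import concurrent.futures
import time


def call_with_deadline(fn, *args, timeout_s=0.2, attempts=2):
    """Call fn(*args) and wait at most timeout_s per attempt.

    Returns the reply, or raises TimeoutError once every attempt has timed out.
    Retrying assumes fn is idempotent (for example, keyed by a request id).
    """
    for _ in range(attempts):
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        try:
            future = pool.submit(fn, *args)
            return future.result(timeout=timeout_s)  # the caller blocks here
        except concurrent.futures.TimeoutError:
            pass  # deadline missed; fall through to the next attempt
        finally:
            # Do not wait for a stuck call; the abandoned thread ends on its own.
            pool.shutdown(wait=False)
    raise TimeoutError(f"no reply within {timeout_s}s after {attempts} attempts")


def fast_payment_auth(order_id):
    """Hypothetical downstream call that answers quickly."""
    return {"order_id": order_id, "status": "authorized"}


def slow_payment_auth(order_id):
    """Hypothetical downstream call that is slower than the caller's deadline."""
    time.sleep(0.3)
    return {"order_id": order_id, "status": "authorized"}


if __name__ == "__main__":
    print(call_with_deadline(fast_payment_auth, "o-1"))
    try:
        call_with_deadline(slow_payment_auth, "o-2", timeout_s=0.05)
    except TimeoutError as exc:
        print("caller gives up:", exc)
```

In a real service the deadline would usually be set on the HTTP or gRPC client itself; the thread pool here only makes the waiting visible.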

2) Fire-and-forget¶

Sender publishes work and does not wait for the result. Use when the act of handing off is enough for now. Be careful. True fire-and-forget without durable storage can lose work. In serious systems, you usually want enqueue-and-acknowledge, not blind hope.
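Enqueue-and-acknowledge can be sketched as "persist first, then say yes." The example below uses a local JSON-lines file as a stand-in for a durable queue, which is an assumption made for illustration; a real broker would add replication and consumer offsets. The function names and the file name are hypothetical.

```python
import json
import os
import tempfile


def enqueue_and_ack(log_path, message):
    """Append one message to the log and acknowledge only after it is flushed to disk."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(message) + "\n")
        log.flush()
        os.fsync(log.fileno())  # ask the OS to persist before we acknowledge
    return True  # the sender may now move on


def read_pending(log_path):
    """What a worker would pick up later, in arrival order."""
    with open(log_path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "email-tasks.jsonl")
        acked = enqueue_and_ack(path, {"task": "send_receipt", "order_id": "o-42"})
        print(acked, read_pending(path))
```

The sender still does not wait for the email to be sent. It waits only for proof that the work will not be lost.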

3) Event-driven fan-out¶

A service publishes a fact, not a command. Many consumers react independently. This is powerful when one business event should wake up several domains.

Example. OrderPlaced can drive billing, warehouse picking, coupons, notifications, and analytics. The publisher should not hardcode every future consumer into one giant sync chain.

So what is the HLD decision? Ask what kind of dependency exists. If service A needs service B's answer, keep sync. If A only needs to announce that something happened, async often fits better. If many systems need the same business fact, event-driven fan-out is usually the cleanest road shape.
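The fan-out shape can be shown with one queue per consumer. This is a toy, in-process sketch: the `EventStream` class and the consumer names are hypothetical, and a real event stream would persist events and track each consumer's position. What it does show is that the publisher knows nothing about who is listening, and that a lagging consumer does not hold up the others.

```python
import queue


class EventStream:
    """Toy stand-in for an event stream: every subscriber gets its own queue."""

    def __init__(self):
        self._queues = {}  # consumer name -> that consumer's private queue

    def subscribe(self, consumer_name):
        """Register a consumer and return the queue it will read from."""
        consumer_queue = queue.Queue()
        self._queues[consumer_name] = consumer_queue
        return consumer_queue

    def publish(self, event):
        """Deliver the same fact to every subscriber; the publisher names none of them."""
        for consumer_queue in self._queues.values():
            consumer_queue.put(event)


if __name__ == "__main__":
    stream = EventStream()
    email_q = stream.subscribe("email")
    billing_q = stream.subscribe("billing")
    analytics_q = stream.subscribe("analytics")

    stream.publish({"type": "OrderPlaced", "order_id": "o-7"})

    # Billing processes right away; email and analytics are lagging.
    print("billing handled:", billing_q.get_nowait())
    print("email still queued:", email_q.qsize())
    print("analytics still queued:", analytics_q.qsize())
```

Adding a fourth consumer tomorrow is one more `subscribe` call. The order service's code does not change.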

Coupling, retries, and failure behavior¶

Synchronous chains create temporal coupling. Both sides must be up now. Both sides must respond within the same deadline. Asynchronous systems reduce temporal coupling. Producer and consumer can be alive at different moments. But async does not remove logical coupling. If the event contract changes badly, consumers still break. So design contracts carefully in both worlds.

Now the retry question. In sync, retries are usually immediate and visible to the caller. In async, retries happen in background workers and can stretch for minutes or hours. That changes idempotency design. Suppose a payment-confirmed event is delivered twice. If the email service sends two receipts, that is annoying. If the loyalty service adds points twice, that is expensive. So what to do? Consumers must handle duplicates safely.

Another worked comparison helps.

Case A: everything sync. Checkout API calls payment, inventory, email, ledger, analytics, and recommendation refresh in sequence. Assume each dependency has 99.9% availability. End-to-end availability if all six must succeed = 0.999^6. 0.999 × 0.999 = 0.998001. 0.998001 × 0.999 = 0.997003. 0.997003 × 0.999 = 0.996006. 0.996006 × 0.999 = 0.995010. 0.995010 × 0.999 = about 0.994015. That is 99.4015%.

Case B: sync only payment and inventory, move four side effects to async. Critical-path availability = 0.999^2 = 0.998001, or 99.8001%.

See the jump. By shortening the must-succeed-now chain, you improved the customer-visible path. This is why architects obsess over what belongs on the request path.
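Both ideas above, duplicate-safe consumers and compounding availability, fit in a short sketch. The `LoyaltyConsumer` class, the `event_id` field, and the in-memory set of seen ids are assumptions for illustration; a real consumer would keep that record in durable storage. `chain_availability` assumes independent failures, as the worked comparison does.

```python
class LoyaltyConsumer:
    """Hypothetical consumer that must not award points twice for one event."""

    def __init__(self):
        self.points = {}               # user id -> points balance
        self._seen_event_ids = set()   # assumption: durable storage in production

    def handle(self, event):
        """Apply the event once; return False when it is a duplicate delivery."""
        if event["event_id"] in self._seen_event_ids:
            return False
        self._seen_event_ids.add(event["event_id"])
        user = event["user_id"]
        self.points[user] = self.points.get(user, 0) + event["points"]
        return True


def chain_availability(per_dependency, dependency_count):
    """End-to-end availability when every dependency on the path must succeed."""
    return per_dependency ** dependency_count


if __name__ == "__main__":
    consumer = LoyaltyConsumer()
    event = {"event_id": "pay-123", "user_id": "u-1", "points": 50}
    print(consumer.handle(event), consumer.handle(event), consumer.points)

    print(f"Case A: {chain_availability(0.999, 6):.4%}")  # six sync dependencies
    print(f"Case B: {chain_availability(0.999, 2):.4%}")  # two sync dependencies
```

The second delivery of `pay-123` is ignored, so the balance stays at 50 points however many times the queue redelivers.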

A practical HLD recipe¶

Use synchronous communication for identity checks, payment auth, seat locks, and direct user queries. Use asynchronous communication for notifications, audits, enrichments, search indexing, and downstream fan-out. Mix both on purpose. A healthy architecture rarely says "everything sync" or "everything async." It says, "these decisions need a reply now, and these others can happen off-path."

Ask these four questions before choosing:

1. What must the user know before the screen changes?
2. What can be retried safely later?
3. Which consumers should scale independently?
4. Which failure should block the whole action, and which failure should only create backlog?

If you answer those honestly, the communication pattern becomes much clearer.

Where this lives in the wild¶

Amazon checkout — payment authorization and inventory checks happen on synchronous request paths because the buyer must know whether the order is confirmed before leaving the page.
Uber — trip events fan out asynchronously to billing, receipts, fraud checks, and internal analytics so the rider app is not blocked by every downstream consumer.
Slack — sending a message feels synchronous to the author, but indexing, notifications, compliance export, and analytics can run asynchronously after the message is accepted.
Stripe webhooks — event delivery is asynchronous by design because merchant systems may be offline, slow, or retrying, yet the payment platform must continue processing safely.
LinkedIn — profile updates can publish events that asynchronously refresh search indexes and recommendation systems instead of forcing every derived system into one inline request.

Pause and recall¶

What is the simplest test for deciding whether a communication path should stay synchronous?
Why does moving side effects off the critical path improve both latency and availability?
When is event-driven fan-out cleaner than calling four downstream services directly?
What new responsibilities appear when you choose asynchronous communication?

Interview Q&A¶

Q: Why use sync for payment authorization but async for email sending?
A: Payment authorization changes whether the order is valid right now, so the caller needs the answer before continuing. Email sending is a side effect that can succeed later through retries without changing the immediate customer promise.
Common wrong answer to avoid: "Async is always better because it is faster." Async hides waiting from the caller, but it adds queues, retries, and delayed failure handling.

Q: Why choose event-driven fan-out instead of one service calling four downstream services directly?
A: Because a single business fact may need many independent consumers with different latency and scaling profiles. Publishing one event reduces direct coupling and lets each consumer fail or retry without stretching the original request path.
Common wrong answer to avoid: "Because events remove coupling completely." They reduce temporal coupling, but contract coupling and data semantics still matter.

Q: Why might gRPC still be the right choice even in a highly scalable system?
A: When the caller needs a fast, structured reply and the workflow is tightly bounded, gRPC gives efficient synchronous communication. Scalability does not automatically mean queues everywhere; it means choosing the right boundary for waiting.
Common wrong answer to avoid: "Queues scale better, so use queues for user-facing reads." Many user-facing reads need direct answers, not eventual completion.

Q: Why shorten synchronous chains even when each dependency is individually reliable?
A: Because reliability multiplies across the chain. Several 99.9% services combined on one request path produce lower end-to-end availability and higher latency variance than any single service alone.
Common wrong answer to avoid: "99.9% is basically 100%, so six services are fine." Compounding failure and latency is exactly the hidden danger.

Apply now (5 min)¶

Pick one product flow you know well: signup, checkout, ride booking, or video upload. List every downstream action after the main request starts. Mark each action as either "must reply now" or "can happen later." Then redraw the flow with two roads:

1. one synchronous critical path
2. one asynchronous side-effect path

Now estimate numbers. Assume the main request gets 200 actions per second and produces 3 follow-up events each. How many background messages per second does that create? Show the math before you answer.

Sketch from memory: draw one user request entering the system, one sync road that waits for reply, and one async road that fans out to three consumers. Label which failures block the user and which only create backlog.

Bridge. Traffic flows, but all cars still enter through one gate. That gate becomes the next bottleneck. → 06-load-balancing-and-routing.md