06. Data Flow First Design — Follow the ticket, not your panic

~15 min read. When the whiteboard feels blank, trace one request and let the system reveal itself.

Built on the ELI5 in 00-eli5.md. The order ticket — one user request moving through the restaurant — shows us which prep station touches data, and when.


1) Start with movement, not boxes

See. Most candidates freeze because they start with nouns. Load balancer, cache, queue, database. Nice words. No flow. So the page stays blank. Data flow fixes that. We start with one user action. We follow the order ticket end to end. We ask four things. What data arrives? What changes? What must persist? What goes back to the user?

┌────────┐   tap place order   ┌──────────┐   validate   ┌─────────────┐
│ client │ ──────────────────→ │ API tier │ ───────────→ │ order logic  │
└────────┘                     └──────────┘              └──────┬──────┘
                                      write rows ────────────────┤
                                      publish event ─────────────┤
                                      send response ◀────────────┘
That picture is already a design. Simple, no? It tells us there is a write path, a response path, and maybe an event path. Now what is the problem? Many systems do not have one path. They have one write path and many read paths. The write path optimizes correctness. The read path optimizes speed and shape. If you mix both too early, clarity dies. So what to do? First trace the write, then the read, then compare constraints. That is how the kitchen stops looking mysterious.


2) The write path is about state change

Take a food-delivery app. A user taps "Place order." That single tap becomes multiple internal steps. Look.

┌────────┐   ┌────────────┐   ┌──────────────┐   ┌───────────┐
│ client │→ │ auth check  │→ │ order service │→ │ SQL store  │
└────────┘   └────────────┘   └──────┬───────┘   └─────┬─────┘
                                     │                 │
                                     ├──→ outbox table │
                                     │                 │
                                     └──→ response     │
                                  outbox poller ───────┘
                                         └──→ queue ──→ payment/inventory/notification
Notice the order. We authenticate, validate the cart, compute price, reserve an order id, write the order record, record downstream work, and then answer the client. Why this order? Because the write path decides truth. If truth is not durable, everything later is theatre. The user may see success. But the order may vanish. That is fatal.

So the write path usually protects a few things. Atomicity for coupled updates. Idempotency for retries. Durability and auditability for accepted work. Your house rules decide how strict this path must be. A payments flow needs stricter writes than a like button. A seat-booking flow needs stricter writes than a profile edit.

See the interview trick. The write path is not every side effect. The write path is the minimum safe commit. Email, analytics, search indexing, and partner notification can happen later. But the order row cannot happen later. The payment authorization reference cannot happen later. The outbox entry for guaranteed follow-up cannot happen later. That is the line. Draw that line early.
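To make the minimum safe commit concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and the functions place_order and drain_outbox are hypothetical, chosen only for illustration; a real system would use its own schema and a real queue client. The point it tries to show is that the order row and the outbox row commit in one transaction, and that a retry with the same idempotency key returns the original order instead of creating a second one.

```python
import json
import sqlite3
import uuid


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with hypothetical orders and outbox tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (
            order_id        TEXT PRIMARY KEY,
            idempotency_key TEXT NOT NULL UNIQUE,
            user_id         TEXT NOT NULL,
            total_cents     INTEGER NOT NULL,
            payment_ref     TEXT NOT NULL
        );
        CREATE TABLE outbox (
            event_id   INTEGER PRIMARY KEY AUTOINCREMENT,
            order_id   TEXT NOT NULL,
            event_type TEXT NOT NULL,
            payload    TEXT NOT NULL,
            published  INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def place_order(conn, idempotency_key, user_id, total_cents, payment_ref) -> str:
    """Commit the order row and its outbox event together, or neither."""
    order_id = str(uuid.uuid4())
    payload = json.dumps({"order_id": order_id, "total_cents": total_cents})
    try:
        with conn:  # one transaction: commit on success, rollback on error
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
                (order_id, idempotency_key, user_id, total_cents, payment_ref),
            )
            conn.execute(
                "INSERT INTO outbox (order_id, event_type, payload) VALUES (?, ?, ?)",
                (order_id, "order_accepted", payload),
            )
    except sqlite3.IntegrityError:
        # Most likely a client retry: the key already exists, so return that order.
        row = conn.execute(
            "SELECT order_id FROM orders WHERE idempotency_key = ?",
            (idempotency_key,),
        ).fetchone()
        if row is None:
            raise  # some other constraint failed; do not hide it
        return row[0]
    return order_id


def drain_outbox(conn, publish) -> int:
    """Poller step: hand unpublished events to `publish`, then mark them done."""
    rows = conn.execute(
        "SELECT event_id, payload FROM outbox WHERE published = 0 ORDER BY event_id"
    ).fetchall()
    for event_id, payload in rows:
        publish(payload)  # if this raises, the row stays unpublished for the next poll
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE event_id = ?", (event_id,))
    return len(rows)
```

In this sketch the poller marks a row as published only after publish returns, so a crash between the two steps would deliver the event again. That is at-least-once delivery, which is why downstream consumers such as payment, inventory, and notification would also need to handle duplicates.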


3) The read path is about serving a question fast

Now the same user opens "Track order." Very different goal. No new truth is being created. We are shaping existing truth.

┌────────┐  ask for status   ┌──────────┐   fan-in   ┌─────────────────┐
│ client │ ────────────────→ │ API tier │ ─────────→ │ order view svc  │
└────────┘                   └──────────┘            └──────┬──────────┘
                        cache lookup ────────────────────────┤
                        read model query ────────────────────┤
                        ETA service call ────────────────────┤
                        response shaping ◀───────────────────┘
Look at the difference. The write path asked, "Can we safely commit?" The read path asks, "Can we answer quickly and clearly?" That changes everything. Maybe the read path hits Redis first. Maybe it reads a denormalized table. Maybe it joins order state with courier ETA. Maybe it tolerates two-second staleness.

So we should not force read queries onto write tables. That is the classic trap. One schema cannot serve every question well. The write model likes normalized truth. The read model likes pre-shaped answers. Simple, no? A product page read path may need product name, price, stock, rating, and image URL. A write table may store those pieces across services. That is fine. We can build a read model. We can cache the read model. We can even accept eventual consistency here. Why? Because the user asked for information. Not a money movement. Not a seat lock.

This is why following data prevents whiteboard chaos. One order ticket becomes two questions. How do we commit truth? How do we serve truth? Now the boxes become obvious.
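The read path above can be sketched as a cache-first lookup with a staleness bound. This is only an illustration: OrderView, OrderViewService, and the load_view callback are hypothetical names, the in-process dictionary stands in for a cache such as Redis, and the two-second default simply mirrors the staleness mentioned above.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class OrderView:
    """Pre-shaped answer for the 'Track order' screen (hypothetical fields)."""
    order_id: str
    status: str
    eta_minutes: int


class OrderViewService:
    """Serve a cached view if it is fresh enough, otherwise rebuild it."""

    def __init__(
        self,
        load_view: Callable[[str], OrderView],
        max_staleness_s: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_view = load_view  # e.g. read model query plus ETA call
        self._max_staleness_s = max_staleness_s
        self._clock = clock
        self._cache: Dict[str, Tuple[float, OrderView]] = {}

    def get(self, order_id: str) -> OrderView:
        now = self._clock()
        hit = self._cache.get(order_id)
        if hit is not None and now - hit[0] <= self._max_staleness_s:
            return hit[1]  # fresh enough: skip the slower read model
        view = self._load_view(order_id)
        self._cache[order_id] = (now, view)
        return view
```

A caller would pass in whatever function assembles the view, for example one that queries a denormalized table and adds the courier ETA. Nothing here touches the write tables, which is the separation the section argues for.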


4) Worked example: trace one order with numbers

Suppose we are designing order placement for a quick-commerce app. Dinner peak is 500 order attempts per second. Eighty percent pass payment and become accepted orders. So accepted writes are:

    500 attempts/sec × 0.8 = 400 committed orders/sec

Each accepted order writes these items.

  • order row = 2 KB
  • payment reference row = 0.5 KB
  • inventory reservation row = 0.5 KB
  • outbox event row = 1 KB

Total durable write per accepted order is:

    2 + 0.5 + 0.5 + 1 = 4 KB

Total durable write throughput is:

    400 orders/sec × 4 KB = 1600 KB/sec

1600 KB/sec is about 1.6 MB/sec. Per minute, storage ingest is:

    1.6 MB/sec × 60 = 96 MB/min

Per hour, storage ingest is:

    96 MB/min × 60 = 5760 MB/hour

That is about 5.76 GB/hour.

Now latency. Our house rules say checkout confirmation must return in under 700 ms at p95. We budget the path.

  • client to edge = 60 ms
  • edge to API + auth/validation = 35 ms
  • pricing call = 40 ms
  • payment authorization = 260 ms
  • SQL transaction plus outbox write = 35 ms
  • response serialization and network back = 70 ms

Total estimated latency is:

    60 + 35 + 40 + 260 + 35 + 70 = 500 ms

So we have 200 ms headroom. Good.

Now what should stay synchronous? Everything needed to safely say, "Order accepted." That includes the SQL commit. That includes the payment authorization result. That includes the outbox record for downstream work.

What should move async?

  • sending SMS
  • pushing restaurant tablet notification
  • analytics counters
  • search index updates

If each of those took even 80 ms extra, four side effects add:

    80 × 4 = 320 ms

Our 500 ms path becomes 820 ms. That breaks the p95 target. See. The numbers tell us where to cut. This is the real power of data-flow-first design. You stop arguing in vague words. You start protecting the critical path.
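The same arithmetic can be kept as a small script, so that changing one input reruns the whole budget. This is a sketch under the numbers stated above; the constant and function names are made up for illustration, and KB to MB to GB uses factors of 1000 to match the worked figures.

```python
ATTEMPTS_PER_SEC = 500
ACCEPT_RATE = 0.8
P95_TARGET_MS = 700

# Durable rows written per accepted order, in KB.
ROW_SIZES_KB = {
    "order": 2.0,
    "payment_reference": 0.5,
    "inventory_reservation": 0.5,
    "outbox_event": 1.0,
}

# Steps on the synchronous checkout path, in ms.
SYNC_STEPS_MS = {
    "client to edge": 60,
    "edge to API + auth/validation": 35,
    "pricing call": 40,
    "payment authorization": 260,
    "SQL transaction + outbox write": 35,
    "response serialization + network back": 70,
}

ASYNC_SIDE_EFFECTS = [
    "SMS",
    "restaurant tablet notification",
    "analytics counters",
    "search index updates",
]
SIDE_EFFECT_COST_MS = 80  # assumed cost if a side effect ran inline


def write_budget() -> dict:
    """Durable write volume at dinner peak."""
    committed_per_sec = ATTEMPTS_PER_SEC * ACCEPT_RATE
    kb_per_order = sum(ROW_SIZES_KB.values())
    kb_per_sec = committed_per_sec * kb_per_order
    return {
        "committed_per_sec": committed_per_sec,
        "kb_per_order": kb_per_order,
        "mb_per_sec": kb_per_sec / 1000,
        "gb_per_hour": kb_per_sec * 3600 / 1_000_000,
    }


def latency_budget(inline_side_effects: bool) -> dict:
    """Checkout latency with side effects either deferred or run inline."""
    total_ms = sum(SYNC_STEPS_MS.values())
    if inline_side_effects:
        total_ms += SIDE_EFFECT_COST_MS * len(ASYNC_SIDE_EFFECTS)
    return {
        "total_ms": total_ms,
        "headroom_ms": P95_TARGET_MS - total_ms,
        "within_target": total_ms < P95_TARGET_MS,
    }


def slowest_step() -> str:
    """Name of the step that dominates the synchronous path."""
    return max(SYNC_STEPS_MS, key=SYNC_STEPS_MS.get)


if __name__ == "__main__":
    print(write_budget())
    print(latency_budget(inline_side_effects=False))
    print(latency_budget(inline_side_effects=True))
    print(slowest_step())
```

With side effects deferred, the budget leaves 200 ms of headroom; with all four run inline, the total reaches 820 ms and misses the target. The slowest step is payment authorization, which is already on the minimum safe commit and cannot be deferred, so the cut has to come from the side effects.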


5) A practical whiteboard checklist

When you feel stuck, say this out loud.

  • What is the user action?
  • What is the minimum durable commit?
  • What events leave that commit?
  • What question does the first read path answer?
  • Which parts can be stale?
  • Which parts cannot?
  • Which component owns each transition?

That is enough to begin. A good first-pass design often fits this pattern.

  • API receives the menu request
  • one service owns the business transition
  • one primary store commits truth
  • one queue carries deferred side effects
  • one read model serves the common query

Now what is the problem after that? Hot reads. Cold writes. Rebuild lag. Fanout cost. Search indexing delay. Good. Those are solvable second-order problems. Blank-whiteboard syndrome is worse. Kill that first. Follow the data. Let the path draw the system for you.


Where this lives in the wild

  • Swiggy checkout — backend engineer separates the order-accept write path from the order-tracking read path so dinner spikes do not corrupt truth.
  • Uber trip request screen — rider platform engineer commits trip creation first, then serves live driver ETA through a different read flow.
  • Stripe PaymentIntent — payments engineer treats confirmation writes and dashboard/reporting reads as different data paths with different latency needs.
  • LinkedIn feed publishing — feed engineer handles post creation, fanout, and timeline reads as separate flows because each path has different bottlenecks.
  • Shopify order admin — merchant tools engineer keeps order writes strict while search-heavy merchant views use read-optimized indexes.

Pause and recall

  1. Why does tracing one order ticket reduce blank-whiteboard syndrome?
  2. What belongs in the minimum safe commit, and what can move async?
  3. Why can the read path tolerate a different schema from the write path?
  4. In the worked example, which step dominated latency, and what design move followed from that?

Interview Q&A

Q: Why design the write path and read path separately instead of drawing one generic request flow?
A: They optimize for different things. Writes protect correctness and durability. Reads protect latency and response shape. One diagram hides that tension.

Common wrong answer to avoid: "Because reads are usually more than writes" — volume matters, but the real reason is different constraints on truth versus serving.

Q: Why trace one request end to end before listing components?
A: Flow exposes ownership, state transitions, and critical latency edges. Component lists sound smart but miss causality.

Common wrong answer to avoid: "Because interviewers like storytelling" — the value is operational clarity, not presentation style.

Q: Why push notifications and analytics behind async events instead of keeping them in the checkout call?
A: They are not part of the minimum safe commit. Keeping them synchronous burns latency budget and couples availability to non-critical work.

Common wrong answer to avoid: "Because queues are faster" — queues help decouple and smooth traffic, but they do not magically make work free.

Q: Why build a read model instead of querying normalized write tables directly for every screen?
A: Write schemas preserve clean mutations. Read screens need pre-joined, pre-shaped answers. Forcing one model to do both usually hurts both.

Common wrong answer to avoid: "Because denormalization is always faster" — speed helps, but the deeper issue is matching storage shape to the question asked.

Apply now (5 min)

Exercise: Pick one flow from BookMyShow, Zomato, or UPI. Write the first user action, the minimum safe commit, and three async side effects.

Sketch from memory: Draw one write path and one read path. Mark the primary store, one queue, one cached read model, and the slowest synchronous step.


Bridge. Data flows through the system. But where does it rest? The storage decision — SQL or NoSQL — shapes everything downstream. → 07-storage-decision-framework.md