08. Scaling the Read Path — add more lanes for readers¶

~15 min read. When reads explode, do not make every customer stand at one counter.

Built on the ELI5 in 00-eli5.md. The overflow lane — the mechanism for handling excess load — now duplicates the warehouse for read traffic so readers stop fighting with writers.

Why reads become the first big problem¶

Most systems are read-heavy. Users refresh feeds. Users open product pages. Users reload dashboards. Users scroll the same profile again. Writes are fewer. Reads are endless. So the first scaling pain usually comes from reads. See. One primary database can safely accept writes and serve reads at small scale. Then traffic grows. Now what is the problem? Every read competes with writes for CPU, memory, buffer cache, and disk. A long reporting query can delay checkout writes. A viral page can evict hot write pages from memory. Simple, no? Readers and writers are different workloads. So we give readers their own overflow lane. That overflow lane can be a read replica. It can be a CDN node. It can be a denormalized read table. It can be a materialized view. It can even be a separate read model in CQRS. The idea is the same. Duplicate or reshape data for faster reads. Do not force every read through one giant warehouse. Picture the split.

              writes
                │
                ▼
        ┌────────────────┐
        │ Primary DB     │
        │ source of truth│
        └───────┬────────┘
                │ replication / projection
        ┌───────┼───────────────┬───────────────┐
        ▼       ▼               ▼               ▼
   ┌────────┐ ┌────────┐   ┌────────────┐  ┌──────────┐
   │Replica A││Replica B│   │Read View    │  │CDN Edge  │
   │reads    ││reads    │   │materialized │  │static     │
   └────────┘ └────────┘   └────────────┘  └──────────┘

One writer. Many reader paths. That is the read-side overflow lane.

Read replicas with numbers, not slogans¶

Read replicas are copies of the primary database. Applications send writes to the primary. Applications send many reads to replicas. Good. But let us do the math. Suppose one primary can handle 8,000 read QPS comfortably. It also handles 2,000 write QPS. Your product suddenly needs 30,000 read QPS and still 2,000 write QPS. If all traffic hits one primary, total load is 32,000 ops per second. That is 4x the read comfort level even before complex queries. Now add 3 replicas. Replica capacity per node is also 8,000 read QPS. Total read capacity from replicas = 3 × 8,000 = 24,000 read QPS. Keep 6,000 reads on primary only for fresh or fallback queries. Now total available read capacity = 24,000 + 6,000 = 30,000 read QPS. Writes still go only to primary at 2,000 write QPS. So the system now fits. Step by step routing looks like this. Replica A gets 8,000 reads. Replica B gets 8,000 reads. Replica C gets 8,000 reads. Primary keeps 6,000 reads + 2,000 writes. Read total = 8,000 + 8,000 + 8,000 + 6,000 = 30,000. Write total = 2,000. Now the catch. Replication lag. If the primary commits first and replicas catch up 300 ms later, some users read stale data. Example. At 10:00:00.000, user changes address. Primary commits at 10:00:00.040. Replica receives update at 10:00:00.340. A read sent to replica at 10:00:00.200 is stale for 140 ms. Calculation is 340 ms - 200 ms = 140 ms. So what to do? For profile edits, maybe read-your-write requests stay on primary for a short window. For public feeds, replica lag is often acceptable. You must decide by business cost. Not by database marketing. Also remember this. Replicas do not magically fix bad queries. If one dashboard query scans 50 million rows, you only spread pain. You do not remove it.

CDN, denormalization, and materialized views¶

Some reads should never touch your region. Static assets are the obvious case. CSS files do not need primary database freshness. Image thumbnails do not need SQL joins. Video chunks do not need origin disks for every play. That is why a CDN is the cheapest read-side overflow lane for static content. Now dynamic content. Many slow reads are slow because the data shape is wrong. Suppose a product page needs: name, price, inventory, average rating, last 3 reviews, merchant badge, and shipping ETA. If you join 6 tables on every request, the query becomes expensive. Now imagine 12,000 product page reads per second. The primary warehouse starts sweating. Denormalization means we store a read-friendly shape. Maybe one table or document already contains the page payload. Yes, we duplicate data. But we save read cost. Worked example. Assume live join query takes 45 ms. Assume denormalized read row takes 8 ms. Traffic is 10,000 reads per second. DB work for live joins = 10,000 × 45 ms = 450,000 ms each second. DB work for denormalized rows = 10,000 × 8 ms = 80,000 ms each second. Saved work = 450,000 - 80,000 = 370,000 ms each second. Percentage reduction = 370,000 / 450,000 = 82.2%. Simple, no? We paid storage duplication to save query time. Materialized views go one step further. They precompute results. Suppose your dashboard needs daily revenue by city. Raw events table has 200 million rows. Running the aggregate each page load takes 12 seconds. Now create a materialized view refreshed every minute. The view has 500 city rows. Dashboard query now reads 500 rows in 40 ms. You traded freshness for speed. Again, this is a read-side overflow lane. The reader sees a ready-made answer. Not raw parts.

CQRS: separate read model, separate write model¶

CQRS means Command Query Responsibility Segregation. Big name. Simple idea. The write model is optimized for correct updates. The read model is optimized for fast queries. One shape need not satisfy both jobs. See the split.

┌────────────┐   command   ┌──────────────┐
│ App/User   │───────────→ │ Write Model  │
└────────────┘             │ primary DB   │
      │                    └──────┬───────┘
      │                           │ events / replication
      │                           ▼
      │                    ┌──────────────┐
      └──── query ───────→ │ Read Model   │
                           │ view/search  │
                           └──────────────┘

When should you do this? Not on day one. Do it when read patterns are very different from write patterns. Feeds. Search. Analytics. Catalog browsing. Leaderboards. These often want different indexes, tables, or even databases. Now what is the tradeoff? Freshness and complexity. If commands update the write model first, the read model may lag. Suppose an order is placed at 2:00:00. Event reaches projector at 2:00:00.150. Read model updates at 2:00:00.230. A dashboard read at 2:00:00.100 misses the new order. A dashboard read at 2:00:00.300 sees it. So the inconsistency window is 230 ms. Some businesses accept that. Some cannot. Bank balance screens need tighter rules than social feeds. Why X not Y? Why CQRS and not just more replicas? Because replicas copy the same normalized write shape. CQRS lets you build a totally different read shape. That matters when the bottleneck is query shape, not node count. One more warning. Do not call every cache plus replica setup "CQRS." CQRS means explicit separation of command and query models. Not vague scaling language.

Where this lives in the wild¶

Instagram — follower graphs, profiles, and feed reads are served from read-scaled storage tiers so write paths do not fight every scroll refresh.
Netflix Open Connect — CDN appliances serve video segments close to viewers, which is the classic read-side overflow lane for streaming.
Amazon product pages — denormalized page payloads keep title, price, availability, and rating fast to fetch without rebuilding the whole page on each hit.
Shopify analytics dashboards — precomputed reporting tables and cached aggregates keep merchant dashboards responsive even when raw event volume is large.
GitHub repository pages — repeated reads for issues, commits, and metadata are pushed onto replicas and caches instead of hammering the primary writer.

Pause and recall¶

Why do read replicas increase capacity and still fail to guarantee fresh reads?
When is denormalization better than adding one more replica?
What exact problem does a materialized view solve?
In CQRS, why can the read model lag even when writes are correct?

Interview Q&A¶

Q: Why read replicas and not sharding first for a read-heavy app? A: Replicas are the simpler first move when writes still fit on one primary and the real pain is read volume. Sharding changes key placement, routing, rebalancing, and operational complexity. If the write path is not yet the bottleneck, replicas are cheaper and faster to adopt. Common wrong answer to avoid: "Because replicas are faster than shards" — speed is not the core distinction. The real difference is whether you are duplicating the same data or splitting the dataset itself. Q: Why denormalize and not keep doing live joins? A: Live joins are elegant for writes but expensive for hot read paths. Denormalization moves work earlier so the user reads a ready shape. You pay with duplicate data and more update logic, but you often win massively on latency. Common wrong answer to avoid: "Joins are bad" — joins are fine at the right scale. The issue is repeated cost on hot paths, not some universal ban on relational design. Q: Why CQRS and not just more read replicas? A: More replicas help if the same query shape is fine and only capacity is missing. CQRS helps when readers need a different shape, index strategy, or storage engine entirely. It solves a modeling problem, not only a throughput problem. Common wrong answer to avoid: "CQRS is for microservices" — no. CQRS is about separating command and query models, even inside one system. Q: Why not send every read to a CDN? A: CDNs are excellent for static or cacheable shared responses, but many reads depend on identity, permissions, or very fresh personalized state. Those reads need replicas, denormalized views, or dedicated read models deeper in the system. Common wrong answer to avoid: "Because CDNs only store files" — modern CDNs can cache more than files. The real constraint is cacheability and freshness, not file-ness.

Apply now (5 min)¶

Your system gets 24,000 read QPS and 1,500 write QPS. One DB node safely serves 6,000 read QPS plus those writes. Calculate how many read replicas you need if the primary should keep only 6,000 reads. Then assume replica lag is 250 ms. List two API screens that can tolerate that lag and two that should stay on primary after a write. Now sketch one denormalized read row for an e-commerce product page. Include at least six fields. Sketch from memory: draw primary, two replicas, one CDN, and one materialized view, then label which kind of read should go to each.

Bridge. Reads scale. But writes still funnel into one warehouse. Every order, message, and payment still goes through the same door. → 09-scaling-write-path.md