07. Caching at System Level — many small fast warehouses¶

~14 min read. The fastest database query is the one you never send.

Built on the ELI5 in 00-eli5.md. The warehouse — the storage layer — now gets a small fast helper warehouse beside it so repeated reads do not keep hitting the big slow one.

Why we build a cache before we buy a bigger database¶

Think of a huge warehouse at the edge of the city. It stores everything. But it is a little far. And every truck must queue. Now imagine a tiny fast warehouse right beside the shop. Same popular items. Very short walking distance. That is a cache.

See. Caching is not magical. It is just storing copies of hot data closer to the reader. The main goal is simple: cut latency, cut database load, absorb traffic spikes, and buy time before deeper scaling work.

Now what is the problem? People say, "We added Redis, done." No. Caching is layered. Browser cache is one layer. CDN cache is another. Application cache is another. Database query cache can be another. Each layer is a small fast warehouse placed closer to demand. If the first layer hits, lower layers do nothing. Simple, no? Here is the picture.

┌──────────┐   ┌──────────┐   ┌────────────┐   ┌────────────┐   ┌────────────┐
│ Browser  │──→│   CDN    │──→│ App Cache  │──→│ DB Cache   │──→│ Main DB    │
│ cache    │   │ edge POP │   │ Redis      │   │ query/page │   │ warehouse  │
└──────────┘   └──────────┘   └────────────┘   └────────────┘   └────────────┘
      ▲               ▲               ▲               ▲                ▲
      │               │               │               │                │
      └──── fast hit returns before lower layers even wake up ─────────┘

So what to do? Place the smallest, cheapest, fastest store nearest the user. Let misses fall through to the next layer. Do not ask the deepest warehouse first. That defeats the whole point.
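The fall-through rule can be sketched in a few lines. This is a toy model, not how a real stack is wired: the `Layer` class, `layered_get`, and `load_from_db` are hypothetical names, and plain dicts stand in for the browser, CDN, and Redis stores. In a real system the browser and CDN fill themselves from HTTP responses, but the lookup order is the same idea.

```python
from typing import Callable


class Layer:
    """One cache layer. A plain dict stands in for the real store (assumption)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: dict[str, str] = {}


def layered_get(key: str, layers: list[Layer],
                load_from_db: Callable[[str], str]) -> tuple[str, str]:
    """Return (value, name of whoever answered), checking the nearest layer first."""
    for depth, layer in enumerate(layers):
        if key in layer.store:
            value = layer.store[key]
            # Fill the nearer layers that missed, so the next read stops earlier.
            for nearer in layers[:depth]:
                nearer.store[key] = value
            return value, layer.name
    # Every layer missed: only now do we bother the big warehouse.
    value = load_from_db(key)
    for layer in layers:
        layer.store[key] = value
    return value, "main-db"


layers = [Layer("browser"), Layer("cdn"), Layer("redis")]
db_reads: list[str] = []


def load_from_db(key: str) -> str:
    """Hypothetical origin read; records each call so we can count DB hits."""
    db_reads.append(key)
    return f"row-for-{key}"


first = layered_get("product:123", layers, load_from_db)
second = layered_get("product:123", layers, load_from_db)
print(first[1], second[1], len(db_reads))
```

The first read falls all the way through and fills every layer. The second read stops at the nearest layer, so the database is touched once for two requests.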

Layered caching with one worked example¶

Let us take a product page. Traffic is 120,000 requests per minute. Without caching, the database sees all 120,000 requests. Assume each DB read takes 40 ms. Total DB read work each minute is: 120,000 × 40 ms = 4,800,000 ms. That is 4,800 seconds of read work per minute, or 80 seconds of work for every second on the clock. The database would need about 80 reads in flight at all times just to keep up. That is a lot to ask of one box. Now layer by layer.

Step 1: browser cache handles repeat visits. Say 35% of requests are browser hits. 35% of 120,000 = 42,000 requests. Requests left = 120,000 - 42,000 = 78,000.
Step 2: CDN caches images, CSS, JS, and some public HTML. Say CDN handles 50% of the remaining requests. 50% of 78,000 = 39,000 requests. Requests left = 78,000 - 39,000 = 39,000.
Step 3: Redis caches product JSON in the app tier. Say Redis hits 70% of what reaches the app. 70% of 39,000 = 27,300 requests. Requests left = 39,000 - 27,300 = 11,700.
Step 4: database query cache handles 30% of leftover repeated SQL. 30% of 11,700 = 3,510 requests. Requests left for the main DB = 11,700 - 3,510 = 8,190.

Now compare. Original DB load = 120,000 requests per minute. New DB load = 8,190 requests per minute. Load reduction = 120,000 - 8,190 = 111,810 requests. Reduction percentage = 111,810 / 120,000 = 93.175%. Round it. About 93.2% less read traffic hits the big warehouse.

Now latency. Suppose browser hit latency is 5 ms, CDN hit latency is 20 ms, Redis hit latency is 2 ms, DB cache hit latency is 8 ms, and main DB miss path is 40 ms. To keep the model simple, each request is charged only the latency of the layer that finally answers it. Weighted average latency becomes:

42,000 × 5 ms = 210,000 ms.
39,000 × 20 ms = 780,000 ms.
27,300 × 2 ms = 54,600 ms.
3,510 × 8 ms = 28,080 ms.
8,190 × 40 ms = 327,600 ms.

Add them. 210,000 + 780,000 + 54,600 + 28,080 + 327,600 = 1,400,280 ms. Average per request = 1,400,280 / 120,000 = 11.67 ms. So we went from 40 ms average DB-only reads to about 11.7 ms. That is why layered caching changes the whole system shape.
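The same funnel can be checked with a short script. This is a sketch using the numbers assumed above; the `funnel` function and the `LAYERS` table are illustrative names, and hit rates are written as whole percentages so the arithmetic stays in exact integers.

```python
def funnel(total_requests, layers, origin_latency_ms):
    """Walk requests through cache layers in order.

    layers: list of (name, hit_pct, hit_latency_ms), where hit_pct is the
    percentage of traffic ARRIVING at that layer that it answers.
    Returns (requests reaching the main DB, average latency in ms).
    """
    remaining = total_requests
    total_ms = 0
    for _name, hit_pct, hit_latency_ms in layers:
        hits = remaining * hit_pct // 100   # exact for the numbers used here
        total_ms += hits * hit_latency_ms   # each hit costs that layer's latency
        remaining -= hits                   # misses fall through to the next layer
    total_ms += remaining * origin_latency_ms  # whatever is left pays the DB price
    return remaining, total_ms / total_requests


# Hit rates and latencies from the worked example.
LAYERS = [
    ("browser", 35, 5),
    ("cdn", 50, 20),
    ("redis", 70, 2),
    ("db query cache", 30, 8),
]

db_reads, avg_ms = funnel(120_000, LAYERS, origin_latency_ms=40)
reduction = 1 - db_reads / 120_000
print(db_reads, round(avg_ms, 2), round(reduction * 100, 3))
```

Change one hit rate and rerun. You will see that the layer nearest the user has the biggest effect on everything behind it, because every layer below only sees its leftovers.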

Cache-aside, write-through, and write-behind¶

Now we need write rules. Because cached data gets stale. Three common patterns matter.

1) Cache-aside¶

Application checks cache first. If hit, return it. If miss, read database. Then write the result into cache. Then return to user. This is the most common pattern for product pages and profiles. Why? Because only hot keys get cached. Cold keys never waste cache memory. Flow looks like this.

┌───────┐   miss    ┌────────┐   read    ┌────────────┐
│ User  │──────────→│ Redis  │──────────→│ Main DB    │
└───────┘           └────────┘           └────────────┘
     ▲                   │                      │
     │                   └──── write back ──────┘
     └──────────── return data after cache fill ────────────

Problem? First request is slow. And invalidation is your job.
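Here is a minimal cache-aside sketch. Assumptions, stated loudly: `TTLCache` is a tiny in-process stand-in for Redis, `db` is a plain dict standing in for the main database, and `get_product` and `update_price` are hypothetical helper names. The write path uses delete-on-write, because in cache-aside the application owns invalidation.

```python
import time


class TTLCache:
    """In-process stand-in for Redis (assumption): key -> (value, expires_at)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]          # expired entries count as misses
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def delete(self, key):
        self._data.pop(key, None)


def get_product(product_id, cache, db, ttl_seconds=300):
    """Cache-aside read: check cache, on a miss read the DB, fill, return."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                    # hit: the database is never touched
    value = db[product_id]               # miss: read the source of truth
    cache.set(key, value, ttl_seconds)   # fill the cache for the next reader
    return value


def update_price(product_id, price, cache, db):
    """Write path: update the DB, then delete the cached copy."""
    db[product_id] = {**db[product_id], "price": price}
    cache.delete(f"product:{product_id}")  # next read repopulates it
```

Notice what is missing. Nothing is cached until somebody asks for it. That is exactly why cold keys never waste cache memory, and also why the first request is the slow one.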

2) Write-through¶

Application writes to cache and database in the same request path. So cache stays fresh immediately. Good for counters or session state where reads must see recent writes. Bad side? Every write becomes slower. And you may cache data nobody reads. Example. A profile update takes 15 ms to write to MySQL. Redis write adds 2 ms. Total write path becomes 17 ms. You paid 13.3% more write latency. Calculation is: (17 - 15) / 15 = 0.1333.
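A small write-through sketch, under stated assumptions: two plain dicts stand in for Redis and MySQL, and `WriteThroughStore` and `write_overhead` are hypothetical names. Writing the database before the cache is also a choice made here, so that a failed database write never leaves a fresh-looking cache entry behind. The pattern itself only says both writes happen in the same request path.

```python
class WriteThroughStore:
    """Every write goes to the database and the cache in the same call."""

    def __init__(self, cache: dict, db: dict) -> None:
        self.cache = cache   # stand-in for Redis (assumption)
        self.db = db         # stand-in for the main database (assumption)

    def write(self, key, value):
        self.db[key] = value      # source of truth first (ordering is an assumption)
        self.cache[key] = value   # then the cache, so reads see the new value at once

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db[key]      # only keys written before the cache existed land here
        self.cache[key] = value
        return value


def write_overhead(db_ms, cache_ms):
    """Extra write latency from the cache write, as a fraction of the DB-only write."""
    return cache_ms / db_ms


store = WriteThroughStore(cache={}, db={})
store.write("session:42", "alice")
print(store.read("session:42"), round(write_overhead(15, 2), 4))
```

See the cost hiding in `write`. Every key that is ever written sits in the cache, whether or not anybody reads it.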

3) Write-behind¶

Application writes to cache first. Database write happens later in batches. This is very fast for the user-facing write path. Good for analytics, click streams, and non-critical aggregates. But look carefully. If cache crashes before flush, data can vanish. So write-behind needs durable queues or logs. Simple, no? Fast now means risk later.

A small number example. Suppose 10,000 like events arrive each second. Single DB writes cost 1 ms each. Doing them one by one needs 10,000 ms of DB work per second. That is 10 seconds of work for every second of traffic. A single serial writer can never catch up. Now batch 500 likes per flush. 10,000 / 500 = 20 batches. Say each batch write takes 12 ms. 20 × 12 ms = 240 ms of DB work per second. That is roughly a 42× drop (10,000 / 240 ≈ 41.7). But only if the buffer is durable. Otherwise your tiny fast warehouse becomes a data shredder.
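The batching idea can be sketched like this. Big assumption, said plainly: the buffer below is an in-memory list, so it is NOT durable. It shows the batching arithmetic only. A real write-behind path would put a durable queue or log where the list is. `WriteBehindBuffer` and `flush_batch` are hypothetical names.

```python
class WriteBehindBuffer:
    """Collect writes and hand them to the database in batches.

    WARNING: pending events live in process memory. If the process dies
    before flush(), they are lost. Illustration only.
    """

    def __init__(self, flush_batch, batch_size=500):
        self._pending = []
        self._flush_batch = flush_batch   # callable that performs one batched DB write
        self._batch_size = batch_size

    def record(self, event):
        """Fast path for the user: append, and flush only when the batch is full."""
        self._pending.append(event)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        """Write everything pending as one batch. Call on a timer and at shutdown too."""
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        self._flush_batch(batch)


# One second of traffic from the example: 10,000 likes, 500 per batch.
batches = []                      # stand-in for the database: one entry per batch write
buffer = WriteBehindBuffer(batches.append, batch_size=500)
for like_id in range(10_000):
    buffer.record(like_id)
buffer.flush()                    # nothing left here, but a real caller must not forget it

db_work_ms = len(batches) * 12    # 12 ms per batch write, as assumed above
print(len(batches), db_work_ms)
```

The user-facing call is just a list append. All the database cost moves into `flush`, which is exactly where the durability risk lives.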

CDN, app cache, DB cache, and invalidation¶

CDN caching is for content many users share: images, video chunks, CSS, and public API responses. The CDN is a global small fast warehouse near the user. Application-level caches like Redis or Memcached hold business objects. Product detail JSON. Cart summaries. Feature flags. Session lookups. Database query cache stores repeated query results or pages. Useful when the same SQL repeats a lot. Less useful when writes are frequent, because each write throws away the cached results it touches.

Now the hardest part. Invalidation. People joke, "There are only two hard things in computer science." One of them is cache invalidation. Why? Because cached copies become lies after writes. Common strategies are these.

TTL-based invalidation. Attach an expiry time. Example: product page TTL = 300 seconds. Easy to operate. But data may be stale for up to 5 minutes.
Explicit delete on write. When price changes, delete product:123 from Redis. Next read repopulates it. This works well with cache-aside.
Versioned keys. Instead of product:123, use product:123:v17. Readers move to v18 after an update. Old keys expire naturally. Good when many CDN nodes hold copies.
Event-driven invalidation. Publish product-updated. Subscribers evict or refresh matching keys. Good when many services keep their own caches.

Now see one TTL tradeoff numerically. A product price changes once every 2 hours. That is once every 7,200 seconds. You set TTL = 300 seconds. Worst-case staleness = 300 seconds. Average staleness after a random update is about 150 seconds. Why? The update lands at a random point inside the cached entry's TTL window, so on average half the window is still left to run. Expected stale window = TTL / 2 = 150 seconds. If that is acceptable, fine. If not, TTL alone is too weak. Use delete-on-write or versioning.

One more issue. Cache stampede. A hot key expires. Then 5,000 requests miss together. All rush to the main DB. So what to do? Use request coalescing. Use jittered TTLs. Use stale-while-revalidate. Let one request refill while others wait or get slightly stale data. That is disciplined cache design. Not just "put Redis and pray."
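Two of the stampede defences can be sketched with the standard library alone. This is one possible shape under stated assumptions, not a production design: `jittered_ttl` and `CoalescingLoader` are hypothetical names, coalescing here works only inside a single process, and stale-while-revalidate is left out.

```python
import random
import threading


def jittered_ttl(base_seconds, spread=0.1, rng=random):
    """Scale the TTL by a random factor in [1 - spread, 1 + spread].

    Keys written together then expire at slightly different times
    instead of all at once.
    """
    return base_seconds * (1 + rng.uniform(-spread, spread))


class CoalescingLoader:
    """Request coalescing: concurrent misses for one key share one load call."""

    def __init__(self, load):
        self._load = load            # the expensive read, e.g. a main DB query
        self._lock = threading.Lock()
        self._inflight = {}          # key -> flight record shared by all waiters

    def get(self, key):
        with self._lock:
            flight = self._inflight.get(key)
            leader = flight is None
            if leader:
                flight = {"done": threading.Event(), "value": None, "error": None}
                self._inflight[key] = flight
        if leader:
            try:
                flight["value"] = self._load(key)   # only the leader hits the DB
            except Exception as exc:
                flight["error"] = exc               # followers see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]
                flight["done"].set()                # wake everyone who was waiting
        else:
            flight["done"].wait()                   # followers wait for the leader
        if flight["error"] is not None:
            raise flight["error"]
        return flight["value"]
```

With this in front of the refill path, 5,000 simultaneous misses on one hot key turn into one database read and 4,999 short waits. Across many app servers you would still get one read per server, so a shared lock or stale-while-revalidate may be needed on top.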

Where this lives in the wild¶

Cloudflare CDN — edge POPs cache images, CSS, JS, and full page responses so origin servers see only misses and uncached traffic.
Wikipedia — Varnish caches anonymous article reads in front of MediaWiki and MySQL, which is why repeated page views do not hit the primary database every time.
Shopify storefronts — Redis-backed application caches keep hot product and collection data close to app servers during flash sales.
GitHub — Memcached and Redis reduce repeated reads for sessions, repository metadata, and rendered fragments in the Rails stack.
YouTube — edge caches store popular video chunks near viewers, pushing only cold chunks back to deeper storage.

Pause and recall¶

Why does layering caches reduce load more than using only one cache?
In cache-aside, what exact steps happen on a miss?
Why can write-behind improve throughput and still increase risk?
If TTL is 300 seconds, what is the average stale window after an update?

Interview Q&A¶

Q: Why cache-aside and not write-through for a huge product catalog?
A: Cache-aside stores only hot keys. Write-through stores every written key, even cold ones. For massive catalogs, that wastes memory and write latency. So we usually prefer cache-aside unless freshness rules are very strict.
Common wrong answer to avoid: "Write-through is always better because the cache is fresh." Freshness is only one dimension. Cost, memory, and write amplification also matter.

Q: Why TTL and not permanent caching for public data?
A: Permanent caching turns stale data into long-lived lies. TTL gives an upper bound on staleness and lets the system heal even if explicit invalidation fails. It is a safety net, not a full freshness strategy.
Common wrong answer to avoid: "Because Redis needs TTL to free memory." Memory pressure is separate. TTL here is mainly about correctness and recovery from missed invalidations.

Q: Why use a CDN when Redis already exists?
A: A CDN sits near the user and offloads static or shared content before requests reach your application region. Redis still requires the request to reach your backend. They solve different distance problems.
Common wrong answer to avoid: "CDN is just internet Redis." No. CDN nodes specialize in edge delivery, HTTP semantics, and geographic proximity.

Q: Why not depend on the database query cache alone?
A: Query caches help repeated SQL, but they do not replace browser, CDN, and app caches. They sit too deep in the path. By the time a request reaches the database layer, you already spent network and application work.
Common wrong answer to avoid: "Because query caches are old-fashioned." The real issue is placement, invalidation behavior, and limited coverage, not fashion.

Apply now (5 min)¶

Take an API doing 30,000 reads per minute. Assume browser hit rate 20%. Assume CDN hit rate 40% of the remainder. Assume Redis hit rate 50% of what reaches the app. Compute the final DB read rate step by step. Then compute the reduction percentage.

Now add one rule. Price updates must appear within 10 seconds. Choose one strategy among TTL-only, delete-on-write, and versioned keys. Say why in two lines.

Sketch from memory: draw the layered cache stack from browser to main database, and mark where cache-aside, write-through, and write-behind usually fit.

Bridge. Cache helps. But the main warehouse still gets hammered on reads whenever keys miss or freshness requirements stay tight. We need more read capacity, not just better shortcuts. → 08-scaling-read-path.md