08. Caching Where and Why — Fast by not repeating yourself¶

~18 min read. If the same read keeps happening, do not make the kitchen cook it from scratch every time.

Built on the ELI5 in 00-eli5.md. The kitchen keeps remaking the same dish, so we place smart shortcuts between the menu request and the real prep station that owns the truth.

1) Cache the answer as close to the question as possible¶

See. Caching is not one thing. It is many layers. Each layer answers a different kind of repeat question.

┌────────┐   ┌─────┐   ┌───────────┐   ┌────────────┐   ┌──────────┐
│ client │ → │ CDN │ → │ app cache │ → │ app server │ → │ database │
└────────┘   └─────┘   └───────────┘   └────────────┘   └──────────┘
               │             │               │
               │ static pages│ hot objects   │ query results
               └─────────────┴───────────────┴──────────────→ truth source

Look at the picture first. A CDN helps when many users ask for the same bytes. Images. JS bundles. Public product pages. An application cache helps when many requests ask for the same business object. User profile. Product details. Feature flags. A database cache helps when the same expensive query repeats.

Now what is the problem? People say, "Add Redis." That is incomplete. The real question is where the repeated work happens. If bandwidth is the pain, use edge caching. If serialization or API fan-in is the pain, use app caching. If query execution is the pain, use database-side caching or read models.

Simple, no? The best cache is the one that removes the most expensive repeated step. Your house rules decide how stale the answer may be. Your menu decides which layer sees repetition.

2) The three core write/read patterns¶

Most interview answers need these patterns.

Cache-aside¶

The app reads the cache first. On miss, it loads from the database and populates the cache.
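
A minimal sketch of the cache-aside read path, assuming a plain dict stands in for the cache and another dict for the database. The class name `CacheAside` and the `db_reads` counter are illustrative, not from any library.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Read path: check the cache, fall back to the loader on a miss, then fill."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._load_from_db = load_from_db    # hypothetical database lookup
        self._cache: Dict[str, str] = {}     # stand-in for a real cache service
        self.db_reads = 0                    # counts trips to the source of truth

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:               # hit: the database is not touched
            return self._cache[key]
        self.db_reads += 1                   # miss: go to the source of truth
        value = self._load_from_db(key)
        if value is not None:
            self._cache[key] = value         # fill, so the next read is a hit
        return value


db = {"product:123": "running shoe"}
store = CacheAside(db.get)
first = store.get("product:123")     # miss, loaded from the database
second = store.get("product:123")    # hit, served from the cache
```

Only keys that someone actually reads ever land in the cache. A real version would also attach a TTL to each entry.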

Write-through¶

The app writes the database and cache in the same logical step.
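
A small sketch of write-through, again with dicts standing in for the database and the cache. The class and key names are hypothetical. The point is only the ordering: the write is not acknowledged until both stores hold the new value.

```python
from typing import Dict, Optional


class WriteThrough:
    """Write path: update the database and the cache in one logical step."""

    def __init__(self) -> None:
        self.db: Dict[str, str] = {}       # stand-in for the real database
        self.cache: Dict[str, str] = {}    # stand-in for the cache

    def put(self, key: str, value: str) -> None:
        self.db[key] = value       # truth first: if this raises, the cache is untouched
        self.cache[key] = value    # then the cache, before the write is acknowledged

    def get(self, key: str) -> Optional[str]:
        return self.cache.get(key)    # a read right after a write is already hot


profiles = WriteThrough()
profiles.put("user:7:name", "Asha")
name_after_write = profiles.get("user:7:name")
```

Every write pays for the cache update, even for keys nobody reads later. That is the cost this pattern accepts.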

Write-behind¶

The app writes the cache first and flushes to storage later.

Look at them side by side.

cache-aside:   read → cache miss → DB → fill cache → response
write-through: write → DB + cache together → response
write-behind:  write → cache now → DB later

Cache-aside is the default for read-heavy systems. Why? Because unused keys do not consume write work. Write-through is useful when reads must be hot immediately after writes. Think profile updates or product metadata. Write-behind is useful when write bursts are huge and some delay is acceptable. Think metrics, counters, batched aggregates.

Now what is the problem? Each pattern fails differently. Cache-aside can stampede on miss. Write-through can add write latency. Write-behind can lose data if the buffer dies.

So what to do? Pick the failure you can afford. That is the adult answer. Not "this pattern is best." There is no best. There is only best for this path.
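
To make the write-behind trade-off concrete, here is a sketch of a counter that is updated in the cache and flushed to storage in batches. The `WriteBehind` class, its `dirty` set, and the key name are assumptions made for illustration. Anything still in `dirty` when the process dies is the data you lose.

```python
from typing import Dict, Set


class WriteBehind:
    """Write path: update the cache now, flush to the database later."""

    def __init__(self) -> None:
        self.db: Dict[str, int] = {}       # stand-in for durable storage
        self.cache: Dict[str, int] = {}    # stand-in for the cache
        self.dirty: Set[str] = set()       # keys changed in cache but not yet flushed

    def incr(self, key: str, amount: int = 1) -> None:
        current = self.cache.get(key, self.db.get(key, 0))
        self.cache[key] = current + amount    # fast path: no database write here
        self.dirty.add(key)

    def flush(self) -> int:
        """Write every pending key to the database; return how many were flushed."""
        flushed = 0
        for key in list(self.dirty):
            self.db[key] = self.cache[key]
            flushed += 1
        self.dirty.clear()
        return flushed


counters = WriteBehind()
for _ in range(1000):
    counters.incr("page:home:views")

db_before_flush = counters.db.get("page:home:views")    # nothing durable yet
pending_keys = len(counters.dirty)                       # 1,000 increments, 1 pending write
flushed = counters.flush()
```

A thousand increments become one database write. That is the gain. The window before `flush()` runs is the risk.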

3) TTL and invalidation decide whether the cache helps or lies¶

A cache is useful only while it is believable. So we need expiry rules. TTL is the blunt tool. Invalidate-on-write is the sharp tool. Most real systems use both.

fresh write ──→ evict key now
                │
                └── TTL still exists as backup when eviction fails
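
The picture above can be sketched in code. This is a toy, assuming an in-memory cache with one TTL for all keys and a fake clock so the demo is repeatable. `TtlCache` and `update_price` are hypothetical names.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """In-memory stand-in for a cache with per-key expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock                                 # injectable so the demo can fake time
        self._entries: Dict[str, Tuple[str, float]] = {}    # key -> (value, expires_at)

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:    # TTL backup: old entries stop being served
            del self._entries[key]
            return None
        return value

    def evict(self, key: str) -> None:
        self._entries.pop(key, None)       # sharp tool: drop the key right now


def update_price(db: Dict[str, str], cache: TtlCache, key: str, price: str) -> None:
    db[key] = price     # write the truth first
    cache.evict(key)    # then evict, so the next read refills from the database


now = [0.0]    # fake clock, in seconds
cache = TtlCache(ttl_seconds=60, clock=lambda: now[0])
key = "product:123:price"
db = {key: "49.00"}

cache.set(key, db[key])
update_price(db, cache, key, "45.00")
after_evict = cache.get(key)     # evicted, so the cache has nothing

cache.set(key, db[key])          # a read refills the cache with 45.00
db[key] = "42.00"                # a buggy write path that forgets to evict
now[0] = 59.0
still_stale = cache.get(key)     # inside the TTL: the old price is still served
now[0] = 60.0
after_ttl = cache.get(key)       # TTL expired: the stale entry is gone
```

The second half of the demo is the case the TTL exists for. The eviction never happened, yet the wrong price lives for at most 60 seconds.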

Why is invalidation hard? Because one truth change may affect many cached shapes. Update product price. Which keys changed?

- product:123
- seller:45:top-products
- category:shoes:page:1
- homepage:deals
- search:running-shoes:sorted-by-price

See the pain? The cache key is often not the truth key. It is a view key. That is why people call invalidation hard. Not because deletion is hard. Because dependency mapping is hard.

So what to do? Use TTL for wide fan-out views. Use explicit invalidation for high-value direct objects. Use versioned keys when you can. Use shorter TTL where wrong answers hurt. Use longer TTL where recomputation is costly and staleness is harmless.

Negative caching also matters. If user 999 does not exist, cache that miss briefly. Otherwise your kitchen keeps checking the same empty shelf.
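
Two of the moves above, versioned keys and negative caching, can be sketched together. Everything here is an assumption for illustration: the `ExpiringCache` class, the 5-second and 300-second TTLs, and the key formats. Bumping a version does not delete old entries. It makes them unreachable, and TTL cleans them up later.

```python
import time
from typing import Callable, Dict, Optional, Tuple

_MISSING = object()    # sentinel meaning "we looked, and nothing is there"


class ExpiringCache:
    """Tiny TTL cache; a stand-in for a real cache service."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[object, float]] = {}    # key -> (value, expires_at)

    def set(self, key: str, value: object, ttl: float) -> None:
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> object:
        entry = self._data.get(key)
        if entry is None or self._clock() >= entry[1]:
            return None                  # absent or expired
        return entry[0]


# Versioned keys: one counter per category instead of tracking every page key.
versions: Dict[str, int] = {}


def category_page_key(category: str, page: int) -> str:
    return f"category:{category}:v{versions.get(category, 0)}:page:{page}"


def on_product_change(category: str) -> None:
    versions[category] = versions.get(category, 0) + 1    # old page keys are never read again


# Negative caching: remember "not found" for a short time.
def find_user(user_id: int, users: Dict[int, str], cache: ExpiringCache,
              stats: Dict[str, int]) -> Optional[str]:
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is _MISSING:
        return None                      # negative hit: skip the database
    if cached is not None:
        return str(cached)
    stats["db_reads"] += 1
    row = users.get(user_id)
    if row is None:
        cache.set(key, _MISSING, ttl=5)  # short TTL, in case the user is created soon
    else:
        cache.set(key, row, ttl=300)
    return row


now = [0.0]
cache = ExpiringCache(clock=lambda: now[0])
users = {1: "asha"}
stats = {"db_reads": 0}

find_user(999, users, cache, stats)        # real lookup, finds nothing
find_user(999, users, cache, stats)        # answered by the negative entry
reads_while_negative = stats["db_reads"]
now[0] = 6.0
find_user(999, users, cache, stats)        # negative entry expired, checks again

key_before = category_page_key("shoes", 1)
on_product_change("shoes")
key_after = category_page_key("shoes", 1)
```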

4) Worked example: sizing a cache for product reads¶

Suppose a storefront gets 20,000 product-detail reads per second. The primary database can safely handle 1,500 of those reads per second. So cache misses must satisfy: 20,000 × miss_rate ≤ 1,500. Solve for miss_rate. 1,500 ÷ 20,000 = 0.075. So miss_rate must be at most 7.5%. That means hit_rate must be at least 92.5%. Good.

Now picture a two-layer cache. CDN serves 70% of requests. Of the remaining 30%, app cache serves 80%. That is 0.30 × 0.80 = 24% of all requests. The last 0.30 × 0.20 = 6% reach the database. Database load becomes: 20,000 × 0.30 × 0.20 = 1,200 reads/sec. That fits the 1,500 read/sec limit. The combined hit rate is 94%, above the 92.5% floor. Simple, no?

Now latency.

- CDN hit = 20 ms
- app cache hit = 45 ms
- DB miss = 180 ms

Weighted average latency is: (0.70 × 20) + (0.24 × 45) + (0.06 × 180). Calculate each part. 0.70 × 20 = 14 ms. 0.24 × 45 = 10.8 ms. 0.06 × 180 = 10.8 ms. Total average latency is: 14 + 10.8 + 10.8 = 35.6 ms. Very good.

Now what about TTL? Say price changes happen roughly every 10 minutes. That is every 600 seconds. If we set TTL to 60 seconds, worst-case staleness is 60 seconds on a missed invalidation. Average stale window after a missed invalidation is about 30 seconds, assuming the change lands at a random point in the TTL window. If that is acceptable, fine. If price correctness must be tighter, add explicit eviction on price update.

Now the dangerous part. One hot product gets 400 requests per second. Its cache entry expires. If all 400 miss together, the database sees a mini-spike. That is the thundering herd. Add single-flight locking. Add request coalescing. Add jitter so keys do not expire together. Add stale-while-revalidate for safe pages.

See. Caching is math plus failure handling. Not just a speed trick.
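
The arithmetic in this worked example is easy to re-run with your own numbers. A short sketch, using only the figures from the text. The variable names are mine.

```python
import math

total_rps = 20_000       # product-detail reads per second
db_safe_rps = 1_500      # reads per second the database can safely absorb

max_miss_rate = db_safe_rps / total_rps    # miss budget
min_hit_rate = 1 - max_miss_rate           # hit rate needed to protect the database

cdn_hit = 0.70           # share of all requests served by the CDN
app_hit_of_rest = 0.80   # share of the remainder served by the app cache

app_share = (1 - cdn_hit) * app_hit_of_rest         # fraction of all requests
db_share = (1 - cdn_hit) * (1 - app_hit_of_rest)    # fraction that reaches the database
db_rps = total_rps * db_share

# Latency per outcome, in milliseconds, weighted by how often it happens.
avg_latency_ms = cdn_hit * 20 + app_share * 45 + db_share * 180

ttl_seconds = 60
worst_stale_seconds = ttl_seconds         # change lands right after the entry is cached
average_stale_seconds = ttl_seconds / 2   # change lands at a random point in the window

print(f"miss budget:  {max_miss_rate:.1%}")
print(f"db load:      {db_rps:.0f} reads/sec (limit {db_safe_rps})")
print(f"avg latency:  {avg_latency_ms:.1f} ms")
print(f"fits budget:  {db_rps <= db_safe_rps}")
```

Change `cdn_hit` to 0.50 and the database load rises to about 2,000 reads/sec. The budget breaks. That is why the hit rate at each layer is worth measuring, not guessing.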

5) Stampede prevention and cache hygiene¶

The thundering herd is the famous problem. But there are others.

Problem one. Cold start. After a deploy or restart, the cache starts empty. Fix: warm hot keys or accept gradual fill.
Problem two. Hot key imbalance. One celebrity profile or one match score dominates traffic. Fix: replicate hot keys, shard carefully, or isolate that path.
Problem three. Stale forever. A forgotten invalidation leaves wrong data live for hours. Fix: always pair explicit invalidation with TTL backup.
Problem four. Cache penetration. Attackers request random missing ids. Fix: negative cache, bloom filter, or rate limit.
Problem five. Over-caching. Teams cache a 5 ms query and add a 50 ms invalidation mess. Fix: measure before caching.

Look. The goal is not maximum cache usage. The goal is cheaper latency and lower origin load. If a cache makes correctness, operability, or debugging worse, it is not a win. It is debt in a fast costume.
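
For the thundering herd itself, here is a sketch of single-flight plus TTL jitter. It assumes all callers live in one process and uses only the standard `threading` module. Across several app servers you would need a shared lock or a coalescing proxy instead. `SingleFlight`, `_Call`, and `ttl_with_jitter` are illustrative names.

```python
import random
import threading
import time
from typing import Callable, Dict, List, Optional


class _Call:
    """Bookkeeping for one in-progress load."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: object = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """One caller per key does the expensive load; the rest wait and share the result."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _Call] = {}

    def do(self, key: str, load: Callable[[], object]) -> object:
        with self._lock:
            call = self._in_flight.get(key)
            is_leader = call is None
            if is_leader:
                call = _Call()
                self._in_flight[key] = call
        if is_leader:
            try:
                call.result = load()         # only the leader touches the database
            except Exception as exc:
                call.error = exc             # followers see the same failure
            finally:
                with self._lock:
                    del self._in_flight[key]
                call.done.set()              # wake everyone who was waiting
        else:
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result


def ttl_with_jitter(base_ttl: float, spread: float = 0.1) -> float:
    """Spread expiry times by +/- `spread` so keys do not all expire together."""
    return base_ttl * (1 + random.uniform(-spread, spread))


db_calls = {"n": 0}


def slow_db_read() -> str:
    db_calls["n"] += 1
    time.sleep(0.3)                          # pretend the query is slow
    return "product 123"


flight = SingleFlight()
results: List[object] = []


def worker() -> None:
    results.append(flight.do("product:123", slow_db_read))


threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Twenty concurrent misses arrive while the first load is still running. They all get the same answer from a single database read. A caller that arrives after the load finishes starts a new one, so this is paired with filling the cache, not used instead of it.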

Where this lives in the wild¶

Cloudflare edge cache — web performance engineers serve static assets and public pages near users so origin bandwidth and latency both drop.
Wikipedia anonymous page views — SREs rely heavily on edge and application caching because the same articles are read again and again.
Shopify storefront product pages — commerce engineers cache product details aggressively while treating price and inventory invalidation with more care.
GitHub release asset downloads — platform engineers use CDN caching so large binaries do not keep hitting origin storage.
YouTube thumbnails and static media metadata — serving engineers benefit from edge caching because the same popular objects fan out globally.

Pause and recall¶

Why is "where to cache" a better question than "should we use Redis"?
When does cache-aside fit better than write-through?
In the worked example, what minimum hit rate protected the database?
Why is invalidation hard even when deleting one key is easy?

Interview Q&A¶

Q: Why cache at the CDN layer instead of only inside the application? A: A CDN hit is answered at the edge, so the origin spends no bandwidth, TLS handshakes, or app cycles on it, and the user skips the long round trip. If the bytes are public and repeated, edge wins first.

Common wrong answer to avoid: "Because CDN is always faster" — it is faster only when the object can safely live at the edge and the cache key is stable.

Q: Why choose cache-aside instead of write-through for many read-heavy systems? A: Cache-aside keeps writes simpler and only fills keys that users actually read. That avoids turning every write into immediate cache work.

Common wrong answer to avoid: "Because write-through is outdated" — write-through is useful when fresh post-write reads are common and staleness is unacceptable.

Q: Why use TTL plus explicit invalidation instead of only one of them? A: TTL gives a safety net when invalidation misses. Explicit invalidation gives freshness when TTL alone would be too stale.

Common wrong answer to avoid: "Because two methods are always safer" — the point is not redundancy for its own sake, but covering different failure modes.

Q: Why allow stale-while-revalidate instead of forcing every expired key to block on origin? A: For many read paths, slightly stale data is cheaper than a stampede and still good enough for users. It smooths load during refresh.

Common wrong answer to avoid: "Because users never notice stale data" — some paths absolutely cannot serve stale answers, so this is a per-endpoint choice.
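
The stale-while-revalidate choice from the last question can be sketched as below. This is a toy under stated assumptions: the `SwrCache` class, the 60-second fresh window, and the 300-second stale window are made up for illustration, and the background refresh is modelled as a queue that the demo drains by hand.

```python
import time
from typing import Callable, Dict, List, Tuple


class SwrCache:
    """Serve fresh entries directly, stale ones immediately while a refresh is queued."""

    def __init__(self, load: Callable[[str], str], fresh_for: float, stale_for: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._fresh_for = fresh_for
        self._stale_for = stale_for
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}    # key -> (value, stored_at)
        self._refresh_queue: List[str] = []

    def _store(self, key: str) -> str:
        value = self._load(key)
        self._entries[key] = (value, self._clock())
        return value

    def get(self, key: str) -> str:
        entry = self._entries.get(key)
        if entry is None:
            return self._store(key)              # cold miss: must block on origin
        value, stored_at = entry
        age = self._clock() - stored_at
        if age < self._fresh_for:
            return value                         # fresh: plain hit
        if age < self._fresh_for + self._stale_for:
            if key not in self._refresh_queue:   # queue at most one refresh per key
                self._refresh_queue.append(key)
            return value                         # stale but allowed: answer right away
        return self._store(key)                  # too old to serve: block on origin

    def run_pending_refreshes(self) -> None:
        """A background worker would do this in a real service."""
        while self._refresh_queue:
            self._store(self._refresh_queue.pop(0))


now = [0.0]
origin = {"page:home": "v1"}
loads = {"n": 0}


def load(key: str) -> str:
    loads["n"] += 1
    return origin[key]


pages = SwrCache(load, fresh_for=60, stale_for=300, clock=lambda: now[0])
first = pages.get("page:home")          # cold miss, loads v1
origin["page:home"] = "v2"
now[0] = 90.0
served_stale = pages.get("page:home")   # past 60 s: old value served, refresh queued
loads_before_refresh = loads["n"]
pages.run_pending_refreshes()
after_refresh = pages.get("page:home")  # refreshed copy
```

The request at 90 seconds never waits on the origin. Whether serving `v1` there is acceptable is exactly the per-endpoint choice the answer describes.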

Apply now (5 min)¶

Exercise: Pick one read-heavy endpoint you know. Write the current origin QPS and safe database QPS. Compute the miss-rate budget. Then pick one cache layer and one invalidation rule.

Sketch from memory: Draw client, CDN, app cache, app, and database. Label one path as hit and one as miss. Add one stampede-prevention move beside the hot key.

Bridge. Reads are fast now. But write spikes still overwhelm the database. One kitchen cannot handle 10x traffic. We need to scale — but in which direction? → 09-scaling-dimensions.md