08. CDN and Edge — storing popular content at nearby postal branches

~15 min read. Distance is expensive, so smart systems move copies closer to the user.

Built on the ELI5 in 00-eli5.md. The post office — the branch that stores and forwards letters — now becomes the nearby edge location holding popular copies for faster delivery.


1) What a CDN really changes

Without a CDN, every user request travels to origin. That means long paths, repeated load, and avoidable latency. A CDN puts copies at edge locations. Users fetch from a nearby post office instead. Only misses travel back to origin.

See the simple picture.

┌──────────┐      nearby fetch      ┌──────────────┐
│   User   │ ─────────────────────→ │   Edge PoP   │
└──────────┘                        └──────┬───────┘
                                           │ cache miss
                                           ▼
                                    ┌──────────────┐
                                    │    Origin    │
                                    └──────────────┘

PoP means point of presence. That is just an edge location with servers. It may store files. It may terminate TLS. It may run firewall rules. It may even run code.

Now the central idea. CDN is latency reduction plus origin protection. If 90% of image requests hit edge cache, origin sees only 10% of that traffic. Users also avoid long-haul network delay most of the time.
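To make the 90% figure concrete, here is a minimal sketch that replays a request stream through a single edge cache and counts how many requests reach origin. The function name `replay` and the sample paths are hypothetical, and the cache is assumed never to evict or expire anything, which a real PoP would not guarantee.

```python
def replay(requests):
    """Replay request paths through one edge cache.

    Returns (edge_hits, origin_fetches). Simplification: the cache never
    evicts or expires anything, so each path misses exactly once.
    """
    cached = set()
    edge_hits = 0
    origin_fetches = 0
    for path in requests:
        if path in cached:
            edge_hits += 1          # served from the nearby PoP
        else:
            origin_fetches += 1     # miss: travel back to origin
            cached.add(path)        # keep a copy for later requests
    return edge_hits, origin_fetches


# Hypothetical traffic: 20 requests spread over two image paths.
requests = ["/img/a.jpg"] * 12 + ["/img/b.jpg"] * 8
edge_hits, origin_fetches = replay(requests)
hit_ratio = edge_hits / len(requests)
print(f"hit ratio {hit_ratio:.0%}, origin saw {origin_fetches} of {len(requests)} requests")
```

With this sample stream, 18 of the 20 requests are edge hits, so origin handles only the 2 first-time misses.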

2) Cache-Control tells the edge what to keep

Edge caching is not magic. The response headers teach the CDN what is safe.

Common headers:

  • Cache-Control: max-age=300
  • Cache-Control: public, max-age=3600
  • Cache-Control: no-store
  • ETag: "v42"

max-age=300 means the cached copy is fresh for 300 seconds. public means shared caches may store it. no-store means do not keep this response. ETag helps revalidation: the cache can ask origin whether its stored copy is still current instead of downloading it again.

Worked example. Suppose product image traffic is 50,000 requests per minute and the edge hit ratio reaches 92%.

  • Requests served at edge = 50,000 × 0.92 = 46,000.
  • Requests reaching origin = 50,000 - 46,000 = 4,000.
  • If each origin fetch costs 18 ms CPU plus 30 KB bandwidth, origin saves 46,000 × 18 ms = 828,000 ms of work each minute. That is 828 CPU-seconds per minute.
  • Bandwidth saved at origin = 46,000 × 30 KB = 1,380,000 KB. That is about 1.38 GB each minute.

See the scale now. Headers become money.

One warning. Do not cache user-specific private responses publicly. A static logo file is fine. A personalized cart or bank balance needs care, and an authenticated invoice PDF needs tighter rules, such as Cache-Control: private or no-store.
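The header rules and the worked example can be sketched as two small helpers. Both names, `shared_cache_lifetime` and `origin_savings`, are hypothetical. The parser is deliberately simplified: it only understands max-age, public, no-store, and private, and it treats a missing max-age as 0 seconds, whereas real CDNs apply more directives and their own defaults.

```python
from typing import Optional


def shared_cache_lifetime(cache_control: str) -> Optional[int]:
    """Seconds a shared cache may treat the response as fresh.

    Returns None when the response must not be stored at the edge.
    Simplified sketch: only max-age, public, no-store, private are handled.
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = value.strip()
    if "no-store" in directives or "private" in directives:
        return None                      # never keep this at a shared edge
    max_age = directives.get("max-age", "")
    return int(max_age) if max_age.isdigit() else 0


def origin_savings(requests_per_min, hit_ratio, cpu_ms_per_fetch, kb_per_fetch):
    """Per-minute origin work avoided thanks to edge hits."""
    edge_hits = round(requests_per_min * hit_ratio)
    return {
        "edge_hits": edge_hits,
        "origin_requests": requests_per_min - edge_hits,
        "cpu_seconds_saved": edge_hits * cpu_ms_per_fetch / 1000,
        # Decimal units, as in the worked example: 1 GB = 1,000,000 KB.
        "gb_saved": edge_hits * kb_per_fetch / 1_000_000,
    }


print(shared_cache_lifetime("public, max-age=3600"))
print(origin_savings(50_000, 0.92, 18, 30))
```

Feeding in the worked example's numbers reproduces 46,000 edge hits, 4,000 origin requests, 828 CPU-seconds, and about 1.38 GB per minute.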

3) Origin pull and origin push are different operating models

In origin pull, edge fetches object from origin on first miss. Later requests reuse cached copy. In push, you proactively upload assets to edge or CDN storage.

One diagram helps.

┌────────────┐   miss   ┌────────────┐
│ Edge cache │ ───────→ │   Origin   │   origin pull
└────────────┘          └────────────┘

┌────────────┐  upload  ┌────────────┐
│ Build job  │ ───────→ │ Edge store │   push model
└────────────┘          └────────────┘

Origin pull is easier operationally. You keep one truth at origin. Edge fills lazily on demand.

Push is useful when assets are known and stable. Example: A game release ships 40 GB of patch files. You may pre-position them before launch night. That avoids first-user miss storms.

Now consider miss penalty math. User in Pune requests a video thumbnail. Edge miss goes to Singapore origin. Round trip to origin is 120 ms. Edge hit serves in 18 ms. First user pays 138 ms total. Next users pay about 18 ms.

That is why warm caches feel magical. They are just prepared distance shortcuts.
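The two operating models and the miss penalty can be sketched with a toy edge class. `Edge`, `push`, `get`, and the object paths are all hypothetical names, a plain dict stands in for the origin server, and the latencies are simply the 18 ms and 120 ms figures from the example rather than measurements.

```python
EDGE_MS = 18          # time to serve an edge hit, from the example
ORIGIN_RTT_MS = 120   # edge-to-origin round trip, from the example


class Edge:
    def __init__(self, origin):
        self.origin = origin   # dict standing in for the origin server
        self.store = {}        # objects currently held at this PoP

    def push(self, key, body):
        """Push model: pre-position an object before anyone asks for it."""
        self.store[key] = body

    def get(self, key):
        """Origin pull: return (body, latency_ms), filling the cache on a miss."""
        if key in self.store:
            return self.store[key], EDGE_MS
        body = self.origin[key]            # miss: fetch from origin
        self.store[key] = body             # later requests reuse this copy
        return body, EDGE_MS + ORIGIN_RTT_MS


origin = {"/thumb/1.jpg": b"thumbnail", "/patch/1.bin": b"patch"}
edge = Edge(origin)

_, first_ms = edge.get("/thumb/1.jpg")    # cold cache, pays the miss penalty
_, second_ms = edge.get("/thumb/1.jpg")   # warm cache

edge.push("/patch/1.bin", origin["/patch/1.bin"])   # pre-positioned asset
_, pushed_ms = edge.get("/patch/1.bin")             # first request is already a hit

print(first_ms, second_ms, pushed_ms)
```

The first pull pays 138 ms and the second pays 18 ms, while the pushed object is served in 18 ms even on its very first request.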

4) PoPs, edge compute, and latency math

Edge does more than file caching now. It can also run tiny logic close to the user. Examples:

  • redirect by country
  • bot filtering
  • header rewriting
  • A/B flag checks
  • image resizing

This is edge compute. Small code runs at or near the PoP. So you avoid a full trip to origin for lightweight decisions.

Worked latency example. Suppose the user-to-origin round trip is 180 ms and the user-to-edge round trip is 22 ms. Edge compute adds 6 ms. Origin compute would have added 35 ms.

Path A without edge compute:

  • round trip to origin = 180 ms (response leg included)
  • origin work = 35 ms
  • total ≈ 215 ms

Path B with edge compute:

  • round trip to edge = 22 ms
  • edge work = 6 ms
  • total ≈ 28 ms

Saved latency ≈ 215 - 28 = 187 ms. That is massive for login redirects or image transforms.

But keep edge logic small and deterministic. Heavy database joins still belong deeper inside. Think of edge as a smart post office counter, not as the full head office.
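As an illustration of the "redirect by country" case, here is a minimal sketch of a decision an edge function could make without contacting origin, followed by the latency arithmetic from the worked example. Everything here is an assumption for illustration: the `Request` type, the `COUNTRY_SITES` table, and the idea that the platform supplies a two-letter country code. Real edge runtimes usually expose their own request APIs and often run JavaScript or WebAssembly rather than Python.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    path: str
    country: str   # assumed to be supplied by the edge platform


# Hypothetical mapping from country code to a regional path prefix.
COUNTRY_SITES = {"IN": "/in", "DE": "/de"}


def edge_redirect(req: Request) -> Optional[str]:
    """Return a redirect target decided at the edge, or None to pass through."""
    prefix = COUNTRY_SITES.get(req.country)
    if prefix is None:
        return None                      # no regional site: forward as usual
    if req.path == prefix or req.path.startswith(prefix + "/"):
        return None                      # already on the regional site
    return prefix + req.path


def total_latency_ms(round_trip_ms: int, compute_ms: int) -> int:
    """Network round trip plus the work done at the far end."""
    return round_trip_ms + compute_ms


path_a = total_latency_ms(180, 35)   # decision made at origin
path_b = total_latency_ms(22, 6)     # decision made at the edge
print(edge_redirect(Request("/deals", "IN")), path_a, path_b, path_a - path_b)
```

The logic is a dictionary lookup and a string check, which is the kind of small, deterministic work that fits at the edge.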

5) Good CDN design depends on keys and invalidation

The cache key decides what counts as the same object. Usually the key includes hostname and path. Sometimes the query string matters too. Bad key design causes incorrect reuse.

Example. /image?id=42&width=200 and /image?id=42&width=800 must not map to the same resized asset.

One fast checklist helps.

  • include fields that change representation
  • exclude tracking parameters that do not change bytes
  • separate mobile and desktop variants when output differs
  • separate language variants when text changes

That key discipline protects correctness first. Only then does it improve hit ratio.

Now invalidation. When content changes, old copies must expire or be purged. Two common patterns:

  • versioned asset names like app.v19.js
  • purge or ban requests for changed objects

Versioned names are wonderful for static assets. They avoid global purge stress. Dynamic content often needs shorter TTL and revalidation.

One final worked example. News homepage gets 200,000 requests in 10 minutes. Cache TTL is 30 seconds. You publish a breaking banner at minute 3. If you wait for TTL only, the stale window may last up to 30 seconds. If you purge instantly, each edge fetches a fresh copy on its next request, so readers see the banner sooner but origin absorbs a burst of refetches.

That is the freshness versus origin load tradeoff. No free lunch. Only explicit choice. Good CDN design is careful distance management.
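The checklist and the versioned-name pattern can be sketched as two helpers. This is one possible shape, not how any particular CDN builds keys: `cache_key`, `versioned_name`, the tracking parameter lists, and the example hostname are hypothetical. Note that `versioned_name` uses a content hash in place of a build number like v19, a common variant of the same idea.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical lists of parameters that do not change the response bytes.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def cache_key(url: str, device: str = "desktop", language: str = "en") -> str:
    """Build a key from host, path, meaningful query fields, and variants."""
    parts = urlsplit(url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_NAMES and not name.startswith(TRACKING_PREFIXES)
    )
    host = parts.hostname or ""          # urlsplit lowercases the hostname
    return f"{host}{parts.path}?{urlencode(kept)}|{device}|{language}"


def versioned_name(filename: str, content: bytes) -> str:
    """Embed a short content hash in the file name, e.g. app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    name = PurePosixPath(filename)
    return f"{name.stem}.{digest}{name.suffix}"


small = cache_key("https://cdn.example.com/image?id=42&width=200")
large = cache_key("https://cdn.example.com/image?id=42&width=800")
tracked = cache_key("https://CDN.example.com/image?width=200&id=42&utm_source=mail")
print(small)
print(small != large, small == tracked)
print(versioned_name("app.js", b"console.log('v19');"))
```

Sorting the kept parameters means reordered query strings share one cache entry, while width, device, and language still produce separate entries.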


Where this lives in the wild

  • Cloudflare — edge platform engineer tunes cache keys, TTLs, and workers so static and dynamic content take the right path.
  • Akamai — media delivery engineer places large video objects across PoPs to reduce long-haul bandwidth and startup delay.
  • Amazon retail — frontend performance engineer uses CDN caching for product images, JavaScript bundles, and regional edge redirects.
  • YouTube — streaming infrastructure engineer relies on geographically distributed caches so popular video chunks do not always hit origin storage.
  • Zomato — web performance engineer benefits when restaurant images and menu assets are served from nearby edge locations.

Pause and recall

  1. What two big problems does a CDN solve together?
  2. Why is Cache-Control central to edge behaviour?
  3. When would origin push make more sense than origin pull?
  4. Why can edge compute save much more than a few milliseconds?

Interview Q&A

Q: Why use a CDN even if your origin servers are already powerful?
A: Distance still costs latency, and repeated traffic still wastes origin bandwidth. A CDN cuts both by serving hot content from nearby PoPs.
Common wrong answer to avoid: "Because CDNs are only for static websites". Dynamic acceleration, TLS termination, and edge logic matter too.

Q: Why not cache every response aggressively at the edge?
A: Some responses are private, personalized, or rapidly changing. Wrong caching can leak data or serve stale business state.
Common wrong answer to avoid: "Because cache misses are acceptable". Safety and correctness are the core issue, not only miss cost.

Q: Why choose versioned asset names instead of constant purging for static files?
A: Versioned names let caches keep immutable files confidently and avoid purge storms. Freshness becomes part of the URL itself.
Common wrong answer to avoid: "Because purging is impossible". Purging exists, but versioning is simpler and cheaper for build artifacts.

Q: Why is edge compute not a full replacement for origin services?
A: Edge code is best for lightweight, local decisions with tight execution budgets. Deep stateful business logic still needs central systems and databases.
Common wrong answer to avoid: "Because edge machines are weak". The real limit is what work belongs close to the user versus close to the data.


Apply now (5 min)

Exercise: Assume an image endpoint gets 80,000 requests per minute. If edge hit ratio is 95%, compute origin requests per minute. Then assume each origin response is 40 KB. Estimate bandwidth avoided at origin each minute. Next, compare a 170 ms origin round trip with a 25 ms edge round trip. Write the latency saved for a cache hit.

Sketch from memory: Draw user, edge PoP, and origin. Add one cache hit arrow and one cache miss arrow. Then label one header, one cache key field, and one invalidation strategy.


Bridge. Content is now nearby. Next we study workloads where the conversation itself stays open, because updates keep flowing instead of ending after one reply. → 09-websockets-sse-long-polling.md