05. Building Blocks Toolkit — The few parts behind almost every design¶

~16 min read. Most systems are not invented from scratch. They are assembled from familiar parts.

Built on the ELI5 in 00-eli5.md. A prep station — one component doing one job well — is useful only when the restaurant actually needs it.

1) You do not need infinite components¶

System design interviews feel huge at first. So many products. So many architectures. So many acronyms. Look. Underneath, most systems reuse a short toolkit. Not identical tools. Identical categories. If you know what each category does, you can compose designs calmly. That is the point of this file. Not to memorize brand names. To learn the jobs. Here is the picture first.

users
  │
  ▼
CDN ──→ load balancer ──→ web/app servers ──→ database
                     │               │
                     │               ├──→ cache
                     │               ├──→ message queue
                     │               └──→ object storage

Simple, no? Not every design needs every box. A CRUD admin tool may skip CDN, queue, and object storage. A video platform absolutely will not. The trick is not remembering the full diagram. The trick is mapping each block to one pressure.

2) The core blocks and their one-line mental models¶

Load balancer¶

Mental model: front-door traffic cop. It spreads requests across many servers.
When you need it: when one server is not enough, when you want health checks and failover, or when you want to hide individual server addresses.
What it does not do: it does not fix slow application code, and it does not store durable state.
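To make the traffic-cop picture concrete, here is a minimal sketch of round-robin routing that skips servers marked unhealthy. The class name, method names, and server names are made up for illustration. A real load balancer adds active health probes, connection handling, and other routing policies.

```python
class RoundRobinBalancer:
    """Spread requests across servers in turn, skipping unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the server to try first on the next pick

    def mark_down(self, server):
        # In a real system a failed health check would trigger this.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once, starting at the rotation point.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
first_round = [lb.pick() for _ in range(3)]

lb.mark_down("app-2")  # simulated failed health check
after_failure = [lb.pick() for _ in range(4)]
```

After `app-2` is marked down, traffic keeps flowing to the other two servers. That is the failover behavior in its smallest form.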

Web or application servers¶

Mental model: the active kitchen where request logic runs. They validate input, run business logic, call storage, and return responses.
When you need them: always, unless the system is almost fully static.
Best habit: keep them as stateless as possible. That makes horizontal scaling easy.
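A small sketch of what "stateless" buys you, assuming a shared session store that lives outside the app servers. The dict below stands in for that store, and the handler and server names are hypothetical. Because the handler keeps nothing in local memory, two different servers can serve the same session back to back.

```python
# Stand-in for a shared store (a cache or database in a real deployment).
session_store = {}


def handle_request(server_name, session_id, item):
    """Add an item to the session's cart without using server-local state."""
    cart = session_store.get(session_id, [])
    cart = cart + [item]  # build a new list rather than mutating shared data
    session_store[session_id] = cart
    return {"served_by": server_name, "cart": cart}


# The same session hits two different servers and still sees one cart.
r1 = handle_request("app-1", "s-42", "tea")
r2 = handle_request("app-2", "s-42", "cup")
```

If the cart lived in `app-1`'s memory instead, the second request would have to be routed back to `app-1`, which is exactly the coupling that makes scaling and failover harder.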

Database¶

Mental model: source of truth for durable state.
When you need it: whenever the system must remember structured data across requests. Users, orders, payments, relationships, events.
What to watch: consistency rules, read and write volume, index strategy, replication and backups.

Cache¶

Mental model: a fast short-term memory for hot data.
When you need it: when reads are frequent, when the same data is fetched repeatedly, or when latency targets are tight.
What to watch: eviction, TTL, staleness, invalidation.
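The watch list above (TTL, staleness, invalidation) is easier to remember with a tiny example. This is a minimal sketch of a TTL cache used in a read-through style. `TTLCache`, `read_through`, and `load_profile` are illustrative names, and the clock is injected only so the example can simulate time passing.

```python
import time


class TTLCache:
    """In-memory cache where each entry expires ttl_seconds after it is set."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl)

    def invalidate(self, key):
        # Call this when the source of truth changes.
        self._items.pop(key, None)


def read_through(cache, key, load_from_db):
    """Return the cached value, loading and caching it on a miss.

    Note: a stored value of None is treated as a miss in this sketch.
    """
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.set(key, value)
    return value


now = [0.0]    # fake clock so the example does not need to sleep
db_reads = []  # records every trip to the "database"


def load_profile(key):
    db_reads.append(key)
    return {"id": key, "name": "Ada"}


cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
read_through(cache, "user:1", load_profile)  # miss: goes to the database
read_through(cache, "user:1", load_profile)  # hit: served from memory
now[0] = 61.0
read_through(cache, "user:1", load_profile)  # expired: database again
```

Three reads cost two database trips here. The TTL bounds how stale a value can get, and `invalidate` is the escape hatch when you cannot wait that long.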

Message queue¶

Mental model: a waiting line that absorbs bursts and decouples work.
When you need it: when some work can happen asynchronously, when producers are faster than consumers, when retries matter, or when spikes would otherwise overload the app tier.
What to watch: ordering, idempotency, retries, dead-letter handling.
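Here is a minimal in-process sketch of the four watch items: ordering, idempotency, retries, and dead-letter handling. The job shape (`id`, `attempts`), the `drain` function, and the failing handler are assumptions for illustration. A real queue is a separate durable service, not a `deque` inside one process.

```python
from collections import deque


def drain(queue, handler, max_attempts=3):
    """Process jobs until the queue is empty.

    Returns (done_ids, dead_letters). Jobs that keep failing are set
    aside after max_attempts instead of blocking the line forever.
    """
    done = set()
    dead_letters = []
    while queue:
        job = queue.popleft()
        if job["id"] in done:
            continue  # idempotency: a duplicate delivery is ignored
        try:
            handler(job)
            done.add(job["id"])
        except Exception:
            job["attempts"] = job.get("attempts", 0) + 1
            if job["attempts"] >= max_attempts:
                dead_letters.append(job)
            else:
                queue.append(job)  # retry later, from the back of the line
    return done, dead_letters


calls = {}


def handler(job):
    calls[job["id"]] = calls.get(job["id"], 0) + 1
    if job["id"] == "bad":
        raise ValueError("cannot parse page")
    if job["id"] == "flaky" and calls[job["id"]] == 1:
        raise TimeoutError("fetch timed out")


# "ok" is delivered twice on purpose to show duplicate handling.
queue = deque([{"id": "ok"}, {"id": "flaky"}, {"id": "bad"}, {"id": "ok"}])
done, dead = drain(queue, handler)
```

Notice that retrying sends a job to the back of the line, so completion order no longer matches arrival order. That is why ordering is on the watch list.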

CDN¶

Mental model: a global close-to-user copy layer.
When you need it: when the same static or semi-static bytes are read from many places. Images, JS bundles, CSS, video chunks, public assets.
What to watch: cache hit rate, purge strategy, signed URLs if content is protected.
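Signed URLs are the least obvious item on that list, so here is a generic sketch of the idea using an HMAC over the path and an expiry time. The URL layout, the `sign_url` and `verify_url` helpers, and the secret are all assumptions for illustration. Each real CDN defines its own signing format, so treat this as the concept rather than any vendor's scheme.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical key shared by the app and the edge


def sign_url(path, expires_at):
    """Return path plus an expiry timestamp and a signature over both."""
    base = f"{path}?expires={expires_at}"
    sig = hmac.new(SECRET, base.encode(), hashlib.sha256).hexdigest()
    return f"{base}&sig={sig}"


def verify_url(url, now):
    """Accept the URL only if the signature matches and it has not expired."""
    base, _, sig = url.rpartition("&sig=")
    expected = hmac.new(SECRET, base.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # tampered with, or never signed
    expires_at = int(base.rpartition("?expires=")[2])
    return now <= expires_at


url = sign_url("/img/cat.png", expires_at=1000)
valid_now = verify_url(url, now=999)
valid_later = verify_url(url, now=1001)
tampered = verify_url(url.replace("expires=1000", "expires=9999"), now=1001)
```

The edge can check the signature without calling the origin, which is what lets protected content still be served from the cache.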

Object storage¶

Mental model: cheap durable home for blobs.
When you need it: when files are large, when media volume grows, or when database rows should store metadata, not full binary bodies.
What to watch: access control, lifecycle rules, upload patterns, CDN integration.
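A minimal sketch of the "metadata in the database, bytes in object storage" split. Both stores are plain dicts standing in for real services, and the key scheme (a content hash under an `attachments/` prefix) is just one assumed convention.

```python
import hashlib

blob_store = {}   # stand-in for object storage: key -> bytes
attachments = {}  # stand-in for a database table: id -> metadata row


def save_attachment(attachment_id, filename, data):
    """Write the bytes to the blob store and only a reference to the table."""
    key = "attachments/" + hashlib.sha256(data).hexdigest()
    blob_store[key] = data
    attachments[attachment_id] = {
        "filename": filename,
        "size_bytes": len(data),
        "blob_key": key,  # the row points at the blob, it does not contain it
    }
    return attachments[attachment_id]


def load_attachment(attachment_id):
    row = attachments[attachment_id]
    return blob_store[row["blob_key"]]


payload = b"\x89PNG fake image bytes"
row = save_attachment(1, "cat.png", payload)
```

The row stays small no matter how large the file is, so listing or filtering attachments never drags binary bodies through the database.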

Optional extra: search index¶

Mental model: a read-optimized copy built for finding things fast.
When you need it: when keyword or filtered retrieval matters more than primary transaction writes.
What to watch: index freshness, rebuild cost, eventual consistency with the source of truth.

See. These are the usual prep stations. Once you know their jobs, the interview feels smaller.

3) How to choose blocks from requirements, not from habit¶

Candidates often add components by reflex. That is dangerous. A queue is not a badge. A cache is not a default virtue. A CDN is not needed for every product. So what to do? Use requirement-to-block mapping.

need                      ──→ likely block
many identical reads      ──→ cache
static assets worldwide   ──→ CDN
durable structured state  ──→ database
async work or bursts      ──→ message queue
many app instances        ──→ load balancer
large files or media      ──→ object storage
text search               ──→ search index

This is not law. It is a first pass. Then ask the next question. What problem disappears if I add this block? If the answer is vague, do not add it yet. Example. A queue helps when write spikes would overwhelm email sending workers. A cache helps when many users request the same hot profile or feed object. A CDN helps when users are far away from origin and assets are cacheable. If the system has none of those pressures, skip the block. That restraint is senior behavior.
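The mapping table can be written down as a lookup that refuses to add a block without a stated need. This is only a first-pass sketch, just like the table. The `choose_blocks` helper and the example requirement lists are made up for illustration.

```python
NEED_TO_BLOCK = {
    "many identical reads": "cache",
    "static assets worldwide": "CDN",
    "durable structured state": "database",
    "async work or bursts": "message queue",
    "many app instances": "load balancer",
    "large files or media": "object storage",
    "text search": "search index",
}


def choose_blocks(needs):
    """Return {block: reason} for stated needs only. No need, no block."""
    chosen = {}
    for need in needs:
        block = NEED_TO_BLOCK.get(need)
        if block is not None:
            chosen.setdefault(block, need)  # keep the first reason given
    return chosen


admin_tool = choose_blocks(["durable structured state"])
video_platform = choose_blocks([
    "durable structured state",
    "many app instances",
    "large files or media",
    "static assets worldwide",
])
```

Every block in the result carries its reason with it, which is the "I am adding this because..." habit in data form.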

4) A worked selection example¶

Prompt: design a link preview service for a chat app. Assume these house rules.

5 million DAU.
Each DAU sends 12 messages per day.
20% of messages contain a URL.
Each unique URL preview is reused often because many people share the same links.
Preview generation involves fetching a page and parsing metadata.
Read path latency target is 120 ms for already-known previews.
Fresh preview generation can happen asynchronously within a few seconds.

Now choose blocks one by one. First, the numbers.

Total messages per day = 5,000,000 × 12 = 60,000,000.
URL messages per day = 60,000,000 × 20% = 12,000,000.
Average URL messages per second = 12,000,000 ÷ 86,400 ≈ 139.
Assume peak factor = 5. Peak URL messages per second ≈ 139 × 5 = 695.

Every URL message triggers a preview lookup. Only URLs not seen before trigger fresh generation, so 695 per second is an upper bound on new preview work.

Now interpret.

Do we need a load balancer? Yes. There will be multiple app servers.
Do we need web or app servers? Yes. They handle URL detection and preview lookup.
Do we need a database? Yes. Preview metadata must persist.
Do we need a cache? Yes. Already-known previews must return within 120 ms and are highly reusable.
Do we need a message queue? Yes. Fresh preview fetch and parse work can happen async. The queue absorbs bursty URL submissions.
Do we need object storage? Maybe. If we store screenshots or large images, yes. If we store only metadata, maybe not yet.
Do we need a CDN? Only if preview images or screenshots are served widely. Not mandatory on day one for metadata-only previews.

See how the design stayed selective? We picked blocks because of pressure. Not because the toolkit list exists.
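The estimate is quick to re-check in code. This sketch uses only the house rules stated in the prompt. The variable names are mine, and it rounds the average before applying the peak factor, as the worked numbers do.

```python
DAU = 5_000_000
MESSAGES_PER_USER_PER_DAY = 12
URL_PERCENT = 20          # 20% of messages contain a URL
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 5           # assumed, as in the prompt

messages_per_day = DAU * MESSAGES_PER_USER_PER_DAY
url_messages_per_day = messages_per_day * URL_PERCENT // 100

# Round the average first, then scale to peak.
avg_per_second = round(url_messages_per_day / SECONDS_PER_DAY)
peak_per_second = avg_per_second * PEAK_FACTOR

print(f"{messages_per_day:,} messages/day")
print(f"{url_messages_per_day:,} URL messages/day")
print(f"~{avg_per_second}/s average, ~{peak_per_second}/s at peak")
```

Changing one assumption, such as the peak factor, and rerunning is a fast way to see which numbers the design is actually sensitive to.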

5) What each block is bad at¶

This is important. A tool helps because it has strengths. It also brings cost.

Load balancer cost: extra hop, routing policy complexity.
Application server cost: need stateless discipline if you want easy scaling.
Database cost: becomes a central dependency and often the hardest thing to scale.
Cache cost: staleness bugs and invalidation logic.
Queue cost: operational delay and harder end-to-end debugging.
CDN cost: purge behavior and cache consistency complexity.
Object storage cost: extra indirection and eventual consistency between metadata and files.
Search index cost: dual-write or asynchronous indexing complexity.

Look. Senior candidates mention both sides. "I add cache for latency, but I accept staleness management." That sentence sounds much better than "I will add Redis."

6) A quick assembly pattern you can reuse¶

When the prompt starts, you can build the first draft in layers.

Layer one. How do requests enter? Usually load balancer plus app servers.
Layer two. Where does truth live? Usually a database.
Layer three. What is hot and repeatable? Maybe a cache.
Layer four. What is slow, bursty, or asynchronous? Maybe a queue plus workers.
Layer five. What is large and static? Maybe object storage plus CDN.

That is enough for many first-round designs. Simple, no? Do not start by aiming for all eight blocks. Start by walking one order ticket through the system. Then add only the blocks that remove visible pain. That is how a clean restaurant comes together.

Where this lives in the wild¶

Instagram media upload path — staff backend engineer: separates metadata database, object storage, CDN, and async thumbnail workers on purpose.
Stripe webhooks delivery platform — senior infrastructure engineer: uses app servers, durable storage, and queues because retryable async delivery is core.
Netflix video delivery — principal edge engineer: leans heavily on CDN and object storage because global byte delivery dominates.
GitHub repository page rendering — staff engineer: relies on app servers, databases, caches, and search index because hot reads and lookup matter.
Slack link unfurl service — senior platform engineer: combines cache, queue, metadata store, and sometimes blob storage depending on preview assets.

Pause and recall¶

What pressure usually justifies adding a cache?
Why is a message queue better thought of as a waiting line than as a generic performance booster?
When does object storage become a better fit than keeping blobs inside the main database?
Why is it valuable to say what a block is bad at during an interview?

Interview Q&A¶

Q: Why choose blocks from requirements, not from a memorized template? A: Components exist to relieve specific pressure, so you should add them only when the requirements make that pressure explicit. That keeps the design explainable and prevents you from collecting boxes that increase complexity without solving a real problem.

Common wrong answer to avoid: "Because templates are for junior engineers." — Templates are not bad by themselves; the mistake is applying one without showing why each block is needed here.

Q: Why keep application servers stateless when possible, not store session state locally? A: Stateless servers are easier to scale, replace, and fail over because any healthy instance can handle the next request. Local sticky state couples routing to machine identity, which makes balancing, recovery, and rolling changes much harder.

Common wrong answer to avoid: "Because stateless systems are always faster." — Statelessness is mainly an operational and scaling advantage, not a universal latency guarantee.

Q: Why add a queue for asynchronous work, not just spawn more threads in the app server? A: A queue gives you buffering, retry control, and decoupling between producers and workers when traffic arrives in uneven bursts. More threads may increase concurrency, but they do not preserve pending work well or create the same control over failure handling.

Common wrong answer to avoid: "Because queues are more scalable than threads by definition." — The value is not magical scalability; it is controlled backpressure and more reliable async processing.
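"Controlled backpressure" can be shown in a few lines with Python's standard `queue.Queue`. This is an in-process sketch, not a distributed queue, and the capacity of 2 and the URLs are arbitrary. When the line is full, the producer finds out immediately and can shed or delay work instead of piling it up silently.

```python
import queue

jobs = queue.Queue(maxsize=2)  # bounded waiting line
accepted, rejected = [], []

for url in ["https://a.example", "https://b.example", "https://c.example"]:
    try:
        jobs.put_nowait(url)   # raises queue.Full when the line is at capacity
        accepted.append(url)
    except queue.Full:
        rejected.append(url)   # caller can retry later or return "try again"

# A worker takes one job, which frees a slot for new work.
in_progress = jobs.get_nowait()
jobs.put_nowait(rejected.pop())
```

Spawning more threads has no equivalent of that `queue.Full` signal. The pending work just lives in memory until something breaks.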

Q: Why store large media in object storage, not the primary relational database? A: Large blobs have different access patterns and economics than transactional rows, so they usually belong in storage optimized for cheap durable bytes. The relational database should normally keep metadata and references, while object storage pairs naturally with CDN delivery for media-heavy reads.

Common wrong answer to avoid: "Because databases cannot store binary files at all." — They can, but doing so often makes cost, scaling, and operational behavior worse for this workload.

Apply now (5 min)¶

Take the prompt: design a note-taking app with image attachments. Write the requirements in one minute. Then choose only the blocks you need. For each block, force yourself to complete this sentence. "I am adding this because..." Then sketch from memory:

- entry path
- source of truth
- hot read optimization
- async or media path

If one block has no clear reason, remove it.

Bridge. You have the parts. But parts sitting on a table do not make a restaurant. Data needs to flow. Write path, read path, one ticket at a time. → 06-data-flow-first-design.md