02. Component Decomposition: splitting the city into clear zones

~15 min read. A giant box is comforting until one hot corner melts the whole system.

Builds on the ELI5 in 00-eli5.md. The blueprint (the city map of zones and links) now gets sharper borders so each zone owns one job.

1) One big box hides too many problems

Your first HLD often starts as one application box. That is fine for minute one, not minute ten.

A single box hides different business capabilities, different scaling curves, and different failure costs.

Suppose you are designing food delivery. Posting restaurant menus is mostly write-light and admin-heavy. Searching restaurants is read-heavy. Checkout is money-sensitive. Delivery tracking is location-heavy and realtime.

If you keep these inside one unnamed blob, you cannot reason clearly. You do not know which part needs a cache, stricter consistency, or graceful failure.

That is why decomposition matters. We are not splitting for fashion. We are splitting for control.

A bad starting picture looks like this:

    ┌──────────┐   ┌──────────────────────┐   ┌──────────┐
    │  Client  │──→│   Food App Backend   │──→│    DB    │
    └──────────┘   └──────────────────────┘   └──────────┘

Everything enters one blob. Everything exits one blob. Every future argument becomes emotional.

Good decomposition asks one simple question. What distinct responsibilities are fighting inside this box?

The answer usually appears in four dimensions:

- business capability
- data ownership
- scaling pattern
- failure isolation

Simple, no?

2) Start with business capabilities, not deployment count

Many people split by technical layers first. One service for controllers, one for logic, one for repositories.

That is not HLD decomposition. That is just code shuffling.

At system level, we split by capability. Capability means a meaningful business promise the system makes.

In food delivery, likely capabilities are:

- restaurant catalog
- search and discovery
- order placement
- payment processing
- delivery tracking
- notifications

Now test each capability. Can it own its own rules? Can it own its own data? Can it change without forcing every other capability to redeploy?

If yes, it is a strong candidate boundary.

See the improved blueprint:

          ┌──────────┐   ┌─────────────┐
          │  Client  │──→│ API Gateway │
          └──────────┘   └──────┬──────┘
                                │
          ┌──────────────┬──────┴───────┬──────────────┐
          ▼              ▼              ▼              ▼
    ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
    │ Catalog Svc│ │ Search Svc │ │ Order Svc  │ │ Track Svc  │
    └─────┬──────┘ └─────┬──────┘ └─────┬──────┘ └─────┬──────┘
          ▼              ▼              ▼              ▼
    ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
    │ Catalog DB │ │ Index/Cache│ │  Order DB  │ │ Location DB│
    └────────────┘ └────────────┘ └────────────┘ └────────────┘

Now the city has zones. The blueprint is still high level, but the fights are visible.

Catalog can tolerate slower writes. Search wants indexes and caching. Orders want transactional safety. Tracking wants fast updates and graceful staleness.

This is where single responsibility becomes system-wide. One service should have one dominant reason to change.

If a rule change for restaurant menus forces order service redeploys, your boundary is weak. If delivery ETA logic lives inside payments, your boundary is nonsense.

3) Use three tests before extracting a service

Now what is the problem? People oversplit.

Every box becomes a service. Then every click becomes six network hops. Then latency, retries, observability, and deploy coordination become pain.
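To see why extra hops hurt, here is a rough sketch of how serial calls compound. It assumes the hops are called one after another and fail independently. The per-hop latency and availability figures are invented for illustration, not measurements.

```python
import math


def chain_latency_ms(hop_latencies_ms):
    """Serial hops: total latency is the sum of each hop's latency."""
    return sum(hop_latencies_ms)


def chain_availability(hop_availabilities):
    """If every hop must succeed and failures are independent,
    end-to-end availability is the product of per-hop availabilities."""
    return math.prod(hop_availabilities)


# Hypothetical numbers: six hops, each taking 8 ms and 99.9% available.
six_hops_latency = chain_latency_ms([8] * 6)
six_hops_availability = chain_availability([0.999] * 6)

print(f"latency: {six_hops_latency} ms")
print(f"availability: {six_hops_availability:.4%}")
```

Under these assumptions, six "three nines" hops give a request path that is noticeably worse than three nines end to end, which is the price of every extra box on the critical path.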

So what to do? Use tests.

Test A: Change together or not?

If two modules change together almost every week, keep them together longer. Frequent co-change means the boundary may be fake.

Example: cart pricing rules, checkout total calculation, and coupon validation. If every business rule touches all three, separating too early may create chatty APIs and duplicate logic.
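One way to put a number on "change together" is a co-change ratio taken from commit history. The sketch below is a minimal version under stated assumptions: each commit is reduced to the set of modules it touched, and the module names and commit list are made up for illustration.

```python
def co_change_ratio(commits, module_a, module_b):
    """Share of commits touching either module that touch both.

    `commits` is a list of sets of module names. Returns 0.0 when
    neither module appears in any commit.
    """
    both = sum(1 for touched in commits if module_a in touched and module_b in touched)
    either = sum(1 for touched in commits if module_a in touched or module_b in touched)
    return both / either if either else 0.0


# Hypothetical history: which modules each commit touched.
commits = [
    {"cart_pricing", "checkout_total", "coupon_validation"},
    {"cart_pricing", "checkout_total"},
    {"cart_pricing", "checkout_total", "coupon_validation"},
    {"search_ranking"},
    {"cart_pricing", "checkout_total", "coupon_validation"},
    {"search_ranking", "catalog"},
]

for a, b in [
    ("cart_pricing", "checkout_total"),
    ("cart_pricing", "coupon_validation"),
    ("cart_pricing", "search_ranking"),
]:
    print(a, b, co_change_ratio(commits, a, b))
```

A ratio near 1 suggests the boundary between the two modules may be fake for now. A ratio near 0 suggests they already evolve independently. Where to draw the threshold is a judgment call, not something this sketch decides.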

Test B: Scale together or not?

If one capability sees an order of magnitude more traffic than another, separation becomes useful.

Search may handle 30,000 QPS. Order placement may handle 2,000 QPS, a 15x gap. Keeping them together means you scale expensive order code just to serve search traffic.

Test C: Fail together or not?

If notification sending breaks, should checkout stop? Usually no. That means notifications should not sit in the same critical path boundary.

Good boundaries reduce blast radius. A broken recommendation engine should not block login. A broken analytics pipeline should not block payments.
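The same idea can be shown in miniature inside one process. This is only a sketch: `charge` and `notify` are hypothetical callables standing in for calls to a payment service and a notification service, and the order shape is invented.

```python
import logging

logger = logging.getLogger("checkout")


def place_order(order, charge, notify):
    """Charging is on the critical path; notification is best effort."""
    receipt = charge(order)  # any failure here propagates and stops checkout

    notified = True
    try:
        notify(order, receipt)
    except Exception:
        # Notification trouble is logged and swallowed so the order survives.
        logger.warning("notification failed for order %s", order["id"])
        notified = False

    return {"order_id": order["id"], "receipt": receipt, "notified": notified}


def fake_charge(order):
    """Stand-in for a successful payment call."""
    return f"receipt-{order['id']}"


def broken_notify(order, receipt):
    """Stand-in for a notification service that is down."""
    raise ConnectionError("notification service is down")


result = place_order({"id": "o-1"}, fake_charge, broken_notify)
print(result)
```

The order completes with `notified` set to false, while a failing `charge` would still abort the whole call. Across real services the non-critical call would more likely go through a queue or an async job, but the boundary decision is the same: only money-path failures are allowed to stop checkout.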

See one more decomposition rule. Data ownership should be crisp. Each service may read many things, but one service should be the primary writer for its core data.

Why? Because shared writes create coordination hell. Two services updating the same order row independently usually means hidden coupling.
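A quick audit for this rule can be sketched as a check over declared write access. The service names, table names, and the `writes_by_service` mapping below are hypothetical. In practice this information might come from schema grants or code review rather than a hand-written dictionary.

```python
from collections import defaultdict


def find_shared_writes(writes_by_service):
    """Return tables written by more than one service.

    `writes_by_service` maps a service name to the tables it writes.
    The result maps each contested table to its sorted list of writers.
    """
    writers = defaultdict(set)
    for service, tables in writes_by_service.items():
        for table in tables:
            writers[table].add(service)
    return {table: sorted(names) for table, names in writers.items() if len(names) > 1}


# Hypothetical declaration: payment-service also updates order rows.
writes_by_service = {
    "order-service": ["orders", "order_items"],
    "payment-service": ["payments", "orders"],
    "catalog-service": ["menus"],
}

shared = find_shared_writes(writes_by_service)
print(shared)
```

Any table that shows up in the result is a candidate for hidden coupling. One common way out is to let the owning service expose an operation for the change instead of letting a second service write the row directly.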

4) Worked example: break a commerce box with numbers

Assume we start with one commerce backend. Peak traffic looks like this:

- product browse: 40,000 QPS
- search: 25,000 QPS
- cart update: 6,000 QPS
- checkout: 2,500 QPS
- payment confirm: 1,500 QPS
- order status polling: 12,000 QPS

Now see the first clue. Browse + search together are 65,000 QPS. Money path together is 4,000 QPS. Order status is 12,000 QPS.

If one node handles 2,000 mixed QPS safely, naive monolith capacity is:

- Total peak QPS = 40,000 + 25,000 + 6,000 + 2,500 + 1,500 + 12,000 = 87,000 QPS
- Nodes needed = 87,000 ÷ 2,000 = 43.5
- Round up = 44 nodes

But now the waste. We are scaling payment code, fraud checks, and checkout memory footprint across all 44 nodes.

Split it once:

- catalog service handles browse metadata
- search service handles query and ranking
- cart-order service handles cart plus checkout state
- payment service handles payment intent and confirmation
- order-status service handles read models for customer tracking

Now estimate separately, using an assumed per-node capacity for each service:

- Catalog browse nodes = 40,000 ÷ 4,000 = 10 nodes, because browse path is cache-heavy.
- Search nodes = 25,000 ÷ 2,500 = 10 nodes, because search ranking is CPU-heavy.
- Cart-order nodes = (6,000 cart update + 2,500 checkout) ÷ 1,500 = 8,500 ÷ 1,500 = 5.67, so 6 nodes.
- Payment nodes = 1,500 ÷ 500 = 3 nodes, because each request does external calls and strict checks.
- Order-status nodes = 12,000 ÷ 3,000 = 4 nodes.

Total after split = 10 + 10 + 6 + 3 + 4 = 33 nodes. We dropped from 44 to 33 nodes. Savings = 11 nodes. Percentage saved = 11 ÷ 44 = 25% less compute.
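The arithmetic above can be replayed in a few lines. The QPS figures and per-node capacities are the ones assumed in this worked example. The dictionary layout and function name are just one way to organize them.

```python
PEAK_QPS = {
    "product browse": 40_000,
    "search": 25_000,
    "cart update": 6_000,
    "checkout": 2_500,
    "payment confirm": 1_500,
    "order status polling": 12_000,
}

MONOLITH_QPS_PER_NODE = 2_000

# service name -> (traffic paths it absorbs, assumed QPS one node can serve)
SERVICES = {
    "catalog": (["product browse"], 4_000),
    "search": (["search"], 2_500),
    "cart-order": (["cart update", "checkout"], 1_500),
    "payment": (["payment confirm"], 500),
    "order-status": (["order status polling"], 3_000),
}


def nodes_needed(qps, qps_per_node):
    """Ceiling division: a fractional node still means one more machine."""
    return -(-qps // qps_per_node)


monolith_nodes = nodes_needed(sum(PEAK_QPS.values()), MONOLITH_QPS_PER_NODE)

split_nodes = {
    name: nodes_needed(sum(PEAK_QPS[path] for path in paths), per_node)
    for name, (paths, per_node) in SERVICES.items()
}
total_split_nodes = sum(split_nodes.values())

saved_nodes = monolith_nodes - total_split_nodes
saved_percent = 100 * saved_nodes / monolith_nodes

print(monolith_nodes, split_nodes, total_split_nodes, saved_nodes, saved_percent)
```

Changing a single per-node capacity shows how sensitive the result is to those assumptions, so they are worth stating out loud in an interview.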

This is not just cost. Now payment deploys separately. Search scales separately. Order-status can use a denormalized read store without corrupting money flows.
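As a rough illustration of the denormalized read store idea, the sketch below keeps a status view that is fed by events while the order service stays the only writer of order state. The event shape, the `sequence` field, and the class names are assumptions made for this example, not part of the design above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderEvent:
    """Hypothetical event published by the order service."""
    order_id: str
    status: str
    sequence: int  # increases with every change to the same order


class OrderStatusReadModel:
    """Derived view for customer tracking. It never writes back to orders."""

    def __init__(self):
        self._latest = {}

    def apply(self, event):
        # Ignore stale or duplicate events so replays are harmless.
        current = self._latest.get(event.order_id)
        if current is None or event.sequence > current.sequence:
            self._latest[event.order_id] = event

    def status_of(self, order_id):
        event = self._latest.get(order_id)
        return event.status if event else None


view = OrderStatusReadModel()
# Events may arrive out of order; the highest sequence wins.
view.apply(OrderEvent("o-1", "PLACED", 1))
view.apply(OrderEvent("o-1", "OUT_FOR_DELIVERY", 3))
view.apply(OrderEvent("o-1", "PREPARING", 2))

print(view.status_of("o-1"))
```

Because the view only consumes events, it can be rebuilt, cached, or scaled for polling traffic without touching the transactional order data.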

The blueprint is cleaner because the responsibility lines are cleaner.

One warning, yes? Do not split cart and checkout from order writes if they share one tight transaction model today. Keep them together until a real pressure appears. A boundary should remove pain, not create premature distributed-systems pain.

A quick smell list

Keep things together when:

- they share one transaction almost every time
- they change together every sprint
- their scale and SLOs are similar
- you do not need independent ownership yet

Split them when:

- one path is far hotter than the rest
- one path has much stricter reliability needs
- one path can fail without hurting core business
- one team needs faster independent delivery
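The smell list can be written down as a simple tally for a pair of candidate components. This is only a note-taking aid: the field names are invented here, the two example pairs are judgment calls, and the counts are a conversation starter rather than a decision rule.

```python
from dataclasses import dataclass


@dataclass
class BoundarySignals:
    # "Keep together" smells
    shares_transaction: bool
    changes_together: bool
    similar_scale_and_slo: bool
    no_independent_ownership_needed: bool
    # "Split" smells
    much_hotter_path: bool
    stricter_reliability: bool
    can_fail_safely: bool
    team_needs_independent_delivery: bool


def tally(signals):
    """Return (keep_count, split_count) for one candidate boundary."""
    keep = sum([
        signals.shares_transaction,
        signals.changes_together,
        signals.similar_scale_and_slo,
        signals.no_independent_ownership_needed,
    ])
    split = sum([
        signals.much_hotter_path,
        signals.stricter_reliability,
        signals.can_fail_safely,
        signals.team_needs_independent_delivery,
    ])
    return keep, split


# Hypothetical assessments for two pairs from the worked example.
cart_vs_checkout = BoundarySignals(True, True, True, True, False, False, False, False)
search_vs_checkout = BoundarySignals(False, False, False, False, True, True, True, False)

print("cart vs checkout:", tally(cart_vs_checkout))
print("search vs checkout:", tally(search_vs_checkout))
```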

Where this lives in the wild

- Amazon: catalog browse, search, cart, checkout, and payments are separated because read volume, business rules, and failure costs differ dramatically.
- Uber: rider APIs, driver dispatch, trip pricing, payments, and maps are distinct zones because dispatch latency and payment correctness need different controls.
- Swiggy: restaurant catalog, discovery, order orchestration, delivery partner tracking, and customer notifications have different traffic shapes during lunch peaks.
- Shopify: storefront reads, checkout, payments, and merchant admin flows are separated because flash-sale browse traffic should not melt the money path.
- DoorDash: menu ingestion, consumer search, order lifecycle, and Dasher location tracking are isolated because location churn and transactional order state do not belong together.

Pause and recall

1. What are the four dimensions that usually reveal hidden fights inside one big box?
2. Why is splitting by technical layers weaker than splitting by business capability?
3. In the worked example, why did decomposition reduce nodes even before optimizations?
4. Which service should usually be the primary writer of order state: payment service or order service?

Interview Q&A

Q: Why business capability boundaries, not one service per database table?

A: Because tables are storage details, while services exist to own behavior. A capability boundary groups rules, workflows, and data ownership around one business promise. One-table-per-service often creates awkward workflows and excessive cross-service chatter.

Common wrong answer to avoid: "Microservices means each table deserves its own service."

Q: Why keep some modules together even if microservices are allowed?

A: Because every split adds network latency, retries, observability work, and coordination cost. If two modules change, scale, and fail together, separating them buys little. A bad split turns local complexity into distributed complexity.

Common wrong answer to avoid: "Separate earlier because it is easier to merge later."

Q: Why primary data ownership, not shared writes from many services?

A: Because shared writes hide coupling and create race conditions. One owner gives clearer invariants, audit trails, and evolution paths. Other services can read replicas or derived views without fighting for authority.

Common wrong answer to avoid: "Shared database writes are fine if teams communicate well."

Q: Why split search from checkout first, rather than checkout from payment?

A: Search and checkout have radically different QPS, CPU cost, and caching behavior. Checkout and payment are both money-path concerns and often stay tightly coordinated initially. Split the strongest pressure first, not the most fashionable boundary first.

Common wrong answer to avoid: "Always isolate payment first because it sounds more senior."

Apply now (5 min)

Take one giant box from a product you know: edtech app, food delivery app, or ticket booking app.

Do this fast:

1. Write 5 business capabilities.
2. Mark which one owns customer-facing money movement.
3. Mark which one sees the highest read QPS.
4. Mark which one can fail without stopping purchases.
5. Draw 4-6 service boxes with one database or index under each owner.

Sketch from memory: Draw the bad one-box diagram first. Then redraw it as zones with one sentence for why each split exists. If your reasons sound like "different team" only, think again.

Bridge. The zones now exist on the blueprint, but blank arrows are not enough. Components need contracts, request shapes, and safe rules for talking to each other. → 03-api-design-at-boundaries.md