15. Honest Admission — The map is useful, but not complete
~14 min read. Senior architects sound calm partly because they know which answers are still messy.
Built on the ELI5 in 00-eli5.md. The blueprint — the city-level plan of zones and connections — matters here because even the best city planners admit they still do not have perfect rules for every trade-off.
1) Microservices versus monolith: no universal winner
People love a clean slogan. "Always start monolith." "Always go microservices for scale." Both are too neat.
A monolith can ship faster, keep transactions simpler, and reduce coordination cost. Microservices can isolate teams, scale uneven workloads, and reduce blast radius for some failures.
Now what is the problem? The right answer depends on team size, domain churn, release discipline, operational maturity, and data boundaries. One blueprint does not fit every city.
See the comparison.
┌──────────────┬───────────────────────┬────────────────────────┐
│ Choice       │ Wins when             │ Hurts when             │
├──────────────┼───────────────────────┼────────────────────────┤
│ Monolith     │ small team, fast loop │ scaling teams, tangled │
│ Microservices│ uneven load, autonomy │ many boundaries, ops   │
└──────────────┴───────────────────────┴────────────────────────┘
What people miss is migration cost. Breaking one service into five creates new roads, new failure modes, new observability gaps, and often more queues. That may still be correct. It is just not free.
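A rough upper bound makes the "new roads" point concrete: n services can have at most n(n-1)/2 direct pairwise links. The sketch below computes only that bound, assuming any service may call any other. Real systems usually have far fewer links, and the function name is illustrative.

```python
def max_pairwise_links(services: int) -> int:
    """Upper bound on direct service-to-service links (undirected)."""
    if services < 0:
        raise ValueError("services must be non-negative")
    return services * (services - 1) // 2


for n in (1, 5, 10):
    print(n, "services ->", max_pairwise_links(n), "possible links")
```

One service has no internal roads to watch. Five services can have up to ten, and each one is a place where latency, retries, and missing traces can appear.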
So the honest admission is simple. There is no universal answer. There is only fit for context.
2) Auto-scaling is still more art than science
Cloud dashboards make scaling look automatic. Reality is noisier.
Traffic spikes do not arrive in polite straight lines. Marketing campaigns, cricket match finals, salary-day payment rushes, and retry storms create ugly shapes. By the time CPU alarms fire, the backlog may already be growing.
A typical blueprint has metrics flowing into a scaler, then into replicas. That sounds clean. The hard part is choosing the right signal and the right lag.
Worked example. Suppose one service handles 1,000 requests per second per node. You run 10 nodes, so comfortable capacity is 10,000 RPS.
At 9:00, traffic is 8,000 RPS. At 9:01, traffic jumps to 15,000 RPS because a sale starts. Step 1: shortfall = 15,000 - 10,000 = 5,000 RPS.
Suppose new nodes take 2 minutes to boot and warm caches. Step 2: extra nodes needed = 5,000 / 1,000 = 5 nodes. Step 3: until those nodes are ready, each second carries 5,000 excess requests. Step 4: two-minute backlog = 5,000 × 120 seconds = 600,000 queued requests.
Now add retries. If 20% of timed-out clients retry once, Step 5: extra retry load = 600,000 × 0.20 = 120,000 more requests. See how the scaler is already behind.
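The arithmetic in Steps 1 to 5 can be sketched as a small function, so the numbers can be replayed with different boot times or retry rates. This is a simplification under stated assumptions: every excess request is queued rather than dropped, the shortfall is constant for the whole boot window, and retries are counted once against the full backlog, as in the text. The function and key names are illustrative.

```python
import math


def spike_backlog(per_node_rps, nodes, spike_rps, boot_seconds, retry_fraction):
    """Reproduce Steps 1-5 of the worked example for any set of inputs."""
    capacity_rps = per_node_rps * nodes
    shortfall_rps = max(0, spike_rps - capacity_rps)       # Step 1
    extra_nodes = math.ceil(shortfall_rps / per_node_rps)  # Step 2
    # Steps 3-4: excess requests pile up every second until new nodes are warm.
    backlog = shortfall_rps * boot_seconds
    # Step 5: a fraction of the queued (timed-out) requests is retried once.
    retries = round(backlog * retry_fraction)
    return {
        "shortfall_rps": shortfall_rps,
        "extra_nodes": extra_nodes,
        "backlog": backlog,
        "retries": retries,
    }


result = spike_backlog(
    per_node_rps=1_000, nodes=10, spike_rps=15_000,
    boot_seconds=120, retry_fraction=0.20,
)
for name, value in result.items():
    print(f"{name}: {value:,}")
```

With the example's inputs this gives a 5,000 RPS shortfall, 5 extra nodes, a 600,000-request backlog, and 120,000 retries. Halving `boot_seconds` halves the backlog, which is the argument for warm pools.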
So what to do? Use predictive signals, queue depth, warm pools, and rate limits. Still, no formula perfectly predicts spikes. Auto-scaling remains a partly human art, especially when the overflow lane itself takes time to open.
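One way to use queue depth as a signal is to size the fleet for current arrivals plus the rate needed to drain the existing queue within a target window. The sketch below is a heuristic, not a standard formula. The `drain_seconds` target and the function name are assumptions made for illustration.

```python
import math


def desired_nodes(arrival_rps: float, queue_depth: int,
                  per_node_rps: float, drain_seconds: float) -> int:
    """Nodes needed to serve current arrivals and drain the queue in time."""
    if per_node_rps <= 0 or drain_seconds <= 0:
        raise ValueError("per_node_rps and drain_seconds must be positive")
    drain_rps = queue_depth / drain_seconds  # extra throughput to empty the queue
    return math.ceil((arrival_rps + drain_rps) / per_node_rps)


# The spike from the worked example, one minute in: 15,000 RPS arriving,
# 300,000 requests already queued, and a hypothetical 5-minute drain target.
print(desired_nodes(15_000, 300_000, 1_000, 300))
# Same arrivals with an empty queue, which is all an arrival-only signal sees.
print(desired_nodes(15_000, 0, 1_000, 300))
```

The first call asks for one node more than the second, because capacity that only matches arrivals never clears the backlog that built up while the scaler was catching up.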
3) Distributed transactions remain painful
Everyone wants cross-service correctness with local-service freedom. That is why 2PC and Saga keep appearing in interviews.
2PC gives stronger coordination. Saga gives looser coordination through steps and compensations. Both have real costs.
Think of a booking flow with inventory, payment, and order services. You want all three to agree. Simple, no? Not really.
┌──────────────┐  reserve   ┌──────────────┐   charge   ┌──────────────┐
│  Order svc   │ ─────────→ │  Inventory   │ ─────────→ │   Payment    │
└──────────────┘            └──────────────┘            └──────────────┘
        │                                                      │
        └──────────────── confirm / compensate ────────────────┘
Worked example with 2PC first. Step 1: coordinator asks inventory and payment to prepare. Step 2: both lock local state and reply prepared. Step 3: coordinator sends commit.
If payment node replies late after prepare, locks stay longer. If coordinator crashes, participants may wait or require recovery logic. Latency and blocking rise.
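The prepare and commit phases above can be sketched in memory as follows. This is a toy model under loud assumptions: no network, no timeouts, no durable log, and no coordinator crash recovery, which are exactly the parts that make real 2PC painful. The class and function names are hypothetical.

```python
class Participant:
    """A service that can vote in phase 1 and then commit or abort."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "idle"

    def prepare(self):
        # Lock local state and vote. The lock is held until commit or abort,
        # which is why a slow peer or coordinator keeps everyone blocked.
        if self.can_prepare:
            self.state = "prepared"
            return True
        self.state = "aborted"
        return False

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if all voted yes, otherwise abort everyone.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


inventory = Participant("inventory")
payment = Participant("payment", can_prepare=False)
print(two_phase_commit([inventory, payment]), inventory.state, payment.state)
```

Note that `inventory` reaches `prepared` and holds its lock before learning that `payment` voted no. In a real system that waiting period is where latency and blocking accumulate.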
Now Saga. Step 4: reserve inventory. Step 5: charge payment. Step 6: create order. If Step 6 fails, run compensations. Step 7: refund payment. Step 8: release inventory.
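The Saga steps can be sketched as a list of (action, compensation) pairs that are undone in reverse order when a later step fails. This is a minimal in-memory sketch: it assumes compensations always succeed and ignores retries, persistence, and idempotency. The step names and helper functions are hypothetical.

```python
def run_saga(steps):
    """steps: list of (name, action, compensation). Returns the event log."""
    log = []
    done = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"done:{name}")
            done.append((name, compensate))
        except Exception:
            log.append(f"failed:{name}")
            # Undo completed steps in reverse order: refund, then release.
            for prev_name, prev_compensate in reversed(done):
                prev_compensate()
                log.append(f"compensated:{prev_name}")
            break
    return log


def ok():
    pass


def fail():
    raise RuntimeError("order service unavailable")


steps = [
    ("reserve_inventory", ok, ok),  # compensation: release inventory
    ("charge_payment", ok, ok),     # compensation: refund payment
    ("create_order", fail, ok),     # fails, so the two steps above are undone
]
for entry in run_saga(steps):
    print(entry)
```

Between `done:charge_payment` and `compensated:charge_payment` the customer has been charged for an order that does not exist. That gap is the inconsistency window a Saga accepts.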
Cleaner? Sometimes. But compensation may be slow, visible, or legally awkward. A refund that lands after the user has already received two confirmation emails still confuses them.
So the honest answer is not, "Use Saga, problem solved." The honest answer is, "Pick which pain matches the business." One road gives blocking risk. The other gives inconsistency windows.
4) Cost optimization and right-sizing are mostly heuristics
Leaders ask, "Can we reduce cloud cost by 30%?" Engineers ask, "Which service is actually oversized?" Neither side gets a clean theorem.
Right-sizing sounds precise. In practice, teams use heuristics: average CPU, p95 CPU, memory peaks, queue lag, reserve ratios, and business seasonality. Each signal lies in its own way.
Average CPU can hide spikes. Peak CPU can overstate steady demand. Memory may be flat while file descriptors explode. A cheap instance may save compute cost but increase latency enough to reduce revenue.
Worked example. Suppose Service A runs on 20 nodes costing ₹4,000 per node per month. Step 1: monthly cost = 20 × 4,000 = ₹80,000.
An engineer sees average CPU at 28% and suggests cutting to 12 nodes. Step 2: proposed cost = 12 × 4,000 = ₹48,000. Step 3: apparent saving = 80,000 - 48,000 = ₹32,000.
But during two evening hours, peak traffic keeps 18 nodes at 70% CPU. Step 4: missing nodes at peak = 18 - 12 = 6. Step 5: if latency then rises from 120 ms to 380 ms and checkout conversion falls 1.5%, the lost business value may far exceed the ₹32,000 saved on infrastructure.
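A quick check shows why the 12-node proposal is risky. The sketch below reads the peak figure as 18 nodes each at 70% CPU, and assumes load spreads evenly and CPU scales linearly with load. Both are simplifying assumptions, and the function name is illustrative.

```python
def right_sizing_check(node_cost, current_nodes, proposed_nodes, peak_nodes, peak_cpu):
    """Return (monthly saving, projected peak CPU on the smaller fleet)."""
    saving = (current_nodes - proposed_nodes) * node_cost
    # Peak work expressed in "fully busy node" units, assuming linear scaling.
    peak_work = peak_nodes * peak_cpu
    projected_peak_cpu = peak_work / proposed_nodes
    return saving, projected_peak_cpu


saving, projected = right_sizing_check(
    node_cost=4_000, current_nodes=20, proposed_nodes=12,
    peak_nodes=18, peak_cpu=0.70,
)
print(f"monthly saving (rupees): {saving:,}")
print(f"projected peak CPU on 12 nodes: {projected:.0%}")
```

Under those assumptions the smaller fleet would need about 105% CPU at peak, which is not available. The average-CPU view of 28% hides this completely.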
See. Right-sizing is not only a server spreadsheet. It is a system outcome problem. The blueprint may look cheaper while the city works worse.
5) Observability and multi-region consistency still resist clean answers
Teams say, "We added logs, metrics, and traces." Good start. Still not enough. The real problem is correlation: which signal explains the user-visible failure fast enough for action?
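One common starting point for correlation is a shared identifier carried by every signal a request produces. The sketch below groups events by such an identifier. The event shapes, the `request_id` field, and the sample messages are all hypothetical, and real systems typically rely on a tracing library to propagate the identifier.

```python
from collections import defaultdict

# Hypothetical events; the shared "request_id" field is an assumption.
events = [
    {"source": "gateway_log", "request_id": "r1", "detail": "HTTP 504 after 3000 ms"},
    {"source": "trace_span", "request_id": "r1", "detail": "checkout -> payment took 2950 ms"},
    {"source": "payment_log", "request_id": "r1", "detail": "connection pool exhausted"},
    {"source": "gateway_log", "request_id": "r2", "detail": "HTTP 200 after 85 ms"},
]


def correlate(events):
    """Group events from different signal sources by request ID."""
    by_request = defaultdict(list)
    for event in events:
        by_request[event["request_id"]].append((event["source"], event["detail"]))
    return dict(by_request)


timeline = correlate(events)
for source, detail in timeline["r1"]:
    print(f"{source}: {detail}")
```

Read together, the three `r1` entries point from the user-visible timeout to a likely cause. Read separately in three dashboards, they are just three unrelated lines.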
Observability keeps evolving because systems change shape faster than teams change dashboards. A new async road, a new cache warehouse, or one extra region can make old alerts meaningless.
Now add multi-region data. Users want low latency everywhere. Businesses want correctness everywhere. These wants fight each other.
Suppose your database replicates from Mumbai to Frankfurt with an average lag of 120 ms. Step 1: user updates profile in Mumbai at time 0. Step 2: Frankfurt reads at time 80 ms. Step 3: Frankfurt most likely still sees stale data because 80 < 120.
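The staleness check in these steps is a single comparison. The sketch below treats the lag as a fixed value equal to the 120 ms average, which is a simplification since real replication lag varies from write to write. The function name is illustrative.

```python
def read_is_stale(write_ms, read_ms, replication_lag_ms):
    """True if a remote read happens before the write has replicated."""
    return read_ms < write_ms + replication_lag_ms


print(read_is_stale(write_ms=0, read_ms=80, replication_lag_ms=120))
print(read_is_stale(write_ms=0, read_ms=150, replication_lag_ms=120))
```

The first call models the Frankfurt read at 80 ms and reports a stale read. The second models a read at 150 ms, after the update has arrived.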
So what to do? Attempt 1: wait for both regions before acknowledging write. Latency rises. Attempt 2: acknowledge locally and accept stale remote reads. Consistency drops. Attempt 3: route all writes to one primary region. Simplicity rises, but distant user latency worsens and failover becomes harder.
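The latency side of the three attempts can be compared with a toy model. The 5 ms local commit time and 150 ms cross-region round trip are hypothetical numbers chosen for illustration, not figures from the example above. The model covers only write-acknowledgement latency and says nothing about consistency or failover.

```python
def write_latency_ms(strategy, user_region, local_ms=5, cross_region_rtt_ms=150,
                     primary_region="mumbai"):
    """Rough write-acknowledgement latency seen by a user, in milliseconds."""
    if strategy == "sync_both":        # Attempt 1: wait for both regions
        return local_ms + cross_region_rtt_ms
    if strategy == "async_local":      # Attempt 2: acknowledge locally
        return local_ms
    if strategy == "single_primary":   # Attempt 3: all writes go to one region
        if user_region == primary_region:
            return local_ms
        return local_ms + cross_region_rtt_ms
    raise ValueError(f"unknown strategy: {strategy}")


for strategy in ("sync_both", "async_local", "single_primary"):
    row = {region: write_latency_ms(strategy, region)
           for region in ("mumbai", "frankfurt")}
    print(strategy, row)
```

In this model `async_local` is fastest for everyone, but it is also the attempt that allows stale remote reads. `single_primary` is fast only for users near the primary region.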
That is why low-latency multi-region consistency is still not solved in a clean, universal way. The overflow lane of extra regions adds resilience, but it also adds new disagreement paths.
Where this lives in the wild
- Amazon retail checkout — principal architect must choose between tighter coordination and compensating workflows across cart, inventory, and payment.
- Uber dispatch platform — scaling engineer tunes auto-scaling against rush-hour spikes where late capacity is almost the same as no capacity.
- Shopify core platform — staff engineer balances modular service boundaries against the operational drag of too many cross-service dependencies.
- Google global products — site reliability engineer weighs low-latency regional reads against replication lag and cross-region failure handling.
- Datadog-like observability stacks — product engineer keeps redefining what to correlate as architectures add queues, caches, and serverless edges.
Pause and recall
- Why is "monolith first" not a universal rule?
- In the auto-scaling example, how did boot time turn a 5-node shortage into a large backlog problem?
- Why is Saga not a free replacement for 2PC?
- Why does adding more regions improve resilience but complicate consistency?
Interview Q&A
Q: Why choose a monolith over microservices for some systems and not others?
A: Team topology, data boundaries, deployment frequency, and failure isolation needs differ. Architecture shape should follow those constraints, not ideology.
Common wrong answer to avoid: "Because monoliths cannot scale." Many monoliths scale very far; the question is when their coordination model stops fitting.

Q: Why is auto-scaling not just a matter of setting a CPU threshold?
A: CPU is lagging and incomplete. Queue depth, warm-up time, retry behavior, and traffic prediction all matter before extra capacity becomes useful.
Common wrong answer to avoid: "Because cloud vendors have bad autoscalers." The deeper issue is delayed signals and complex workload shape.

Q: Why use Saga instead of 2PC, and why not say Saga is always better?
A: Saga avoids some blocking and coordination costs, but it introduces compensation complexity and temporary inconsistency. It is a trade, not a win.
Common wrong answer to avoid: "Because 2PC is impossible in microservices." It is possible; it is just often operationally painful.

Q: Why is low-latency multi-region consistency still hard?
A: Physics, partitions, and concurrent writes keep forcing a trade-off between freshness, latency, availability, and operational simplicity.
Common wrong answer to avoid: "Because databases are not advanced enough yet." Better tooling helps, but the core trade-off is structural.
Apply now (5 min)
Exercise: Take an app you know well, like food delivery or payments. For that app, write one sentence each on where you would currently prefer a monolith, where you would split services, and where you would accept eventual consistency.
Sketch from memory: Draw one blueprint with a single region and one with two regions. Then mark which new roads, warehouses, and uncertainty points appear after the second region is added.
Bridge. The blueprint is done — zones, roads, warehouses, everything at the city level. Now we go inside each building and design the rooms, hallways, and plumbing. That is Low-Level Design. → ../02_lld_low_level_design/00-eli5.md