14. Honest admission — seniority is knowing where the map is fuzzy

~16 min read. The mature answer is often not certainty; it is clear uncertainty with good judgment.

Built on the ELI5 in 00-eli5.md. The house rules — the constraints that shape the whole restaurant — still matter here, but they do not always produce one perfect answer.


1) First picture: some design questions stay genuinely contested

See. System design is not a school worksheet. Many questions have multiple defensible answers.

speed ───────────────┐
consistency ─────────┼──→ tension zone
cost ────────────────┤
team complexity ─────┘

Look. If one option were always best, senior engineers would stop arguing. They do not. Because context keeps changing. Traffic changes. Team skill changes. Budget changes. Regulatory pressure changes.

So what to do? Do not bluff certainty. State the invariant. State the tradeoff. State the condition that would change your choice. Simple, no?

This is why the last lesson matters. First principles are strong. They still stop before the final meter.


2) Microservices versus monolith: there is no permanent winner

Now what is the temptation? To answer this like religion. That is a mistake.

A monolith gives you:

  • simpler local development,
  • simpler debugging,
  • easier transactions,
  • and fewer network hops.

Microservices give you:

  • stronger team isolation,
  • independent deploys,
  • more targeted scaling,
  • and clearer fault domains when boundaries are real.

The hard part is that both lists are true.

┌──────────────┬────────────────────────────┐
│ monolith     │ microservices              │
├──────────────┼────────────────────────────┤
│ simpler dev  │ stronger isolation         │
│ fewer hops   │ targeted scaling           │
│ easy joins   │ independent deploys        │
│ one release  │ more operational overhead  │
└──────────────┴────────────────────────────┘

Look. A 6-person team with one product surface may do better with a modular monolith. A 40-team platform with hot domains and independent release cycles may need service boundaries.

So what to do? Ask four things.

  • Are domain boundaries stable?
  • Is team ownership already split?
  • Do different parts need very different scaling?
  • Is deploy coordination becoming painful?

If the answer is mostly no, a monolith is not childish. If the answer is mostly yes, microservices are not hype. The kitchen shape should follow the real work, not fashion.
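The four questions can be sketched as a simple tally. This is only an illustration: the function name `architecture_lean`, the question keys, and the thresholds (three or more yes answers, one or fewer yes answers) are assumptions, since the text only says "mostly". Treat the result as a starting point for discussion, not a verdict.

```python
QUESTIONS = (
    "domain_boundaries_stable",
    "team_ownership_split",
    "scaling_needs_differ",
    "deploy_coordination_painful",
)


def architecture_lean(answers: dict[str, bool]) -> str:
    """Tally yes answers to the four questions and return a hedged lean."""
    missing = [q for q in QUESTIONS if q not in answers]
    if missing:
        raise ValueError(f"unanswered questions: {missing}")
    yes_count = sum(1 for q in QUESTIONS if answers[q])
    # Assumed thresholds: "mostly yes" = 3 or 4, "mostly no" = 0 or 1.
    if yes_count >= 3:
        return "lean microservices"
    if yes_count <= 1:
        return "lean monolith"
    return "mixed: name the deciding constraint"


# Hypothetical answers for a small team with one product surface.
small_team = {
    "domain_boundaries_stable": False,
    "team_ownership_split": False,
    "scaling_needs_differ": True,
    "deploy_coordination_painful": False,
}
print(architecture_lean(small_team))  # lean monolith
```

The middle branch matters most. Two yes answers and two no answers is exactly the contested zone, and there the honest output is a question, not an architecture.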


3) Eventual consistency: how eventual is acceptable?

See. People say, "eventual consistency is okay." Okay for what? That is the whole question.

Worked example. Suppose inventory updates replicate with a 400 ms delay, and a flash sale receives 2,500 purchase attempts per second. The stale visibility window is 0.4 seconds, so the attempts that can arrive before a replica sees the latest stock are:

  • stale attempts = 2,500 attempts/s × 0.4 s = 1,000 attempts

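Now imagine only 120 units remain. Up to 1,000 attempts can act on a stock count that is already out of date, which is more than eight times the remaining stock, so overselling is a real risk. For inventory reservation, that lag may be unacceptable.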
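The stale-window arithmetic can be sketched in a few lines of Python. The function names are made up for illustration, and `oversell_exposure` assumes the worst case, where every stale attempt would be accepted. A real system would sit somewhere below that bound.

```python
def stale_attempts(attempts_per_second: float, replication_lag_ms: float) -> float:
    """Purchase attempts that arrive while replicas may still show old stock."""
    return attempts_per_second * replication_lag_ms / 1000


def oversell_exposure(stale: float, units_remaining: int) -> float:
    """Worst-case attempts beyond remaining stock, if every stale attempt were accepted."""
    return max(0.0, stale - units_remaining)


stale = stale_attempts(attempts_per_second=2_500, replication_lag_ms=400)
print(stale)                          # 1000.0
print(oversell_exposure(stale, 120))  # 880.0
```

Run the same function with a like counter in mind. The number is still 1,000, but nothing is oversold. The math is the same and only the consequence differs.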

Now compare that with social likes. If 1,000 users briefly see an old like count, the business may not care. Same lag. Very different consequence.

Simple, no? Acceptable staleness is a business decision disguised as a technical one. The house rules decide it. Not the database slogan.

Good questions to ask are:

  • What becomes incorrect during the stale window?
  • For how long?
  • For which users?
  • Is the error reversible or harmful?

Look. A stale analytics dashboard is annoying. A stale bank balance is dangerous. Do not treat them as the same design problem.


4) Cost estimates and multi-region choices stay slippery

Now what is another place where interviewers disagree? Cost. People want exact numbers. Reality gives ranges.

Quick example. Suppose you forecast 50 TB of monthly egress. At $0.09 per GB:

  • 50 TB = 50 × 1,024 GB = 51,200 GB
  • forecast cost = 51,200 × 0.09 = $4,608

Actual usage becomes 80 TB. Then:

  • 80 TB = 80 × 1,024 GB = 81,920 GB
  • actual cost = 81,920 × 0.09 = $7,372.80
  • delta = 7,372.80 - 4,608 = $2,764.80

Look. A reasonable estimate can still be very wrong: here usage, and therefore cost, came in 60% above forecast. That does not mean estimating is useless. It means you should communicate the confidence range.
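One way to communicate a range instead of a single number is sketched below in Python. It assumes a flat $0.09 per GB with no pricing tiers and the 1,024 GB per TB convention used in the arithmetic above. The helper names and the 40 TB lower bound are hypothetical; only the 50 TB and 80 TB figures come from the example.

```python
GB_PER_TB = 1_024  # binary convention, matching the arithmetic above


def egress_cost_usd(terabytes: float, usd_per_gb: float = 0.09) -> float:
    """Monthly egress cost at a flat per-GB rate (no tiering, an assumption)."""
    return terabytes * GB_PER_TB * usd_per_gb


def cost_band(low_tb: float, expected_tb: float, high_tb: float) -> dict[str, float]:
    """Cost at the low, expected, and high ends of a usage forecast."""
    return {
        "low": egress_cost_usd(low_tb),
        "expected": egress_cost_usd(expected_tb),
        "high": egress_cost_usd(high_tb),
    }


forecast = egress_cost_usd(50)
actual = egress_cost_usd(80)
print(f"forecast ${forecast:,.2f}, actual ${actual:,.2f}, delta ${actual - forecast:,.2f}")
# forecast $4,608.00, actual $7,372.80, delta $2,764.80

# 40 TB is a hypothetical lower bound; 50 and 80 come from the example.
for label, cost in cost_band(40, 50, 80).items():
    print(f"{label:>8}: ${cost:,.2f}")
```

Presenting the band up front changes the conversation. If 80 TB was already inside the stated range, the higher bill is a known risk and not a surprise.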

Now multi-region. Suppose a user writes in Mumbai. Local database commit is 8 ms. If you synchronously replicate to Frankfurt and coordination adds a full 110 ms round trip plus 10 ms overhead, write latency becomes:

  • local commit = 8 ms
  • cross-region round trip = 110 ms
  • coordination overhead = 10 ms
  • total ≈ 128 ms

If you use active-passive with async replication, the write may stay close to 8 ms locally. But failover becomes slower, and the passive region may lag, so a failover can lose writes that were acknowledged locally but had not yet replicated.
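The two write paths can be compared with a small Python sketch. The `WritePath` class and its method names are illustrative assumptions. The model is deliberately simple: it adds the three numbers from the example and ignores queuing, retries, and tail latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WritePath:
    local_commit_ms: float
    cross_region_rtt_ms: float
    coordination_overhead_ms: float

    def sync_replicated_ms(self) -> float:
        """Write is acknowledged only after the remote region confirms."""
        return (
            self.local_commit_ms
            + self.cross_region_rtt_ms
            + self.coordination_overhead_ms
        )

    def async_replicated_ms(self) -> float:
        """Write is acknowledged after the local commit; replication follows later."""
        return self.local_commit_ms


mumbai_to_frankfurt = WritePath(
    local_commit_ms=8,
    cross_region_rtt_ms=110,
    coordination_overhead_ms=10,
)
print(mumbai_to_frankfurt.sync_replicated_ms())   # 128
print(mumbai_to_frankfurt.async_replicated_ms())  # 8
```

The 120 ms gap is the price of knowing the second region has the write. The sketch shows the latency side only. The other side of the tradeoff, replication lag and failover behaviour, does not appear in these two numbers.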

So what to do? Choose based on the failure you fear most. If regional write latency is sacred, active-passive may win. If regional outage tolerance is sacred, active-active may win. But active-active also brings conflict resolution, testing burden, and more cost.

The waiting line can smooth bursts. It cannot solve geography. Physics still collects its rent.


5) What to say when you genuinely do not know

See. The worst move is fake certainty. The better move is structured honesty.

A strong answer sounds like this:

I do not know X from direct experience.
My current default is Y because of constraint Z.
The main unknown is A.
I would validate it with B.

Example. "I have not run an active-active ledger personally at this scale. My default would be active-passive because monetary correctness matters more here than the lowest write latency. The main unknown is the required regional recovery target. I would validate that with business continuity requirements before locking the design."

That is not weakness. That is judgment.

Now what should you avoid? Stopping at "it depends." That phrase is incomplete. You must say what it depends on.

Useful decision axes are:

  • user harm,
  • correctness risk,
  • latency target,
  • operational complexity,
  • team maturity,
  • and cost sensitivity.

The restaurant still needs a decision. Honesty does not replace decision-making. It improves it.

A senior engineer can say, "Here is my current choice, here is the uncertainty, and here is how I would reduce it." That lands well in interviews and in real design reviews. Yes?


Where this lives in the wild

  • GitHub platform architecture — a platform architect extracts services only where ownership and failure isolation justify the split, instead of splitting the monolith everywhere.
  • Amazon cart and inventory — a retail engineer may tolerate brief staleness in cart counts but not in payment or reservation correctness.
  • Razorpay settlement systems — a reliability lead keeps money movement much stricter than merchant analytics dashboards because the harm profile is different.
  • Cloudflare global traffic management — an edge architect may choose active-active for routing layers while keeping some control-plane data paths simpler.
  • Notion sync backend — a product infrastructure engineer weighs multi-region latency, conflict resolution, and cloud spend together, not one by one.

Pause and recall

  • Why does the microservices-versus-monolith debate stay unresolved across teams?
  • In the stale-window example, what made 400 ms unacceptable for inventory but tolerable for likes?
  • Why are cost estimates useful even when they are wrong?
  • What is the right structure for admitting uncertainty in an interview?

Interview Q&A

Q: Why admit uncertainty, not bluff a definite answer, when the tradeoff is genuinely open? A: Because design quality depends on assumptions. Naming the unknowns shows better judgment than pretending the uncertainty does not exist.

Common wrong answer to avoid: "Confidence means never saying you are unsure" — mature confidence includes uncertainty management.

Q: Why choose a monolith, not microservices, in some serious systems? A: Because simpler deployment, simpler transactions, and lower operational overhead can outweigh service-level isolation when the team and domain are still compact.

Common wrong answer to avoid: "Microservices are always more scalable" — they scale some dimensions better while making others harder.

Q: Why choose active-passive, not active-active, for some multi-region systems? A: Because active-passive can preserve simpler correctness and lower steady-state write latency when conflict resolution risk is more dangerous than slower failover.

Common wrong answer to avoid: "Active-active is strictly better because both regions stay live" — it adds serious coordination and correctness complexity.

Q: Why is saying "it depends" not enough by itself? A: Because the useful part is the decision axis. You must say what it depends on and how that changes the recommended design.

Common wrong answer to avoid: "It depends shows nuance" — nuance without decision criteria sounds evasive, not senior.


Apply now (5 min)

Pick one unresolved design debate from your notes. Write two options. Under each, list the main win, the main cost, and the business condition that would make you change your mind. Then write one honest-admission sentence for what you still do not know.

Sketch from memory:

  • the tension-zone diagram,
  • the 400 ms stale-window math,
  • and the sentence template for structured uncertainty.

Bridge. First principles are set. Now we apply them to full high-level designs — complete architectures for real systems, end to end. → ../01_hld_high_level_design/00-eli5.md