02. Requirements Before Boxes — The house rules decide the shape

~14 min read. The fastest way to fail is to solve an undefined problem confidently.

Built on the ELI5 in 00-eli5.md. The house rules — the system constraints and goals — must be clear before you build the restaurant.

1) Never draw boxes first

Most candidates hear the prompt and grab the marker. Then boxes start appearing: API gateway, load balancer, cache, queue, database. Look. This feels productive. It is usually wrong.

Boxes are answers. Requirements are the question. If the question is fuzzy, the boxes are theater. A system design prompt is usually underspecified on purpose: the interviewer wants to see whether you can shape the problem, not whether you can recite a standard stack. See this flow.

```text
vague prompt
   │
   ├── What must the system do?
   ├── Who uses it and how often?
   ├── What matters most: latency, cost, durability?
   ├── What is out of scope?
   │
   ▼
clear house rules
   │
   ▼
first architecture draft
```

Simple, no? We do not draw because we are excited. We draw because some requirement forces a shape. If the requirement is low latency, maybe we cache. If the requirement is strict ordering, maybe we serialize writes. If the requirement is low cost, maybe we batch work.

So what to do? Ask questions that change the design. Skip trivia. Bad question: "Should I use PostgreSQL or MySQL?" Too early. Good question: "Do we need strong consistency for the core user action?" That answer actually changes storage choices.
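The "requirement forces a shape" idea can be sketched as a tiny lookup. This is a minimal illustration, assuming a hypothetical `DESIGN_PRESSURE` table and `design_moves` helper (not from any library); the entries only restate the examples above, and anything without an entry is flagged as either trivia or a follow-up question.

```python
# Hypothetical table: each clarified requirement points at a design move worth
# considering. The entries restate the examples in the text; they are hints, not rules.
DESIGN_PRESSURE = {
    "low_latency": "consider a cache on the hot read path",
    "strict_ordering": "consider serializing writes",
    "low_cost": "consider batching work",
    "strong_consistency_core_action": "pick storage with strongly consistent reads for the core action",
}


def design_moves(requirements):
    """Split requirements into design moves and items with no known design pressure."""
    moves = []
    unmapped = []  # trivia, or worth a follow-up question before drawing anything
    for req in requirements:
        if req in DESIGN_PRESSURE:
            moves.append(DESIGN_PRESSURE[req])
        else:
            unmapped.append(req)
    return moves, unmapped


if __name__ == "__main__":
    moves, unmapped = design_moves(["low_latency", "strict_ordering", "postgres_or_mysql"])
    for move in moves:
        print("-", move)
    print("No design pressure yet:", unmapped)
```

The point is the direction of the arrow: requirement first, design move second. The PostgreSQL-or-MySQL question lands in `unmapped` because, at this stage, it does not change the shape of anything.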

2) Functional requirements define the menu

Functional requirements say what users can do. Think of them as the menu. What can a user ask from the system? What must the system return? What state changes happen? A strong candidate extracts the main actions first. Not every action. The main ones.

Suppose the prompt is: design a food delivery tracking system. The essential functional requirements might be these:

- Users can place an order.
- Users can view order status.
- Delivery partners can update their location.
- Users can cancel within a rule window.
- Restaurants can accept or reject an order.

That is enough to begin. Now what is the problem? Weak candidates keep expanding forever: coupons, reviews, loyalty points, dark mode, voice search. No. That is feature inventory, not scope control. You need to separate core from optional. A useful sentence is this: "For this discussion, I will optimize the core order lifecycle first." That one sentence saves ten minutes.

Good functional scoping usually covers four things:

- Primary actor.
- Primary action.
- Primary state change.
- Primary read path.

If those four are clear, your order ticket has a path. Without that path, even a nice diagram is empty.
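The four-part check can be written down as plain data. A minimal sketch, assuming hypothetical `CoreFlow` and `FunctionalScope` classes and made-up state and read-path names (such as `order PLACED` or `view incoming orders`); it encodes the food delivery scope above and reports whether every core flow has an actor, an action, a state change, and a read path.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CoreFlow:
    """One core flow, described by the four things good scoping covers."""
    actor: str
    action: str
    state_change: str
    read_path: str

    def is_complete(self) -> bool:
        # A flow is usable only when all four parts are filled in.
        return all([self.actor, self.action, self.state_change, self.read_path])


@dataclass
class FunctionalScope:
    core: list[CoreFlow]
    out_of_scope: list[str] = field(default_factory=list)

    def ready_to_design(self) -> bool:
        # At least one core flow, and every core flow fully specified.
        return bool(self.core) and all(flow.is_complete() for flow in self.core)


# Illustrative scope for the food delivery prompt; state and read-path names are hypothetical.
food_delivery = FunctionalScope(
    core=[
        CoreFlow("customer", "place order", "order PLACED", "view order status"),
        CoreFlow("restaurant", "accept or reject order", "order ACCEPTED or REJECTED", "view incoming orders"),
        CoreFlow("delivery partner", "update location", "partner location updated", "view live courier location"),
        CoreFlow("customer", "cancel within rule window", "order CANCELLED", "view order status"),
    ],
    out_of_scope=["coupons", "reviews", "loyalty points", "dark mode", "voice search"],
)

if __name__ == "__main__":
    print("Ready to design:", food_delivery.ready_to_design())
```

Writing `out_of_scope` down explicitly is the code version of "I will optimize the core order lifecycle first": the extras are acknowledged, not forgotten.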

3) Non-functional requirements define the pressure

Functional requirements tell you what exists. Non-functional requirements tell you what hurts. This is where system design becomes system design. You need to ask about scale, latency, availability, consistency, durability, and, where relevant, security or compliance. These are the real house rules. They decide whether one service is enough, whether replicas are needed, whether a queue is mandatory, and whether stale reads are acceptable.

Use a short checklist:

- Who uses it?
- How many daily active users (DAU)?
- How many monthly active users (MAU)?
- What is the read-to-write ratio?
- What latency target matters for the critical path?
- What availability target is expected?
- What data can never be lost?
- What can be eventually consistent?
- What region or compliance boundaries exist?

See. This is already more valuable than drawing five generic boxes. Candidates often ask about DAU and MAU casually, but those numbers are not decoration: they help you estimate peak traffic and storage growth. The read-to-write ratio changes caching strategy. The latency SLA changes data placement. The availability target changes redundancy decisions. Consistency needs change database and workflow choices. Everything flows from this.
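The checklist is easy to track as data while you talk. A minimal sketch, assuming hypothetical answer keys (one per question above) and a `still_open` helper; it treats a missing key or `None` as "not asked yet", so an answer of `0` or `""` still counts as answered.

```python
from __future__ import annotations

# Hypothetical keys, one per checklist question above (dict order is preserved).
HOUSE_RULE_QUESTIONS = {
    "users": "Who uses it?",
    "dau": "How many daily active users?",
    "mau": "How many monthly active users?",
    "read_write_ratio": "What is the read-to-write ratio?",
    "latency_target_ms": "What latency target matters for the critical path?",
    "availability_target": "What availability target is expected?",
    "never_lose": "What data can never be lost?",
    "eventually_consistent_ok": "What can be eventually consistent?",
    "region_or_compliance": "What region or compliance boundaries exist?",
}


def still_open(answers: dict) -> list[str]:
    """Return the checklist questions that have no answer yet (missing or None)."""
    return [question for key, question in HOUSE_RULE_QUESTIONS.items()
            if answers.get(key) is None]


if __name__ == "__main__":
    answers = {"users": "shoppers", "dau": 2_000_000, "read_write_ratio": 3}
    for question in still_open(answers):
        print("Ask:", question)
```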

4) A worked example: turn "millions of users" into usable scope

Prompt: design a notification system for a shopping app. The interviewer says: "Assume millions of users." That sounds big, but it is still useless. So let us pin it down:

- Assume 12 million MAU.
- Assume 2 million DAU.
- Assume each DAU receives 8 notifications per day.
- Assume users open the app 5 times per day.
- Assume reads to writes are 3:1. With 8 writes per DAU per day, that means about 24 reads, or roughly 5 reads per app open (for example a list fetch, an unread badge count, and detail views).
- Assume the p95 read latency target is 150 ms.
- Assume the delivery availability target is 99.95%.
- Assume promotional notifications can be delayed.
- Assume payment success notifications cannot be lost.

Now we have something real. Let us derive a few useful facts:

- Notification writes per day = 2,000,000 × 8 = 16,000,000.
- Average writes per second = 16,000,000 ÷ 86,400 ≈ 185.
- Peak traffic is usually much higher than average. Assume 5× peak: peak writes ≈ 185 × 5 = 925 per second.
- With a 3:1 read-to-write ratio, peak reads ≈ 925 × 3 = 2,775 per second.

Now interpret this. Do we need a queue for every notification? Maybe yes, for fanout and retry. Do we need global multi-region writes on day one? Probably not, unless the requirement says active-active. Do we need very strong consistency for promotional notifications? No. Do we need durable write acknowledgment for payment alerts? Yes. See how the house rules changed the design already? We did not touch boxes, yet the system got clearer.
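The arithmetic above fits in a few lines, which makes it cheap to rerun when the interviewer changes an assumption. A minimal sketch, assuming a hypothetical `notification_load` helper; every input is one of the example's stated assumptions, and `reads_per_open` is an extra sanity check that the 3:1 ratio is plausible given 5 opens per day.

```python
SECONDS_PER_DAY = 86_400


def notification_load(dau, notifs_per_dau, peak_factor, reads_per_write, opens_per_dau):
    """Back-of-envelope load for the notification example (all inputs are assumptions)."""
    writes_per_day = dau * notifs_per_dau
    avg_writes_per_s = writes_per_day / SECONDS_PER_DAY
    peak_writes_per_s = avg_writes_per_s * peak_factor
    peak_reads_per_s = peak_writes_per_s * reads_per_write
    # Sanity check on the ratio: how many reads each user makes per app open.
    reads_per_open = notifs_per_dau * reads_per_write / opens_per_dau
    return {
        "writes_per_day": writes_per_day,
        "avg_writes_per_s": avg_writes_per_s,
        "peak_writes_per_s": peak_writes_per_s,
        "peak_reads_per_s": peak_reads_per_s,
        "reads_per_open": reads_per_open,
    }


if __name__ == "__main__":
    load = notification_load(dau=2_000_000, notifs_per_dau=8, peak_factor=5,
                             reads_per_write=3, opens_per_dau=5)
    for name, value in load.items():
        print(f"{name}: {value:,.1f}")
```

Because the code does not round the average before multiplying, it reports about 926 peak writes and 2,778 peak reads per second, slightly above the hand-rounded 925 and 2,775. At interview precision those are the same answer.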

5) How to scope the interview without sounding hesitant

Many candidates know they should ask questions. They just ask them badly. They sound random, or apologetic, or endless. You need a crisp pattern. Try this:

"Before I draw anything, I want to lock the core requirements. I will clarify functional scope first, then scale and SLAs. After that, I will propose the simplest design that fits."

That sounds senior. It sounds like design review behavior. Now, which questions are usually highest value?

1. Ask what the user absolutely must do.
2. Ask what success metric matters most.
3. Ask which numbers matter enough to size the system.
4. Ask what can be out of scope today.
5. Ask which guarantees are non-negotiable.

That is enough to move. Look. Interviews are timed. You are not writing a product requirements document. You are extracting the design-shaping facts. That is why a one-minute summary helps. For example:

"I will design for 2 million DAU. The core flow is publish and read notifications. Payment alerts need durability. Promotional alerts can be delayed. The target is 150 ms p95 reads and 99.95% delivery availability."

Now the interviewer knows your frame. Now your future boxes have reasons. Simple, no?

Where this lives in the wild

- BookMyShow seat booking (staff backend engineer): starts with inventory locking and payment timeout rules before choosing data stores.
- Google Drive sharing flow (senior storage engineer): clarifies collaboration semantics, file size limits, and permission latency before sketching services.
- Swiggy live order tracking (principal engineer): separates customer reads from courier location writes before discussing streaming and caching.
- Razorpay webhooks platform (staff platform engineer): defines retry guarantees, ordering needs, and merchant SLAs before making queue or database choices.
- LinkedIn notification service (senior infrastructure engineer): distinguishes mandatory member alerts from best-effort growth notifications before architecture review.

Pause and recall

- Why are boxes considered answers rather than the starting point?
- What is the difference between functional requirements and non-functional requirements?
- Which requirement questions most strongly change storage, caching, and queue decisions?
- Why is "millions of users" not enough to begin designing seriously?

Interview Q&A

Q: Why ask about the read-to-write ratio, not just total users? A: Total users tell you scale, but not where the real pressure sits in the system. A read-heavy workload changes caching and indexing priorities, while a write-heavy workload changes storage, contention, and durability choices.

Common wrong answer to avoid: "Because read-heavy systems are more common in interviews." — The ratio matters because it changes the hot path, not because one pattern is trendier than another.

Q: Why define out-of-scope features instead of promising everything? A: Scope control is part of design skill because it keeps the discussion focused on the highest-value problem first. Clear boundaries make the system easier to reason about, estimate, and defend under interview time pressure.

Common wrong answer to avoid: "Because the interviewer probably does not care about extra features." — Extra features are not harmless; they can distort priorities and pull the design away from the core use case.

Q: Why ask for latency and availability separately instead of treating both as "performance"? A: They represent different promises: latency is about how fast requests complete, while availability is about whether the service is usable at all. Since they fail in different ways, they push you toward different design choices, redundancy plans, and operational tradeoffs.

Common wrong answer to avoid: "If latency is low, availability will usually be fine too." — A system can be fast when healthy and still be unavailable too often because the failure modes are different.
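Turning an availability target into a downtime budget makes the difference from latency concrete: a 150 ms p95 says nothing about how many minutes per month the service may be down. A minimal sketch, assuming a simple time-based definition of availability (real SLOs often count failed requests instead) and a hypothetical `downtime_budget_minutes` helper.

```python
def downtime_budget_minutes(availability: float, period_days: float) -> float:
    """Minutes of allowed unavailability in a period, for a target like 0.9995 (99.95%)."""
    if not 0 < availability <= 1:
        raise ValueError("availability must be in (0, 1]")
    return (1 - availability) * period_days * 24 * 60


if __name__ == "__main__":
    target = 0.9995  # the 99.95% delivery availability assumed in the worked example
    print(f"per 30 days: {downtime_budget_minutes(target, 30):.1f} min")
    print(f"per 365 days: {downtime_budget_minutes(target, 365) / 60:.2f} h")
```

At 99.95%, the budget is about 21.6 minutes per 30 days, or roughly 4.4 hours per year. A service can answer every successful request well under 150 ms and still blow that budget.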

Q: Why capture guarantees in words before numbers become exact? A: Qualitative guarantees still shape architecture even before you have precise traffic numbers. Statements like "cannot lose payments" or "promotions can be delayed" already tell you where to spend durability, consistency, and retry budget.

Common wrong answer to avoid: "Without exact QPS, requirement discussion is mostly pointless." — Exact numbers refine the design, but verbal guarantees already narrow the acceptable architecture a lot.
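Verbal guarantees can be written down before any traffic number is exact. The sketch below turns "payment alerts cannot be lost" and "promotions can be delayed" from the worked example into per-category handling, assuming a hypothetical `DeliveryPolicy` type, made-up retry counts, and a default-to-strict rule that is a design choice, not something the requirements state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryPolicy:
    durable_ack: bool  # acknowledge the sender only after the notification is persisted
    may_delay: bool    # allowed to be deferred or batched under load
    max_retries: int   # illustrative retry budget, not a recommendation


# Hypothetical policy table: the words came first, the numbers can be tuned later.
POLICIES = {
    "payment": DeliveryPolicy(durable_ack=True, may_delay=False, max_retries=10),
    "promotional": DeliveryPolicy(durable_ack=False, may_delay=True, max_retries=1),
}


def policy_for(category: str) -> DeliveryPolicy:
    # Assumed design choice: an unknown category gets the strict policy, because
    # over-protecting a promo is cheaper than losing a payment alert.
    return POLICIES.get(category, POLICIES["payment"])


if __name__ == "__main__":
    for category in ("payment", "promotional", "order_update"):
        print(category, policy_for(category))
```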

Apply now (5 min)

Take the prompt: design a document collaboration system. Spend two minutes writing only requirement questions. Force yourself to separate them into two columns:

- Column one: functional menu items.
- Column two: non-functional house rules.

Then sketch from memory:

- the three highest-value questions you would ask first
- one sentence that sets scope
- one sentence that states a latency or availability target

If you can do that cleanly, your design will start on solid ground.

Bridge. Requirements are set. But "millions of users" is not a number. How many QPS? How much storage? We need math. → 03-back-of-envelope-math.md