04. Design News Feed¶

⏱️ Estimated time: 20 min | Level: advanced

ELI5 callback: On this stage, you are pitching a city notice board. Show the blueprint first, then let the choreography explain fan-out, ranking, and celebrity pain.

Step 1: Requirements & Constraints¶

See. The first trap is solving the wrong question. Ask crisp questions, then freeze scope.

Functional requirements (stated one by one so scope stays explicit)
- Users can create posts and follow other users.
- The home feed shows recent and relevant posts.
- The feed supports pagination, hides blocked users, and honors deletions.
- Ranking can go lightly beyond pure chronological order.
- Celebrity accounts with millions of followers must be handled.

Non-functional requirements (each of these constraints changes the architecture)
- Feed reads should feel quick even at very high fan-out.
- New posts should appear within seconds for most users.
- Ranking quality matters, but availability matters more.
- Storage must handle massive timeline materialization if that approach is chosen.
- The system should degrade gracefully during celebrity spikes.

Constraints and assumptions (round numbers keep the estimate grounded)
- 100 million daily active users.
- 2 posts per active user per day on average.
- 20 feed opens per active user per day.
- Average follow-graph degree of 200, with huge skew.

What to explicitly de-scope (interview time is limited)
- Stories, reels, and ad auctions are out for now.
- Comment ranking is a separate subsystem.
- Cross-device read position sync can come later.
- Advanced machine-learning ranking pipelines are simplified here.

On the stage, say what is in and out. That makes the choreography visible and saves time.

Step 2: Scale Estimation¶

Now watch. Use round numbers, not thesis-level math. One minute of math can remove ten minutes of confusion.

Assumptions
- Posts per day = 100 million × 2 = 200 million.
- Spread over 86,400 seconds, that is about 2,300 writes per second on average, before peaks.
- Feed opens per day = 100 million × 20 = 2 billion.
- Average feed reads are about 23,000 per second, before peaks.

Quick math (each point directly changes component choices)
- With a 5x peak factor, read traffic reaches about 115,000 per second.
- If each user stores 500 feed item IDs, timeline storage grows fast: 100 million users × 500 IDs is 50 billion entries for daily actives alone.
- Celebrity fan-out on write can explode from one post: a single post becomes millions of timeline inserts.
- Read-time assembly shifts work from writes to reads.
- Ranking service cost depends on candidate set size.
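
The same back-of-envelope math can be checked in a few lines. This is only a sketch using the round numbers from this step; the 8-byte ID size is an assumption added here to turn entry counts into bytes, not something the scenario specifies.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the assumptions in this step.
dau = 100_000_000
posts_per_user_per_day = 2
feed_opens_per_user_per_day = 20
peak_factor = 5
feed_ids_per_user = 500

# Assumption for illustration only: each stored feed item ID is 8 bytes.
bytes_per_id = 8

avg_write_qps = dau * posts_per_user_per_day / SECONDS_PER_DAY
avg_read_qps = dau * feed_opens_per_user_per_day / SECONDS_PER_DAY
peak_read_qps = avg_read_qps * peak_factor

timeline_entries = dau * feed_ids_per_user
timeline_bytes = timeline_entries * bytes_per_id

print(f"average writes/s : {avg_write_qps:,.0f}")
print(f"average reads/s  : {avg_read_qps:,.0f}")
print(f"peak reads/s     : {peak_read_qps:,.0f}")
print(f"timeline entries : {timeline_entries:,}")
print(f"raw ID storage   : {timeline_bytes / 1e9:,.0f} GB before replication")
```

Raw ID bytes are the floor, not the bill. Rank hints, source fields, indexes, and replication multiply the number, so quote it as a lower bound.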

Capacity implications (keep the design proportional to the load)
- The read path is the main scaling concern for normal users.
- The write path is the main concern for celebrity publishers.
- Cache assembled timelines for active users.
- Keep ranking candidate sets bounded before scoring.

Latency budget (user feel matters early)
- Timeline fetch should stay within the low hundreds of milliseconds.
- Fresh content should appear within seconds for most users.
- Ranking can use precomputed features to stay cheap.
- Cold-start users may rely on popularity defaults.

These numbers shape the first blueprint. Simple, no? Design follows load.

Step 3: High-Level Design¶

See. Keep the top-level flow boring and understandable. The interviewer rewards a clean blueprint before clever tricks.

┌──────────┐   ┌────────────┐   ┌──────────────┐
│ posters  │──→│ post svc   │──→│ fanout queue │
└──────────┘   └────────────┘   └──────┬───────┘
                                       │
                           ┌───────────┼─────────────┐
                           │           │             │
                     ┌─────▼────┐ ┌────▼─────┐ ┌─────▼─────┐
                     │ graph DB │ │ timeline │ │ object    │
                     │ follows  │ │ store    │ │ media CDN │
                     └──────────┘ └────┬─────┘ └───────────┘
                                        │
                                  ┌─────▼─────┐
                                  │ feed API  │
                                  │ + ranking │
                                  └─────┬─────┘

Main flow (keeps the read and write paths clearly separated)
- A new post is stored with author metadata and media pointers.
- The follower graph is consulted to decide the fan-out strategy.
- Posts from normal users are fanned out on write into timeline storage.
- Celebrity posts may stay in author storage and be fetched at read time.
- The feed API gathers candidate IDs, applies ranking, and returns pages.
- Caches hold recent timeline pages for active readers.

Data model sketch (keys and queries should stay obvious)
- A post record stores author_id, post_id, created_at, and media refs.
- Follow edges map follower_id to followee_id, with status fields.
- Timeline rows store user_id, post_id, source, and rank hints.
- A feature store keeps popularity and freshness signals for ranking.
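
The records above can be sketched as plain Python dataclasses. The field names follow the sketch; the types, the status values, and the source labels are assumptions for illustration, not a fixed schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Post:
    post_id: int
    author_id: int
    created_at: int                    # assumption: Unix seconds
    media_refs: Tuple[str, ...] = ()   # pointers to media, not the bytes


@dataclass(frozen=True)
class FollowEdge:
    follower_id: int
    followee_id: int
    status: str = "active"   # assumption: e.g. "active", "muted", "blocked"


@dataclass(frozen=True)
class TimelineRow:
    user_id: int             # whose home feed this row belongs to
    post_id: int
    source: str              # assumption: "fanout" or "celebrity_pull"
    rank_hint: float = 0.0   # cheap precomputed signal, refined at read time


@dataclass
class PostFeatures:
    post_id: int
    popularity: float = 0.0
    freshness: float = 0.0


# One post by author 7, delivered into follower 42's timeline.
post = Post(post_id=1001, author_id=7, created_at=1_700_000_000)
edge = FollowEdge(follower_id=42, followee_id=7)
row = TimelineRow(user_id=edge.follower_id, post_id=post.post_id, source="fanout")
```

Notice that the timeline row holds only IDs and hints. The post body lives in one place, which is what makes deletion and edits tractable later.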

What to say aloud (so the interviewer hears your structure)
- Start by naming the classic split: fan-out-on-write versus fan-out-on-read.
- Reason aloud about why celebrity accounts need their own branch.
- Mention that ranking should happen on a bounded candidate set.
- State that cache warmth depends on repeat feed visits.

Step 4: Deep Dive¶

So what to do? Pick two hotspots and go deeper. Do not deep dive everywhere.

Component 1: Fan-out strategy and celebrity handling¶

Goal
- Balance write amplification against read amplification.
- Keep celebrity posts from melting the write pipeline.

Design notes (every detail should still map back to scale)
- Use fan-out-on-write for normal accounts with modest follower counts.
- Switch to fan-out-on-read for huge accounts with massive fan-out.
- Store the follower-count threshold in config so the branch is explicit.
- Precompute candidate lists incrementally to limit read-time work.
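
The branch described above can be sketched with in-memory dictionaries. Everything here is hypothetical: the class name, the method names, and especially the threshold value, which would be tuned from real follower distributions.

```python
from collections import defaultdict

# Assumption: placeholder value; in practice this lives in config.
CELEBRITY_FOLLOWER_THRESHOLD = 10_000


class HybridFanout:
    def __init__(self, threshold=CELEBRITY_FOLLOWER_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)      # author_id -> follower ids
        self.timelines = defaultdict(list)     # user_id -> pushed post ids
        self.author_posts = defaultdict(list)  # author_id -> own post ids

    def follow(self, follower_id, author_id):
        self.followers[author_id].add(follower_id)

    def is_celebrity(self, author_id):
        return len(self.followers[author_id]) >= self.threshold

    def publish(self, author_id, post_id):
        """Store the post and return how many timeline writes it caused."""
        self.author_posts[author_id].append(post_id)
        if self.is_celebrity(author_id):
            # Fan-out-on-read: followers pull this post when they open the feed.
            return 0
        # Fan-out-on-write: push the post ID into each follower's timeline.
        for follower_id in self.followers[author_id]:
            self.timelines[follower_id].append(post_id)
        return len(self.followers[author_id])


# Tiny demo with a threshold of 3 so the branch is easy to see.
store = HybridFanout(threshold=3)
for follower in (10, 11):
    store.follow(follower, author_id=1)   # normal author
for follower in (10, 11, 12):
    store.follow(follower, author_id=2)   # "celebrity" at this threshold

normal_writes = store.publish(author_id=1, post_id=100)
celebrity_writes = store.publish(author_id=2, post_id=200)
```

The point to say aloud: the celebrity post costs one write regardless of follower count, and the cost moves to the read path.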

Component 2: Timeline assembly and ranking¶

Goal
- Assemble enough candidates quickly for a useful first page.
- Apply ranking without turning every request into a giant search job.

Design notes (every detail should still map back to scale)
- Fetch precomputed timeline IDs for most users first.
- Merge in celebrity content at read time when needed.
- Rank using freshness, affinity, and engagement signals.
- Paginate by cursor so that newly inserted posts do not shift or duplicate items across pages.
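
One way to picture these notes together is the sketch below: merge the two sources, cut a bounded window by recency using a cursor, then rank only inside that window. The scoring formula, the 24-hour decay, and the dictionary shape of a post are assumptions for illustration, not a recommended ranker.

```python
import math

NOW = 1_000_000  # assumption: fixed "current time" in seconds for the demo


def score(post, now, affinity):
    """Toy score combining freshness, affinity, and engagement."""
    age_hours = (now - post["created_at"]) / 3600
    freshness = math.exp(-age_hours / 24)
    closeness = 1.0 + affinity.get(post["author_id"], 0.0)
    engagement = 1.0 + math.log1p(post["likes"])
    return freshness * closeness * engagement


def sort_key(post):
    # (created_at, post_id) is unique, so it works as a stable cursor.
    return (post["created_at"], post["post_id"])


def assemble_page(timeline_posts, celebrity_posts, now, affinity,
                  page_size=3, cursor=None):
    # Merge precomputed and read-time sources, de-duplicating by post_id.
    merged = {p["post_id"]: p for p in timeline_posts + celebrity_posts}
    newest_first = sorted(merged.values(), key=sort_key, reverse=True)
    if cursor is not None:
        # Only posts strictly older than the cursor belong to later pages.
        newest_first = [p for p in newest_first if sort_key(p) < cursor]
    window = newest_first[:page_size]          # bounded candidate set
    has_more = len(newest_first) > page_size
    next_cursor = sort_key(window[-1]) if has_more else None
    ranked = sorted(window, key=lambda p: score(p, now, affinity), reverse=True)
    return ranked, next_cursor


def make_post(post_id, author_id, hours_old, likes=0):
    return {"post_id": post_id, "author_id": author_id,
            "created_at": NOW - hours_old * 3600, "likes": likes}


timeline = [make_post(1, 1, 1), make_post(2, 2, 2, likes=100),
            make_post(3, 1, 3), make_post(5, 1, 40)]
celebrity = [make_post(4, 3, 30)]

page1, cursor = assemble_page(timeline, celebrity, NOW, {}, page_size=3)

# A brand-new post arrives between page loads; page 2 is unaffected.
timeline_after_insert = timeline + [make_post(6, 2, 0)]
page2, end = assemble_page(timeline_after_insert, celebrity, NOW, {},
                           page_size=3, cursor=cursor)
```

The design choice worth naming: the cursor is based on time, not on score, because scores move between requests and a moving cursor would skip or repeat posts.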

Use reasoning aloud to compare one easy option and one scalable option. Add an honest gap if exact thresholds are unknown.

Interviewer follow-ups to prepare
- How do you delete a post already fanned out widely?
- What changes when one user follows one million accounts?
- How would you support muted accounts and blocked content?
- Where would you cache ranked pages versus raw candidate lists?

Why not the simpler alternative? (keep the tradeoffs visible)
- Pure fan-out-on-write is fast to read, but brutal for celebrities.
- Pure fan-out-on-read saves writes, but makes every read heavier.
- Full machine-learning ranking is powerful, but too deep for a baseline design.
- Chronological-only feeds are simple, but often weaker for engagement.

Step 5: Tradeoffs & Failure Modes¶

Now watch. Senior answers end with tradeoffs and breakage paths. That is where judgment shows up.

Tradeoffs (state the cost of each choice clearly)
- Fan-out-on-write gives fast reads, but high storage and write cost.
- Fan-out-on-read lowers write amplification, but hurts read latency.
- Caching ranked pages is fast, but staleness becomes visible.
- Aggressive ranking improves relevance, but reduces transparency.
- Hybrid strategies are practical, but operationally more complex.
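
The first two tradeoffs can be put into rough numbers with the Step 2 assumptions. This sketch assumes every post is pushed to the average 200 followers in the write case, and every feed open queries each of the average 200 followees in the read case, with no batching or caching. Real systems sit well below both figures, and the heavy skew makes averages misleading for celebrities.

```python
SECONDS_PER_DAY = 86_400

# From the Step 2 assumptions.
posts_per_day = 200_000_000
feed_opens_per_day = 2_000_000_000
avg_degree = 200  # average followers per author equals average followees per reader

# Pure fan-out-on-write: each post becomes one insert per follower.
timeline_inserts_per_s = posts_per_day * avg_degree / SECONDS_PER_DAY

# Pure fan-out-on-read: each feed open touches every followed author.
author_lookups_per_s = feed_opens_per_day * avg_degree / SECONDS_PER_DAY

print(f"fan-out-on-write: {timeline_inserts_per_s:,.0f} timeline inserts/s")
print(f"fan-out-on-read : {author_lookups_per_s:,.0f} author lookups/s")
print(f"read-side work is {author_lookups_per_s / timeline_inserts_per_s:.0f}x larger")
```

Because feed opens outnumber posts ten to one here, doing the work at write time is the cheaper default, which is why the celebrity branch is the exception rather than the rule.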

Failure modes (real systems always break somewhere)
- Timeline fan-out workers can fall behind during celebrity posts.
- A follower graph outage can block post distribution decisions.
- A ranking service slowdown can delay feed responses.
- Delete propagation can temporarily miss some cached timelines.
- Hot users can create cache stampedes during major events.

Recovery levers (end the failure discussion with action)
- Fall back to chronological ordering if ranking is slow.
- Use a celebrity bypass lane that skips full fan-out.
- Invalidate cached pages lazily rather than synchronously everywhere.
- Run post-delete replay jobs for timelines that missed the delete.
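
The first lever, falling back to chronological order when ranking is slow, might look like the sketch below. It uses a thread pool with a timeout from the standard library; the function names, the 100 ms budget, and the demo rankers are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as RankingTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def chronological(candidates):
    return sorted(candidates, key=lambda p: p["created_at"], reverse=True)


def rank_with_fallback(candidates, rank_fn, timeout_s=0.1):
    """Return (posts, mode); mode is 'ranked' or 'fallback'."""
    future = _pool.submit(rank_fn, candidates)
    try:
        return future.result(timeout=timeout_s), "ranked"
    except RankingTimeout:
        # Best effort only: a call that is already running is not interrupted.
        future.cancel()
        return chronological(candidates), "fallback"
    except Exception:
        # A ranking error should not take the feed down either.
        return chronological(candidates), "fallback"


def by_likes(candidates):
    return sorted(candidates, key=lambda p: p["likes"], reverse=True)


def slow_ranker(candidates):
    time.sleep(0.5)  # simulates a degraded ranking service
    return by_likes(candidates)


posts = [
    {"post_id": 1, "created_at": 100, "likes": 5},
    {"post_id": 2, "created_at": 200, "likes": 1},
]

healthy, healthy_mode = rank_with_fallback(posts, by_likes, timeout_s=1.0)
degraded, degraded_mode = rank_with_fallback(posts, slow_ranker, timeout_s=0.1)
```

In a real service the timeout would sit on the network call to the ranking service, but the shape is the same: availability first, ranking quality second.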

Close with an honest gap on one metric you would validate live. That sounds calm, not weak.

Interview Q&A¶

Q1. Why not always use fan-out-on-write?
A: Because celebrity accounts make write amplification extreme. One post could trigger millions of timeline inserts immediately.
Common wrong answer to avoid: Fan-out-on-write is the universally correct answer.

Q2. Why not always assemble feeds at read time?
A: Because read traffic is massive and users expect fast feeds. Precomputation buys a lot for common users.
Common wrong answer to avoid: Read-time assembly is always cheaper because storage is expensive.

Q3. Where should ranking happen?
A: After you gather a bounded candidate set. Ranking everything directly in storage is too expensive and hard to tune.
Common wrong answer to avoid: Ranking should happen across the entire corpus on every request.

Q4. How do deletions work with materialized timelines?
A: Mark the post deleted in source storage, then filter deleted entries out at read time and remove them from timelines through asynchronous background cleanup.
Common wrong answer to avoid: Deleting one post means scanning every timeline synchronously.
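
The Q4 answer can be sketched as a tombstone set checked on every read, plus a cleanup job that runs later. The class and method names are hypothetical, and a real system would keep tombstones in the post store rather than in process memory.

```python
class TimelineWithTombstones:
    def __init__(self):
        self.deleted = set()   # tombstones: post IDs marked deleted at the source
        self.timelines = {}    # user_id -> materialized list of post IDs

    def delete_post(self, post_id):
        # Constant-time delete: no timeline is scanned here.
        self.deleted.add(post_id)

    def read_feed(self, user_id):
        # Read-time filter hides deleted posts even before cleanup runs.
        return [pid for pid in self.timelines.get(user_id, [])
                if pid not in self.deleted]

    def cleanup(self, user_id):
        """Background job: physically drop tombstoned entries for one user."""
        before = self.timelines.get(user_id, [])
        after = [pid for pid in before if pid not in self.deleted]
        self.timelines[user_id] = after
        return len(before) - len(after)


feeds = TimelineWithTombstones()
feeds.timelines[42] = [100, 101, 102]

feeds.delete_post(101)
visible = feeds.read_feed(42)            # hidden right away
stored_before = list(feeds.timelines[42])  # still physically present
removed = feeds.cleanup(42)              # removed later, off the request path
```

The user stops seeing the post immediately, while the expensive physical removal happens off the request path.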

Apply now (5 min) — practice exercise¶

Take five minutes. Do this without notes.

Practice checklist (keeps the rehearsal focused)
- Say the average reader path and the celebrity path separately.
- Estimate read QPS before picking a strategy.
- Draw the hybrid design in under thirty seconds.
- Explain one ranking signal and one deletion path.
- Name one failure mode caused by celebrity traffic.

Self-check
- Did you mention the celebrity problem explicitly?
- Did you separate candidate generation from ranking?
- Did you say where caching helps most?
- Did you cover deletion propagation at least briefly?

Say this opening (so your first minute sounds controlled)
- Open with feed requirements and freshness expectations.
- Then compare write-time and read-time fan-out.
- Finish with celebrity handling and fallback behavior.

Run the choreography once in short form, then once with details. Stay aware of the stage and pause for questions.

Bridge. Feed assembled. Now real-time communication — a chat system. → 05