05. Design Chat System

⏱️ Estimated time: 20 min | Level: advanced

ELI5 callback: On this stage, you are proposing a city conversation grid. Show the blueprint first, then let the choreography justify connections, ordering, and presence.

Step 1: Requirements & Constraints

See. The first trap is solving the wrong question. Ask crisp questions, then freeze scope.

Functional requirements (state each one explicitly so scope stays fixed)
- Support one-to-one and group messaging.
- Maintain online presence and last-seen information.
- Show read receipts and message delivery states.
- Sync messages across each user's devices.
- Handle media attachments, and send push notifications to offline users.

Non-functional requirements (each one changes the architecture)
- Send latency should feel near real time in active chats.
- Messages must be durable and ordered within each conversation.
- The connection layer must hold millions of open WebSockets.
- Presence can be approximate; it does not need strong consistency.
- Noisy groups must be isolated from ordinary personal chats.

Constraints and assumptions (round numbers keep the estimate grounded)
- Assume 30 million daily active users.
- Assume 40 messages per active user per day on average.
- That gives roughly 1.2 billion messages per day.
- Assume peak chat traffic is 8 times the average.

What to explicitly de-scope (interview time is limited)
- End-to-end encryption key management is simplified here.
- Voice and video calling run on separate media systems.
- Advanced moderation tooling is outside this baseline.
- Message search indexing can be a later extension.

On the stage, say what is in and out. That makes the choreography visible and saves time.

Step 2: Scale Estimation

Now watch. Use round numbers, not thesis-level math. One minute of math can remove ten minutes of confusion.

Assumptions (keep the back-of-envelope math honest)
- 1.2 billion messages per day is about 13,900 per second on average (1.2 billion / 86,400 seconds).
- With 8x peaks, sends can exceed 100,000 per second (about 111,000).
- Concurrent WebSocket connections can number in the millions.
- Presence heartbeats can outnumber actual message sends.
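A quick way to check that arithmetic, as a minimal Python sketch. It uses only the Step 1 assumptions; the variable names are illustrative.

```python
# Back-of-envelope send rate from the Step 1 assumptions.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_active_users = 30_000_000
messages_per_user_per_day = 40
peak_multiplier = 8

messages_per_day = daily_active_users * messages_per_user_per_day  # 1.2 billion
avg_per_second = messages_per_day / SECONDS_PER_DAY
peak_per_second = avg_per_second * peak_multiplier

print(f"messages/day: {messages_per_day:,}")
print(f"average/sec:  {avg_per_second:,.0f}")
print(f"peak/sec:     {peak_per_second:,.0f}")
```

It prints 13,889 sends per second on average and 111,111 at peak, which matches the rounded figures in the list.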

Quick math (each point directly changes component choices)
- With a heartbeat every 30 seconds, every open connection adds load even when nobody is chatting, so millions of connections make heartbeat traffic huge.
- Per-conversation ordering is far easier than global ordering.
- Message payloads are small; attachments are not.
- Unread counters can be precomputed instead of scanned.
- Push notifications only matter for inactive devices.
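A rough sketch of why heartbeats can dominate. The text only says connections number "in the millions", so the 10 million figure below is an assumption chosen for illustration, not a measured value.

```python
# Heartbeat load versus message load.
# ASSUMPTION: 10 million concurrent connections (the text only says "millions").
concurrent_connections = 10_000_000
heartbeat_interval_s = 30

avg_sends_per_s = 1_200_000_000 / 86_400  # about 13,900, from the scale estimate
peak_sends_per_s = avg_sends_per_s * 8     # about 111,000
heartbeats_per_s = concurrent_connections / heartbeat_interval_s

print(f"heartbeats/sec:   {heartbeats_per_s:,.0f}")
print(f"vs average sends: {heartbeats_per_s / avg_sends_per_s:.1f}x")
print(f"vs peak sends:    {heartbeats_per_s / peak_sends_per_s:.1f}x")
```

Under that assumption, heartbeats run at about 333,000 per second: roughly 24 times the average send rate and 3 times the peak. That is why presence gets its own soft-state subsystem instead of sharing the message path.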

Capacity implications (keep the design proportional to the load)
- Split connection handling from message persistence.
- Store attachments in object storage, not in the message table.
- Partition messages by conversation ID.
- Treat presence as a soft-state subsystem with a TTL.
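One common way to partition by conversation ID is a stable hash of the ID. This is a minimal sketch; `NUM_PARTITIONS` and `partition_for` are hypothetical names, and plain modulo hashing is a simplification.

```python
import hashlib

NUM_PARTITIONS = 64  # hypothetical; size it from storage and throughput needs


def partition_for(conversation_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a conversation ID to a partition with a stable hash.

    Python's built-in hash() of a str is randomized per process, so
    different routers could disagree; a SHA-256 digest is the same everywhere.
    """
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Every message of one conversation lands on one partition, so that
# partition's owner can assign sequence numbers without cross-node coordination.
for convo in ["dm:alice:bob", "group:book-club", "dm:alice:bob"]:
    print(convo, "->", partition_for(convo))
```

Plain modulo remaps most conversations when the partition count changes. Consistent hashing or a lookup table of partition ranges avoids that, at the cost of more machinery.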

Latency budget (users feel these first)
- Send-to-deliver in an active chat should stay within a few hundred milliseconds.
- Presence updates can tolerate slight delay and approximation.
- Read receipt fan-out must not block message writes.
- Cold reconnects should fetch missed history quickly.

These numbers shape the first blueprint. Simple, no? Design follows load.

Step 3: High-Level Design

See. Keep the top-level flow boring and understandable. The interviewer rewards a clean blueprint before clever tricks.

┌──────────┐   ┌─────────────┐   ┌──────────────┐
│ devices  │──→│ WS gateways │──→│ chat service │
└──────────┘   └──────┬──────┘   └──────┬───────┘
                      │                 │
             ┌────────▼──────┐   ┌──────▼──────┐
             │ presence svc  │   │ message DB  │
             │ + Redis TTL   │   │ by convo    │
             └────────┬──────┘   └──────┬──────┘
                      │                 │
             ┌────────▼──────┐   ┌──────▼──────┐
             │ session map   │   │ message bus │
             │ device routes │   │ fanout      │
             └───────────────┘   └──────┬──────┘
                                        │
                                 ┌──────▼──────┐
                                 │ push svc    │
                                 └─────────────┘

Main flow (keep the read and write paths clear)
- A device opens a WebSocket and authenticates with a session token.
- The gateway registers the device-to-user mapping and heartbeat state.
- An outgoing message goes to the chat service with its conversation ID.
- The message is sequenced, persisted, and then published for recipients.
- Online recipient devices receive it over their existing sockets.
- Offline recipients get a push notification and fetch the history later.
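The main flow can be sketched in memory. This is a minimal sketch, assuming a single process with no real sockets, database, or push provider; `ChatService`, `socket_deliveries`, and `push_queue` are hypothetical names. Fan-out runs synchronously here for readability, whereas a production path would publish to the message bus and fan out asynchronously after the write.

```python
from collections import defaultdict


class ChatService:
    """In-memory sketch of the send path (hypothetical names, no real I/O)."""

    def __init__(self):
        self.members = {}                       # conversation_id -> set of user_ids
        self.last_seq = defaultdict(int)        # conversation_id -> last sequence number
        self.messages = defaultdict(list)       # conversation_id -> [(seq, sender, text)]
        self.online_devices = defaultdict(set)  # user_id -> device_ids with open sockets
        self.socket_deliveries = []             # (device_id, conversation_id, seq)
        self.push_queue = []                    # (user_id, conversation_id, seq)

    def send(self, conversation_id, sender, sender_device, text):
        # 1. Assign the next per-conversation sequence number and persist first.
        self.last_seq[conversation_id] += 1
        seq = self.last_seq[conversation_id]
        self.messages[conversation_id].append((seq, sender, text))

        # 2. Fan out only after the write has succeeded.
        for user in self.members[conversation_id]:
            devices = self.online_devices.get(user, set())
            for device in devices:
                if device != sender_device:  # the sender's other devices stay in sync
                    self.socket_deliveries.append((device, conversation_id, seq))
            if not devices and user != sender:
                self.push_queue.append((user, conversation_id, seq))  # offline: push
        return seq


svc = ChatService()
svc.members["c1"] = {"alice", "bob", "carol"}
svc.online_devices["alice"] = {"alice-phone", "alice-laptop"}
svc.online_devices["bob"] = {"bob-phone"}

print(svc.send("c1", "alice", "alice-phone", "hi"))  # 1
print(sorted(svc.socket_deliveries))  # alice-laptop and bob-phone get it live
print(svc.push_queue)                 # carol is offline, so she gets a push
```

The multi-device case falls out naturally: every online device of every member gets the message except the one that sent it.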

Data model sketch (keep keys and queries obvious)
- Conversation metadata stores members, type, and last sequence number.
- Message rows are keyed by conversation_id plus sequence_number.
- The device session map links each user_id to active gateway connections.
- The read state table tracks last_read_sequence per user per conversation.
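The same data model as Python dataclasses, with the precomputed unread counter from the quick math. A sketch only: field names follow the list where it names them. `kind`, `attachment_url`, and `unread_count` are hypothetical, and the unread math assumes sequence numbers are assigned without gaps.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Conversation:
    conversation_id: str
    kind: str                                 # the conversation "type": "direct" or "group"
    members: Set[str] = field(default_factory=set)
    last_sequence: int = 0                    # highest sequence number assigned so far


@dataclass(frozen=True)
class Message:
    conversation_id: str                      # partition key
    sequence_number: int                      # unique within the conversation
    sender_id: str
    body: str
    attachment_url: Optional[str] = None      # hypothetical pointer into object storage


@dataclass
class DeviceSession:
    user_id: str
    device_id: str
    gateway_id: str                           # gateway holding this device's socket


@dataclass
class ReadState:
    user_id: str
    conversation_id: str
    last_read_sequence: int = 0


def unread_count(convo: Conversation, state: ReadState) -> int:
    """Unread badge from two counters instead of a scan over message rows."""
    return max(0, convo.last_sequence - state.last_read_sequence)


chat = Conversation("group:book-club", "group", {"alice", "bob"}, last_sequence=42)
print(unread_count(chat, ReadState("bob", "group:book-club", 37)))  # 5
```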

What to say aloud (so the interviewer hears your structure)
- Start by splitting connections, messages, and presence into separate concerns.
- Reason aloud about why ordering is per conversation.
- Mention that presence is soft state and can be approximate.
- State that offline delivery uses push plus history sync.

Step 4: Deep Dive

So what to do? Pick two hotspots and go deeper. Do not deep dive everywhere.

Component 1: WebSocket layer and presence tracking

Goal
- Keep millions of connections alive without overloading the chat logic.
- Know where to route messages for each active device.

Design notes (each detail must still map to scale)
- Gateways hold connections and publish heartbeats into presence state.
- Use Redis or a similar store with TTLs for online indicators.
- Map each user to one or more active device sessions.
- Shard gateways by connection count, not by user semantics.
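A minimal sketch of TTL-based presence plus the user-to-device session map. It is an in-memory stand-in for a TTL store such as Redis, not Redis itself; `PresenceStore` is a hypothetical name, and the 90-second TTL (three missed 30-second heartbeats) is an assumption.

```python
import time
from collections import defaultdict


class PresenceStore:
    """In-memory stand-in for a TTL key-value store.

    Each heartbeat rewrites a device entry with a fresh expiry; a device
    that stops heartbeating simply ages out, so no explicit logout is needed.
    """

    def __init__(self, ttl_seconds=90.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # user_id -> {device_id: (gateway_id, expires_at)}
        self._sessions = defaultdict(dict)

    def heartbeat(self, user_id, device_id, gateway_id):
        """Called by a gateway on connect and on every heartbeat."""
        self._sessions[user_id][device_id] = (gateway_id, self.clock() + self.ttl)

    def routes_for(self, user_id):
        """Return {device_id: gateway_id} for devices whose entry has not expired."""
        now = self.clock()
        devices = self._sessions.get(user_id, {})
        live = {dev: gw for dev, (gw, exp) in devices.items() if exp > now}
        for dev in [d for d in devices if d not in live]:
            del devices[dev]  # lazily drop expired entries
        return live

    def is_online(self, user_id):
        return bool(self.routes_for(user_id))


# Fake clock so the example is deterministic.
now = [0.0]
presence = PresenceStore(ttl_seconds=90, clock=lambda: now[0])
presence.heartbeat("alice", "phone", "gw-3")
presence.heartbeat("alice", "laptop", "gw-7")
now[0] = 60.0
presence.heartbeat("alice", "phone", "gw-3")  # phone keeps beating; laptop went quiet
now[0] = 120.0
print(presence.routes_for("alice"))           # {'phone': 'gw-3'}
```

A missed heartbeat or two does not flip the indicator, and a dead device disappears within one TTL. That is the "approximate but cheap" presence the requirements ask for.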

Component 2: Message ordering and group fan-out

Goal
- Preserve ordering within each conversation.
- Deliver to groups efficiently without one giant transaction.

Design notes (each detail must still map to scale)
- Assign sequence numbers at the owner of the conversation's partition.
- Persist first, then fan out to recipient sessions asynchronously.
- For large groups, fan out through a queue rather than inline loops.
- Update read receipts separately from message writes.
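A minimal sketch of queue-based fan-out for large groups. The deque stands in for a durable queue on the message bus. `LARGE_GROUP_THRESHOLD` and `BATCH_SIZE` are hypothetical values, not recommendations. `fan_out` and `drain` are illustrative names.

```python
from collections import deque

# HYPOTHETICAL thresholds; real values should come from load tests.
LARGE_GROUP_THRESHOLD = 500
BATCH_SIZE = 200

# Stands in for a durable queue (for example, a topic on the message bus).
fanout_queue = deque()


def fan_out(conversation_id, seq, recipients, deliver):
    """Called after the message is persisted.

    Small groups are delivered directly; large groups are split into batches
    and enqueued so one busy room cannot stall the send path.
    Returns the number of batches enqueued (0 for direct delivery).
    """
    if len(recipients) < LARGE_GROUP_THRESHOLD:
        for user in recipients:
            deliver(user, conversation_id, seq)
        return 0
    batches = 0
    for start in range(0, len(recipients), BATCH_SIZE):
        fanout_queue.append((conversation_id, seq, recipients[start:start + BATCH_SIZE]))
        batches += 1
    return batches


def drain(deliver, max_batches=None):
    """Worker loop: pull batches off the queue and deliver to each recipient."""
    processed = 0
    while fanout_queue and (max_batches is None or processed < max_batches):
        conversation_id, seq, batch = fanout_queue.popleft()
        for user in batch:
            deliver(user, conversation_id, seq)
        processed += 1
    return processed


received = []


def record(user, conversation_id, seq):
    received.append((user, conversation_id, seq))


print(fan_out("room-1", 7, [f"u{i}" for i in range(1000)], record))  # 5 batches
print(len(received))  # 0: nothing delivered inline
print(drain(record))  # 5
print(len(received))  # 1000
```

Because large rooms go through their own queue and workers, a fan-out storm backs up that queue instead of slowing sends in small personal chats. This is one way to isolate noisy groups.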

Reason aloud as you compare one simple option with one scalable option. If you do not know the exact thresholds, say so honestly.

Interviewer follow-ups to prepare
- How do you handle a user logged in on three devices?
- What changes for very large group chats?
- How would you repair missed messages after reconnect?
- Where do typing indicators belong?

Why not the simpler alternative? (keep the tradeoffs visible)
- Long polling works early but wastes resources at large scale.
- Global message ordering sounds nice but adds unnecessary coupling.
- Strongly consistent presence is expensive and rarely needed.
- Inline group fan-out is simple but collapses in big rooms.

Step 5: Tradeoffs & Failure Modes

Now watch. Senior answers end with tradeoffs and breakage paths. That is where judgment shows up.

Tradeoffs (state each cost clearly)
- WebSockets reduce latency but increase connection management complexity.
- Per-conversation ordering is practical, but ordering across chats is undefined.
- Approximate presence is cheap but not always accurate.
- A separate push workflow helps offline users but adds another subsystem.
- Large-group optimizations help scale but diverge from the small-chat flow.

Failure modes (real systems always break somewhere)
- A gateway failure suddenly drops all of its active sockets.
- A partition owner failure can delay sequence assignment.
- A presence cache outage makes online indicators stale.
- Push provider delays can keep offline users from noticing new messages.
- Huge groups can create fan-out storms and queue backlogs.

Recovery levers (end the failure discussion with action)
- Let clients resume from their last acknowledged sequence number.
- Rebuild session maps on reconnect instead of persisting too much.
- Fall back to a history fetch when real-time delivery fails.
- Throttle typing and receipt events before they dominate traffic.
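A minimal sketch of resuming from the last acknowledged sequence. `contiguous_ack` and `resume` are hypothetical helpers, and the history list stands in for a range scan on (conversation_id, sequence_number).

```python
def contiguous_ack(last_acked_seq, received_seqs):
    """Client side: advance the ack only across a gap-free run, so a
    reconnect never skips a message that was lost in transit."""
    have = set(received_seqs)
    ack = last_acked_seq
    while ack + 1 in have:
        ack += 1
    return ack


def resume(history, last_acked_seq, page_size=100):
    """Server side: next page of messages after the client's ack.

    `history` is a list of (sequence_number, body) for one conversation,
    sorted by sequence number.
    """
    missed = [m for m in history if m[0] > last_acked_seq]
    return missed[:page_size]


history = [(seq, f"msg {seq}") for seq in range(1, 11)]
# Client had acked 6, then got 8 and 9 live; 7 was lost when a gateway died.
ack = contiguous_ack(6, [8, 9])
print(ack)                                       # 6
print([seq for seq, _ in resume(history, ack)])  # [7, 8, 9, 10]
```

Re-sending 8 and 9 is harmless because the client can drop any sequence number it already holds. Skipping 7 would not be harmless, which is why the ack stops at the first gap.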

Close by naming one metric you are unsure of and would validate live. That sounds calm, not weak.

Interview Q&A

Q1. Why not use HTTP polling instead of WebSockets?
A: Because active chat needs low-latency, bidirectional delivery. Polling burns bandwidth and adds delay.
Common wrong answer to avoid: "Polling is basically the same as WebSockets at scale."

Q2. How do you guarantee message order?
A: By assigning sequence numbers within each conversation partition. That gives a clear ordering rule without global coordination.
Common wrong answer to avoid: "You need a global sequence number for all messages."

Q3. Why is presence approximate?
A: Because exact presence across mobile networks and reconnects is expensive and rarely necessary. A recent heartbeat is usually enough.
Common wrong answer to avoid: "Presence must be strongly consistent or it is useless."

Q4. How do read receipts work without blocking sends?
A: Persist the message first, then update per-user read state independently. The receipt path must not slow the send path.
Common wrong answer to avoid: "Read receipts must always be written in the same critical path as the message."
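A minimal sketch of a read-receipt path that lives outside the send path. `read_state`, `receipt_events`, and `mark_read` are hypothetical names; the deque stands in for an asynchronous channel that notifies other members later.

```python
from collections import defaultdict, deque

# (user_id, conversation_id) -> last_read_sequence
read_state = defaultdict(int)
# Hypothetical async channel that tells other members about receipts later.
receipt_events = deque()


def mark_read(user_id, conversation_id, seq):
    """Receipt path, kept out of send(): the message write never waits on it.

    Read markers only move forward, so duplicate or out-of-order receipts
    (common with several devices) are harmless. Returns True if state changed.
    """
    key = (user_id, conversation_id)
    if seq <= read_state[key]:
        return False
    read_state[key] = seq
    receipt_events.append((user_id, conversation_id, seq))
    return True


print(mark_read("bob", "c1", 5))  # True
print(mark_read("bob", "c1", 3))  # False: a stale receipt from another device
print(read_state[("bob", "c1")])  # 5
```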

Apply now (5 min): practice exercise

Take five minutes. Do this without notes.

Practice checklist (keep your rehearsal focused)
- Say the connection path and the message path separately.
- Estimate concurrent connections before drawing storage.
- Draw sequence assignment around one conversation partition.
- Explain how offline users catch up.
- Name one scaling change for large group chats.

Self-check
- Did you separate gateways from chat storage?
- Did you mention per-conversation ordering?
- Did you keep presence as soft state?
- Did you cover multi-device delivery?

Say this opening (so your first minute sounds controlled)
- Open with active-chat versus offline-chat behavior.
- Then place WebSockets, persistence, and fan-out.
- Finish with ordering and reconnect recovery.

Run the choreography once in short form, then once with details. Stay aware of the stage and pause for questions.

Bridge. Chat is flowing. Next, something search-based: autocomplete. → 06