Skip to content

06. Design Search Autocomplete

⏱️ Estimated time: 20 min | Level: advanced

ELI5 callback: On this stage, you are pitching a city signboard helper. Show the blueprint first, then let the choreography explain prefixes, ranking, and personalization.

Step 1: Requirements & Constraints

See. First trap is solving the wrong question. Ask crisp questions, then freeze scope.

Functional requirements - Return suggestions while the user is still typing, because scope must stay explicit. - Match prefixes quickly across a large query dictionary, because scope must stay explicit. - Rank suggestions by popularity, freshness, and maybe personalization, because scope must stay explicit. - Support typo-safe fallbacks only if time allows, because scope must stay explicit. - Filter unsafe or blocked suggestions before serving, because scope must stay explicit.

Non-functional requirements - P99 latency should stay very low, often under 50 ms, because that constraint changes architecture. - Throughput must handle many keystroke-driven requests per search, because that constraint changes architecture. - Index updates should be frequent enough for trending terms, because that constraint changes architecture. - Memory footprint must stay controlled for in-memory serving, because that constraint changes architecture. - Personalization should improve relevance without dominating cost, because that constraint changes architecture.

Constraints and assumptions - Assume 50 million search users per day, so your estimate stays grounded. - Assume 10 typed characters per search session on average, so your estimate stays grounded. - That can produce 500 million prefix lookups per day, so your estimate stays grounded. - Assume peak query volume is 6 times average, so your estimate stays grounded.

What to explicitly de-scope - Full spelling correction and semantic search are separate systems, because interview time is limited. - Ads blending is out for this baseline, because interview time is limited. - Multilingual transliteration support can come later, because interview time is limited. - Offline model training details are simplified here, because interview time is limited.

On the stage, say what is in and out. That makes the choreography visible and saves time.

Step 2: Scale Estimation

Now watch. Use round numbers, not thesis-level math. One minute of math can remove ten minutes of confusion.

Assumptions - 500 million lookups per day is about 5,800 per second average, so the back-of-envelope math stays honest. - With a 6x peak, traffic reaches roughly 35,000 per second, so the back-of-envelope math stays honest. - Hot prefixes like a or s dominate cache behavior, so the back-of-envelope math stays honest. - Suggestion corpus may contain tens of millions of phrases, so the back-of-envelope math stays honest.

Quick math - If each suggestion entry averages 40 bytes, memory rises quickly, which directly changes component choices. - Prefix trees compress shared prefixes better than flat maps, which directly changes component choices. - Top-k results per node can trade memory for speed, which directly changes component choices. - Personalization lookup should be optional and bounded, which directly changes component choices. - Rebuilding the whole index hourly may be acceptable offline, which directly changes component choices.

Capacity implications - Serve the main index from memory for very low latency, so the design stays proportional. - Push popularity updates through batch or streaming pipelines, so the design stays proportional. - Cache the hottest prefixes at the edge or app tier, so the design stays proportional. - Keep personalization lightweight on the first pass, so the design stays proportional.

Latency budget - Network and TLS eat a large fraction of typeahead latency, because user feel matters early. - Index lookup should be only a few milliseconds in-process, because user feel matters early. - Personalization merge must be bounded and optional, because user feel matters early. - Fallback to global popularity if user features are missing, because user feel matters early.

These numbers shape the first blueprint. Simple, no? Design follows load.

Step 3: High-Level Design

See. Keep the top-level flow boring and understandable. The interviewer rewards a clean blueprint before clever tricks.

┌──────────┐   ┌────────────┐   ┌──────────────┐
│ clients  │──→│ edge cache │──→│ suggest API  │
└──────────┘   └────────────┘   └──────┬───────┘
                           ┌───────────┼─────────────┐
                           │           │             │
                     ┌─────▼────┐ ┌────▼─────┐ ┌─────▼─────┐
                     │ in-memory │ │ profile  │ │ safe-list │
                     │ trie/index│ │ features │ │ filters   │
                     └─────┬─────┘ └──────────┘ └───────────┘
                     ┌─────▼─────┐
                     │ offline   │
                     │ build job │
                     └─────┬─────┘

Main flow - Client sends current prefix after a short debounce, so the read and write path stays clear. - Suggest API normalizes the prefix and checks hot caches, so the read and write path stays clear. - In-memory index returns candidate suggestions for the prefix, so the read and write path stays clear. - Optional profile features rerank candidates for the user, so the read and write path stays clear. - Safety filters remove banned or sensitive suggestions, so the read and write path stays clear. - API returns top results plus metadata for analytics, so the read and write path stays clear.

Data model sketch - Trie nodes or FST states map prefixes to top suggestions, so keys and queries stay obvious. - Suggestion records keep text, score, language, and freshness, so keys and queries stay obvious. - User feature vectors stay in a compact profile store, so keys and queries stay obvious. - Analytics events capture shown suggestions and eventual clicks, so keys and queries stay obvious.

What to say aloud - Start by stating that autocomplete is query-time, not full search, so the interviewer hears your structure. - Use reasoning aloud to explain why the serving index stays in memory, so the interviewer hears your structure. - Mention that ranking can be global first, personalized second, so the interviewer hears your structure. - State that unsafe suggestions need a final filtering step, so the interviewer hears your structure.

Step 4: Deep Dive

So what to do? Pick two hotspots and go deeper. Do not deep dive everywhere.

Component 1: Prefix index design

Goal - Find top suggestions for a prefix with tiny latency, so the deep dive has a target. - Keep memory usage acceptable despite huge vocabulary size, so the deep dive has a target.

Design notes - A trie is easy to explain and works well for prefix lookup, because details must still map to scale. - A finite-state transducer can compress memory further if needed, because details must still map to scale. - Store top-k suggestions at nodes for fast serving, because details must still map to scale. - Rebuild or merge index snapshots offline, not on every query, because details must still map to scale.

Component 2: Ranking and personalization

Goal - Blend popularity with user context without hurting latency badly, so the deep dive has a target. - Keep cold-start behavior strong when user features are absent, so the deep dive has a target.

Design notes - Use global popularity and recency as the baseline score, because details must still map to scale. - Add lightweight boosts from user locale, history, or category affinity, because details must still map to scale. - Bound the reranking candidate set to keep compute stable, because details must still map to scale. - Fall back cleanly to global results when personalization data is missing, because details must still map to scale.

Use reasoning aloud to compare one easy option and one scalable option. Add an honest gap if exact thresholds are unknown.

Interviewer follow-ups to prepare - How often should the popularity score update? - What do you do with brand-new trending queries? - How would you block offensive or unsafe suggestions? - When is personalization not worth the extra latency?

Why not the simpler alternative? - Scanning a sorted list per prefix is simple, but too slow at scale, so tradeoffs stay visible. - Putting all ranking online is flexible, but raises latency sharply, so tradeoffs stay visible. - Heavy personalization may help quality, but often hurts tail latency, so tradeoffs stay visible. - Exact freshness on every keystroke is usually overkill, so tradeoffs stay visible.

Step 5: Tradeoffs & Failure Modes

Now watch. Senior answers end with tradeoffs and breakage paths. That is where judgment shows up.

Tradeoffs - Trie-based serving is fast, but memory-hungry without compression, so the interviewer hears the cost clearly. - Top-k per node reduces latency, but increases index size, so the interviewer hears the cost clearly. - Batch popularity updates are cheap, but less fresh, so the interviewer hears the cost clearly. - Personalization improves relevance, but complicates privacy and caching, so the interviewer hears the cost clearly. - Aggressive filtering protects users, but may hide legitimate queries, so the interviewer hears the cost clearly.

Failure modes - Hot prefixes can overload one shard or one cache node, because real systems always break somewhere. - Bad index builds can serve irrelevant suggestions globally, because real systems always break somewhere. - Profile store latency can hurt tail performance, because real systems always break somewhere. - Safety rule bugs can either overblock or underblock content, because real systems always break somewhere. - Analytics loss can degrade future ranking quality, because real systems always break somewhere.

Recovery levers - Serve only global top suggestions when profile systems lag, so failure discussion ends with action. - Keep the previous index snapshot for instant rollback, so failure discussion ends with action. - Cache ultra-hot prefixes separately from the full index, so failure discussion ends with action. - Expose moderation override lists for urgent blocking, so failure discussion ends with action.

Close with an honest gap on one metric you would validate live. That sounds calm, not weak.

Interview Q&A

Q1. Why not use the main search engine directly? A: Because autocomplete needs much lower latency and different data structures. Full search ranking per keystroke is too expensive. Common wrong answer to avoid: Autocomplete is just normal search with fewer results.

Q2. Why is a trie commonly discussed? A: Because it maps naturally to prefixes and shared character paths. It is easy to reason about in interviews. Common wrong answer to avoid: Trie automatically solves memory concerns by itself.

Q3. Do you need personalization on day one? A: Not necessarily. A strong global popularity model often gets you far, and personalization can be added gradually. Common wrong answer to avoid: Personalization should always be mandatory for good quality.

Q4. How do trending terms appear quickly? A: Use a fast-updating popularity feed or micro-batch refresh that boosts recent query counts into the serving index. Common wrong answer to avoid: Unsafe suggestion filtering can happen only offline.

Apply now (5 min) — practice exercise

Take five minutes. Do this without notes.

Practice checklist - Estimate lookup QPS from keystroke behavior, so your rehearsal stays focused. - Draw the serving path and the offline build path separately, so your rehearsal stays focused. - Explain why the index is in memory, so your rehearsal stays focused. - Pick one ranking signal and one filter, so your rehearsal stays focused. - Say what you do when personalization times out, so your rehearsal stays focused.

Self-check - Did you separate serving from offline indexing? - Did you mention top-k storage or another speed trick? - Did you say how ranking updates arrive? - Did you cover safe filtering?

Say this opening - Open with prefix lookup and latency target, so your first minute sounds controlled. - Then place the in-memory index and ranking logic, so your first minute sounds controlled. - Finish with refresh strategy and safety controls, so your first minute sounds controlled.

Run the choreography once in short form, then once with details. Stay aware of the stage and pause for questions.

Bridge. Search done. Now massive media — design YouTube. → 07