11. Search and vector stores: finding the right book, and the right idea

~15 min read. Exact words are easy; useful meaning is the harder hunt.

Built on the ELI5 in 00-eli5.md. The card catalog, one that understands meaning and not just keywords, now helps us retrieve text, intent, and neighbors.

1. Full-text search starts with a smarter lookup table

See. A normal database index is excellent for exact matches and sorted ranges. But search queries are messy, human, misspelled, and often incomplete. Users type "best noise cancelling headset for flights" and expect magic. That is where Elasticsearch-style full-text search becomes useful. It builds an inverted index, not a row-by-row scan path.

term ───────→ documents
flight ────→ doc7, doc10, doc44
noise ─────→ doc7, doc11, doc44
headset ───→ doc7, doc19

Look carefully. Instead of storing document-first pointers, it stores term-first pointers. That makes keyword lookup very fast at large scale. The card catalog is not listing shelves anymore. The card catalog is listing which documents contain each token.

Elasticsearch also analyzes text before indexing it. It lowercases words, removes punctuation, and may stem word endings. "Running" and "runs" can map closer to "run". That is why analyzers matter so much. Bad analyzers quietly create bad search quality.

Worked example now. Suppose we index three product titles.

Doc 1: "Wireless noise cancelling headphones"
Doc 2: "Gaming headset with wired microphone"
Doc 3: "Travel earbuds and compact charger"

A query for "noise headset" becomes two posting-list lookups. The engine merges matching documents and scores them. Doc 1 matches "noise". Doc 2 matches "headset". A combined score decides rank order. Simple, no?
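The worked example can be tried in a few lines. This is a minimal sketch, not how Elasticsearch is implemented: `analyze`, `build_inverted_index`, and `search` are hypothetical helper names, the analyzer only lowercases and strips punctuation (no stemming), and the "score" is just a count of matched query tokens.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: lowercase the text and keep only letter/digit runs."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """One posting-list lookup per query token, merged with OR semantics."""
    hits = defaultdict(int)
    for token in set(analyze(query)):
        for doc_id in index.get(token, ()):
            hits[doc_id] += 1
    # Most matched tokens first; ties broken by doc id for stable output.
    return sorted(hits.items(), key=lambda pair: (-pair[1], pair[0]))

docs = {
    1: "Wireless noise cancelling headphones",
    2: "Gaming headset with wired microphone",
    3: "Travel earbuds and compact charger",
}
index = build_inverted_index(docs)
results = search(index, "noise headset")
print(results)  # [(1, 1), (2, 1)]
```

Notice that Doc 1 says "headphones", not "headset", so it matches only "noise". Closing gaps like that is a job for analyzers, synonyms, or the vector search covered later.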

2. Relevance is not only matching; it is ranking the better answer

Now what is the next problem? Many documents match. Only a few deserve the first screen. Search engines therefore separate retrieval from ranking. Retrieval finds candidates quickly. Ranking decides which candidate is probably most useful. Classic ranking uses ideas like TF-IDF or BM25. Do not panic at the names. The intuition is simple. Words repeated meaningfully inside a document matter more. Words common across every document matter less. Rare, specific terms usually carry stronger signal.

query → tokenize → candidate docs → score → sort → return top K

Suppose a support portal has 50,000 help articles. The query is "reset iphone backup password". An article containing all four terms should rank highly. An article containing only "password" should rank much lower. BM25 helps create that sensible ordering. It rewards term presence, applies term-frequency saturation so that repeating a word gives diminishing returns, and normalizes for document length. Long articles do not win just by being long.

Elasticsearch also supports filters beside scoring. This is important in production systems. You may search only English content. Or only products currently in stock. Or only documents inside tenant A. Filters cut the candidate set without affecting text score. That keeps ranking cleaner and cheaper.

A practical rule helps here. Use the search engine for candidate retrieval and broad ranking. Use a business layer for strict policy rules afterward. For example, hide blocked sellers after search scoring completes. Do not pack every business rule into one giant query. That becomes brittle very quickly.
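To make the BM25 intuition concrete, here is a small sketch of the commonly used BM25 formula with a tenant filter applied beside it. The function name `bm25_rank`, the `tenant` field, the three sample articles, and the parameter values `k1=1.2` and `b=0.75` are assumptions for illustration; a real engine computes these statistics inside its index.

```python
import math
from collections import Counter

def tokenize(text):
    # Toy tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def bm25_rank(query, articles, tenant=None, k1=1.2, b=0.75, top_k=10):
    """Score articles with BM25, drop those outside the tenant, keep the top K."""
    docs = [tokenize(a["text"]) for a in articles]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    ranked = []
    for article, tokens in zip(articles, docs):
        # The filter removes candidates; it never changes a text score.
        if tenant is not None and article["tenant"] != tenant:
            continue
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a larger weight than common ones.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: long documents need more hits to score the same.
            length_norm = 1 - b + b * len(tokens) / avg_len
            # Saturation: each extra repeat of a term adds less than the one before.
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        ranked.append((article["id"], score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

articles = [
    {"id": "a1", "tenant": "A", "text": "how to reset your iphone backup password"},
    {"id": "a2", "tenant": "A", "text": "change your account password"},
    {"id": "a3", "tenant": "B", "text": "reset iphone backup password"},
]
everything = bm25_rank("reset iphone backup password", articles)
tenant_a = bm25_rank("reset iphone backup password", articles, tenant="A")
print([doc_id for doc_id, _ in everything])
print([doc_id for doc_id, _ in tenant_a])
```

The article that contains only "password" lands at the bottom in both runs, and the score of `a1` is the same with or without the tenant filter.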

3. Vector search asks for similar meaning, not shared words

Now let us move from keywords to meaning. A vector database stores embeddings, not only raw text fields. An embedding is a numeric representation of meaning or context. Similar ideas usually land closer inside vector space. That lets search work even when words differ.

"cheap hotel"      ─┐
"budget stay"      ─┼──→ nearby vectors
"affordable lodge" ─┘

See the shift. Keyword search asks, "Which documents contain these tokens?" Vector search asks, "Which vectors lie close to this query vector?" Closeness may be measured with cosine similarity, dot product, or Euclidean distance. Closer vectors usually mean semantically related content.

Worked example. Suppose a recipe app stores embeddings for 1 million recipes. A user asks, "easy high protein vegetarian dinner". Maybe no recipe contains that exact phrase. Still, the embedding may land near lentil bowls, paneer wraps, and tofu curries. That is the magic people notice first. It feels like the system understood intent.

But remember the cost. Brute-force nearest-neighbor search across millions of vectors is expensive, so vector search at scale is usually approximate. Systems therefore use ANN (approximate nearest neighbor) indexes like HNSW or IVF. They trade exactness for speed and memory efficiency. That is acceptable when recall stays high enough.

Where do tools fit? Pinecone is a managed vector service. Milvus is a dedicated open-source vector database. pgvector adds vector support directly inside PostgreSQL. Use pgvector when relational joins matter strongly. Use a dedicated store when vector scale and tuning dominate. There is no universal winner.
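Here is what "closest vectors" means in code. This sketch does the exact, brute-force search that ANN indexes approximate. The three-dimensional vectors are made up for illustration (real embedding models produce hundreds of dimensions), and `nearest` is a hypothetical helper name.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, vectors, k=2):
    """Brute force: compare the query against every stored vector."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Made-up embeddings, not the output of any real model.
recipes = {
    "lentil bowl": [0.9, 0.8, 0.1],
    "tofu curry": [0.8, 0.9, 0.2],
    "chocolate cake": [0.1, 0.2, 0.9],
}
# Pretend this is the embedding of "easy high protein vegetarian dinner".
query_vec = [0.85, 0.85, 0.1]
top = nearest(query_vec, recipes, k=2)
print([name for name, _ in top])  # ['lentil bowl', 'tofu curry']
```

Every query touches every vector here. That is fine for three recipes and painful for a million, which is exactly why HNSW and IVF exist.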

4. Hybrid search usually beats purity in real products

Students often ask, keyword or vector? Look. Real systems usually need both. Keyword search is precise for exact names, IDs, codes, and filters. Vector search is strong for fuzzy meaning and paraphrases. Hybrid search combines both signals. That is how search feels sharp and flexible together.

final score = 0.6 × keyword_score + 0.4 × vector_score

Suppose an ecommerce user searches "red running shoes nike". Keyword search is crucial for the brand word "nike". Vector search helps with related intent around running and shoe style. If you used only vectors, wrong brands may sneak upward. If you used only keywords, semantic near-matches may disappear. Hybrid search keeps both strengths.
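The weighted formula hides one practical detail: keyword scores (BM25-style) and vector scores (cosine-style) live on different scales, so they are usually normalized before mixing. The sketch below uses min-max normalization, which is one common choice and an assumption here, not something the formula dictates. The product ids and raw scores are invented.

```python
def min_max(scores):
    """Rescale scores to the 0..1 range so different scorers are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (value - lo) / (hi - lo) for doc_id, value in scores.items()}

def hybrid_rank(keyword_scores, vector_scores, w_keyword=0.6, w_vector=0.4):
    """final score = w_keyword * keyword_score + w_vector * vector_score."""
    kw = min_max(keyword_scores) if keyword_scores else {}
    vec = min_max(vector_scores) if vector_scores else {}
    combined = {}
    for doc_id in set(kw) | set(vec):
        # A document missing from one retriever contributes 0 for that signal.
        combined[doc_id] = w_keyword * kw.get(doc_id, 0.0) + w_vector * vec.get(doc_id, 0.0)
    return sorted(combined.items(), key=lambda pair: (-pair[1], pair[0]))

# Invented scores for the query "red running shoes nike".
keyword_scores = {"nike_pegasus": 9.1, "nike_tshirt": 4.0, "adidas_runner": 3.2}
vector_scores = {"nike_pegasus": 0.88, "adidas_runner": 0.91, "asics_trainer": 0.86}

ranked = hybrid_rank(keyword_scores, vector_scores)
print([doc_id for doc_id, _ in ranked])
```

Vector-only ranking would put `adidas_runner` first, and keyword-only ranking would put `nike_tshirt` second. The blend keeps the Nike running shoe on top and pushes the t-shirt down.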

RAG systems show the same lesson. A user asks, "How do I rotate API credentials safely?" Keyword search retrieves docs mentioning API, rotate, and credentials. Vector search retrieves docs phrased as key rotation or secret rollover. Hybrid retrieval usually gives better context chunks to the model. That improves answer quality noticeably.

One caution. Embeddings are not truth. They compress meaning imperfectly. Domain-specific jargon, multilingual content, and fresh terminology can drift. So evaluate with your own queries. Do not trust leaderboard marketing blindly. The card catalog got smarter. It did not become magical.
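"Evaluate with your own queries" can start very small. A minimal sketch, assuming you have hand-labeled which documents are relevant for a few real queries: compute recall@k for each retriever and compare. The labeled runs below are invented, and `recall_at_k` assumes each result list has no duplicate ids.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / len(relevant)

def mean_recall_at_k(runs, k):
    """Average recall@k over (retrieved list, relevant set) pairs."""
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)

# Invented labels: one (ranked results, relevant ids) pair per test query.
labeled_runs = [
    (["d3", "d7", "d1", "d9"], {"d3", "d1"}),  # both relevant docs in the top 3
    (["d4", "d2", "d8", "d5"], {"d5", "d2"}),  # only one relevant doc in the top 3
]
score = mean_recall_at_k(labeled_runs, k=3)
print(score)  # 0.75
```

Run the same labeled queries through keyword-only, vector-only, and hybrid retrieval. The numbers from your own domain matter more than any public leaderboard.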

5. Designing the full search path means balancing quality, cost, and freshness

Now let us design a realistic flow. Suppose we run a marketplace with 20 million listings. Each listing has title, description, tags, price, and seller quality. We want typeahead, full-text search, and semantic retrieval. A sensible pipeline looks like this.

write listing
   │
   ├─→ Postgres row
   ├─→ Elasticsearch document
   └─→ embedding job ─→ vector store
query
   ├─→ keyword retrieval
   ├─→ vector retrieval
   └─→ merge + business rerank

Why separate writes this way? Because source-of-truth rows, text indexes, and vectors evolve differently. Postgres stores transactional truth. Elasticsearch stores analyzed search documents. The vector store stores embedding-friendly representations. Keeping responsibilities clear makes failures easier to reason about.
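The write side of the diagram can be sketched with in-memory stand-ins. Everything here is hypothetical: `ListingStores` is not a real client, the dictionaries only imitate Postgres, Elasticsearch, and a vector store, and `fake_embed` is a placeholder for a real embedding model. The point is the shape: two synchronous writes plus a queued embedding job.

```python
from collections import deque

class ListingStores:
    """In-memory stand-ins for the three stores in the diagram."""

    def __init__(self):
        self.rows = {}          # stands in for Postgres (source of truth)
        self.search_docs = {}   # stands in for Elasticsearch documents
        self.vectors = {}       # stands in for the vector store
        self.embedding_queue = deque()

    def write_listing(self, listing_id, title, description):
        """Write the row and the search document now; embed later."""
        self.rows[listing_id] = {"title": title, "description": description}
        self.search_docs[listing_id] = f"{title} {description}".lower()
        self.embedding_queue.append(listing_id)

    def run_embedding_job(self, embed):
        """Drain the queue, embedding each listing from its source-of-truth row."""
        processed = 0
        while self.embedding_queue:
            listing_id = self.embedding_queue.popleft()
            row = self.rows.get(listing_id)
            if row is None:
                continue  # listing was deleted before the job ran
            self.vectors[listing_id] = embed(f"{row['title']} {row['description']}")
            processed += 1
        return processed

def fake_embed(text):
    # Placeholder only: a real system would call an embedding model here.
    return [float(len(text)), float(text.count(" "))]

stores = ListingStores()
stores.write_listing("L1", "Trail running shoes", "Lightweight, size 42")
searchable_before_job = "L1" in stores.search_docs   # True
semantic_before_job = "L1" in stores.vectors         # False until the job runs
stores.run_embedding_job(fake_embed)
semantic_after_job = "L1" in stores.vectors          # True
```

Because the embedding step runs as a separate job, a failure there leaves the row and the keyword document intact. It also means a listing can be keyword-searchable before it is semantically searchable.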

Worked sizing example. Assume each query fetches the top 200 keyword candidates and the top 100 vector neighbors. The reranker merges at most 300 candidates (fewer if the two lists overlap). If reranking one candidate costs 0.08 milliseconds, then reranking cost is 300 × 0.08 = 24 milliseconds. If keyword retrieval takes 18 milliseconds and vector retrieval takes 22 milliseconds, running them one after the other takes 18 + 22 = 40 milliseconds, so total search backend time is roughly 40 + 24 = 64 milliseconds. Running the two retrievals in parallel costs only the slower one, 22 milliseconds, for roughly 22 + 24 = 46 milliseconds in total. Either figure is acceptable for many interactive search pages.
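The budget is easy to recompute from its inputs, and doing so makes the sequential-versus-parallel choice explicit. This is a back-of-the-envelope sketch with hypothetical function names. It ignores network hops, merge overhead, and tail latency.

```python
def rerank_ms(candidates, per_candidate_ms):
    """Total rerank cost for a candidate set."""
    return candidates * per_candidate_ms

def backend_latency_ms(keyword_ms, vector_ms, rerank, parallel):
    """Retrieval time plus rerank time.

    Sequential retrieval adds the two calls; parallel retrieval waits
    only for the slower one.
    """
    retrieval = max(keyword_ms, vector_ms) if parallel else keyword_ms + vector_ms
    return retrieval + rerank

rerank = rerank_ms(200 + 100, 0.08)   # up to 300 merged candidates
sequential = backend_latency_ms(18, 22, rerank, parallel=False)
parallel = backend_latency_ms(18, 22, rerank, parallel=True)
print(round(rerank), round(sequential), round(parallel))  # 24 64 46
```

With the inputs given, sequential retrieval lands at about 64 milliseconds and parallel retrieval at about 46 milliseconds. Rerank is the largest single term in the parallel case, so shrinking the candidate set is the first lever to pull.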

Freshness matters too. If embeddings are regenerated every six hours, new listings may appear in keyword search up to six hours before they appear in semantic search. That is normal. Just document the lag clearly. For sensitive domains, build near-real-time embedding pipelines. For lower-stakes domains, cheaper batch updates may be enough. Choose by business pain, not fashion.

Where this lives in the wild

LinkedIn search relevance engineer combines keyword fields and embedding similarity for people, jobs, and post discovery.
Spotify recommendation infrastructure engineer uses vector neighbors for song similarity, then applies metadata filters and reranking.
Amazon catalog search engineer relies on Elasticsearch-style retrieval for exact terms, brand filters, and faceting.
Notion AI retrieval engineer blends semantic chunk retrieval with keyword constraints for workspace documents.
ML engineer at a SaaS company using Pinecone stores embeddings externally while keeping source records in Postgres.

Pause and recall

Why is an inverted index faster than scanning every document row?
Why does BM25 help ranking even when many documents match?
When is pgvector a better choice than a separate vector database?
Why does hybrid search usually outperform keyword-only or vector-only retrieval?

Interview Q&A

Q: Why choose Elasticsearch and not a plain relational index for product search?

A: Product search needs tokenization, scoring, faceting, and typo-tolerant retrieval. Relational indexes shine for exact predicates, but full-text ranking and analyzers are much stronger in a search engine.

Common wrong answer to avoid: "Because SQL databases cannot search text" — they can, but the issue is search quality, ranking control, and scale ergonomics.

Q: Why choose pgvector and not Pinecone for some retrieval systems?

A: If the dataset is modest and joins with relational data matter strongly, keeping vectors inside Postgres reduces moving parts and avoids drift between the rows and their vectors, because both can change in the same transaction.

Common wrong answer to avoid: "Because dedicated vector databases are always overkill" — they become useful when vector scale, recall tuning, and operational separation matter.

Q: Why use hybrid retrieval and not only embeddings for RAG?

A: Embeddings capture meaning well, but exact keywords, product codes, legal clauses, and names still need lexical precision. Hybrid retrieval covers both intent and exactness.

Common wrong answer to avoid: "Because embeddings are inaccurate" — they are useful, but incomplete alone for many production queries.

Q: Why separate retrieval from reranking in search design?

A: Retrieval must stay fast across huge corpora. Reranking can then spend extra compute on a small candidate set and improve final relevance.

Common wrong answer to avoid: "Because reranking is just optional polishing" — it often creates the visible quality jump users actually feel.

Apply now (5 min)

Exercise: Take a learning app with 100,000 lessons. Design one keyword field set, one embedding field, and two metadata filters. Then write one query where keyword search wins, one where vector search wins, and one where hybrid wins.

Sketch from memory: draw the inverted-index map, the vector-neighbor picture, and the hybrid merge pipeline with reranking.

Bridge. Search quality is useless if every query opens a fresh database pipe. Next, we learn why connections themselves become a bottleneck. → 12-connection-pooling.md