01. Flat Retrieval Failure: Why cosine similarity can't follow the map

~14 min read. The failure mode that shows why retrieval needs structure, not just similarity.

Continues from the first-principles overview in 00-first-principles.md. The knowledge graph, the full picture of named facts and the connections between them, is exactly the structure that flat vector search discards the moment it compresses a passage into a single embedding.

1) The mental model first

Imagine you need to travel from station A to station D. You have never seen the network map. You only memorised station names and rough descriptions of their neighbourhoods. You can tell which stations sound alike, but not which lines connect them.

Flat retrieval is that situation. It returns the passage whose vector is closest to your query vector. But "closest" in embedding space means "similar-sounding topic." It does not mean "the answer is in this passage."

For multi-hop questions, the answer lives not in one passage but in the path between passages. Vector search has no path. It has no relationship to follow and no multi-hop junction to navigate.
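Here is a minimal Python sketch of flat top-1 retrieval as described above. The three passages and their 3-dimensional vectors are hypothetical, hand-made stand-ins for a real embedding model's output. Only the scoring rule reflects the text: cosine similarity, then take the single best match.

```python
import math

# Hypothetical toy "embeddings": hand-made 3-d vectors standing in for a real
# embedding model. Dimensions loosely mean (nba, ownership, executive).
PASSAGES = {
    "NBA 2022 season champions": [0.9, 0.1, 0.0],
    "Golden State Warriors overview": [0.6, 0.4, 0.1],
    "Joe Lacob as majority owner": [0.2, 0.8, 0.3],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def flat_top1(query_vec: list[float]) -> tuple[str, float]:
    """Return the single passage whose vector is closest to the query vector."""
    scored = [(title, cosine(query_vec, vec)) for title, vec in PASSAGES.items()]
    return max(scored, key=lambda pair: pair[1])


# A query dominated by "NBA" vocabulary lands on the topically closest passage,
# whether or not that passage contains the answer.
query = [0.9, 0.2, 0.1]
print(flat_top1(query))
```

Nothing in `flat_top1` knows that one passage leads to another. It ranks each passage in isolation.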

2) Concrete failure: a three-hop question

Question: "Who is the CEO of the company that owns the team that won the 2022 NBA championship?"

Required hops:

┌──────────────────┐  won_by   ┌──────────────────┐
│  2022 NBA champ  │──────────▶│  Golden State    │
└──────────────────┘           │  Warriors        │
                               └────────┬─────────┘
                                        │ owned_by
                                        ▼
                               ┌──────────────────┐
                               │  Lacob/Guber     │
                               │  (owners)        │
                               └────────┬─────────┘
                                        │ has_ceo
                                        ▼
                               ┌──────────────────┐
                               │  CEO (answer)    │
                               └──────────────────┘

Flat retrieval fires a query embedding against a corpus and surfaces the passage with the highest cosine similarity to the question. Say it returns a passage about "NBA 2022 season champions" with score 0.91. That passage mentions the Warriors but not the ownership, let alone a CEO. So the system answers "Golden State Warriors." That is wrong: it is only the result of hop 1.

3) Why cosine similarity scores mislead

Attempt 1. Query: "CEO of Warriors owner company." Top result: the Wikipedia intro of the Golden State Warriors, score 0.88. No CEO mentioned.

Attempt 2. Add context: "ownership structure Warriors." Top result: an article about Joe Lacob as majority owner, score 0.84. Still no CEO of an owning company.

Attempt 3. Extend the query to cover Lacob's other ventures. The passage holding the answer finally appears, but at score 0.79, ranked below three wrong passages. The answer is buried.

The problem is not the embedding model. The problem is that the answer requires combining three disconnected facts. A graph query engine traversing the knowledge graph would chain the hops directly, with one edge lookup per hop. Flat search has no chain, only a similarity score on each isolated passage.
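For contrast, here is a minimal sketch of hop-by-hop traversal over a hypothetical triple store. It assumes the relation sequence for the question (won_by, owned_by, has_ceo) is already known. The triple list, the `has_ceo` relation, the `<CEO placeholder>` entity and the helpers `build_index` and `follow_path` are all illustrative assumptions, not data or an API from any real graph system.

```python
# Hypothetical triples for the example question. The hop-3 target is a
# placeholder: the text never names the answer, so neither does this sketch.
Triple = tuple[str, str, str]

TRIPLES: list[Triple] = [
    ("2022 NBA champ", "won_by", "Golden State Warriors"),
    ("Golden State Warriors", "owned_by", "Lacob/Guber (owners)"),
    ("Lacob/Guber (owners)", "has_ceo", "<CEO placeholder>"),
    ("Golden State Warriors", "plays_in", "NBA"),  # an edge the path ignores
]


def build_index(triples: list[Triple]) -> dict[tuple[str, str], list[str]]:
    """Index edges by (subject, relation) so each hop is one dictionary lookup."""
    index: dict[tuple[str, str], list[str]] = {}
    for subj, rel, obj in triples:
        index.setdefault((subj, rel), []).append(obj)
    return index


def follow_path(index: dict[tuple[str, str], list[str]], start: str,
                relations: list[str]) -> list[Triple]:
    """Follow a fixed sequence of typed relations from `start`; return the chain of facts."""
    chain: list[Triple] = []
    current = start
    for rel in relations:
        targets = index.get((current, rel), [])
        if not targets:
            # Missing edge: report it rather than guessing a neighbour.
            raise KeyError(f"no {rel!r} edge from {current!r}")
        nxt = targets[0]  # simplifying assumption: one target per hop
        chain.append((current, rel, nxt))
        current = nxt
    return chain


index = build_index(TRIPLES)
for fact in follow_path(index, "2022 NBA champ", ["won_by", "owned_by", "has_ceo"]):
    print(fact)
```

The output is a chain of facts, each linked to the next, rather than a ranked list of unrelated passages.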

flat search                    graph search
──────────────────────────     ──────────────────────────
query embedding                query entities extracted
       │                              │
       ▼                              ▼
nearest passages               graph traversal (3 hops)
[score 0.91]                   station → line → station
[score 0.88]                   station → line → station
[score 0.84]                   station → line → station
       │                              │
       ▼                              ▼
top-k chunks returned          subgraph returned
no path, no chain              structured chain of facts

4) Worked numerical example: precision drop by hop depth

Suppose we measure top-1 precision on a test set of questions, grouped by hop depth:

Hop depth   Flat retrieval top-1 precision
─────────   ──────────────────────────────
1-hop       0.82
2-hop       0.61
3-hop       0.34
4-hop       0.18

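Precision falls with every added hop. Going from 1 to 2 hops costs roughly a quarter of the precision, and each further hop nearly halves it.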
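Here is a quick arithmetic check of that claim, using the numbers from the table above. The `hop_ratios` helper is just an illustrative name.

```python
# Top-1 precision by hop depth, copied from the table above.
PRECISION = {1: 0.82, 2: 0.61, 3: 0.34, 4: 0.18}


def hop_ratios(precision: dict[int, float]) -> dict[int, float]:
    """Fraction of the previous depth's precision that survives each added hop."""
    return {h: precision[h] / precision[h - 1]
            for h in sorted(precision) if h - 1 in precision}


for hops, kept in hop_ratios(PRECISION).items():
    print(f"{hops - 1}-hop -> {hops}-hop: keeps {kept:.0%}, loses {1 - kept:.0%}")
```

On these numbers the first added hop keeps about 74% of the previous precision. The third and fourth hops keep only about 56% and 53%.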

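With a knowledge graph, precision can stay close to the 1-hop level of 0.82 at every depth. This holds provided the starting entity is identified correctly and the required edges exist in the graph. The reason is that the graph query engine follows explicit relationships instead of guessing again at each hop.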

5) What flat retrieval is still good for

Flat retrieval shines when:
- The answer is contained in a single passage.
- The query is fuzzy or approximate ("tell me about company X").
- You need broad candidate generation before a more precise step.

So what should you do? Use flat retrieval to find the first entity, then use graph traversal to follow relationships from there. That hybrid approach keeps the strengths of both.
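Here is a minimal sketch of that hybrid, assuming toy data throughout. The entity vectors, edges and the `hybrid_answer` function are hypothetical. The relation sequence is passed in by hand. In a real system something like a query parser or an LLM would have to produce it, and this sketch does not attempt that step.

```python
import math

# Hypothetical toy data: 3-d vectors stand in for a real embedding model,
# and the edge dictionary stands in for a real knowledge graph.
ENTITY_VECS = {
    "2022 NBA champ": [0.9, 0.1, 0.0],
    "Golden State Warriors": [0.6, 0.4, 0.1],
    "Lacob/Guber (owners)": [0.2, 0.8, 0.3],
}
EDGES = {
    ("2022 NBA champ", "won_by"): "Golden State Warriors",
    ("Golden State Warriors", "owned_by"): "Lacob/Guber (owners)",
    ("Lacob/Guber (owners)", "has_ceo"): "<CEO placeholder>",
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def hybrid_answer(query_vec: list[float], relations: list[str]):
    """Flat retrieval picks the seed entity; graph traversal follows the hops."""
    # Step 1 (flat retrieval): fuzzy-match the query to the closest entity.
    seed = max(ENTITY_VECS, key=lambda e: cosine(query_vec, ENTITY_VECS[e]))
    # Step 2 (graph traversal): follow explicit typed edges from the seed.
    chain, current = [], seed
    for rel in relations:
        nxt = EDGES.get((current, rel))
        if nxt is None:
            break  # no such edge in the graph: stop instead of guessing
        chain.append((current, rel, nxt))
        current = nxt
    return seed, chain


print(hybrid_answer([0.9, 0.2, 0.1], ["won_by", "owned_by", "has_ceo"]))
```

The vector step only has to find a good starting point, which is the kind of fuzzy matching it handles well. Every hop after that is an exact edge lookup.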

Where this lives in the wild

Google's KGQA system — graph-structured traversal handles multi-hop questions about entity relationships that embedding search misses.
Microsoft GraphRAG — splits retrieval into local subgraph search and global community summaries, not a single embedding vector.
LinkedIn's entity graph — job title + company + skills multi-hop queries need graph paths, not nearest-neighbour on passage text.
Salesforce Einstein — CRM relationships (Account → Contact → Opportunity → Owner) require hop-aware retrieval, not cosine similarity.
AWS Neptune + Amazon Q — enterprise chatbots combine graph traversal for structured company data with vector search for unstructured documents.

Pause and recall

At 3-hop depth, what does the flat retrieval precision number tell you about why vector search fails?
Why does adding more context to the query not solve the multi-hop problem?
What does the graph query engine do that a nearest-neighbour index cannot?
Name one scenario where flat retrieval is still the right tool.

Interview Q&A

Q: Why not just embed the entire knowledge base into one huge vector and query that? A: A single vector compresses all facts into one point — fine-grained relational structure is destroyed. Two entities connected by five different typed relationships become indistinguishable from entities that merely share vocabulary.

Common wrong answer to avoid: "Storage is the problem" — the issue is representational loss of structure, not disk space.
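To make "compresses all facts into one point" concrete, here is a toy illustration. It assumes mean pooling over hypothetical 2-dimensional word vectors. Real sentence encoders are not pure bag-of-words averages, so this shows the failure in its starkest form rather than how any particular model behaves.

```python
# Hypothetical 2-d word vectors; illustration of mean pooling specifically.
WORD_VECS = {"A": [1.0, 0.0], "B": [0.0, 1.0], "owns": [0.5, 0.5]}


def mean_pool(words: list[str]) -> list[float]:
    """Average the word vectors into a single vector, as mean pooling does."""
    dims = len(next(iter(WORD_VECS.values())))
    return [sum(WORD_VECS[w][d] for w in words) / len(words) for d in range(dims)]


fact_1 = mean_pool(["A", "owns", "B"])
fact_2 = mean_pool(["B", "owns", "A"])
print(fact_1 == fact_2)  # True: the direction of "owns" has been averaged away
```

"A owns B" and "B owns A" collapse to the same point, so no query against that vector can recover who owns whom.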

Q: Why does precision drop with each hop for flat retrieval? A: Each hop requires a separate fact. Flat search must retrieve each of those facts and hope they all rank in the top-k. If the hops succeed independently, the probability that all n facts rank correctly is the product P(hop 1) × P(hop 2) × … × P(hop n), so it collapses fast.

Common wrong answer to avoid: "Because longer documents are harder" — the issue is multi-fact composition, not document length.
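Here is the multiplicative collapse in a few lines. It assumes, hypothetically, that every hop's fact is retrieved independently with the same probability of 0.8. The `chain_success` helper is an illustrative name.

```python
import math


def chain_success(per_hop: list[float]) -> float:
    """P(every hop's fact is retrieved), assuming hops succeed independently."""
    return math.prod(per_hop)


# Hypothetical per-hop retrieval probability of 0.8 at every hop.
for n in range(1, 5):
    print(n, round(chain_success([0.8] * n), 3))
```

At 0.8 per hop, four hops leave 0.8^4 ≈ 0.41. Even a retriever that is right most of the time on single facts becomes unreliable on chains.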

Q: Why not just rerank the flat retrieval results with a cross-encoder? A: A cross-encoder scores query–passage relevance one passage at a time. It still cannot combine three passages into a reasoning chain, and it cannot promote a passage that never made the candidate list. Reranking reorders the list; it does not traverse the knowledge graph.

Common wrong answer to avoid: "Reranking fixes all retrieval problems" — it improves top-k ordering but cannot synthesise multi-hop paths.

Q: Why is LLM-based chain-of-thought not sufficient for multi-hop without a graph? A: LLMs can hallucinate intermediate hops when the supporting facts are absent from context. Without the knowledge graph to supply those facts, the model may invent relationships that do not exist.

Common wrong answer to avoid: "LLMs are good at filling gaps" — factual gap-filling is exactly where hallucination risk is highest.

Apply now (5 min)

Exercise. Write a three-hop question about a company in your industry. Map out the required entities and the relationships between them. Then imagine what top-1 vector search would return and why it fails.

Sketch from memory. Draw the three-hop route on a blank page. Mark the multi-hop junctions where the reasoning must jump. Show what flat retrieval returns (one isolated chunk) versus what the graph query engine returns (the full chain).

Bridge. Flat retrieval fails on multi-hop questions because it has no graph to traverse. Before building Graph RAG, we need to understand the data model that makes a graph a graph. → 02-graph-data-model.md