10. Hybrid Graph+Vector Retrieval — The best of both maps¶
~14 min read. Graph traversal gives you exact relations. Vector search gives you fuzzy coverage. Combining them gives you production-grade retrieval.
Continues from the first-principles overview in 00-first-principles.md. The graph query engine follows explicit relationships for known relations. The graph embedding finds similar entities when the exact name is unknown. Hybrid uses both in the same pipeline.
1) When each approach fails alone¶
Graph-only failure case: Query: "Tell me about generative AI regulation." No entity named "generative AI regulation" exists in the knowledge graph as a single node. The graph query engine can't find a starting entity. Result: no retrieval.
Vector-only failure case: Query: "Who is the CEO of the company that owns Instagram?" Top-1 vector result: a Forbes article about Instagram history — cosine similarity 0.87. That article mentions Zuckerberg briefly but not as "CEO of Meta." Result: wrong or incomplete answer (as we saw in file 09).
Hybrid solution: - Use vector search to find relevant entities when entity linking fails. - Use graph traversal to follow relationships once the starting entity is found. - Merge results before LLM generation.
2) Three hybrid fusion strategies¶
Strategy A: Graph-first, vector fallback
┌─────────────────────────────────────────────────┐
│ 1. Try entity linking → graph traversal │
│ 2. If traversal returns < threshold, fallback │
│ to vector search │
│ 3. Merge and deduplicate │
└─────────────────────────────────────────────────┘
Strategy B: Vector-first, graph expansion
┌─────────────────────────────────────────────────┐
│ 1. Run ANN vector search → top-k passages │
│ 2. Extract entities from those passages │
│ 3. Expand entities via graph traversal │
│ 4. Merge expanded context with original chunks │
└─────────────────────────────────────────────────┘
Strategy C: Parallel retrieval, reranking
┌─────────────┐ ┌──────────────┐
│ Graph │ │ Vector │
│ traversal │ │ ANN search │
└──────┬──────┘ └──────┬───────┘
│ │
└──────────┬───────────┘
▼
┌───────────────┐
│ Reranker │ cross-encode all candidates
│ (cross-enc.) │ → unified top-k
└───────────────┘
3) Worked numerical example: strategy A failure rescued¶
Query: "What are Meta's revenue streams?"
Graph traversal attempt: - Entity link "Meta" → Meta Platforms node (confidence 0.93). ✓ - Traverse REVENUE_SOURCE edges → found: advertising, WhatsApp Business, Oculus. - Scores: 3 retrieved facts.
Vector search (parallel): - Top-1: "Meta announced Reality Labs segment revenue of $1.1B in Q4 2023" — score 0.89. - Top-2: "Advertising remains 97% of Meta's total revenue" — score 0.87. - Top-3: "Meta's family apps generate ad revenue across Facebook, Instagram, WhatsApp" — score 0.84.
Graph-only result: 3 typed facts, no numbers. Vector-only result: numbers but no structured breakdown.
Hybrid result: graph facts + vector passages = structured breakdown with supporting numbers. The LLM generates a richer answer than either alone.
graph facts + vector passages = hybrid answer
─────────────────────────────────────────────────────────────────
REVENUE_SOURCE: ads "97% from ads" "97% of revenue
REVENUE_SOURCE: Oculus "Reality Labs $1.1B" from advertising;
REVENUE_SOURCE: WhatsApp "Family apps ads" Reality Labs..."
4) HippoRAG: a principled hybrid design¶
HippoRAG (2024) is a Graph RAG system that explicitly combines: - Hippocampal index: vector store for passage retrieval. - Association graph: entity co-occurrence graph from extracted triples. - Personalized PageRank: spreads retrieval signal from seed entities through the graph.
query
│
├──▶ vector search → seed passages → seed entities
│
└──▶ PPR over entity graph → expand to related entities
│
▼
retrieve passages anchored at expanded entities
│
▼
LLM generation
The graph query engine here is the Personalized PageRank. The graph embedding provides the seed entities. Together they outperform pure vector and pure graph RAG on multi-hop benchmarks.
5) Latency and cost tradeoffs¶
┌─────────────────────────┬──────────────┬────────────────┐
│ Strategy │ Latency │ Recall gain │
├─────────────────────────┼──────────────┼────────────────┤
│ Vector-only │ ~20 ms │ baseline │
│ Graph-only │ ~15 ms │ +20% multi-hop│
│ Graph-first + fallback │ ~25 ms │ +25% overall │
│ Parallel + reranker │ ~80 ms │ +35% overall │
└─────────────────────────┴──────────────┴────────────────┘
The parallel + reranker strategy has best recall but 4× the latency. For real-time applications, graph-first + fallback is often the sweet spot.
Where this lives in the wild¶
- Neo4j v5 hybrid index — built-in vector + graph index lets engineers run Cypher with both ANN and traversal in one query; used by enterprise search teams.
- Microsoft GraphRAG + Azure AI Search — local subgraph traversal combined with Azure Cognitive Search vector index for enterprise document Q&A at Fortune 500 firms.
- Weaviate's hybrid search — property graph + dense vector search in one system; used by e-commerce teams for product discovery combining category graph + description similarity.
- LangChain GraphRAG templates — Vector store + Neo4j traversal combined in standard chains; used by RAG engineers building domain-specific knowledge assistants.
- Cohere + graph hybrid (Cohere Compass) — structured entity graph enriched with dense vector passage search for compliance and regulatory document analysis.
Pause and recall¶
- Why does graph-only retrieval fail on broad thematic queries?
- In Strategy B (vector-first), what happens after vector search that graph-only skips?
- In the Meta example, what did graph provide that vector missed, and vice versa?
- Why does the parallel + reranker strategy have 4× higher latency?
Interview Q&A¶
Q: Why not just use a larger vector store instead of adding a graph? A: Vector stores can't encode explicit typed relations between entities. Two passages may be semantically similar but contain opposite relational facts. The graph encodes direction and relation type — "A is CEO of B" vs "B employs A" — which vector similarity flattens.
Common wrong answer to avoid: "Graphs are always more accurate" — for fuzzy semantic queries, vector search has higher recall; graph excels at structural relational accuracy.
Q: Why does Personalized PageRank work better than simple neighbour expansion for hybrid retrieval? A: PPR assigns relevance scores that decay with graph distance from the seed entities, weighted by graph structure. Simple N-hop expansion treats all reachable entities equally, pulling in irrelevant nodes from densely-connected multi-hop junctions.
Common wrong answer to avoid: "PPR is just BFS with a decay factor" — PPR runs as a random walk convergence, not a BFS; the convergence behaviour is qualitatively different.
Q: Why is the graph-first + fallback strategy preferred over parallel retrieval in low-latency pipelines? A: Parallel retrieval doubles the upstream calls and requires a reranker — adding 40-60 ms. Graph-first executes one path and only falls back when needed. For most queries that have clear entity mentions, graph-first succeeds without paying the fallback cost.
Common wrong answer to avoid: "Parallel is always better because it gets more data" — more data increases reranker cost and prompt length; quality is not monotone in retrieved volume.
Q: Why does the hybrid system still need entity linking even with vector search available? A: Vector search finds semantically similar passages but doesn't identify which specific entity on the knowledge graph to start traversal from. Entity linking gives the graph an exact node ID. Without it, traversal has no starting point.
Common wrong answer to avoid: "Vector search replaces entity linking" — they solve different problems: fuzzy similarity vs. exact entity resolution.
Apply now (5 min)¶
Exercise. Take a query that would fail on pure vector and pure graph separately. Design a Strategy B (vector-first + graph expansion) pipeline for it. Identify where the graph embedding hands off to the graph query engine.
Sketch from memory. Draw the Strategy C parallel pipeline with the reranker. Label each component and its output type. Mark which path takes the graph embedding and which follows the relationships.
Bridge. The hybrid system retrieves well at query time. But the knowledge graph is not static — entities change, new ones appear, and old facts go stale. That's the graph maintenance problem. → 11-graph-maintenance.md