09. Multi-hop Reasoning — Planning a route through multi-hop junctions¶

~15 min read. How graph traversal answers questions that require chaining multiple facts in sequence.

Continues from the first-principles overview in 00-first-principles.md. Multi-hop reasoning is the graph query engine computing a path that passes through several multi-hop junctions on the knowledge graph before arriving at the answer entity.

1) The picture: chaining hops is not stacking retrieval¶

Wrong mental model: "multi-hop = retrieve three passages and concatenate them." Right mental model: each hop is a directed step from one entity to another along a relationship. The path must be valid end-to-end.

Question: "What is the nationality of the CEO of the company that owns Instagram?"

Hop 1: Instagram ──[OWNED_BY]──▶ Meta Platforms
Hop 2: Meta Platforms ──[CEO_IS]──▶ Mark Zuckerberg
Hop 3: Mark Zuckerberg ──[NATIONALITY]──▶ American

answer: American

Each hop follows one relationship. Each intermediate entity (Meta, Zuckerberg) is a multi-hop junction. Get any hop wrong and the answer is wrong.

2) Why LLMs fail at multi-hop without graph support¶

LLMs do multi-hop reasoning by generating intermediate steps from parametric memory. This fails because: 1. The model may not have the intermediate fact in its weights. 2. Even if it does, hallucination at any step cascades. 3. No provenance — you can't check which hop introduced an error.

Experiment: ask GPT-4 "Who is the nationality of the CEO of the company that owns Instagram?" Step A — LLM correctly says "Meta owns Instagram." ✓ Step B — LLM says "Sheryl Sandberg is CEO." ✗ (hallucination) Step C — LLM says "American." (correct answer to wrong question)

The final answer looks right but the reasoning chain is broken. The graph query engine on the knowledge graph would have given the correct chain with provenance.

3) Beam search over graph paths¶

For complex queries, the graph query engine can't always commit to one path. Beam search explores the top-K most promising paths in parallel.

Start: Instagram node
─────────────────────────────
Beam step 1 (K=3 paths):
  path A: Instagram ──[OWNED_BY]──▶ Meta         score 0.95
  path B: Instagram ──[FOUNDED_BY]──▶ Systrom    score 0.72
  path C: Instagram ──[PART_OF]──▶ Facebook Inc  score 0.68

Beam step 2 (expand top-K):
  path A → Meta ──[CEO_IS]──▶ Zuckerberg         score 0.91
  path A → Meta ──[HQ_IN]──▶ Menlo Park          score 0.60
  path B → Systrom ──[NATIONALITY]──▶ American   score 0.61

Beam step 3:
  path A → Zuckerberg ──[NATIONALITY]──▶ American  score 0.94
  path B → Systrom ──[NATIONALITY]──▶ American     score 0.59

At each step, score = relation relevance × entity confidence × path coherence. Final answer comes from the highest-scoring complete path.

4) Worked numerical example: path scoring¶

Score each hop using: score(step) = P(rel | query_context) × confidence(entity_link)

Query context vector q. Relation embedding r_i. Relation relevance: cosine(q, r_i).

Step          rel_relevance   entity_conf   step_score
─────────────────────────────────────────────────────
Hop 1 (OWNED_BY)   0.91          0.97          0.883
Hop 2 (CEO_IS)     0.88          0.95          0.836
Hop 3 (NATIONALITY)0.85          0.99          0.842

Path total score = product of step scores: 0.883 × 0.836 × 0.842 = 0.621.

Now compare to a wrong path:

Step          rel_relevance   entity_conf   step_score
─────────────────────────────────────────────────────
Hop 1 (OWNED_BY)   0.91          0.97          0.883
Hop 2 (CEO_IS)     0.88          0.61          0.537  ← wrong entity
Hop 3 (NATIONALITY)0.85          0.99          0.842
path score = 0.883 × 0.537 × 0.842 = 0.399

The correct path scores 0.621 vs 0.399 for the wrong path. The graph query engine selects the correct path.

5) Iterative retrieval: interleaving LLM and graph¶

A simpler alternative to beam search: iterative retrieval. The LLM generates the next-hop entity to look up; the graph retrieves it.

┌────────────────────────────────────────────────────┐
│  Round 1:  LLM says "look up: company owning Insta"│
│            Graph returns: Meta Platforms           │
│                                                    │
│  Round 2:  LLM says "look up: CEO of Meta"         │
│            Graph returns: Mark Zuckerberg          │
│                                                    │
│  Round 3:  LLM says "look up: nationality of Mark" │
│            Graph returns: American                  │
│                                                    │
│  LLM generates final answer from collected facts.  │
└────────────────────────────────────────────────────┘

The LLM directs the graph query engine, and the graph query engine grounds each step. This avoids hallucination at each hop because the knowledge graph supplies the fact.

Where this lives in the wild¶

Google's KGQA — multi-hop query decomposition over Freebase powers complex entity questions in Google Search without LLM hallucination of intermediate facts.
IBM Watson Discovery + Knowledge Graph — multi-hop reasoning over biomedical graphs answers drug interaction queries by chaining gene-protein-drug relationships.
Amazon Alexa — skill graph traversal chains location-service-availability hops to answer "Is there a Whole Foods near the hotel I'm staying at?"
Diffbot's Fact-Check API — multi-hop traversal verifies claims by chaining entity-relation paths and reporting which step in the chain fails verification.
Palantir Foundry — operational graph reasoning for defence analysts chains organisation-location-event paths across intelligence data sources.

Pause and recall¶

Why does a hallucination at Hop 2 produce a wrong answer even when Hop 1 and Hop 3 are correct?
In beam search, what three factors combine into a step score?
What is the key difference between beam search multi-hop and iterative retrieval?
Why does the correct path score 0.621 vs 0.399 for the wrong path in the example?

Interview Q&A¶

Q: Why can't a well-prompted LLM replace explicit multi-hop graph traversal? A: LLMs lack access to real-time or proprietary entity facts. They hallucinate intermediate entities, especially for less-common entities. Graph traversal provides provenance at each hop — you can audit exactly which relationship was followed.

Common wrong answer to avoid: "LLMs are not smart enough" — capability is not the issue; factual grounding and auditability are.

Q: Why use beam search instead of greedy (top-1) path selection? A: A greedy choice at Hop 1 may commit to a path where later hops fail. Beam search keeps K candidate paths alive and can recover from a suboptimal early hop. This matters especially when early relation scores are close to each other.

Common wrong answer to avoid: "Beam search is always better than greedy" — it uses K× more compute; greedy is fine when the first hop has a clear best relation.

Q: Why is path scoring multiplicative rather than additive? A: We need the entire path to be valid end-to-end. Any single weak step invalidates the chain. Multiplication penalises paths with any low-confidence step heavily. Additive scoring would allow one strong hop to compensate for a very weak hop.

Common wrong answer to avoid: "Multiplication causes underflow" — in practice, scores are kept in log space (sum of logs) to avoid numerical underflow.

Q: Why does iterative retrieval avoid the context-length problem of full multi-hop? A: Full beam search materialises all K candidate paths simultaneously. For long chains (5+ hops), the context grows exponentially. Iterative retrieval adds one hop per round and feeds only the new fact into context, keeping prompt length bounded.

Common wrong answer to avoid: "Iterative retrieval is always slower" — it trades parallelism for context efficiency; latency depends on graph DB speed, not round count.

Apply now (5 min)¶

Exercise. Write a 3-hop question from a domain you know. Map out the required entities and relationships. Score each hop using made-up but realistic confidence values. Compute the path score for the correct path and a plausible wrong path.

Sketch from memory. Draw a beam search tree for a 2-hop query with beam width 2. Label each node (station), each edge (relationship), and each step score. Show which path wins.

Bridge. Multi-hop traversal on a graph is powerful for exact relational chains. But real data often needs fuzzy matching too — some facts live in unstructured text. The fix is to combine graph traversal with vector search. → 10-hybrid-graph-vector.md