07. Graph RAG Architecture: Wiring the knowledge graph into an LLM

~16 min read. The full pipeline from user query to grounded LLM answer using graph-structured knowledge.

Continues from the first-principles overview in 00-first-principles.md. The graph query engine traverses the knowledge graph to collect context, multi-hop junctions let the traversal chain facts across several hops, and the LLM reads the assembled subgraph to generate a grounded response.

1) The architecture in one picture

User query
    │
    ▼
┌─────────────────────┐
│  Entity extraction  │  find entities in the query
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Entity linking     │  map mentions to graph node IDs
└──────────┬──────────┘
           │
    ┌──────┴──────┐
    │             │
    ▼             ▼
┌─────────┐  ┌──────────────┐
│ Graph   │  │ Vector index │   ← graph embedding for fuzzy lookup
│traversal│  │  (ANN)       │
└───┬─────┘  └──────┬───────┘
    │               │
    └───────┬───────┘
            │ merge
            ▼
┌─────────────────────┐
│ Context assembly    │  subgraph + relevant passages
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  LLM generation     │  answer grounded in retrieved context
└─────────────────────┘

The graph query engine traverses explicit relationships. The vector index supplies fuzzy-match context via the graph embedding. The LLM's only job is generation: it answers from the retrieved context instead of guessing facts.
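The merge step in the diagram can be sketched as follows. This is a minimal illustration, not a reference implementation: the `ContextItem` structure, the `merge_context` function, and the assumption that scores from the two branches are directly comparable are all hypothetical. A real system would normally calibrate or re-rank scores before mixing them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    """One piece of retrieved context (hypothetical structure)."""
    text: str
    source: str   # "graph" or "vector"
    score: float  # higher means more relevant (assumed comparable across branches)


def merge_context(graph_items, vector_items, max_items=5):
    """Merge both branches, drop duplicate texts, keep the best-scored items."""
    best = {}
    for item in list(graph_items) + list(vector_items):
        key = item.text.strip().lower()
        # When both branches return the same text, keep the higher-scoring copy.
        if key not in best or item.score > best[key].score:
            best[key] = item
    ranked = sorted(best.values(), key=lambda i: i.score, reverse=True)
    return ranked[:max_items]


graph_hits = [
    ContextItem("Sundar Pichai is the CEO of Google LLC.", "graph", 1.0),
    ContextItem("Google LLC offers Google Cloud.", "graph", 0.9),
]
vector_hits = [
    ContextItem("Google LLC offers Google Cloud.", "vector", 0.7),
    ContextItem("Google Cloud provides compute and storage services.", "vector", 0.6),
]

merged = merge_context(graph_hits, vector_hits)
for item in merged:
    print(f"[{item.source}] {item.text}")
```

The duplicate fact survives once, attributed to the branch that scored it higher, so the context window is not spent on repeated text.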

2) Two retrieval modes: local vs global

Local (entity-centric): Given a query entity, expand outward N hops on the knowledge graph. Collect all entities and relationships in that subgraph. Best for: specific entity questions ("Tell me about Sundar Pichai's role at Alphabet").

Global (community-based): Cluster the knowledge graph into communities. Pre-generate summaries for each community. At query time, find the most relevant community summaries. Best for: thematic questions ("What are the major trends in cloud computing?").

Local:                          Global:
                                ┌──────────────────────┐
(A)──▶(B)──▶(C)                 │ Community 1 summary  │
  ╲         ╱                   │  (cloud tech cluster)│
   ▶──(D)──◀                   └──────────────────────┘
   subgraph                     ┌──────────────────────┐
   around A                     │ Community 2 summary  │
                                │  (AI/ML cluster)     │
                                └──────────────────────┘
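A rough sketch of the two modes, under stated assumptions: the toy graph treats the edges in the local diagram as undirected, the community summaries are hand-written stand-ins for pre-generated ones, and word overlap is used in place of embedding similarity so the example stays dependency-free. All names (`GRAPH`, `local_subgraph`, `global_summaries`) are hypothetical.

```python
import re
from collections import deque

# Toy undirected adjacency list loosely mirroring the local diagram.
GRAPH = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["A", "C"],
    "E": ["F"],  # separate component: unreachable from A at any hop depth
    "F": ["E"],
}


def local_subgraph(graph, start, max_hops):
    """Breadth-first expansion. Returns {node: hop distance} within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return seen


# Hypothetical pre-generated community summaries.
COMMUNITY_SUMMARIES = {
    "community_1": "Cloud tech cluster: cloud computing platforms, storage, and compute services.",
    "community_2": "AI/ML cluster: machine learning models, training, and inference.",
}


def tokens(text):
    """Lowercase alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def global_summaries(query, summaries, top_k=1):
    """Rank communities by word overlap with the query (stand-in for embeddings)."""
    q = tokens(query)
    ranked = sorted(summaries, key=lambda cid: len(q & tokens(summaries[cid])), reverse=True)
    return ranked[:top_k]


print(local_subgraph(GRAPH, "A", 1))
print(global_summaries("What are the major trends in cloud computing?", COMMUNITY_SUMMARIES))
```

Note that no hop limit ever brings `E` or `F` into the local result for `A`, which is the gap the global mode is meant to cover.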

3) Worked example: a two-hop Graph RAG query

Query: "What cloud products does the company Sundar Pichai leads offer?"

Step 1 — Entity extraction: "Sundar Pichai" → Person entity.

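Step 2 - Entity linking: "Sundar Pichai" → node Q5765. The Q-numbers in this example are illustrative internal node IDs, not actual Wikidata identifiers; a real system would link to a Wikidata QID or to its own graph key.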

Step 3 — Graph traversal:

Q5765 (Sundar Pichai)
    │ CEO_OF
    ▼
Q312 (Google LLC)
    │ PART_OF
    ▼
Q380 (Alphabet)         ← first multi-hop junction

Q312 (Google LLC)
    │ OFFERS_PRODUCT
    ▼
Q1001 (Google Cloud)    ← second multi-hop junction
Q1002 (Google Search)
Q1003 (Google Workspace)

Step 4 — Context assembly: Collected nodes: Google LLC, Alphabet, Google Cloud, Google Search, Google Workspace. Retrieved passages: product description paragraphs.

Step 5 — LLM generation: Prompt: "Given this context: [subgraph facts + passages], answer: What cloud products does Sundar Pichai's company offer?"

Answer: "Sundar Pichai leads Google LLC, which offers Google Cloud, Google Search, and Google Workspace. Of these, the product descriptions identify Google Cloud and Google Workspace as cloud products."

No hallucination. Every fact is from the knowledge graph or retrieved passage.
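The traversal in Step 3 can be reproduced with a few lines over an in-memory triple list. This is a minimal sketch: the `TRIPLES` list and the `follow` helper are hypothetical, and entity names stand in for node IDs. A production system would issue the equivalent query against a graph database.

```python
# Edges from the worked example, stored as (subject, relation, object).
TRIPLES = [
    ("Sundar Pichai", "CEO_OF", "Google LLC"),
    ("Google LLC", "PART_OF", "Alphabet"),
    ("Google LLC", "OFFERS_PRODUCT", "Google Cloud"),
    ("Google LLC", "OFFERS_PRODUCT", "Google Search"),
    ("Google LLC", "OFFERS_PRODUCT", "Google Workspace"),
]


def follow(triples, sources, relation):
    """Return the objects reachable from any source via `relation`, in stored order."""
    found = []
    for subject, rel, obj in triples:
        if subject in sources and rel == relation and obj not in found:
            found.append(obj)
    return found


# Hop 1: person -> company. Hop 2: company -> products and parent.
companies = follow(TRIPLES, ["Sundar Pichai"], "CEO_OF")
products = follow(TRIPLES, companies, "OFFERS_PRODUCT")
parents = follow(TRIPLES, companies, "PART_OF")

print(companies)
print(products)
print(parents)
```

Each hop takes the previous hop's result as its starting set, which is what lets the second lookup depend on a fact the query never mentioned.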

4) Prompt design for Graph RAG

The LLM needs structured context, not raw graph dumps.

Bad prompt format (raw triples):

(Sundar Pichai, CEO_OF, Google LLC)
(Google LLC, OFFERS_PRODUCT, Google Cloud)
...

Good prompt format (natural language serialisation):

Context facts:
- Sundar Pichai is the CEO of Google LLC.
- Google LLC is a subsidiary of Alphabet Inc.
- Google LLC offers Google Cloud, Google Search, and Google Workspace.

Question: What cloud products does Sundar Pichai's company offer?

Why: LLMs are trained mostly on natural language, not on Cypher output. Serialise the subgraph into readable sentences before the LLM sees it.
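One simple way to do the serialisation is a template per relation type. The sketch below is illustrative only: the `TEMPLATES` table, `join_names`, and `serialise` are hypothetical names, and the fallback for unknown relations (lowercasing the relation name) is an assumption that will read awkwardly for some relations.

```python
# Hypothetical sentence templates, one per relation type.
TEMPLATES = {
    "CEO_OF": "{s} is the CEO of {o}.",
    "PART_OF": "{s} is a subsidiary of {o}.",
    "OFFERS_PRODUCT": "{s} offers {o}.",
}


def join_names(names):
    """Join names as 'A', 'A and B', or 'A, B, and C'."""
    if len(names) == 1:
        return names[0]
    if len(names) == 2:
        return f"{names[0]} and {names[1]}"
    return ", ".join(names[:-1]) + f", and {names[-1]}"


def serialise(triples):
    """Group triples by (subject, relation) and render one sentence per group."""
    grouped = {}  # dicts preserve insertion order, so facts keep traversal order
    for subject, relation, obj in triples:
        grouped.setdefault((subject, relation), []).append(obj)
    lines = []
    for (subject, relation), objects in grouped.items():
        fallback = "{s} " + relation.lower().replace("_", " ") + " {o}."
        template = TEMPLATES.get(relation, fallback)
        lines.append("- " + template.format(s=subject, o=join_names(objects)))
    return "Context facts:\n" + "\n".join(lines)


triples = [
    ("Sundar Pichai", "CEO_OF", "Google LLC"),
    ("Google LLC", "PART_OF", "Alphabet"),
    ("Google LLC", "OFFERS_PRODUCT", "Google Cloud"),
    ("Google LLC", "OFFERS_PRODUCT", "Google Search"),
    ("Google LLC", "OFFERS_PRODUCT", "Google Workspace"),
]
context = serialise(triples)
print(context)
```

Grouping objects under one subject and relation keeps the context short: three product edges become one sentence instead of three.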

5) Failure modes and mitigations

┌────────────────────────────┬──────────────────────────────────┐
│  Failure mode              │  Mitigation                      │
├────────────────────────────┼──────────────────────────────────┤
│  Wrong entity link         │  Threshold + fallback to vector  │
│  Missing relationship      │  Hybrid vector fallback          │
│  Subgraph too large        │  N-hop limit + degree pruning    │
│  LLM ignores graph context │  Format context clearly; cite ID │
│  Stale graph facts         │  Freshness metadata on edges     │
└────────────────────────────┴──────────────────────────────────┘

The most dangerous failure: wrong entity link. The graph query engine starts from the wrong entity and every hop is wrong. Always verify the entity link confidence before traversing.
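The "threshold + fallback to vector" mitigation can be sketched as a confidence gate in front of the traversal. Everything here is an assumption for illustration: the `ALIASES` table, the 0.85 threshold, and the use of string similarity from Python's `difflib` as a stand-in for a real entity linker's confidence score.

```python
from difflib import SequenceMatcher

# Hypothetical alias table: lowercase surface form -> node ID.
ALIASES = {
    "sundar pichai": "Q5765",
    "google llc": "Q312",
    "alphabet": "Q380",
}


def link_entity(mention, aliases, threshold=0.85):
    """Return (node_id, confidence); node_id is None when confidence is too low."""
    mention = mention.strip().lower()
    best_id, best_score = None, 0.0
    for alias, node_id in aliases.items():
        score = SequenceMatcher(None, mention, alias).ratio()
        if score > best_score:
            best_id, best_score = node_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


def choose_retrieval(mention):
    """Traverse the graph only when the link is confident, else fall back."""
    node_id, _confidence = link_entity(mention, ALIASES)
    if node_id is None:
        return ("vector_fallback", mention)
    return ("graph_traversal", node_id)


print(choose_retrieval("Sundar Pichai"))
print(choose_retrieval("Sundar Pichay"))
print(choose_retrieval("the search company"))
```

The gate trades recall for safety: a low-confidence mention costs one vector lookup, while a wrong link would corrupt every hop that follows.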

Where this lives in the wild

Microsoft GraphRAG (open source) — two-mode retrieval: local subgraph for entity queries, global community summaries for thematic queries over enterprise document corpora.
Amazon Bedrock Knowledge Bases — hybrid retrieval combining Neptune graph traversal with Titan embeddings for enterprise chatbots in regulated industries.
Neo4j + LangChain integration — Graph RAG templates let engineers wire Cypher traversal results directly into LLM context in under 50 lines of code.
Salesforce Einstein Copilot — Graph RAG over CRM knowledge graph answers "Which accounts is this contact related to?" with multi-hop traversal of account-contact edges.
Palantir AIP — Graph RAG over operational intelligence graphs for defence/finance analysts; entity disambiguation and multi-hop reasoning run in secure enclaves.

Pause and recall

What are the two Graph RAG retrieval modes and when does each apply?
In the worked example, which node was the first multi-hop junction in the traversal?
Why must you serialise the subgraph into natural language before the LLM sees it?
What is the most dangerous failure mode in Graph RAG and why?

Interview Q&A

Q: Why does Graph RAG outperform vanilla RAG on multi-hop questions?
A: Vanilla RAG retrieves independently ranked passages, so it can't chain them into a reasoning path. Graph RAG follows relationships explicitly, collecting context that spans multiple multi-hop junctions in one coherent traversal.

Common wrong answer to avoid: "Graph RAG uses a better embedding model" — the advantage is structural traversal, not embedding quality.

Q: Why keep a vector index alongside graph traversal instead of using graph-only?
A: Entity linking fails for fuzzy mentions, new entities, or paraphrased names. The graph embedding (vector index) catches these cases and provides a fallback when the graph has no exact entity for the mention.

Common wrong answer to avoid: "Vector search is faster" — latency is not the reason; coverage is.

Q: Why is the local subgraph mode insufficient for broad thematic queries?
A: Local expansion follows edges from specific entities. A thematic query ("trends in AI") has no single starting entity; it spans many disconnected parts of the knowledge graph. Community summaries pre-aggregate this breadth so one query can retrieve it.

Common wrong answer to avoid: "Just increase hop depth for broad queries" — more hops from one entity still miss unconnected communities.

Q: Why does the LLM generation step need explicit grounding instructions?
A: LLMs default to using parametric knowledge when context is ambiguous or formatted poorly. Without explicit grounding, the LLM may ignore the retrieved subgraph and hallucinate an answer that contradicts the knowledge graph.

Common wrong answer to avoid: "LLMs always use the context you give them". In practice, LLMs often blend context with parametric knowledge when the context is unclear.
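As a rough illustration of explicit grounding instructions, the sketch below numbers each fact so the model can cite it and tells the model what to do when the facts are insufficient. The function name `build_grounded_prompt`, the `[F1]` citation format, and the instruction wording are assumptions, not a tested recipe. Wording that works well varies by model.

```python
def build_grounded_prompt(facts, question):
    """Build a prompt that restricts the model to numbered context facts."""
    numbered = "\n".join(f"[F{i}] {fact}" for i, fact in enumerate(facts, start=1))
    return (
        "Answer using only the context facts below.\n"
        "Cite the fact IDs you relied on, for example [F1].\n"
        "If the facts do not contain the answer, reply: I don't know.\n\n"
        f"Context facts:\n{numbered}\n\n"
        f"Question: {question}\n"
    )


facts = [
    "Sundar Pichai is the CEO of Google LLC.",
    "Google LLC is a subsidiary of Alphabet Inc.",
    "Google LLC offers Google Cloud, Google Search, and Google Workspace.",
]
prompt = build_grounded_prompt(
    facts, "What cloud products does Sundar Pichai's company offer?"
)
print(prompt)
```

Numbered facts also make answers auditable: a cited ID that does not support the claim is a signal that the model drifted from the context.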

Apply now (5 min)

Exercise. Take a two-hop question from your domain. Sketch the full Graph RAG pipeline for it: entity extraction → linking → traversal → context assembly → generation. Identify which step is most likely to fail and why.

Sketch from memory. Draw the Graph RAG architecture diagram from memory. Label where the graph query engine operates and where the graph embedding operates. Mark the point where context is serialised for the LLM.

Bridge. Local subgraph retrieval handles entity questions well. But broad thematic questions need a global view of the knowledge graph — that's where community detection comes in. → 08-community-detection.md