03. Week 8 — Advanced RAG

For a deeper treatment, see 02_explainer.md: the narrative with failures, step-by-step transformations, diagrams, and the Module 09 bridge. This file is the quick-reference glossary: patterns, tables, formulas, and prompts.

Section 1 — Why basic RAG plateaus

Basic RAG is usually:

User query
  ↓
Embed once
  ↓
Top-k vector search
  ↓
Stuff chunks into prompt
  ↓
Answer

This works for direct factual questions. It breaks when the question is ambiguous, underspecified, multi-hop, or metadata-heavy.

Failure mode	Why it happens	Better pattern
Wrong entity	Query lacks exact names / aliases	Rewrite + expansion
Multi-hop question	One retrieval call cannot satisfy all hops	Decompose
Missing keyword match	Dense search ignores rare exact tokens	Hybrid dense + sparse
Duplicate chunks	Retriever returns near-identical passages	MMR
Good docs buried at rank 8	Retriever is cheap, not precise	Cross-encoder reranking
Low-confidence answer	Retrieval looked weak	Confidence gate + retry

See explainer chapter 1.

Section 2 — Mental model

Module 07 gave you the librarian. Module 08 promotes that librarian into a head researcher.

Named placeholders from the explainer:

- the rewriter — query rewriting
- the hypothesis — HyDE
- the cross-checker — reranker
- the multi-step plan — decomposition
- the confidence gate — self-evaluation before answer / retry

Core loop:

Question
  ↓
Transform query
  ↓
Retrieve with one or more strategies
  ↓
Cross-check / filter
  ↓
Answer draft
  ↓
Confidence gate
  ├─ high confidence → answer
  └─ low confidence  → retry / reroute / abstain
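The loop above can be sketched in Python as a bounded retry loop. Every stage is a hypothetical callable you would back with your own rewriter, retriever, and LLM; the cross-check / filter step is assumed to live inside `retrieve`, and the abstain message and `max_attempts=3` are illustrative choices, not part of the pattern.

```python
from typing import Callable

# Hypothetical stage signatures: placeholders for your own LLM calls and retrievers.
Transform = Callable[[str, int], str]          # (question, attempt) -> retrieval query
Retrieve = Callable[[str], list[str]]          # query -> passages (cross-check/filter inside)
Draft = Callable[[str, list[str]], str]        # (question, passages) -> draft answer
Gate = Callable[[str, list[str], str], bool]   # (question, passages, draft) -> confident?


def answer_with_gate(question: str, transform: Transform, retrieve: Retrieve,
                     draft: Draft, gate: Gate, max_attempts: int = 3) -> str:
    """Transform -> retrieve -> draft -> gate, retrying until the budget runs out."""
    for attempt in range(max_attempts):
        query = transform(question, attempt)   # a different rewrite on each retry
        passages = retrieve(query)
        answer = draft(question, passages)
        if gate(question, passages, answer):
            return answer                      # high confidence: answer
    # Budget spent without a confident draft: abstain instead of guessing.
    return "I could not find enough evidence to answer that."
```

The hard `max_attempts` cap matters: without it, a gate that never becomes confident would retry forever.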

Section 3 — Query transformation taxonomy

Pattern	What it does	Best when	Risk
Rewrite	Make the query retrieval-friendly	User asks vaguely	LLM removes an important constraint
Expand	Add synonyms, aliases, nearby terms	Domain has jargon / abbreviations	Recall improves, precision may drop
Decompose	Split one hard question into sub-queries	Multi-hop or compare/contrast	More latency, orchestration needed
Step-back	Ask for higher-level principle first	User asks narrow symptom, corpus stores general policy	Can become too abstract

Worked example

User query:

“Compare Q3 and Q4 revenue growth across all regions.”

Transformation stack:

1. Rewrite: “Compare revenue growth percentages for Q3 and Q4 for APAC, EMEA, LATAM, and North America.”
2. Expand:
   - Add synonyms: revenue, sales, topline
   - Add aliases: NA = North America
3. Decompose:
   - “What was Q3 revenue growth by region?”
   - “What was Q4 revenue growth by region?”
   - “Compute deltas and summarize winners / losers.” (a synthesis step over the first two answers, not a retrieval)
4. Step-back: “Which documents summarize quarterly regional revenue performance?”
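The expansion step can be sketched without an LLM as a dictionary-driven rewrite. The `synonyms` and `aliases` tables are hypothetical inputs; in practice a curated glossary or an LLM would supply them. Aliases are spelled out as whole words, then one query variant is emitted per synonym.

```python
import re


def expand_query(query: str, synonyms: dict[str, list[str]],
                 aliases: dict[str, str]) -> list[str]:
    """Return the alias-expanded query plus one variant per matched synonym."""
    # Spell out aliases as whole, case-sensitive words, e.g. "NA" -> "North America".
    for short, full in aliases.items():
        query = re.sub(rf"\b{re.escape(short)}\b", lambda _m, full=full: full, query)
    variants = [query]
    for term, alts in synonyms.items():
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        if pattern.search(query):
            # One extra retrieval query per synonym of a matched term.
            variants.extend(pattern.sub(lambda _m, alt=alt: alt, query) for alt in alts)
    return variants
```

For example, `expand_query("Q3 revenue growth in NA", {"revenue": ["sales", "topline"]}, {"NA": "North America"})` yields three queries that could each be sent to the retriever. Word boundaries keep "NA" from matching inside "NAFTA".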

See explainer chapter 2.

Section 4 — HyDE

HyDE = Hypothetical Document Embeddings.

Process:

1. Ask an LLM to draft a plausible answer paragraph.
2. Embed that draft.
3. Retrieve real documents using that embedding.
4. Throw away the draft.
5. Answer only from retrieved evidence.
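Steps 1 to 4 can be sketched in Python with a toy bag-of-words "embedding" standing in for a real embedding model. `draft_answer` is a hypothetical callable representing the LLM; step 5 happens downstream, using only the returned documents.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hyde_retrieve(question: str, corpus: list[str],
                  draft_answer: Callable[[str], str], k: int = 2) -> list[str]:
    hypothetical = draft_answer(question)     # 1. draft a plausible answer
    query_vec = embed(hypothetical)           # 2. embed the draft, not the question
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:k]                         # 3. real documents; 4. the draft is dropped
```

With a question like "money back?", the raw question shares no words with a refund-policy passage, but a drafted answer mentioning "refunds" and "return request" does, so the right passage ranks first.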

Why it can help:
- User questions are often short and messy.
- Good answer passages are longer and semantically richer.
- An embedded hypothetical answer tends to land closer to real answer passages than the embedded question does.

When it shines:
- Conceptual questions
- Ambiguous phrasing
- Questions with implied terminology

When it hurts:
- Easy factual lookups already phrased clearly
- When the hypothetical answer drifts into the wrong concept, pulling retrieval toward the wrong documents

See explainer §3.1-§3.2.

Section 5 — Advanced retrieval patterns

Parent-child retrieval

Index small child chunks for precision. Return larger parent chunks or document windows for context.

Parent doc
  ├─ child 1
  ├─ child 2
  ├─ child 3
  └─ child 4

Retrieve on child. Answer with parent.

Why it helps:
- Small chunks match better.
- Larger parents prevent context starvation.
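"Retrieve on child, answer with parent" can be sketched as: score the small chunks, then map each hit to its parent and deduplicate. The term-count scorer is a toy stand-in for vector search, and the `Child` record with a `parent_id` field is a hypothetical schema.

```python
from dataclasses import dataclass


@dataclass
class Child:
    text: str
    parent_id: str


def parent_child_retrieve(query_terms: set[str], children: list[Child],
                          parents: dict[str, str], k: int = 3) -> list[str]:
    """Score small child chunks, then return their deduplicated parent texts."""
    # Toy relevance: how many query terms appear in the child chunk.
    scored = sorted(children,
                    key=lambda c: sum(t in c.text.lower().split() for t in query_terms),
                    reverse=True)
    seen: set[str] = set()
    results: list[str] = []
    for child in scored[:k]:
        if child.parent_id not in seen:       # several children may share one parent
            seen.add(child.parent_id)
            results.append(parents[child.parent_id])
    return results
```

The deduplication step is the design choice to notice: two matching children from the same parent should put that parent into the prompt once, not twice.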

Fusion retrieval

Combine different retrieval signals. Most common combination: dense + sparse.

query
 ├─ dense retriever  → semantic matches
 └─ sparse retriever → exact-token matches
           ↓
        fuse ranks
           ↓
        rerank

Common fusion method: RRF — Reciprocal Rank Fusion

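score(doc) = Σ_i 1 / (k + rank_i(doc))

where i runs over the retrievers being fused, rank_i(doc) is the document's 1-based rank in retriever i's list (a retriever that did not return the document contributes nothing), and k is a smoothing constant, commonly 60, that damps the advantage of the very top ranks.

You do not need score calibration: RRF uses only rankings, so BM25 scores and cosine similarities never have to be put on a common scale.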
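A minimal RRF sketch in Python, assuming each retriever hands back an ordered list of document IDs. `k=60` is a commonly used default, not a requirement.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion over several ranked lists of doc IDs (ranks start at 1)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)   # docs missing from a list get nothing from it
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With `dense = ["a", "b", "c"]` and `sparse = ["c", "a", "d"]`, the fused order is a, c, b, d: documents found by both retrievers rise above those found by only one.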

See explainer §3.3-§3.4.

Section 6 — Reranking and filtering

Bi-encoder vs cross-encoder

Model	How it scores	Speed	Quality
Bi-encoder retriever	Embed query and docs separately	Fast	Good recall
Cross-encoder reranker	Read query+doc together	Slower	Better precision

Pattern:
- Retrieve top-K cheaply.
- Rerank to top-N precisely.

Typical values:
- top-K = 20 to 100
- top-N = 3 to 10
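The two-stage pattern can be sketched with two hypothetical scoring callables: `cheap` plays the bi-encoder role and `precise` the cross-encoder role. A real bi-encoder would use an index instead of scoring every document; the point here is that the expensive scorer only ever sees the top-K candidates.

```python
from typing import Callable

Scorer = Callable[[str, str], float]   # (query, doc) -> relevance score


def retrieve_then_rerank(query: str, corpus: list[str], cheap: Scorer,
                         precise: Scorer, top_k: int = 20, top_n: int = 3) -> list[str]:
    # Stage 1: cheap scorer over the corpus, keep the top-K candidates.
    candidates = sorted(corpus, key=lambda d: cheap(query, d), reverse=True)[:top_k]
    # Stage 2: expensive scorer on those K candidates only, keep the top-N.
    return sorted(candidates, key=lambda d: precise(query, d), reverse=True)[:top_n]
```

This also answers "why rerank after retrieval": a cross-encoder must read every (query, doc) pair, so running it over the whole corpus is too slow, while running it over K candidates is affordable.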

Metadata filtering

Apply filters before retrieval (to restrict the search space) or after it (to prune the returned candidates). Examples:
- date range
- product line
- document type
- access scope
- region
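A filter can be sketched as a list of conditions over each document's metadata, usable either on the corpus before search or on results after it. The `{"meta": {...}}` document shape is a hypothetical schema; each condition is either an exact value or a predicate.

```python
from datetime import date


def metadata_filter(docs: list[dict], **conditions) -> list[dict]:
    """Keep docs whose metadata satisfies every condition.

    A condition is either a plain value (exact match) or a callable predicate.
    """
    def ok(doc: dict) -> bool:
        meta = doc.get("meta", {})
        for field, cond in conditions.items():
            value = meta.get(field)
            if callable(cond):
                if value is None or not cond(value):   # missing field fails a predicate
                    return False
            elif value != cond:
                return False
        return True

    return [d for d in docs if ok(d)]
```

Usage: `metadata_filter(docs, region="EMEA", published=lambda d: d >= date(2024, 1, 1))` combines an exact match with a date-range predicate.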

MMR — Maximal Marginal Relevance

MMR balances relevance and diversity.

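MMR(d) = λ * relevance(d, query) - (1 - λ) * max_{s ∈ selected} similarity(d, s)

Redundancy is a candidate's similarity to the closest result already selected. MMR is greedy: pick the candidate with the highest score, add it to the selected set, and repeat until you have N results.

High λ → more relevance. Low λ → more diversity.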
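The greedy selection can be sketched in Python, assuming you already have each candidate's relevance to the query and a pairwise similarity function between candidates (both hypothetical inputs here; typically cosine similarities of embeddings).

```python
from typing import Callable, Sequence


def mmr(query_sim: Sequence[float], doc_sim: Callable[[int, int], float],
        n: int, lam: float = 0.7) -> list[int]:
    """Greedy MMR: return indices of n candidates balancing relevance and novelty.

    query_sim[i]   relevance of candidate i to the query
    doc_sim(i, j)  similarity between candidates i and j
    """
    remaining = list(range(len(query_sim)))
    selected: list[int] = []
    while remaining and len(selected) < n:
        def score(i: int) -> float:
            # Redundancy = similarity to the closest already-selected result (0 at first).
            redundancy = max((doc_sim(i, j) for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two near-duplicate top hits and one distinct, less relevant passage, `lam=0.5` keeps one duplicate and the distinct passage, while `lam=1.0` reduces to plain relevance ranking and keeps both duplicates.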

Use when top results are near-duplicates. See explainer chapter 4.

Section 7 — Agentic RAG patterns

Pattern	Core idea	Trigger
CRAG	Evaluate retrieved context, then correct if weak	Retrieval quality looks low
Self-RAG	Model learns to retrieve, critique, and refine	Multi-step answer generation
Iterative retrieval	Search, read, search again	Answer requires another hop
Routing	Choose one retrieval strategy from many	Different query types need different tools

Possible routes:
- FAQ → sparse-heavy
- policy question → hybrid + metadata filters
- compare/contrast → decomposition + reranking
- broad conceptual query → step-back + HyDE
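A router can be as simple as a few surface rules that map a query to one of the routes listed above. The keyword lists below are hypothetical; production routers often use an LLM or a trained classifier instead, but the contract is the same: query in, strategy name out.

```python
def route(query: str) -> str:
    """Pick a retrieval strategy from crude surface features (hypothetical rules)."""
    q = query.lower()
    if any(w in q for w in ("compare", " vs ", "versus", "difference between")):
        return "decomposition + reranking"
    if any(w in q for w in ("policy", "allowed", "entitled", "eligible")):
        return "hybrid + metadata filters"
    if q.startswith(("why", "how does", "what is the purpose")):
        return "step-back + HyDE"
    return "sparse-heavy"   # short FAQ-style lookups default to exact-token search
```

Rule order encodes priority: a compare/contrast question that also mentions a policy still goes to decomposition first.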

See explainer chapter 5.

Section 8 — Confidence gate

The confidence gate asks:
- Did retrieval return enough support?
- Are the top documents mutually consistent?
- Did the answer cite evidence for each claim?
- Did the model actually answer all sub-parts?

Actions:
- answer
- retry with rewrite
- switch retriever
- decompose
- abstain politely
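The four questions can be turned into signals and mapped onto the actions above. In this sketch the `GateSignals` fields, the `min_score` threshold, and the mapping from failed check to action are all hypothetical choices; in practice the signals might come from reranker scores and an LLM judge.

```python
from dataclasses import dataclass


@dataclass
class GateSignals:
    top_score: float        # best retrieval / rerank score
    docs_agree: bool        # are the top documents mutually consistent?
    uncited_claims: int     # claims in the draft with no supporting passage
    unanswered_parts: int   # sub-questions the draft did not address


def confidence_gate(s: GateSignals, attempt: int, max_attempts: int = 2,
                    min_score: float = 0.5) -> str:
    """Map gate signals to one action. Thresholds are hypothetical and need tuning."""
    if (s.top_score >= min_score and s.docs_agree
            and s.uncited_claims == 0 and s.unanswered_parts == 0):
        return "answer"
    if attempt >= max_attempts:
        return "abstain"              # retry budget spent
    if s.unanswered_parts > 0:
        return "decompose"            # some sub-part never got evidence
    if s.top_score < min_score:
        return "retry with rewrite"   # retrieval itself looked weak
    return "switch retriever"         # evidence found but conflicting or uncited
```

Checking the budget before choosing a retry action is what keeps the gate from looping; this is the same check-then-act shape agents reuse in Module 09.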

This is the immediate conceptual bridge to Module 09. Agents will use the same pattern across many tools.

Section 9 — Retrieval prompts you can reuse

Prompt 1 — Query rewriter

Rewrite the user question for retrieval.
Preserve all constraints.
Return:
1. rewritten_query
2. synonyms_or_aliases
3. metadata_filters
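If you ask the rewriter to return those three fields as JSON (an assumption; the prompt above does not fix a format), the reply can be parsed defensively. The `RewriteResult` type and the fallback rule are hypothetical: when the reply is malformed, the original question is used so a bad rewrite never drops the user's constraints.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RewriteResult:
    rewritten_query: str
    synonyms_or_aliases: list[str] = field(default_factory=list)
    metadata_filters: dict[str, str] = field(default_factory=dict)


def parse_rewrite(raw: str, original_query: str) -> RewriteResult:
    """Parse the rewriter's JSON reply, falling back to the original query."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return RewriteResult(original_query)       # not JSON at all
    if not isinstance(data, dict) or not data.get("rewritten_query"):
        return RewriteResult(original_query)       # JSON, but missing the key field
    return RewriteResult(
        rewritten_query=data["rewritten_query"],
        synonyms_or_aliases=list(data.get("synonyms_or_aliases", [])),
        metadata_filters=dict(data.get("metadata_filters", {})),
    )
```

The parsed `metadata_filters` can then feed whatever filtering step your retriever supports.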

Prompt 2 — Decomposer

Break the question into minimal sub-queries needed to answer it.
Each sub-query should be independently retrievable.
Return them in dependency order.
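"Dependency order" can be enforced in code once the decomposer's output is parsed into a dependency map. This sketch assumes that map format (hypothetical) and uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+). `parallel_batches` additionally groups sub-queries whose dependencies are all satisfied, so each batch could be retrieved concurrently.

```python
from graphlib import TopologicalSorter


def order_subqueries(deps: dict[str, set[str]]) -> list[str]:
    """Order sub-queries so each comes after the ones whose answers it needs.

    deps maps each sub-query to the set of sub-queries it depends on.
    Raises graphlib.CycleError if the decomposition is circular.
    """
    return list(TopologicalSorter(deps).static_order())


def parallel_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group sub-queries into batches that can run concurrently."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = list(ts.get_ready())   # everything whose dependencies are done
        batches.append(sorted(ready))
        ts.done(*ready)
    return batches
```

For the worked example in Section 3, the Q3 and Q4 lookups form the first batch and the "compute deltas" synthesis step forms the second.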

Prompt 3 — Confidence gate

Given the question, retrieved context, and draft answer,
judge whether the evidence supports the answer.
Return one of: answer, retry, abstain.
If retry, say which retrieval change to try next.

Prompt 4 — Step-back helper

State the higher-level concept or policy that governs this question.
Then produce one broad retrieval query and one specific retrieval query.

See explainer §5.5.

Section 10 — Practical knobs

Knob	Effect	Common failure
chunk size	recall vs precision	too small: chunks lose context
top-K	candidate pool size	too low: good docs never reach the reranker
rerank top-N	prompt budget	too high: noise enters the prompt
hybrid weight / fusion	dense vs sparse balance	too dense-heavy: exact entities get lost
metadata filters	search scope	over-filtering removes useful docs
MMR lambda	relevance vs novelty	too high: near-duplicates dominate
retry budget	reliability vs latency	no cap: endless retry loop
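The knobs can be collected into one config object with sanity checks. The defaults below are hypothetical picks inside the ranges quoted in this glossary (top-K 20 to 100, top-N 3 to 10), not recommendations; chunk size is left out because it depends heavily on the corpus.

```python
from dataclasses import dataclass


@dataclass
class RetrievalKnobs:
    """Hypothetical defaults; tune per corpus and evaluate before trusting them."""
    top_k: int = 50             # candidate pool before reranking (typical 20-100)
    rerank_top_n: int = 5       # passages that reach the prompt (typical 3-10)
    dense_weight: float = 0.5   # 1.0 = dense only, 0.0 = sparse only
    mmr_lambda: float = 0.7     # higher = relevance, lower = diversity
    retry_budget: int = 2       # hard cap so the gate cannot loop forever

    def validate(self) -> list[str]:
        problems = []
        if self.rerank_top_n > self.top_k:
            problems.append("rerank_top_n exceeds top_k")
        if not 0.0 <= self.dense_weight <= 1.0:
            problems.append("dense_weight must be in [0, 1]")
        if not 0.0 <= self.mmr_lambda <= 1.0:
            problems.append("mmr_lambda must be in [0, 1]")
        if self.retry_budget < 0:
            problems.append("retry_budget must be non-negative")
        return problems
```

Validating `rerank_top_n <= top_k` catches a common misconfiguration: the reranker cannot return more passages than it was given.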

Section 11 — Foundation-gap audit for Module 09

Module 09 assumes you already understand:
- advanced retrieval patterns
- when to iterate / loop
- self-evaluation as a control step
- retrieval behaving like a tool rather than a fixed pipeline

If these feel weak, re-read explainer chapters 4-5 before moving on.

Reading list

HyDE paper / summaries
One reranking implementation guide
One hybrid-search or fusion-search guide
One CRAG or self-RAG overview

Self-check

For full answers, see explainer §6.3.

Why is raw user language often a bad retrieval query? (§2.1)
Rewrite, expansion, decomposition, step-back — how do they differ? (§2.1-§2.4)
Why can HyDE improve semantic recall? (§3.1)
Parent-child retrieval — why not just index parent chunks directly? (§3.3)
Why does dense + sparse beat dense-only on many enterprise corpora? (§3.4)
Why rerank after retrieval instead of before retrieval? (§4.1)
What does MMR solve in one sentence? (§4.4)
What is the confidence gate allowed to do? (§5.2-§5.4)
Why is this module really about loops and tools, not just better search? (§6.5-§6.6)