03. Week 8: Advanced RAG
For deep understanding, see 02_explainer.md, a narrative with failures, step-by-step transformations, diagrams, and the Module 09 bridge. This file is the quick-reference glossary: patterns, tables, formulas, and prompts.
Section 1: Why basic RAG plateaus
Basic RAG is usually a single pass: embed the question, retrieve the top-k chunks, and generate an answer from them.
This works for direct factual questions. It breaks when the question is ambiguous, underspecified, multi-hop, or metadata-heavy.
| Failure mode | Why it happens | Better pattern |
|---|---|---|
| Wrong entity | Query lacks exact names / aliases | Rewrite + expansion |
| Multi-hop question | One retrieval call cannot satisfy all hops | Decompose |
| Missing keyword match | Dense search can blur rare exact tokens (IDs, codes, names) | Hybrid dense + sparse |
| Duplicate chunks | Retriever returns near-identical passages | MMR |
| Good docs buried at rank 8 | Retriever is cheap, not precise | Cross-encoder reranking |
| Low-confidence answer | Retrieval looked weak | Confidence gate + retry |
See explainer chapter 1.
Section 2: Mental model
Module 07 gave you the librarian. Module 08 promotes that librarian into a head researcher.
Named placeholders from the explainer:
- the rewriter: query rewriting
- the hypothesis: HyDE
- the cross-checker: reranker
- the multi-step plan: decomposition
- the confidence gate: self-evaluation before answer / retry
Core loop:
Question
↓
Transform query
↓
Retrieve with one or more strategies
↓
Cross-check / filter
↓
Answer draft
↓
Confidence gate
├─ high confidence → answer
└─ low confidence → retry / reroute / abstain
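The loop above can be sketched as a small driver function. This is a minimal sketch, not the explainer's implementation: `run_rag_loop`, its callable parameters, and the `max_attempts` retry budget are all hypothetical names chosen for illustration.

```python
from typing import Callable, List, Tuple

def run_rag_loop(
    question: str,
    transform: Callable[[str, int], str],
    retrieve: Callable[[str], List[str]],
    draft: Callable[[str, List[str]], str],
    gate: Callable[[str, List[str], str], bool],
    max_attempts: int = 3,  # hypothetical retry budget
) -> Tuple[str, int]:
    """Return (answer, attempts_used); abstain once the retry budget is spent."""
    for attempt in range(1, max_attempts + 1):
        query = transform(question, attempt)   # transform query (may differ per attempt)
        docs = retrieve(query)                 # retrieve, cross-check, filter
        answer = draft(question, docs)         # answer draft
        if gate(question, docs, answer):       # confidence gate: high confidence
            return answer, attempt
    # low confidence on every attempt: abstain instead of guessing
    return "I could not find enough evidence to answer.", max_attempts
```

Passing the attempt number into `transform` lets each retry try a different rewrite, and the hard cap keeps the loop from running forever.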
Section 3: Query transformation taxonomy
| Pattern | What it does | Best when | Risk |
|---|---|---|---|
| Rewrite | Make the query retrieval-friendly | User asks vaguely | LLM removes an important constraint |
| Expand | Add synonyms, aliases, nearby terms | Domain has jargon / abbreviations | Recall improves, precision may drop |
| Decompose | Split one hard question into sub-queries | Multi-hop or compare/contrast | More latency, orchestration needed |
| Step-back | Ask for higher-level principle first | User asks narrow symptom, corpus stores general policy | Can become too abstract |
Worked example
User query:
“Compare Q3 and Q4 revenue growth across all regions.”
Transformation stack:
- Rewrite
  - “Compare revenue growth percentages for Q3 and Q4 for APAC, EMEA, LATAM, and North America.”
- Expand
  - Add synonyms: revenue, sales, topline
  - Add aliases: NA = North America
- Decompose
  - “What was Q3 revenue growth by region?”
  - “What was Q4 revenue growth by region?”
  - “Compute deltas and summarize winners / losers.”
- Step-back
  - “Which documents summarize quarterly regional revenue performance?”
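The Expand step in the stack above can be sketched with a plain alias table. This is a rough illustration under assumptions: the `ALIASES` table reuses the synonyms from the worked example, while `expand_query` and the `OR` output format are hypothetical and would need to match whatever syntax your retriever accepts.

```python
import re
from typing import Dict, List

# Alias table from the worked example; a real one would come from the domain.
ALIASES: Dict[str, List[str]] = {
    "revenue": ["sales", "topline"],
    "na": ["North America"],
}

def expand_query(query: str, aliases: Dict[str, List[str]] = ALIASES) -> str:
    """Append known synonyms / aliases for any token that appears in the query."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    extra: List[str] = []
    for token in tokens:
        for alt in aliases.get(token, []):
            # Skip terms the user already wrote and terms already added.
            if alt.lower() not in query.lower() and alt not in extra:
                extra.append(alt)
    return query if not extra else f"{query} ({' OR '.join(extra)})"

print(expand_query("Q3 revenue growth in NA"))
# prints: Q3 revenue growth in NA (sales OR topline OR North America)
```

The original query text is kept intact and expansions are only appended, which limits the risk of dropping a user constraint.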
See explainer chapter 2.
Section 4: HyDE
HyDE = Hypothetical Document Embeddings.
Process:
- Ask an LLM to draft a plausible answer paragraph.
- Embed that draft.
- Retrieve real documents using that embedding.
- Throw away the draft.
- Answer only from retrieved evidence.
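The process above can be sketched end to end with toy components. Everything here is an assumption made so the example runs on its own: `embed` is a bag-of-words stand-in for a real embedding model, and `draft_fn` is a stub for the LLM call that writes the hypothetical answer.

```python
import math
import re
from collections import Counter
from typing import Callable, List

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def hyde_retrieve(question: str, docs: List[str],
                  draft_fn: Callable[[str], str], top_k: int = 2) -> List[str]:
    draft = draft_fn(question)     # hypothetical answer; it may be factually wrong
    query_vec = embed(draft)       # embed the draft, not the raw question
    ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:top_k]          # only real documents are returned; the draft is discarded
```

With a short question such as "PTO rollover?" the toy embedding shares no tokens with a leave-policy passage, while a drafted answer paragraph about carrying over paid leave does. That is the effect HyDE relies on.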
Why it can help:
- User questions are often short and messy.
- Good answer passages are longer and semantically richer.
- The hypothetical answer sits closer to the real answer neighborhood.

When it shines:
- Conceptual questions
- Ambiguous phrasing
- Questions with implied terminology

When it hurts:
- Easy factual lookup already phrased clearly
- When the hypothetical answer drifts into the wrong concept
See explainer §3.1-§3.2.
Section 5: Advanced retrieval patterns
Parent-child retrieval
Index small child chunks for precision. Return larger parent chunks or document windows for context.
Retrieve on child. Answer with parent.
Why it helps:
- Small chunks match better.
- Larger parents prevent context starvation.
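"Retrieve on child, answer with parent" comes down to an ID lookup after the search. A minimal sketch, assuming the child search has already returned ranked child IDs; `retrieve_parents` and the two dictionaries are hypothetical names, not a specific library's API.

```python
from typing import Dict, List

def retrieve_parents(child_hits: List[str], child_to_parent: Dict[str, str],
                     parents: Dict[str, str], max_parents: int = 3) -> List[str]:
    """Map ranked child hits to their parent texts, de-duplicated, in rank order."""
    seen: List[str] = []
    for child_id in child_hits:
        parent_id = child_to_parent[child_id]
        if parent_id not in seen:      # several children often share one parent
            seen.append(parent_id)
        if len(seen) == max_parents:
            break
    return [parents[p] for p in seen]
```

De-duplicating matters because two child chunks from the same parent would otherwise put the same long passage into the prompt twice.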
Fusion retrieval
Combine different retrieval signals. Most common combination: dense + sparse.
query
├─ dense retriever → semantic matches
└─ sparse retriever → exact-token matches
↓
fuse ranks
↓
rerank
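Common fusion method: RRF (Reciprocal Rank Fusion)
score(doc) = Σ_i 1 / (k + rank_i(doc))
Here i ranges over the retrievers, rank_i(doc) is the document's 1-based position in retriever i's list, and k is a smoothing constant. RRF uses only rankings, so dense and sparse scores need no calibration against each other.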
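The RRF formula is short enough to write out directly. A minimal sketch: the default `k = 60` is a commonly used value and an assumption here, since this file does not fix one, and a document missing from a retriever's list simply contributes nothing for that retriever.

```python
from collections import defaultdict
from typing import Dict, List

def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of doc IDs; only positions are used, never raw scores."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

dense = ["a", "b", "c"]
sparse = ["c", "a", "d"]
print(reciprocal_rank_fusion([dense, sparse]))
# prints: ['a', 'c', 'b', 'd']
```

Documents that appear in both lists ("a" and "c") rise above documents found by only one retriever, even though no score from either retriever was compared.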
See explainer §3.3-§3.4.
Section 6: Reranking and filtering
Bi-encoder vs cross-encoder
| Model | How it scores | Speed | Quality |
|---|---|---|---|
| Bi-encoder retriever | Embed query and docs separately | Fast | Good recall |
| Cross-encoder reranker | Read query+doc together | Slower | Better precision |
Pattern:
- Retrieve top-K cheaply.
- Rerank to top-N precisely.

Typical values:
- top-K = 20 to 100
- top-N = 3 to 10
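The two-stage pattern can be sketched with pluggable scoring functions. This is an illustration only: `cheap_score` stands in for a bi-encoder similarity and `precise_score` for a cross-encoder, and both are hypothetical callables rather than a real model API.

```python
from typing import Callable, List

def retrieve_then_rerank(query: str, corpus: List[str],
                         cheap_score: Callable[[str, str], float],
                         precise_score: Callable[[str, str], float],
                         top_k: int = 20, top_n: int = 3) -> List[str]:
    """Stage 1: wide, cheap candidate pool. Stage 2: slow, precise reordering."""
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:top_k]
    return sorted(candidates, key=lambda d: precise_score(query, d), reverse=True)[:top_n]
```

The expensive scorer only ever sees `top_k` documents, which is why the pattern stays affordable as the corpus grows.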
Metadata filtering
Apply filters before retrieval (to shrink the search space) or after it (to prune results). Examples:
- date range
- product line
- document type
- access scope
- region
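A filter over document metadata can be sketched as below. The `meta` layout and the field names `doc_type`, `region`, and `published` are assumptions for illustration; real vector stores usually expose their own filter syntax.

```python
from datetime import date
from typing import Any, Dict, List, Optional

Doc = Dict[str, Any]

def filter_docs(docs: List[Doc], doc_type: Optional[str] = None,
                region: Optional[str] = None,
                start: Optional[date] = None, end: Optional[date] = None) -> List[Doc]:
    """Keep docs whose metadata matches every filter that was supplied."""
    kept = []
    for doc in docs:
        meta = doc["meta"]
        if doc_type is not None and meta.get("doc_type") != doc_type:
            continue
        if region is not None and meta.get("region") != region:
            continue
        published = meta.get("published")
        # A doc with no date fails any date filter rather than slipping through.
        if start is not None and (published is None or published < start):
            continue
        if end is not None and (published is None or published > end):
            continue
        kept.append(doc)
    return kept
```

The same function works as a pre-filter on the corpus or as a post-filter on retrieved candidates.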
MMR: Maximal Marginal Relevance
MMR balances relevance and diversity.
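MMR = λ * relevance - (1 - λ) * redundancy
Here relevance is the candidate's similarity to the query, and redundancy is its highest similarity to any document already selected.
High λ → more relevance.
Low λ → more diversity.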
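The MMR formula is applied greedily, one pick at a time. A minimal sketch, assuming `relevance` and `similarity` are supplied as callables (hypothetical names; in practice both are usually cosine similarities over embeddings) and taking redundancy as the highest similarity to anything already selected.

```python
from typing import Callable, List

def mmr_select(candidates: List[str], relevance: Callable[[str], float],
               similarity: Callable[[str, str], float],
               lam: float = 0.7, top_n: int = 3) -> List[str]:
    """Greedily pick documents that are relevant but not redundant."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < top_n:
        def mmr_score(doc: str) -> float:
            # Redundancy is 0 for the first pick because nothing is selected yet.
            redundancy = max((similarity(doc, s) for s in selected), default=0.0)
            return lam * relevance(doc) - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam = 1.0` the redundancy term vanishes and the result is plain relevance ranking. Lowering `lam` pushes near-duplicates of an already selected document down the list.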
Use when top results are near-duplicates. See explainer chapter 4.
Section 7: Agentic RAG patterns
| Pattern | Core idea | Trigger |
|---|---|---|
| CRAG | Evaluate retrieved context, then correct if weak | Retrieval quality looks low |
| Self-RAG | Model learns to retrieve, critique, and refine | Multi-step answer generation |
| Iterative retrieval | Search, read, search again | Answer requires another hop |
| Routing | Choose one retrieval strategy from many | Different query types need different tools |
Possible routes:
- FAQ → sparse-heavy
- policy question → hybrid + metadata filters
- compare/contrast → decomposition + reranking
- broad conceptual query → step-back + HyDE
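The routes above can be sketched as a lookup behind a small classifier. The keyword rules in `classify` are hypothetical and deliberately crude; a real router might use a trained classifier or an LLM call instead. Only the route table mirrors the list above.

```python
from typing import Dict, List

ROUTES: Dict[str, List[str]] = {
    "faq": ["sparse_heavy"],
    "policy": ["hybrid", "metadata_filters"],
    "compare": ["decomposition", "reranking"],
    "conceptual": ["step_back", "hyde"],
}

def classify(query: str) -> str:
    """Toy keyword rules (assumed for illustration only)."""
    q = query.lower()
    if any(w in q for w in ("compare", " vs ", "versus", "difference between")):
        return "compare"
    if "policy" in q or "allowed" in q:
        return "policy"
    if q.startswith(("why", "explain")):
        return "conceptual"
    return "faq"  # default: direct lookup

def route(query: str) -> List[str]:
    return ROUTES[classify(query)]
```

Keeping the route table separate from the classifier means the classification step can be swapped out without touching the retrieval strategies.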
See explainer chapter 5.
Section 8: Confidence gate
The confidence gate asks:
- Did retrieval return enough support?
- Are the top documents mutually consistent?
- Did the answer cite evidence for each claim?
- Did the model actually answer all sub-parts?

Actions:
- answer
- retry with rewrite
- switch retriever
- decompose
- abstain politely
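One way to connect the four checks to the five actions is a small decision function. This is a sketch under assumptions: the `GateSignals` fields, the thresholds `min_docs` and `min_cited`, and the order in which failures map to actions are all hypothetical choices, not rules from the explainer.

```python
from dataclasses import dataclass

@dataclass
class GateSignals:
    supporting_docs: int       # retrieved docs that actually support the draft
    docs_consistent: bool      # top documents agree with each other
    cited_claim_ratio: float   # fraction of claims backed by a citation, 0..1
    subparts_answered: int
    subparts_total: int
    retries_left: int

def confidence_gate(s: GateSignals, min_docs: int = 2, min_cited: float = 0.9) -> str:
    """Return one action: answer, retry_with_rewrite, switch_retriever, decompose, abstain."""
    passed = (s.supporting_docs >= min_docs and s.docs_consistent
              and s.cited_claim_ratio >= min_cited
              and s.subparts_answered == s.subparts_total)
    if passed:
        return "answer"
    if s.retries_left <= 0:
        return "abstain"               # budget spent: decline rather than guess
    if s.subparts_answered < s.subparts_total:
        return "decompose"             # some sub-question went unanswered
    if s.supporting_docs < min_docs:
        return "retry_with_rewrite"    # retrieval found too little support
    return "switch_retriever"          # support exists but is inconsistent or uncited
```

Checking the retry budget before choosing a corrective action means the gate always ends in either an answer or an abstention.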
This is the immediate conceptual bridge to Module 09. Agents will use the same pattern across many tools.
Section 9: Retrieval prompts you can reuse
Prompt 1: Query rewriter
Rewrite the user question for retrieval.
Preserve all constraints.
Return:
1. rewritten_query
2. synonyms_or_aliases
3. metadata_filters
Prompt 2: Decomposer
Break the question into minimal sub-queries needed to answer it.
Each sub-query should be independently retrievable.
Return them in dependency order.
Prompt 3: Confidence gate
Given the question, retrieved context, and draft answer,
judge whether the evidence supports the answer.
Return one of: answer, retry, abstain.
If retry, say which retrieval change to try next.
Prompt 4: Step-back helper
State the higher-level concept or policy that governs this question.
Then produce one broad retrieval query and one specific retrieval query.
See explainer §5.5.
Section 10: Practical knobs
| Knob | Effect | Common failure |
|---|---|---|
| chunk size | recall vs precision | too small loses context |
| top-K | candidate pool size | too low misses good docs |
| rerank top-N | prompt budget | too high adds noise |
| hybrid weight / fusion | dense vs sparse balance | too dense-heavy loses exact entities |
| metadata filters | search scope | over-filtering removes useful docs |
| MMR lambda | novelty vs relevance | too high lets duplicates dominate |
| retry budget | reliability vs latency | no cap leads to an endless loop |
Section 11: Foundation-gap audit for Module 09
Module 09 assumes you already understand:
- advanced retrieval patterns
- when to iterate / loop
- self-evaluation as a control step
- retrieval behaving like a tool rather than a fixed pipeline
If these feel weak, re-read explainer chapters 4-5 before moving on.
Reading list
- HyDE paper / summaries
- One reranking implementation guide
- One hybrid-search or fusion-search guide
- One CRAG or self-RAG overview
Self-check
For full answers, see explainer §6.3.
- Why is raw user language often a bad retrieval query? (§2.1)
- Rewrite, expansion, decomposition, step-back — how do they differ? (§2.1-§2.4)
- Why can HyDE improve semantic recall? (§3.1)
- Parent-child retrieval — why not just index parent chunks directly? (§3.3)
- Why does dense + sparse beat dense-only on many enterprise corpora? (§3.4)
- Why rerank after retrieval instead of before retrieval? (§4.1)
- What does MMR solve in one sentence? (§4.4)
- What is the confidence gate allowed to do? (§5.2-§5.4)
- Why is this module really about loops and tools, not just better search? (§6.5-§6.6)