01. Week 8 — Advanced RAG

Key concepts to master

Basic RAG fails when the user query is not retrieval-shaped.
Query rewriting turns user language into retriever language.
Query expansion improves recall with synonyms, aliases, and missing entities.
Query decomposition breaks multi-hop questions into answerable sub-queries.
Step-back prompting asks for the higher-level principle before searching details.
HyDE (Hypothetical Document Embeddings) embeds a hypothetical answer, not the raw question.
Parent-child retrieval keeps chunk precision without losing document-level context.
Fusion retrieval combines dense and sparse search, then merges rankings.
Cross-encoder reranking boosts precision after cheap retrieval.
Metadata filtering and MMR (maximal marginal relevance) reduce junk and duplicate context: filters drop out-of-scope documents, and MMR drops near-duplicates.
Corrective loops decide when to retry, switch strategy, or abstain.
The confidence gate is the precursor to agent self-evaluation in Module 09.
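
To make "merges rankings" concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a dense and a sparse result list. The explainer may use a different fusion method; the document ids, the function name, and the constant k=60 are assumptions chosen for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1). k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from two retrievers.
dense_hits = ["doc_a", "doc_b", "doc_c"]   # embedding search
sparse_hits = ["doc_c", "doc_a", "doc_d"]  # keyword (BM25-style) search

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Because RRF uses only rank positions, it sidesteps the problem that dense and sparse scores live on different scales. Documents that both retrievers rank highly (doc_a and doc_c here) rise above documents found by only one.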

🧠 Mental models

Query rewriting: "Translate user language into the retriever's dialect."
HyDE: "Write a plausible answer ghost first, then search for documents that resemble it."
Hybrid search: "Use both a keyword flashlight and a semantic compass."
Reranking: "Do expensive background checks only on the shortlist."
Multi-hop retrieval: "Cross the river stone by stone; each retrieved fact enables the next jump."
Confidence gate: "A circuit breaker that stops the system from bluffing when evidence is weak."
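
The "background checks only on the shortlist" model can be sketched as a two-stage pipeline. Both scorers below are toy stand-ins, not real models: `cheap_score` plays the role of fast first-stage retrieval and `expensive_score` plays the role of a cross-encoder. All names, documents, and parameters are hypothetical.

```python
def cheap_score(query, doc):
    """Stage 1 stand-in: number of shared lowercase tokens (fast, crude)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def expensive_score(query, doc):
    """Stage 2 stand-in for a cross-encoder: number of shared adjacent word pairs."""
    q, d = query.lower().split(), doc.lower().split()
    return len(set(zip(q, q[1:])) & set(zip(d, d[1:])))

def retrieve_then_rerank(query, corpus, shortlist_size=3, final_k=2):
    # Stage 1: score every document cheaply and keep a short list.
    shortlist = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)
    shortlist = shortlist[:shortlist_size]
    # Stage 2: run the expensive scorer on the shortlist only.
    scored = [(expensive_score(query, doc), doc) for doc in shortlist]
    expensive_calls = len(scored)
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:final_k]], expensive_calls

corpus = [
    "plans for annual refund policy review meeting",
    "the refund policy for annual plans is thirty days",
    "annual report for shareholders",
    "office holiday schedule",
    "pricing for monthly plans",
]
top, calls = retrieve_then_rerank("refund policy for annual plans", corpus)
print(top[0], calls)
```

The first two documents tie on token overlap, so the cheap stage cannot separate them. The second stage looks at word order and promotes the document that actually states the policy, while running only three times instead of once per document in the corpus.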

⚠️ Common traps

Stacking rewrite, expansion, and decomposition blindly, so that each transformation drifts farther from the user's real need.
Reranking too many candidates, which can dominate latency and erase ANN speed gains.
Optimizing precision on easy head queries while recall collapses on messy long-tail questions.
Evaluating only final answers and never labeling whether retrieval itself found the right evidence.
Letting corrective loops retry indefinitely instead of abstaining, switching strategy, or escalating.
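
The last trap is easier to avoid when the retry budget is explicit in code. The following is a minimal sketch of a bounded corrective loop with a confidence gate, assuming each strategy returns (doc, score) pairs with scores in [0, 1]. The strategy stubs, the 0.5 threshold, and the attempt limit are hypothetical values for illustration, not recommendations from the explainer.

```python
def corrective_retrieve(query, strategies, min_score=0.5, max_attempts=3):
    """Try retrieval strategies in order; abstain when none clears the gate.

    strategies: list of (name, retrieve_fn) pairs. Each retrieve_fn(query)
    returns a list of (doc, score) pairs, scores assumed to lie in [0, 1].
    """
    trace = []
    for name, retrieve in strategies[:max_attempts]:  # hard cap on retries
        hits = retrieve(query)
        best = max((score for _, score in hits), default=0.0)
        trace.append((name, best))
        if best >= min_score:  # confidence gate: evidence is strong enough
            return {"action": "answer", "strategy": name, "hits": hits, "trace": trace}
    # Budget exhausted: abstain rather than bluff.
    return {"action": "abstain", "strategy": None, "hits": [], "trace": trace}

# Hypothetical strategy stubs standing in for real retrievers.
def dense_only(query):
    return [("doc_x", 0.31)]

def rewritten_query(query):
    return [("doc_y", 0.72), ("doc_z", 0.40)]

def finds_nothing(query):
    return []

result = corrective_retrieve("q3 churn drivers", [("dense", dense_only), ("rewrite", rewritten_query)])
gave_up = corrective_retrieve("q3 churn drivers", [("dense", dense_only), ("empty", finds_nothing)])
print(result["action"], result["strategy"], gave_up["action"])
```

Keeping a trace of each attempt and its best score makes it possible to audit later why the system answered or abstained. An "escalate" action could be added as a third outcome in the same shape.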

🔗 Prerequisites & connections

Builds on: Module 07 (RAG Fundamentals). Basic chunking, embeddings, ANN retrieval, and faithfulness metrics are assumed.
Feeds into: Module 09 (Agents & Tool Calling). Corrective loops, confidence gates, and multi-step retrieval become agent planning patterns.

💬 Interview phrasing

"Baseline RAG fails on multi-step business questions. What would you add first and why?"
"When does HyDE help, and when can it make retrieval worse?"
"Why does hybrid search usually beat dense-only retrieval in production?"
"How would you evaluate retrieval quality separately from generation quality?"
"When should the system retry retrieval versus answer 'I don't know'?"

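For the question about evaluating retrieval separately from generation, one possible starting point is to label which documents are relevant for each query and score the retriever alone. This sketch computes recall@k and mean reciprocal rank (MRR) over a tiny labeled set; the document ids and labels are made up for illustration.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

# Hypothetical labeled queries: what the retriever returned vs. what a human marked relevant.
labeled = [
    {"retrieved": ["d3", "d1", "d9"], "relevant": {"d1", "d2"}},
    {"retrieved": ["d7", "d8", "d5"], "relevant": {"d5"}},
]

mean_recall = sum(recall_at_k(q["retrieved"], q["relevant"], k=3) for q in labeled) / len(labeled)
mrr = sum(reciprocal_rank(q["retrieved"], q["relevant"]) for q in labeled) / len(labeled)
print(mean_recall, mrr)
```

Neither metric looks at the generated answer, so a drop here points at retrieval rather than at the prompt or the model.
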
⏱️ Difficulty markers

🟢 metadata filtering and MMR
🟡 query rewriting and expansion
🟡 hybrid dense + sparse retrieval
🔴 cross-encoder reranking trade-offs
🔴 multi-hop retrieval orchestration
🔴 retrieval evaluation and confidence gating

Self-check questions

For full Q&A and interview-style answers, see explainer §6.3.

Why does basic RAG often fail on multi-step business questions? (§1.2)
Rewrite vs expand vs decompose — when does each help? (§2.1-§2.3)
What is step-back prompting, and why can it improve recall? (§2.4)
Why can HyDE beat direct embedding of the user query? (§3.1)
Parent-child retrieval vs flat chunk retrieval — what trade-off changes? (§3.3)
Hybrid retrieval: why does dense + sparse usually beat either alone? (§3.4)
Why rerank top-K instead of cross-encoding the whole corpus? (§4.1)
What does MMR optimize that simple top-score sorting does not? (§4.4)
What is the confidence gate, and what actions can it trigger? (§5.2-§5.4)
Why is this module the bridge into agents and tool calling? (§6.5-§6.6)
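
As a study aid for the MMR question, here is a minimal sketch of greedy maximal marginal relevance selection. It assumes precomputed similarities rather than real embeddings; the ids, similarity values, and the weight lam=0.7 are hypothetical.

```python
def mmr_select(query_sim, doc_sim, k, lam=0.7):
    """Greedy MMR: trade relevance to the query against redundancy with picks so far.

    query_sim: dict mapping doc id -> similarity to the query.
    doc_sim:   function (id_a, id_b) -> similarity between two documents.
    lam:       1.0 means pure relevance; lower values penalize redundancy more.
    """
    selected = []
    candidates = set(query_sim)
    while candidates and len(selected) < k:
        def mmr_score(doc_id):
            redundancy = max((doc_sim(doc_id, s) for s in selected), default=0.0)
            return lam * query_sim[doc_id] - (1 - lam) * redundancy
        best = max(sorted(candidates), key=mmr_score)  # sorted for determinism
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical similarities: "a_dup" is a near-duplicate of "a".
query_sim = {"a": 0.90, "a_dup": 0.88, "b": 0.70}
pair_sims = {frozenset(("a", "a_dup")): 0.95}

def doc_sim(x, y):
    return pair_sims.get(frozenset((x, y)), 0.1)

top_score = sorted(query_sim, key=query_sim.get, reverse=True)[:2]
mmr_picks = mmr_select(query_sim, doc_sim, k=2)
print(top_score, mmr_picks)
```

Plain top-score sorting returns the document and its near-duplicate, while MMR skips the duplicate in favor of a less similar document that adds new context.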

Health check

[ ] All 6 explainer chapters read at least once
[ ] Can transform one raw question into rewrite + expansion + decomposition
[ ] Can explain HyDE, reranking, and MMR without notes
[ ] Assignment shipped with one corrective loop and eval results
[ ] Daily-recall prompts answerable from memory
[ ] Ready to start Module 09 with loop-thinking already internalized