01. Week 8: Advanced RAG
Key concepts to master
- Basic RAG fails when the user query is not retrieval-shaped, for example when it is vague, conversational, or multi-step.
- Query rewriting turns user language into retriever language.
- Query expansion improves recall with synonyms, aliases, and missing entities.
- Query decomposition breaks multi-hop questions into answerable sub-queries.
- Step-back prompting asks for the higher-level principle before searching details.
- HyDE embeds a hypothetical answer, not the raw question.
- Parent-child retrieval keeps chunk precision without losing document-level context.
- Fusion retrieval combines dense and sparse search, then merges rankings.
- Cross-encoder reranking boosts precision after cheap retrieval.
- Metadata filtering and MMR reduce junk and duplicate context.
- Corrective loops decide when to retry, switch strategy, or abstain.
- The confidence gate is the precursor to agent self-evaluation in Module 09.
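The "merges rankings" step of fusion retrieval can be sketched with reciprocal rank fusion (RRF), one common merging rule. This is a minimal illustration, not a prescribed implementation: the function name, the document IDs, and the constant `k=60` (a widely used default) are assumptions made for the example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from a dense (embedding) and a sparse (keyword) retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

RRF uses only rank positions, so it sidesteps the problem that dense similarity scores and sparse keyword scores live on different scales.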
🧠 Mental models
- Query rewriting: "Translate user language into the retriever's dialect."
- HyDE: "Write a plausible answer ghost first, then search for documents that resemble it."
- Hybrid search: "Use both a keyword flashlight and a semantic compass."
- Reranking: "Do expensive background checks only on the shortlist."
- Multi-hop retrieval: "Cross the river stone by stone; each retrieved fact enables the next jump."
- Confidence gate: "A circuit breaker that stops the system from bluffing when evidence is weak."
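The circuit-breaker model can be made concrete with a small sketch. Everything here is hypothetical: the `gated_retrieve` name, the `(passage, score)` hit shape, the `0.6` threshold, and the stub retrievers stand in for whatever retrievers and calibrated scores a real system would use.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (passage text, relevance score assumed to lie in [0, 1])

def gated_retrieve(
    query: str,
    strategies: List[Tuple[str, Callable[[str], List[Hit]]]],
    threshold: float = 0.6,
    min_hits: int = 2,
) -> dict:
    """Try each retrieval strategy once, in order; abstain if none clears the gate."""
    for name, retrieve in strategies:
        hits = retrieve(query)
        strong = [h for h in hits if h[1] >= threshold]
        if len(strong) >= min_hits:
            # Enough strong evidence: let generation proceed.
            return {"action": "answer", "strategy": name, "evidence": strong}
    # Every strategy was tried exactly once and none passed: stop instead of bluffing.
    return {"action": "abstain", "strategy": None, "evidence": []}

# Stub retrievers with made-up scores, for illustration only.
def dense_only(query: str) -> List[Hit]:
    return [("p1", 0.41), ("p2", 0.38)]

def hybrid(query: str) -> List[Hit]:
    return [("p3", 0.82), ("p4", 0.67), ("p5", 0.30)]

result = gated_retrieve("q", [("dense", dense_only), ("hybrid", hybrid)])
print(result["action"], result["strategy"])  # answer hybrid
```

Because the loop iterates over a fixed list of strategies, the number of retries is bounded by construction, and abstaining is the explicit fallback.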
⚠️ Common traps
- Stacking rewrite, expansion, and decomposition blindly, so the final query drifts away from the user's real need.
- Reranking too many candidates, which can dominate latency and erase ANN speed gains.
- Optimizing precision on easy head queries while recall collapses on messy long-tail questions.
- Evaluating only final answers and never labeling whether retrieval itself found the right evidence.
- Letting corrective loops retry indefinitely instead of abstaining, switching strategy, or escalating.
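To avoid the trap of judging only final answers, retrieval can be scored on its own against labeled evidence. The sketch below computes recall@k and mean reciprocal rank (MRR) over a tiny made-up label set; the function names and document IDs are assumptions for illustration.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the labeled relevant docs that appear in the top-k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant doc, or 0.0 if none was retrieved."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

# Hypothetical labeled queries: (retrieved doc IDs in rank order, relevant doc IDs).
labeled = [
    (["d3", "d1", "d7"], {"d1", "d9"}),
    (["d2", "d4"], {"d2"}),
]

mean_recall = sum(recall_at_k(r, rel, k=3) for r, rel in labeled) / len(labeled)
mrr = sum(reciprocal_rank(r, rel) for r, rel in labeled) / len(labeled)
print(mean_recall, mrr)  # 0.75 0.75
```

Tracking these numbers separately from answer-quality metrics shows whether a bad answer came from missing evidence or from generation.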
🔗 Prerequisites & connections
- Builds on: Module 07 (RAG Fundamentals). Basic chunking, embeddings, ANN retrieval, and faithfulness metrics are assumed.
- Feeds into: Module 09 (Agents & Tool Calling). Corrective loops, confidence gates, and multi-step retrieval become agent planning patterns.
💬 Interview phrasing
- "Baseline RAG fails on multi-step business questions. What would you add first and why?"
- "When does HyDE help, and when can it make retrieval worse?"
- "Why does hybrid search usually beat dense-only retrieval in production?"
- "How would you evaluate retrieval quality separately from generation quality?"
- "When should the system retry retrieval versus answer 'I don't know'?"
⏱️ Difficulty markers
- 🟢 metadata filtering and MMR
- 🟡 query rewriting and expansion
- 🟡 hybrid dense + sparse retrieval
- 🔴 cross-encoder reranking trade-offs
- 🔴 multi-hop retrieval orchestration
- 🔴 retrieval evaluation and confidence gating
Self-check questions
For full Q&A and interview-style answers, see explainer §6.3.
- Why does basic RAG often fail on multi-step business questions? (§1.2)
- Rewrite vs expand vs decompose — when does each help? (§2.1-§2.3)
- What is step-back prompting, and why can it improve recall? (§2.4)
- Why can HyDE beat direct embedding of the user query? (§3.1)
- Parent-child retrieval vs flat chunk retrieval — what trade-off changes? (§3.3)
- Hybrid retrieval: why does dense + sparse usually beat either alone? (§3.4)
- Why rerank top-K instead of cross-encoding the whole corpus? (§4.1)
- What does MMR optimize that simple top-score sorting does not? (§4.4)
- What is the confidence gate, and what actions can it trigger? (§5.2-§5.4)
- Why is this module the bridge into agents and tool calling? (§6.5-§6.6)
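As a way to check your answer to the MMR question, here is a minimal pure-Python sketch of maximal marginal relevance. The vectors, document names, and the `lam` trade-off weight are invented for the example; a real system would use actual embeddings and likely a vectorized library.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query_vec, doc_vecs, top_n, lam=0.5):
    """Greedily pick docs that are relevant to the query but not redundant
    with docs already selected. lam=1.0 reduces to plain top-score sorting."""
    selected = []
    candidates = list(doc_vecs)
    while candidates and len(selected) < top_n:
        def score(doc_id):
            relevance = cosine(query_vec, doc_vecs[doc_id])
            redundancy = max(
                (cosine(doc_vecs[doc_id], doc_vecs[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy 2-D "embeddings": two near-duplicate passages and one distinct passage.
docs = {
    "refund_policy": [0.9, 0.1],
    "refund_policy_copy": [0.9, 0.12],
    "shipping_policy": [0.7, -0.7],
}
query = [1.0, 0.0]

print(mmr(query, docs, top_n=2, lam=1.0))  # ['refund_policy', 'refund_policy_copy']
print(mmr(query, docs, top_n=2, lam=0.5))  # ['refund_policy', 'shipping_policy']
```

With `lam=1.0` the near-duplicate takes the second slot; with `lam=0.5` the redundancy penalty pushes it out in favor of a passage that adds new context.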
Health check
- [ ] All 6 explainer chapters read at least once
- [ ] Can transform one raw question into rewrite + expansion + decomposition
- [ ] Can explain HyDE, reranking, and MMR without notes
- [ ] Assignment shipped with one corrective loop and eval results
- [ ] Daily-recall prompts answerable from memory
- [ ] Ready to start Module 09 with loop-thinking already internalized