
03. Week 8 — Advanced RAG

For a deeper understanding, see 02_explainer.md: narrative with failures, step-by-step transformations, diagrams, and the Module 09 bridge. This file is the quick-reference glossary: patterns, tables, formulas, and prompts.

Section 1 — Why basic RAG plateaus

Basic RAG is usually:

User query
Embed once
Top-k vector search
Stuff chunks into prompt
Answer

This works for direct factual questions. It breaks when the question is ambiguous, underspecified, multi-hop, or metadata-heavy.

| Failure mode | Why it happens | Better pattern |
|---|---|---|
| Wrong entity | Query lacks exact names / aliases | Rewrite + expansion |
| Multi-hop question | One retrieval call cannot satisfy all hops | Decompose |
| Missing keyword match | Dense search ignores rare exact tokens | Hybrid dense + sparse |
| Duplicate chunks | Retriever returns near-identical passages | MMR |
| Good docs buried at rank 8 | Retriever is cheap, not precise | Cross-encoder reranking |
| Low-confidence answer | Retrieval looked weak | Confidence gate + retry |

See explainer chapter 1.

Section 2 — Mental model

Module 07 gave you the librarian. Module 08 promotes that librarian into a head researcher.

Named placeholders from the explainer:
- the rewriter: query rewriting
- the hypothesis: HyDE
- the cross-checker: reranker
- the multi-step plan: decomposition
- the confidence gate: self-evaluation before answer / retry

Core loop:

Question
Transform query
Retrieve with one or more strategies
Cross-check / filter
Answer draft
Confidence gate
  ├─ high confidence → answer
  └─ low confidence  → retry / reroute / abstain
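A minimal Python sketch of the core loop above. The four callables (`transform`, `retrieve`, `draft_answer`, `gate`) are hypothetical stand-ins for your own LLM and retriever calls, and the budget of 3 attempts is an illustrative assumption. The toy usage fails on the first attempt and succeeds on the second.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LoopResult:
    status: str              # "answer" or "abstain"
    answer: Optional[str]
    attempts: int


def research_loop(
    question: str,
    transform: Callable[[str, int], List[str]],     # hypothetical: query transform, may vary per attempt
    retrieve: Callable[[List[str]], List[str]],     # hypothetical: one or more retrieval strategies
    draft_answer: Callable[[str, List[str]], str],  # hypothetical: LLM answer over the evidence
    gate: Callable[[str, List[str], str], str],     # hypothetical: returns "answer", "retry" or "abstain"
    max_attempts: int = 3,                          # retry budget, so the loop cannot run forever
) -> LoopResult:
    for attempt in range(1, max_attempts + 1):
        queries = transform(question, attempt)
        evidence = retrieve(queries)
        draft = draft_answer(question, evidence)
        verdict = gate(question, evidence, draft)
        if verdict == "answer":
            return LoopResult("answer", draft, attempt)
        if verdict == "abstain":
            return LoopResult("abstain", None, attempt)
        # "retry": go around again; transform sees the new attempt number and can reroute
    return LoopResult("abstain", None, max_attempts)


# Toy usage with toy data: attempt 1 retrieves nothing, attempt 2 finds support.
corpus = {"q3": "Q3 revenue grew 4% in EMEA."}
result = research_loop(
    "EMEA Q3 growth?",
    transform=lambda q, n: ["q3"] if n > 1 else ["unknown"],
    retrieve=lambda qs: [corpus[q] for q in qs if q in corpus],
    draft_answer=lambda q, ev: ev[0] if ev else "",
    gate=lambda q, ev, d: "answer" if ev else "retry",
)
print(result)  # LoopResult(status='answer', answer='Q3 revenue grew 4% in EMEA.', attempts=2)
```

Keeping the budget as an explicit parameter makes the "endless loop" failure impossible by construction: when attempts run out, the loop abstains instead of guessing.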

Section 3 — Query transformation taxonomy

| Pattern | What it does | Best when | Risk |
|---|---|---|---|
| Rewrite | Make the query retrieval-friendly | User asks vaguely | LLM removes an important constraint |
| Expand | Add synonyms, aliases, nearby terms | Domain has jargon / abbreviations | Recall improves, precision may drop |
| Decompose | Split one hard question into sub-queries | Multi-hop or compare/contrast | More latency, orchestration needed |
| Step-back | Ask for the higher-level principle first | User asks about a narrow symptom, corpus stores general policy | Can become too abstract |

Worked example

User query:

“Compare Q3 and Q4 revenue growth across all regions.”

Transformation stack:

  1. Rewrite
     “Compare revenue growth percentages for Q3 and Q4 for APAC, EMEA, LATAM, and North America.”
  2. Expand
     - Add synonyms: revenue, sales, topline
     - Add aliases: NA = North America
  3. Decompose
     - “What was Q3 revenue growth by region?”
     - “What was Q4 revenue growth by region?”
     - “Compute deltas and summarize winners / losers.”
  4. Step-back
     “Which documents summarize quarterly regional revenue performance?”
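The Expand step can be sketched as a glossary lookup. This minimal Python version assumes hypothetical `SYNONYMS` and `ALIASES` tables seeded from the worked example; in a real system they would come from a domain glossary or an LLM call. Each variant can then be retrieved separately and the result lists fused (for example with RRF, Section 5).

```python
import re
from typing import Dict, List

# Illustrative tables taken from the worked example; not a real glossary.
SYNONYMS: Dict[str, List[str]] = {"revenue": ["sales", "topline"]}
ALIASES: Dict[str, str] = {"NA": "North America"}


def expand_query(query: str) -> List[str]:
    """Return the query with aliases spelled out, plus one variant per synonym."""
    # Whole-word, case-sensitive alias match so "NA" is not replaced inside other words.
    for short, full in ALIASES.items():
        query = re.sub(rf"\b{re.escape(short)}\b", full, query)
    variants = [query]
    for term, alternatives in SYNONYMS.items():
        pattern = rf"\b{re.escape(term)}\b"
        if re.search(pattern, query, flags=re.IGNORECASE):
            for alt in alternatives:
                variants.append(re.sub(pattern, alt, query, flags=re.IGNORECASE))
    return variants


print(expand_query("Q3 revenue growth for NA"))
# ['Q3 revenue growth for North America', 'Q3 sales growth for North America',
#  'Q3 topline growth for North America']
```

Producing separate variants, rather than one long query stuffed with synonyms, keeps each embedding focused; that is one way to limit the precision drop listed as the risk of expansion.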

See explainer chapter 2.

Section 4 — HyDE

HyDE = Hypothetical Document Embeddings.

Process:

  1. Ask an LLM to draft a plausible answer paragraph.
  2. Embed that draft.
  3. Retrieve real documents using that embedding.
  4. Throw away the draft.
  5. Answer only from retrieved evidence.
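A minimal Python sketch of steps 1 to 4, assuming a hypothetical `generate` (LLM) and `embed` (embedding model) passed in as functions; step 5 happens in the caller. The bag-of-words `toy_embed` and canned `fake_llm` exist only so the example runs.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # zero vectors get similarity 0


def hyde_retrieve(
    question: str,
    generate: Callable[[str], str],   # hypothetical LLM call
    embed: Callable[[str], Vector],   # hypothetical embedding model
    docs: List[str],
    k: int = 3,
) -> List[str]:
    # Step 1: draft a plausible answer paragraph (it may contain wrong facts).
    draft = generate(f"Write a short passage that answers: {question}")
    # Step 2: embed the draft instead of the raw question.
    query_vec = embed(draft)
    # Step 3: rank the real documents by similarity to the draft.
    scored: List[Tuple[float, str]] = [(cosine(query_vec, embed(d)), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Step 4: the draft is discarded here; only real documents are returned.
    return [d for _, d in scored[:k]]


# Toy stand-ins: a bag-of-words "embedding" over a tiny vocabulary and a canned "LLM".
VOCAB = ["refund", "policy", "days", "return", "shipping", "cost"]


def toy_embed(text: str) -> List[float]:
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(w)) for w in VOCAB]


def fake_llm(prompt: str) -> str:
    return "Under the refund policy you can return an item within a set number of days."


docs = ["Refund policy: return items within 30 days.", "Shipping cost depends on weight."]
print(hyde_retrieve("can I get my money back?", fake_llm, toy_embed, docs, k=1))
# ['Refund policy: return items within 30 days.']
```

With the same toy model, the raw question embeds to an all-zero vector, so only the hypothetical draft lands near the refund document. That is the "closer to the answer neighborhood" effect in miniature.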

Why it can help:
- User questions are often short and messy.
- Good answer passages are longer and semantically richer.
- The hypothetical answer's embedding tends to land closer to real answer passages than the raw question's embedding does.

When it shines:
- Conceptual questions
- Ambiguous phrasing
- Questions with implied terminology

When it hurts:
- Easy factual lookups already phrased clearly (an extra LLM call for little gain)
- When the hypothetical answer drifts into the wrong concept and pulls retrieval with it

See explainer §3.1-§3.2.

Section 5 — Advanced retrieval patterns

Parent-child retrieval

Index small child chunks for precision. Return larger parent chunks or document windows for context.

Parent doc
  ├─ child 1
  ├─ child 2
  ├─ child 3
  └─ child 4

Retrieve on child. Answer with parent.

Why it helps:
- Small chunks match better.
- Larger parents prevent context starvation.
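A minimal Python sketch of "retrieve on child, answer with parent". The fixed word-window chunking and the word-overlap `score` are simplifying assumptions; a real system would use a proper chunker and an embedding or BM25 scorer.

```python
from typing import Callable, Dict, List, Tuple


def split_into_children(parents: Dict[str, str], chunk_words: int = 8) -> List[Tuple[str, str]]:
    """Split every parent into small word windows, remembering each child's parent id."""
    children: List[Tuple[str, str]] = []
    for parent_id, text in parents.items():
        words = text.split()
        for start in range(0, len(words), chunk_words):
            children.append((" ".join(words[start:start + chunk_words]), parent_id))
    return children


def retrieve_parents(
    query: str,
    children: List[Tuple[str, str]],
    parents: Dict[str, str],
    score: Callable[[str, str], float],  # hypothetical relevance scorer (query, child text)
    top_children: int = 5,
) -> List[str]:
    # Retrieve on child: small chunks give a sharper match.
    ranked = sorted(children, key=lambda c: score(query, c[0]), reverse=True)[:top_children]
    # Answer with parent: map hits back to parents, de-duplicated, best-first.
    seen: set = set()
    result: List[str] = []
    for _, parent_id in ranked:
        if parent_id not in seen:
            seen.add(parent_id)
            result.append(parents[parent_id])
    return result


def overlap(query: str, text: str) -> float:
    # Toy scorer: number of shared lowercase words.
    return float(len(set(query.lower().split()) & set(text.lower().split())))


parents = {
    "p1": "Section 4 covers refunds. Refunds are issued within 30 days of purchase to the original card.",
    "p2": "Section 5 covers shipping. Orders ship within 2 business days.",
}
children = split_into_children(parents)
print(retrieve_parents("refunds issued within 30 days", children, parents, overlap, top_children=2))
# ['Section 4 covers refunds. Refunds are issued within 30 days of purchase to the original card.']
```

Both winning children come from p1, so the prompt receives one complete parent instead of two fragments; the de-duplication step is what keeps parents from being repeated.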

Fusion retrieval

Combine different retrieval signals. Most common combination: dense + sparse.

query
 ├─ dense retriever  → semantic matches
 └─ sparse retriever → exact-token matches
        fuse ranks
        rerank

Common fusion method: RRF — Reciprocal Rank Fusion

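score(doc) = Σ_i 1 / (k + rank_i(doc))

Here i runs over the retrievers, rank_i(doc) is the document's 1-based position in retriever i's ranked list (a list that does not contain the document adds nothing), and k is a smoothing constant, commonly set to 60, that damps the advantage of the very top ranks.

You do not need score calibration between retrievers. You only need rankings.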
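A minimal Python sketch of RRF over ranked lists of document ids. The raw retriever scores are never read. The default k = 60 is assumed here as the commonly used value.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse ranked lists of doc ids with score(doc) = sum_i 1 / (k + rank_i(doc))."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["d1", "d2", "d3"]   # semantic matches
sparse = ["d3", "d1", "d4"]  # exact-token matches
print([doc for doc, _ in reciprocal_rank_fusion([dense, sparse])])
# ['d1', 'd3', 'd2', 'd4']
```

d1 and d3 appear in both lists, so they outrank d2 and d4, which each appear in only one. Agreement across retrievers is rewarded even though neither retriever's scores were compared.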

See explainer §3.3-§3.4.

Section 6 — Reranking and filtering

Bi-encoder vs cross-encoder

| Model | How it scores | Speed | Quality |
|---|---|---|---|
| Bi-encoder retriever | Embeds query and docs separately | Fast | Good recall |
| Cross-encoder reranker | Reads query + doc together | Slower | Better precision |

Pattern:
- Retrieve top-K cheaply.
- Rerank to top-N precisely.

Typical values:
- top-K = 20 to 100
- top-N = 3 to 10
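A minimal Python sketch of the two-stage pattern. `retrieve` and `cross_score` are hypothetical callables; for example, a cross-encoder model such as the sentence-transformers `CrossEncoder` could supply the pair scores, ideally in one batched call. The defaults top_k = 50 and top_n = 5 are picked from inside the typical ranges.

```python
from typing import Callable, List


def retrieve_then_rerank(
    query: str,
    retrieve: Callable[[str, int], List[str]],  # hypothetical cheap retriever (bi-encoder or BM25)
    cross_score: Callable[[str, str], float],   # hypothetical cross-encoder: scores one (query, doc) pair
    top_k: int = 50,
    top_n: int = 5,
) -> List[str]:
    candidates = retrieve(query, top_k)  # stage 1: wide, cheap candidate pool
    # Stage 2: the expensive model reads each query/doc pair jointly, but only top_k times.
    scored = [(cross_score(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


# Toy stand-ins for both stages.
docs = ["pricing page", "refund policy for annual plans", "annual plan refund window is 30 days"]


def toy_retrieve(query: str, k: int) -> List[str]:
    return docs[:k]


def toy_cross_score(query: str, doc: str) -> float:
    return float(len(set(query.split()) & set(doc.split())))


print(retrieve_then_rerank("annual plan refund window", toy_retrieve, toy_cross_score, top_k=3, top_n=2))
# ['annual plan refund window is 30 days', 'refund policy for annual plans']
```

The cost argument is in the loop: the cross-encoder runs top_k times per query, not once per document in the corpus, which is why reranking happens after retrieval rather than before it.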

Metadata filtering

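Apply filters before retrieval (pre-filtering narrows the search space) or after it (post-filtering can leave fewer than K results). Examples:
- date range
- product line
- document type
- access scope
- region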
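Most vector databases accept filters natively. This pure-Python pre-filter only sketches the logic, using a hypothetical `Doc` type and illustrative metadata keys (`region`, `doc_type`, `date`).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict, List, Optional


@dataclass
class Doc:
    text: str
    meta: Dict[str, Any] = field(default_factory=dict)


def prefilter(docs: List[Doc], *, since: Optional[date] = None, **equals: Any) -> List[Doc]:
    """Keep docs dated on/after `since` whose metadata matches every key=value in `equals`."""
    kept: List[Doc] = []
    for doc in docs:
        # Undated docs are dropped whenever a date range is requested.
        if since is not None and doc.meta.get("date", date.min) < since:
            continue
        if any(doc.meta.get(key) != value for key, value in equals.items()):
            continue
        kept.append(doc)
    return kept


docs = [
    Doc("2024 EMEA leave policy", {"region": "EMEA", "doc_type": "policy", "date": date(2024, 3, 1)}),
    Doc("2023 EMEA leave policy", {"region": "EMEA", "doc_type": "policy", "date": date(2023, 6, 1)}),
    Doc("2024 APAC leave policy", {"region": "APAC", "doc_type": "policy", "date": date(2024, 5, 1)}),
]
print([d.text for d in prefilter(docs, since=date(2024, 1, 1), region="EMEA", doc_type="policy")])
# ['2024 EMEA leave policy']
```

Dropping undated documents is a design choice that can turn into over-filtering. If your corpus has patchy metadata, keeping unknowns may be the safer default.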

MMR — Maximal Marginal Relevance

MMR balances relevance and diversity.

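MMR(doc) = λ * sim(doc, query) - (1 - λ) * max_{s ∈ selected} sim(doc, s)

The first term is relevance; the second is redundancy, the similarity to the closest document already picked. Selection is greedy: add the highest-scoring candidate to the selected set, rescore the rest, and repeat until N documents are chosen.

High λ → more relevance. Low λ → more diversity.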

Use when top results are near-duplicates. See explainer chapter 4.
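A minimal greedy MMR in Python, using cosine similarity for both relevance and redundancy (a common choice, assumed here). The three 2-D vectors are toy data built so that doc 1 nearly duplicates doc 0.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(
    query_vec: Vector,
    doc_vecs: List[Vector],
    sim: Callable[[Vector, Vector], float],
    lam: float = 0.7,
    n: int = 3,
) -> List[int]:
    """Greedy MMR: return indices of up to n documents, balancing relevance and novelty."""
    selected: List[int] = []
    candidates = list(range(len(doc_vecs)))

    def mmr_score(i: int) -> float:
        relevance = sim(query_vec, doc_vecs[i])
        # Redundancy = similarity to the closest already-selected document (0 if none yet).
        redundancy = max((sim(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < n:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [
    [0.985, 0.174],   # very relevant
    [0.978, 0.208],   # near-duplicate of doc 0
    [0.866, -0.5],    # relevant, but from a different angle
]
print(mmr(query, docs, cosine, lam=0.5, n=2))  # [0, 2]: the near-duplicate is skipped
print(mmr(query, docs, cosine, lam=1.0, n=2))  # [0, 1]: pure relevance keeps the duplicate
```

Setting λ = 1.0 reduces MMR to plain relevance ranking, which is a quick way to see how much diversity a given λ is buying.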

Section 7 — Agentic RAG patterns

| Pattern | Core idea | Trigger |
|---|---|---|
| CRAG (Corrective RAG) | Evaluate retrieved context, then correct if weak | Retrieval quality looks low |
| Self-RAG | Model learns to retrieve, critique, and refine | Multi-step answer generation |
| Iterative retrieval | Search, read, search again | Answer requires another hop |
| Routing | Choose one retrieval strategy from many | Different query types need different tools |

Possible routes:
- FAQ → sparse-heavy
- policy question → hybrid + metadata filters
- compare/contrast → decomposition + reranking
- broad conceptual query → step-back + HyDE
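In practice the router is often an LLM classifier. This keyword router is a deliberately simple, hypothetical stand-in that only shows the control flow; the rules and strategy labels are assumptions mirroring the routes listed above.

```python
import re
from typing import Dict, List

# Route table mirroring the list above; strategy names are illustrative labels.
ROUTES: Dict[str, List[str]] = {
    "faq": ["sparse"],
    "policy": ["hybrid", "metadata_filters"],
    "compare": ["decompose", "rerank"],
    "conceptual": ["step_back", "hyde"],
}

# Hypothetical keyword rules, checked in order; the first match wins.
RULES = [
    ("compare", r"\b(compare|versus|vs|difference between)\b"),
    ("policy", r"\b(policy|allowed|eligible|entitled)\b"),
    ("conceptual", r"\b(why|how does|explain)\b"),
]


def route(query: str) -> str:
    q = query.lower()
    for label, pattern in RULES:
        if re.search(pattern, q):
            return label
    return "faq"  # default: short direct lookups go sparse-heavy


for q in [
    "Compare Q3 and Q4 revenue growth across all regions.",
    "Is the parental leave policy different for contractors?",
    "Why does dense retrieval miss product codes?",
    "reset VPN token",
]:
    print(route(q), ROUTES[route(q)])
```

Keeping the route table separate from the classifier means you can swap the keyword rules for an LLM call later without touching the downstream strategies.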

See explainer chapter 5.

Section 8 — Confidence gate

The confidence gate asks:
- Did retrieval return enough support?
- Are the top documents mutually consistent?
- Did the answer cite evidence for each claim?
- Did the model actually answer all sub-parts?

Actions:
- answer
- retry with rewrite
- switch retriever
- decompose
- abstain politely

This is the immediate conceptual bridge to Module 09. Agents will use the same pattern across many tools.
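One way to turn the gate's four questions into one of its actions is a rule-based decision over signals computed earlier (for example, by an LLM judge using Prompt 3 in Section 9). The `GateSignals` fields, thresholds, and order of checks below are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class GateSignals:
    supporting_docs: int     # retrieved passages judged relevant
    docs_consistent: bool    # top documents do not contradict each other
    cited_fraction: float    # share of answer claims backed by a citation, 0..1
    subparts_answered: int
    subparts_total: int


def confidence_gate(s: GateSignals, attempts_left: int,
                    min_support: int = 2, min_cited: float = 0.8) -> str:
    """Map gate signals to one action: answer, decompose, retry_with_rewrite,
    switch_retriever, or abstain. Thresholds are illustrative assumptions."""
    if (s.supporting_docs >= min_support and s.docs_consistent
            and s.cited_fraction >= min_cited and s.subparts_answered == s.subparts_total):
        return "answer"
    if attempts_left <= 0:
        return "abstain"             # budget spent: abstain politely rather than guess
    if s.subparts_answered < s.subparts_total:
        return "decompose"           # some sub-parts unanswered: split the question
    if s.supporting_docs < min_support:
        return "retry_with_rewrite"  # too little evidence: the query itself may be the problem
    return "switch_retriever"        # evidence exists but conflicts or is under-cited


print(confidence_gate(GateSignals(3, True, 1.0, 2, 2), attempts_left=1))  # answer
print(confidence_gate(GateSignals(3, True, 1.0, 1, 2), attempts_left=2))  # decompose
```

Checking the budget before any retry action guarantees the gate can always terminate, the same property an agent loop needs across many tools.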

Section 9 — Retrieval prompts you can reuse

Prompt 1 — Query rewriter

Rewrite the user question for retrieval.
Preserve all constraints.
Return:
1. rewritten_query
2. synonyms_or_aliases
3. metadata_filters

Prompt 2 — Decomposer

Break the question into minimal sub-queries needed to answer it.
Each sub-query should be independently retrievable.
Return them in dependency order.
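If the decomposer also reports which sub-queries depend on which, the dependency order can be computed rather than trusted. A sketch using Python's standard-library `graphlib` (3.9+) with the worked example's sub-queries; the `DEPENDS_ON` structure is a hypothetical output format, not something the prompt above guarantees.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical decomposer output for the worked example in Section 3.
SUBQUERIES: Dict[str, str] = {
    "q3": "What was Q3 revenue growth by region?",
    "q4": "What was Q4 revenue growth by region?",
    "delta": "Compute deltas and summarize winners / losers.",
}
DEPENDS_ON: Dict[str, Set[str]] = {"q3": set(), "q4": set(), "delta": {"q3", "q4"}}


def execution_batches(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group sub-queries into batches; each batch depends only on earlier batches."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan is circular
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # independent of each other: safe to run in parallel
        batches.append(ready)
        sorter.done(*ready)
    return batches


for batch in execution_batches(DEPENDS_ON):
    print([SUBQUERIES[q] for q in batch])
```

The Q3 and Q4 lookups land in the same batch and can be retrieved concurrently, which offsets some of the extra latency listed as the cost of decomposition.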

Prompt 3 — Confidence gate

Given the question, retrieved context, and draft answer,
judge whether the evidence supports the answer.
Return one of: answer, retry, abstain.
If retry, say which retrieval change to try next.

Prompt 4 — Step-back helper

State the higher-level concept or policy that governs this question.
Then produce one broad retrieval query and one specific retrieval query.

See explainer §5.5.

Section 10 — Practical knobs

| Knob | Effect | Common failure |
|---|---|---|
| chunk size | recall vs precision | too small loses context |
| top-K | candidate pool size | too low misses good docs |
| rerank top-N | prompt budget | too high adds noise |
| hybrid weight / fusion | dense vs sparse balance | too dense-heavy: exact entities get lost |
| metadata filters | search scope | over-filtering removes useful docs |
| MMR lambda | novelty vs relevance | too high: duplicates dominate |
| retry budget | reliability vs latency | unbounded: endless loop |
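The numeric knobs can be collected into one validated config object so each experiment changes one value at a time. A hypothetical Python sketch: the top-K and top-N defaults sit inside the typical ranges from Section 6, while the chunk size, fusion weight, MMR λ, and retry budget defaults are illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RetrievalConfig:
    chunk_size_tokens: int = 512  # illustrative; too small loses context
    top_k: int = 50               # candidate pool (typical 20 to 100)
    rerank_top_n: int = 5         # passages sent to the prompt (typical 3 to 10)
    dense_weight: float = 0.5     # dense share in a weighted dense/sparse fusion
    mmr_lambda: float = 0.7       # high = relevance, low = diversity
    retry_budget: int = 2         # caps the confidence-gate loop

    def __post_init__(self) -> None:
        if self.chunk_size_tokens <= 0:
            raise ValueError("chunk_size_tokens must be positive")
        if not 1 <= self.rerank_top_n <= self.top_k:
            raise ValueError("need 1 <= rerank_top_n <= top_k")
        for name in ("dense_weight", "mmr_lambda"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be in [0, 1]")
        if self.retry_budget < 0:
            raise ValueError("retry_budget must be >= 0")


base = RetrievalConfig()
wider = replace(base, top_k=100)  # one-knob experiment; base is unchanged
print(wider.top_k, base.top_k)    # 100 50
```

Making the config frozen and deriving variants with `dataclasses.replace` re-runs validation for every experiment, so an impossible combination such as top-N larger than top-K fails immediately.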

Section 11 — Foundation-gap audit for Module 09

Module 09 assumes you already understand:
- advanced retrieval patterns
- when to iterate / loop
- self-evaluation as a control step
- retrieval behaving like a tool rather than a fixed pipeline

If these feel weak, re-read explainer chapters 4-5 before moving on.

Reading list

  1. HyDE paper / summaries
  2. One reranking implementation guide
  3. One hybrid-search or fusion-search guide
  4. One CRAG or self-RAG overview

Self-check

For full answers, see explainer §6.3.

  1. Why is raw user language often a bad retrieval query? (§2.1)
  2. Rewrite, expansion, decomposition, step-back — how do they differ? (§2.1-§2.4)
  3. Why can HyDE improve semantic recall? (§3.1)
  4. Parent-child retrieval — why not just index parent chunks directly? (§3.3)
  5. Why does dense + sparse beat dense-only on many enterprise corpora? (§3.4)
  6. Why rerank after retrieval instead of before retrieval? (§4.1)
  7. What does MMR solve in one sentence? (§4.4)
  8. What is the confidence gate allowed to do? (§5.2-§5.4)
  9. Why is this module really about loops and tools, not just better search? (§6.5-§6.6)