00. Advanced RAG: The Five-Year-Old Version
Module 07 gave you basic retrieval. This module teaches the retrieval system to think before it searches.
Imagine the librarian from Module 07 got promoted.
Now that person is the head researcher for the whole library.
A child asks a question, and the head researcher does not run to the shelves immediately.
First, they clean the question so the shelves can understand it better.
That helper is the rewriter.
Then they imagine what a good answer might sound like.
Not to trust the imaginary answer.
Only to search in the right neighborhood.
That helper is the hypothesis.
Sometimes one question is secretly three smaller questions.
The head researcher breaks it apart, answers each part, and then combines the pieces.
That helper is the multi-step plan.
Then many pages come back from the shelves.
Some are related, but only a few are truly useful.
So the head researcher reads the question and each page together.
That helper is the cross-checker.
Finally, the draft answer is not trusted blindly.
The head researcher asks, “Do we really have enough proof, or should we search again?”
That last stoplight is the confidence gate.
See the difference.
Basic RAG says, “Search once and answer.”
Advanced RAG says, “Think a little, search better, check again, then answer.”
Simple, no?
This whole module is about that promotion.
You will learn how to reshape messy questions.
You will learn how to search by imagined answers.
You will learn how to split hard questions into smaller ones.
You will learn how to mix dense and sparse search.
You will learn how to rerank, filter, retry, route, and abstain.
That is why advanced RAG feels less like a lookup box.
It feels more like a careful research workflow.
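The workflow above can be sketched as a small control loop. This is a minimal illustration, not the module's implementation: every function name, the word-overlap scoring, the `min_score` threshold, and the "double `k` on retry" rule are assumptions made up for the example. The hypothesis step is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Passage:
    text: str
    score: float  # relevance assigned by the cross-checker, 0.0 to 1.0


ABSTAIN = "I do not have enough evidence to answer."


def advanced_rag(question, rewrite, decompose, retrieve, rerank, generate,
                 min_score=0.5, max_attempts=2, top_k=2):
    """Think a little, search better, check again, then answer."""
    query = rewrite(question)                       # the rewriter
    k = top_k
    for _ in range(max_attempts):
        evidence: List[Passage] = []
        for sub_question in decompose(query):       # the multi-step plan
            candidates = retrieve(sub_question, k)
            evidence.extend(rerank(sub_question, candidates))  # the cross-checker
        # The confidence gate: answer only if some passage is strong enough.
        if evidence and max(p.score for p in evidence) >= min_score:
            return generate(question, evidence)
        k *= 2                                      # otherwise search again, wider
    return ABSTAIN


# Toy stand-ins (all hypothetical) so the loop can run without any model.
DOCS = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Bananas are rich in potassium.",
]


def words(text):
    return set(text.lower().replace(".", "").split())


def overlap(query, doc):
    # Fraction of query words that also appear in the document.
    return len(words(query) & words(doc)) / len(words(query))


def toy_rewrite(question):
    return question.lower().strip(" ?")


def toy_decompose(query):
    return [part.strip() for part in query.split(" and ")]


def toy_retrieve(query, k):
    return sorted(DOCS, key=lambda d: overlap(query, d), reverse=True)[:k]


def toy_rerank(query, candidates):
    scored = [Passage(d, overlap(query, d)) for d in candidates]
    return sorted(scored, key=lambda p: p.score, reverse=True)


def toy_generate(question, evidence):
    # A real generator would write an answer; here we return the best passage.
    return max(evidence, key=lambda p: p.score).text


parts = (toy_rewrite, toy_decompose, toy_retrieve, toy_rerank, toy_generate)
answer = advanced_rag("What is the capital of France?", *parts)
refusal = advanced_rag("Who won the chess tournament?", *parts)
```

In this toy run the first question clears the gate and returns the Paris passage, while the second finds only weak matches, retries once with a wider search, and then abstains.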
The placeholders you will see called back
| Placeholder | Meaning |
|---|---|
| the rewriter | query transformation — reshaping the question for better search |
| the hypothesis | HyDE — imagining an answer to guide retrieval |
| the multi-step plan | decomposition — splitting complex queries into sub-questions |
| the cross-checker | reranker — second-pass deep scoring of candidates |
| the confidence gate | self-evaluation — deciding to answer or search again |
| the contents map | vectorless RAG — reading the table of contents and reasoning which section to open, instead of matching vectors |
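To make the hypothesis row concrete, here is a minimal sketch of HyDE-style search. It assumes a toy word-count "embedding" and a hard-coded `fake_writer` in place of a real language model; both are hypothetical stand-ins chosen so the example runs on its own. The point it illustrates is only the flow: imagine an answer, then search with the imagined answer instead of the question.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    cleaned = text.lower().replace(".", "").replace("?", "")
    return Counter(cleaned.split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_text, docs):
    """Return the document whose embedding is closest to the query text."""
    q = embed(query_text)
    return max(docs, key=lambda d: cosine(q, embed(d)))


def hyde_search(question, docs, write_hypothesis):
    """Search with an imagined answer instead of the raw question."""
    hypothesis = write_hypothesis(question)  # not trusted, only used as a probe
    return search(hypothesis, docs)


DOCS = [
    "My laptop bag is made of leather.",
    "Closing background applications frees memory and restores performance.",
]


def fake_writer(question):
    # Hypothetical stand-in for a model call that drafts a plausible answer.
    return "Too many background applications use up memory and reduce performance."


question = "Why is my laptop so slow?"
direct_hit = search(question, DOCS)
hyde_hit = hyde_search(question, DOCS, fake_writer)
```

Here the raw question shares words only with the laptop bag sentence, so direct search lands in the wrong neighborhood. The imagined answer shares vocabulary with the real evidence, so the second search finds it.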
Top resources
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks: the original framing for retrieval plus generation.
- HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels: the paper behind hypothetical answer search.
- Self-RAG: retrieval, critique, and refinement inside one loop.
- Corrective Retrieval Augmented Generation: retrieval quality checks with correction paths.
- Pinecone Hybrid Search Guide: dense plus sparse retrieval in practice.
- Cohere Rerank Docs: practical second-pass ranking with cross-encoders.
- LangChain MultiQuery Retriever: one way to implement query expansion.
What's coming
- 01-opening-failure.md: why one-shot retrieval collapses under strong generation.
- 02-query-rewriting.md: how to preserve intent while making search easier.
- 03-query-expansion.md: how multiple query variants widen recall.
- 04-query-decomposition.md: how multi-hop questions become smaller, solvable hops.
- 05-hyde-hypothetical-embeddings.md: how imagined answers can pull real evidence closer.
- 06-parent-child-retrieval.md: how precise chunks and wide context cooperate.
- 07-hybrid-retrieval.md: how dense and sparse search cover each other's blind spots.
- 08-cross-encoder-reranking.md: how a deeper second pass rescues the shortlist.
- 09-metadata-filtering-mmr.md: how filters and diversity stop noisy duplication.
- 10-corrective-rag.md: how systems judge retrieval quality before trusting it.
- 11-iterative-retrieval.md: how search becomes a loop instead of a single shot.
- 12-routing-strategies.md: how different queries take different retrieval paths.
- 13-confidence-gates.md: how the system decides to answer, retry, or abstain.
- 14-vectorless-rag.md: how reasoning over document structure replaces similarity search.
- 15-honest-admission.md: what even advanced RAG still cannot solve cleanly.
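As a preview of how dense and sparse results can be mixed, here is a minimal sketch of reciprocal rank fusion, one common way to merge ranked lists. Whether the hybrid retrieval chapter uses this exact method is an assumption; the document ids and the constant `k=60` (a widely used default) are illustrative only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results from a vector index and a keyword index.
dense_ranking = ["doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
```

Because the fusion uses only ranks, the two retrievers do not need comparable score scales. In this example `doc_a` and `doc_c` appear in both lists and therefore outrank `doc_b` and `doc_d`, which each appear in only one.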
Memory map
| Concept | Prerequisite | Pressure family | Recurs later as | Layer touched |
|---|---|---|---|---|
| Opening evidence failure | basic RAG pipeline | faithfulness, data quality | confidence gates and honest admission | user complaint -> retrieval -> answer risk |
| Query rewriting | raw query retrieval | ambiguity, latency | routing pre-step and corrective retry | API text -> retriever input -> trace |
| Query expansion | rewriting | recall, cost | iterative retrieval branches | query planner -> index fanout -> reranker load |
| Query decomposition | expansion | reasoning, coordination | agent planning and graph traversal | user task -> sub-question plan -> synthesis |
| HyDE | embeddings | semantic mismatch | generated retrieval probes | prompt -> embedding space -> candidate set |
| Parent-child retrieval | chunking | locality, context budget | citation pack construction | index chunk -> parent document -> prompt window |
| Hybrid retrieval | dense retrieval + sparse search | coverage, literal precision | reranking candidate pool | vector index + lexical index -> fusion |
| Cross-encoder reranking | candidate generation | bounded compute, precision | answer evidence ordering | model scoring -> latency -> quality signal |
| Metadata filtering and MMR | ranking | data quality, diversity | route-specific retrieval policies | metadata schema -> shortlist composition |
| Corrective and iterative RAG | retrieval metrics | feedback loops, operator attention | self-reflective agent loops | evaluator -> retry route -> logs |
| Routing and confidence gates | all prior controls | cost, safety, trust | production fallback policy | product API -> policy -> escalation |
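The MMR row in the table above can be illustrated with a short sketch of maximal marginal relevance, assuming similarities have already been computed. The similarity numbers and the `lam` trade-off value are made up for the example; a real system would derive them from embeddings.

```python
def mmr(query_sim, doc_sim, k, lam=0.5):
    """Pick k document indices that are relevant but not redundant.

    query_sim[i]  : similarity between document i and the query
    doc_sim[i][j] : similarity between documents i and j
    lam           : 1.0 means pure relevance, lower values reward diversity
    """
    selected = []
    remaining = list(range(len(query_sim)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            # Penalize a candidate by its closest match among picks so far.
            redundancy = max((doc_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Documents 0 and 1 are near-duplicates; document 2 says something different.
query_sim = [0.9, 0.85, 0.6]
doc_sim = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

diverse = mmr(query_sim, doc_sim, k=2, lam=0.5)
relevance_only = mmr(query_sim, doc_sim, k=2, lam=1.0)
```

With `lam=0.5` the second pick skips the near-duplicate and takes document 2. With `lam=1.0` the selection falls back to plain top-k by relevance and keeps both duplicates.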
Bridge. Basic RAG fetches once and hopes for the best. The first thing we need to understand is why that hope breaks so fast. → 01-opening-failure.md