
00. Advanced RAG — The Five-Year-Old Version

Module 07 gave you basic retrieval. This module teaches the retrieval system to think before it searches.


Imagine the librarian from Module 07 got promoted.

Now that person is the head researcher for the whole library.

A child asks a question, and the head researcher does not run to the shelves right away.

First, they clean the question so the shelves can understand it better.

That helper is the rewriter.

Then they imagine what a good answer might sound like.

Not to trust the imaginary answer.

Only to search in the right neighborhood.

That helper is the hypothesis.

Sometimes one question is secretly three smaller questions.

The head researcher breaks it apart, answers each part, and then combines the pieces.

That helper is the multi-step plan.

Then many pages come back from the shelves.

Some are related, but only a few are truly useful.

So the head researcher reads the question and each page together.

That helper is the cross-checker.

Finally, the draft answer is not trusted blindly.

The head researcher asks, “Do we really have enough proof, or should we search again?”

That last stoplight is the confidence gate.
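The five helpers above can be sketched as one small loop. This is a toy illustration under loud assumptions, not the module's implementation: the page list, the word-overlap scoring, the `threshold` value, and every function name are hypothetical stand-ins, and `hypothesize` only echoes the query where a real system would ask a language model for a draft answer. The order of the helpers (split first, then imagine an answer per part) is also an assumption.

```python
import re

# Hypothetical toy shelf; a real system would search an index instead.
PAGES = [
    "hybrid retrieval mixes dense search and sparse search",
    "a reranker scores the question and each page together",
    "the library opens at nine on weekdays",
]
FILLER = {"um", "please", "hey", "so"}

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rewrite(question):
    """The rewriter: drop filler and punctuation before searching."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return " ".join(w for w in words if w not in FILLER)

def decompose(query):
    """The multi-step plan: naive split on ' and ' (toy rule only)."""
    return [p.strip() for p in query.split(" and ") if p.strip()]

def hypothesize(part):
    """The hypothesis: stand-in that echoes the query.
    A real system would generate a draft answer to search with."""
    return part

def retrieve(probe, pages, k):
    """First pass: rank pages by shared words with the probe."""
    ranked = sorted(pages, key=lambda p: len(tokens(probe) & tokens(p)),
                    reverse=True)
    return ranked[:k]

def rerank(question, candidates):
    """The cross-checker: score question and page together (Jaccard)."""
    def score(page):
        q, p = tokens(question), tokens(page)
        return len(q & p) / len(q | p) if q | p else 0.0
    return sorted(((score(c), c) for c in candidates), reverse=True)

def research(question, pages=PAGES, threshold=0.15, max_tries=2):
    results = {}
    for part in decompose(rewrite(question)):
        evidence = None
        for attempt in range(max_tries):
            k = 2 * (attempt + 1)  # search wider on a retry
            probe = part + " " + hypothesize(part)
            ranked = rerank(part, retrieve(probe, pages, k))
            # The confidence gate: accept only if the best page is strong enough.
            if ranked and ranked[0][0] >= threshold:
                evidence = ranked[0][1]
                break
        results[part] = evidence  # None means "abstain"
    return results

print(research("Um, what does a reranker score and when does the library open?"))
print(research("What is the capital of France?"))
```

The second call finds no page that clears the gate, so the sketch returns `None` for that question instead of guessing.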

See.

Basic RAG says, “Search once and answer.”

Advanced RAG says, “Think a little, search better, check again, then answer.”

Simple, no?

This whole module is about that promotion.

You will learn how to reshape messy questions.

You will learn how to search by imagined answers.

You will learn how to split hard questions into smaller ones.

You will learn how to mix dense and sparse search.

You will learn how to rerank, filter, retry, route, and abstain.

That is why advanced RAG feels less like a lookup box.

It feels more like a careful research workflow.
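One item from the list above, mixing dense and sparse search, is easy to picture in code. The sketch below uses reciprocal rank fusion, a common way to merge two ranked lists; whether this module uses that method is an assumption, and the page IDs, the two input rankings, and the constant `k=60` are illustrative only.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list.

    Each list contributes 1 / (k + rank) to every ID it contains,
    so pages ranked well by both searches rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# Hypothetical results from a dense (vector) search and a sparse (keyword) search.
dense = ["page_b", "page_a", "page_c"]
sparse = ["page_a", "page_d", "page_b"]

fused = reciprocal_rank_fusion([dense, sparse])
print(fused)
```

Pages that appear in both lists (`page_a`, `page_b`) outrank pages that only one search found, which is the sense in which the two searches cover each other's blind spots.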


The placeholders you will see called back

| Placeholder | Meaning |
| --- | --- |
| the rewriter | query transformation: reshaping the question for better search |
| the hypothesis | HyDE: imagining an answer to guide retrieval |
| the multi-step plan | decomposition: splitting complex queries into sub-questions |
| the cross-checker | reranker: second-pass deep scoring of candidates |
| the confidence gate | self-evaluation: deciding to answer or search again |
| the contents map | vectorless RAG: reading the table of contents and reasoning which section to open, instead of matching vectors |


What's coming

  1. 01-opening-failure.md — why one-shot retrieval collapses under strong generation.

  2. 02-query-rewriting.md — how to preserve intent while making search easier.

  3. 03-query-expansion.md — how multiple query variants widen recall.

  4. 04-query-decomposition.md — how multi-hop questions become smaller, solvable hops.

  5. 05-hyde-hypothetical-embeddings.md — how imagined answers can pull real evidence closer.

  6. 06-parent-child-retrieval.md — how precise chunks and wide context cooperate.

  7. 07-hybrid-retrieval.md — how dense and sparse search cover each other’s blind spots.

  8. 08-cross-encoder-reranking.md — how a deeper second pass rescues the shortlist.

  9. 09-metadata-filtering-mmr.md — how filters and diversity stop noisy duplication.

  10. 10-corrective-rag.md — how systems judge retrieval quality before trusting it.

  11. 11-iterative-retrieval.md — how search becomes a loop instead of a single shot.

  12. 12-routing-strategies.md — how different queries take different retrieval paths.

  13. 13-confidence-gates.md — how the system decides to answer, retry, or abstain.

  14. 14-vectorless-rag.md — how reasoning over document structure replaces similarity search.

  15. 15-honest-admission.md — what even advanced RAG still cannot solve cleanly.


Memory map

| Concept | Prerequisite | Pressure family | Recurs later as | Layer touched |
| --- | --- | --- | --- | --- |
| Opening evidence failure | basic RAG pipeline | faithfulness, data quality | confidence gates and honest admission | user complaint -> retrieval -> answer risk |
| Query rewriting | raw query retrieval | ambiguity, latency | routing pre-step and corrective retry | API text -> retriever input -> trace |
| Query expansion | rewriting | recall, cost | iterative retrieval branches | query planner -> index fanout -> reranker load |
| Query decomposition | expansion | reasoning, coordination | agent planning and graph traversal | user task -> sub-question plan -> synthesis |
| HyDE | embeddings | semantic mismatch | generated retrieval probes | prompt -> embedding space -> candidate set |
| Parent-child retrieval | chunking | locality, context budget | citation pack construction | index chunk -> parent document -> prompt window |
| Hybrid retrieval | dense retrieval + sparse search | coverage, literal precision | reranking candidate pool | vector index + lexical index -> fusion |
| Cross-encoder reranking | candidate generation | bounded compute, precision | answer evidence ordering | model scoring -> latency -> quality signal |
| Metadata filtering and MMR | ranking | data quality, diversity | route-specific retrieval policies | metadata schema -> shortlist composition |
| Corrective and iterative RAG | retrieval metrics | feedback loops, operator attention | self-reflective agent loops | evaluator -> retry route -> logs |
| Routing and confidence gates | all prior controls | cost, safety, trust | production fallback policy | product API -> policy -> escalation |

Bridge. Basic RAG fetches once and hopes for the best. The first thing we need to understand is why that hope breaks so fast. → 01-opening-failure.md