05. Assignment 8: Advanced RAG System with a Corrective Loop

Week 8. Build a production-style advanced RAG system that goes beyond “retrieve once and answer”: it must make at least one retry or routing decision.

Required reading first: 02_explainer.md chapters 2-5. Your system should visibly implement the rewriter, the cross-checker, and the confidence gate. Bonus if you also add the hypothesis or the multi-step plan.

Goal

Build a RAG system over a corpus where hard questions actually matter. Good choices:

- financial reports
- policy / handbook documents
- product documentation
- support articles with metadata
- engineering design docs

Your system must answer multi-step or constraint-heavy questions better than a naive baseline.

Required architecture

User query
  ↓
Query transformation
  ├─ rewrite
  ├─ optional expansion
  └─ optional decomposition
  ↓
Retrieval
  ├─ dense
  ├─ sparse / BM25
  └─ optional parent-child / HyDE
  ↓
Fusion
  ↓
Reranker
  ↓
Answer draft with citations
  ↓
Confidence gate
  ├─ answer
  ├─ retry with different retrieval
  └─ abstain
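
The final stage of the diagram (answer, retry with different retrieval, or abstain) can be sketched as a small control loop. This is a minimal sketch under assumptions, not a reference implementation: the `Draft` type, the `answer_with_gate` function, the 0.7 threshold, and the idea that each retrieval strategy is a callable returning a scored draft are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Draft:
    """Hypothetical container for one answer attempt."""
    answer: str
    citations: List[str]
    confidence: float  # assumed to be normalized to [0, 1]


def answer_with_gate(
    query: str,
    strategies: Sequence[Callable[[str], Draft]],
    threshold: float = 0.7,  # assumed cutoff; tune it on your eval set
) -> dict:
    """Try each retrieval strategy in order and stop at the first confident draft."""
    attempts = []
    for i, strategy in enumerate(strategies):
        draft = strategy(query)
        attempts.append({"attempt": i, "confidence": draft.confidence})
        # Accept only drafts that are confident AND cite at least one source.
        if draft.confidence >= threshold and draft.citations:
            return {
                "decision": "answer",
                "answer": draft.answer,
                "citations": draft.citations,
                "attempts": attempts,
            }
    # Stop rule: the strategy list is finite, so the loop always terminates.
    return {"decision": "abstain", "answer": None, "citations": [], "attempts": attempts}
```

Passing strategies as an ordered list (for example dense-only first, then hybrid) keeps the retry budget explicit, and the `attempts` list records how many retries each query needed.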

Required features

- Naive baseline for comparison
  - one query
  - one retriever
  - no reranker
  - no retry loop
- Advanced path with:
  - query rewriting
  - hybrid retrieval (dense + sparse) or HyDE
  - reranking
  - metadata filtering or MMR
  - confidence gate that can retry or abstain
- Evidence-aware answering
  - every major claim should cite retrieved sources
- Eval set
  - 40-60 gold queries
  - include at least 15 multi-step questions
  - include at least 10 questions where metadata matters
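
For the hybrid retrieval requirement, the dense and sparse result lists have to be merged somehow. The assignment does not prescribe a fusion method; one common choice is reciprocal rank fusion (RRF), sketched below. The function name and the constant `k = 60` (a conventional default) are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1 for the top result.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy example with made-up ids: "b" is ranked well by both retrievers.
dense_hits = ["a", "b", "c"]
sparse_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

RRF uses only rank positions, so it does not require the dense and BM25 scores to be on a comparable scale.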

Strongly recommended extras

query decomposition for compare / contrast questions
parent-child retrieval for long documents
route broad conceptual questions to step-back prompting first
log which path was chosen for each query
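
Logging the chosen path can be as simple as appending one JSON line per query. The sketch below is one possible shape, not a required format: the `log_path` function, the field names, and the path labels are all hypothetical.

```python
import io
import json
import time
from typing import IO


def log_path(stream: IO[str], query: str, path: str, retries: int, decision: str) -> dict:
    """Append one JSON line describing how a query was handled."""
    record = {
        "ts": time.time(),
        "query": query,
        "path": path,          # e.g. "rewrite+hybrid" (label names are hypothetical)
        "retries": retries,
        "decision": decision,  # "answer" or "abstain"
    }
    stream.write(json.dumps(record) + "\n")
    return record


# Usage with an in-memory buffer; a real run could pass open("paths.jsonl", "a").
buffer = io.StringIO()
log_path(buffer, "Compare Q3 and Q4 margins", "decompose+hybrid", 1, "answer")
```

A JSONL log like this makes it easy to count how often each path and each retry is triggered on real examples.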

Required deliverables

ingest.py — chunk, embed, and index documents
retrieve.py — retrieval stack and fusion logic
answer.py — prompting, citations, confidence gate
evals/gold_queries.json — question set with expected behavior notes
evals/run_evals.py — evaluation harness
README.md — architecture, diagrams, failure modes, before/after comparison
EVAL.md — metrics table + qualitative error analysis
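
Before running evals, it can help to check that `evals/gold_queries.json` meets the eval set requirements (40-60 gold queries, at least 15 multi-step, at least 10 where metadata matters). The sketch below assumes each query is a dict with boolean `multi_step` and `needs_metadata` flags; that schema is an assumption, since the assignment does not fix one.

```python
from typing import Dict, List


def check_eval_set(queries: List[Dict]) -> List[str]:
    """Return a list of problems; an empty list means the set meets the minimums."""
    problems = []
    n = len(queries)
    if not 40 <= n <= 60:
        problems.append(f"expected 40-60 queries, found {n}")
    # The flag names below are hypothetical; adapt them to your own schema.
    multi_step = sum(1 for q in queries if q.get("multi_step"))
    if multi_step < 15:
        problems.append(f"expected at least 15 multi-step queries, found {multi_step}")
    needs_metadata = sum(1 for q in queries if q.get("needs_metadata"))
    if needs_metadata < 10:
        problems.append(f"expected at least 10 metadata queries, found {needs_metadata}")
    return problems
```

In practice the list would come from `json.load` on the gold query file, and `run_evals.py` could refuse to run while `check_eval_set` reports problems.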

Suggested evaluation method

Track both retrieval and answer quality.

Retrieval-side

Recall@k
MRR or NDCG
Percentage of queries where a supporting chunk appears in the top-k results
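
The retrieval-side metrics can be computed per query from a ranked list of chunk ids and a set of gold relevant ids, then averaged over the eval set. The sketch below assumes binary relevance (a chunk either supports the answer or it does not); the function names are illustrative.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant ids that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved.

    MRR is the mean of this value over all queries.
    """
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """NDCG@k with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

The "supporting chunk in top-k" percentage is then the share of queries for which `recall_at_k` is greater than zero.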

Answer-side

faithfulness
answer relevancy
abstain quality on unsupported questions
citation correctness
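
Citation correctness has a cheap first-pass check: every id cited in the answer should at least belong to the set of chunks that was actually retrieved. The sketch below assumes answers cite sources inline as bracketed ids such as `[rep-12]`; that citation format and the function name are assumptions. It does not verify that the cited chunk supports the claim, which still needs manual review or an LLM judge.

```python
import re
from typing import Set

# Assumed citation format: bracketed ids like [rep-12] or [doc_3#p4].
CITATION_PATTERN = re.compile(r"\[([A-Za-z0-9_.:#-]+)\]")


def citation_precision(answer: str, retrieved_ids: Set[str]) -> float:
    """Fraction of cited ids that were actually retrieved for this query.

    Returns 0.0 when the answer contains no citations at all.
    """
    cited = CITATION_PATTERN.findall(answer)
    if not cited:
        return 0.0
    valid = sum(1 for c in cited if c in retrieved_ids)
    return valid / len(cited)
```

A value below 1.0 means the model cited an id it was never shown, which is usually a sign of a hallucinated source.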

Comparison table

System	Recall@10	NDCG@10	Faithfulness	Notes
Naive RAG	__	__	__	baseline
Advanced RAG	__	__	__	rewrite + rerank + gate

Minimum success bar

Advanced system beats naive baseline on hard queries
At least one retry path is triggered on real examples
Citations are visible and useful
Failure analysis is honest
README is strong enough to discuss in interviews

Hints

Start with the corpus and eval set, not with fancy libraries.
If exact names matter, add sparse retrieval early.
If questions are long and noisy, add rewriting early.
If answers look plausible but wrong, add reranking and a stronger confidence gate.
If one answer needs many facts, try decomposition before more prompt engineering.

Common pitfalls

Query rewriting that accidentally removes constraints
Using HyDE without checking drift on easy factual questions
No metadata filters, so the right quarter / region never gets isolated
Reranking only three documents, so it cannot rescue anything
Retry loop without a stop rule
Reporting only answer quality and ignoring retrieval quality
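
The first pitfall, a rewrite that silently drops constraints, can be partly caught with a cheap check. The sketch below is a heuristic under a strong assumption: that tokens containing a digit (quarters, years, version numbers) are constraints worth preserving. It will miss constraints without digits, such as region names, and the function name is hypothetical.

```python
import re
from typing import List

# Heuristic: any word-like token that contains a digit, e.g. "Q3" or "2023".
CONSTRAINT_TOKEN = re.compile(r"\b\w*\d\w*\b")


def dropped_constraints(original: str, rewritten: str) -> List[str]:
    """Return digit-bearing tokens from the original query missing in the rewrite."""
    rewritten_tokens = {t.lower() for t in CONSTRAINT_TOKEN.findall(rewritten)}
    dropped: List[str] = []
    for token in CONSTRAINT_TOKEN.findall(original):
        # Compare case-insensitively and report each missing token once.
        if token.lower() not in rewritten_tokens and token not in dropped:
            dropped.append(token)
    return dropped
```

If the returned list is non-empty, one option is to fall back to the original query or append the missing tokens to the rewrite before retrieval.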

What to demonstrate in your writeup

Which failure you targeted first
What changed after rewrite / hybrid retrieval / reranking
One example where the confidence gate saved you
One example where advanced RAG still failed
What you would do next with more time

LinkedIn post template

“This week I upgraded a naive RAG system into an advanced RAG system.

Biggest lesson: better answers came from better retrieval decisions, not bigger prompts.

I added: [rewrite] + [hybrid retrieval] + [reranker] + [confidence gate].

On [N] gold queries, the advanced path beat the baseline on the hardest multi-step questions.

Repo: [link]”

Why this assignment matters

Module 09 will turn this loop into a full agent. If you can already transform a query, choose a retrieval strategy, inspect evidence, retry, and stop responsibly, you already understand the core control pattern.