05. Assignment 8: Advanced RAG System with a Corrective Loop
Week 8. Build a production-style advanced RAG system that does more than “retrieve once and answer”: it must make at least one retry or routing decision.
Required reading first:
`02_explainer.md`, chapters 2-5. Your system should visibly implement the rewriter, the cross-checker, and the confidence gate. Bonus if you also add the hypothesis step or the multi-step plan.
Goal
Build a RAG system over a corpus where hard questions actually matter. Good choices:
- financial reports
- policy / handbook documents
- product documentation
- support articles with metadata
- engineering design docs
Your system must answer multi-step or constraint-heavy questions better than a naive baseline.
Required architecture
User query
↓
Query transformation
├─ rewrite
├─ optional expansion
└─ optional decomposition
↓
Retrieval
├─ dense
├─ sparse / BM25
└─ optional parent-child / HyDE
↓
Fusion
↓
Reranker
↓
Answer draft with citations
↓
Confidence gate
├─ answer
├─ retry with different retrieval
└─ abstain
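The diagram's control flow can be sketched as a bounded loop over retrieval strategies. This is a minimal sketch under stated assumptions, not a reference implementation: `run_pipeline`, `Draft`, `Result`, the strategy names, and the 0.7 threshold are all hypothetical, and the toy stubs stand in for your real retriever and LLM call. Query transformation, fusion, and reranking would live inside `retrieve_fn`.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Draft:
    text: str
    citations: list[str]  # chunk ids the draft cites
    confidence: float     # 0..1, however your gate scores it (hypothetical)


@dataclass
class Result:
    decision: str         # "answer" or "abstain"
    text: str
    path: list[str]       # retrieval strategies tried, in order (worth logging)


def run_pipeline(
    query: str,
    strategies: Sequence[str],
    retrieve_fn: Callable[[str, str], list[str]],
    draft_fn: Callable[[str, list[str]], Draft],
    answer_threshold: float = 0.7,
) -> Result:
    """Try retrieval strategies in order; stop at the first draft that passes the gate.

    The stop rule is structural: the loop runs at most len(strategies) times,
    and a query that never passes the gate ends in abstention.
    """
    path: list[str] = []
    for strategy in strategies:
        path.append(strategy)
        chunk_ids = retrieve_fn(query, strategy)
        draft = draft_fn(query, chunk_ids)
        # Gate: confident enough AND every citation points at a retrieved chunk.
        grounded = bool(draft.citations) and set(draft.citations) <= set(chunk_ids)
        if draft.confidence >= answer_threshold and grounded:
            return Result("answer", draft.text, path)
        # Otherwise fall through and retry with the next (different) strategy.
    return Result("abstain", "Not enough supporting evidence to answer.", path)


# Toy stubs standing in for a real retriever and LLM call.
def toy_retrieve(query: str, strategy: str) -> list[str]:
    return ["doc1#3"] if strategy == "hybrid+rerank" else ["doc9#1"]


def toy_draft(query: str, chunk_ids: list[str]) -> Draft:
    confidence = 0.9 if "doc1#3" in chunk_ids else 0.2
    return Draft("Toy answer [doc1#3]", ["doc1#3"], confidence)


result = run_pipeline("toy question", ["dense", "hybrid+rerank"], toy_retrieve, toy_draft)
print(result.decision, result.path)  # answer ['dense', 'hybrid+rerank']
```

Here the first strategy fails the gate (low confidence, and the citation does not match a retrieved chunk), so the loop retries once and answers on the second strategy. Passing a single strategy to `run_pipeline` would make the same query abstain.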
Required features
- Naive baseline for comparison
  - one query
  - one retriever
  - no reranker
  - no retry loop
- Advanced path with:
  - query rewriting
  - hybrid retrieval (dense + sparse) or HyDE
  - reranking
  - metadata filtering or MMR
  - confidence gate that can retry or abstain
- Evidence-aware answering
  - every major claim should cite retrieved sources
- Eval set
  - 40-60 gold queries
  - include at least 15 multi-step questions
  - include at least 10 questions where metadata matters
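For the "metadata filtering or MMR" option, here is one way to do maximal marginal relevance (MMR) over embedding vectors. It is a minimal, dependency-free sketch assuming you already have one vector per chunk; the `lam=0.7` default is an arbitrary starting point to tune, and the toy vectors are made up. MMR helps when near-duplicate chunks (for example, the same boilerplate paragraph in several reports) would otherwise fill the top-k.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int, lam: float = 0.7) -> list[str]:
    """Greedy maximal marginal relevance.

    At each step pick the candidate maximizing
        lam * sim(query, d) - (1 - lam) * max(sim(d, s) for s already selected)
    lam = 1.0 is pure relevance ranking; smaller values favor diversity.
    """
    candidates = dict(doc_vecs)
    selected: list[str] = []
    while candidates and len(selected) < k:
        scores = {}
        for doc_id, vec in candidates.items():
            relevance = cosine(query_vec, vec)
            # Redundancy = similarity to the closest chunk already selected.
            redundancy = max((cosine(vec, doc_vecs[s]) for s in selected), default=0.0)
            scores[doc_id] = lam * relevance - (1 - lam) * redundancy
        best = max(scores, key=scores.get)
        selected.append(best)
        del candidates[best]
    return selected


docs = {
    "a": [1.0, 0.0, 0.0],
    "a2": [1.0, 0.1, 0.0],  # near-duplicate of "a"
    "b": [0.0, 1.0, 0.0],
}
print(mmr([1.0, 1.0, 0.0], docs, k=2, lam=0.5))  # ['a2', 'b']
```

The near-duplicate `a` is skipped in favor of `b`, even though `a` and `b` are equally relevant to the query.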
Strongly recommended extras
- query decomposition for compare / contrast questions
- parent-child retrieval for long documents
- route broad conceptual questions to step-back prompting first
- log which path was chosen for each query
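Logging the chosen path can be as small as one JSON line per query. This is a minimal sketch; the function names, file layout, and field names are hypothetical. Counting retries from the same log is also a cheap way to show that a retry path actually fired on real examples.

```python
import json
import time
from pathlib import Path


def log_path(log_file: Path, query_id: str, path: list[str], decision: str, top_chunk_ids: list[str]) -> dict:
    """Append one JSON line per query so path choices can be counted later."""
    record = {
        "ts": time.time(),
        "query_id": query_id,
        "path": path,                      # e.g. ["dense", "hybrid+rerank"]
        "decision": decision,              # "answer" or "abstain"
        "retries": max(len(path) - 1, 0),  # strategies tried beyond the first
        "top_chunks": top_chunk_ids,
    }
    with log_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def retry_rate(log_file: Path) -> float:
    """Fraction of logged queries that needed at least one retry."""
    lines = log_file.read_text(encoding="utf-8").splitlines()
    rows = [json.loads(line) for line in lines if line.strip()]
    return sum(r["retries"] > 0 for r in rows) / len(rows) if rows else 0.0
```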
Required deliverables
- `ingest.py`: chunk, embed, and index documents
- `retrieve.py`: retrieval stack and fusion logic
- `answer.py`: prompting, citations, confidence gate
- `evals/gold_queries.json`: question set with expected-behavior notes
- `evals/run_evals.py`: evaluation harness
- `README.md`: architecture, diagrams, failure modes, before/after comparison
- `EVAL.md`: metrics table + qualitative error analysis
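One possible shape for a gold-query record, plus a check that the set meets the eval-set minimums, is sketched below. The field names (`tags`, `metadata_filter`, `relevant_chunk_ids`, `expected_behavior`) and the example values are hypothetical suggestions, not a required format.

```python
import json

# Hypothetical record layout for evals/gold_queries.json; field names are a
# suggestion, not a required format.
EXAMPLE_RECORD = {
    "id": "q017",
    "question": "<question text>",
    "tags": ["multi_step", "metadata"],
    "metadata_filter": {"region": "EMEA", "quarter": "Q3"},
    "relevant_chunk_ids": ["<chunk id>"],
    "expected_behavior": "answer",  # or "abstain" for unsupported questions
    "notes": "<what a good answer must include>",
}


def check_gold_set(records: list[dict]) -> list[str]:
    """Return problems against the assignment's minimums (empty list = OK)."""
    problems = []
    if not 40 <= len(records) <= 60:
        problems.append(f"expected 40-60 queries, got {len(records)}")
    multi = sum("multi_step" in r.get("tags", []) for r in records)
    meta = sum("metadata" in r.get("tags", []) for r in records)
    if multi < 15:
        problems.append(f"need at least 15 multi-step questions, got {multi}")
    if meta < 10:
        problems.append(f"need at least 10 metadata questions, got {meta}")
    for r in records:
        # Retrieval metrics are undefined for answerable queries without gold chunks.
        if r.get("expected_behavior") == "answer" and not r.get("relevant_chunk_ids"):
            problems.append(f"{r.get('id')}: answerable query has no relevant_chunk_ids")
    return problems


def load_gold(path: str = "evals/gold_queries.json") -> list[dict]:
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Calling `check_gold_set(load_gold())` at the start of the eval harness makes it fail fast if the set drifts below the minimums.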
Suggested evaluation method
Track both retrieval and answer quality.
Retrieval-side
- Recall@k
- MRR or NDCG
- % of queries where at least one supporting chunk appears in the top-k
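The retrieval-side metrics above can be computed per query from a ranked list of chunk ids and the gold set of relevant ids; averaging `reciprocal_rank` over all queries gives MRR. This sketch uses binary relevance for NDCG, which is an assumption (graded relevance labels would need a different gain).

```python
import math


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant chunks that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def hit_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """1.0 if at least one supporting chunk is in the top-k, else 0.0."""
    return 1.0 if set(ranked[:k]) & relevant else 0.0


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant chunk (0.0 if none was retrieved)."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of this ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 1) for i, d in enumerate(ranked[:k], start=1) if d in relevant)
    ideal = sum(1.0 / math.log2(i + 1) for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0


ranked = ["c3", "c1", "c7", "c2"]
relevant = {"c1", "c2"}
print(recall_at_k(ranked, relevant, 2), reciprocal_rank(ranked, relevant))  # 0.5 0.5
```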
Answer-side
- faithfulness
- answer relevancy
- abstain quality on unsupported questions
- citation correctness
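Part of citation correctness can be checked mechanically: every cited id should point at a chunk that was actually retrieved, and ideally at a gold supporting chunk. This sketch assumes a hypothetical citation format of chunk ids in square brackets (for example `[hr.pdf#4]`), so any other bracketed text would be miscounted. Whether the cited chunk truly supports the claim still needs human or LLM-judge review.

```python
import re

# Hypothetical citation format: chunk ids in square brackets, e.g. "[hr.pdf#4]".
CITATION_RE = re.compile(r"\[([^\[\]]+)\]")


def citation_report(answer: str, retrieved_ids: set[str], supporting_ids: set[str]) -> dict:
    """Summarize how many citations are valid (retrieved) and supporting (gold)."""
    cited = CITATION_RE.findall(answer)
    valid = [c for c in cited if c in retrieved_ids]         # chunk was actually retrieved
    supporting = [c for c in cited if c in supporting_ids]   # chunk is a gold supporting chunk
    return {
        "n_citations": len(cited),
        "dangling": [c for c in cited if c not in retrieved_ids],  # fabricated or stale ids
        "valid_rate": len(valid) / len(cited) if cited else 0.0,
        "support_precision": len(supporting) / len(cited) if cited else 0.0,
    }


report = citation_report(
    "Claim A [hr.pdf#4]. Claim B [hr.pdf#9] [old.pdf#1].",
    retrieved_ids={"hr.pdf#4", "hr.pdf#9"},
    supporting_ids={"hr.pdf#4"},
)
print(report["dangling"])  # ['old.pdf#1']
```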
Comparison table
| System | Recall@10 | NDCG@10 | Faithfulness | Notes |
|---|---|---|---|---|
| Naive RAG | __ | __ | __ | baseline |
| Advanced RAG | __ | __ | __ | rewrite + rerank + gate |
Minimum success bar
- Advanced system beats naive baseline on hard queries
- At least one retry path is triggered on real examples
- Citations are visible and useful
- Failure analysis is honest
- README is strong enough to discuss in interviews
Hints
- Start with the corpus and eval set, not with fancy libraries.
- If exact names matter, add sparse retrieval early.
- If questions are long and noisy, add rewriting early.
- If answers look plausible but wrong, add reranking and a stronger confidence gate.
- If one answer needs many facts, try decomposition before more prompt engineering.
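The hint about exact names is easiest to see on a toy corpus. Below, a dependency-free Okapi BM25 scorer (k1=1.5 and b=0.75, common default values) ranks an exact product code first, and reciprocal rank fusion (RRF, with the conventional k=60) merges that sparse ranking with a dense one. RRF is one common fusion method, not the only option; the whitespace tokenizer, the corpus, and the dense ranking are made-up stand-ins for your real tokenizer and vector search.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()  # stand-in; a real tokenizer would also strip punctuation


def bm25_rank(query: str, docs: dict[str, str], k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Return doc ids sorted by Okapi BM25 score, highest first."""
    toks = {d: tokenize(t) for d, t in docs.items()}
    n = len(docs)
    avgdl = sum(len(t) for t in toks.values()) / n
    df = Counter(term for t in toks.values() for term in set(t))  # document frequency
    scores: dict[str, float] = {}
    for d, t in toks.items():
        tf = Counter(t)
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(t) / avgdl))
        scores[d] = s
    return sorted(scores, key=scores.get, reverse=True)


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank of d)."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, d in enumerate(ranking, start=1):
            fused[d] = fused.get(d, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


docs = {
    "a": "refund policy for enterprise customers",
    "b": "the XR-200 router firmware update guide",
    "c": "general router troubleshooting tips",
}
sparse = bm25_rank("XR-200 firmware", docs)
dense = ["c", "b", "a"]  # hypothetical dense ranking that prefers the generic doc
print(sparse[0], rrf([dense, sparse])[0])  # b b
```

RRF only uses ranks, so it avoids calibrating BM25 scores against cosine similarities.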
Common pitfalls
- Query rewriting that accidentally removes constraints
- Using HyDE without checking drift on easy factual questions
- No metadata filters, so the right quarter / region never gets isolated
- Reranking only three documents, so it cannot rescue anything
- Retry loop without a stop rule
- Reporting only answer quality and ignoring retrieval quality
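For the first pitfall, a cheap guard is to extract hard constraints from the original query and reject any rewrite that drops them. The regex patterns below (quarters, years, percentages, quoted phrases, restriction words) are hypothetical heuristics to adapt to your corpus. They are deliberately strict: a rewrite that paraphrases "excluding" as "minus" would be flagged even though it kept the meaning.

```python
import re

# Hypothetical patterns for "hard" constraints a rewrite must not drop.
CONSTRAINT_PATTERNS = [
    r"\bQ[1-4]\b",                                  # fiscal quarters
    r"\b(?:19|20)\d{2}\b",                          # years
    r"\b\d+(?:\.\d+)?%",                            # percentages
    r'"[^"]+"',                                     # quoted exact phrases
    r"\b(?:not|except|excluding|without|only)\b",   # negations / restrictions
]


def extract_constraints(text: str) -> set[str]:
    found: set[str] = set()
    for pattern in CONSTRAINT_PATTERNS:
        found.update(m.group(0).lower() for m in re.finditer(pattern, text, flags=re.IGNORECASE))
    return found


def missing_constraints(original: str, rewritten: str) -> set[str]:
    """Constraints present in the original query but absent from the rewrite."""
    return extract_constraints(original) - extract_constraints(rewritten)


def safe_rewrite(original: str, rewritten: str) -> str:
    """Keep the rewrite only if it preserves every detected constraint."""
    return original if missing_constraints(original, rewritten) else rewritten


orig = 'What was EMEA revenue in Q3 2023, excluding "one-time charges"?'
print(safe_rewrite(orig, "EMEA revenue trends") == orig)  # True
```

Falling back to the original query is one choice; appending the missing constraints to the rewrite is another.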
What to demonstrate in your writeup
- Which failure you targeted first
- What changed after rewrite / hybrid retrieval / reranking
- One example where the confidence gate saved you
- One example where advanced RAG still failed
- What you would do next with more time
LinkedIn post template
“This week I upgraded a naive RAG system into an advanced RAG system.
Biggest lesson: better answers came from better retrieval decisions, not bigger prompts.
I added: [rewrite] + [hybrid retrieval] + [reranker] + [confidence gate].
On [N] gold queries, the advanced path beat the baseline on the hardest multi-step questions.
Repo: [link]”
Why this hands-on lab matters
Module 09 will turn this loop into a full agent. If you can transform a query, choose a retrieval strategy, inspect evidence, retry, and stop responsibly, you already understand the core control pattern.