05. Assignment 8 — Advanced RAG System with a Corrective Loop

Week 8. Build a production-style advanced RAG system that goes beyond “retrieve once and answer”: the pipeline must make at least one retry or routing decision.

Required reading first: 02_explainer.md chapters 2-5. Your system should visibly implement the rewriter, the cross-checker, and the confidence gate. Bonus if you also add the hypothesis or the multi-step plan.

Goal

Build a RAG system over a corpus where hard questions actually matter. Good choices:

  • financial reports
  • policy / handbook documents
  • product documentation
  • support articles with metadata
  • engineering design docs

Your system must answer multi-step or constraint-heavy questions better than a naive baseline.

Required architecture

User query
Query transformation
  ├─ rewrite
  ├─ optional expansion
  └─ optional decomposition
Retrieval
  ├─ dense
  ├─ sparse / BM25
  └─ optional parent-child / HyDE
Fusion
Reranker
Answer draft with citations
Confidence gate
  ├─ answer
  ├─ retry with different retrieval
  └─ abstain
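
The last stage of the diagram, the confidence gate with its three outcomes, can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `Draft`, `Result`, `run_with_gate`, the 0.7 threshold, and the two stub strategies are all hypothetical names and values chosen for the example, and the confidence score is assumed to already exist as a number in [0, 1].

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Draft:
    answer: str
    citations: list          # ids of the chunks the draft cites
    confidence: float        # assumed to be a score in [0, 1]


@dataclass
class Result:
    decision: str            # "answer" or "abstain"
    answer: Optional[str]
    path: list = field(default_factory=list)  # strategies tried, in order


def run_with_gate(query: str,
                  strategies: dict,
                  threshold: float = 0.7,
                  max_attempts: int = 2) -> Result:
    """Try retrieval strategies in order until one draft passes the gate."""
    path = []
    for name, strategy in list(strategies.items())[:max_attempts]:
        path.append(name)
        draft = strategy(query)
        # Gate: accept only drafts that are confident AND carry citations.
        if draft.confidence >= threshold and draft.citations:
            return Result("answer", draft.answer, path)
    # Stop rule: attempts are exhausted, so abstain instead of looping.
    return Result("abstain", None, path)


# Stub strategies standing in for real retrieve-then-draft pipelines.
def dense_only(query: str) -> Draft:
    return Draft("uncited guess", [], 0.9)


def hybrid_reranked(query: str) -> Draft:
    return Draft("grounded answer", ["doc1#3"], 0.8)


result = run_with_gate("example question",
                       {"dense_only": dense_only, "hybrid_reranked": hybrid_reranked})
print(result.decision, result.path)
```

Recording `path` on every result also gives you a per-query log of which route was taken, which is useful evidence for the writeup.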

Required features

  1. Naive baseline for comparison
       • one query
       • one retriever
       • no reranker
       • no retry loop
  2. Advanced path with:
       • query rewriting
       • hybrid retrieval (dense + sparse) or HyDE
       • reranking
       • metadata filtering or MMR
       • confidence gate that can retry or abstain
  3. Evidence-aware answering
       • every major claim should cite retrieved sources
  4. Eval set
       • 40-60 gold queries
       • include at least 15 multi-step questions
       • include at least 10 questions where metadata matters

  • query decomposition for compare / contrast questions
  • parent-child retrieval for long documents
  • route broad conceptual questions to step-back prompting first
  • log which path was chosen for each query
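
For the hybrid retrieval requirement, the dense and sparse result lists have to be merged somehow. The assignment does not prescribe a fusion method; the sketch below uses reciprocal rank fusion (RRF) as one common option, with the conventional constant `k = 60`. The chunk ids are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists of ids; each list is ordered best-first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more credit the higher it ranks in each list.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["c1", "c2", "c3"]    # hypothetical dense retriever output
sparse_hits = ["c3", "c1", "c4"]   # hypothetical BM25 output
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['c1', 'c3', 'c2', 'c4']
```

RRF uses only ranks, so it sidesteps the problem that dense similarity scores and BM25 scores live on different scales.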

Required deliverables

  1. ingest.py — chunk, embed, and index documents
  2. retrieve.py — retrieval stack and fusion logic
  3. answer.py — prompting, citations, confidence gate
  4. evals/gold_queries.json — question set with expected behavior notes
  5. evals/run_evals.py — evaluation harness
  6. README.md — architecture, diagrams, failure modes, before/after comparison
  7. EVAL.md — metrics table + qualitative error analysis
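
Before running any evals, it can help to check that the gold set actually meets the size and mix requirements of this assignment. The sketch below assumes a hypothetical entry schema (`type`, `needs_metadata`, `expected_behavior`, `notes`); the assignment does not define the format of `evals/gold_queries.json`, so adapt the field names to whatever you choose.

```python
def check_gold_set(queries: list) -> list:
    """Return a list of problems; an empty list means the set qualifies."""
    problems = []
    total = len(queries)
    multi_step = sum(1 for q in queries if q.get("type") == "multi_step")
    needs_metadata = sum(1 for q in queries if q.get("needs_metadata"))
    if not 40 <= total <= 60:
        problems.append(f"expected 40-60 gold queries, found {total}")
    if multi_step < 15:
        problems.append(f"expected at least 15 multi-step questions, found {multi_step}")
    if needs_metadata < 10:
        problems.append(f"expected at least 10 metadata questions, found {needs_metadata}")
    return problems


# One illustrative entry using the hypothetical schema.
example_entry = {
    "id": "q001",
    "question": "Example multi-step question about the corpus",
    "type": "multi_step",
    "needs_metadata": True,
    "expected_behavior": "answer",   # or "abstain" for unsupported questions
    "notes": "needs two chunks from different documents",
}

for problem in check_gold_set([example_entry]):
    print(problem)
```

In the real harness the list would come from `json.load` on the gold file rather than from an in-memory example.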

Suggested evaluation method

Track both retrieval and answer quality.

Retrieval-side

  • Recall@k
  • MRR or NDCG
  • % of answers where a supporting chunk appears in top-k
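
The retrieval-side metrics above can be computed per query from a ranked list of chunk ids and a set of gold-relevant ids. The sketch below assumes binary relevance (a chunk either supports the answer or it does not); the function names and the sample ids are illustrative. MRR is the mean of `reciprocal_rank` over all queries.

```python
import math


def recall_at_k(ranked: list, relevant: set, k: int) -> float:
    """Fraction of the relevant chunks that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list, relevant: set) -> float:
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: list, relevant: set, k: int) -> float:
    """NDCG with binary gains: each hit is discounted by log2(rank + 1)."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(ranked[:k], start=1)
              if doc_id in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


def hit_at_k(ranked: list, relevant: set, k: int) -> bool:
    """True if at least one supporting chunk appears in the top k."""
    return any(doc_id in relevant for doc_id in ranked[:k])


ranked = ["a", "b", "c", "d"]
relevant = {"b", "d"}
print(recall_at_k(ranked, relevant, 3))   # 0.5
print(reciprocal_rank(ranked, relevant))  # 0.5
print(round(ndcg_at_k(ranked, relevant, 4), 3))
print(hit_at_k(ranked, relevant, 1))      # False
```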

Answer-side

  • faithfulness
  • answer relevancy
  • abstain quality on unsupported questions
  • citation correctness
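
Of the answer-side checks, citation correctness has a cheap mechanical first pass: every citation in the answer should point at a chunk that was actually retrieved, and sentences with no citation should be flagged for review. The sketch below assumes answers cite chunks with bracketed ids such as `[c7]`, which is a hypothetical convention, and the sample answer is invented. It does not verify that the cited chunk supports the claim; that still needs a faithfulness judge or manual review.

```python
import re

# Assumed citation format: a chunk id in square brackets, e.g. [c7].
CITATION = re.compile(r"\[([A-Za-z0-9_.#-]+)\]")


def citation_report(answer: str, retrieved_ids: set) -> dict:
    """Summarize which citations resolve to retrieved chunks."""
    cited = CITATION.findall(answer)
    dangling = [c for c in cited if c not in retrieved_ids]
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    uncited = [s for s in sentences if not CITATION.search(s)]
    valid = len(cited) - len(dangling)
    return {
        "cited": cited,
        "dangling": dangling,                      # cited but never retrieved
        "precision": valid / len(cited) if cited else 0.0,
        "uncited_sentences": uncited,              # claims with no citation
    }


sample_answer = "Revenue grew in the third quarter [c7]. Margins fell [c9]. The outlook is positive."
report = citation_report(sample_answer, retrieved_ids={"c7", "c8"})
print(report["dangling"])           # ['c9']
print(report["uncited_sentences"])  # ['The outlook is positive.']
```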

Comparison table

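| System       | Recall@10 | NDCG@10 | Faithfulness | Notes                   |
|--------------|-----------|---------|--------------|-------------------------|
| Naive RAG    | __        | __      | __           | baseline                |
| Advanced RAG | __        | __      | __           | rewrite + rerank + gate |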

Minimum success bar

  • Advanced system beats naive baseline on hard queries
  • At least one retry path is triggered on real examples
  • Citations are visible and useful
  • Failure analysis is honest
  • README is strong enough to discuss in interviews

Hints

  • Start with the corpus and eval set, not with fancy libraries.
  • If exact names matter, add sparse retrieval early.
  • If questions are long and noisy, add rewriting early.
  • If answers look plausible but wrong, add reranking and a stronger confidence gate.
  • If one answer needs many facts, try decomposition before more prompt engineering.

Common pitfalls

  • Query rewriting that accidentally removes constraints
  • Using HyDE without checking drift on easy factual questions
  • No metadata filters, so the right quarter / region never gets isolated
  • Reranking only three documents, so it cannot rescue anything
  • Retry loop without a stop rule
  • Reporting only answer quality and ignoring retrieval quality
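
The first pitfall, a rewrite that silently drops constraints, is easy to guard against with a small check before retrieval. The sketch below is one possible heuristic, not a complete solution: the regex patterns (fiscal quarters, four-digit years, percentages, quoted phrases) are assumptions chosen for illustration, and constraints such as region names would need a corpus-specific vocabulary on top.

```python
import re

# Hypothetical constraint patterns; extend these for your own corpus.
CONSTRAINT_PATTERNS = [
    r"\bQ[1-4]\b",              # fiscal quarters such as Q3
    r"\b(?:19|20)\d{2}\b",      # four-digit years
    r"\b\d+(?:\.\d+)?%",        # percentages
    r'"[^"]+"',                 # exact phrases in double quotes
]


def extract_constraints(text: str) -> set:
    """Collect constraint-like tokens, lowercased for comparison."""
    found = set()
    for pattern in CONSTRAINT_PATTERNS:
        for match in re.findall(pattern, text, flags=re.IGNORECASE):
            found.add(match.lower())
    return found


def dropped_constraints(original: str, rewritten: str) -> set:
    """Constraints present in the original query but missing after rewriting."""
    return extract_constraints(original) - extract_constraints(rewritten)


original = "What was EMEA revenue in Q3 2023 versus Q2 2023?"
rewritten = "EMEA revenue comparison 2023"
print(sorted(dropped_constraints(original, rewritten)))  # ['q2', 'q3']
```

If the returned set is non-empty, one reasonable policy is to fall back to the original query or append the missing tokens to the rewrite.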

What to demonstrate in your writeup

  • Which failure you targeted first
  • What changed after rewrite / hybrid retrieval / reranking
  • One example where the confidence gate saved you
  • One example where advanced RAG still failed
  • What you would do next with more time

LinkedIn post template

“This week I upgraded a naive RAG system into an advanced RAG system.

Biggest lesson: better answers came from better retrieval decisions, not bigger prompts.

I added: [rewrite] + [hybrid retrieval] + [reranker] + [confidence gate].

On [N] gold queries, the advanced path beat the baseline on the hardest multi-step questions.

Repo: [link]”

Why this hands-on lab matters

Module 09 will turn this loop into a full agent. If you can already transform a query, choose a retrieval strategy, inspect evidence, retry, and stop responsibly, you already understand the core control pattern.