01. Week 7: RAG Fundamentals

Key concepts to master

RAG exists because closed-book LLMs guess. Retrieval gives fresh, local evidence.
Chunking is a retrieval decision. Bad chunks poison the whole pipeline.
Chunk size is a trade-off. Smaller chunks improve precision; larger chunks preserve context.
Overlap protects boundary facts. Without overlap, a fact that straddles a chunk edge is split in two, and neither half may match the query well enough to be retrieved.
Recursive splitting is a strong baseline for messy documents.
Semantic splitting helps most on prose whose topic changes are not marked by headings or paragraph breaks; very gradual drift makes its breakpoints harder to detect.
Embeddings encode meaning similarity, not exact truth.
Cosine similarity is the usual text default; dot product gives the same ranking when vectors are L2-normalized, and otherwise also rewards vector magnitude.
Vector stores are fast lookup systems for embeddings plus metadata.
HNSW is the default ANN choice for many real systems.
IVF trades some recall for speed through clustering.
Reranking improves precision after broad retrieval.
Faithfulness matters more than fluency in production RAG.
Recall@k, MRR, and NDCG tell you if retrieval is actually working.
RAG does not solve reasoning. Multi-hop questions still break naive systems.
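
The chunk size and overlap points above can be made concrete with a small sliding-window splitter. This is a minimal sketch, not a recommended production splitter: `chunk_with_overlap` is a hypothetical helper, whitespace splitting stands in for a real tokenizer, and the sample sentence is made up for illustration.

```python
def chunk_with_overlap(tokens, chunk_size, overlap):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Made-up sample text; whitespace split is an assumption, not a real tokenizer.
text = "refunds are issued within 14 days of the return request being approved"
words = text.split()

no_overlap = chunk_with_overlap(words, chunk_size=6, overlap=0)
with_overlap = chunk_with_overlap(words, chunk_size=6, overlap=3)

for chunk in with_overlap:
    print(" ".join(chunk))
```

With no overlap, the phrase "14 days of the return" is cut at the chunk edge and appears whole in neither chunk. With an overlap of 3 words, the middle window contains it intact, at the cost of storing and embedding one extra chunk.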

🧠 Mental models

Chunking: "Cut a textbook into flashcards; too big blurs topics, too small loses context."
Overlap: "Leave a margin between pages so facts on the seam are copied twice."
Embeddings: "Place text on a map where nearby points have similar meaning, not verified truth."
Vector stores: "A semantic filing cabinet indexed by embedding coordinates plus metadata."
HNSW: "Navigate a small-world subway map instead of walking every street."
RAG pipeline: "Open-book answering: fetch evidence first, write second."
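
The "map" model for embeddings leaves one detail open: whether "nearby" is measured by direction alone (cosine) or by direction and length (raw dot product). The sketch below uses toy 2-D vectors with made-up values, and hypothetical helper names, to show that the two can disagree until the vectors are L2-normalized.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a):
    n = norm(a)
    return [x / n for x in a]


def rank(query, docs, score):
    """Return doc ids sorted from best to worst under the given score function."""
    return sorted(docs, key=lambda doc_id: score(query, docs[doc_id]), reverse=True)


# Toy 2-D "embeddings" (made-up numbers, for illustration only).
query = [1.0, 0.0]
docs = {
    "short_on_topic": [0.9, 0.1],  # points almost the same way as the query
    "long_off_topic": [3.0, 3.0],  # large magnitude, 45 degrees away
}

raw_dot_ranking = rank(query, docs, dot)
cosine_ranking = rank(query, docs, cosine)

# After L2 normalization, dot product and cosine produce the same order.
unit_docs = {doc_id: normalize(vec) for doc_id, vec in docs.items()}
unit_dot_ranking = rank(normalize(query), unit_docs, dot)

print(raw_dot_ranking, cosine_ranking, unit_dot_ranking)
```

Raw dot product puts the long, off-topic vector first because its magnitude dominates. Cosine, and dot product on unit vectors, both put the on-topic vector first.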

⚠️ Common traps

Setting chunk size only by token count instead of document structure and likely answer spans.
Treating high similarity as proof a chunk is correct, current, or relevant to the user's tenant or version.
Forgetting metadata filters, so retrieval finds the right concept from the wrong policy or product release.
Skipping retrieval metrics and judging only final answer quality, which hides recall failures.
Sending many near-duplicate chunks to the generator and wasting context on repetition.
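
Two of the traps above, missing metadata filters and near-duplicate context, can be illustrated with a small post-retrieval step. This is a sketch under stated assumptions: the hit dictionaries, the `version` field, and the `select_context` and `jaccard` helpers are all hypothetical, and word-set Jaccard is only one simple way to detect near-duplicates. Real vector stores usually apply metadata filters inside the query rather than afterwards.

```python
def jaccard(a, b):
    """Word-set overlap between two strings, from 0.0 to 1.0."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def select_context(hits, required_meta, max_chunks=3, dup_threshold=0.8):
    """Filter hits by metadata, then drop near-duplicates, keeping score order."""
    kept = []
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        if any(hit["meta"].get(key) != value for key, value in required_meta.items()):
            continue  # right concept, wrong tenant or version
        if any(jaccard(hit["text"], other["text"]) >= dup_threshold for other in kept):
            continue  # near-duplicate of a chunk already kept
        kept.append(hit)
        if len(kept) == max_chunks:
            break
    return kept


# Made-up retrieval hits; scores and texts are illustrative only.
hits = [
    {"id": "a", "score": 0.93, "text": "Refunds are issued within 30 days", "meta": {"version": "v1"}},
    {"id": "b", "score": 0.91, "text": "Refunds are issued within 14 days", "meta": {"version": "v2"}},
    {"id": "c", "score": 0.90, "text": "refunds are issued within 14 days", "meta": {"version": "v2"}},
    {"id": "d", "score": 0.72, "text": "Return requests need an order number", "meta": {"version": "v2"}},
]

context = select_context(hits, required_meta={"version": "v2"})
context_ids = [h["id"] for h in context]
print(context_ids)
```

Hit `a` has the highest similarity but belongs to the wrong version, so it is filtered out. Hit `c` repeats `b` and is dropped, which leaves room in the context for the distinct chunk `d`.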

🔗 Prerequisites & connections

Builds on: Module 06 (Quantization & Finetuning). The key decision is when retrieval is better than changing model weights.
Feeds into: Module 08 (RAG Advanced). Chunking, embeddings, and basic retrieval become the substrate for reranking, rewriting, and multi-hop systems.

💬 Interview phrasing

"How would you chunk a long internal wiki for a support bot?"
"Why can a strong embedding model still produce bad RAG answers?"
"HNSW vs IVF — what trade-off is each making?"
"Walk me through the retrieval-to-generation pipeline and name one failure mode per stage."
"What metric would you use to prove retrieval improved, not just the wording of answers?"

⏱️ Difficulty markers

🟢 chunk size and overlap
🟢 embeddings and cosine similarity
🟡 recursive vs semantic splitting
🟡 vector stores and ANN indexes
🔴 recall@k vs MRR vs NDCG
🔴 faithfulness vs relevance in production RAG

Self-check questions

For full Q&A framing, see 02_explainer.md §6.3.

Why is a confident answer without sources dangerous in production? (§1.1, §1.4)
Why can one huge document chunk reduce retrieval quality? (§2.1, §2.2)
What does overlap fix, and what does too much overlap hurt? (§2.3)
Recursive splitting vs semantic splitting — when would you choose each? (§2.4)
What do embeddings preserve well, and what do they miss? (§3.1)
Cosine similarity vs dot product — when do they rank the same? (§3.3)
HNSW vs IVF — what trade-off does each make? (§3.5)
Name the stages of the RAG pipeline and one failure mode per stage. (§4.1-§4.7)
Why does reranking help even after retrieval? (§4.5)
Recall@k vs MRR vs NDCG — what does each reward? (§5.2-§5.4)
Faithfulness vs answer relevance — what is the difference? (§5.5)
Why is naive RAG weak on multi-hop reasoning? (§4.8)
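
For the Recall@k vs MRR vs NDCG question, it can help to compute all three on one toy ranking. The sketch below is a minimal reference, not the explainer's own code: the function names, document ids, and relevance grades are made up, and the NDCG variant uses linear gain with a log2 discount (other formulations use an exponential gain).

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant items that appear in the top k."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked_ids, gains, k):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    def dcg(values):
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))
    actual = dcg([gains.get(doc_id, 0.0) for doc_id in ranked_ids[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


# One query: graded relevance for two documents (made-up labels).
ranked = ["d3", "d1", "d7", "d2"]
gains = {"d1": 3.0, "d2": 1.0}
relevant = set(gains)

recall = recall_at_k(ranked, relevant, k=3)
rr = reciprocal_rank(ranked, relevant)
mrr = mean_reciprocal_rank([(ranked, relevant), (["d9", "d4"], {"d9"})])
ndcg = ndcg_at_k(ranked, gains, k=4)

print(recall, rr, mrr, round(ndcg, 3))
```

Recall@k rewards finding the relevant items at all, reciprocal rank rewards placing the first relevant item early, and NDCG rewards placing the most relevant items highest.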

Health check

[ ] Read all 6 explainer chapters at least once
[ ] Can explain chunk size, overlap, embeddings, and HNSW cold
[ ] Assignment shipped with at least one retrieval eval table
[ ] LinkedIn post #7 published
[ ] All daily-recall prompts answerable from memory
[ ] Failure-fix table from explainer §6.1 sketched without looking
[ ] Ready for 09_advanced_rag_patterns