01. Week 7: RAG Fundamentals
Key concepts to master
- RAG exists because closed-book LLMs guess when their training data is stale or missing. Retrieval supplies fresh, local evidence at query time.
- Chunking is a retrieval decision. Bad chunks poison the whole pipeline.
- Chunk size is a trade-off. Smaller chunks improve precision; larger chunks preserve context.
- Overlap protects boundary facts. Without overlap, answers split across chunk edges disappear.
- Recursive splitting is a strong baseline for messy documents.
- Semantic splitting works best when prose meaning shifts gradually.
- Embeddings encode semantic similarity, not factual truth.
- Cosine similarity is the usual default for text. Dot product gives the same ranking only when vectors are L2-normalized; otherwise longer vectors get a boost.
- Vector stores are fast lookup systems for embeddings plus metadata.
- HNSW is the default ANN choice for many real systems.
- IVF clusters the vectors and searches only the clusters nearest the query, trading some recall for speed.
- Reranking improves precision after broad retrieval.
- Faithfulness matters more than fluency in production RAG.
- Recall@k, MRR, and NDCG tell you whether retrieval is actually working.
- RAG does not solve reasoning. Multi-hop questions still break naive systems.
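A sliding window makes the chunk-size and overlap bullets concrete. This is a minimal sketch. It assumes whitespace-split words stand in for model tokens, and it uses a hypothetical `chunk_with_overlap` helper. Real pipelines usually count tokens with the embedding model's own tokenizer.

```python
def chunk_with_overlap(tokens: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Slide a window of chunk_size tokens; consecutive windows share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Whitespace-split words stand in for model tokens (an assumption for illustration).
words = "the refund window is thirty days from delivery unless the item is damaged".split()

for overlap in (0, 2):
    print(f"overlap={overlap}")
    for chunk in chunk_with_overlap(words, chunk_size=6, overlap=overlap):
        print("   " + " ".join(chunk))
```

With `overlap=0`, the boundary fact "thirty days from delivery" is split across the first two chunks. With `overlap=2`, the second chunk contains it whole, at the cost of repeating two tokens at every boundary.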
🧠 Mental models
- Chunking: "Cut a textbook into flashcards; too big blurs topics, too small loses context."
- Overlap: "Leave a margin between pages so facts on the seam are copied twice."
- Embeddings: "Place text on a map where nearby points mean similar meaning, not verified truth."
- Vector stores: "A semantic filing cabinet indexed by embedding coordinates plus metadata."
- HNSW: "Navigate a small-world subway map instead of walking every street."
- RAG pipeline: "Open-book answering: fetch evidence first, write second."
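On the embeddings "map", nearness is usually scored with cosine similarity. Toy vectors can check the claim that dot product only matches cosine after normalization. This is a minimal sketch: the 2-D vectors and helper names are hypothetical, and real embeddings have hundreds of dimensions.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    # Dot product divided by both lengths: only the angle between vectors matters.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l2_normalize(a: list[float]) -> list[float]:
    length = math.sqrt(dot(a, a))
    return [x / length for x in a]


# Hypothetical 2-D "embeddings": one points the same way as the query, one is longer but off-angle.
query = [1.0, 0.0]
docs = {"aligned_short": [0.9, 0.1], "off_angle_long": [3.0, 3.0]}

rank_by_dot = sorted(docs, key=lambda name: dot(query, docs[name]), reverse=True)
rank_by_cos = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
rank_by_normalized_dot = sorted(
    docs, key=lambda name: dot(l2_normalize(query), l2_normalize(docs[name])), reverse=True
)

print("dot:", rank_by_dot)
print("cosine:", rank_by_cos)
print("dot on normalized vectors:", rank_by_normalized_dot)
```

Raw dot product ranks `off_angle_long` first because its length inflates the score. Cosine similarity and dot product on normalized vectors both rank `aligned_short` first.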
⚠️ Common traps
- Setting chunk size only by token count instead of document structure and likely answer spans.
- Treating high similarity as proof a chunk is correct, current, or relevant to the user's tenant or version.
- Forgetting metadata filters, so retrieval finds the right concept from the wrong policy or product release.
- Skipping retrieval metrics and judging only final answer quality, which hides recall failures.
- Sending many near-duplicate chunks to the generator and wasting context on repetition.
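Two of the traps above, missing metadata filters and near-duplicate chunks, can both be handled at query time. This is a minimal brute-force sketch that assumes each chunk already carries an embedding vector and a metadata dict. The `Chunk` class, the `retrieve` function, the `product`/`version` keys, and the 0.95 duplicate threshold are all hypothetical. A real vector store would apply the filter inside its index rather than scanning every chunk.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    vector: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_vec, chunks, k, filters, dup_threshold=0.95):
    # 1. Metadata filter first: a chunk is a candidate only if it matches every filter.
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in filters.items())
    ]
    # 2. Rank the surviving candidates by similarity to the query.
    candidates.sort(key=lambda c: cosine(query_vec, c.vector), reverse=True)
    # 3. Greedy de-duplication: skip a chunk that is nearly identical to one already kept.
    selected = []
    for c in candidates:
        if all(cosine(c.vector, s.vector) < dup_threshold for s in selected):
            selected.append(c)
        if len(selected) == k:
            break
    return selected


chunks = [
    Chunk("Refunds: 30 days (v2)", [1.0, 0.10], {"product": "acme", "version": "v2"}),
    Chunk("Refunds: 30 days (v2, copy)", [1.0, 0.11], {"product": "acme", "version": "v2"}),
    Chunk("Refunds: 14 days (v1)", [1.0, 0.10], {"product": "acme", "version": "v1"}),
    Chunk("Shipping times (v2)", [0.2, 1.0], {"product": "acme", "version": "v2"}),
]
results = retrieve([1.0, 0.0], chunks, k=2, filters={"product": "acme", "version": "v2"})
print([c.text for c in results])
```

The v1 refund chunk has the same vector as the v2 one, so it would score just as high. It never reaches the ranking because the filter removes it. The near-copy of the v2 chunk is also skipped, so the second slot goes to different content.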
🔗 Prerequisites & connections
Builds on: Module 06 (Quantization & Finetuning). The key decision is when retrieval is better than changing model weights.
Feeds into: Module 08 (RAG Advanced). Chunking, embeddings, and basic retrieval become the substrate for reranking, rewriting, and multi-hop systems.
💬 Interview phrasing
- "How would you chunk a long internal wiki for a support bot?"
- "Why can a strong embedding model still produce bad RAG answers?"
- "HNSW vs IVF — what trade-off is each making?"
- "Walk me through the retrieval-to-generation pipeline and name one failure mode per stage."
- "What metric would you use to prove retrieval improved, not just the wording of answers?"
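For the wiki-chunking question, recursive splitting is the baseline named in the key concepts. This is a minimal sketch of the idea, not any particular library's splitter. It assumes character length as the size budget and a hypothetical separator order: paragraph, then line, then sentence, then word.

```python
def recursive_split(text: str, max_chars: int, separators=("\n\n", "\n", ". ", " ")) -> list[str]:
    """Split text into pieces of at most max_chars, preferring the coarsest separator that works."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to a hard character cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, finer = separators[0], separators[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        # This separator does not occur here; try the next, finer one.
        return recursive_split(text, max_chars, finer)
    chunks, current = [], ""
    for part in parts:
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_chars:
            current = candidate  # keep packing neighbouring pieces together
            continue
        if current:
            chunks.append(current)  # separator at this boundary is dropped
        if len(part) <= max_chars:
            current = part
        else:
            # The piece alone is still too big: recurse with finer separators.
            chunks.extend(recursive_split(part, max_chars, finer))
            current = ""
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


wiki_page = (
    "# Refunds\n\nRefunds are accepted within 30 days. Items must be unused."
    "\n\n# Shipping\n\nOrders ship in 2 days."
)
sentence = "Refunds are accepted within 30 days. Items must be unused."

print(recursive_split(wiki_page, 80))
print(recursive_split(sentence, 30))
```

At 80 characters, each heading stays with its paragraph. At 30 characters, the sentence is cut mid-clause and "days" becomes an orphan chunk. This is the "too small loses context" failure from the mental models.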
⏱️ Difficulty markers
- 🟢 chunk size and overlap
- 🟢 embeddings and cosine similarity
- 🟡 recursive vs semantic splitting
- 🟡 vector stores and ANN indexes
- 🔴 recall@k vs MRR vs NDCG
- 🔴 faithfulness vs relevance in production RAG
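For the ANN-index marker, a toy inverted-file (IVF) index shows the recall-for-speed trade-off directly. This is a minimal pure-Python sketch. It assumes squared Euclidean distance (for unit-length embeddings, this ranks the same as cosine) and random 2-D data. `ToyIVF`, `kmeans`, and `exact_search` are hypothetical names. The `nprobe` parameter mirrors the similarly named knob in libraries such as FAISS.

```python
import random


def sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance (enough for ranking; no square root needed)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(v: list[float], centroids: list[list[float]]) -> int:
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))


def kmeans(vectors, n_clusters, iters=10, seed=0):
    """Plain Lloyd's k-means: assign each vector to its nearest centroid, then re-average."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    for _ in range(iters):
        buckets = [[] for _ in range(n_clusters)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the previous centroid if a cluster went empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


class ToyIVF:
    """Inverted-file index: k-means centroids plus one list of vector ids per centroid."""

    def __init__(self, vectors, n_clusters, seed=0):
        self.vectors = vectors
        self.centroids = kmeans(vectors, n_clusters, seed=seed)
        self.lists = {c: [] for c in range(n_clusters)}
        for i, v in enumerate(vectors):
            self.lists[nearest_centroid(v, self.centroids)].append(i)

    def search(self, query, k, nprobe):
        # Scan only the nprobe clusters whose centroids are closest to the query.
        probe = sorted(self.lists, key=lambda c: sq_dist(query, self.centroids[c]))[:nprobe]
        candidates = [i for c in probe for i in self.lists[c]]
        return sorted(candidates, key=lambda i: sq_dist(query, self.vectors[i]))[:k]


def exact_search(vectors, query, k):
    """Brute force over every vector: the ground truth that IVF approximates."""
    return sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))[:k]


rng = random.Random(42)
data = [[rng.random(), rng.random()] for _ in range(500)]  # hypothetical 2-D vectors
index = ToyIVF(data, n_clusters=10)
query = [0.5, 0.5]
truth = set(exact_search(data, query, k=10))

for nprobe in (1, 3, 10):
    found = set(index.search(query, k=10, nprobe=nprobe))
    print(f"nprobe={nprobe}: recall@10={len(found & truth) / 10:.1f}")
```

Probing every cluster reproduces exact search. Fewer probes scan fewer vectors but can miss true neighbours that sit just across a cluster boundary.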
Self-check questions
For full Q&A framing, see 02_explainer.md §6.3.
- Why is a confident answer without sources dangerous in production? (§1.1, §1.4)
- Why can one huge document chunk reduce retrieval quality? (§2.1, §2.2)
- What does overlap fix, and what does too much overlap hurt? (§2.3)
- Recursive splitting vs semantic splitting — when would you choose each? (§2.4)
- What do embeddings preserve well, and what do they miss? (§3.1)
- Cosine similarity vs dot product — when do they rank the same? (§3.3)
- HNSW vs IVF — what trade-off does each make? (§3.5)
- Name the stages of the RAG pipeline and one failure mode per stage. (§4.1-§4.7)
- Why does reranking help even after retrieval? (§4.5)
- Recall@k vs MRR vs NDCG — what does each reward? (§5.2-§5.4)
- Faithfulness vs answer relevance — what is the difference? (§5.5)
- Why is naive RAG weak on multi-hop reasoning? (§4.8)
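For the self-check question on what each retrieval metric rewards, the definitions fit in a few lines. This is a minimal sketch with hypothetical document ids and relevance grades. The NDCG here uses the linear-gain form rel / log2(rank + 1); some libraries use the exponential gain 2^rel - 1 instead.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & set(relevant_ids)) / len(relevant_ids)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant item (0 if none is retrieved)."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """MRR: average reciprocal rank over (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked_ids, gains, k):
    """NDCG@k with graded relevance; gains maps doc id -> grade (missing ids count as 0)."""
    def dcg(grades):
        # Position i (0-based) is rank i + 1, discounted by log2(rank + 1).
        return sum(g / math.log2(i + 2) for i, g in enumerate(grades))
    actual = dcg([gains.get(d, 0) for d in ranked_ids[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


ranked = ["d3", "d1", "d7", "d2"]  # hypothetical retriever output, best first
relevant = {"d1", "d2"}
gains = {"d1": 3, "d2": 1}  # hypothetical grades: d1 answers fully, d2 partially

print("recall@3:", recall_at_k(ranked, relevant, 3))  # d1 found, d2 missed -> 0.5
print("reciprocal rank:", reciprocal_rank(ranked, relevant))  # first hit at rank 2 -> 0.5
print("MRR over two queries:", mean_reciprocal_rank([(ranked, relevant), (["d1", "d4"], {"d1"})]))  # 0.75
print("NDCG@3:", ndcg_at_k(ranked, gains, 3))
```

Recall@k only asks whether relevant chunks made the top k. Reciprocal rank only cares where the first relevant hit landed. NDCG uses graded labels and discounts relevant items that appear lower in the list.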
Health check
- [ ] Read all 6 explainer chapters at least once
- [ ] Can explain chunk size, overlap, embeddings, and HNSW cold
- [ ] Assignment shipped with at least one retrieval eval table
- [ ] LinkedIn post #7 published
- [ ] All daily-recall prompts answerable from memory
- [ ] Failure-fix table from explainer §6.1 sketched without looking
- [ ] Ready for 09_advanced_rag_patterns