04. Week 7 — Daily Recall

Spaced practice. Answer from memory. If you get stuck, jump to the explainer section in parentheses.

Monday (after ELI5 + Chapter 1)

In the library analogy, who are the librarian, bookshelf, index card, reading desk, and answer brief? (ELI5)
Why is a confident wrong answer about Q4 revenue more dangerous than a polite refusal? (§1.1)
Why do closed-book LLMs hallucinate on company-specific facts? (§1.2)
RAG in one sentence: what changes at answer time? (§1.4)
Explain the difference between a model sounding sure and a system being grounded. (§1.5)

Why can’t you just stuff the whole PDF into the prompt every time? (§2.1)
Chunk size trade-off: what breaks when chunks are too small? Too large? (§2.2)
What does overlap protect against? Give one concrete example. (§2.3)
Recursive splitting vs semantic splitting — what is the basic difference? (§2.4)
Same document chunked three ways: which version would best answer a precise refund-policy query? Why? (§2.5)
What metadata would you store beside each chunk? (§2.6)
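
Once you have attempted the overlap question from memory, a small sketch can help you check your answer. It uses fixed-size character windows, which is only one of several chunking strategies and is not necessarily the one the explainer uses. The function name `chunk_text`, the sample sentence, and the sizes are assumptions made up for illustration.

```python
def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character windows sharing `overlap` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical policy sentence; the fact "within 30 days" straddles character 30.
doc = "Refunds are accepted within 30 days. Items must be unused."

no_overlap = chunk_text(doc, size=30, overlap=0)
with_overlap = chunk_text(doc, size=30, overlap=10)

# Without overlap the phrase is cut in two, so no single chunk states it whole.
split_fact = not any("within 30 days" in c for c in no_overlap)
# With overlap at least one chunk carries the phrase intact.
kept_fact = any("within 30 days" in c for c in with_overlap)
```

In this toy case the zero-overlap split cuts the phrase "within 30 days" across two chunks, while the overlapping split keeps it whole in one chunk. The cost is extra chunks to embed and store.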

What does an embedding represent in plain language? (§3.1)
Why do semantically similar sentences land close in embedding space? (§3.2)
Cosine similarity vs dot product — when do they behave the same? (§3.3)
Name four things you would check before choosing an embedding model. (§3.4)
HNSW in one sentence: what structure does it use? (§3.5)
IVF in one sentence: what idea makes it fast? (§3.5)
Why do numbers, negation, and acronyms often trip embeddings? (§3.6)
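
To check your answer on cosine similarity vs dot product, the sketch below compares the two on a pair of toy vectors. It is plain Python with no embedding library, and the vectors are made-up values rather than real embeddings.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: list[float]) -> float:
    return math.sqrt(dot(a, a))


def cosine(a: list[float], b: list[float]) -> float:
    # Dot product divided by both lengths, so vector magnitude cancels out.
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a: list[float]) -> list[float]:
    n = norm(a)
    return [x / n for x in a]


# Toy vectors (assumed values, not output of any embedding model).
a = [3.0, 4.0]
b = [4.0, 3.0]
b_scaled = [8.0, 6.0]  # same direction as b, twice the length

raw_dot = dot(a, b)                       # depends on vector length
raw_dot_scaled = dot(a, b_scaled)         # doubles when b doubles
cos_ab = cosine(a, b)                     # depends only on direction
cos_ab_scaled = cosine(a, b_scaled)       # unchanged by scaling
unit_dot = dot(normalize(a), normalize(b))  # matches cosine on unit vectors
```

On unit-length vectors the dot product and cosine similarity agree, which is why the ranking is the same under either score when the embeddings are normalized. On unnormalized vectors the dot product also rewards length.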

Draw the naive RAG pipeline from memory. (§4.1)
Give one failure mode for query understanding, retrieval, reranking, and prompt augmentation. (§4.2-§4.6)
Why is reranking useful even after you already have top-k retrieved chunks? (§4.5)
What should the augmented prompt force the model to do when evidence is missing? (§4.6)
Write one retrieval prompt that asks the model to rewrite a user query for search. (§4.8)
Why does RAG not automatically solve multi-hop reasoning? (§4.9)
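
For the question on what the augmented prompt should force when evidence is missing, here is one possible shape to compare against your own answer. The template wording, the `REFUSAL` string, and the function name `build_grounded_prompt` are all hypothetical, and the explainer may phrase its prompt differently.

```python
# Hypothetical refusal sentence; any fixed, easy-to-detect string would do.
REFUSAL = "I don't know based on the provided documents."


def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that limits the model to the retrieved passages."""
    # Number the passages so the answer can cite them.
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    if not context:
        context = "(no passages retrieved)"
    return (
        "Answer using only the numbered passages below.\n"
        "Cite the passage number for each claim.\n"
        f'If the passages do not contain the answer, reply exactly: "{REFUSAL}"\n\n'
        f"Passages:\n{context}\n\n"
        f"Question: {question}\n"
    )


prompt = build_grounded_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days.", "Items must be unused."],
)
empty_prompt = build_grounded_prompt("What was Q4 revenue?", [])
```

The design choice to notice is the explicit exit: the prompt names a fixed refusal, so a missing passage leads to "I don't know" instead of a fluent guess.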

Recall@k — what exactly are you counting? (§5.2)
MRR — what does it reward that recall@k ignores? (§5.3)
NDCG — why do we discount lower-ranked items? (§5.4)
Faithfulness vs answer relevance — explain the difference. (§5.5)
What does RAGAS measure well, and why is human review still needed? (§5.6)
Why do teams get fooled when they evaluate only answer fluency? (§5.1)
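
After answering the metric questions from memory, you can check the definitions against a minimal sketch. It assumes the common textbook forms: recall@k as the fraction of relevant items found in the top k, reciprocal rank as 1 over the rank of the first relevant item, and NDCG with a log2 discount. The explainer's exact variants may differ, and the chunk IDs are invented.

```python
import math


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    # Counts how many relevant items appear in the top k, ignoring their order.
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    # Rewards placing the first relevant item as high as possible.
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mrr(runs: list[tuple[list[str], set[str]]]) -> float:
    # Mean of the reciprocal ranks over several queries.
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def dcg(gains: list[float]) -> float:
    # Each gain is divided by log2(rank + 1), so lower ranks count less.
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked: list[str], gain_by_doc: dict[str, float], k: int) -> float:
    gains = [gain_by_doc.get(doc, 0.0) for doc in ranked[:k]]
    ideal = sorted(gain_by_doc.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Two hypothetical rankings of the same chunks; c2 and c4 are the relevant ones.
relevant = {"c2", "c4"}
gains = {"c2": 1.0, "c4": 1.0}
late_hit = ["c7", "c2", "c9", "c4"]   # first relevant chunk at rank 2
early_hit = ["c2", "c7", "c9", "c4"]  # first relevant chunk at rank 1

same_recall = recall_at_k(late_hit, relevant, 3) == recall_at_k(early_hit, relevant, 3)
rr_late = reciprocal_rank(late_hit, relevant)    # 0.5
rr_early = reciprocal_rank(early_hit, relevant)  # 1.0
ndcg_late = ndcg_at_k(late_hit, gains, 4)
ndcg_early = ndcg_at_k(early_hit, gains, 4)
```

Both rankings have the same recall@3, because each puts one of the two relevant chunks in the top three. Reciprocal rank and NDCG are higher for the ranking that puts a relevant chunk first, which is the position sensitivity that recall@k ignores.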

Sketch the failure-fix table from explainer §6.1 from memory.
Give a production latency budget for embed → retrieve → rerank → generate. (§6.4)
Answer five interview questions from explainer §6.3 without notes.
Which five basics must feel automatic before 09_advanced_rag_patterns? (§6.6)
Say the bridge sentence to the next module out loud. (§6.7)