06. Module 03 Review: Transformer Architecture
Focus: residual stream, residual connections, layer normalization, pre-norm transformer block, encoder-decoder vs decoder-only, causal masking, and KV cache.
Review loop
- Skim the TOC in 02_explainer.md, then re-read any weak chapter.
- Re-answer the self-check questions in 01_weekly_plan.md without notes.
- Re-do the hardest prompts in 04_daily_recall.md aloud.
- Draw the pre-norm block from explainer §4.2 on blank paper.
- Rebuild the failure-fix table from explainer §6.1 with at least eight rows.
- Re-open 05_hands_on_lab.md and confirm your code matches the conceptual diagram.
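When checking lab code against the diagram, it can help to have a bare-bones reference for the pre-norm ordering. The sketch below assumes the common pre-norm layout (normalize, apply the sublayer, add the result back to the residual stream); the diagram in explainer §4.2 may differ in detail. The names `layer_norm`, `pre_norm_block`, `toy_attn`, and `toy_mlp` are hypothetical, and the sublayers are stand-ins rather than real attention or a trained MLP.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector (last axis) to zero mean, unit variance.
    # Learned scale and shift are omitted to keep the sketch short.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def pre_norm_block(x, attn, mlp):
    # Pre-norm ordering: the sublayer sees a normalized copy of the stream,
    # and its output is added back onto the un-normalized residual stream.
    x = x + attn(layer_norm(x))
    x = x + mlp(layer_norm(x))
    return x

# Hypothetical stand-in sublayers (plain linear maps, not real attention).
rng = np.random.default_rng(0)
d_model = 8
W_attn = rng.normal(size=(d_model, d_model)) * 0.1
W_in = rng.normal(size=(d_model, 4 * d_model)) * 0.1
W_out = rng.normal(size=(4 * d_model, d_model)) * 0.1

def toy_attn(h):
    return h @ W_attn

def toy_mlp(h):
    return np.maximum(h @ W_in, 0.0) @ W_out

x = rng.normal(size=(4, d_model))   # 4 tokens, d_model features each
y = pre_norm_block(x, toy_attn, toy_mlp)

# If both sublayers output zeros, the block passes the stream through unchanged.
def zero_sublayer(h):
    return np.zeros_like(h)

identity_out = pre_norm_block(x, zero_sublayer, zero_sublayer)
```

The zero-sublayer case is a quick way to see the residual stream as the default path: each sublayer only adds an update to it.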
Reflection
- Which part of the block now feels physically intuitive, not just memorized?
- Where do you still confuse self-attention, cross-attention, and causal masking?
- What must feel automatic before starting Module 04 coding work?
Completion gate
- [ ] All 6 explainer chapters read at least once
- [ ] Can define the residual stream clearly
- [ ] Can draw the pre-norm transformer block without notes
- [ ] Can explain causal masking with a lower-triangular matrix
- [ ] Can explain KV cache latency benefits with a toy example
- [ ] Assignment completed and explained clearly
- [ ] Ready to move to Module 04
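
For the causal-masking and KV-cache items, one possible toy example is sketched below. It assumes single-head attention with the queries, keys, and values already computed; the function names `causal_attention` and `cached_decode` are hypothetical and not taken from the lab. The first function applies a lower-triangular mask over the full sequence. The second processes one position at a time, appending that position's key and value to a cache and attending only over what has been cached so far.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def causal_attention(q, k, v):
    # Full-sequence attention: position t may attend only to positions <= t.
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = np.tril(np.ones((T, T), dtype=bool))   # lower-triangular, True = allowed
    scores = np.where(mask, scores, -np.inf)      # blocked positions get zero weight
    return softmax(scores) @ v

def cached_decode(q, k, v):
    # Step-by-step decoding: keep past keys/values in a cache and
    # attend from the newest query only.
    T, d = q.shape
    k_cache, v_cache, outputs = [], [], []
    for t in range(T):
        k_cache.append(k[t])
        v_cache.append(v[t])
        K = np.stack(k_cache)                     # shape (t + 1, d)
        V = np.stack(v_cache)
        weights = softmax(q[t] @ K.T / np.sqrt(d))
        outputs.append(weights @ V)
    return np.stack(outputs)

rng = np.random.default_rng(0)
T, d = 5, 4
q, k, v = (rng.normal(size=(T, d)) for _ in range(3))

full = causal_attention(q, k, v)
stepwise = cached_decode(q, k, v)
mask_3 = np.tril(np.ones((3, 3), dtype=int))      # the 3-token mask, for inspection
```

The two outputs agree, which is the core of the argument: because of the causal mask, earlier positions never depend on later ones, so their keys and values can be stored once and reused. In a real decoder the saving comes from projecting only the newest token at each step instead of re-running the whole prefix; this toy skips the projections, so it shows the equivalence rather than the timing.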