06. Module 03 Review — Transformer Architecture

Focus: residual stream, residual connections, layer normalization, pre-norm transformer block, encoder-decoder vs decoder-only, causal masking, and KV cache.

Review loop

  1. Skim the TOC in 02_explainer.md, then re-read any chapter that still feels weak.
  2. Re-answer the self-check questions in 01_weekly_plan.md without notes.
  3. Re-do the hardest prompts in 04_daily_recall.md aloud.
  4. Draw the pre-norm block from explainer §4.2 on blank paper.
  5. Rebuild the failure-fix table from explainer §6.1 with at least eight rows.
  6. Re-open 05_hands_on_lab.md and confirm your code matches the conceptual diagram.
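
For steps 4 and 6, it can help to have a bare-bones reference to compare against. The sketch below assumes the usual pre-norm layout (normalize, run the sublayer, add the result back onto the residual stream) and may differ in detail from the diagram in explainer §4.2. The names `layer_norm`, `pre_norm_block`, and `zero_sublayer` are illustrative, not taken from the lab, and the layer norm here omits the learned scale and shift.

```python
import math

def layer_norm(vec, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance.
    Assumption: no learned scale/shift parameters, to keep the sketch short."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def add(a, b):
    """Element-wise sum of two sequences of token vectors (the residual add)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def pre_norm_block(x, attn, mlp):
    """x is a list of token vectors (the residual stream).
    attn and mlp are any callables mapping a sequence to a same-shaped sequence."""
    # Sublayer 1: normalize first, then attention, then add back to the stream.
    x = add(x, attn([layer_norm(t) for t in x]))
    # Sublayer 2: normalize first, then MLP, then add back to the stream.
    x = add(x, mlp([layer_norm(t) for t in x]))
    return x

def zero_sublayer(seq):
    """Hypothetical sublayer that contributes nothing."""
    return [[0.0] * len(t) for t in seq]

x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
unchanged = pre_norm_block(x, zero_sublayer, zero_sublayer)
print(unchanged == x)  # the stream passes through untouched
```

If a sublayer writes nothing, the input comes out unchanged. That is one way to check your drawing: the residual path should run straight from input to output, with each sublayer only adding to it.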

Reflection

  • Which part of the block now feels physically intuitive, not just memorized?
  • Where do you still confuse self-attention, cross-attention, and causal masking?
  • What must feel automatic before starting Module 04 coding work?

Completion gate

  • [ ] All 6 explainer chapters read at least once
  • [ ] Can define the residual stream clearly
  • [ ] Can draw the pre-norm transformer block without notes
  • [ ] Can explain causal masking with a lower-triangular matrix
  • [ ] Can explain KV cache latency benefits with a toy example
  • [ ] Assignment completed and explained clearly
  • [ ] Ready to move to Module 04
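
The causal-masking and KV-cache items above can be checked against one toy example. This is a minimal sketch, not the lab's implementation: it assumes single-head attention with identity projections (each token vector serves as its own query, key, and value), and `causal_mask`, `full_causal_attention`, `KVCache`, and the two counting helpers are hypothetical names. It builds the lower-triangular mask, runs masked attention over the whole sequence, then repeats the computation one token at a time with a cache.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def causal_mask(n):
    """Lower-triangular matrix: row i has 1s in columns 0..i and 0s after."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def attend(q, keys, values, mask_row=None):
    """Scaled dot-product attention for a single query."""
    scale = math.sqrt(len(q))
    scores = [dot(q, k) / scale for k in keys]
    if mask_row is not None:
        # Masked (future) positions get -inf so softmax gives them weight 0.
        scores = [s if m else -math.inf for s, m in zip(scores, mask_row)]
    weights = softmax(scores)
    return [sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))]

def full_causal_attention(qs, ks, vs):
    """Attention for every position at once, with the mask hiding the future."""
    mask = causal_mask(len(qs))
    return [attend(q, ks, vs, mask[i]) for i, q in enumerate(qs)]

class KVCache:
    """Stores keys and values of past tokens so each step only adds one pair."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        self.keys.append(k)
        self.values.append(v)
        # No mask needed: the cache only ever contains past and current tokens.
        return attend(q, self.keys, self.values)

def kv_computations_without_cache(n):
    """Step t recomputes keys/values for all t tokens: 1 + 2 + ... + n."""
    return n * (n + 1) // 2

def kv_computations_with_cache(n):
    """Each token's key/value pair is computed once."""
    return n

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
full = full_causal_attention(tokens, tokens, tokens)
cache = KVCache()
cached = [cache.step(t, t, t) for t in tokens]

same = all(abs(a - b) < 1e-9 for fa, ca in zip(full, cached) for a, b in zip(fa, ca))
print(causal_mask(4))
print(same)
print(kv_computations_without_cache(4), kv_computations_with_cache(4))  # 10 4
```

Both paths should give the same outputs, which is the point: the cache changes how much work each decoding step repeats, not what attention computes. In this toy count, generating 4 tokens takes 10 key/value computations without a cache and 4 with one, and the gap grows quadratically with sequence length.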