Transformer Mechanics

The chapters in this module, in reading order.

| #  | Chapter |
|----|---------|
| 00 | Transformer architecture in kid words — the assembly line |
| 01 | The stacking failure — why depth without rails produces garbage |
| 02 | Residual connections — the shortcut pipe |
| 03 | The residual stream — the shared canvas |
| 04 | Layer normalization — the quality inspector |
| 05 | Pre-norm vs post-norm — where the inspector stands |
| 06 | The transformer block — two benches, one station |
| 07 | Attention inside the block — the social bench |
| 08 | The feed-forward network — the private bench |
| 09 | Encoder, decoder, encoder-decoder — three factory layouts |
| 10 | Causal masking — blocking the future |
| 11 | KV cache — not rewriting yesterday's notes |
| 12 | GQA and MQA — fewer notebooks for the same crews |
| 13 | Flash Attention — same answer, far less memory traffic |
| 14 | Honest admission — what this module glossed over |