Adaptation Compression

The chapters in this module, in reading order.

#   Chapter
00  Quantization & Fine-Tuning — The Five-Year-Old Version
01  Opening failure — when the giant model cannot enter the room
02  Number formats — how the same weight wears different clothing
03  Precision vs range — why bf16 survives the rough road
04  Quantization core — snapping rich numbers into tiny buckets
05  Per-tensor vs per-channel — one ruler for all, or one ruler per row
06  GPTQ — preserve the output, not the illusion
07  AWQ — protect what traffic actually uses
08  KV cache memory — the bill that grows with traffic
09  MQA and GQA — share the heavy memory parts
10  PagedAttention and serving — pack memory like a systems engineer
11  LoRA — thin adapters, not full rewrites
12  QLoRA — compressed base, tiny trainable overlay
13  Choosing the right lever — cheapest fix first
14  Honest admission — the parts we still learn by testing