Adaptation Compression

The chapters in this module, in reading order.

| # | Chapter |
|---|---------|
| 00 | Quantization & Fine-Tuning — The Five-Year-Old Version |
| 01 | Opening failure — when the giant model cannot enter the room |
| 02 | Number formats — how the same weight wears different clothing |
| 03 | Precision vs range — why bf16 survives the rough road |
| 04 | Quantization core — snapping rich numbers into tiny buckets |
| 05 | Per-tensor vs per-channel — one ruler for all, or one ruler per row |
| 06 | GPTQ — preserve the output, not the illusion |
| 07 | AWQ — protect what traffic actually uses |
| 08 | KV cache memory — the bill that grows with traffic |
| 09 | MQA and GQA — share the heavy memory parts |
| 10 | PagedAttention and serving — pack memory like a systems engineer |
| 11 | LoRA — thin adapters, not full rewrites |
| 12 | QLoRA — compressed base, tiny trainable overlay |
| 13 | Choosing the right lever — cheapest fix first |
| 14 | Honest admission — the parts we still learn by testing |