Tokens Embeddings Context

The chapters in this module are listed below in reading order.

#	Chapter
00	Tokenization & attention in kid words — the office message room
01	The tokenizer failure — why naive splitters collapse on real text
02	Character vs word level — two extremes, both broken
03	Subword tokenization and BPE — the practical middle path
04	Embeddings — the badge board turns IDs into geometry
05	Positional encoding — the seat number that saves word order
06	RoPE and ALiBi — relative position for long context
07	Attention as soft lookup — the spotlight beam
08	Scaled dot-product attention — the scorecard math
09	Causal masking — blocking the future in decoders
10	Multi-head attention — parallel crews with different habits
11	The full pipeline — raw text to contextual vectors
12	WordPiece and Unigram — same destination, different training logic
13	Cross-attention — one sequence consulting another
14	Honest admission — what still feels unsolved