Agent Performance Economics

The chapters in this module, in reading order.

#	Chapter
00	Cost & Latency Optimization for LLM Applications — The Five-Year-Old Version
01	Cost anatomy — count the whole workflow before optimizing the token price
02	Latency anatomy — separate first-token silence from total completion time
03	Prompt caching — design stable prefixes so the model stops rereading them
04	Model routing — match capability to task difficulty without hiding failure
05	Streaming first-token latency — make early progress useful, cancellable, and safe
06	Batching strategies — trade tiny waits for higher throughput only where the product allows it
07	KV cache optimization — memory, not math, often limits long conversations
08	Prompt compression — shrink context without deleting the reason the answer is correct
09	Output length control — stop paying for words the user did not need
10	Cost dashboards — make regressions visible before the invoice arrives
11	Edge and local deployment — move inference closer only when the constraints truly fit
12	Capacity planning — forecast tokens, peaks, limits, and headroom before launch week
13	Honest limits — optimization is empirical because workloads, models, and vendors move