Agent Performance Economics

The chapters in this module, in reading order.

| #  | Chapter |
|----|---------|
| 00 | Cost & Latency Optimization for LLM Applications: The Five-Year-Old Version |
| 01 | Cost anatomy: count the whole workflow before optimizing the token price |
| 02 | Latency anatomy: separate first-token silence from total completion time |
| 03 | Prompt caching: design stable prefixes so the model stops rereading them |
| 04 | Model routing: match capability to task difficulty without hiding failure |
| 05 | Streaming first-token latency: make early progress useful, cancellable, and safe |
| 06 | Batching strategies: trade tiny waits for higher throughput only where the product allows it |
| 07 | KV cache optimization: memory, not math, often limits long conversations |
| 08 | Prompt compression: shrink context without deleting the reason the answer is correct |
| 09 | Output length control: stop paying for words the user did not need |
| 10 | Cost dashboards: make regressions visible before the invoice arrives |
| 11 | Edge and local deployment: move inference closer only when the constraints truly fit |
| 12 | Capacity planning: forecast tokens, peaks, limits, and headroom before launch week |
| 13 | Honest limits: optimization is empirical because workloads, models, and vendors move |