GPU Acceleration Stack

The chapters in this module, in reading order.

| # | Chapter |
|---|---------|
| 00 | GPU acceleration & inference-serving stack — First-principles overview |
| 01 | GPU execution and the roofline — which ceiling are you actually hitting? |
| 02 | CUDA kernels and fusion — the tax you pay between operations |
| 03 | NCCL collectives and interconnect — the wire between GPUs is now the wall |
| 04 | TensorRT-LLM compilation — pay at build time so you don't pay every token |
| 05 | Triton Inference Server — the serving layer above the engine |
| 06 | NVIDIA NIM — when to take the prebuilt engine instead of building your own |
| 07 | NeMo customization — the training-side framework for the model NIM serves |
| 08 | GPU cluster scheduling and MIG — the idle GPU bills the same as the busy one |
| 09 | Boundaries and tradeoffs — what's physics, what's a vendor convention, and what it cost to forget the difference |