AI Tooling Landscape — 2026 Reference¶

A flat reference (not a weekly module) to keep modern-framework fluency current. Lead AI Eng interviews probe whether you've heard of and tried the right tools — depth in one or two, vocabulary across the rest.

Tooling rots fast. Re-validate this list every quarter; flag dead links and add new entrants.

How to use¶

Skim every section before any senior+ screen
Pick one tool per section to actually use; vocabulary alone reads as cargo-cult
If a tool's headline pitch surprises you, click in and read the docs

1. Orchestration / agent frameworks¶

Framework	Style	When
LangChain	Composable chains and tools, Python + JS	Default if you want broad ecosystem and don't mind the abstraction tax
LangGraph	Stateful graphs (DAGs / state machines) for agents	Long-running agents, branching workflows, human-in-the-loop
LlamaIndex	RAG-first: ingestion, indexing, query engines	RAG-heavy applications
DSPy	Declarative programs over LLMs; auto-compiles prompts	Research / experimentation; eval-as-optimization
Pydantic AI	Type-safe agents with Pydantic models for tools and outputs	Python apps that already use Pydantic; structured output discipline
Vercel AI SDK	TypeScript-first; streaming UI primitives, hooks for React/Next/Svelte	Web apps with LLM features
OpenAI Agents SDK	OpenAI-native agent loop with tools, handoffs, guardrails	OpenAI-stack alignment
Anthropic Claude Agent SDK	Anthropic-native agents, tool use, computer use	Anthropic-stack alignment, especially for tool-heavy agents
Mastra	TypeScript-native agent framework with workflows, memory	TS / Node.js apps
CrewAI	Role-based multi-agent ("crews")	Multi-agent orchestration with role abstractions
AutoGen	Microsoft's multi-agent conversation framework	Research / multi-agent prototypes
Inngest / Trigger.dev	Durable workflows; not LLM-specific but used a lot for agents	Long-running background agents, retries, observability

Default opinions: - Python web/agent → LangGraph or Pydantic AI - TypeScript web → Vercel AI SDK for UI + Mastra or LangGraph.js for agents - Multi-agent → CrewAI for fast prototypes; LangGraph for production - Closed-vendor stack → use the vendor's own SDK (OpenAI Agents, Anthropic Agent SDK)

Anti-pattern: picking the framework before you understand the problem. Most LLM apps need less framework than people think.

2. Eval and observability¶

Tool	Style	When
LangSmith	LangChain-native tracing, eval, datasets	If you're on LangChain
Langfuse	Open-source observability + eval; framework-agnostic	OSS-friendly stack; self-hostable
Braintrust	Eval-first; great UI for golden sets and regressions	Eval-driven development
Helicone	Drop-in proxy for LLM calls — cost, latency, caching, logs	Quick observability with minimal integration
Promptfoo	CLI-first eval; YAML test definitions	Local/CI eval; comparison across models
OpenAI Evals	Reference eval framework	Reading prior art
Arize Phoenix	Open observability for LLM apps; OpenTelemetry-based	Already on OTel; want structured tracing
Confident AI / DeepEval	Pytest-style LLM tests	Engineering teams that prefer test code over UIs

Practical pattern: 1. Helicone as a quick proxy for cost/latency visibility 2. Langfuse or Phoenix for structured tracing 3. Promptfoo or Braintrust for golden-set regression evals 4. DeepEval if you want pytest-style LLM testing

For interviews, name two of these and have a one-line tradeoff. "We used Langfuse for traces and Promptfoo for regression evals because [reason]" is a strong answer.

3. Local LLM tooling¶

Tool	What	When
Ollama	Easy local LLM runner; single binary, model registry	"I want to run a model on my laptop"
llama.cpp	C++ inference for GGUF; runs on almost anything	On-device, embedded, low-end hardware
MLX (Apple)	Optimized for Apple Silicon GPU/ANE	Mac-native voice / agent products
GGUF / GGML	Quantized model file format	Distribution format for llama.cpp / LM Studio / Ollama
LM Studio	GUI on top of llama.cpp; model browser	Non-CLI users; ad-hoc local exploration
vLLM	Production server; PagedAttention, continuous batching	Self-hosted production LLMs
TGI (HF)	HF's production server	HF-aligned stack
Text Generation WebUI	Gradio UI for many backends	Hobbyist exploration
ExLlamaV2	High-throughput GPTQ inference	Single-GPU max throughput on Llama-family
TensorRT-LLM	NVIDIA optimized inference	Lowest-latency NVIDIA deployments
MLC LLM	Cross-platform compile-once-run-anywhere (mobile, web, server)	Mobile / WASM deployment

Quantization formats to know: - GGUF — llama.cpp ecosystem standard; Q4_K_M / Q5_K_M / Q8_0 are common - AWQ — activation-aware weight quantization; vLLM supports - GPTQ — older, still common; ExLlama supports - FP8 / INT8 / INT4 — newer; vLLM and TensorRT-LLM support; vendor-specific kernels

On-device strategy: 1. Pick a quantization (Q4_K_M or Q5_K_M for general; lower for memory-constrained) 2. Pick a runtime (whisper.cpp + llama.cpp on cross-platform; MLX on Apple Silicon) 3. Bundle the model with the app or fetch on first-run 4. Watch RAM, thermal, battery — these dominate UX

For voice agents especially, on-device STT (whisper.cpp / MLX-Whisper) + on-device LLM (Llama 3 8B Q4 / Phi-3.5 mini Q5) is increasingly viable on flagship phones and laptops.

4. Computer use and browser agents¶

The fastest-moving category in 2026. Agents that drive a browser, file system, or full computer.

Tool	What	When
Anthropic Computer Use	Claude takes screenshots, clicks, types	Complex GUI automation; Anthropic stack
OpenAI Operator (CUA)	Browser-using agent	OpenAI stack
Browserbase	Managed headless browser infra for agents	Production browser agents at scale
Stagehand	TypeScript SDK over Playwright with LLM-native actions	TS-first browser automation
browser-use	Python library; works with any LLM	Python-first browser automation
Skyvern	Production browser-agent platform	Enterprise web automation
E2B	Cloud sandboxes (file system + shell) for agents	Code-execution agents
Modal Sandboxes	Modal's sandbox primitive	Modal-native code-execution agents
Daytona	Managed dev environments for agents	Long-running coding agents

Common pattern: orchestration framework (LangGraph / Agents SDK) + execution sandbox (E2B / Browserbase) + LLM with tool use (Claude / GPT-4o).

Interview signal: can you articulate why browser agents are still flaky (state-tracking, latency per action, error recovery, cost per action) and what makes them work better (memory between actions, narrow scope, structured tool definitions)?

5. Multimodal model SOTA (as of 2026)¶

Vocabulary check. For each, know: input modalities, headline strength, where you'd reach for it.

Model	Modalities	Strength
GPT-4o / 4.1 (OpenAI)	Text, image, audio in/out	Realtime audio, broad capability
Claude Sonnet / Opus 4 (Anthropic)	Text, image	Reasoning, long context, computer use
Gemini 2.5 Pro / Flash (Google)	Text, image, audio, video	Long context, video understanding
Llama 3.2 / 4 (Meta, open)	Text, image (3.2-V)	Open-weights flagship
Pixtral (Mistral, open)	Text, image	Open vision-language; strong on documents
Qwen 2.5 VL (Alibaba, open)	Text, image, video	Multilingual, video, very capable open VLM
Llava / Llava-NeXT (open)	Text, image	Reference open VLM family
Molmo (Allen AI, open)	Text, image	Open VLM with high transparency
DeepSeek-V3 / R1 (open)	Text (R1 = reasoning)	Strong open reasoning
o1 / o3 (OpenAI)	Text	Reasoning-tuned

Image generation: GPT-4o image, Imagen, DALL-E 3, Flux, Stable Diffusion 3.5, Midjourney Video generation: Sora, Veo, Runway, Pika, Kling, Hunyuan Video, Wan Voice / speech-to-speech: GPT-4o Realtime, Gemini Live, Moshi, Sesame

Watch for capability drift; this section is the most likely to age. Re-check before any senior screen.

6. Vector stores¶

Quick reference (deeper coverage in Module 13-14).

Store	When
pgvector	Already on Postgres; small to mid-scale
Pinecone	Managed, fast to ship, expensive at scale
Weaviate	Self-host or managed; hybrid search baked in
Qdrant	Self-host friendly; performant
FAISS	In-process library; not a service
Chroma	Embedded; great for prototypes
Turbopuffer	Low-cost serverless vector store
Vespa / Elasticsearch / Typesense	Search engines with vector support; hybrid retrieval

Default: pgvector until you have a reason to leave.

7. Embedding models¶

Family	Notes
OpenAI text-embedding-3-small / large	Default vendor; 1536 / 3072 dim
Cohere Embed v3	Strong multilingual, classification flavors
Voyage	Domain-specific (code, finance, law)
BGE / BGE-M3 (open)	Strong open baseline; multi-functional
E5 (open)	Microsoft's open embeddings
Nomic Embed (open)	Open, strong on retrieval benchmarks
Jina Embeddings	Long-context embeddings
Sentence-Transformers	Library; many models

Eval: MTEB leaderboard — re-check before choosing.

Pattern: start with OpenAI or Cohere for speed; switch to open (BGE-M3) when cost / latency / privacy demand it.

8. Prompt management and experimentation¶

Tool	What	When
Promptlayer	Prompt versioning, A/B, cost	Teams managing many prompts
Humanloop	Prompt management + eval	Eval-driven prompt iteration
PromptHub	Git-style prompt versioning	If you want a "prompts repo" pattern
Custom (Git + your own loader)	Prompts as files in your repo	Most engineers; preferred for code-adjacent prompts

Default: prompts as files in your repo, loaded by your code. Add a tool only when prompts need to ship without a deploy.

9. AI gateways and proxies¶

Tool	What	When
OpenRouter	One API to many models	Vendor-agnostic experimentation
Together / Fireworks / Anyscale / Replicate	Hosted open-weights as APIs	Try open-weights without infra
LiteLLM	Library that abstracts provider differences	Code-side vendor portability
Portkey	Gateway with retries, fallbacks, caching, observability	Production LLM gateway
Helicone	Proxy + observability	Quick visibility
Cloudflare AI Gateway	Edge gateway with caching, observability	Cloudflare stack
Martian / Not Diamond	Routing models for cost/quality	Model-routing as a service

Default opinion: start with LiteLLM in code for portability and Helicone as a proxy for observability. Add a heavier gateway (Portkey, custom) if production demands rate-shaping, fallbacks, secrets management at the edge.

How this fits the curriculum¶

Section	Tightest module link
Orchestration / agents	`01_ai_engineering/01_agentic_system_design/`, `16_multi_agent_coordination/`
Eval / observability	`04_ai_product_evals/00_ai_evals_release_gates/`, `01_ai_engineering/03_agent_observability_debugging/`
Local LLM tooling	`00_ai_foundation/06_adaptation_compression/`, `04_ml_platform_operations/`
Computer use / browser	`01_ai_engineering/01_agentic_system_design/`, `16_multi_agent_coordination/`
Multimodal SOTA	`05_ai_specializations/01_multimodal_vision_systems/`, `05_ai_specializations/02_diffusion_media_generation/`, `05_ai_specializations/00_realtime_voice_agents/`
Vector stores / embeddings	`01_ai_engineering/08_rag_system_design/`, `09_advanced_rag_patterns/`

Maintenance protocol¶

Quarterly: re-skim each section. Drop dead tools. Add prominent new ones.
After every screen: add any tool you got asked about that wasn't here.
Don't expand into deep tutorials. This is a map, not a textbook. Module folders hold depth.