AI Tooling Landscape — 2026 Reference¶
A flat reference (not a weekly module) to keep modern-framework fluency current. Lead AI Eng interviews probe whether you've heard of and tried the right tools — depth in one or two, vocabulary across the rest.
Tooling rots fast. Re-validate this list every quarter; flag dead links and add new entrants.
How to use¶
- Skim every section before any senior+ screen
- Pick one tool per section to actually use; vocabulary alone reads as cargo-cult
- If a tool's headline pitch surprises you, click in and read the docs
1. Orchestration / agent frameworks¶
| Framework | Style | When |
|---|---|---|
| LangChain | Composable chains and tools, Python + JS | Default if you want broad ecosystem and don't mind the abstraction tax |
| LangGraph | Stateful graphs (DAGs / state machines) for agents | Long-running agents, branching workflows, human-in-the-loop |
| LlamaIndex | RAG-first: ingestion, indexing, query engines | RAG-heavy applications |
| DSPy | Declarative programs over LLMs; auto-compiles prompts | Research / experimentation; eval-as-optimization |
| Pydantic AI | Type-safe agents with Pydantic models for tools and outputs | Python apps that already use Pydantic; structured output discipline |
| Vercel AI SDK | TypeScript-first; streaming UI primitives, hooks for React/Next/Svelte | Web apps with LLM features |
| OpenAI Agents SDK | OpenAI-native agent loop with tools, handoffs, guardrails | OpenAI-stack alignment |
| Anthropic Claude Agent SDK | Anthropic-native agents, tool use, computer use | Anthropic-stack alignment, especially for tool-heavy agents |
| Mastra | TypeScript-native agent framework with workflows, memory | TS / Node.js apps |
| CrewAI | Role-based multi-agent ("crews") | Multi-agent orchestration with role abstractions |
| AutoGen | Microsoft's multi-agent conversation framework | Research / multi-agent prototypes |
| Inngest / Trigger.dev | Durable workflows; not LLM-specific but used a lot for agents | Long-running background agents, retries, observability |
Default opinions: - Python web/agent → LangGraph or Pydantic AI - TypeScript web → Vercel AI SDK for UI + Mastra or LangGraph.js for agents - Multi-agent → CrewAI for fast prototypes; LangGraph for production - Closed-vendor stack → use the vendor's own SDK (OpenAI Agents, Anthropic Agent SDK)
Anti-pattern: picking the framework before you understand the problem. Most LLM apps need less framework than people think.
2. Eval and observability¶
| Tool | Style | When |
|---|---|---|
| LangSmith | LangChain-native tracing, eval, datasets | If you're on LangChain |
| Langfuse | Open-source observability + eval; framework-agnostic | OSS-friendly stack; self-hostable |
| Braintrust | Eval-first; great UI for golden sets and regressions | Eval-driven development |
| Helicone | Drop-in proxy for LLM calls — cost, latency, caching, logs | Quick observability with minimal integration |
| Promptfoo | CLI-first eval; YAML test definitions | Local/CI eval; comparison across models |
| OpenAI Evals | Reference eval framework | Reading prior art |
| Arize Phoenix | Open observability for LLM apps; OpenTelemetry-based | Already on OTel; want structured tracing |
| Confident AI / DeepEval | Pytest-style LLM tests | Engineering teams that prefer test code over UIs |
Practical pattern: 1. Helicone as a quick proxy for cost/latency visibility 2. Langfuse or Phoenix for structured tracing 3. Promptfoo or Braintrust for golden-set regression evals 4. DeepEval if you want pytest-style LLM testing
For interviews, name two of these and have a one-line tradeoff. "We used Langfuse for traces and Promptfoo for regression evals because [reason]" is a strong answer.
3. Local LLM tooling¶
| Tool | What | When |
|---|---|---|
| Ollama | Easy local LLM runner; single binary, model registry | "I want to run a model on my laptop" |
| llama.cpp | C++ inference for GGUF; runs on almost anything | On-device, embedded, low-end hardware |
| MLX (Apple) | Optimized for Apple Silicon GPU/ANE | Mac-native voice / agent products |
| GGUF / GGML | Quantized model file format | Distribution format for llama.cpp / LM Studio / Ollama |
| LM Studio | GUI on top of llama.cpp; model browser | Non-CLI users; ad-hoc local exploration |
| vLLM | Production server; PagedAttention, continuous batching | Self-hosted production LLMs |
| TGI (HF) | HF's production server | HF-aligned stack |
| Text Generation WebUI | Gradio UI for many backends | Hobbyist exploration |
| ExLlamaV2 | High-throughput GPTQ inference | Single-GPU max throughput on Llama-family |
| TensorRT-LLM | NVIDIA optimized inference | Lowest-latency NVIDIA deployments |
| MLC LLM | Cross-platform compile-once-run-anywhere (mobile, web, server) | Mobile / WASM deployment |
Quantization formats to know: - GGUF — llama.cpp ecosystem standard; Q4_K_M / Q5_K_M / Q8_0 are common - AWQ — activation-aware weight quantization; vLLM supports - GPTQ — older, still common; ExLlama supports - FP8 / INT8 / INT4 — newer; vLLM and TensorRT-LLM support; vendor-specific kernels
On-device strategy: 1. Pick a quantization (Q4_K_M or Q5_K_M for general; lower for memory-constrained) 2. Pick a runtime (whisper.cpp + llama.cpp on cross-platform; MLX on Apple Silicon) 3. Bundle the model with the app or fetch on first-run 4. Watch RAM, thermal, battery — these dominate UX
For voice agents especially, on-device STT (whisper.cpp / MLX-Whisper) + on-device LLM (Llama 3 8B Q4 / Phi-3.5 mini Q5) is increasingly viable on flagship phones and laptops.
4. Computer use and browser agents¶
The fastest-moving category in 2026. Agents that drive a browser, file system, or full computer.
| Tool | What | When |
|---|---|---|
| Anthropic Computer Use | Claude takes screenshots, clicks, types | Complex GUI automation; Anthropic stack |
| OpenAI Operator (CUA) | Browser-using agent | OpenAI stack |
| Browserbase | Managed headless browser infra for agents | Production browser agents at scale |
| Stagehand | TypeScript SDK over Playwright with LLM-native actions | TS-first browser automation |
| browser-use | Python library; works with any LLM | Python-first browser automation |
| Skyvern | Production browser-agent platform | Enterprise web automation |
| E2B | Cloud sandboxes (file system + shell) for agents | Code-execution agents |
| Modal Sandboxes | Modal's sandbox primitive | Modal-native code-execution agents |
| Daytona | Managed dev environments for agents | Long-running coding agents |
Common pattern: orchestration framework (LangGraph / Agents SDK) + execution sandbox (E2B / Browserbase) + LLM with tool use (Claude / GPT-4o).
Interview signal: can you articulate why browser agents are still flaky (state-tracking, latency per action, error recovery, cost per action) and what makes them work better (memory between actions, narrow scope, structured tool definitions)?
5. Multimodal model SOTA (as of 2026)¶
Vocabulary check. For each, know: input modalities, headline strength, where you'd reach for it.
| Model | Modalities | Strength |
|---|---|---|
| GPT-4o / 4.1 (OpenAI) | Text, image, audio in/out | Realtime audio, broad capability |
| Claude Sonnet / Opus 4 (Anthropic) | Text, image | Reasoning, long context, computer use |
| Gemini 2.5 Pro / Flash (Google) | Text, image, audio, video | Long context, video understanding |
| Llama 3.2 / 4 (Meta, open) | Text, image (3.2-V) | Open-weights flagship |
| Pixtral (Mistral, open) | Text, image | Open vision-language; strong on documents |
| Qwen 2.5 VL (Alibaba, open) | Text, image, video | Multilingual, video, very capable open VLM |
| Llava / Llava-NeXT (open) | Text, image | Reference open VLM family |
| Molmo (Allen AI, open) | Text, image | Open VLM with high transparency |
| DeepSeek-V3 / R1 (open) | Text (R1 = reasoning) | Strong open reasoning |
| o1 / o3 (OpenAI) | Text | Reasoning-tuned |
Image generation: GPT-4o image, Imagen, DALL-E 3, Flux, Stable Diffusion 3.5, Midjourney Video generation: Sora, Veo, Runway, Pika, Kling, Hunyuan Video, Wan Voice / speech-to-speech: GPT-4o Realtime, Gemini Live, Moshi, Sesame
Watch for capability drift; this section is the most likely to age. Re-check before any senior screen.
6. Vector stores¶
Quick reference (deeper coverage in Module 13-14).
| Store | When |
|---|---|
| pgvector | Already on Postgres; small to mid-scale |
| Pinecone | Managed, fast to ship, expensive at scale |
| Weaviate | Self-host or managed; hybrid search baked in |
| Qdrant | Self-host friendly; performant |
| FAISS | In-process library; not a service |
| Chroma | Embedded; great for prototypes |
| Turbopuffer | Low-cost serverless vector store |
| Vespa / Elasticsearch / Typesense | Search engines with vector support; hybrid retrieval |
Default: pgvector until you have a reason to leave.
7. Embedding models¶
| Family | Notes |
|---|---|
| OpenAI text-embedding-3-small / large | Default vendor; 1536 / 3072 dim |
| Cohere Embed v3 | Strong multilingual, classification flavors |
| Voyage | Domain-specific (code, finance, law) |
| BGE / BGE-M3 (open) | Strong open baseline; multi-functional |
| E5 (open) | Microsoft's open embeddings |
| Nomic Embed (open) | Open, strong on retrieval benchmarks |
| Jina Embeddings | Long-context embeddings |
| Sentence-Transformers | Library; many models |
Eval: MTEB leaderboard — re-check before choosing.
Pattern: start with OpenAI or Cohere for speed; switch to open (BGE-M3) when cost / latency / privacy demand it.
8. Prompt management and experimentation¶
| Tool | What | When |
|---|---|---|
| Promptlayer | Prompt versioning, A/B, cost | Teams managing many prompts |
| Humanloop | Prompt management + eval | Eval-driven prompt iteration |
| PromptHub | Git-style prompt versioning | If you want a "prompts repo" pattern |
| Custom (Git + your own loader) | Prompts as files in your repo | Most engineers; preferred for code-adjacent prompts |
Default: prompts as files in your repo, loaded by your code. Add a tool only when prompts need to ship without a deploy.
9. AI gateways and proxies¶
| Tool | What | When |
|---|---|---|
| OpenRouter | One API to many models | Vendor-agnostic experimentation |
| Together / Fireworks / Anyscale / Replicate | Hosted open-weights as APIs | Try open-weights without infra |
| LiteLLM | Library that abstracts provider differences | Code-side vendor portability |
| Portkey | Gateway with retries, fallbacks, caching, observability | Production LLM gateway |
| Helicone | Proxy + observability | Quick visibility |
| Cloudflare AI Gateway | Edge gateway with caching, observability | Cloudflare stack |
| Martian / Not Diamond | Routing models for cost/quality | Model-routing as a service |
Default opinion: start with LiteLLM in code for portability and Helicone as a proxy for observability. Add a heavier gateway (Portkey, custom) if production demands rate-shaping, fallbacks, secrets management at the edge.
How this fits the curriculum¶
| Section | Tightest module link |
|---|---|
| Orchestration / agents | 01_ai_engineering/01_agentic_system_design/, 16_multi_agent_coordination/ |
| Eval / observability | 04_ai_product_evals/00_ai_evals_release_gates/, 01_ai_engineering/03_agent_observability_debugging/ |
| Local LLM tooling | 00_ai_foundation/06_adaptation_compression/, 04_ml_platform_operations/ |
| Computer use / browser | 01_ai_engineering/01_agentic_system_design/, 16_multi_agent_coordination/ |
| Multimodal SOTA | 05_ai_specializations/01_multimodal_vision_systems/, 05_ai_specializations/02_diffusion_media_generation/, 05_ai_specializations/00_realtime_voice_agents/ |
| Vector stores / embeddings | 01_ai_engineering/08_rag_system_design/, 09_advanced_rag_patterns/ |
Maintenance protocol¶
- Quarterly: re-skim each section. Drop dead tools. Add prominent new ones.
- After every screen: add any tool you got asked about that wasn't here.
- Don't expand into deep tutorials. This is a map, not a textbook. Module folders hold depth.