Skip to content

AI Tooling Landscape — 2026 Reference

A flat reference (not a weekly module) to keep modern-framework fluency current. Lead AI Eng interviews probe whether you've heard of and tried the right tools — depth in one or two, vocabulary across the rest.

Tooling rots fast. Re-validate this list every quarter; flag dead links and add new entrants.

How to use

  • Skim every section before any senior+ screen
  • Pick one tool per section to actually use; vocabulary alone reads as cargo-cult
  • If a tool's headline pitch surprises you, click in and read the docs

1. Orchestration / agent frameworks

Framework Style When
LangChain Composable chains and tools, Python + JS Default if you want broad ecosystem and don't mind the abstraction tax
LangGraph Stateful graphs (DAGs / state machines) for agents Long-running agents, branching workflows, human-in-the-loop
LlamaIndex RAG-first: ingestion, indexing, query engines RAG-heavy applications
DSPy Declarative programs over LLMs; auto-compiles prompts Research / experimentation; eval-as-optimization
Pydantic AI Type-safe agents with Pydantic models for tools and outputs Python apps that already use Pydantic; structured output discipline
Vercel AI SDK TypeScript-first; streaming UI primitives, hooks for React/Next/Svelte Web apps with LLM features
OpenAI Agents SDK OpenAI-native agent loop with tools, handoffs, guardrails OpenAI-stack alignment
Anthropic Claude Agent SDK Anthropic-native agents, tool use, computer use Anthropic-stack alignment, especially for tool-heavy agents
Mastra TypeScript-native agent framework with workflows, memory TS / Node.js apps
CrewAI Role-based multi-agent ("crews") Multi-agent orchestration with role abstractions
AutoGen Microsoft's multi-agent conversation framework Research / multi-agent prototypes
Inngest / Trigger.dev Durable workflows; not LLM-specific but used a lot for agents Long-running background agents, retries, observability

Default opinions: - Python web/agent → LangGraph or Pydantic AI - TypeScript web → Vercel AI SDK for UI + Mastra or LangGraph.js for agents - Multi-agent → CrewAI for fast prototypes; LangGraph for production - Closed-vendor stack → use the vendor's own SDK (OpenAI Agents, Anthropic Agent SDK)

Anti-pattern: picking the framework before you understand the problem. Most LLM apps need less framework than people think.


2. Eval and observability

Tool Style When
LangSmith LangChain-native tracing, eval, datasets If you're on LangChain
Langfuse Open-source observability + eval; framework-agnostic OSS-friendly stack; self-hostable
Braintrust Eval-first; great UI for golden sets and regressions Eval-driven development
Helicone Drop-in proxy for LLM calls — cost, latency, caching, logs Quick observability with minimal integration
Promptfoo CLI-first eval; YAML test definitions Local/CI eval; comparison across models
OpenAI Evals Reference eval framework Reading prior art
Arize Phoenix Open observability for LLM apps; OpenTelemetry-based Already on OTel; want structured tracing
Confident AI / DeepEval Pytest-style LLM tests Engineering teams that prefer test code over UIs

Practical pattern: 1. Helicone as a quick proxy for cost/latency visibility 2. Langfuse or Phoenix for structured tracing 3. Promptfoo or Braintrust for golden-set regression evals 4. DeepEval if you want pytest-style LLM testing

For interviews, name two of these and have a one-line tradeoff. "We used Langfuse for traces and Promptfoo for regression evals because [reason]" is a strong answer.


3. Local LLM tooling

Tool What When
Ollama Easy local LLM runner; single binary, model registry "I want to run a model on my laptop"
llama.cpp C++ inference for GGUF; runs on almost anything On-device, embedded, low-end hardware
MLX (Apple) Optimized for Apple Silicon GPU/ANE Mac-native voice / agent products
GGUF / GGML Quantized model file format Distribution format for llama.cpp / LM Studio / Ollama
LM Studio GUI on top of llama.cpp; model browser Non-CLI users; ad-hoc local exploration
vLLM Production server; PagedAttention, continuous batching Self-hosted production LLMs
TGI (HF) HF's production server HF-aligned stack
Text Generation WebUI Gradio UI for many backends Hobbyist exploration
ExLlamaV2 High-throughput GPTQ inference Single-GPU max throughput on Llama-family
TensorRT-LLM NVIDIA optimized inference Lowest-latency NVIDIA deployments
MLC LLM Cross-platform compile-once-run-anywhere (mobile, web, server) Mobile / WASM deployment

Quantization formats to know: - GGUF — llama.cpp ecosystem standard; Q4_K_M / Q5_K_M / Q8_0 are common - AWQ — activation-aware weight quantization; vLLM supports - GPTQ — older, still common; ExLlama supports - FP8 / INT8 / INT4 — newer; vLLM and TensorRT-LLM support; vendor-specific kernels

On-device strategy: 1. Pick a quantization (Q4_K_M or Q5_K_M for general; lower for memory-constrained) 2. Pick a runtime (whisper.cpp + llama.cpp on cross-platform; MLX on Apple Silicon) 3. Bundle the model with the app or fetch on first-run 4. Watch RAM, thermal, battery — these dominate UX

For voice agents especially, on-device STT (whisper.cpp / MLX-Whisper) + on-device LLM (Llama 3 8B Q4 / Phi-3.5 mini Q5) is increasingly viable on flagship phones and laptops.


4. Computer use and browser agents

The fastest-moving category in 2026. Agents that drive a browser, file system, or full computer.

Tool What When
Anthropic Computer Use Claude takes screenshots, clicks, types Complex GUI automation; Anthropic stack
OpenAI Operator (CUA) Browser-using agent OpenAI stack
Browserbase Managed headless browser infra for agents Production browser agents at scale
Stagehand TypeScript SDK over Playwright with LLM-native actions TS-first browser automation
browser-use Python library; works with any LLM Python-first browser automation
Skyvern Production browser-agent platform Enterprise web automation
E2B Cloud sandboxes (file system + shell) for agents Code-execution agents
Modal Sandboxes Modal's sandbox primitive Modal-native code-execution agents
Daytona Managed dev environments for agents Long-running coding agents

Common pattern: orchestration framework (LangGraph / Agents SDK) + execution sandbox (E2B / Browserbase) + LLM with tool use (Claude / GPT-4o).

Interview signal: can you articulate why browser agents are still flaky (state-tracking, latency per action, error recovery, cost per action) and what makes them work better (memory between actions, narrow scope, structured tool definitions)?


5. Multimodal model SOTA (as of 2026)

Vocabulary check. For each, know: input modalities, headline strength, where you'd reach for it.

Model Modalities Strength
GPT-4o / 4.1 (OpenAI) Text, image, audio in/out Realtime audio, broad capability
Claude Sonnet / Opus 4 (Anthropic) Text, image Reasoning, long context, computer use
Gemini 2.5 Pro / Flash (Google) Text, image, audio, video Long context, video understanding
Llama 3.2 / 4 (Meta, open) Text, image (3.2-V) Open-weights flagship
Pixtral (Mistral, open) Text, image Open vision-language; strong on documents
Qwen 2.5 VL (Alibaba, open) Text, image, video Multilingual, video, very capable open VLM
Llava / Llava-NeXT (open) Text, image Reference open VLM family
Molmo (Allen AI, open) Text, image Open VLM with high transparency
DeepSeek-V3 / R1 (open) Text (R1 = reasoning) Strong open reasoning
o1 / o3 (OpenAI) Text Reasoning-tuned

Image generation: GPT-4o image, Imagen, DALL-E 3, Flux, Stable Diffusion 3.5, Midjourney Video generation: Sora, Veo, Runway, Pika, Kling, Hunyuan Video, Wan Voice / speech-to-speech: GPT-4o Realtime, Gemini Live, Moshi, Sesame

Watch for capability drift; this section is the most likely to age. Re-check before any senior screen.


6. Vector stores

Quick reference (deeper coverage in Module 13-14).

Store When
pgvector Already on Postgres; small to mid-scale
Pinecone Managed, fast to ship, expensive at scale
Weaviate Self-host or managed; hybrid search baked in
Qdrant Self-host friendly; performant
FAISS In-process library; not a service
Chroma Embedded; great for prototypes
Turbopuffer Low-cost serverless vector store
Vespa / Elasticsearch / Typesense Search engines with vector support; hybrid retrieval

Default: pgvector until you have a reason to leave.


7. Embedding models

Family Notes
OpenAI text-embedding-3-small / large Default vendor; 1536 / 3072 dim
Cohere Embed v3 Strong multilingual, classification flavors
Voyage Domain-specific (code, finance, law)
BGE / BGE-M3 (open) Strong open baseline; multi-functional
E5 (open) Microsoft's open embeddings
Nomic Embed (open) Open, strong on retrieval benchmarks
Jina Embeddings Long-context embeddings
Sentence-Transformers Library; many models

Eval: MTEB leaderboard — re-check before choosing.

Pattern: start with OpenAI or Cohere for speed; switch to open (BGE-M3) when cost / latency / privacy demand it.


8. Prompt management and experimentation

Tool What When
Promptlayer Prompt versioning, A/B, cost Teams managing many prompts
Humanloop Prompt management + eval Eval-driven prompt iteration
PromptHub Git-style prompt versioning If you want a "prompts repo" pattern
Custom (Git + your own loader) Prompts as files in your repo Most engineers; preferred for code-adjacent prompts

Default: prompts as files in your repo, loaded by your code. Add a tool only when prompts need to ship without a deploy.


9. AI gateways and proxies

Tool What When
OpenRouter One API to many models Vendor-agnostic experimentation
Together / Fireworks / Anyscale / Replicate Hosted open-weights as APIs Try open-weights without infra
LiteLLM Library that abstracts provider differences Code-side vendor portability
Portkey Gateway with retries, fallbacks, caching, observability Production LLM gateway
Helicone Proxy + observability Quick visibility
Cloudflare AI Gateway Edge gateway with caching, observability Cloudflare stack
Martian / Not Diamond Routing models for cost/quality Model-routing as a service

Default opinion: start with LiteLLM in code for portability and Helicone as a proxy for observability. Add a heavier gateway (Portkey, custom) if production demands rate-shaping, fallbacks, secrets management at the edge.


How this fits the curriculum

Section Tightest module link
Orchestration / agents 01_ai_engineering/01_agentic_system_design/, 16_multi_agent_coordination/
Eval / observability 04_ai_product_evals/00_ai_evals_release_gates/, 01_ai_engineering/03_agent_observability_debugging/
Local LLM tooling 00_ai_foundation/06_adaptation_compression/, 04_ml_platform_operations/
Computer use / browser 01_ai_engineering/01_agentic_system_design/, 16_multi_agent_coordination/
Multimodal SOTA 05_ai_specializations/01_multimodal_vision_systems/, 05_ai_specializations/02_diffusion_media_generation/, 05_ai_specializations/00_realtime_voice_agents/
Vector stores / embeddings 01_ai_engineering/08_rag_system_design/, 09_advanced_rag_patterns/

Maintenance protocol

  • Quarterly: re-skim each section. Drop dead tools. Add prominent new ones.
  • After every screen: add any tool you got asked about that wasn't here.
  • Don't expand into deep tutorials. This is a map, not a textbook. Module folders hold depth.