Skip to content

00. Agentic platform evaluation — First-principles overview

You can design an agent. Now someone hands you a budget and a deadline and asks: do we build this on a framework, a hyperscaler runtime, or a SaaS vertical agent? Choose wrong and the choice outlives the people who made it.


A retail bank wants an agent fleet: a customer-support agent that answers card, loan, and account questions across web and WhatsApp, plus an internal-ops agent that helps relationship managers pull KYC documents and draft compliance memos. The platform team builds the support agent on a SaaS vertical product because the demo was stunning and the sales engineer wired up a working bot in four days. Eight months later the bank needs the agent to call an internal fraud-scoring service that lives inside its own VPC, sign every model call for an audit the regulator demands, and fall back to a cheaper model for simple FAQ traffic that is now 70% of volume. None of these are possible. The vertical platform owns the orchestration, owns the model routing, owns the prompts, and stores conversation state in its own cloud in a region the bank's data-residency policy forbids. The bank is not blocked by a missing feature. It is blocked by a boundary it agreed to in month one without noticing.

That is the shape of every agentic-platform decision. You are not picking a tool. You are picking a boundary — a line between what you control and what the platform controls — and you are picking it under uncertainty, because the market reshuffles every quarter and the use case you scope today is not the use case you run in a year. The dominant pressure is fit before lock-in: match capability, cost, control, and portability to the real use case before the decision becomes load-bearing and expensive to reverse.

Three families of platform exist, and they trade the same four things in opposite directions. A hyperscaler runtime (AWS Bedrock AgentCore, Google Vertex AI Agent Builder / Agent Engine, Azure / Microsoft Foundry Agent Service) gives you managed infrastructure, deep cloud-native security, and model breadth, but binds you to that cloud's identity, networking, and billing. An AI-native framework (LangGraph, CrewAI, LlamaIndex, Microsoft Agent Framework, OpenAI Agents SDK) gives you maximum control and portability, but hands you the operational burden — you run the servers, wire the tracing, own the on-call. A SaaS vertical agent (Sierra, Decagon, Salesforce Agentforce, NiCE Cognigy, Glean) gives you fastest time-to-value and a packaged domain solution, but the least control and the deepest data gravity. The whole module is learning to read which boundary each one draws, and which one your use case can live behind.

The expensive mistake is not choosing the "wrong" platform in some absolute sense. There is no best platform. The expensive mistake is choosing without naming the boundary, the exit cost, and the cost curve at 100× volume — and discovering all three at once during an incident, a procurement review, or a budget meeting where the number is now ten times what the pilot suggested.


The recurring pressures and concepts

These names recur in every file. They are the levers you weigh in every platform decision.

Pressure / concept Meaning
fit-before-lock-in the dominant pressure: match capability, cost, control, portability to the use case before the decision hardens
the four levers capability, cost, control, portability — every platform trades these against each other
the three families hyperscaler runtime / AI-native framework / SaaS vertical agent — the structural choice
the boundary the line between what you control and what the platform controls; chosen on day one, felt on day two hundred
the wall the felt failure: a need the platform structurally cannot serve, discovered after you are committed
the capability rubric a weighted score across orchestration, tools, memory, RAG, multi-agent, observability, eval, human-in-the-loop, model flexibility
lock-in surface the specific things that are hard to move: orchestration code, prompt/tool formats, data, identity
exit cost what it actually costs in engineer-months to leave a platform; the number nobody computes until they must
the escape hatch the architecture choices that keep leaving cheap: portable formats, owned data, abstracted model calls
data gravity conversation logs, memory, and embeddings accumulate inside a platform and pull everything else toward it
the procurement gate the security, residency, audit, and isolation checks that kill a platform before a single agent ships
the 100× curve how cost behaves when volume grows two orders of magnitude; the three families diverge sharply here

Top resources

  • Anthropic — Building effective agents — https://www.anthropic.com/engineering/building-effective-agents
  • Amazon Bedrock AgentCore — overview & pricing — https://aws.amazon.com/bedrock/agentcore/ and https://aws.amazon.com/bedrock/agentcore/pricing/
  • Google Vertex AI Agent Builder / Agent Engine docs — https://docs.cloud.google.com/agent-builder
  • Microsoft Foundry Agent Service overview — https://learn.microsoft.com/en-us/azure/foundry/agents/overview
  • OpenAI AgentKit & Agents SDK — https://openai.com/index/introducing-agentkit/ and https://openai.github.io/openai-agents-python/
  • LangGraph docs — https://langchain-ai.github.io/langgraph/
  • Model Context Protocol — https://modelcontextprotocol.io/introduction

What's coming

  1. 01-build-vs-buy-decision.md — Framework vs managed platform vs SaaS vertical. The felt failure of hitting a wall you cannot patch, and the decision lens that prevents it.
  2. 02-platform-landscape-map.md — The three families with current real players and what each optimizes for. Reading the map before you walk it.
  3. 03-capability-evaluation-rubric.md — A weighted scoring rubric across nine capability axes. Turning "it felt good in the demo" into a defensible score.
  4. 04-lock-in-and-portability.md — Lock-in surfaces, exit cost, and the escape hatch. Keeping leaving cheap.
  5. 05-cost-and-scaling-model.md — Token + platform fees + infra, and how the three families' cost curves diverge at 10× and 100× volume.
  6. 06-security-governance-and-compliance.md — Data residency, PII, tenant isolation, audit, who owns the model call. The procurement gates that kill platforms early.
  7. 07-boundary-tradeoff-review.md — A fast-moving market, contested claims, and what to re-evaluate in six months.

Memory map

Concept Prerequisite Pressure family Recurs later as Layer touched
the three families agent topologies (module 01) fit-before-lock-in every later file scores or stresses them product → platform → cloud
build-vs-buy decision the three families control vs speed rubric weights, exit-cost math org → architecture
capability rubric build-vs-buy capability vs cost the score that procurement and cost re-use feature → score
lock-in & portability the boundary portability vs convenience the escape hatch in cost and security data → orchestration → identity
cost & scaling rubric + lock-in cost at scale the 100× curve that re-opens the decision usage → billing
security & governance lock-in (data) safety vs speed the gate that overrides the rubric data residency → audit
boundary review all of the above ambiguity the six-month re-evaluation habit market → roadmap

Three traversal paths use this map. Prerequisite path — read 00 → 07 in order; each file assumes the previous one. Failure path — when a platform hits a wall, jump to 01 (was build-vs-buy wrong?), 04 (is it lock-in?), or 06 (is it a governance gate?). Synthesis path — combine the rubric (03), the cost curve (05), and the exit cost (04) into one weighted decision; that synthesis is exactly the bank's choice walked in 03 and revisited in 07.

Bridge. Before we map the landscape or score a rubric, we have to make the single decision that frames all the rest: should this agent be built on a framework you own, a managed runtime you rent, or a SaaS vertical you configure? That choice sets the boundary every later file lives inside — so the next file walks the build-vs-buy decision and the wall a team hits when they get it wrong. → 01-build-vs-buy-decision.md