00. Agentic platform evaluation: First-principles overview

You can design an agent. Now someone hands you a budget and a deadline and asks: do we build this on a framework, a hyperscaler runtime, or a SaaS vertical agent? Choose wrong and the choice outlives the people who made it.

A retail bank wants an agent fleet: a customer-support agent that answers card, loan, and account questions across web and WhatsApp, plus an internal-ops agent that helps relationship managers pull KYC documents and draft compliance memos. The platform team builds the support agent on a SaaS vertical product because the demo was stunning and the sales engineer wired up a working bot in four days. Eight months later the bank needs the agent to call an internal fraud-scoring service that lives inside its own VPC, sign every model call for an audit the regulator demands, and fall back to a cheaper model for simple FAQ traffic that is now 70% of volume. None of these are possible. The vertical platform owns the orchestration, owns the model routing, owns the prompts, and stores conversation state in its own cloud in a region the bank's data-residency policy forbids. The bank is not blocked by a missing feature. It is blocked by a boundary it agreed to in month one without noticing.

That is the shape of every agentic-platform decision. You are not picking a tool. You are picking a boundary — a line between what you control and what the platform controls — and you are picking it under uncertainty, because the market reshuffles every quarter and the use case you scope today is not the use case you run in a year. The dominant pressure is fit before lock-in: match capability, cost, control, and portability to the real use case before the decision becomes load-bearing and expensive to reverse.

Three families of platform exist, and they trade the same four levers (capability, cost, control, portability) in opposite directions. A hyperscaler runtime (AWS Bedrock AgentCore, Google Vertex AI Agent Builder / Agent Engine, Azure / Microsoft Foundry Agent Service) gives you managed infrastructure, deep cloud-native security, and model breadth, but binds you to that cloud's identity, networking, and billing. An AI-native framework (LangGraph, CrewAI, LlamaIndex, Microsoft Agent Framework, OpenAI Agents SDK) gives you maximum control and portability, but hands you the operational burden: you run the servers, wire the tracing, and own the on-call. A SaaS vertical agent (Sierra, Decagon, Salesforce Agentforce, NiCE Cognigy, Glean) gives you the fastest time-to-value and a packaged domain solution, but the least control and the deepest data gravity. The whole module is about learning to read which boundary each one draws, and which one your use case can live behind.

The expensive mistake is not choosing the "wrong" platform in some absolute sense. There is no best platform. The expensive mistake is choosing without naming the boundary, the exit cost, and the cost curve at 100× volume — and discovering all three at once during an incident, a procurement review, or a budget meeting where the number is now ten times what the pilot suggested.
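To make the 100× question concrete, here is a minimal Python sketch. It assumes each platform's monthly bill can be approximated as a fixed fee plus a per-conversation rate. The `CostModel` class, the `cost_table` helper, and every dollar figure are hypothetical placeholders for illustration, not vendor prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Hypothetical monthly cost shape: a fixed fee plus a per-conversation rate."""
    name: str
    fixed_per_month: float    # platform fee, or infra plus on-call, in dollars (assumed)
    per_conversation: float   # tokens plus any per-use fee, in dollars (assumed)

    def monthly_cost(self, conversations: int) -> float:
        return self.fixed_per_month + self.per_conversation * conversations


# Illustrative placeholders only; replace with real quotes and measured token usage.
MODELS = [
    CostModel("saas_vertical", fixed_per_month=0.0, per_conversation=1.00),
    CostModel("hyperscaler_runtime", fixed_per_month=5_000.0, per_conversation=0.10),
    CostModel("framework_self_hosted", fixed_per_month=20_000.0, per_conversation=0.05),
]


def cost_table(pilot_volume: int, multipliers=(1, 10, 100)) -> dict[str, list[float]]:
    """Monthly cost per model at each multiple of the pilot's conversation volume."""
    return {
        m.name: [m.monthly_cost(pilot_volume * k) for k in multipliers]
        for m in MODELS
    }


if __name__ == "__main__":
    for name, costs in cost_table(pilot_volume=5_000).items():
        print(name, [f"${c:,.0f}" for c in costs])
```

With these placeholder numbers the option that is cheapest at pilot volume becomes the most expensive at 100×. The values are invented; the point is that the ranking can flip as volume grows, which is why the curve needs to be computed before the commitment rather than after.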

The recurring pressures and concepts

These names recur in every file. They are the levers you weigh in every platform decision.

Pressure / concept	Meaning
fit-before-lock-in	the dominant pressure: match capability, cost, control, portability to the use case before the decision hardens
the four levers	capability, cost, control, portability — every platform trades these against each other
the three families	hyperscaler runtime / AI-native framework / SaaS vertical agent — the structural choice
the boundary	the line between what you control and what the platform controls; chosen on day one, felt on day two hundred
the wall	the felt failure: a need the platform structurally cannot serve, discovered after you are committed
the capability rubric	a weighted score across orchestration, tools, memory, RAG, multi-agent, observability, eval, human-in-the-loop, model flexibility
lock-in surface	the specific things that are hard to move: orchestration code, prompt/tool formats, data, identity
exit cost	what it actually costs in engineer-months to leave a platform; the number nobody computes until they must
the escape hatch	the architecture choices that keep leaving cheap: portable formats, owned data, abstracted model calls
data gravity	conversation logs, memory, and embeddings accumulate inside a platform and pull everything else toward it
the procurement gate	the security, residency, audit, and isolation checks that kill a platform before a single agent ships
the 100× curve	how cost behaves when volume grows two orders of magnitude; the three families diverge sharply here
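The capability rubric in the table is a weighted score across nine axes. A minimal sketch of that arithmetic follows, assuming a 0 to 5 score per axis and a weighted mean normalized by total weight. The function name `rubric_score`, the axis identifiers, and the scale are assumptions made for illustration; the rubric file may define them differently.

```python
# The nine axes named in the table, as hypothetical identifiers.
AXES = (
    "orchestration", "tools", "memory", "rag", "multi_agent",
    "observability", "eval", "human_in_the_loop", "model_flexibility",
)


def rubric_score(weights: dict[str, float], scores: dict[str, float]) -> float:
    """Weighted mean of per-axis scores (0-5 scale assumed), normalized by total weight."""
    missing = [a for a in AXES if a not in weights or a not in scores]
    if missing:
        raise ValueError(f"missing axes: {missing}")
    total_weight = sum(weights[a] for a in AXES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[a] * scores[a] for a in AXES) / total_weight


if __name__ == "__main__":
    # Example: a team that weights model flexibility three times as heavily as the rest.
    weights = {a: 1.0 for a in AXES}
    weights["model_flexibility"] = 3.0
    scores = {a: 4.0 for a in AXES}
    scores["model_flexibility"] = 1.0  # e.g. the platform owns model routing
    print(round(rubric_score(weights, scores), 2))
```

Normalizing by total weight keeps the result on the same scale as the inputs, so scores stay comparable when different teams choose different weights.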

Top resources

Anthropic — Building effective agents — https://www.anthropic.com/engineering/building-effective-agents
Amazon Bedrock AgentCore — overview & pricing — https://aws.amazon.com/bedrock/agentcore/ and https://aws.amazon.com/bedrock/agentcore/pricing/
Google Vertex AI Agent Builder / Agent Engine docs — https://docs.cloud.google.com/agent-builder
Microsoft Foundry Agent Service overview — https://learn.microsoft.com/en-us/azure/foundry/agents/overview
OpenAI AgentKit & Agents SDK — https://openai.com/index/introducing-agentkit/ and https://openai.github.io/openai-agents-python/
LangGraph docs — https://langchain-ai.github.io/langgraph/
Model Context Protocol — https://modelcontextprotocol.io/introduction

What's coming

01-build-vs-buy-decision.md — Framework vs managed platform vs SaaS vertical. The felt failure of hitting a wall you cannot patch, and the decision lens that prevents it.
02-platform-landscape-map.md — The three families with current real players and what each optimizes for. Reading the map before you walk it.
03-capability-evaluation-rubric.md — A weighted scoring rubric across nine capability axes. Turning "it felt good in the demo" into a defensible score.
04-lock-in-and-portability.md — Lock-in surfaces, exit cost, and the escape hatch. Keeping leaving cheap.
05-cost-and-scaling-model.md — Token + platform fees + infra, and how the three families' cost curves diverge at 10× and 100× volume.
06-security-governance-and-compliance.md — Data residency, PII, tenant isolation, audit, who owns the model call. The procurement gates that kill platforms early.
07-boundary-tradeoff-review.md — A fast-moving market, contested claims, and what to re-evaluate in six months.

Memory map

Concept	Prerequisite	Pressure family	Recurs later as	Layer touched
the three families	agent topologies (module 01)	fit-before-lock-in	every later file scores or stresses them	product → platform → cloud
build-vs-buy decision	the three families	control vs speed	rubric weights, exit-cost math	org → architecture
capability rubric	build-vs-buy	capability vs cost	the score that procurement and cost re-use	feature → score
lock-in & portability	the boundary	portability vs convenience	the escape hatch in cost and security	data → orchestration → identity
cost & scaling	rubric + lock-in	cost at scale	the 100× curve that re-opens the decision	usage → billing
security & governance	lock-in (data)	safety vs speed	the gate that overrides the rubric	data residency → audit
boundary review	all of the above	ambiguity	the six-month re-evaluation habit	market → roadmap

Three traversal paths use this map.
Prerequisite path: read 00 → 07 in order; each file assumes the previous one.
Failure path: when a platform hits a wall, jump to 01 (was build-vs-buy wrong?), 04 (is it lock-in?), or 06 (is it a governance gate?).
Synthesis path: combine the rubric (03), the cost curve (05), and the exit cost (04) into one weighted decision; that synthesis is exactly the bank's choice walked in 03 and revisited in 07.
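The synthesis path can be sketched as a small ranking function. This is one possible way to combine the three inputs, not the module's prescribed method: it treats the procurement gate as a hard filter (the memory map describes it as the gate that overrides the rubric) and then ranks the survivors by a weighted score. The `Candidate` fields, the weights, and the normalization are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    rubric: float         # 0-5 capability score (file 03), scale assumed
    cost_at_100x: float   # projected monthly cost in dollars (file 05)
    exit_months: float    # exit cost in engineer-months (file 04)
    passes_gate: bool     # procurement gate result (file 06)


def rank(candidates, w_rubric=0.5, w_cost=0.3, w_exit=0.2):
    """Drop candidates that fail the gate, then rank the rest by a weighted score.

    Cost and exit cost are scaled against the worst surviving candidate so that
    lower is better and each term lands in [0, 1]. The weights are hypothetical.
    """
    viable = [c for c in candidates if c.passes_gate]
    if not viable:
        return []
    max_cost = max(c.cost_at_100x for c in viable) or 1.0
    max_exit = max(c.exit_months for c in viable) or 1.0

    def score(c):
        return (
            w_rubric * (c.rubric / 5.0)
            + w_cost * (1.0 - c.cost_at_100x / max_cost)
            + w_exit * (1.0 - c.exit_months / max_exit)
        )

    ranked = [(c.name, round(score(c), 3)) for c in viable]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented inputs for illustration only.
    options = [
        Candidate("saas_vertical", 4.5, 500_000, 18, passes_gate=False),
        Candidate("hyperscaler_runtime", 4.0, 52_000, 9, passes_gate=True),
        Candidate("framework", 3.5, 45_000, 3, passes_gate=True),
    ]
    print(rank(options))
```

In this invented example the candidate with the highest rubric score never reaches the ranking because it fails the gate, which mirrors the bank's situation: a residency or audit failure cannot be traded off against a good demo.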

Bridge. Before we map the landscape or score a rubric, we have to make the single decision that frames all the rest: should this agent be built on a framework you own, a managed runtime you rent, or a SaaS vertical you configure? That choice sets the boundary every later file lives inside — so the next file walks the build-vs-buy decision and the wall a team hits when they get it wrong. → 01-build-vs-buy-decision.md