00. Agentic platform evaluation: First-principles overview
You can design an agent. Now someone hands you a budget and a deadline and asks: do we build this on a framework, a hyperscaler runtime, or a SaaS vertical agent? Choose wrong and the choice outlives the people who made it.
A retail bank wants an agent fleet: a customer-support agent that answers card, loan, and account questions across web and WhatsApp, plus an internal-ops agent that helps relationship managers pull KYC documents and draft compliance memos. The platform team builds the support agent on a SaaS vertical product because the demo was stunning and the sales engineer wired up a working bot in four days. Eight months later the bank needs the agent to call an internal fraud-scoring service that lives inside its own VPC, sign every model call for an audit the regulator demands, and fall back to a cheaper model for simple FAQ traffic that is now 70% of volume. None of these are possible. The vertical platform owns the orchestration, owns the model routing, owns the prompts, and stores conversation state in its own cloud in a region the bank's data-residency policy forbids. The bank is not blocked by a missing feature. It is blocked by a boundary it agreed to in month one without noticing.
That is the shape of every agentic-platform decision. You are not picking a tool. You are picking a boundary — a line between what you control and what the platform controls — and you are picking it under uncertainty, because the market reshuffles every quarter and the use case you scope today is not the use case you run in a year. The dominant pressure is fit before lock-in: match capability, cost, control, and portability to the real use case before the decision becomes load-bearing and expensive to reverse.
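One way to make the boundary concrete is to write it down as data before you sign. The sketch below is a minimal illustration, not a method from this module: the `Need` type, the `find_walls` helper, and the mapping from each bank requirement to the component it depends on are all assumptions made for the example. The four component names come from the bank story above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Need:
    """A requirement, plus the component you must control to meet it."""
    description: str
    requires_control_of: str


def find_walls(platform_controls: set[str], needs: list[Need]) -> list[Need]:
    """Return the needs that land on the platform's side of the boundary."""
    return [n for n in needs if n.requires_control_of in platform_controls]


# From the bank example: the SaaS vertical product owns all four components.
saas_vertical = {"orchestration", "model_routing", "prompts", "conversation_state"}

# Assumption: which component each need depends on is our reading of the story.
bank_needs = [
    Need("call the fraud-scoring service inside the bank's VPC", "orchestration"),
    Need("sign every model call for the regulator's audit", "model_routing"),
    Need("fall back to a cheaper model for FAQ traffic", "model_routing"),
    Need("keep conversation state in an approved region", "conversation_state"),
]

walls = find_walls(saas_vertical, bank_needs)
for need in walls:
    print(f"WALL: {need.description} (platform controls {need.requires_control_of})")
```

Run the same list against a platform that controls fewer components and the walls shrink. The value of the exercise is in listing the needs you expect in a year, not only the ones in today's scope.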
Three families of platform exist, and they trade the same four things in opposite directions. A hyperscaler runtime (AWS Bedrock AgentCore, Google Vertex AI Agent Builder / Agent Engine, Azure / Microsoft Foundry Agent Service) gives you managed infrastructure, deep cloud-native security, and model breadth, but binds you to that cloud's identity, networking, and billing. An AI-native framework (LangGraph, CrewAI, LlamaIndex, Microsoft Agent Framework, OpenAI Agents SDK) gives you maximum control and portability, but hands you the operational burden: you run the servers, wire the tracing, and own the on-call. A SaaS vertical agent (Sierra, Decagon, Salesforce Agentforce, NiCE Cognigy, Glean) gives you the fastest time-to-value and a packaged domain solution, but the least control and the deepest data gravity. The whole module is about learning to read which boundary each one draws, and which one your use case can live behind.
The expensive mistake is not choosing the "wrong" platform in some absolute sense. There is no best platform. The expensive mistake is choosing without naming the boundary, the exit cost, and the cost curve at 100× volume — and discovering all three at once during an incident, a procurement review, or a budget meeting where the number is now ten times what the pilot suggested.
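The 100× question is easier to ask when the cost model is written down. The sketch below assumes a simple three-part model (a fixed monthly cost, a per-conversation fee, and token spend); the `CostModel` class and every number in it are invented placeholders, not vendor pricing or figures from this module. Swap in real quotes before drawing any conclusion about which family is cheaper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    name: str
    fixed_monthly: float          # e.g. engineers, infra, minimum commitments
    per_conversation: float       # platform or per-resolution fee
    tokens_per_conversation: int  # 0 if token spend is bundled into the fee
    price_per_1k_tokens: float

    def monthly_cost(self, conversations: int) -> float:
        token_cost = (
            conversations * self.tokens_per_conversation / 1000 * self.price_per_1k_tokens
        )
        return self.fixed_monthly + conversations * self.per_conversation + token_cost


PILOT_VOLUME = 10_000  # conversations per month; hypothetical

# Placeholder numbers chosen only to show that the curves can cross.
framework = CostModel("framework", fixed_monthly=40_000.0, per_conversation=0.0,
                      tokens_per_conversation=4_000, price_per_1k_tokens=0.002)
hyperscaler = CostModel("hyperscaler runtime", fixed_monthly=5_000.0, per_conversation=0.01,
                        tokens_per_conversation=4_000, price_per_1k_tokens=0.002)
saas = CostModel("SaaS vertical", fixed_monthly=0.0, per_conversation=1.00,
                 tokens_per_conversation=0, price_per_1k_tokens=0.0)

for multiplier in (1, 10, 100):
    volume = PILOT_VOLUME * multiplier
    for model in (framework, hyperscaler, saas):
        print(f"{multiplier:>3}x  {model.name:<20} {model.monthly_cost(volume):>12,.0f}")
```

With these placeholders the ordering at pilot volume differs from the ordering at 100×, which is the point: the pilot invoice alone says little about the shape of the curve.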
The recurring pressures and concepts
These names recur in every file. They are the levers you weigh in every platform decision.
| Pressure / concept | Meaning |
|---|---|
| fit-before-lock-in | the dominant pressure: match capability, cost, control, portability to the use case before the decision hardens |
| the four levers | capability, cost, control, portability — every platform trades these against each other |
| the three families | hyperscaler runtime / AI-native framework / SaaS vertical agent — the structural choice |
| the boundary | the line between what you control and what the platform controls; chosen on day one, felt on day two hundred |
| the wall | the felt failure: a need the platform structurally cannot serve, discovered after you are committed |
| the capability rubric | a weighted score across orchestration, tools, memory, RAG, multi-agent, observability, eval, human-in-the-loop, model flexibility |
| lock-in surface | the specific things that are hard to move: orchestration code, prompt/tool formats, data, identity |
| exit cost | what it actually costs in engineer-months to leave a platform; the number nobody computes until they must |
| the escape hatch | the architecture choices that keep leaving cheap: portable formats, owned data, abstracted model calls |
| data gravity | conversation logs, memory, and embeddings accumulate inside a platform and pull everything else toward it |
| the procurement gate | the security, residency, audit, and isolation checks that kill a platform before a single agent ships |
| the 100× curve | how cost behaves when volume grows two orders of magnitude; the three families diverge sharply here |
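The capability rubric in the table reduces to a weighted average. The sketch below is one plausible form of it, assuming ratings on a 0 to 5 scale and weights that are normalized by their sum; the function name, the scale, and the sample weights and ratings are illustrative assumptions. The nine axis names are the ones listed in the table.

```python
AXES = (
    "orchestration", "tools", "memory", "rag", "multi_agent",
    "observability", "eval", "human_in_the_loop", "model_flexibility",
)


def weighted_score(weights: dict[str, float], ratings: dict[str, int]) -> float:
    """Weighted average of per-axis ratings; every axis must be weighted and rated."""
    missing = (set(AXES) - set(weights)) | (set(AXES) - set(ratings))
    if missing:
        raise ValueError(f"missing axes: {sorted(missing)}")
    total_weight = sum(weights[a] for a in AXES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[a] * ratings[a] for a in AXES) / total_weight


# Hypothetical weights for a use case that cares most about tools and observability.
weights = {a: 1.0 for a in AXES}
weights.update({"tools": 3.0, "observability": 2.0, "model_flexibility": 2.0})

# Hypothetical ratings (0 = absent, 5 = excellent) for one candidate platform.
ratings = {a: 3 for a in AXES}
ratings.update({"tools": 2, "observability": 4, "model_flexibility": 1})

print(f"{weighted_score(weights, ratings):.2f} out of 5")
```

Refusing to score when an axis is missing is deliberate: a rubric that silently skips an axis rewards the platform nobody tested.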
Top resources
- Anthropic — Building effective agents — https://www.anthropic.com/engineering/building-effective-agents
- Amazon Bedrock AgentCore — overview & pricing — https://aws.amazon.com/bedrock/agentcore/ and https://aws.amazon.com/bedrock/agentcore/pricing/
- Google Vertex AI Agent Builder / Agent Engine docs — https://docs.cloud.google.com/agent-builder
- Microsoft Foundry Agent Service overview — https://learn.microsoft.com/en-us/azure/foundry/agents/overview
- OpenAI AgentKit & Agents SDK — https://openai.com/index/introducing-agentkit/ and https://openai.github.io/openai-agents-python/
- LangGraph docs — https://langchain-ai.github.io/langgraph/
- Model Context Protocol — https://modelcontextprotocol.io/introduction
What's coming
- 01-build-vs-buy-decision.md — Framework vs managed platform vs SaaS vertical. The felt failure of hitting a wall you cannot patch, and the decision lens that prevents it.
- 02-platform-landscape-map.md — The three families with current real players and what each optimizes for. Reading the map before you walk it.
- 03-capability-evaluation-rubric.md — A weighted scoring rubric across nine capability axes. Turning "it felt good in the demo" into a defensible score.
- 04-lock-in-and-portability.md — Lock-in surfaces, exit cost, and the escape hatch. Keeping leaving cheap.
- 05-cost-and-scaling-model.md — Token + platform fees + infra, and how the three families' cost curves diverge at 10× and 100× volume.
- 06-security-governance-and-compliance.md — Data residency, PII, tenant isolation, audit, who owns the model call. The procurement gates that kill platforms early.
- 07-boundary-tradeoff-review.md — A fast-moving market, contested claims, and what to re-evaluate in six months.
Memory map
| Concept | Prerequisite | Pressure family | Recurs later as | Layer touched |
|---|---|---|---|---|
| the three families | agent topologies (module 01) | fit-before-lock-in | every later file scores or stresses them | product → platform → cloud |
| build-vs-buy decision | the three families | control vs speed | rubric weights, exit-cost math | org → architecture |
| capability rubric | build-vs-buy | capability vs cost | the score that procurement and cost re-use | feature → score |
| lock-in & portability | the boundary | portability vs convenience | the escape hatch in cost and security | data → orchestration → identity |
| cost & scaling | rubric + lock-in | cost at scale | the 100× curve that re-opens the decision | usage → billing |
| security & governance | lock-in (data) | safety vs speed | the gate that overrides the rubric | data residency → audit |
| boundary review | all of the above | ambiguity | the six-month re-evaluation habit | market → roadmap |
Three traversal paths use this map.
- Prerequisite path: read 00 → 07 in order; each file assumes the previous one.
- Failure path: when a platform hits a wall, jump to 01 (was build-vs-buy wrong?), 04 (is it lock-in?), or 06 (is it a governance gate?).
- Synthesis path: combine the rubric (03), the cost curve (05), and the exit cost (04) into one weighted decision; that synthesis is exactly the bank's choice walked in 03 and revisited in 07.
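The synthesis path can be sketched as a small ranking function. This is a rough illustration under stated assumptions, not the scoring the later files define: the `Candidate` fields, the default weights, the normalization against the worst eligible candidate, and the sample numbers are all hypothetical. The one rule taken from the memory map is that the procurement gate overrides the rubric, so failing candidates are dropped before anything is scored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    rubric_score: float       # 0-5, capability rubric (file 03)
    cost_at_100x: float       # monthly cost at 100x volume (file 05)
    exit_cost_months: float   # engineer-months to leave (file 04)
    passes_procurement: bool  # security and governance gate (file 06)


def rank(candidates, w_capability=0.5, w_cost=0.3, w_exit=0.2):
    """Drop gate failures, then rank the rest by a weighted score in [0, 1]."""
    eligible = [c for c in candidates if c.passes_procurement]
    if not eligible:
        return []
    # Normalize cost and exit cost against the worst eligible candidate.
    max_cost = max(c.cost_at_100x for c in eligible) or 1.0
    max_exit = max(c.exit_cost_months for c in eligible) or 1.0

    def score(c: Candidate) -> float:
        return (
            w_capability * c.rubric_score / 5
            + w_cost * (1 - c.cost_at_100x / max_cost)
            + w_exit * (1 - c.exit_cost_months / max_exit)
        )

    scored = [(c.name, round(score(c), 3)) for c in eligible]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates; platform_b has the best rubric score but fails the gate.
candidates = [
    Candidate("platform_a", 3.5, 48_000.0, 2.0, True),
    Candidate("platform_b", 4.8, 1_000_000.0, 12.0, False),
    Candidate("platform_c", 4.0, 23_000.0, 6.0, True),
]

ranked = rank(candidates)
for name, value in ranked:
    print(name, value)
```

The weights are where the argument happens. Changing them changes the winner, so agree on them before anyone sees the scores.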
Bridge. Before we map the landscape or score a rubric, we have to make the single decision that frames all the rest: should this agent be built on a framework you own, a managed runtime you rent, or a SaaS vertical you configure? That choice sets the boundary every later file lives inside — so the next file walks the build-vs-buy decision and the wall a team hits when they get it wrong. → 01-build-vs-buy-decision.md