13. Honest admission¶
~7 min read. Twelve chapters of sandbox. None of them eliminates the problem. This chapter is the calibrated list of what the sandbox cannot fix, where the community's practice is still young, and the limits a thoughtful lead should be transparent about.
Continues from 12-architect-checklist.md. The previous chapters built confidence; this one is the counterweight. The sandbox is load-bearing; the discipline has boundaries.
The tool execution sandbox is the production boundary between AI's reach and what AI can affect. It bounds blast radius, encodes least privilege, and makes the worst case survivable. None of that makes it complete.
1 — The sandbox does not improve the AI¶
A well-designed sandbox catches and contains tool-execution incidents. It does not make the model behave better. Model quality, prompt design, safety training — none of these are improved by the sandbox. The sandbox makes the AI's failures recoverable; the AI's own quality is the work of the AI engineering modules. Teams that hope the sandbox compensates for fragile model behaviour will be disappointed.
2 — Sandboxes are escaped¶
Every isolation technology has had escapes. Containers, hypervisors, language sandboxes — each has accumulated CVE history. The discipline is not "the sandbox prevents all escapes" but "the sandbox raises the cost of escape and the monitoring catches the escape that succeeds." Promising un-escapable sandboxes is a promise that will not hold.
3 — Defence-in-depth has bounded depth¶
Every additional layer adds cost: latency, memory, operational complexity, maintenance burden. Beyond a certain depth, the marginal cost exceeds the marginal value. The lead's job is to balance depth against cost per tool, not to maximise depth indefinitely. A 5-layer sandbox is not proportionally better than a 3-layer one; each added layer contributes less than the one before it.
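The taper can be illustrated with a toy model. The sketch below assumes, purely for illustration, that every layer independently stops the same fraction of the incidents that reach it. Real layers are correlated and uneven, and the function names and the 0.7 catch rate used in the usage note are hypothetical.

```python
def residual_risk(layers: int, catch_rate: float) -> float:
    """Fraction of incidents that slip past every layer.

    Assumption (illustrative only): layers are independent and each one
    stops the same fraction `catch_rate` of what reaches it.
    """
    if layers < 0:
        raise ValueError("layers must be non-negative")
    if not 0.0 <= catch_rate <= 1.0:
        raise ValueError("catch_rate must be in [0, 1]")
    return (1.0 - catch_rate) ** layers


def marginal_gain(layer: int, catch_rate: float) -> float:
    """Extra fraction of incidents stopped by adding layer number `layer` (1-indexed)."""
    if layer < 1:
        raise ValueError("layer must be at least 1")
    return residual_risk(layer - 1, catch_rate) - residual_risk(layer, catch_rate)
```

Calling `marginal_gain(n, 0.7)` for `n` from 1 to 5 gives a strictly shrinking sequence, while the latency, memory, and maintenance cost of each layer does not shrink with it. That gap is the point at which adding depth stops paying for itself.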
4 — The credential broker is a single point of policy¶
The broker enforces credential policy centrally. A misconfiguration in the broker affects every tool that uses it. The broker's own change management must match its centrality — review, canary rollout, testing — or the broker becomes the platform's single failure point. Centralisation is power; centralisation is risk.
5 — Approval gates depend on user attention¶
The approval layer is structural only when users actually read approvals. Approval fatigue degrades the signal: users click through without reviewing. The apparatus can detect fatigue (rubber-stamp rate, time-to-approval, post-approval incident rate) but cannot fully prevent it. Some decisions a user clicks "approve" on will be wrong; the sandbox's other layers must catch those.
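The three fatigue signals named above can be computed from an approval log. This is a minimal sketch, assuming a hypothetical `ApprovalEvent` record and a hypothetical two-second threshold for what counts as a rubber stamp; neither comes from a specific product.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class ApprovalEvent:
    """One approval decision (hypothetical record shape)."""
    approved: bool
    seconds_to_decision: float
    caused_incident: bool = False  # filled in later by incident review


def fatigue_signals(events, rubber_stamp_seconds=2.0):
    """Return rubber-stamp rate, median time-to-approval, and post-approval incident rate."""
    approvals = [e for e in events if e.approved]
    if not approvals:
        return {
            "rubber_stamp_rate": 0.0,
            "median_time_to_approval": None,
            "post_approval_incident_rate": 0.0,
        }
    # An approval faster than the threshold is counted as unread.
    fast = sum(1 for e in approvals if e.seconds_to_decision < rubber_stamp_seconds)
    incidents = sum(1 for e in approvals if e.caused_incident)
    return {
        "rubber_stamp_rate": fast / len(approvals),
        "median_time_to_approval": median(e.seconds_to_decision for e in approvals),
        "post_approval_incident_rate": incidents / len(approvals),
    }
```

These numbers detect fatigue after the fact; they do not stop a tired user from approving the wrong request, which is why the other layers still have to hold.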
6 — Output validation is imperfect¶
Schema validation catches malformed outputs. Size caps catch bloat. Content sanitisation catches known patterns. Provenance tags inform model behaviour. None of these is complete. A determined adversary can craft outputs that fit the schema, evade sanitisation, and convince the model. The discipline is depth — multiple imperfect defences combine to raise the difficulty — not any one perfect defence.
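The four defences above can be stacked in a single function. The sketch below is a deliberately small example: the size cap, the single suspicious pattern, the required-keys "schema", and the shape of the provenance tag are all assumptions made for illustration, not a recommended rule set.

```python
import json
import re

MAX_OUTPUT_BYTES = 64 * 1024  # hypothetical cap
# One known injection phrase; a real list would be longer and still incomplete.
SUSPICIOUS = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)


class OutputRejected(Exception):
    """Raised when a tool output fails a hard check."""


def validate_tool_output(raw: str, tool_name: str, required_keys: set) -> dict:
    # Layer 1: size cap catches bloat before any parsing happens.
    if len(raw.encode("utf-8")) > MAX_OUTPUT_BYTES:
        raise OutputRejected("output exceeds size cap")
    # Layer 2: minimal schema check (JSON object with required keys).
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise OutputRejected("output is not valid JSON") from exc
    if not isinstance(payload, dict) or not required_keys.issubset(payload):
        raise OutputRejected("output is missing required keys")
    # Layer 3: sanitise known patterns in top-level string fields.
    flagged = False
    cleaned = {}
    for key, value in payload.items():
        if isinstance(value, str) and SUSPICIOUS.search(value):
            value = SUSPICIOUS.sub("[removed]", value)
            flagged = True
        cleaned[key] = value
    # Layer 4: provenance tag so downstream code can treat the data as untrusted.
    return {"source": f"tool:{tool_name}", "trusted": False, "flagged": flagged, "data": cleaned}
```

An output that fits the schema, stays under the cap, and phrases its injection differently passes every check here, which is exactly the limitation this section describes.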
7 — Multi-tenant isolation depends on the runtime's correctness¶
Per-tenant scoping is enforced by the runtime. If the runtime has a bug — a path canonicalisation error, a cache key omission, a credential broker mistake — tenant isolation can fail in subtle ways. The discipline is to treat the runtime as load-bearing code, with the same rigour as the platform's core services. A bug in the runtime is a bug in the isolation guarantee.
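Two of the bug classes named above, path canonicalisation errors and cache key omissions, can be made concrete. This is a sketch under assumptions: the function names and the directory layout are hypothetical, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


class TenantIsolationError(Exception):
    """Raised when a request would leave the tenant's scope."""


def resolve_tenant_path(tenant_root: Path, requested: str) -> Path:
    """Canonicalise first, then check containment.

    Checking the raw string (for example with startswith) before resolving
    ".." segments and symlinks is the classic canonicalisation bug.
    """
    root = tenant_root.resolve()
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise TenantIsolationError(f"{requested!r} escapes the tenant root")
    return candidate


def cache_key(tenant_id: str, tool_name: str, args_digest: str) -> tuple:
    """Build a cache key that always includes the tenant.

    Omitting tenant_id lets one tenant read another tenant's cached result.
    A tuple avoids delimiter collisions that string concatenation can cause.
    """
    if not tenant_id:
        raise ValueError("tenant_id is required in every cache key")
    return (tenant_id, tool_name, args_digest)
```

Both helpers are only as good as their call sites: a single code path that builds a path or cache key without them reopens the gap, which is why the runtime deserves the same review rigour as core services.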
8 — The sandbox does not eliminate side channels¶
Hardware-level side channels (cache timing, branch prediction, contention) leak some information across the isolation boundary. Defences are imperfect: microcode updates, co-location policy, cache partitioning. A determined attacker with co-location can extract information through side channels regardless of sandbox depth. The sandbox is not a defence against nation-state-level threat models; it is a defence against the failure modes of normal operation.
9 — Sandbox debt is silent¶
A sandbox configured today and not maintained will degrade as runtimes evolve, CVEs accumulate, and tool needs shift. The degradation does not produce visible engineering pain until the next incident. The checklist (chapter 12) is the visible artefact; teams that skip the quarterly audit will accumulate sandbox debt.
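One way to make the debt visible is to record when each checklist item was last reviewed and list the ones that have gone past the audit interval. The sketch below assumes a hypothetical `ChecklistItem` record and approximates "quarterly" as 90 days; the item names in any usage are illustrative, not the chapter 12 checklist itself.

```python
from dataclasses import dataclass
from datetime import date, timedelta

AUDIT_INTERVAL = timedelta(days=90)  # "quarterly", approximated


@dataclass(frozen=True)
class ChecklistItem:
    """One audited sandbox property (hypothetical record shape)."""
    name: str
    last_reviewed: date


def overdue_items(items, today: date):
    """Return items not reviewed within the audit interval, oldest first."""
    stale = [item for item in items if today - item.last_reviewed > AUDIT_INTERVAL]
    return sorted(stale, key=lambda item: item.last_reviewed)
```

Trending the length of this list from one audit to the next turns silent degradation into a number a team can be asked about.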
10 — Cost is real¶
The sandbox costs latency (isolation startup), memory (pool warming, per-VM allocation), and operational overhead (configuration, audit, monitoring). For low-stakes tools, the cost is hard to justify; teams under-sandbox these. For high-stakes tools, the cost is justified but real. The trade-off should be conscious: which tools are sandboxed to what depth, with what cost, against what blast radius.
11 — Some workflows do not fit¶
Some agent workflows are structurally awkward in a sandbox: high-frequency automated batches that approval fatigue makes impractical, long-running compute that resource caps interrupt, workflows that span many tools where each call's isolation costs compound. The discipline is to redesign the workflow when sandbox patterns make it infeasible, not to abandon sandboxing. Sometimes the right answer is to do less work, more carefully.
12 — The sandbox is platform code¶
Every chapter described a subsystem. Each subsystem is software, with its own bugs. A bug in the policy engine grants more than intended; a bug in the credential broker issues over-broad credentials; a bug in the audit pipeline drops records. The sandbox itself needs the discipline it imposes on tools: testing, code review, SLOs, postmortems. Treating the sandbox as infrastructure-that-just-works is the same mistake as treating any platform component as inert.
What the module did not cover¶
- Network security for the agent itself. The agent runtime's network exposure (model gateway, downstream services) is the agent's own concern; this module focuses on tool execution.
- AI model safety upstream. Pre-training data filtering, fine-tuning safety, RLHF — these are upstream disciplines.
- Specific provider sandbox products. AWS Lambda, GCP Cloud Run, Azure Functions, Vercel — each has features and limits this module does not enumerate.
- Hardware-trust models. Confidential computing, hardware enclaves (SGX, SEV) — covered in specialised modules.
- Regulatory specifics. Compliance frameworks differ; this module sketches the patterns.
A reader who needs depth in any of these should treat the module as a foundation.
The lead engineer's honest position¶
When the sandbox is failing, the lead's job is to diagnose the layer:
- Is it isolation? The tool reaches resources it should not.
- Is it resource limits? A tool starves the host or hits caps too often.
- Is it policy? The envelope is too broad or too narrow.
- Is it credentials? Scopes are wrong or lifetimes are too long.
- Is it approval? Gates are missing, or approvers are fatigued and rubber-stamping.
- Is it output validation? Injection via tool output succeeds.
- Is it multi-tenant? Cross-tenant access is not structurally prevented.
- Is it observability? Incidents are diagnosed by guessing.
Each layer is different work. The discipline is to know which layer is failing rather than declaring "the sandbox is broken."
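If each postmortem tags the layer that failed, the diagnosis above becomes a simple tally. This is a minimal sketch, assuming incidents are already labelled with one of the eight layers; the `Layer` enum and `layer_breakdown` function are hypothetical names.

```python
from collections import Counter
from enum import Enum


class Layer(Enum):
    ISOLATION = "isolation"
    RESOURCE_LIMITS = "resource limits"
    POLICY = "policy"
    CREDENTIALS = "credentials"
    APPROVAL = "approval"
    OUTPUT_VALIDATION = "output validation"
    MULTI_TENANT = "multi-tenant"
    OBSERVABILITY = "observability"


def layer_breakdown(incident_layers):
    """Return (layer, share of incidents) pairs, most frequent layer first."""
    counts = Counter(incident_layers)
    total = sum(counts.values())
    return [(layer, n / total) for layer, n in counts.most_common()]
```

A breakdown dominated by one layer points at targeted work on that layer; an even spread across layers is the rarer case where a broader review may be warranted.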
The unsettled patterns¶
Some patterns in this module are not stable yet:
- Isolation choice trade-offs. Containers vs. gVisor vs. microVMs — community consensus is evolving as costs and threats shift.
- Output validation depth. Sanitisation effectiveness varies; classifier-based detection is improving but not standardised.
- Approval UI design. Approval fatigue defences are an active area; no clear best practice.
- Multi-tenant isolation guarantees. Side-channel defences and confidential computing are reshaping what is achievable.
- Sandbox cost / coverage trade-offs. Per-call vs. per-tool isolation, pool sizing, warm-vs-cold scheduling — operational patterns are still settling.
A reader returning to this module in two years should expect these to have shifted.
Interview Q&A¶
Q1. The sandbox is mature; incidents still happen. What is the honest expectation? The sandbox's value is bounded blast radius, not zero incidents. A mature sandbox catches the common failures, bounds the rare ones, and produces the audit trail for the residual ones. The aim is incidents within acceptable cost; expecting zero misaligns the team. Common wrong answer to avoid: "if we still have incidents, the sandbox failed" — incidents are the sandbox's load, not its failure.
Q2. Walk through the layer diagnosis when the sandbox is performing poorly. Eight layers: isolation (the tool reaches too broadly), resource (caps wrong or hit too often), policy (envelope wrong), credentials (scope or lifetime wrong), approval (gates missing or fatigued), output (injection succeeds), multi-tenant (cross-tenant possible), observability (incidents undebuggable). For each layer, the metric and the intervention differ. Common wrong answer to avoid: "the sandbox needs a redesign". Usually only one layer is failing.
Q3. The team wants to claim the sandbox is "secure." How do you frame the claim honestly? "The sandbox is hardened against known failure shapes within the threat model, monitored for escapes that succeed, and audited for incidents that bypass any layer. Side channels and novel escapes remain residual risks; the apparatus catches what the sandbox misses." The claim is qualified, specific, and grounded. Promising "secure" without qualification is an over-promise that will fail at the first novel incident. Common wrong answer to avoid: "we have a sandbox, we're secure". Secure is not a state; it's a discipline.
Q4. What is the most important thing a lead should not promise about the sandbox? The sandbox will not catch every escape. Novel escapes, side channels, and supply-chain compromises can succeed. The discipline is depth and monitoring; the residual risk is real. Promising no escapes is a promise that will not hold; the team's credibility suffers when the inevitable novel incident arrives. Common wrong answer to avoid: "we have all the bases covered" — known bases, yes; novel ones, no.
Q5. How does sandbox debt manifest, and how do you make it visible? Sandbox debt manifests as: outdated isolation runtimes, resource caps that no longer match tool behaviour, policy envelopes that drift broader, credentials with lifetimes that crept up, approval classifications that have become stale, output validation that has not been updated for new tool patterns, multi-tenant assumptions that no longer hold after retrofits, audit schemas that have not kept up with new tool fields. Visibility: the checklist (chapter 12), the quarterly audit, the apparatus's observability signals (chapter 11). Common wrong answer to avoid: "we'll know if there's a problem" — sandbox debt is invisible until the incident.
What to do differently after reading this¶
- Treat the sandbox as a discipline with boundaries; do not promise more than it delivers.
- Diagnose the layer when failures occur; do not treat every failure as the whole sandbox failing.
- Treat the sandbox itself as platform code with bugs, tests, SLOs, postmortems.
- Balance depth against cost per tool; do not maximise depth indefinitely.
- Make sandbox debt visible through quarterly audits and checklist trending.
- Treat this module as a foundation; revisit as the unsettled patterns settle.
Bridge. This closes the tool execution sandboxes module. The on-call apparatus (06_ai_runbooks_oncall) and the tool execution sandbox together form the production safety side of AI infrastructure. From here, the curriculum continues into the broader AI security and safety modules: prompt injection, data governance, safety guardrails. Each builds on the apparatus and sandbox this side of the curriculum has produced. → ../../03_ai_security_safety/00_safety_guardrail_design/00-eli5.md