00. Tool execution sandboxes: First-principles overview
Module 19_tool_integration_contracts defined the AI-side contract — what the model is told it can call. Module 01_model_gateway_provider_ops enforced the provider-side boundary. This module is the third boundary — between the AI's tool calls and the systems they touch.
A platform engineer at a Chennai data-engineering SaaS is paged at 04:00 IST on a Wednesday because production cloud storage has been wiped clean. The investigation arrives at an embarrassing answer: the company's new AI-powered "data assistant" includes a Python-execution tool the model uses to analyse customer data. A user, neither malicious nor unusual, asked the assistant to "clean up the temporary files in our workspace." The model wrote an rm -rf command targeting a path it inferred from the system prompt. The Python tool executed the command. The path resolved, due to a misconfigured base directory, to a production bucket mount. Five terabytes of customer data, gone in 12 seconds. The cause is not the model, not the prompt, not the user. The cause is that the Python tool ran with the credentials, filesystem mounts, and network access of the AI service itself: no sandbox, no allowlist, no resource limit, no audit trail. By 09:00 the team is restoring from backup and writing a postmortem that will lead to a sandbox redesign. The honest sentence the lead writes: nothing structural prevented this.
That structural prevention is the tool execution sandbox. It is the boundary between what the model can call and what those calls can do. Every chapter of this module is one surface of that boundary. The opening incident is what happens when the boundary is missing; the rest of the module is what is on either side of it once the boundary exists.
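The misconfigured base directory in the incident is the kind of failure a path-confinement check can catch before a command runs. What follows is a minimal sketch, not the incident system's code: `resolve_inside`, `PathEscapeError`, and the idea of a single per-tool workspace root are assumptions made for illustration.

```python
from pathlib import Path


class PathEscapeError(Exception):
    """Raised when a requested path resolves outside the tool's workspace."""


def resolve_inside(root: Path, requested: str) -> Path:
    """Resolve `requested` against `root` and refuse anything outside it.

    Hypothetical helper: `root` is assumed to be the only directory the
    tool is allowed to touch.
    """
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks. An absolute
    # `requested` replaces `root` in the join, so it is checked as given.
    candidate = (root / requested).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise PathEscapeError(f"{requested!r} resolves outside {root}") from None
    return candidate
```

With this in place, a request such as `resolve_inside(Path("/workspace"), "../prod-bucket")` raises instead of returning a path. The check lives in the tool layer, so it is one policy control among several: it has a time-of-check to time-of-use gap, and it belongs behind filesystem isolation (mounting only the workspace into the sandbox) rather than in place of it.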
What a tool execution sandbox is, in one sentence
A tool execution sandbox is the production boundary between the AI's tool calls and the systems those calls touch, designed so that the blast radius of a wrong, malicious, or model-hallucinated call is bounded by enforcement, not by trust.
Read the sentence right to left.
- Bounded by enforcement, not by trust — the sandbox does not rely on the model behaving correctly. The model is treated as an untrusted input to the tool layer.
- Blast radius of a wrong call — the sandbox shapes what damage is possible, not whether wrong calls occur. Wrong calls are inevitable; bounded blast radius is the design goal.
- Tool calls and the systems they touch — the boundary is between the agent's intent and the production systems (filesystems, databases, networks, credentials).
- Production boundary — not a library abstraction inside the agent runtime. A real enforcement layer with its own SLOs, audits, and on-call.
If a team has shipped an agent that executes code, calls APIs, or manipulates data on behalf of users, and does not have a sandbox layer, the question is not whether an incident will happen but how large it will be.
The six sandbox surfaces
Every production sandbox has six load-bearing surfaces. Memorise them; the rest of the module is consequences.
| Surface | One-liner | Pressure it answers |
|---|---|---|
| The isolation layer | Run the tool in a process, container, or VM that cannot reach the host | confinement: the tool should not be able to access systems beyond its allowance |
| The resource limit layer | CPU, memory, time, I/O caps enforced by the runtime | exhaustion: a tool that loops or allocates without bound starves the host |
| The policy layer | Filesystem, network, syscall allowlists / denylists per tool | least privilege: each tool can only do what its purpose requires |
| The credential layer | Tool-specific credentials with scoped permissions, never the agent's | secret leakage: a compromised tool should not yield the agent's keys |
| The approval layer | Irreversible or high-blast-radius actions require human approval | reversibility: some actions cannot be undone after the fact |
| The observability layer | Per-call audit, syscalls observed, resource use measured | accountability: every tool call leaves a trail |
The module's first eleven chapters make the case for the sandbox and explore the surfaces in turn. The final two files, the architect checklist and the honest admission, synthesise.
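Two of the six surfaces, resource limits and credential separation, can be sketched with the Python standard library alone. The example below assumes a POSIX host; `run_tool`, `_cap`, and the default caps are hypothetical names and values. A child process is the weakest isolation model, and nothing here confines filesystem or network access, so treat it as an illustration of the idea rather than a usable sandbox.

```python
import resource
import subprocess
import sys


def _cap(kind: int, value: int) -> None:
    """Lower a resource limit to `value`, never above the current hard limit."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def run_tool(code: str, wall_seconds: float = 5.0, cpu_seconds: int = 2,
             memory_bytes: int = 1 << 30) -> dict:
    """Run untrusted Python in a child process with caps and an empty environment."""

    def apply_limits() -> None:
        # Runs in the child between fork and exec (POSIX only).
        _cap(resource.RLIMIT_CPU, cpu_seconds)   # CPU time, in seconds
        _cap(resource.RLIMIT_AS, memory_bytes)   # address space, in bytes

    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            env={},                    # the child inherits none of the agent's env vars
            capture_output=True,
            text=True,
            timeout=wall_seconds,      # wall-clock cap, enforced by the parent
            preexec_fn=apply_limits,
        )
    except subprocess.TimeoutExpired:
        return {"outcome": "timeout", "stdout": ""}
    outcome = "ok" if proc.returncode == 0 else "failed"
    return {"outcome": outcome, "stdout": proc.stdout}
```

The design choice worth noticing is that every cap is enforced by the kernel or the parent process, never by the code under test. Production systems typically get the same property from a container runtime or microVM, which also covers the filesystem and network surfaces this sketch leaves open.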
What this module is not about
- Designing the tool contracts. That is 19_tool_integration_contracts. This module is the enforcement layer the contracts assume exists.
- Prompt injection defence in general. Covered in 03_ai_security_safety/01_prompt_injection_security. This module's job is to bound the damage when injection succeeds, not to prevent injection itself.
- General application security. OWASP, network security, and identity management are assumed as a baseline.
- Model-side controls. Refusals, capability restrictions, and system-prompt design are covered upstream. The sandbox treats the model's output as untrusted regardless of upstream controls.
The recurring vocabulary
These terms appear in every chapter.
| Name | Surface | What it is |
|---|---|---|
| the isolation boundary | Isolation | the process, container, or VM that runs the tool |
| the syscall allowlist | Policy | the set of system calls the tool is permitted to make |
| the resource budget | Resource | the CPU, memory, time, and I/O caps enforced per call |
| the scoped credential | Credential | the tool-specific permission, never the agent's full access |
| the approval gate | Approval | the human-in-the-loop checkpoint for irreversible actions |
| the audit record | Observability | the per-call record: caller, tool, parameters, outcome, cost |
| the escape vector | All | a path by which a tool reaches beyond its sandbox |
| the blast radius | All | the worst-case scope of damage a tool can cause |
| the policy envelope | Policy | the union of filesystem, network, syscall, resource policies |
| the multi-tenant isolation | All | the guarantee that one tenant's tool calls cannot affect another |
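Three of these terms (the policy envelope, the approval gate, and the audit record) compose naturally into a single decision step that runs before any tool executes. The sketch below is one possible shape, not a prescribed schema: every class, field, and the `cost_ms` unit are hypothetical, and the audit fields simply mirror the list in the table (caller, tool, parameters, outcome, cost).

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class PolicyEnvelope:
    """Hypothetical per-tool policy: what the tool may do and reach."""
    tool: str
    allowed_actions: frozenset
    allowed_hosts: frozenset
    irreversible_actions: frozenset


@dataclass(frozen=True)
class ToolCall:
    caller: str
    tool: str
    action: str
    host: Optional[str] = None


@dataclass(frozen=True)
class AuditRecord:
    caller: str
    tool: str
    parameters: dict
    outcome: str
    cost_ms: int


def decide(call: ToolCall, policy: PolicyEnvelope) -> str:
    """Return "deny", "needs_approval", or "allow" for one call."""
    if call.tool != policy.tool or call.action not in policy.allowed_actions:
        return "deny"
    if call.host is not None and call.host not in policy.allowed_hosts:
        return "deny"
    if call.action in policy.irreversible_actions:
        return "needs_approval"  # the approval gate: a human decides
    return "allow"


def audit_line(call: ToolCall, outcome: str, cost_ms: int) -> str:
    """Serialise one audit record as a JSON line."""
    record = AuditRecord(call.caller, call.tool,
                         {"action": call.action, "host": call.host},
                         outcome, cost_ms)
    return json.dumps(asdict(record), sort_keys=True)


policy = PolicyEnvelope(
    tool="storage",
    allowed_actions=frozenset({"list", "delete"}),
    allowed_hosts=frozenset({"files.internal.example"}),
    irreversible_actions=frozenset({"delete"}),
)
call = ToolCall("user-1", "storage", "delete", host="files.internal.example")
print(decide(call, policy))  # needs_approval
```

The ordering inside `decide` is deliberate: anything outside the envelope is denied before the approval gate is consulted, so a human is only asked about calls that policy would otherwise permit.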
The journey: bound the boundary, then operate it
This module has two acts.
Act 1 — Build the sandbox (files 01–07). The case for the sandbox, the surfaces, isolation models, resource limits, filesystem/network policy, credentials, approval gates. By file 07 the sandbox exists as a defensible production layer.
Act 2 — Operate the sandbox (files 08–11). Output validation, escape vectors and defences, multi-tenant isolation, observability. The sandbox does not become more powerful; it becomes resilient to time, scale, and attack.
Synthesis (files 12–13). Architect checklist and honest admission.
Memory map
| # | File | Surface | Pressure answered | What it adds |
|---|---|---|---|---|
| 01 | why-unsandboxed-execution-fails | — | the cost of unbounded tool calls | the case that forces the boundary |
| 02 | the-sandbox-surfaces | All | what the sandbox actually does | the six surfaces as one architecture |
| 03 | isolation-models | Isolation | one isolation does not fit all | process, container, VM, language isolation |
| 04 | resource-limits | Resource | tools can exhaust the host | CPU, memory, time, I/O caps |
| 05 | filesystem-and-network-policy | Policy | filesystem and network are the largest blast radius | allowlists, denylists, namespacing |
| 06 | secrets-and-credentials-in-tools | Credential | the tool's credentials are not the agent's | scoped tokens, broker patterns, rotation |
| 07 | irreversible-actions-and-approvals | Approval | some actions cannot be undone after | approval gates, dry-runs, idempotency |
| — milestone: sandbox is defensible — | ||||
| 08 | output-validation-and-sanitisation | All | tool output flows back to the model | sanitisation, injection prevention |
| 09 | escape-vectors-and-defenses | All | sandboxes are escaped | known vectors, hardening, monitoring |
| 10 | multi-tenant-isolation | All | one tenant's calls must not affect another | per-tenant boundaries, namespacing |
| 11 | observability-and-audit | Observability | every call must leave a trail | per-call audit, syscall tracing, cost |
| — milestone: sandbox is operable — | ||||
| 12 | architect-checklist | Synthesis | completeness | 20-item design / build / launch / operate |
| 13 | honest-admission | Boundaries | humility | what sandbox design cannot solve |
Three traversal paths use this map. Prerequisite path: top to bottom. Failure path: when a security incident points to a tool, find which surface failed. Synthesis path: pick two surfaces and ask how they compose (for example, Isolation + Credential: how tightly scoped tokens limit the damage from a sandbox escape).
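The Isolation + Credential composition can be made concrete with a scoped, short-lived token: if a tool escapes its isolation boundary, the credential it carries is only good for one tool, one resource, and a short window. The sketch below uses an HMAC-signed claim set purely for illustration. `BROKER_KEY`, `mint`, `verify`, and the claim names are hypothetical, and a real deployment would more likely rely on its cloud provider's short-lived credentials or an established token format.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical signing key, held only by the credential broker and
# never present inside the sandbox.
BROKER_KEY = b"demo-key-held-only-by-the-broker"


def mint(tool: str, resource: str, ttl_seconds: int,
         now: Optional[float] = None) -> str:
    """Issue a token valid for one tool, one resource, and a short time."""
    issued = time.time() if now is None else now
    claims = {"tool": tool, "resource": resource, "exp": issued + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(BROKER_KEY, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig


def verify(token: str, tool: str, resource: str,
           now: Optional[float] = None) -> bool:
    """Accept the token only for the exact scope it was minted for."""
    current = time.time() if now is None else now
    body, _, sig = token.partition(".")
    expected = hmac.new(BROKER_KEY, body.encode(), hashlib.sha256).hexdigest()
    # Check the signature first so unsigned input is never decoded.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    return (claims["tool"] == tool
            and claims["resource"] == resource
            and current < claims["exp"])
```

A token minted for `workspace/tenant-a` fails verification against `workspace/tenant-b` and fails everywhere once its expiry passes, which is the sense in which the credential surface bounds what an isolation failure can reach.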
How this module relates to its neighbours
- 19_tool_integration_contracts: defines the tool's contract; this module enforces the contract's boundary.
- 01_prompt_injection_security: prevents injection at the model layer; this module bounds the damage when injection succeeds.
- 03_data_access_governance: defines what data a tool may access; this module enforces the data-access boundary at execution time.
- 01_model_gateway_provider_ops: the boundary between agent and model provider; this module is the boundary between agent and downstream systems.
- 06_ai_runbooks_oncall: the apparatus for handling sandbox incidents.
- 16_multi_agent_coordination: multi-agent systems multiply the sandbox surface.
Top resources
- gVisor — Application Kernel for Containers — https://gvisor.dev/
- Firecracker — microVMs for serverless — https://firecracker-microvm.github.io/
- Open Container Initiative (OCI) runtime spec — https://opencontainers.org/
- AWS Lambda execution environment — https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html
- OWASP API Security Top 10 — https://owasp.org/API-Security/
These are the technical baselines. The AI-specific surface on top is the contribution of this module.
What's coming
- 01-why-unsandboxed-execution-fails.md — the cost of unbounded tool calls.
- 02-the-sandbox-surfaces.md — six surfaces as a service architecture.
- 03-isolation-models.md — process, container, VM, language.
- 04-resource-limits.md — CPU, memory, time, I/O.
- 05-filesystem-and-network-policy.md — allowlists, denylists, namespacing.
- 06-secrets-and-credentials-in-tools.md — scoped tokens, broker patterns.
- 07-irreversible-actions-and-approvals.md — approval gates, dry-runs, idempotency.
- 08-output-validation-and-sanitisation.md — tool output as untrusted input.
- 09-escape-vectors-and-defenses.md — known sandbox escapes.
- 10-multi-tenant-isolation.md — per-tenant boundaries.
- 11-observability-and-audit.md — per-call audit.
- 12-architect-checklist.md — twenty items.
- 13-honest-admission.md — limits.
Bridge. Before designing isolation, resource limits, or approval gates, we feel why the sandbox is needed at all. The opening incident showed what happens without a sandbox; the first chapter walks through the failure shapes the sandbox must absorb. → 01-why-unsandboxed-execution-fails.md