00. Tool execution sandboxes: First-principles overview

Module 19_tool_integration_contracts defined the AI-side contract — what the model is told it can call. Module 01_model_gateway_provider_ops enforced the provider-side boundary. This module is the third boundary — between the AI's tool calls and the systems they touch.

A platform engineer at a Chennai data-engineering SaaS is paged at 04:00 IST on a Wednesday because production cloud storage has been wiped clean. The investigation arrives at an embarrassing answer: the company's new AI-powered "data assistant" includes a Python-execution tool the model uses to analyse customer data. A user, neither malicious nor unusual, asked the assistant to "clean up the temporary files in our workspace." The model wrote an rm -rf command targeting a path it inferred from the system prompt. The Python tool executed the command. Because of a misconfigured base directory, the path resolved to a production bucket mount. Five terabytes of customer data, gone in 12 seconds.

The cause is not the model, not the prompt, not the user. The cause is that the Python tool ran with the credentials, filesystem mounts, and network access of the AI service itself: no sandbox, no allowlist, no resource limit, no audit trail. By 09:00 the team is restoring from backup and writing a postmortem that will lead to a sandbox redesign. The honest sentence the lead writes: nothing structural prevented this.

That structural prevention is the tool execution sandbox. It is the boundary between what the model can call and what those calls can do. Every chapter of this module is one surface of that boundary. The opening incident is what happens when the boundary is missing; the rest of the module describes what sits on either side of the boundary once it exists.
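
To make "structural" concrete, here is a minimal sketch of one check that was missing in the opening incident: resolving a requested path and refusing anything that lands outside the tool's workspace. The names `confine` and `PathOutsideWorkspace` are hypothetical, and an in-process check like this is only an illustration of the policy; a production sandbox would also enforce it outside the tool's own process (for example, by not mounting the production bucket into the tool's environment at all).

```python
from pathlib import Path


class PathOutsideWorkspace(Exception):
    """Raised when a requested path resolves outside the allowed root."""


def confine(workspace_root: str, requested: str) -> Path:
    """Resolve `requested` against `workspace_root` and refuse escapes.

    resolve() follows symlinks and collapses `..`, so the check applies to
    where the path actually lands, not to how it is spelled.
    """
    root = Path(workspace_root).resolve()
    # Joining with an absolute `requested` discards `root`, which is what we
    # want: an absolute path outside the workspace is rejected below.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise PathOutsideWorkspace(f"{requested!r} resolves to {candidate}")
    return candidate
```

A call such as `confine("/srv/workspaces/tenant-a", "../../mnt/prod")` raises instead of returning a path, so the cleanup command never receives a target outside the workspace. The directory names here are illustrative only.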

What a tool execution sandbox is, in one sentence

A tool execution sandbox is the production boundary between the AI's tool calls and the systems those calls touch, designed so that the blast radius of a wrong, malicious, or model-hallucinated call is bounded by enforcement, not by trust.

Read the sentence right to left.

Bounded by enforcement, not by trust — the sandbox does not rely on the model behaving correctly. The model is treated as an untrusted input to the tool layer.
Blast radius of a wrong call — the sandbox shapes what damage is possible, not whether wrong calls occur. Wrong calls are inevitable; bounded blast radius is the design goal.
Tool calls and the systems they touch — the boundary is between the agent's intent and the production systems (filesystems, databases, networks, credentials).
Production boundary — not a library abstraction inside the agent runtime. A real enforcement layer with its own SLOs, audits, and on-call.
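
Treating the model as an untrusted input can be sketched as a dispatcher that validates every proposed call against a registry before anything runs. This is a minimal illustration under assumed names (`ToolSpec`, `RejectedCall`, `dispatch`, and the call shape `{"tool": ..., "args": ...}` are all hypothetical); it shows only the validation step, not isolation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class ToolSpec:
    handler: Callable[..., Any]
    allowed_params: Dict[str, type]  # parameter name -> required type


class RejectedCall(Exception):
    """Raised when a model-proposed call fails validation."""


def dispatch(registry: Dict[str, ToolSpec], call: Dict[str, Any]) -> Any:
    """Validate a model-proposed call; execute only if every check passes."""
    name = call.get("tool")
    if not isinstance(name, str) or name not in registry:
        raise RejectedCall(f"unknown tool: {name!r}")
    spec = registry[name]

    args = call.get("args", {})
    if not isinstance(args, dict):
        raise RejectedCall("args must be an object")

    # Reject parameters the tool never declared.
    unexpected = set(args) - set(spec.allowed_params)
    if unexpected:
        raise RejectedCall(f"unexpected parameters: {sorted(unexpected)}")

    # Require every declared parameter, with the declared type.
    for key, expected in spec.allowed_params.items():
        if key not in args or not isinstance(args[key], expected):
            raise RejectedCall(f"parameter {key!r} missing or wrong type")

    return spec.handler(**args)
```

The design choice is that rejection is the default: a call reaches a handler only by matching a registered tool and its declared parameters exactly, regardless of how plausible the model's output looks.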

If a team has shipped an agent that executes code, calls APIs, or manipulates data on behalf of users, and does not have a sandbox layer, the question is not whether an incident will happen but how large it will be.

The six sandbox surfaces

Every production sandbox has six load-bearing surfaces. Memorise them; the rest of the module is consequences.

Surface	One-liner	Pressure it answers
The isolation layer	Run the tool in a process, container, or VM that cannot reach the host	confinement: the tool should not be able to access systems beyond its allowance
The resource limit layer	CPU, memory, time, I/O caps enforced by the runtime	exhaustion: a tool that loops or allocates without bound starves the host
The policy layer	Filesystem, network, syscall allowlists / denylists per tool	least privilege: each tool can only do what its purpose requires
The credential layer	Tool-specific credentials with scoped permissions, never the agent's	secret leakage: a compromised tool should not yield the agent's keys
The approval layer	Irreversible or high-blast-radius actions require human approval	reversibility: some actions cannot be undone after the fact
The observability layer	Per-call audit, syscalls observed, resource use measured	accountability: every tool call leaves a trail
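
As one illustration of a surface from the table, the resource limit layer can be sketched with the Python standard library on Linux: the tool runs in a child process whose CPU time, address space, and file size are capped with `resource.setrlimit`, plus a wall-clock timeout. `ResourceBudget` and `run_with_budget` are hypothetical names and the default values are arbitrary assumptions. This is not isolation by itself; a production sandbox would typically apply such caps through its container or VM runtime.

```python
import resource
import subprocess
import sys
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class ResourceBudget:
    cpu_seconds: int = 2
    memory_bytes: int = 1 << 30          # address-space cap (1 GiB)
    max_file_bytes: int = 10 * 1024 * 1024
    wall_seconds: float = 5.0


def _apply_limits(budget: ResourceBudget) -> None:
    """Runs in the child just before exec; lowers soft and hard limits."""
    def cap(kind: int, value: int) -> None:
        _, hard = resource.getrlimit(kind)
        if hard != resource.RLIM_INFINITY:
            value = min(value, hard)     # never try to exceed the hard limit
        resource.setrlimit(kind, (value, value))

    cap(resource.RLIMIT_CPU, budget.cpu_seconds)
    cap(resource.RLIMIT_AS, budget.memory_bytes)
    cap(resource.RLIMIT_FSIZE, budget.max_file_bytes)


def run_with_budget(argv: List[str],
                    budget: ResourceBudget = ResourceBudget()) -> Dict[str, Any]:
    """Run `argv` under the budget and report how the call ended."""
    try:
        proc = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=budget.wall_seconds,  # wall-clock cap, enforced by the parent
            preexec_fn=lambda: _apply_limits(budget),
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "returncode": None, "stdout": ""}
    status = "ok" if proc.returncode == 0 else "failed"
    return {"status": status, "returncode": proc.returncode, "stdout": proc.stdout}
```

For example, `run_with_budget([sys.executable, "-c", "while True: pass"])` ends when the CPU cap is reached rather than spinning until someone notices. Note that `preexec_fn` is not safe in multi-threaded parents, which is one reason this stays a sketch.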

Files 01–11 make the case for the sandbox and take its surfaces in turn. The final two files, 12 and 13, synthesise: the architect checklist and the honest admission.

What this module is not about

Designing the tool contracts. That is 19_tool_integration_contracts. This module is the enforcement layer the contracts assume exists.
Prompt injection defence in general. Covered in 03_ai_security_safety/01_prompt_injection_security. This module's job is to bound the damage when injection succeeds, not to prevent injection itself.
General application security. OWASP, network security, identity management — assumed as a baseline.
Model-side controls. Refusals, capability restrictions, system-prompt design — covered upstream. The sandbox treats the model's output as untrusted regardless of upstream controls.

The recurring vocabulary

These terms appear in every chapter.

Name	Surface	What it is
the isolation boundary	Isolation	the process, container, or VM that runs the tool
the syscall allowlist	Policy	the set of system calls the tool is permitted to make
the resource budget	Resource	the CPU, memory, time, and I/O caps enforced per call
the scoped credential	Credential	the tool-specific permission, never the agent's full access
the approval gate	Approval	the human-in-the-loop checkpoint for irreversible actions
the audit record	Observability	the per-call record: caller, tool, parameters, outcome, cost
the escape vector	All	a path by which a tool reaches beyond its sandbox
the blast radius	All	the worst-case scope of damage a tool can cause
the policy envelope	Policy	the combined filesystem, network, syscall, and resource policies that apply to a tool
the multi-tenant isolation	All	the guarantee that one tenant's tool calls cannot affect another
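
Two of these terms, the approval gate and the audit record, can be sketched together. The field names follow the table (caller, tool, parameters, outcome, cost); everything else is an assumption made for illustration, including the function `gated_call`, the tool names in `IRREVERSIBLE_TOOLS`, and the use of elapsed milliseconds as the cost.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class AuditRecord:
    caller: str
    tool: str
    parameters: Dict[str, Any]
    outcome: str            # "executed", "denied", or "error"
    cost_ms: float
    timestamp: float = field(default_factory=time.time)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical tool names; a real deployment would define its own list.
IRREVERSIBLE_TOOLS = {"delete_objects", "drop_table"}


def gated_call(caller: str, tool: str, parameters: Dict[str, Any],
               handler: Callable[..., Any],
               approver: Callable[[str, str, Dict[str, Any]], bool],
               audit_log: List[AuditRecord]) -> Any:
    """Run `handler` only if the gate allows it; always write an audit record."""
    start = time.perf_counter()
    outcome, result = "denied", None
    try:
        # Reversible tools pass straight through; irreversible ones need approval.
        if tool not in IRREVERSIBLE_TOOLS or approver(caller, tool, parameters):
            result = handler(**parameters)
            outcome = "executed"
    except Exception:
        outcome = "error"
        raise
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        audit_log.append(AuditRecord(caller, tool, parameters, outcome, elapsed_ms))
    return result
```

The record is written in a `finally` block so that denied and failed calls leave the same trail as successful ones.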

The journey: bound the boundary, then operate it

This module has two acts.

Act 1 — Build the sandbox (files 01–07). The case for the sandbox, the surfaces, isolation models, resource limits, filesystem/network policy, credentials, approval gates. By file 07 the sandbox exists as a defensible production layer.

Act 2 — Operate the sandbox (files 08–11). Output validation, escape vectors and defences, multi-tenant isolation, observability. The sandbox does not become more powerful; it becomes resilient to time, scale, and attack.

Synthesis (files 12–13). Architect checklist and honest admission.

Memory map

#	File	Surface	Pressure answered	What it adds
01	why-unsandboxed-execution-fails	—	the cost of unbounded tool calls	the case that forces the boundary
02	the-sandbox-surfaces	All	what the sandbox actually does	the six surfaces as one architecture
03	isolation-models	Isolation	one isolation model does not fit all	process, container, VM, language isolation
04	resource-limits	Resource	tools can exhaust the host	CPU, memory, time, I/O caps
05	filesystem-and-network-policy	Policy	filesystem and network are the largest blast radius	allowlists, denylists, namespacing
06	secrets-and-credentials-in-tools	Credential	the tool's credentials are not the agent's	scoped tokens, broker patterns, rotation
07	irreversible-actions-and-approvals	Approval	some actions cannot be undone after the fact	approval gates, dry-runs, idempotency
	— milestone: sandbox is defensible —
08	output-validation-and-sanitisation	All	tool output flows back to the model	sanitisation, injection prevention
09	escape-vectors-and-defenses	All	sandboxes are escaped	known vectors, hardening, monitoring
10	multi-tenant-isolation	All	one tenant's calls must not affect another	per-tenant boundaries, namespacing
11	observability-and-audit	Observability	every call must leave a trail	per-call audit, syscall tracing, cost
	— milestone: sandbox is operable —
12	architect-checklist	Synthesis	completeness	20-item design / build / launch / operate
13	honest-admission	Boundaries	humility	what sandbox design cannot solve

Three traversal paths use this map. Prerequisite path: top to bottom. Failure path: when a security incident points to a tool, find which surface failed. Synthesis path: pick two surfaces and ask how they compose (for example, Isolation + Credential: how tightly scoped tokens limit what a sandbox escape can reach).
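
The Isolation + Credential composition can be sketched as a small token broker: even if a tool escapes its isolation boundary, the token it holds names one tenant, a few scopes, and a short expiry. This is a toy built on the standard library's `hmac` purely to show the idea; `mint_token`, `authorize`, and the scope strings are hypothetical, and a real deployment would more likely rely on its cloud provider's or identity system's short-lived credentials.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import List, Optional


def mint_token(secret: bytes, tool: str, tenant: str, scopes: List[str],
               ttl_seconds: int = 60, now: Optional[float] = None) -> str:
    """Issue a short-lived token bound to one tool, one tenant, and named scopes."""
    issued = time.time() if now is None else now
    claims = {"tool": tool, "tenant": tenant,
              "scopes": sorted(scopes), "exp": issued + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def authorize(secret: bytes, token: str, tenant: str, scope: str,
              now: Optional[float] = None) -> bool:
    """Check signature, tenant, scope, and expiry; any failure means no access."""
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:
        return False
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload))
    current = time.time() if now is None else now
    return (claims["tenant"] == tenant
            and scope in claims["scopes"]
            and current < claims["exp"])
```

Under this sketch, a token minted with only `read:workspace` for `tenant-a` fails `authorize` for `delete:bucket`, for any other tenant, and for any request after it expires, which is the sense in which the credential bounds what an escape can reach.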

How this module relates to its neighbours

19_tool_integration_contracts — defines the tool's contract; this module enforces the contract's boundary.
01_prompt_injection_security — prevents injection at the model layer; this module bounds the damage when injection succeeds.
03_data_access_governance — defines what data a tool may access; this module enforces the data-access boundary at execution time.
01_model_gateway_provider_ops — the boundary between agent and model provider; this module is the boundary between agent and downstream systems.
06_ai_runbooks_oncall — the apparatus for handling sandbox incidents.
16_multi_agent_coordination — multi-agent systems multiply the sandbox surface.

Top resources

gVisor — Application Kernel for Containers — https://gvisor.dev/
Firecracker — microVMs for serverless — https://firecracker-microvm.github.io/
Open Container Initiative (OCI) runtime spec — https://opencontainers.org/
AWS Lambda execution environment — https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html
OWASP API Security Top 10 — https://owasp.org/API-Security/

These are the technical baselines. The AI-specific surface on top is the contribution of this module.

What's coming

01-why-unsandboxed-execution-fails.md — the cost of unbounded tool calls.
02-the-sandbox-surfaces.md — six surfaces as a service architecture.
03-isolation-models.md — process, container, VM, language.
04-resource-limits.md — CPU, memory, time, I/O.
05-filesystem-and-network-policy.md — allowlists, denylists, namespacing.
06-secrets-and-credentials-in-tools.md — scoped tokens, broker patterns.
07-irreversible-actions-and-approvals.md — approval gates, dry-runs, idempotency.
08-output-validation-and-sanitisation.md — tool output as untrusted input.
09-escape-vectors-and-defenses.md — known sandbox escapes.
10-multi-tenant-isolation.md — per-tenant boundaries.
11-observability-and-audit.md — per-call audit.
12-architect-checklist.md — twenty items.
13-honest-admission.md — limits.

Bridge. Before designing isolation, resource limits, or approval gates, it helps to see why the sandbox is needed at all. The opening incident showed what happens without a sandbox; the first chapter walks through the failure shapes the sandbox must absorb. → 01-why-unsandboxed-execution-fails.md