04. Resource limits

~8 min read. The isolation layer bounds where a tool can reach. The resource layer bounds what the tool can consume while it reaches. CPU, memory, time, and I/O caps are how the sandbox prevents one call from starving the host.

Continues from 03-isolation-models.md. This chapter develops the resource layer. Recurring concepts in bold: CPU cap, wall-time cap, memory cap, I/O cap, fork cap, enforcement boundary, resource budget.

A tool with no resource cap can loop until terminated externally. A tool with overly permissive caps can starve the host on bad input. The discipline is to cap every tool, sized to the tool's expected work plus a small margin.

The five resource dimensions

Dimension	Why it matters	Typical enforcement
CPU time	A loop without termination consumes CPU indefinitely	rlimit RLIMIT_CPU, cgroup CPU accounting and quota, language runtime timeout
Wall time	A blocking call can hold a slot for minutes	watchdog timer; kill at threshold
Memory	An allocation in a tight loop exhausts host RAM	cgroup memory limit, language runtime cap
Processes / file handles / sockets	A fork bomb or connection storm exhausts process or handle limits	cgroup pids controller, rlimit (RLIMIT_NPROC, RLIMIT_NOFILE)
I/O bytes	A tool can saturate disk or network bandwidth	cgroup blkio (io in cgroup v2), network bandwidth limit

The five together describe a resource budget per call. Each tool's budget reflects its expected work plus a small margin; outside the budget, the call is terminated.
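A per-call budget can be written down as a plain record with one field per dimension. The sketch below is a minimal illustration, not a platform API: the field names, the units, and the split of the handle dimension into separate process and open-file counts are all assumptions.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class ResourceBudget:
    """Hypothetical per-call budget; field names and units are illustrative."""
    cpu_seconds: float
    wall_seconds: float
    memory_mb: float
    max_processes: int
    max_open_files: int
    io_mb_per_s: float


def exceeded(budget: ResourceBudget, usage: ResourceBudget) -> List[str]:
    """Names of every dimension where observed usage is above the budget."""
    return [
        f.name
        for f in fields(ResourceBudget)
        if getattr(usage, f.name) > getattr(budget, f.name)
    ]


budget = ResourceBudget(cpu_seconds=10, wall_seconds=30, memory_mb=1024,
                        max_processes=16, max_open_files=64, io_mb_per_s=100)
# Observed usage of one call, recorded in the same shape as the budget.
usage = ResourceBudget(cpu_seconds=12, wall_seconds=14, memory_mb=450,
                       max_processes=0, max_open_files=9, io_mb_per_s=5)
print(exceeded(budget, usage))  # ['cpu_seconds']
```

Reusing one record shape for limits and observations keeps the comparison to a single rule per dimension; a platform could also record which enforcement mechanism fired.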

CPU and wall time

CPU time and wall time are not the same. A tool waiting on a network call uses no CPU but consumes wall time. A tool in a tight CPU loop uses both.

The cap pattern:

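CPU time cap. The total CPU time the call can consume across all its processes and threads. Enforced by RLIMIT_CPU (per process), by reading the cgroup's cumulative CPU usage and killing the call at the threshold, or by language runtime timers. A cgroup CPU quota additionally throttles how fast the call can burn CPU.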
Wall time cap. The total elapsed time from call start to call end. Enforced by a watchdog process that kills the call.

The wall time cap is usually 2-5× the CPU cap to allow for I/O wait. A tool waiting on a slow database call barely uses CPU, so the CPU cap never fires; the wall time cap is what terminates it for taking too long overall.

A common error is setting only a wall time cap. A tool stuck in a CPU loop then burns CPU for the entire wall-time window before the watchdog fires, starving co-located tools the whole time. Because the CPU cap is set well below the wall cap, it stops the loop sooner. Both caps are needed.
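On Unix, both caps can be sketched with the Python standard library: `resource.setrlimit(RLIMIT_CPU, ...)` in the child bounds CPU seconds, and the `timeout` argument of `subprocess.run` acts as the wall-time watchdog. This is a minimal sketch, not a production sandbox: RLIMIT_CPU counts per process, so children forked by the tool get their own budget, and cgroup CPU accounting is the usual way to cover a whole process tree. The names `run_capped` and `python_cmd` are hypothetical.

```python
import resource
import signal
import subprocess
import sys
from typing import List


def python_cmd(code: str) -> List[str]:
    """Command line that runs a snippet in a fresh Python interpreter."""
    return [sys.executable, "-c", code]


def run_capped(argv: List[str], cpu_seconds: int, wall_seconds: float) -> str:
    """Run argv under a CPU-time cap and a wall-time cap (Unix only).

    Returns "ok", "cpu_cap", "wall_cap", or "error".
    """
    def apply_limits() -> None:
        # Runs in the child after fork, before exec.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        new_hard = cpu_seconds + 1
        if hard != resource.RLIM_INFINITY:
            new_hard = min(new_hard, hard)
        # Soft limit: the kernel sends SIGXCPU. Hard limit: the kernel sends SIGKILL.
        resource.setrlimit(resource.RLIMIT_CPU, (min(cpu_seconds, new_hard), new_hard))
        # No core dumps when SIGXCPU terminates the child.
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))

    try:
        proc = subprocess.run(argv, preexec_fn=apply_limits, timeout=wall_seconds,
                              stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        # Wall-time watchdog: subprocess.run has already killed the child.
        return "wall_cap"
    # SIGKILL here is assumed to come from the hard CPU limit, not another killer.
    if proc.returncode in (-signal.SIGXCPU, -signal.SIGKILL):
        return "cpu_cap"
    return "ok" if proc.returncode == 0 else "error"
```

The two outcomes stay distinct on purpose: a `cpu_cap` result points at a loop, while a `wall_cap` result points at slow I/O or a hung dependency.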

Memory

Memory caps are enforced by the runtime: cgroups in containers, the VM's memory allocation in microVMs, and language runtime caps in language sandboxes.

The pattern:

Hard cap. Above this, the call is terminated. OOM-kill is the typical mechanism.
Soft cap. Above this, the call is throttled or a warning is logged. Useful for catching memory growth before termination.
Per-call allocation. Each call starts with a fresh allocation; memory does not persist across calls in the same pool slot.
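The hard/soft split can be modelled as a small policy function fed by periodic memory samples. In a cgroup v2 container the kernel implements the same idea natively (memory.high throttles, memory.max triggers the OOM killer); the polling watchdog below is a hypothetical sketch of the policy for runtimes without that support, and its names are illustrative.

```python
from enum import Enum
from typing import Iterable, List


class MemoryAction(Enum):
    OK = "ok"
    WARN = "warn"  # above the soft cap: log a warning, keep running
    KILL = "kill"  # above the hard cap: terminate the call


def memory_action(used_mb: float, soft_mb: float, hard_mb: float) -> MemoryAction:
    if soft_mb > hard_mb:
        raise ValueError("soft cap must not exceed hard cap")
    if used_mb > hard_mb:
        return MemoryAction.KILL
    if used_mb > soft_mb:
        return MemoryAction.WARN
    return MemoryAction.OK


def watch(samples_mb: Iterable[float], soft_mb: float, hard_mb: float) -> List[MemoryAction]:
    """Replay memory samples for one call, stopping at the first KILL."""
    actions: List[MemoryAction] = []
    for used in samples_mb:
        action = memory_action(used, soft_mb, hard_mb)
        actions.append(action)
        if action is MemoryAction.KILL:
            break
    return actions


# A leaking call under the caps used in the worked example below (800 MB soft, 1 GB hard).
leak = [400, 700, 850, 950, 1100, 1200]
print([a.value for a in watch(leak, soft_mb=800, hard_mb=1024)])
# ['ok', 'ok', 'warn', 'warn', 'kill']
```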

The tool author estimates memory need; the platform requires the estimate during configuration; the actual usage is observed and the cap revised if needed.

Fork and concurrency caps

A tool that forks subprocesses or spawns concurrent operations can multiply its resource impact. Caps:

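Fork cap. Maximum number of subprocesses; enforced by the cgroup pids controller or by rlimit RLIMIT_NPROC. RLIMIT_NPROC counts processes per user ID, so it acts as a per-call cap only when each call runs under its own UID.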
Concurrency cap. Maximum threads or async tasks within the call.
Connection cap. Maximum open sockets or file handles.
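The connection cap maps onto RLIMIT_NOFILE on Unix, since sockets and files share the per-process descriptor table. The sketch below sets the limit in a child interpreter and lets the child count how many descriptors it can still open; `count_openable_files` is a hypothetical helper. The fork cap is deliberately not demonstrated this way: RLIMIT_NPROC is not enforced for root, so the cgroup pids controller is the more dependable mechanism.

```python
import resource
import subprocess
import sys

# Child program: open /dev/null until the kernel refuses (EMFILE), then print the count.
CHILD = """
import os
fds = []
try:
    while len(fds) < 10_000:  # safety stop in case no limit is applied
        fds.append(os.open(os.devnull, os.O_RDONLY))
except OSError:
    pass
print(len(fds))
"""


def count_openable_files(max_open_files: int) -> int:
    def limit_fds() -> None:
        # Runs in the child before exec; the limit covers files and sockets alike.
        resource.setrlimit(resource.RLIMIT_NOFILE, (max_open_files, max_open_files))

    result = subprocess.run([sys.executable, "-c", CHILD], preexec_fn=limit_fds,
                            capture_output=True, text=True, check=True, timeout=30)
    return int(result.stdout.strip())
```

The child opens a few fewer descriptors than the limit, because standard input, output and error already occupy slots.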

The fork cap is particularly important for fork bomb defence. In an unsandboxed environment a fork bomb can exhaust the process table and crash the host; in a capped environment, process creation fails once the cap is reached, long before the rest of the host is affected.

I/O caps

I/O bandwidth is the resource most often forgotten in cap design. A tool reading or writing at full bandwidth saturates the disk or network, affecting other tools on the host.

Pattern:

Disk I/O cap. Bytes per second; enforced by the cgroup blkio controller (io.max in cgroup v2).
Network I/O cap. Bytes per second; enforced by traffic shaping or per-connection rate limits.
Connection rate cap. New connections per second; defence against connection storms.

I/O caps are particularly important for multi-tenant systems where one tenant's heavy tool call should not affect another tenant's.
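Per-connection rate limits are commonly built as a token bucket: a byte allowance refills at the capped rate up to a burst ceiling, and a write proceeds only if enough allowance is left. The sketch below is a minimal single-threaded version that takes an explicit clock argument so it can be tested; the class name and parameters are illustrative, and a real deployment would pass `time.monotonic()` and decide whether a refused write waits or fails.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Byte-rate limiter: allowance refills at rate_bytes_per_s, capped at burst_bytes."""
    rate_bytes_per_s: float
    burst_bytes: float
    tokens: float = field(init=False, default=0.0)
    last: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.tokens = self.burst_bytes  # start with a full burst allowance

    def allow(self, nbytes: int, now: float) -> bool:
        """Consume nbytes of allowance at time `now` (seconds) if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst_bytes, self.tokens + elapsed * self.rate_bytes_per_s)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

With `rate_bytes_per_s=10_000_000` the bucket enforces a 10 MB/s network cap; `burst_bytes` sets how far a call may briefly run ahead of that rate.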

A worked example: the data analysis tool

The Hyderabad fintech's data analysis tool is a Python execution sandbox. The team's resource configuration:

CPU time cap. 10 seconds.
Wall time cap. 30 seconds.
Memory cap. 1 GB (hard); 800 MB (soft, with a warning logged).
Fork cap. 16 subprocesses.
Disk I/O cap. 100 MB/s.
Network I/O cap. 10 MB/s.
Open files. 64.
Open sockets. 8.
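Several of these caps map onto cgroup v2 interface files. The sketch below renders the memory, fork, and disk caps as file contents and writes them into a cgroup directory. Assumptions to flag: binary units (1 GB = 1024 MiB), a placeholder block device `8:0` (the real major:minor depends on the host), and one extra pid for the tool process itself, since pids.max counts every task in the cgroup, threads included. CPU seconds, wall time, network rate and open files have no cgroup file here; they would come from RLIMIT_CPU, a watchdog, traffic shaping and RLIMIT_NOFILE respectively. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict

MiB = 1024 * 1024  # assumption: the example's MB/GB figures are binary units


def cgroup_v2_limits(io_device: str) -> Dict[str, str]:
    """cgroup v2 file contents for the data analysis tool's caps (illustrative)."""
    disk_bps = 100 * MiB
    return {
        "memory.max": str(1024 * MiB),  # hard cap: OOM kill above 1 GB
        "memory.high": str(800 * MiB),  # soft cap: kernel throttles and reclaims above this
        "pids.max": str(16 + 1),        # 16 subprocesses + the tool process (threads also count)
        "io.max": f"{io_device} rbps={disk_bps} wbps={disk_bps}",
    }


def write_limits(cgroup_dir: Path, limits: Dict[str, str]) -> None:
    """Write each limit into an existing cgroup directory (needs cgroup write access)."""
    for name, value in limits.items():
        (cgroup_dir / name).write_text(value + "\n")
```

Creating the cgroup, enabling the memory, pids and io controllers in the parent's `cgroup.subtree_control`, and moving the call's process into `cgroup.procs` are left out. The "warning logged" half of the soft cap is not a kernel feature; a watcher could log whenever the `high` counter in the cgroup's `memory.events` file increases.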

A typical analysis call uses 2-5 seconds of CPU, 300-500 MB of memory, no subprocesses, and modest I/O. The CPU cap is 2-5× typical usage and the memory cap roughly 2-3.5×, leaving headroom for legitimate variation while bounding extreme cases.

A malicious call (a fork bomb) hits the fork cap immediately; the call is terminated and the agent sees a tool error. A runaway loop hits the CPU cap, with the same outcome. A memory leak first crosses the soft cap and logs a warning, then hits the hard cap and is terminated. The tool author sees the warning logs and refines the implementation.

Configuration vs. defaults

The platform provides defaults; the tool author overrides for their tool. Defaults should be conservative — small CPU, small memory, no forks, narrow I/O. The tool author opts into more for tools that need it.
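The defaults-plus-opt-in rule fits in a few lines. In the sketch below the default values, the dimension names, and the 4× review threshold are all hypothetical; because the default allows zero subprocesses, any request for forks is flagged for review.

```python
from typing import Dict, List, Tuple

# Hypothetical conservative platform defaults: small CPU, small memory, no forks, narrow I/O.
DEFAULTS: Dict[str, float] = {
    "cpu_seconds": 2,
    "wall_seconds": 10,
    "memory_mb": 256,
    "max_subprocesses": 0,
    "io_mb_per_s": 10,
}
REVIEW_FACTOR = 4  # assumed: requests above 4x the default need platform review


def resolve_caps(declared: Dict[str, float]) -> Tuple[Dict[str, float], List[str]]:
    """Merge a tool's declared caps over the defaults; list the requests needing review."""
    unknown = set(declared) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown resource dimensions: {sorted(unknown)}")
    caps = {**DEFAULTS, **declared}
    needs_review = sorted(
        name for name, value in declared.items() if value > REVIEW_FACTOR * DEFAULTS[name]
    )
    return caps, needs_review
```

For a declaration of `{"cpu_seconds": 10, "memory_mb": 1024, "max_subprocesses": 16}`, the memory request sits exactly at 4× the default and passes, while the CPU and fork requests come back flagged.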

The pattern:

A tool's manifest declares its resource needs.
The platform reviews the declaration; large requests are scrutinised.
The sandbox enforces the declared caps.
Actual usage is observed; the cap is revised if the tool consistently hits or stays well below the cap.
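The revise step can be sketched as a rule over observed per-call usage: if too many calls reach the cap, it is too tight; if even the 99th percentile sits far below it, it is too generous; either way, propose the 99th percentile times a margin. The thresholds (5% of calls at the cap, p99 below a quarter of the cap, a 2× margin) are assumptions, as is the convention that calls killed at the cap are recorded at the cap value. The proposal still goes to the platform for approval.

```python
import statistics
from typing import Optional, Sequence


def propose_cap(samples: Sequence[float], current_cap: float, margin: float = 2.0,
                max_hit_fraction: float = 0.05, idle_ratio: float = 0.25) -> Optional[float]:
    """Suggest a revised cap from per-call usage samples, or None to keep the current one."""
    if not samples:
        return None
    hit_fraction = sum(1 for s in samples if s >= current_cap) / len(samples)
    if len(samples) >= 2:
        # method="inclusive" keeps the estimate within the observed range.
        p99 = statistics.quantiles(samples, n=100, method="inclusive")[98]
    else:
        p99 = samples[0]
    too_tight = hit_fraction > max_hit_fraction
    too_loose = p99 < idle_ratio * current_cap
    if too_tight or too_loose:
        return round(p99 * margin, 2)
    return None
```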

This produces a tight per-tool envelope with platform oversight.

Operational signals

Healthy. Every tool has explicit resource caps. Cap violations are logged and reviewed weekly. Cap revisions are tracked and approved.

First degrading metric. Cap violation rate climbing for a tool. Either the tool is being abused, the tool's implementation has regressed, or the cap is too tight; investigation distinguishes.

Misleading metric. Aggregate resource use. A platform with healthy aggregate numbers can still have one tool consistently violating its caps; the per-tool view is the truth.

Expert graph. Per-tool cap violation rate, per-tool actual usage distribution, cap revision frequency.
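The gap between the aggregate and per-tool views is easy to show with made-up numbers. In the hypothetical event log below, one tool violates its cap on half of its calls, yet the platform-wide violation rate is only 1%; the tool names and the 5% threshold are illustrative.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Event = Tuple[str, bool]  # (tool name, whether this call violated a cap)


def aggregate_rate(events: Sequence[Event]) -> float:
    if not events:
        return 0.0
    return sum(1 for _, violated in events if violated) / len(events)


def per_tool_rates(events: Sequence[Event]) -> Dict[str, float]:
    calls = Counter(tool for tool, _ in events)
    violations = Counter(tool for tool, violated in events if violated)
    return {tool: violations[tool] / n for tool, n in calls.items()}


def tools_to_investigate(events: Sequence[Event], threshold: float = 0.05) -> List[str]:
    return sorted(t for t, r in per_tool_rates(events).items() if r > threshold)


events: List[Event] = (
    [("search", False)] * 980 + [("export", True)] * 10 + [("export", False)] * 10
)
print(aggregate_rate(events))        # 0.01
print(per_tool_rates(events))        # {'search': 0.0, 'export': 0.5}
print(tools_to_investigate(events))  # ['export']
```

The flagged tool then gets the three-way diagnosis described above: abuse, regression, or a mis-sized cap.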

Boundary of applicability

Strong fit. Production tools with measurable resource profiles. The full cap set is justified.

Pathology. Setting caps based on what is convenient (the platform default) rather than what the tool needs. Either the tool is consistently throttled and unreliable, or the cap is too generous and incidents bleed through.

Scale limit. Very large platforms may need multi-tier caps — different caps per tenant tier, per workload class. The pattern remains; the configuration matrix grows.

Failure-prone assumption

The seductive wrong belief: resource caps only prevent fork bombs and memory leaks. They also prevent the cascading host degradation that follows. A tool consuming the host's CPU starves co-located tools; they time out and retry, the retries consume still more CPU, and the host enters a downward spiral. The cap is the structural protection against the spiral.

Where this appears in production

AWS Lambda enforces a memory cap and a timeout per function invocation, and allocates CPU in proportion to the configured memory.
A fintech AI has CPU + wall + memory caps on every tool; cap violations are alarmed.
A coding assistant had a fork bomb in user code; the fork cap fired; the host was unaffected.
A data platform has tiered caps: standard tools at 10s CPU, heavy-analysis tools at 60s with explicit approval.
A retail AI sets default caps tighter than the typical tool; tool authors opt in to higher caps.
A telecom AI has I/O caps per tenant; multi-tenant fairness is enforced.
A consumer chatbot had no fork cap; an adversarial input forked until the host crashed; cap added post-incident.
A healthcare AI has memory soft caps with warning; the warnings drove tool optimisation.
A logistics AI has wall time caps tuned per workload; user-interactive tools get tight caps, batch tools get generous caps.
A media AI has connection rate caps; a tool storming a downstream API was throttled.
A travel platform has cap violations tracked; high-violation tools are remediated by the tool author.
A government AI has caps audited quarterly; the actual usage distribution drives revisions.
A B2B SaaS has the cap configuration in the tool's repository; PR review includes cap review.
A legal AI has memory caps low enough that inefficient tools are caught quickly.
A staffing AI sets I/O caps to prevent disk saturation by data-export tools.
A search-ops AI has a 30s wall time cap for code execution; longer work runs asynchronously via explicit job submission.
A document AI has CPU caps that allow only short bursts; long-running parsing is offloaded to dedicated workers.
An ad-tech AI treats cap violations as security signals; a sudden cap-hit pattern triggers investigation.
A real-estate AI has caps configured per environment (looser in dev, tight in prod).
A medical AI has all caps documented for regulatory audit.

Recall / checkpoint

Name the five resource dimensions.
What is the difference between CPU and wall time cap, and why are both needed?
What is the role of soft caps?
What is the fork cap and what does it defend against?
Why are I/O caps load-bearing in multi-tenant systems?
How is the resource configuration validated and revised over time?
What signals a degrading resource layer?

Interview Q&A

Q1. A team's tool has only a wall time cap. Walk through the failure mode. A tool stuck in a CPU loop burns CPU for the entire wall-time window without tripping anything. While the call is running, other tools on the host are starved. The wall time cap eventually fires, but the damage has been done: co-located calls slowed, some possibly failed, and the host's responsiveness degraded. The fix is to add a CPU cap set well below the wall cap; the two caps together bound both pathologies, CPU loops and slow I/O-bound calls. Common wrong answer to avoid: "wall time is enough", which misses the cascading degradation.

Q2. The team's tool consistently hits the memory cap; investigation finds the tool is correctly implemented but the workload has grown. What is the right response? Revise the cap upward, with platform oversight. The cap is too tight for the actual work; the tool is rationally consuming what it needs. The platform approves the revision based on usage data; the new cap is enforced. The opposite pattern — leaving the cap tight and accepting frequent failures — is worse for users and trust. The discipline is to size caps to the work, not to convenience. Common wrong answer to avoid: "the tool author should fix the tool" — when the tool is correct, the cap is the variable.

Q3. Walk through the fork cap and what it defends against. The fork cap limits the number of subprocesses a tool can spawn. Defence: fork bombs (a process that creates two processes that each create two more, recursively, exhausting the host); legitimate but unbounded fork patterns (a tool that spawns a process per file in a large directory). The cap is set per tool based on its expected work — a tool that legitimately needs subprocesses gets a higher cap with platform review. Common wrong answer to avoid: "fork bombs are obscure" — they're prevented routinely by the cap; without it, one adversarial input can crash the host.

Q4. Why are I/O caps important in multi-tenant systems? A tool's I/O bandwidth affects co-located tools on the same host. One tenant's heavy data-export tool can saturate the disk, slowing every other tool. The I/O cap bounds this: each call is rate-limited so one tool cannot dominate. Without I/O caps, multi-tenant fairness becomes a matter of luck (which tools happen to be co-located). Common wrong answer to avoid: "the host's I/O is large enough". A single sequential reader or writer can saturate a disk or network link on its own, so I/O is often the first contended resource.

Q5. The team has a soft cap that fires warnings. The warnings are ignored. What is the apparatus failure? The warnings have become noise. Either they fire too often (the soft cap is set too low) or the team has not built the discipline to triage them. The fix is either to raise the soft cap so that violations are rare and treated as real signals, or to route the warnings to a triage queue with assigned ownership. A warning that no one reads is dead weight; either fix it or remove it. Common wrong answer to avoid: "warnings are useful information", which holds only if someone reads them.

Q6. How does the resource layer interact with the isolation layer? The isolation layer determines where caps are enforced: cgroups in containers, rlimits in process isolation, per-VM allocation in microVMs, and language runtime caps in language sandboxes. The two layers compose: isolation provides the boundary where enforcement happens; the resource layer defines what is enforced. Weak isolation with strong resource caps still has escape vectors; strong isolation with no caps is open to resource exhaustion. Both are needed. Common wrong answer to avoid: "isolation includes resources". They overlap in implementation but are conceptually distinct.

Design / debug exercise (10 minutes)

Modelled example. Walk through the worked example (the Hyderabad data analysis tool). Compare each cap to the typical usage; identify where the headroom is reasonable versus excessive.

Your turn. Pick one tool. List its five resource caps (or note where they are unset). Estimate the call's typical usage in each dimension. Identify gaps and over-provisioning.

Reproduce from memory. Write the five dimensions and the typical enforcement mechanism. The signal of internalisation is that you can size caps for a hypothetical new tool quickly.

Operational memory

This chapter explained the resource layer: caps on CPU, wall time, memory, forks, and I/O, sized per tool and enforced by the runtime. The important idea is that resource caps prevent both direct exhaustion and the cascading host degradation that follows.

You learned to set caps per dimension, distinguish soft and hard caps, validate caps against actual usage, and revise them with platform oversight. That solves the opening failure because no tool call can consume more than its declared budget.

Carry this diagnostic forward: when a tool's cap violation rate is high, ask whether the tool is being abused, the implementation has regressed, or the cap is wrong. Each diagnosis leads to a different action.

Remember:

Five dimensions: CPU, wall, memory, forks, I/O.
CPU and wall are both needed; one is not the other.
Fork caps defend against fork bombs and unbounded subprocess patterns.
I/O caps are load-bearing in multi-tenant systems.
Caps are validated against actual usage; revised with oversight.

Bridge. Resource limits bound what a call consumes. Policy limits bound where it reaches. The next chapter is the filesystem and network policy layer: the allowlists and denylists that define a tool's reach. → 05-filesystem-and-network-policy.md