11. Multi-tenant orchestration — isolation, fairness, and policy when workflows share infrastructure¶

~19 min read. Every concept from files 01–10 assumed one workflow running for one user. Production systems run thousands of workflows for hundreds of organisations, simultaneously, on shared infrastructure. This file adds the constraints that appear when the control plane serves multiple tenants: state isolation, scheduling fairness, budget enforcement, and policy-aware routing.

Built on the first-principles overview in 00-first-principles.md. Tenant isolation — the pressure that one customer's workflow must never read, corrupt, or starve another customer's workflow — is the central tension. The tenant boundary is the mechanism: a scoping primitive that partitions state, checkpoints, budgets, and scheduling across organisational boundaries.

What file 10 established and what remains¶

File 10 showed how a single workflow adapts when its plan breaks: replan triggers, scoped revision, state trust classification. All of that assumed a single-tenant context — one workflow, one user, one budget, one policy. The gap: when the same control plane serves Bank A, Bank B, and Fintech C simultaneously, every resource (model capacity, tool rate limits, human reviewer queues, checkpoint storage) is shared. Without explicit multi-tenant controls, one noisy tenant can starve others, state can leak across boundaries, and one tenant's policy constraints can be violated by routing logic optimised for another.

The fintech platform where one customer's outage degraded everyone¶

A loan-processing platform serves 200 financial institutions. Each institution submits loan applications that flow through the same orchestration engine: verify → credit → compliance → decide. One Monday morning, Institution X experiences a systems migration and submits 5,000 applications in bulk (normal volume: 50/day). The orchestrator's worker pool is shared.

Normal state:
  200 institutions × ~50 apps/day = ~10,000 workflows/day
  Worker pool: 100 concurrent slots
  Average completion: 45s per workflow
  Throughput: comfortable

Monday morning:
  Institution X: 5,000 applications in 30 minutes
  199 other institutions: normal volume
  Worker pool: 100 slots consumed by Institution X's burst
  Other institutions' workflows: queued behind X's flood
  Average latency for others: 45s → 15 minutes
  SLA violations: 47 institutions affected

Without multi-tenant controls: Institution X's burst consumes all worker slots. The credit bureau rate limit is exhausted by X's calls. Human reviewer queues fill with X's compliance reviews. Every other customer waits.

With multi-tenant controls: per-tenant concurrency cap (max 20 slots per institution), per-tenant rate limiting (max 200 bureau calls/hour), per-tenant reviewer queue isolation. Institution X's burst fills their 20 slots and queues the rest internally. Other institutions continue at normal latency.

Without tenant controls:         With tenant controls:
  X: fast (consuming everything)   X: 20 concurrent, rest queued
  Others: starved (15min+ wait)    Others: normal latency (45s)
  SLA: 47 violations               SLA: 0 violations
  Platform trust: damaged           Platform trust: maintained
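
A per-tenant concurrency cap can be sketched in a few lines. This is a minimal illustration, assuming a single in-process pool; the class name `TenantSlots` and its methods are hypothetical, not part of any particular orchestrator. It uses the figures from the incident above (100 slots, a cap of 20, a burst of 5,000).

```python
from collections import defaultdict, deque


class TenantSlots:
    """Shared worker pool with a hard per-tenant concurrency cap (illustrative)."""

    def __init__(self, total_slots: int, per_tenant_cap: int) -> None:
        self.total_slots = total_slots
        self.per_tenant_cap = per_tenant_cap
        self.running = defaultdict(int)    # tenant_id -> workflows currently running
        self.waiting = defaultdict(deque)  # tenant_id -> that tenant's own queue

    def submit(self, tenant_id: str, workflow_id: str) -> bool:
        """Start the workflow if both the pool and the tenant have room, else queue it."""
        pool_has_room = sum(self.running.values()) < self.total_slots
        tenant_has_room = self.running[tenant_id] < self.per_tenant_cap
        if pool_has_room and tenant_has_room:
            self.running[tenant_id] += 1
            return True
        # The overflow waits in the submitting tenant's queue, not a global one.
        self.waiting[tenant_id].append(workflow_id)
        return False


pool = TenantSlots(total_slots=100, per_tenant_cap=20)
for i in range(5000):
    pool.submit("institution-x", f"x-{i}")

# Institution X holds 20 slots; 80 remain for everyone else.
started_for_y = pool.submit("institution-y", "y-0")
```

The design choice worth noticing is that the overflow is queued per tenant: X's 4,980 waiting workflows never sit in front of another institution's work.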

Teacher voice. Multi-tenant orchestration is not about fairness for its own sake. It's about platform reliability. Without per-tenant isolation, any single tenant's behaviour can create a platform-wide outage. The noisy-neighbour problem is the defining challenge of shared orchestration infrastructure.

The invariant: a tenant boundary is a hard partition in every control-plane resource¶

Tenant isolation isn't just "separate database rows." It means every resource the workflow touches — state storage, checkpoint storage, model capacity, tool rate limits, human review queues, scheduling slots, audit logs, and budget tracking — is partitioned by tenant. A leak in any one dimension can compromise security, fairness, or compliance.

Five dimensions of tenant isolation¶

┌─────────────────────────────────────────────────────────────────────┐
│                     TENANT BOUNDARY                                   │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  1. STATE ISOLATION                                                  │
│     Workflow state, checkpoints, and audit logs partitioned by       │
│     tenant_id. Cross-tenant reads are impossible by construction.    │
│                                                                      │
│  2. COMPUTE ISOLATION                                                │
│     Per-tenant concurrency caps. Burst from one tenant can't         │
│     consume the entire worker pool.                                  │
│                                                                      │
│  3. RATE LIMIT ISOLATION                                             │
│     External API rate limits (credit bureau, model endpoints)        │
│     tracked per-tenant. One tenant's flood can't exhaust             │
│     another's allocation.                                            │
│                                                                      │
│  4. QUEUE ISOLATION                                                  │
│     Human review queues, escalation queues, and approval             │
│     queues separated by tenant or priority class.                    │
│                                                                      │
│  5. BUDGET ISOLATION                                                 │
│     Cost tracking and budget enforcement per-tenant.                 │
│     One tenant exceeding budget doesn't affect others' capacity.     │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Most systems get dimension 1 right (separate data). Dimensions 2–5 are where the noisy-neighbour problem lives.

Tenant identity: the context every workflow carries¶

Every workflow execution must carry tenant context from first step to last. This context determines routing, policy, limits, and isolation:

tenant_context:
├── tenant_id: "institution-x-4921"
├── org_id: "parent-org-holding-company"
├── policy_tier: "enterprise" | "standard" | "free"
├── budget_class: "high-volume" (determines rate limits)
├── data_residency: "eu-west-1" (regulatory constraint)
├── allowed_tools: ["credit_bureau_v2", "compliance_db", "notification_api"]
├── denied_tools: ["experimental_model_v3"] (not approved for this tenant)
├── model_tier: "gpt-4" (enterprise) | "gpt-4o-mini" (standard)
├── concurrency_cap: 20 (max simultaneous workflows)
├── reviewer_pool: "institution-x-reviewers" (dedicated or shared)
└── audit_retention: "7 years" (regulatory requirement)

This is the handoff contract (file 02) extended to the platform level. Every node in the workflow graph receives tenant_context as implicit input. Routing decisions, model selection, tool access, and checkpoint storage all reference it.

Mini-FAQ. "Does every node really need all this context?" The node code doesn't need to examine every field. But the control plane infrastructure (rate limiter, scheduler, checkpointer) needs tenant_id and policy_tier to enforce isolation. The context travels with the workflow; individual nodes see only what they declare as input.
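
One way to represent this context is an immutable record that is built once at admission and then passed along unchanged. The sketch below mirrors the fields listed above; the method `can_use`, the helper `scheduler_view`, and the rule that the deny-list overrides the allow-list are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)  # frozen: no node can mutate tenant policy mid-workflow
class TenantContext:
    tenant_id: str
    org_id: str
    policy_tier: str          # "enterprise" | "standard" | "free"
    budget_class: str
    data_residency: str
    allowed_tools: FrozenSet[str]
    denied_tools: FrozenSet[str]
    model_tier: str
    concurrency_cap: int
    reviewer_pool: str
    audit_retention: str

    def can_use(self, tool: str) -> bool:
        """Assumed rule: a tool must be allowed and not denied; unknown tools are rejected."""
        return tool in self.allowed_tools and tool not in self.denied_tools


def scheduler_view(ctx: TenantContext) -> dict:
    """Only the fields the scheduler needs; other consumers would get their own view."""
    return {
        "tenant_id": ctx.tenant_id,
        "policy_tier": ctx.policy_tier,
        "concurrency_cap": ctx.concurrency_cap,
    }


ctx = TenantContext(
    tenant_id="institution-x-4921",
    org_id="parent-org-holding-company",
    policy_tier="enterprise",
    budget_class="high-volume",
    data_residency="eu-west-1",
    allowed_tools=frozenset({"credit_bureau_v2", "compliance_db", "notification_api"}),
    denied_tools=frozenset({"experimental_model_v3"}),
    model_tier="gpt-4",
    concurrency_cap=20,
    reviewer_pool="institution-x-reviewers",
    audit_retention="7 years",
)
```

Narrow views such as `scheduler_view` are one way to honour the point in the Mini-FAQ: the full context travels with the workflow, but each consumer reads only what it needs.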

Scheduling fairness: weighted fair queuing for workflows¶

When 200 tenants share 100 worker slots, the scheduler determines who waits and who executes. Naive FIFO (first-in, first-out) is trivially gamed by burst submitters.

Scheduling strategies:

FIFO (naive):
  Process in arrival order
  Problem: one tenant can flood the queue and starve others

PER-TENANT ROUND-ROBIN:
  Each tenant gets equal turns regardless of volume
  Problem: low-volume tenants get same share as high-volume

WEIGHTED FAIR QUEUE:
  Each tenant's share proportional to their tier/contract
  High-volume tenants get more capacity but not unlimited
  Problem: needs weight configuration per tenant

PRIORITY LANES + CAPS:
  Urgent workflows (compliance deadlines, SLA-bound) get priority
  But each tenant still has a concurrency cap regardless of priority
  Problem: priority abuse (everything marked urgent)

The practical pattern combines weighted fair queuing with per-tenant concurrency caps:

Scheduling algorithm:
1. Workflow arrives with tenant_id
2. Check: is tenant at concurrency cap? → if yes, queue internally per-tenant
3. Check: which tenant has the most unused fair-share? → schedule them next
4. Within a tenant's queue: respect priority (urgent > normal > background)
5. Global safety valve: if total queue depth > threshold → reject new submissions with backpressure signal

Scheduling approach      Fairness        Burst tolerance         Complexity
FIFO                     None            Zero (flood = starve)   Trivial
Per-tenant round-robin   Equal share     Good                    Low
Weighted fair queue      Proportional    Good                    Moderate
Priority + caps          Policy-driven   Best                    Highest
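
The five-step scheduling algorithm above can be sketched as follows. This is an illustration under simplifying assumptions, not a production scheduler: "most unused fair share" is approximated as the lowest ratio of running workflows to weight, everything is held in memory, and the names `FairScheduler`, `TenantQueue` and `PRIORITY` are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional, Tuple

PRIORITY = {"urgent": 0, "normal": 1, "background": 2}  # lower rank runs first


@dataclass
class TenantQueue:
    weight: float   # fair-share weight from the tenant's tier or contract
    cap: int        # per-tenant concurrency cap
    running: int = 0
    heap: List[tuple] = field(default_factory=list)  # (priority rank, arrival seq, workflow_id)


class FairScheduler:
    def __init__(self, max_queue_depth: int) -> None:
        self.tenants: Dict[str, TenantQueue] = {}
        self.max_queue_depth = max_queue_depth
        self._seq = count()  # arrival order breaks ties within a priority

    def register(self, tenant_id: str, weight: float, cap: int) -> None:
        self.tenants[tenant_id] = TenantQueue(weight=weight, cap=cap)

    def submit(self, tenant_id: str, workflow_id: str, priority: str = "normal") -> bool:
        """Steps 1 and 5: queue per tenant, or reject when the platform is saturated."""
        depth = sum(len(t.heap) for t in self.tenants.values())
        if depth >= self.max_queue_depth:
            return False  # backpressure signal to the caller
        entry = (PRIORITY[priority], next(self._seq), workflow_id)
        heapq.heappush(self.tenants[tenant_id].heap, entry)
        return True

    def next_workflow(self) -> Optional[Tuple[str, str]]:
        """Steps 2 to 4: skip capped tenants, pick the least-served one, honour priority."""
        eligible = [
            (t.running / t.weight, tenant_id)
            for tenant_id, t in self.tenants.items()
            if t.heap and t.running < t.cap
        ]
        if not eligible:
            return None
        _, tenant_id = min(eligible)
        tenant = self.tenants[tenant_id]
        _, _, workflow_id = heapq.heappop(tenant.heap)
        tenant.running += 1
        return tenant_id, workflow_id

    def finish(self, tenant_id: str) -> None:
        self.tenants[tenant_id].running -= 1


sched = FairScheduler(max_queue_depth=4)
sched.register("bank", weight=5, cap=2)
sched.register("union", weight=1, cap=1)
sched.submit("bank", "b1")
sched.submit("bank", "b2", priority="urgent")
sched.submit("bank", "b3")
sched.submit("union", "u1")
accepted_fifth = sched.submit("bank", "b4")  # queue is full, so this is rejected

dispatched = [sched.next_workflow() for _ in range(3)]
```

In this run the bank's urgent workflow goes first, the credit union is served before the bank's second workflow, and the bank's third workflow stays queued because the bank is at its cap.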

Threaded example: loan-approval platform with three tenant types¶

The same loan-approval workflow (verify → credit → compliance → decide) serves three tenant categories:

Tenant: Regional Bank (Enterprise tier)
├── concurrency: 50 workflows
├── model: gpt-4 for compliance reasoning
├── rate limit: 500 bureau calls/hour
├── reviewer: dedicated compliance team (3 people)
├── data residency: us-east-1 only
├── SLA: 95% of decisions within 4 hours
└── budget: $2.00/workflow (generous)

Tenant: Credit Union (Standard tier)
├── concurrency: 10 workflows
├── model: gpt-4o-mini for compliance
├── rate limit: 50 bureau calls/hour
├── reviewer: shared reviewer pool
├── data residency: no constraint
├── SLA: 95% within 24 hours
└── budget: $0.50/workflow

Tenant: Fintech Startup (Free tier)
├── concurrency: 3 workflows
├── model: gpt-4o-mini
├── rate limit: 10 bureau calls/hour
├── reviewer: shared pool, lowest priority
├── data residency: no constraint
├── SLA: best-effort
└── budget: $0.10/workflow (constrained)

Now: the Regional Bank submits a burst of 200 applications (quarterly review). Without per-tenant caps, they'd consume all capacity. With caps: 50 run concurrently, 150 queue internally. The Credit Union's 10 slots remain available. The Fintech Startup's 3 slots remain available.

The Regional Bank's 200 applications consume their 500 bureau calls/hour within 25 minutes (some workflows retry). After that, their remaining workflows wait for the next hour's quota. This does not affect other tenants' quotas.

Resource consumption timeline:

t=0:   Bank submits 200 apps
t=0-5m:  50 Bank workflows running, 150 queued per-tenant
         Credit Union: 10 slots available (unaffected)
         Fintech: 3 slots available (unaffected)
t=25m: Bank hits bureau rate limit (500/hr)
         Bank workflows with pending bureau calls: wait
         Credit Union: still processing normally (own quota: 50/hr)
         Fintech: still processing normally (own quota: 10/hr)
t=60m: Bank quota refreshes, remaining workflows resume
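
The timeline above behaves like a fixed-window quota tracked separately for each tenant. A minimal sketch, assuming a one-hour fixed window (a token bucket or sliding window would be equally valid choices) and hypothetical names throughout; the clock is injectable so the behaviour can be demonstrated without waiting an hour.

```python
import time
from typing import Callable, Dict, Tuple


class TenantRateLimiter:
    """Fixed-window call quota tracked independently per tenant (illustrative)."""

    def __init__(self, quotas: Dict[str, int], window_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.quotas = quotas          # tenant_id -> calls allowed per window
        self.window_s = window_s
        self.clock = clock
        self._windows: Dict[str, Tuple[float, int]] = {}  # tenant_id -> (window start, calls used)

    def try_acquire(self, tenant_id: str) -> bool:
        now = self.clock()
        start, used = self._windows.get(tenant_id, (now, 0))
        if now - start >= self.window_s:
            start, used = now, 0      # this tenant's quota refreshes
        if used >= self.quotas[tenant_id]:
            self._windows[tenant_id] = (start, used)
            return False              # only this tenant waits
        self._windows[tenant_id] = (start, used + 1)
        return True


now = [0.0]  # fake clock, in seconds
limiter = TenantRateLimiter(
    {"regional-bank": 500, "credit-union": 50, "fintech": 10},
    clock=lambda: now[0],
)
bank_granted = sum(limiter.try_acquire("regional-bank") for _ in range(600))
union_ok = limiter.try_acquire("credit-union")   # unaffected by the bank's burst

now[0] = 3600.0                                  # one hour later
bank_after_refresh = limiter.try_acquire("regional-bank")
```

Because the counters are keyed by tenant, the bank exhausting its 500 calls has no effect on the credit union's 50.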

Budget enforcement: preventing runaway cost across tiers¶

Each tenant has a budget allocation (per-workflow or per-month). The control plane enforces this at the step level, not just at workflow entry:

Budget enforcement points:

1. WORKFLOW ADMISSION
   "Does this tenant have remaining budget to start a workflow?"

2. PRE-STEP CHECK
   "Does estimated step cost fit within remaining workflow budget?"

3. MODEL SELECTION
   "Which model tier is this tenant allowed? Route accordingly."

4. GRACEFUL DEGRADATION
   "Budget nearly exhausted → use cheaper model, skip optional steps,
    or escalate to human (who doesn't consume model budget)"

Why enforce at step level, not just admission? Because a workflow may have been admitted with sufficient budget, but multiple retries or replanning (file 10) can consume more than expected. Without mid-workflow budget checks, a replanning loop can exhaust a tenant's monthly allocation on a single stuck workflow.

Budget enforcement failure example:

Workflow admitted with $0.50 budget (Standard tier)
Step 1: verify_identity → $0.02 (fine, $0.48 remaining)
Step 2: pull_credit → $0.05 (fine, $0.43 remaining)
Step 3: compliance_check → fails → replan → $0.08 (ok, $0.35 remaining)
Step 3b: deep_verification → $0.12 (ok, $0.23 remaining)
Step 3c: another replan ($0.08) + step ($0.15) → $0.23 ($0.00 remaining)
Step 4: issue_decision → cannot afford model call

Without mid-workflow check: step 4 executes, cost exceeds tenant budget
With mid-workflow check: pause before the remaining budget drops below a reserve (here $0.05), offer: cheaper model or escalate
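
A pre-step budget check can be sketched as a small guard that every step passes through. This is an illustration only: the class `WorkflowBudget`, the exception `BudgetExceeded`, and the $0.05 reserve are assumptions, and the step costs follow the example above with step 3c read as a $0.08 replan followed by a $0.15 step. `Decimal` is used so that cent-level arithmetic stays exact.

```python
from decimal import Decimal


class BudgetExceeded(Exception):
    """Raised when a step would push the workflow below its budget reserve."""


class WorkflowBudget:
    def __init__(self, limit: str, reserve: str = "0.05") -> None:
        self.remaining = Decimal(limit)
        self.reserve = Decimal(reserve)  # hypothetical floor kept back for the final step

    def charge(self, step: str, estimated_cost: str) -> Decimal:
        """Pre-step check: refuse the step if it would eat into the reserve."""
        cost = Decimal(estimated_cost)
        if self.remaining - cost < self.reserve:
            raise BudgetExceeded(f"{step}: needs {cost}, only {self.remaining} left")
        self.remaining -= cost
        return self.remaining


budget = WorkflowBudget("0.50")
steps = [
    ("verify_identity", "0.02"),
    ("pull_credit", "0.05"),
    ("replan", "0.08"),
    ("deep_verification", "0.12"),
    ("replan_again", "0.08"),
    ("retry_step", "0.15"),
]

paused_at = None
for name, cost in steps:
    try:
        budget.charge(name, cost)
    except BudgetExceeded:
        paused_at = name  # here the control plane would offer a cheaper model or escalate
        break
```

With these numbers the guard stops before the last $0.15 step, while $0.15 is still unspent, instead of discovering an empty budget at `issue_decision`.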

Data residency and policy-aware routing¶

Some tenants have regulatory constraints on where data is processed and stored:

Routing constraint examples:

Regional Bank (US-regulated):
├── Workflow state: must reside in us-east-1
├── Model calls: must use US-hosted endpoints
├── Checkpoints: US-region storage only
└── Bureau calls: domestic bureau only

European Institution (GDPR-bound):
├── Workflow state: eu-west-1 only
├── Model calls: EU-hosted endpoints OR approved US with adequacy decision
├── Human reviewers: must be EU-based employees
└── Data retention: max 30 days post-decision (right to erasure)

The control plane must route based on tenant policy, not just cost or latency optimisation. A workflow that routes a GDPR-bound tenant's data through a US model endpoint that has not been approved for that tenant violates compliance, even if that endpoint is faster.

This is where tenant_context.data_residency directly affects routing (file 03), checkpointing (file 09), and state storage (file 05). Multi-tenant orchestration is policy-aware orchestration.
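
Policy-aware routing amounts to filtering before optimising. The sketch below is a simplified illustration: the endpoint names and latency figures are invented, and the tenant's policy is reduced to a set of allowed regions, which ignores finer rules such as per-endpoint approvals.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Endpoint:
    name: str
    region: str
    latency_ms: int


class NoCompliantEndpoint(Exception):
    """Raised when tenant policy leaves no endpoint to route to."""


def route(allowed_regions: FrozenSet[str], endpoints: List[Endpoint]) -> Endpoint:
    """Apply tenant policy first, then pick the fastest endpoint among what is left."""
    compliant = [e for e in endpoints if e.region in allowed_regions]
    if not compliant:
        # Fail closed: never fall through to a faster but non-compliant endpoint.
        raise NoCompliantEndpoint(f"no endpoint in {sorted(allowed_regions)}")
    return min(compliant, key=lambda e: e.latency_ms)


# Hypothetical endpoints and latencies, for illustration only.
ENDPOINTS = [
    Endpoint("compliance-us", "us-east-1", latency_ms=40),
    Endpoint("compliance-eu", "eu-west-1", latency_ms=70),
]

eu_choice = route(frozenset({"eu-west-1"}), ENDPOINTS)
unconstrained_choice = route(frozenset({"us-east-1", "eu-west-1"}), ENDPOINTS)
```

The EU-bound tenant gets the slower EU endpoint; a tenant with no residency constraint gets the faster US one. Latency only ever decides among endpoints that policy has already admitted.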

Operational signals — healthy platform, degrading platform, broken platform¶

Healthy behaviour:
- Per-tenant latency within SLA for > 95% of workflows
- Concurrency cap utilisation < 80% for most tenants (headroom for bursts)
- Zero cross-tenant state access events
- Budget overruns < 1% of workflows
- Reviewer queue depth balanced across priority classes

First degrading signal:
- One tenant's latency spiking while others are normal → likely hitting their own cap (expected) vs infrastructure issue (investigate)
- Multiple tenants' latency rising simultaneously → shared resource bottleneck (model endpoint, database, worker pool)
- Budget overruns climbing → replan loops consuming budget, or step cost estimates stale
- Reviewer queue depth growing for one priority class → reviewer pool undersized for that tier

Misleading metric:
- "Average latency across all tenants": hides that one tenant is fast (hogging resources) while others are slow (starved)
- "Total throughput": high throughput doesn't mean fair throughput. Break down by tenant.
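
A few lines of arithmetic show how a platform-wide average can mask starvation. The latency samples below are invented for illustration: one tenant generates half the traffic at 10ms while two others wait 5000ms.

```python
from statistics import mean

# Hypothetical latency samples in milliseconds, grouped by tenant.
latencies_ms = {
    "tenant-a": [10] * 100,    # fast, and responsible for half of all requests
    "tenant-b": [5000] * 50,   # starved
    "tenant-c": [5000] * 50,   # starved
}

# One number for the whole platform: dominated by whoever sends the most traffic.
global_mean = mean(x for samples in latencies_ms.values() for x in samples)

# The same data broken down by tenant.
per_tenant_mean = {tenant: mean(samples) for tenant, samples in latencies_ms.items()}
```

Here `global_mean` is 2505 while two of the three tenants sit at 5000, which is why the breakdown by tenant is the number to alert on.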

Expert signal:
- Fairness index (Jain's fairness) across tenants: are all tenants getting proportional service relative to their tier?
- Cross-tenant resource correlation: when tenant A's usage spikes, does tenant B's latency increase? If yes, isolation is incomplete.
- Budget utilisation variance: are some tenants consistently under-utilising while others consistently over-running?
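
Jain's fairness index for allocations x_1 … x_n is (Σ x_i)² / (n · Σ x_i²). It equals 1 when every tenant receives the same allocation and falls to 1/n when a single tenant receives everything. A minimal sketch follows; dividing each tenant's throughput by a tier weight before computing the index is one common way to measure proportional rather than equal fairness, and the weights used here are illustrative assumptions.

```python
from typing import List, Sequence


def jain_index(allocations: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in the range [1/n, 1]."""
    n = len(allocations)
    sum_of_squares = sum(x * x for x in allocations)
    if n == 0 or sum_of_squares == 0:
        raise ValueError("need at least one non-zero allocation")
    total = sum(allocations)
    return (total * total) / (n * sum_of_squares)


def normalised(throughput: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Scale each tenant's throughput by its tier weight before measuring fairness."""
    return [t / w for t, w in zip(throughput, weights)]


equal_share = jain_index([10, 10, 10])   # every tenant served equally
one_hog = jain_index([30, 0, 0])         # one tenant takes everything

# Throughput exactly proportional to (hypothetical) tier weights is perfectly fair.
proportional = jain_index(normalised([50, 10, 3], [50, 10, 3]))
```

Tracked over time, a falling index is an early hint that some tenant is receiving more than its tier entitles it to.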

Boundary of applicability¶

Works unusually well:
- SaaS platforms serving multiple organisations with the same workflow type but different policies
- Regulated industries where data residency and processing rules vary by customer
- Platforms with mixed tier levels (free, standard, enterprise) sharing infrastructure

Becomes pathological:
- Single-tenant deployments (all this isolation is overhead without value)
- Platforms where all tenants are identical (same policies, same budgets, same SLAs; simpler scheduling suffices)
- Extremely low-volume platforms (< 10 tenants, < 100 workflows/day; manual management works)

Scale that invalidates naive intuition:
- At 1000+ tenants, per-tenant scheduling metadata itself becomes a storage and lookup concern, so efficient tenant-context caching is needed
- At 100K+ concurrent workflows, per-workflow tenant policy lookup must be O(1) not O(n), typically solved by injecting policy into the workflow at admission time rather than looking it up at each step

Failure-prone assumption: "tenant isolation is just a database concern"¶

The seductive wrong idea: "We partitioned the database by tenant_id, so we have isolation."

The correction: Data isolation is necessary but nowhere near sufficient. Compute isolation (scheduling fairness), rate limit isolation (API quotas), queue isolation (reviewer pools), budget isolation (cost tracking), and policy isolation (routing rules) must all be tenant-aware. A system with perfect data isolation but shared scheduling is still vulnerable to noisy-neighbour starvation. A system with fair scheduling but shared rate limits still lets one tenant exhaust API quotas for everyone.

Real-world implementations¶

Intercom Fin — support automation platform isolates customer data per workspace, applies per-workspace confidence thresholds and handoff policies, and prevents one customer's ticket spike from degrading others
ServiceNow Now Assist — enterprise platform serves thousands of organisations with different approval policies, data residency requirements, and model tier access — all on shared infrastructure
GitHub Copilot Business/Enterprise — organisation-level policies (IP indemnity, telemetry settings, allowed repositories) are tenant-context that affects model routing and data handling
Stripe Connect — platform accounts (sub-merchants) have independent fraud thresholds, payout schedules, and compliance requirements — all served by shared orchestration
Azure OpenAI Service — per-deployment rate limits, region constraints, and content filtering policies are tenant-level controls on shared model infrastructure
Salesforce Einstein — multi-tenant AI features enforce per-org data boundaries, model usage quotas, and compliance policies across shared compute
Vercel AI SDK — platform serves multiple projects with independent rate limits, model access, and billing enforcement on shared infrastructure
Datadog Workflow Automation — multi-org workflows enforce data isolation, per-org action quotas, and team-level approval routing

Recall checkpoint¶

Why is database partitioning insufficient for multi-tenant isolation?
What five dimensions of isolation does a tenant boundary enforce?
How does weighted fair queuing prevent noisy-neighbour starvation?
Why enforce budgets at the step level, not just at workflow admission?
How does data residency affect workflow routing and checkpointing?
What makes "average latency" a misleading metric in multi-tenant systems?
When does multi-tenant orchestration add overhead without value?

Interview Q&A¶

Q: Why is compute isolation (scheduling fairness) as important as data isolation? A: A tenant with perfect data isolation but no scheduling cap can still consume all worker slots, starving every other tenant. Noisy-neighbour starvation is a platform-wide availability problem, not just a fairness problem. Common wrong answer to avoid: "Because one tenant might be malicious." Malice is rare; accidental bursts (batch jobs, outages, seasonal spikes) cause most noisy-neighbour problems.

Q: Why track rate limits per-tenant rather than globally? A: Global rate limits create invisible coupling: tenant A's burst exhausts a rate limit that tenant B needs. Per-tenant tracking ensures one tenant's usage pattern doesn't affect another's capacity. Common wrong answer to avoid: "Because APIs charge per-tenant." Billing matters, but the operational reason is independence — one tenant's consumption shouldn't create another's failure.

Q: Why must data residency constraints be enforced at the control-plane level? A: Individual agents and tools can't reliably enforce geographic routing. The control plane selects model endpoints, checkpoint storage locations, and reviewer pools based on tenant policy — before any step executes. Leaving residency to individual nodes means every node must independently check compliance. Common wrong answer to avoid: "Because of GDPR fines." Fines are a consequence, but the engineering reason is that control-plane enforcement is simpler and more reliable than per-node enforcement.

Q: When does per-tenant budget enforcement become harmful? A: When budget limits are set too tight relative to workflow complexity, causing legitimate workflows to fail mid-execution. Budget should be calibrated to worst-case expected cost (including retries and replanning), not average-case cost. Common wrong answer to avoid: "When tenants complain about limits." Complaints may signal miscalibration, but they may also signal that tenants want unlimited resources — which isn't a valid engineering response.

Q: Why is "average latency" misleading in multi-tenant monitoring? A: It hides the distribution. If one tenant gets 10ms latency (consuming all resources) and accounts for about half of all requests, while the others get 5000ms (starved), the request-weighted average is about 2500ms. That can look acceptable while the platform is clearly broken for most tenants. Common wrong answer to avoid: "Because averages hide outliers." True generically, but the specific multi-tenant problem is that the average can look fine while one tenant dominates and others suffer.

Q: How should the platform handle a tenant that legitimately needs to exceed their concurrency cap? A: Provide burst mechanisms with backfill (temporarily exceed cap, then rate-limit until fair share is restored), or offer tier upgrades. Never silently raise caps — that defeats isolation. Always log and alert when caps are hit. Common wrong answer to avoid: "Just increase their cap." Increasing without considering platform impact may starve others during coincident load.

Design/debug exercise (10 min)¶

Modeled: Institution X submits 5,000 loan applications in bulk. Platform has 100 worker slots. Per-tenant cap: 50 for enterprise tier. Result: X gets 50 concurrent slots, 4,950 queue internally. Other institutions unaffected. X's workflows complete over ~75 minutes (5,000 ÷ 50 = 100 waves × 45s each = 4,500s). Total capacity: maintained. SLA for other tenants: maintained.

Your turn: A GDPR-bound European tenant submits a workflow that requires a compliance check. The platform's compliance model is hosted in us-east-1 (fastest) and eu-west-1 (slower, +30ms). Design: (1) the routing decision based on tenant_context, (2) the checkpoint storage location selection, (3) what happens if the EU model endpoint is down — can the system fall back to US? Why or why not?

From memory: Close this file and sketch: the five dimensions of tenant isolation, the tenant_context structure (11 fields), the scheduling algorithm (5 steps), and the budget enforcement points (4 levels).

Operational memory¶

Multi-tenant orchestration transforms workflow design from a single-user problem to a platform problem. Every resource the control plane manages — compute slots, model capacity, API rate limits, reviewer queues, budget tracking, checkpoint storage — must be partitioned by tenant. Without this, any single tenant's behaviour (intentional or accidental) can create platform-wide degradation. The five dimensions of isolation — state, compute, rate limits, queues, and budget — are all necessary; getting four right and missing one still leaves the system vulnerable.

The practical patterns: weighted fair queuing for scheduling (proportional share with caps), per-tenant rate tracking for external APIs, policy-driven routing for data residency, and step-level budget enforcement (not just admission-level). The expert signal: when tenant A's usage spike causes tenant B's latency to rise, isolation is incomplete somewhere in the stack.

Remember:
- Tenant isolation = state + compute + rate limits + queues + budget (all five required)
- Per-tenant concurrency caps prevent noisy-neighbour starvation
- Budget enforcement at step level catches replan loops and retries that exceed admission estimates
- Data residency is a control-plane routing decision, not a per-node responsibility
- "Average latency" hides unfair distribution; monitor per-tenant percentiles
- tenant_context travels with every workflow; control-plane infrastructure reads it at every decision point
- Noisy-neighbour problems are usually accidental (batch jobs, migrations, spikes), not malicious

Bridge. We now have isolation, fairness, and policy. The control plane is a platform. But how do we know it actually works? Testing orchestration is harder than testing a single model output — we need to verify plans, branches, checkpoints, multi-tenant fairness, and failure recovery all together. That's the testing challenge. → 12-testing-orchestration.md