10. Multi-region and privacy

Drift is provider behaviour changing. Region and privacy are your requirements landing on the gateway — data residency, jurisdictional rules, on-prem constraints, and latency-driven regional routing. This chapter is the discipline that turns those requirements from promises in a sales deck into enforced routing rules with an audit posture that survives a regulator's question.

A platform tech lead at a Bengaluru healthcare-tech company gets a regulator's letter: the company has claimed its agent platform processes patient data in India only. Prove it for the last twelve months. The team queries the gateway's audit log: tenant in healthcare_in AND region NOT IN [ap-south-1, ap-south-2]. The query returns zero rows. The team writes back with the query and the result, the audit log's retention policy, and the routing-policy configuration that enforced the constraint per call. The regulator is satisfied in under a week.

Three years earlier, the same company had no gateway. A similar question would have produced a quarter-long investigation across product teams, partial coverage, and a hedged answer. The difference is what this chapter builds: regional routing as a first-class policy, privacy zones enforced at the gateway, on-prem fallbacks where required, and an audit posture that proves the platform honoured its claims.

Why multi-region exists

Three reasons drive regional design.

Latency. Provider endpoints in your callers' region respond faster. A round trip from Bengaluru to us-east-1 is ~250 ms baseline; to ap-south-1 it is under 50 ms. For interactive workloads, this matters.

Data residency. Regulators and regulations in many jurisdictions (the RBI in India, GDPR in the EU, sectoral rules elsewhere) require data processing to stay inside, or stay out of, named regions. The gateway is the place to enforce this once for every product.

Availability. A region-down event on the primary provider should not be a platform outage. Regional failover lives inside the fallback chain (chapter 04); regional routing is what makes it work.

These three sometimes conflict — the lowest-latency region may not be in the jurisdiction the tenant requires; the most-available cross-region failover may breach residency. The privacy zone arbitrates.

Privacy zones, properly enforced

Chapter 03 introduced the privacy zone as a routing-key dimension. This chapter operationalises it.

The zone is a property of the tenant or the feature, not the request. The caller does not "ask for" a zone; the gateway imposes the zone based on configured policy.

tenants:
  healthcare-in-1:
    privacy_zone: in-region-strict
    allowed_regions: [ap-south-1, ap-south-2]
    forbidden_providers: []
  globex-eu:
    privacy_zone: eu-strict
    allowed_regions: [eu-west-1, eu-central-1]
    forbidden_providers: []
  contoso-onprem:
    privacy_zone: on-prem-only
    allowed_regions: []
    allowed_providers: [internal-vllm-cluster]
  acme-corp:
    privacy_zone: any-cloud
    allowed_regions: ["*"]

Four policy variants emerge in practice:

Zone	Meaning
any	Any region from any provider; default for non-regulated tenants
regional-soft	Prefer the named region; allow cross-region failover only with explicit per-call consent
regional-strict	Only the named region(s); failure outside the zone is a refusal, not a fallback
on-prem-only	No cloud provider; only allowlisted on-prem deployments

The strict variants are the ones that need the most discipline. The whole point of the discipline is that no path — routing, fallback, cache, audit retention — leaves the zone without an alarm.
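A minimal sketch of how the tenant configuration above might be held inside the gateway, assuming Python. The names `TenantPolicy`, `POLICIES` and `zone_for` are hypothetical, and failing closed for an unknown tenant is an assumption rather than something this chapter specifies. The point it illustrates is that the zone is looked up from configured policy and whatever the caller sends is ignored.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TenantPolicy:
    privacy_zone: str
    allowed_regions: frozenset                    # "*" means any region
    allowed_providers: frozenset = frozenset()    # empty means no provider allowlist


# Mirrors the YAML above; a real gateway would load this from configuration.
POLICIES = {
    "healthcare-in-1": TenantPolicy("in-region-strict", frozenset({"ap-south-1", "ap-south-2"})),
    "globex-eu": TenantPolicy("eu-strict", frozenset({"eu-west-1", "eu-central-1"})),
    "contoso-onprem": TenantPolicy("on-prem-only", frozenset(), frozenset({"internal-vllm-cluster"})),
    "acme-corp": TenantPolicy("any-cloud", frozenset({"*"})),
}


def zone_for(tenant_id: str, request: Mapping[str, object]) -> TenantPolicy:
    """Return the policy the gateway imposes for this tenant.

    `request` is deliberately not consulted: a caller-supplied zone hint
    must never widen or change the configured zone.
    """
    try:
        return POLICIES[tenant_id]
    except KeyError:
        # Assumption: an unconfigured tenant is refused rather than defaulted.
        raise LookupError(f"no privacy policy configured for tenant {tenant_id!r}") from None
```

A request from `globex-eu` that carries `{"privacy_zone": "any-cloud"}` still resolves to `eu-strict`, because the lookup never reads the request.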

What the routing plane does

The route resolver (chapter 03's procedure, extended):

1. Look up tenant's privacy zone and allowed regions/providers.
2. Filter candidates: drop any (provider, region) outside the zone.
3. Continue normal routing (capability, cost, latency, weight).
4. If no candidates remain: refuse with NO_ROUTE_IN_ZONE.
5. The fallback chain is also filtered: every step must be in-zone.

Two operational rules:

- The filter is the first step after candidate enumeration, so a disallowed candidate never appears in scoring.
- The filter applies to every step of the chain. A primary in-zone candidate with a "fail over to a closer-but-out-of-zone region" pattern is not allowed; the failover step must also be in-zone.

The discipline is not "we usually stay in-zone." It is "we cannot leave the zone."
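The resolver steps can be sketched as follows. This is an illustration under stated assumptions, not the gateway's actual code: `Candidate`, `resolve` and `NoRouteInZone` are hypothetical names, the single `score` field stands in for the capability, cost, latency and weight scoring, and the zone test is passed in as a predicate so the same function covers region and provider allowlists. Treating the ranked remainder as the fallback chain is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


class NoRouteInZone(Exception):
    """The zone filter left no candidates (NO_ROUTE_IN_ZONE)."""


@dataclass(frozen=True)
class Candidate:
    provider: str
    region: str
    score: float  # stand-in for capability/cost/latency/weight scoring


def resolve(candidates: Iterable[Candidate],
            in_zone: Callable[[Candidate], bool]) -> List[Candidate]:
    """Return the in-zone chain: element 0 is the primary, the rest are fallbacks."""
    # Step 2: filter first, so out-of-zone candidates are never scored.
    eligible = [c for c in candidates if in_zone(c)]
    # Step 4: an empty set is a refusal, never a reason to widen the zone.
    if not eligible:
        raise NoRouteInZone("no (provider, region) candidate inside the tenant's zone")
    # Step 3: normal routing, here reduced to a sort on the stand-in score.
    # Step 5: the chain is built only from `eligible`, so every step is in-zone.
    return sorted(eligible, key=lambda c: c.score, reverse=True)
```

Because the fallback chain is derived from the already-filtered list, there is no code path that can append an out-of-zone step later.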

Regional deployment of the gateway

The gateway itself is a service. It has a physical location. For latency-sensitive interactive workloads, the gateway should be in the caller's region; otherwise a call from a caller in India to a gateway in the US and back to a provider in India pays the intercontinental hop on both legs.

A reasonable topology:

- Region A (e.g., ap-south-1)
    - Gateway instance(s)
    - Caches (regional)
    - Audit ingestion (regional with cross-region replication)
- Region B (e.g., eu-west-1)
    - Gateway instance(s)
    - Caches (regional)
    - Audit ingestion
- Region C (e.g., us-east-1)
    - Gateway instance(s)
    - ...

- Global plane:
    - Routing policy (read-mostly; replicated)
    - Quota plane (region-local with global aggregation)
    - Credential plane (region-local)
    - Cost dashboards (global aggregation of regional data)

Callers go to the regional gateway. The regional gateway routes within the tenant's privacy zone. The audit log is region-local for residency but is also aggregated globally for cross-region queries (with the aggregation respecting zone boundaries).

For a tenant in a regional-strict zone, the regional gateway is also the only one that has access to that tenant's data; none of it is replicated across regions.
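One way a regional gateway instance could enforce that it serves only the tenants pinned to its region is an admission check before any routing happens. The sketch below assumes Python; `admit` and `WrongRegionalGateway` are hypothetical names, and the `"*"` wildcard follows the tenant configuration shown earlier in this chapter.

```python
from typing import Mapping, Set


class WrongRegionalGateway(Exception):
    """The tenant is not pinned to this gateway's region."""


def admit(gateway_region: str, tenant_id: str,
          allowed_regions_by_tenant: Mapping[str, Set[str]]) -> None:
    """Raise unless this regional gateway is allowed to handle the tenant."""
    allowed = allowed_regions_by_tenant.get(tenant_id)
    if allowed is None:
        # Assumption: unknown tenants are refused rather than served by default.
        raise WrongRegionalGateway(f"unknown tenant {tenant_id!r}")
    if "*" in allowed or gateway_region in allowed:
        return
    raise WrongRegionalGateway(
        f"tenant {tenant_id!r} is not served from {gateway_region}"
    )
```

Under this check a tenant with an empty `allowed_regions` list, such as an on-prem-only tenant, is refused by every cloud-region gateway, which matches the intent of that zone.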

On-prem and isolated networks

Some tenants will not allow any cloud-provider involvement. The gateway then routes to local model deployments (typically vLLM, TGI, or Ollama clusters).

A few notes:

- The unified request shape applies. Local providers are just another candidate in the routing policy. The gateway issues calls in the same shape, with transform layers per local engine.
- Capacity is the tenant's responsibility. Quota is enforced against the local cluster's own throughput, not a public provider's limit.
- Eval and drift handling are different. Local models do not retire on a provider's schedule; they may be replaced when the tenant upgrades. The gateway's promotion process is the same; the schedule is different.
- Audit retention may be tenant-owned. Some on-prem tenants insist their audit logs do not leave their premises; the gateway emits to a tenant-owned sink with no platform-side copy.

The discipline is the same. The implementation varies.
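The tenant-owned audit sink is the note most easily implemented wrongly, so here is a minimal sketch of the emit path, assuming Python. `emit_audit` and the sink callables are hypothetical; the only behaviour taken from the text is that a tenant-owned sink receives the record and the platform keeps no copy.

```python
from typing import Callable, Optional

Sink = Callable[[dict], None]


def emit_audit(record: dict, platform_sink: Sink,
               tenant_sink: Optional[Sink] = None) -> None:
    """Deliver the record to exactly one sink.

    If the tenant has configured its own sink, the record goes there and
    nowhere else; otherwise it goes to the platform's audit store.
    """
    if tenant_sink is not None:
        tenant_sink(record)      # no platform-side copy
    else:
        platform_sink(record)
```

Writing it as an either/or, rather than "platform sink plus optional tenant sink", keeps the no-copy guarantee structural instead of relying on a flag being set correctly.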

Latency-routing within a zone

Within a zone of more than one region, latency-routing is appropriate.

A simple policy: pick the closest in-zone region for the caller's location.

tenants:
  globex-eu:
    privacy_zone: eu-strict
    allowed_regions: [eu-west-1, eu-central-1]
    region_preference:
      caller_region_to_provider_region:
        eu-west-1: [eu-west-1, eu-central-1]
        eu-central-1: [eu-central-1, eu-west-1]

When the caller is in eu-west-1, the gateway prefers the eu-west-1 provider region, with eu-central-1 as failover. When the caller is in eu-central-1, the order reverses.

For tenants where the latency difference between the zone's regions is small, the gateway can simply use weight-based candidate selection without per-caller-region routing.
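A sketch of the preference lookup, assuming Python. `order_regions` is a hypothetical helper; `REGION_PREFERENCE` mirrors the `caller_region_to_provider_region` map from the YAML above. The preference list is intersected with the allowed regions, so a misconfigured preference cannot pull a call out of the zone. Placing unlisted allowed regions last, in sorted order, is an assumption.

```python
from typing import Iterable, List, Mapping, Sequence

# Mirrors caller_region_to_provider_region for globex-eu.
REGION_PREFERENCE = {
    "eu-west-1": ["eu-west-1", "eu-central-1"],
    "eu-central-1": ["eu-central-1", "eu-west-1"],
}


def order_regions(caller_region: str,
                  allowed_regions: Iterable[str],
                  preference: Mapping[str, Sequence[str]]) -> List[str]:
    """Return the allowed regions, closest-first for this caller."""
    allowed = set(allowed_regions)
    # Keep only preferred regions that are actually in-zone.
    ordered = [r for r in preference.get(caller_region, []) if r in allowed]
    # Any allowed region without an explicit preference goes last,
    # sorted so the result is deterministic.
    ordered += sorted(allowed.difference(ordered))
    return ordered
```

The first element is the region to try first; the rest is the in-zone failover order.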

What the audit log carries for privacy

The per-call audit record includes:

- tenant_id
- privacy_zone (the zone in effect at call time)
- provider, model_version, region actually used
- caller_region (where the call originated)
- a zone_check field affirming the call was in-zone

The zone_check is redundant with cross-referencing tenant_id and region, but explicit fields make audit queries simpler and harder to get wrong. Regulators and internal compliance reviews query on these fields directly.

A standing query for any regional-strict tenant: tenant_id = X AND region NOT IN allowed_regions(X). Expected result: zero rows. If the result is ever non-zero, an alarm should already have fired and the on-call should already be investigating.
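The standing query can be sketched against a relational audit store. The example below uses Python's standard `sqlite3` module purely for illustration; the table name `audit`, the `call_id` column and the helper names are assumptions, while the other columns follow the field list in this section.

```python
import sqlite3
from typing import List, Sequence, Tuple


def create_audit_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical audit table carrying the privacy fields."""
    conn.execute(
        """
        CREATE TABLE audit (
            call_id       TEXT PRIMARY KEY,
            tenant_id     TEXT NOT NULL,
            privacy_zone  TEXT NOT NULL,
            provider      TEXT NOT NULL,
            model_version TEXT NOT NULL,
            region        TEXT NOT NULL,
            caller_region TEXT NOT NULL,
            zone_check    INTEGER NOT NULL  -- 1 if the call was in-zone
        )
        """
    )


def out_of_zone_calls(conn: sqlite3.Connection, tenant_id: str,
                      allowed_regions: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (call_id, region) for every call that left the allowed regions."""
    # Only "?" placeholders are interpolated; all values are bound parameters.
    placeholders = ",".join("?" for _ in allowed_regions)
    sql = (
        "SELECT call_id, region FROM audit "
        f"WHERE tenant_id = ? AND region NOT IN ({placeholders}) "
        "ORDER BY call_id"
    )
    return conn.execute(sql, [tenant_id, *allowed_regions]).fetchall()
```

An empty list is the expected result; any returned row names the exact call to investigate. A time window would be one more bound parameter on whatever timestamp column the audit store keeps.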

Cache and privacy

Caches are storage. A cache in region A holding a response for a tenant whose zone is region B is a breach.

Two rules:

- Caches are region-local. A regional gateway has a regional cache. Calls from a tenant restricted to that region cannot hit a cross-region cache.
- Cache keys partition by tenant_id. Even within a region, a tenant's cached responses cannot serve another tenant.

For semantic caches (chapter 08), the partition is even stricter: per-tenant, per-user where applicable.
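A sketch of a cache key that carries both partitions, assuming Python. `cache_key` and its parameters are hypothetical. Including the region in the key is belt-and-braces on top of the cache itself being region-local, and the optional `user_id` corresponds to the stricter per-user partition mentioned for semantic caches.

```python
import hashlib
from typing import Optional


def cache_key(region: str, tenant_id: str, model: str, prompt: str,
              user_id: Optional[str] = None) -> str:
    """Build a key that can never match across regions, tenants or users."""
    digest = hashlib.sha256()
    for part in (region, tenant_id, user_id or "", model, prompt):
        data = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    # The readable prefix makes per-tenant eviction and inspection straightforward.
    return f"{region}:{tenant_id}:{digest.hexdigest()}"
```

Two tenants sending the identical prompt to the identical model in the same region get different keys, so one tenant's cached response cannot be served to the other.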

How privacy interacts with the other surfaces

- Routing (chapter 03): the privacy zone filters candidates at the first step.
- Fallback (chapter 04): every step of the chain must be in-zone.
- Quota (chapter 05): buckets are per-tenant and may be per-region inside the zone.
- Credentials (chapter 06): gateway credentials can carry zone scope, refusing to issue calls outside.
- Cost (chapter 07): attribution by region is a routine dashboard.
- Cache (chapter 08): caches are region-local and tenant-partitioned.
- Audit (chapter 11): every record carries region and zone fields.

How to recognise broken regional/privacy in the wild

- Privacy claims exist in product collateral with no enforcement at the gateway
- Routing has no region filter: candidates are picked globally
- The audit log lacks region or caller_region
- The cache is global and not region-partitioned
- A cross-region failover policy exists without per-tenant zone gating
- Compliance queries are ad-hoc rather than standing dashboards

Interview Q&A

Q1. A tenant claims eu-only residency. The gateway's primary provider has an EU outage. The next candidate is in the US. What does the gateway do? Refuse with NO_ROUTE_IN_ZONE. The privacy zone is non-negotiable. The product's response to its end-user is "service unavailable; we are working on it." The platform team works the incident: bringing the EU region back, expanding the in-zone candidate list to include another EU provider, or operating on a degraded basis until recovery. What the gateway must not do is "make an exception" and route to the US. That exception, once made, undermines every claim made to that tenant. Wrong-answer notes: "fall back to US because the user wants the answer" is the breach.

Q2. How does the gateway prove to a regulator that no call from tenant X processed data outside the EU last quarter? A query on the audit log: tenant_id = X AND ts in [Q] AND region NOT IN [eu-*]. The expected result is zero rows. The answer is the query, the result, the routing-policy configuration that enforced the zone, the audit retention policy, and (if available) the gateway's own integrity-check logs showing the audit was not tampered with. The proof takes minutes to produce. Wrong-answer notes: "we'd ask each product team" loses the precision and the answerability.

Q3. The gateway is deployed in one region globally. Callers in three regions all hit the same gateway instance. What is the problem? There are three. Callers far from the gateway pay extra latency. There is a single-region failure mode: a gateway outage is a global outage. And for regional-strict tenants, the call physically transits a non-allowed region in flight, which may or may not be a residency breach depending on the regulator's interpretation. The fix is regional gateway deployment, with the routing policy and audit replicated as needed while respecting zone boundaries on data. Wrong-answer notes: "it's fine if the gateway is just metadata" ignores that call payloads transit the gateway, and metadata is not the whole concern.

Q4. How does on-prem differ from cloud regions, operationally? The mechanics in the gateway are similar: local provider candidates in the routing policy, transform layers for the local engine, quota enforced on the local cluster's capacity. The differences are organisational: the tenant owns capacity planning, the tenant may own the audit sink, and the provider release cadence is the tenant's, not a vendor's. The gateway becomes a thinner layer for on-prem; most of the platform value is in routing and audit, while provider stability and rate limits are the tenant's to manage. Wrong-answer notes: "the same as cloud" misses the ownership boundary.

What to do differently after reading this

- Define privacy zones explicitly per tenant. The default is a permissive zone; regulated tenants opt in to a strict one.
- Filter routing candidates by zone before any other policy. Reject the call if the zone empties the candidate set.
- Deploy the gateway regionally; the regional gateway is the only one that serves the tenants pinned to that region.
- Region-partition the cache. Tenant-partition within region.
- Surface region and caller_region on every audit record. Maintain standing queries for compliance verification.
- Practice answering a regulator's question. Time it. The right answer is minutes; if it is hours, the audit shape is wrong.

Bridge. Privacy is one operational dimension. Visibility is another. The next chapter builds the gateway's observability — per-provider dashboards, error taxonomies, latency baselines, and the alarm panel the on-call reads first. → 11-observability-per-provider.md