06. Keys and credentials
Quotas decide how often the gateway calls a provider. Credentials decide who the gateway is when it calls. A leaked provider key is a company-wide bill and, on some providers, a security event. A misused gateway credential is a per-product incident. The discipline is keeping the wide-blast credential narrow to one consumer — the gateway — and the gateway's own credentials narrow to one consumer per product.
A platform engineer at a Bengaluru SaaS company finds an Anthropic API key on a public PyPI package's GitHub mirror. The package belongs to a sister team that builds internal tooling. The key was checked into a tests/ directory two months ago and never noticed. The provider's spend on that key over those two months was within normal bounds — the leak was not exploited — but the incident is treated as a near-miss. The fix is not a stronger linter; it is the discipline that the provider key never existed outside the gateway's vault in the first place. The sister team's tooling should never have held it.
This chapter is the discipline. It is short by design — the rules are tight; following them is the work.
The single rule
Provider keys live in the gateway's vault and nowhere else. Every caller — every product, every script, every batch job — holds a gateway-issued credential, narrow to its purpose, rotated independently.
Every other rule in this chapter is a consequence.
Two key planes
The gateway operates two distinct credential planes.
Provider keys — credentials issued by Anthropic, OpenAI, AWS Bedrock, etc., that authorise calls to their APIs. These are wide-blast: a single key can usually call any model, any region, any feature, on the account it belongs to. Some providers offer narrower scopes (Bedrock with IAM policies; OpenAI projects); most do not.
Gateway credentials — credentials the gateway issues to its callers. These are narrow: scoped to a set of aliases, a tenant, an environment. Issuance and rotation are owned by the gateway team.
The boundary is the gateway. Provider keys never leave it; gateway credentials never enter the provider.
What goes wrong without this
Five failure modes the rule prevents.
| Failure | Cause | Cost |
|---|---|---|
| Provider key in source control | Product teams hold the key | Public-repo accidents, history-leak after deletion |
| Provider key in a binary | Embedded for "convenience" | Reverse-engineerable, especially in mobile/desktop |
| Provider key shared across staging and production | Single secret in deploy pipeline | A staging environment leak is a production incident |
| Rotation is "every product redeploys" | No central key holder | Practically never rotates |
| Tenant-level cost attribution is impossible | Provider sees one key | The chapter-0 problem persists |
A single gateway holding the provider key, with rotation orchestrated by the gateway team, prevents all five.
Issuing gateway credentials
A gateway credential is created when a caller (a product, a service, a batch job, an internal tool) is registered with the gateway. The credential carries:
- Caller identity: `service.support-agent.production`
- Allowed aliases: `[fast-summariser, smart-reasoner, embeddings-v3]`
- Allowed tenants: `[acme-corp, globex-eu]` or `*` for platform-internal
- Allowed environment: `production` (staging credentials are separate)
- Allowed workload classes: `[interactive, batch]`
- Expiry: when the credential rotates (e.g., 90 days)
- Issued-by: who approved this credential
A request bearing this credential is checked against all dimensions. A support-agent credential trying to call code-assistant is refused; a credential scoped to acme-corp cannot be used for globex-eu even if the requester says so; a staging credential cannot issue calls against production aliases.
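The per-dimension check can be sketched as follows. This is a minimal illustration, not the gateway's actual implementation: the `GatewayCredential` and `CallRequest` types, the `authorise` function, and the refusal strings are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class GatewayCredential:
    # Hypothetical shape mirroring the fields listed above.
    caller: str
    aliases: frozenset
    tenants: frozenset            # {"*"} stands for platform-internal (any tenant)
    environment: str
    workload_classes: frozenset
    expires_at: datetime


@dataclass(frozen=True)
class CallRequest:
    alias: str
    tenant: str
    environment: str
    workload_class: str


def authorise(cred, req, now):
    """Return (allowed, reason). Every dimension must pass; the first failure refuses."""
    if now >= cred.expires_at:
        return False, "credential expired"
    if req.environment != cred.environment:
        return False, "environment not allowed"
    if req.alias not in cred.aliases:
        return False, "alias not allowed"
    if "*" not in cred.tenants and req.tenant not in cred.tenants:
        return False, "tenant not allowed"
    if req.workload_class not in cred.workload_classes:
        return False, "workload class not allowed"
    return True, "ok"


support_agent = GatewayCredential(
    caller="service.support-agent.production",
    aliases=frozenset({"fast-summariser", "smart-reasoner", "embeddings-v3"}),
    tenants=frozenset({"acme-corp", "globex-eu"}),
    environment="production",
    workload_classes=frozenset({"interactive", "batch"}),
    expires_at=datetime(2030, 1, 1, tzinfo=timezone.utc),
)
```

The tenant in `CallRequest` is whatever the request claims; the check compares that claim against the credential's own scope, so a caller cannot widen its tenant set by asserting a different tenant.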
The credential is opaque to the caller — typically a signed JWT or a service-platform-issued token. Validation is cryptographic, not lookup, so the gateway can scale without a per-call database hit.
Rotation
Provider keys rotate on a schedule plus on signal. The schedule is per-provider policy — common practice is 90 days for production keys, 30 days for any key suspected of exposure. The signal is any audit anomaly, departure of someone who had read access, or a security review finding.
Rotation procedure for a provider key:
- Generate the new key on the provider's console (or API).
- Add it to the gateway's vault as a secondary key, alongside the existing primary.
- Switch the gateway to use the new key for new calls; existing in-flight calls finish on the old key.
- Verify call success against the new key for some period (15–60 minutes).
- Decommission the old key in the provider's console.
- Audit log the rotation event.
Steps 3 and 4 are why dual-key support is required. A rotation that takes the old key down before the new one is verified is a rotation that produces a multi-minute outage.
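The dual-key requirement can be sketched as a small state machine. This is an illustrative model only: `ProviderKeySlot`, its method names, and the `vault:...` reference strings are hypothetical, and the slot holds references to vault entries rather than key material.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ProviderKeySlot:
    """Tracks which vault entry serves new calls during a rotation."""
    provider: str
    primary: str                       # vault reference used for new calls
    secondary: Optional[str] = None    # the other key while a rotation is open
    state: str = "steady"              # steady -> staged -> promoted -> steady
    audit: List[Tuple[str, Optional[str]]] = field(default_factory=list)

    def stage(self, new_ref: str) -> None:
        # Add the new key alongside the existing primary.
        if self.state != "steady":
            raise RuntimeError("a rotation is already in progress")
        self.secondary = new_ref
        self.state = "staged"
        self.audit.append(("staged", new_ref))

    def promote(self) -> None:
        # New calls move to the new key; the old key stays valid
        # so in-flight calls can finish and a rollback is possible.
        if self.state != "staged":
            raise RuntimeError("no staged key to promote")
        self.primary, self.secondary = self.secondary, self.primary
        self.state = "promoted"
        self.audit.append(("promoted", self.primary))

    def retire_old(self, verified: bool) -> Optional[str]:
        # The old key is dropped only once the new one is verified.
        if self.state != "promoted":
            raise RuntimeError("nothing to retire")
        if not verified:
            raise RuntimeError("new key not verified; old key kept as fallback")
        retired = self.secondary
        self.secondary = None
        self.state = "steady"
        self.audit.append(("retired", retired))
        return retired
```

The returned reference from `retire_old` is what an operator would then decommission on the provider's side; the sketch deliberately refuses to drop the old key until verification is reported.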
Rotation procedure for a gateway credential:
1. Issue a new credential with the same scope as the old one, with a fresh expiry.
2. The caller (the product) updates to the new credential (via deploy or via runtime configuration).
3. After a verification window, the old credential is revoked.
Gateway credentials should rotate more often than provider keys (e.g., 30–60 days), because they are held by more parties and the blast on leak is narrower but still real.
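The overlap between old and new gateway credentials might be modelled on the issuing side as below. This is a sketch under stated assumptions: `CredentialRegistry`, `IssuedCredential`, the integer timestamps in seconds, and the identifier format are all invented for illustration. It describes the issuer's bookkeeping, not the per-call validation path.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IssuedCredential:
    credential_id: str
    caller: str
    scope: tuple
    expires_at: int
    revoke_at: Optional[int] = None  # set on the old credential when it is rotated


class CredentialRegistry:
    def __init__(self) -> None:
        self._by_id: Dict[str, IssuedCredential] = {}
        self._counter = 0

    def issue(self, caller, scope, now, ttl) -> IssuedCredential:
        self._counter += 1
        cred = IssuedCredential(f"{caller}#{self._counter}", caller, tuple(scope), now + ttl)
        self._by_id[cred.credential_id] = cred
        return cred

    def rotate(self, old_id, now, ttl, verification_window) -> IssuedCredential:
        """Issue a same-scope replacement; the old one survives only the window."""
        old = self._by_id[old_id]
        new = self.issue(old.caller, old.scope, now, ttl)
        old.revoke_at = min(old.expires_at, now + verification_window)
        return new

    def is_active(self, credential_id, now) -> bool:
        cred = self._by_id.get(credential_id)
        if cred is None or now >= cred.expires_at:
            return False
        return cred.revoke_at is None or now < cred.revoke_at
```

Both credentials are valid during the verification window, which is what lets the caller switch over by deploy or configuration change without a gap.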
Scope-tightening over time
Initial gateway credentials are often broader than they need to be — "this product, all aliases, all tenants" — because the team is moving fast. A quarterly tightening review reduces blast over time.
For each credential:
- Pull audit data: which aliases did it actually call in the last 90 days?
- Which tenants did it act on behalf of?
- Issue a new credential with scope narrowed to the observed usage plus a small margin.
- Rotate the credential.
This is the credential analogue of the "god-key audit" from module 19 chapter 06. Same discipline, different layer.
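A tightening review could be scripted roughly as follows. The audit row shape (`credential_id`, `alias`, `tenant`), the function names, and the idea of expressing the "small margin" as an explicit keep-list are assumptions for this sketch; the rows are assumed to be pre-filtered to the last 90 days.

```python
from collections import Counter


def observed_scope(audit_rows, credential_id):
    """Collect the aliases and tenants a credential actually used."""
    aliases, tenants = Counter(), Counter()
    for row in audit_rows:
        if row["credential_id"] != credential_id:
            continue
        aliases[row["alias"]] += 1
        tenants[row["tenant"]] += 1
    return {"aliases": sorted(aliases), "tenants": sorted(tenants)}


def tighten(current_scope, observed, margin):
    """Propose a narrower scope: observed usage plus margin, never beyond current."""
    proposed = {}
    for dim in ("aliases", "tenants"):
        wanted = set(observed[dim]) | set(margin.get(dim, ()))
        current = set(current_scope[dim])
        # A wildcard grant is replaced by the concrete values actually needed.
        proposed[dim] = sorted(wanted if "*" in current else wanted & current)
    return proposed
```

The intersection with the current scope means a review can only narrow a credential; widening still goes through the normal issuance approval.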
Secrets storage
Provider keys live in a proper secrets manager — HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, Kubernetes Secrets with sealed-secrets, depending on the platform. The gateway is the only consumer.
A few practical rules:
- The secrets manager's access policy lists exactly the gateway service identity. No human has read access in production.
- Reads are audited at the secrets manager level. An anomaly (a read from a non-gateway identity) pages the on-call engineer.
- Backups of the secrets manager are encrypted and access-controlled separately.
- Local development uses different credentials — separate provider accounts when budget allows; otherwise scoped sub-credentials with hard spending caps.
What the audit captures
For every call, the audit (chapter 11) records which gateway credential authorised it and which provider key the gateway used. The two together let security investigations answer "did this leaked credential cause that downstream effect" with precision.
Provider key usage is not logged with the key itself in clear text; an opaque identifier or hash is sufficient.
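One way to derive such an opaque identifier is a truncated keyed hash of the key, sketched below. The `pk_` prefix, the 16-character truncation, the `pepper` secret, and the record field names are illustrative assumptions, not a prescribed format.

```python
import hashlib
import hmac


def key_fingerprint(provider_key: str, pepper: bytes) -> str:
    """Stable identifier for a provider key that does not reveal the key.

    `pepper` is a secret held by the gateway, so the fingerprint cannot be
    recomputed by someone who only has the audit log.
    """
    digest = hmac.new(pepper, provider_key.encode("utf-8"), hashlib.sha256).hexdigest()
    return "pk_" + digest[:16]


def audit_record(credential_id: str, provider: str, provider_key: str, pepper: bytes) -> dict:
    """Audit entry linking the gateway credential to the provider key used."""
    return {
        "gateway_credential": credential_id,
        "provider": provider,
        "provider_key_id": key_fingerprint(provider_key, pepper),
    }
```

Because the fingerprint is deterministic, every call made with the same provider key carries the same identifier, which is enough to correlate calls during an investigation.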
How credentials interact with other surfaces
- Routing (chapter 03) — the caller's credential constrains which aliases are reachable, narrower than the global alias list.
- Quota (chapter 05) — buckets are keyed by caller identity from the credential.
- Cost (chapter 07) — attribution flows from credential → caller → tenant.
- Privacy (chapter 10) — a credential's allowed tenants implicitly carries each tenant's privacy zone.
- Observability (chapter 11) — every call's audit lists the credential used.
How to recognise broken credential management in the wild
- Provider keys exist outside the gateway team's control
- Rotation is "we'll do it next quarter" with no enforced cadence
- Staging and production share a provider key
- Linters/scanners catch credentials in code regularly (the discipline is reactive, not preventive)
- Gateway credentials are not scoped per caller — one credential is shared across multiple products
- The secrets manager has many readers, including humans
Interview Q&A
Q1. Why never share a provider key between staging and production? Because a staging environment is the easiest leak vector — engineers have read access for debugging, code paths log credentials accidentally, third-party tools probe staging endpoints. A shared key means a staging leak is a production incident. The cost of two separate provider accounts (or two separate keys on one account) is small; the cost of a production-key leak is large. Wrong-answer notes: "we are careful in staging" is not a substitute for separation.
Q2. The gateway needs to call Anthropic and OpenAI. How many provider keys does it hold, and how are they protected? At minimum two — one per provider. Often more — one per region or per account tier. All live in the secrets manager with the gateway service identity as the sole reader. Rotation cadence is per-provider policy; dual-key support during rotation prevents outage windows. The audit log records which key authorised each call (by opaque identifier), so investigations can correlate. Wrong-answer notes: "one key for both" is impossible; "we put them in env vars" is the leak vector.
Q3. A product team asks for "the API key" to debug a latency issue. What do you say? You give them: a temporary, audit-marked gateway credential scoped to their feature, with logging at the gateway side showing each call they make. You do not give them the provider key. The latency issue investigation does not require holding the provider's credentials; it requires call-level visibility, which the gateway provides. The discipline is non-negotiable; the exception "just this once" is how the chapter's opening leak happens. Wrong-answer notes: capitulating to "we need it for debugging" trains the discipline to fail.
Q4. Walk through what happens if a gateway credential leaks. First, the leak's blast is small: the credential is scoped to one caller, possibly one tenant, possibly one set of aliases. The leak does not give the attacker the provider's key. The platform revokes the leaked credential immediately; the credential's last-seen audit shows what was done. A new credential is issued to the legitimate caller with a different identifier. The cost is bounded by the credential's scope, the rate it was used at, and the audit window during which fraud could have occurred. Compare this to a leaked provider key, where the attacker can call any model on any tenant the gateway serves. Wrong-answer notes: "we'd handle it" without describing the bounded blast misses the point of the chapter.
What to do differently after reading this
- Confirm provider keys exist only in the gateway's secrets manager. Scan the rest of the company for any other copies.
- Build the gateway-credential issuance flow if it does not exist. Every caller registers; every call carries a scoped credential.
- Implement dual-key rotation for provider keys. Test it once before you need it.
- Schedule a quarterly credential tightening review.
- Log credential usage in the audit; alarm on anomalies (credentials calling unexpected aliases, tenants, or environments).
Bridge. Credentials authorise the call. Cost determines whether it should be made. The next chapter builds cost attribution and budgets — how the gateway turns provider invoices into per-tenant, per-feature, per-agent visibility, and how budgets enforce limits before the bill arrives. → 07-cost-attribution-and-budgets.md