07. Secrets and config management — Vault, parameter stores, and rotation

⏱️ Estimated time: 17 min | Level: intermediate

ELI5 callback: In the dragon farm, the barn runs the work, the feeding trough holds the data, the fence limits access, the breeding ground scales the herd, and the ledger stops waste. Today we separate sensitive secrets from plain configuration before trouble starts.

1) See the shape clearly

Vault, parameter stores, and environment configuration all matter here, but they do not relieve the same pressure. See. Start with workload shape, not vendor branding. Check startup time, runtime length, and host control. Check who patches the base layer. Check whether scale is steady or bursty. Check whether warm state must survive. Simple, no?

Secrets are sensitive values like API keys, passwords, and tokens. Configuration is broader: flags, endpoints, regions, and feature settings. Secret stores provide access control, audit, and rotation support. The trap is treating everything like an environment variable forever.

So what to do? Write the fit matrix before provisioning anything.

  • Prioritise the slowest or costliest path.
  • Measure idle time honestly.
  • Record operational ownership.
  • Record rollback method.
  • Record debugging path.
  • Record compliance limits.

Good teams choose boring defaults first. Fancy choices can wait.
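The secret versus config split described above can be written down as a small inventory before anything is provisioned. This is a minimal sketch under stated assumptions: the `Kind` enum, the `Entry` record, the `STORE_FOR` mapping, and every example name are hypothetical, not part of any vendor API.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    SECRET = "secret"            # API keys, passwords, tokens
    CONFIG = "config"            # flags, endpoints, regions, feature settings
    PUBLIC = "public constant"   # safe to publish


@dataclass(frozen=True)
class Entry:
    name: str
    kind: Kind
    owner: str  # operational ownership, recorded before provisioning


# Hypothetical destinations; substitute whatever your platform provides.
STORE_FOR = {
    Kind.SECRET: "secret manager",
    Kind.CONFIG: "parameter store",
    Kind.PUBLIC: "source code",
}


def fit_matrix(entries):
    """Return (name, kind, store, owner) rows, secrets first."""
    ordered = sorted(entries, key=lambda e: (e.kind is not Kind.SECRET, e.name))
    return [(e.name, e.kind.value, STORE_FOR[e.kind], e.owner) for e in ordered]


INVENTORY = [
    Entry("payments/region", Kind.CONFIG, "payments-team"),
    Entry("payments/api-key", Kind.SECRET, "payments-team"),
    Entry("app/version-label", Kind.PUBLIC, "platform-team"),
]

for row in fit_matrix(INVENTORY):
    print(" | ".join(row))
```

Sorting secrets to the top is only a reading aid: those rows are the ones that need access control, audit, and rotation decisions first.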

2) Read the decision signals

Use a secret manager for credentials, certificates, and signing material. Use parameter stores for non-secret configuration that still needs central control. Inject secrets at runtime instead of baking them into images. Rotate secrets with automation because humans forget. Scope secret access by workload identity, not by shared team account. Separate local development convenience from production discipline.

Now use thresholds, not feelings. If latency is sacred, keep readiness. If cost is sacred, chase utilisation carefully. If control is sacred, reduce abstraction. If delivery speed is sacred, buy managed pieces.

Quick decision prompts:

  • Which values are secrets, and which are only config?
  • Who or what needs each secret at runtime?
  • How often should each secret rotate?
  • What is the fallback if rotation fails?
  • Where will audit logs live?
  • How will developers test safely?

See. One clear 'no' can eliminate a whole option. Trade-offs are normal. Document the fallback path. Now watch.
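Runtime injection, as opposed to baking values into an image, might look like the sketch below. It assumes a convention that is not part of this lesson: the platform mounts each secret as a file and passes its path in a `<NAME>_FILE` variable, while a plain `<NAME>` variable is accepted for local development. `read_secret` and `MissingSecret` are hypothetical names.

```python
import os
import tempfile
from pathlib import Path


class MissingSecret(RuntimeError):
    """Raised at startup so a misconfigured deploy fails fast."""


def read_secret(name, env=None):
    """Resolve a secret at runtime, never from a value baked into the image.

    Assumed convention: <NAME>_FILE points to a file mounted by the platform;
    <NAME> is a plain variable meant for local development only.
    """
    env = os.environ if env is None else env
    file_path = env.get(f"{name}_FILE")
    if file_path:
        return Path(file_path).read_text(encoding="utf-8").strip()
    value = env.get(name)
    if value:
        return value
    # No silent default: a missing secret should stop the process early.
    raise MissingSecret(f"{name} was not provided at runtime")


# Usage with a throwaway file standing in for a platform-mounted secret.
with tempfile.TemporaryDirectory() as tmp:
    mounted = Path(tmp) / "db_password"
    mounted.write_text("example-only-value\n", encoding="utf-8")
    fake_env = {"DB_PASSWORD_FILE": str(mounted)}
    print(len(read_secret("DB_PASSWORD", env=fake_env)))
```

The function deliberately has no fallback value. A hard failure at startup is easier to debug than a service that runs with an empty credential.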

3) Map the working path

The clean pattern is simple. Workload identity asks the manager for only what it needs. The application receives values just in time. Now watch the path.

  ┌────────────┐   ┌────────────┐   ┌────────────┐
  │    App     │──→│  Identity  │──→│ SecretMgr  │
  └────────────┘   └─────┬──────┘   └─────┬──────┘
                         │                │
                         ▼                ▼
                   ┌────────────┐   ┌────────────┐
                   │   Config   │   │   Audit    │
                   └────────────┘   └────────────┘

Identity comes first, because anonymous secret reads are a bad joke. Secret managers should return narrow values, not giant bags. Configuration can often be cached more aggressively than secrets. Audit logs should show who read or changed critical items. Rotation is safer when clients can reload without restart panic. Good teams classify values before writing any integration code.

At every arrow, ask who retries. At every box, ask who pays. At every store, ask what expires. Now watch. One metric should sit beside each box. That is how operations stays sane.
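The path above (identity first, narrow reads, an audit entry for every access) can be sketched with an in-memory stand-in. `ToySecretManager` is a hypothetical class for illustration only. It does not model any real product's API, and the identity and secret names are made up.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class AccessDenied(PermissionError):
    pass


@dataclass
class ToySecretManager:
    """In-memory stand-in for a real secret manager; for illustration only."""
    secrets: dict                      # secret name -> value
    policy: dict                       # workload identity -> set of allowed names
    audit: list = field(default_factory=list)

    def read(self, identity, name):
        allowed = name in self.policy.get(identity, set())
        # Record every attempt, including denied ones.
        self.audit.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "who": identity,
            "secret": name,
            "allowed": allowed,
        })
        if not allowed:
            raise AccessDenied(f"{identity} may not read {name}")
        return self.secrets[name]  # one narrow value, not the whole bag


manager = ToySecretManager(
    secrets={"billing/db-password": "example-a", "search/api-key": "example-b"},
    policy={"workload:billing": {"billing/db-password"}},
)

manager.read("workload:billing", "billing/db-password")
try:
    manager.read("workload:billing", "search/api-key")
except AccessDenied as err:
    print("denied:", err)

print([(e["who"], e["secret"], e["allowed"]) for e in manager.audit])
```

Denied reads are logged as well as successful ones, because a refused request is often the first visible sign of a misconfigured or compromised workload.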

4) Notice the common traps

  • Committing secrets into git, images, or notebook outputs.
  • Storing secrets in plain environment files on shared machines.
  • Using one global secret for many services.
  • Skipping rotation because the first deployment worked.
  • Letting applications fetch every secret in the namespace.
  • Mixing feature flags and credentials without labels.

See. Most outages start as silent assumptions. Review these traps before launch:

  • Leaked long-lived keys can survive for months.
  • Wide secret scopes can turn one bug into a broader incident.
  • Missing audit logs slow down investigations.
  • Rotation without reload logic can create outages.
  • Notebook outputs can expose secrets accidentally.
  • Manual copy-paste processes invite drift and mistakes.

Simple, no? Write failure drills for the top three risks. Decide what degrades first. Decide what must never degrade. Review quotas before launch day. Prefer explicit limits over wishful thinking. Now watch.
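The first trap, secrets committed into git or notebook outputs, is the kind of thing a pre-commit check can catch. Below is a minimal sketch with three illustrative regular expressions. The rule names and the `scan` helper are hypothetical, the patterns will miss many real formats, and a dedicated scanner is the better choice in practice.

```python
import re

# Illustrative patterns only; real scanners ship far larger rule sets.
PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-header": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "assigned-credential": re.compile(
        r"\b(?:password|secret|token|api_key)\b\s*[=:]\s*['\"][^'\"]{8,}['\"]",
        re.IGNORECASE,
    ),
}


def scan(text):
    """Return (line_number, rule_name) for each suspicious line."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, rule))
    return findings


# The key below is a documentation-style placeholder, not a live credential.
SAMPLE = '''region = "eu-west-1"
aws_key = "AKIAIOSFODNN7EXAMPLE"
password = "correct-horse-battery"
'''

for line_number, rule in scan(SAMPLE):
    print(f"line {line_number}: {rule}")
```

A check like this only reduces accidents. It does not replace rotation, because a key that has already been pushed should be treated as leaked.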

5) Lock the operating routine

  1. Make a list of secrets, configs, and public constants separately.
  2. Store secrets in a managed vault or cloud secret service.
  3. Deliver values at runtime using workload identity.
  4. Rotate high-risk secrets automatically and test the reload path.
  5. Log reads, writes, and policy changes.
  6. Teach developers how to use safe local substitutes.

Lock the language across the team. Use the same terms in code, dashboards, and reviews. Review this quick operating list:

  • Prefer short-lived credentials.
  • Name secrets by service and purpose.
  • Keep config keys documented.
  • Test rotation in staging.
  • Remove unused secrets fast.
  • Scan repos for accidental exposure.

Good platform design keeps the barn, feeding trough, fence, breeding ground, and ledger aligned. So what to do? Create a one-page runbook. Create a one-page cost note. Create a one-page rollback note. Teach the team the same words. That alignment saves real money.

See. Consistency beats cleverness. Benchmark first; opinions come second. Name the owner of every limit. Prefer reversible choices whenever the future is foggy. Document what changes during incidents. Keep one small default path for newcomers. Automate the boring thing as soon as it stabilises. Vendor docs help, but workload data matters more. Good naming prevents bad tickets. Observe p95, not only averages. Small runbooks beat heroic memory. Teach cost with the same seriousness as latency. Now watch how much confusion disappears.
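"Test the reload path" is easier to reason about with a concrete client in hand. Here is a minimal sketch of a cached secret that refetches after a time-to-live, so a rotated value is picked up without a restart. `ReloadingSecret`, its `fetch` callable, and the 300-second default are assumptions made for illustration.

```python
import time


class ReloadingSecret:
    """Cache one secret and refetch it after `ttl` seconds.

    `fetch` is a hypothetical callable that returns the current value from
    whatever secret store is in use.
    """

    def __init__(self, fetch, ttl=300.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if stale:
            try:
                self._value = self._fetch()
                self._fetched_at = now
            except Exception:
                # Fallback if the store is unreachable: keep serving the last
                # known value, but only if there is one.
                if self._fetched_at is None:
                    raise
        return self._value

    def invalidate(self):
        """Force a refetch, e.g. after the backend rejects the credential."""
        self._fetched_at = None


# Usage with a dict standing in for the store and a fake clock.
store = {"db-password": "v1"}
now = [0.0]
secret = ReloadingSecret(lambda: store["db-password"], ttl=60, clock=lambda: now[0])

print(secret.get())          # first read fetches from the store
store["db-password"] = "v2"  # rotation happens in the store
print(secret.get())          # still served from cache
now[0] = 61.0
print(secret.get())          # TTL elapsed, so the new version is picked up
```

The trade-off sits in `ttl`: a short value picks up rotations quickly but loads the store, while a long value means the old secret must stay valid for at least that long after rotation.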

Where this lives in the wild

  • HashiCorp Vault with dynamic credentials. Strong example when teams want central policy, leasing, and rotation.
  • AWS Secrets Manager and SSM Parameter Store. Common split for sensitive values versus general runtime config.
  • Google Secret Manager plus service accounts. Useful for clean workload-based access in GCP environments.
  • Azure Key Vault with managed identities. Common enterprise pattern for secret retrieval and certificate handling.
  • Kubernetes External Secrets or CSI secret drivers. Shows how cluster workloads can consume managed secrets safely.

Pause and recall

  1. What is the difference between a secret and ordinary config? Say it without looking up vendor names.
  2. Why should secrets be injected at runtime? Give one concrete example.
  3. What makes rotation safe instead of scary? State the trade-off in one line.
  4. Why is workload identity better than shared credentials? Mention one failure mode too.

Interview Q&A

Q. How should applications receive secrets?
A. Applications should use workload identity to fetch only required secrets at runtime.
Common wrong answer to avoid: Put them in environment variables during image build.
Better direction: Explain why build-time injection leaks and ages badly.

Q. Why rotate secrets automatically?
A. Because risk grows with time, and humans are unreliable clocks.
Common wrong answer to avoid: Rotation is only needed after a known leak.
Better direction: Tie rotation to reduced blast radius and audit confidence.

Q. What belongs in a parameter store versus a secret manager?
A. Non-sensitive runtime settings fit parameter stores; sensitive credentials fit secret managers.
Common wrong answer to avoid: Anything text-based can go in either.
Better direction: Mention access control, audit, and sensitivity.

Q. What should local development do?
A. Use safe substitutes, short-lived developer credentials, and clear separation from production material.
Common wrong answer to avoid: Copy production secrets into a local file once.
Better direction: Show discipline without blocking productivity.

Apply now (5 min)

  1. Write five values your app uses at startup.
  2. Mark each one as secret, config, or public constant.
  3. Choose where each value should live.
  4. Choose how the app receives each value.
  5. Pick one secret to rotate monthly.
  6. Write how the app reloads it safely.
  7. Write one audit log you need.
  8. Delete one unsafe habit from your current flow.
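For step 5, a tiny check can flag secrets that have outlived their rotation window. This is a sketch, assuming "monthly" means 30 days. The `overdue` helper, the secret names, and the dates are all made up for the example.

```python
from datetime import date, timedelta


def overdue(last_rotated, max_age, today):
    """Return names whose last rotation is older than the allowed age."""
    return sorted(
        name for name, rotated_on in last_rotated.items()
        if today - rotated_on > max_age
    )


# Hypothetical inventory; the dates are invented for illustration.
LAST_ROTATED = {
    "billing/db-password": date(2024, 1, 5),
    "search/api-key": date(2024, 2, 20),
}

print(overdue(LAST_ROTATED, timedelta(days=30), today=date(2024, 3, 1)))
```

Run on a schedule, a report like this turns "humans forget" into a visible list with an owner.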

Bridge. Secrets safe. But dragons are expensive. How to control cost? → 08