05. Storage in Kubernetes: Data should outlive the pod
⏱️ Estimated time: 23 min | Level: intermediate
ELI5 callback: Think of a busy shipping port. The dock manager must place every container on the right ship. Heavy ML work needs a cargo crane, and port security keeps lanes and permissions clean.
Pods are temporary, so storage must be explicit
A pod can be rescheduled at any time, so local writable state is fragile. An emptyDir volume and the container filesystem are fine for scratch work, not for durable records: the container's writable layer is lost when the container restarts, and an emptyDir is deleted when the pod leaves its node. Keep the analogy close: the dock manager reads the manifest, the container carries one workload unit, the ship offers capacity, the cargo crane handles ML-heavy lifts, and port security blocks unsafe access. Persistent data needs an explicit storage contract that outlives pod replacement. Start by deciding what must survive a restart.
state choices
┌────────────┐    ┌────────────┐    ┌────────────┐
│   pod fs   │ -> │  scratch   │ -> │ persistent │
└─────┬──────┘    └─────┬──────┘    └─────┬──────┘
      │                 │                 │
      v                 v                 v
  ephemeral    |   cache space   |    real data
Not all writes deserve durability.
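One way to make the "what must survive" decision concrete is a small lookup table. The sketch below is a simplified model, not Kubernetes code: the event names are labels invented for this example, and it assumes a network-attached persistent volume that can follow the pod to another node.

```python
# Simplified model of which storage locations keep data across lifecycle events.
# Event names are labels invented for this sketch, not Kubernetes API values.
# Assumption: the persistent volume is network-attached, so it can follow
# the pod to a different node.
SURVIVES = {
    "container_fs": set(),                      # writable layer, lost on container restart
    "emptyDir": {"container_restart"},          # lives as long as the pod stays on its node
    "persistent_volume_claim": {"container_restart", "pod_rescheduled"},
}


def survives(location, event):
    """Return True if data written to `location` is expected to survive `event`."""
    return event in SURVIVES[location]


def safe_locations(event):
    """List the storage locations whose data survives `event`, sorted by name."""
    return sorted(loc for loc, events in SURVIVES.items() if event in events)


if __name__ == "__main__":
    for event in ("container_restart", "pod_rescheduled"):
        print(event, "->", safe_locations(event))
```

Anything that must survive a reschedule ends up in the last row; everything else is a candidate for scratch space.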
PersistentVolumes and claims split supply from demand
A PersistentVolume (PV) describes a piece of available storage: its capacity, access modes, and backing system. A PersistentVolumeClaim (PVC) expresses what the workload wants to consume. That split lets app teams request storage without hand-wiring devices. A StorageClass adds the policy layer that governs how volumes are provisioned. Separate demand from supply so platforms can evolve safely.
pv claim flow
┌────────────┐    ┌────────────┐    ┌────────────┐
│   claim    │ -> │   class    │ -> │   volume   │
└─────┬──────┘    └─────┬──────┘    └─────┬──────┘
      │                 │                 │
      v                 v                 v
   app asks    |   policy picks  |  storage binds
Claims talk intent, volumes provide reality.
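To see the supply and demand split in data, here is a minimal sketch that builds a claim as a plain dict and checks it against a volume. The claim name, the "standard" class name, and the sizes are hypothetical, and `can_bind` is only a rough approximation of the matching rules (size, access modes, class); the real binding logic in Kubernetes considers more than this.

```python
import json


def make_pvc(name, size_gi, access_mode="ReadWriteOnce", storage_class="standard"):
    """Build a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


def _gi(quantity):
    """Parse a quantity such as '10Gi'. Only this one form is handled here."""
    if not quantity.endswith("Gi"):
        raise ValueError(f"unsupported quantity in this sketch: {quantity}")
    return int(quantity[:-2])


def can_bind(pv, pvc):
    """Approximate check: the volume must be big enough, offer the requested
    access modes, and belong to the same storage class as the claim."""
    have, want = pv["spec"], pvc["spec"]
    size_ok = _gi(have["capacity"]["storage"]) >= _gi(want["resources"]["requests"]["storage"])
    modes_ok = set(want["accessModes"]) <= set(have["accessModes"])
    class_ok = have.get("storageClassName") == want.get("storageClassName")
    return size_ok and modes_ok and class_ok


if __name__ == "__main__":
    # Trimmed volume: a real PV also needs a volume source (for example a CSI block).
    pv = {"spec": {"capacity": {"storage": "20Gi"},
                   "accessModes": ["ReadWriteOnce"],
                   "storageClassName": "standard"}}
    pvc = make_pvc("orders-db-data", 10)   # hypothetical claim name
    print(json.dumps(pvc, indent=2))       # kubectl also accepts JSON manifests
    print("can bind:", can_bind(pv, pvc))
```

The claim never names a disk or a backend. It states size, access mode, and class, and leaves the rest to the platform.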
CSI drivers automate provisioning and attachment
CSI (Container Storage Interface) is the standard interface between Kubernetes and storage systems. It lets vendors build one integration model instead of custom hacks. Dynamic provisioning means a volume is created on demand when a claim is created or, if the StorageClass uses WaitForFirstConsumer binding, when the first pod using the claim is scheduled. That removes ticket-driven storage work for common paths. Provisioning should feel boring when the control plane is healthy.
csi path
┌────────────┐    ┌────────────┐    ┌────────────┐
│   claim    │ -> │    csi     │ -> │    disk    │
└─────┬──────┘    └─────┬──────┘    └─────┬──────┘
      │                 │                 │
      v                 v                 v
   request     |     provision   |   attach mount
Storage arrives through a controller workflow.
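The policy side of that workflow lives in a StorageClass. The sketch below builds one as a dict and reports when provisioning would start for a given binding mode. The class name, the provisioner string `csi.example.com`, and the `type: ssd` parameter are placeholders; real values depend on the CSI driver installed in the cluster.

```python
import json


def make_storage_class(name, provisioner, binding_mode="WaitForFirstConsumer",
                       reclaim_policy="Delete", parameters=None):
    """Build a StorageClass manifest as a plain dict."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,            # name of the CSI driver
        "parameters": parameters or {},        # driver-specific, opaque to Kubernetes
        "reclaimPolicy": reclaim_policy,       # "Delete" or "Retain"
        "volumeBindingMode": binding_mode,
        "allowVolumeExpansion": True,
    }


def provisioning_trigger(storage_class):
    """Describe when a volume gets provisioned for claims using this class."""
    mode = storage_class.get("volumeBindingMode", "Immediate")  # API default
    if mode == "Immediate":
        return "when the claim is created"
    if mode == "WaitForFirstConsumer":
        return "when the first pod using the claim is scheduled"
    raise ValueError(f"unknown volumeBindingMode: {mode}")


if __name__ == "__main__":
    # Hypothetical class name, driver name, and parameter.
    sc = make_storage_class("fast-ssd", "csi.example.com", parameters={"type": "ssd"})
    print(json.dumps(sc, indent=2))
    print("provisioned", provisioning_trigger(sc))
```

Delaying provisioning until a pod is scheduled lets the volume be created in a location the pod can actually reach, at the cost of a slower first start.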
Stateful workloads need identity and careful rollout
Databases, brokers, and some ML stores need a stable identity per replica. StatefulSets provide that: each pod gets a stable, ordinal-based name, and with volumeClaimTemplates each replica keeps its own PersistentVolumeClaim across restarts. Rolling out stateful systems needs more care than replacing stateless web pods. Data layout and leader election often matter more than YAML shape. Stateful does not mean impossible; it means less careless.
stateful set
┌────────────┐    ┌────────────┐    ┌────────────┐
│  ordinal   │ -> │   volume   │ -> │  restart   │
└─────┬──────┘    └─────┬──────┘    └─────┬──────┘
      │                 │                 │
      v                 v                 v
 stable name   |     same data   |   ordered move
Identity stays attached to storage.
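The naming pattern behind that stable identity is easy to sketch. The helpers below assume the default settings (ordinals starting at 0, ordered pod management, the RollingUpdate strategy); the StatefulSet name `kafka` and the claim template name `data` are made up for illustration.

```python
def pod_names(statefulset, replicas):
    """Pods are named <statefulset>-<ordinal> and, by default, created in this order."""
    return [f"{statefulset}-{i}" for i in range(replicas)]


def pvc_names(statefulset, replicas, claim_template):
    """Each replica gets a claim named <template>-<statefulset>-<ordinal>."""
    return [f"{claim_template}-{statefulset}-{i}" for i in range(replicas)]


def rolling_update_order(statefulset, replicas):
    """A rolling update replaces pods from the highest ordinal down to 0."""
    return list(reversed(pod_names(statefulset, replicas)))


if __name__ == "__main__":
    # "kafka" and "data" are hypothetical names.
    print(pod_names("kafka", 3))
    print(pvc_names("kafka", 3, "data"))
    print(rolling_update_order("kafka", 3))
```

Because the claim name is derived from the pod's ordinal, a replaced `kafka-1` reattaches to `data-kafka-1` rather than to a fresh, empty volume.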
Backups, recovery, and cost need equal attention
Persistent storage is only useful when you can recover data safely. Snapshots, backups, and restore drills prove whether durability is real. Fast storage tiers are attractive until the monthly bill appears. Retention policy is part of storage design, not a legal afterthought. Recovery time objective (RTO) and recovery point objective (RPO) should guide storage choices.
recovery view
┌────────────┐    ┌────────────┐    ┌────────────┐
│   write    │ -> │   backup   │ -> │  restore   │
└─────┬──────┘    └─────┬──────┘    └─────┬──────┘
      │                 │                 │
      v                 v                 v
  live data    |  snapshot copy  |   usable state
A backup is theory until restore works.
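A quick arithmetic check can show whether a backup schedule and a restore drill actually meet their targets. This is a back-of-the-envelope sketch: it assumes the worst-case data loss is roughly one backup interval, and every number and price in the usage example is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    backup_interval_min: float    # time between successful backups
    measured_restore_min: float   # restore duration observed in the last drill
    rpo_target_min: float         # maximum tolerable data loss, in minutes
    rto_target_min: float         # maximum tolerable downtime, in minutes

    def worst_case_data_loss_min(self):
        # A failure just before the next backup loses about one interval of writes.
        return self.backup_interval_min

    def problems(self):
        """Return a list of targets this plan misses (empty if none)."""
        issues = []
        if self.worst_case_data_loss_min() > self.rpo_target_min:
            issues.append("backup interval exceeds RPO target")
        if self.measured_restore_min > self.rto_target_min:
            issues.append("measured restore time exceeds RTO target")
        return issues


def monthly_cost(size_gib, price_per_gib_month,
                 snapshot_gib=0.0, snapshot_price_per_gib_month=0.0):
    """Volume cost plus retained snapshot cost for one month."""
    return size_gib * price_per_gib_month + snapshot_gib * snapshot_price_per_gib_month


if __name__ == "__main__":
    # Invented numbers: hourly backups, 45-minute restore, 15-minute RPO, 30-minute RTO.
    plan = RecoveryPlan(60, 45, 15, 30)
    print(plan.problems())
    # Invented prices per GiB-month.
    print(monthly_cost(100, 0.10, snapshot_gib=50, snapshot_price_per_gib_month=0.05))
```

The `measured_restore_min` field only has a value after a drill, which is the point: without one, the RTO check cannot be run at all.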
Where this lives in the wild
- StatefulSets power self-hosted brokers and search clusters when teams accept the operational load.
- ML platforms store checkpoints on persistent volumes during long training jobs.
- SaaS teams use dynamic provisioning so app teams request storage without tickets.
- Fintech systems pair Kubernetes volumes with strict backup and restore drills.
Pause and recall
- Why is pod-local storage unsafe for durable application data?
- How do PVs, PVCs, and StorageClasses divide responsibilities?
- What extra work does CSI remove for platform teams?
- Why do stateful systems need different rollout thinking than stateless APIs?
Interview Q&A
Q: Why do PVCs help app teams?
A: They let workloads request storage intent without embedding backend details everywhere. That keeps application manifests portable while platform policy evolves underneath.
Common wrong answer to avoid: “Because PVs are deprecated.”
Q: Why can dynamic provisioning still feel slow?
A: Real storage must be created, attached to the node, and mounted before the pod's containers can start. That workflow can take noticeable time, especially on fresh nodes or busy backends.
Common wrong answer to avoid: “Because Kubernetes storage is always slower than local disks.”
Q: Why are backups not enough without restore drills?
A: A backup file proves only that a copy exists somewhere. Recovery counts only when teams can restore usable state within the expected time.
Common wrong answer to avoid: “Because auditors demand extra paperwork.”
Q: Why are managed databases often still attractive?
A: They offload replication, backup, patching, and failure management work. Running stateful systems on Kubernetes is possible, but not automatically cheaper or easier.
Common wrong answer to avoid: “Because Kubernetes cannot run databases at all.”
Apply now (5 min)
List one workload you know and separate cache data from durable data. Choose a claim size, access mode, and storage class for it. Now write one sentence for its backup frequency and restore target. Decide whether StatefulSet or managed service feels saner. Finally, note one hidden cost in that storage decision.
Bridge. Storage attached. But how do containers talk securely? → 06-service-mesh-network-policy.md