09. Consistent hashing — spread keys across nodes without moving everything¶
⏱️ Estimated time: 16 min | Level: advanced
ELI5 callback: The town crier pins one notice on the bulletin board. The board rules decide delivery, and the town directory helps readers find it.
Why plain modulo hashing hurts during cluster changes¶
Plain modulo hashing looks fine until the node count changes. Add one node, and huge amounts of data move. See. That means cache misses, long rebalances, and angry operators. Consistent hashing reduces movement when membership changes. It maps both nodes and keys onto a ring. Then each key walks clockwise to its owner. Diagram:
keys -> hash -> ring position
node count change -> modulo remaps many keys
consistent ring -> only nearby ranges move
+-----+-----+-----+-----+
| n1 | n2 | n3 | n4 |
+-----+-----+-----+-----+
- Hash customer:42 to a point on the ring.
- Walk clockwise until a node token appears.
- That node owns the key.
- Add one new node and only adjacent ranges shift.
- The rest of the cluster stays mostly stable.
- Stability matters as much as even spread.
- Rebalance cost should be bounded and predictable.
- Cache systems benefit from fewer key migrations.
- Storage systems benefit from calmer recovery behavior. Less movement means fewer self-inflicted outages.
The ring becomes a routing map for every key¶
Think of the ring as a sorted circle of positions. Each server owns the arc until the next server. A routing town directory maps every key to the next clockwise node. Lookup stays cheap when token lists are sorted. This works for caches, partitions, and storage shards. Now watch. Only nearby ownership changes when a node joins or leaves. Diagram:
+------------------------------+
| 10 -> n1 40 -> n2 80 n3 |
| key 55 walks to n3 |
| key 12 walks to n2 |
| key 95 wraps to n1 |
+------------------------------+
- Store node tokens in sorted order.
- Hash the incoming key once.
- Binary-search the first token greater than the key.
- Wrap to the first token if needed.
- Return that owner and its replicas.
- Deterministic lookup keeps clients consistent.
- Sorted token lists make routing fast.
- Wrap-around behavior must be tested carefully.
- Ring epochs prevent stale maps from lingering silently. A simple lookup rule saves huge operational pain later.
Node churn is where the ring earns its salary¶
Node churn is where consistent hashing earns its salary. A town crier announces joins, leaves, and failures. That update lands on the bulletin board for every router. Routers then refresh the ring view using the new epoch. Only adjacent ranges move to new owners. This keeps rebalance bounded and predictable. Simple, no? Diagram:
join n4 -> take slice from clockwise neighbor
leave n2 -> clockwise neighbor absorbs that slice
fail n3 -> replicas serve until repair finishes
routers compare epoch before accepting ownership map
stale routers reject old maps loudly
bounded movement keeps the cluster usable
- Add node n4 with its tokens.
- Compute only the ranges n4 should now own.
- Stream those ranges from current owners.
- Update routers after the new map is durable.
- Remove old owners only after handoff completes.
- Membership changes need epochs or versions.
- Rebalance should be incremental, not chaotic.
- Reads during movement need a clear ownership rule.
- Automation should still expose what changed. Calm rebalancing is the real product feature here.
Virtual nodes and replicas smooth out unfair load¶
Real traffic is uneven, so one token per node is rarely enough. Virtual nodes spread ownership more evenly across the ring. Replica count and clockwise placement become board rules. More virtual nodes usually improve balance. Too many virtual nodes can slow management operations. Replication also needs failure domains, not just raw counts. A good design balances fairness with operator sanity. Diagram:
n1 -> tokens 5, 35, 65
n2 -> tokens 15, 45, 75
n3 -> tokens 25, 55, 85
key 52 -> n3 primary -> n1 replica -> n2 replica
spread improves when tokens are interleaved
racks must differ when choosing replicas
- Give each node many small token ranges.
- Measure ownership spread after placement.
- Replicate clockwise to the next distinct failure domains.
- Repair missing replicas after failures.
- Keep token count low enough to manage comfortably.
- Virtual nodes fight skew.
- Replication must respect racks or zones.
- Repair traffic can dominate if placement is sloppy.
- Balance metrics should be visible on every dashboard. Even spread is engineered, not wished into existence.
Hot keys, stale maps, and operations still matter a lot¶
Consistent hashing does not magically fix hot keys. One celebrity key can still burn one partition. Each rebalance notice should carry epoch and owner metadata. That way stale routers can reject outdated maps. You still need caches, batching, and skew monitoring. So what to do? Measure key popularity before blaming the ring. Diagram:
hot key -> one owner -> one hot CPU
skew alert -> inspect key distribution and traffic shape
epoch 17 map accepted, epoch 16 map rejected
batch small moves during peak hours carefully
keep tooling for drain, repair, and rollback ready
ring math is only half the operating story
- Track top keys by traffic and size.
- Compare real ownership against expected ownership.
- Alert when one range overheats repeatedly.
- Use mitigation like key splitting or local caching.
- Rehearse map rollback before the real incident.
- Good hashing cannot erase bad access patterns.
- Operational metadata must travel with ring changes.
- Traffic skew deserves first-class monitoring.
- Rollback plans save you when automation gets excited. The ring is clever, but operators still need plain truth.
Where this lives in the wild¶
- Amazon Dynamo and Dynamo-style key-value systems using ring placement.
- Apache Cassandra token rings distributing partitions across clusters.
- Riak deployments leaning on vnode-style ownership ranges.
- Memcached client libraries using ketama-style consistent hashing.
- Envoy ring-hash load balancing for stable request routing.
Pause and recall¶
- Why does adding one node break plain modulo hashing so badly?
- How does the clockwise lookup rule reduce remapping?
- Why are virtual nodes useful even on equal hardware?
- What extra metadata protects routers during membership changes?
Interview Q&A¶
Q: What problem does consistent hashing solve best? A: It minimizes key movement when nodes join or leave while keeping lookup deterministic. Common wrong answer to avoid: "It guarantees perfectly equal load under every traffic pattern."
Q: Why use virtual nodes? A: They smooth ownership imbalance and make heterogeneous clusters easier to tune. Common wrong answer to avoid: "They are only for academic elegance and change nothing practical."
Q: Does consistent hashing remove hot partitions? A: No, because skewed traffic or giant keys can still overload whichever node owns that range. Common wrong answer to avoid: "Yes, the ring automatically spreads every hot key across the cluster."
Q: Why attach epochs to ring maps? A: They let routers reject stale ownership views during joins, leaves, and recovery. Common wrong answer to avoid: "Routers can just trust whichever map they saw first today."
Apply now (5 min)¶
Draw a ring for four nodes with three virtual tokens each. Hash five sample keys onto that ring. Then add one new node and mark only the moved ranges. Choose a replica count and place the replicas clockwise. Write one metric that would reveal a hot partition early. Add one epoch field to the ring map design. Now watch the rebalance become measurable instead of magical.
Bridge. Consistent hashing distributes load. But when nodes disagree on who's leader, what then? → 10