10. Consistency and Replication — Which warehouse tells the truth¶

~13 min read. Same item, many copies, one user, one answer must win.

Built on the ELI5 in 00-eli5.md. The warehouse — the storage layer — matters more when many warehouses hold the same goods and one copy is newer.

1) When copies drift apart¶

See. One warehouse gets the new value first. Another warehouse is still old. The user reads from the old one and says, "But I just updated this." That is a consistency problem.

client write ──→ ┌──────────────┐
                 │ warehouse A  │ version 42
                 └──────┬───────┘
                        │ replicate later
                        ▼
                 ┌──────────────┐
                 │ warehouse B  │ version 41
                 └──────┬───────┘
                        │ replicate later
                        ▼
                 ┌──────────────┐
                 │ warehouse C  │ version 41
                 └──────────────┘

Replication exists for good reasons. You want lower read latency. You want backup copies. You want one machine loss to not wipe data. You want overflow lanes for storage traffic too.

But copies create a new question. Which copy is truth right now? If all copies were instant, life is easy. In real systems, networks delay. Disks pause. Regions partition. So agreement takes time.

Now what is the practical mistake? People say, "Just keep everything consistent." Good sentence. Expensive sentence. Sometimes impossible during a partition. Sometimes slower than the business allows. So you choose where to be strict.

Two user experiences matter most. First, read-your-write. After I update my profile photo, I should see it. Second, cross-user freshness. If I book the last seat, nobody else should also book it. The second is much harder.

2) Three common replication styles¶

Leader-follower¶

One node is the leader. All writes go there first. Followers copy the leader log later. Simple, no?

write ──→ leader ──→ follower 1
              │
              └────→ follower 2
read  ──→ leader or followers

This model makes one warehouse the cashier of truth. Order is easy because one writer serializes changes. Reads scale nicely on followers. Operational thinking is also simpler.

But the pain is obvious. If the leader dies, you need failover. If followers lag, users read stale data. If the leader region is far away, write latency increases.

Good fit: account settings, product catalog, content metadata, systems where one write path is acceptable.

Multi-leader¶

Now multiple leaders accept writes. Each region has a local writer. Then leaders replicate to each other.

region A leader ──┬──→ region B leader
                  └──→ region C leader
region B leader ──┬──→ region A leader
                  └──→ region C leader

This helps when users are globally distributed. A user in Mumbai writes locally. A user in Frankfurt writes locally. Latency improves. Regional isolation improves too.

But conflicts arrive. If two leaders update the same row differently, who wins? Last-write-wins is simple. Also dangerous. A later clock timestamp may erase the better business value. So multi-leader is for data that can merge safely, or for domains where temporary conflicts are acceptable.

Good fit: user preferences, shopping carts, draft documents, multi-region writable tables with clear conflict rules.

Leaderless¶

Here there is no single leader. The client or coordinator writes to multiple replicas directly. Reads also query multiple replicas. This is where quorum enters.

client
  ├──→ replica A
  ├──→ replica B
  └──→ replica C

This model keeps the system writable even if one replica is down. Very attractive for high availability. But now reconciliation is your daily job. Version vectors, timestamps, quorum math, read repair, anti-entropy, all become first-class.

Good fit: availability-first key-value workloads, shopping carts, session-like data, systems that tolerate brief divergence.

3) Consistency models you will actually discuss¶

Strong consistency means the latest successful write is immediately visible. You read after the write. You get the new value. No guessing. One truth.

That is the cleanest mental model. It is also the most coordination-heavy. So use it where wrong reads are expensive. Bank balance. Seat inventory. Payment state. Quota enforcement.

Eventual consistency means replicas may disagree briefly, but they converge if no new writes arrive. This sounds scary the first time. In practice, many product surfaces survive it well. Like counts. Feed ranking metadata. Recently viewed items. Playback progress with small delay tolerance.

Causal consistency sits in between. It says related events must appear in order. If comment B replies to comment A, you should not see B before A. If I update my profile name and then post, others should not see the post with the old name forever. Cause should follow cause.

See the ladder.

strong     ── latest write always visible
causal     ── related writes keep order
eventual   ── replicas may differ, later converge

So which one should you choose? Ask one question. What user lie is unacceptable? If "you may see old data for five seconds" is okay, eventual can work. If "you may buy the same last ticket twice" is not okay, you need something stronger.

Do not discuss consistency as philosophy. Discuss it as user harm. That makes your HLD answer sharp.

4) Quorums and CAP in practice¶

Worked example now. Suppose you keep user session state on three replicas. So replication factor N = 3.

You choose write quorum W = 2. You choose read quorum R = 2. Because R + W = 4, and 4 > N, a read and a write should overlap on at least one replica.

Start state: replica A = version 41 replica B = version 41 replica C = version 41

A user updates theme from light to dark. The coordinator sends the write to all three replicas.

Step 1: A stores version 42. B stores version 42. C is slow and times out.

Step 2: The write still succeeds. Why? Because acknowledgments received = 2. Required W = 2. So success = yes.

Current state becomes: A = 42 B = 42 C = 41

Now the next read arrives with R = 2.

Step 3: The coordinator asks A, B, and C. Suppose C replies first with 41. A replies next with 42. B is slightly delayed.

Step 4: The coordinator already has two replies. Those are C=41 and A=42. Now what to do? Pick the highest version seen. Return 42 to the client.

Step 5: The coordinator notices C is stale. So it can trigger read repair. C is updated from 41 to 42. All replicas converge again.

Simple math summary: N = 3 W = 2 R = 2 R + W = 4 4 > 3 Therefore read and write sets overlap.

Now what if you choose W = 1 and R = 1? Fast. Cheap. Also risky. A write can land on only C. A read can hit only A. Then the user sees stale data. That is the price of low coordination.

Now CAP, but practical. During a network partition, you usually cannot keep all three: perfect consistency, full availability, and partition tolerance. Partition tolerance is not optional on real networks. So the real choice is often: prefer consistency during the partition, or prefer availability during the partition.

Use CP when wrong reads create real business damage. Example: seat booking, wallet balance, inventory decrement, permission enforcement.

Use AP when some staleness is better than downtime. Example: social likes, cart contents, profile views, analytics counters.

The practical interview line is this: "We pick the weakest consistency that the product can safely tolerate." That sounds mature because it is. You are not worshipping a theorem. You are minimizing user harm and operational pain.

Where this lives in the wild¶

GitHub repository metadata — MySQL primary-replica layouts serialize writes on a primary and let replicas absorb read-heavy page loads.
Amazon shopping cart systems — Dynamo-style leaderless replication keeps cart adds available even when one replica is slow or unreachable.
DynamoDB Global Tables — multi-region leaders keep each region writable for user profile or session tables, then resolve conflicts asynchronously.
Netflix playback state and service metadata — replicated Cassandra clusters use tunable consistency so reads stay fast while copies converge.
Google Spanner-backed inventory or booking flows — stronger consistency is chosen when selling the same scarce item twice is unacceptable.

Pause and recall¶

Why does replication improve availability but complicate truth?
When is leader-follower simpler than multi-leader?
In the quorum example, why did version 42 still win even though C replied with 41?
Which product surfaces can usually tolerate eventual consistency, and which usually cannot?

Interview Q&A¶

Q: Why choose leader-follower and not leaderless for a user account service? A: Account data often wants ordered writes, simpler failover, and easier debugging. Leader-follower gives one write sequence and fewer conflict rules. Common wrong answer to avoid: "Leaderless is always better because there is no single point of failure" — it removes one bottleneck but adds conflict handling, quorum tuning, and harder reasoning.

Q: Why choose eventual consistency and not strong consistency for a social like counter? A: A briefly stale like count is usually acceptable, while extra coordination would raise latency and lower availability for little business gain. Common wrong answer to avoid: "Because likes do not need correctness" — they still need convergence; the system is relaxing freshness, not correctness forever.

Q: Why use quorum reads and writes instead of reading from one nearest replica? A: Quorums reduce stale reads by forcing overlap between write acknowledgments and read responses. One nearest replica is faster, but freshness becomes luck. Common wrong answer to avoid: "Quorum guarantees zero stale reads in all situations" — sloppy quorums, clock issues, and failed repairs can still create anomalies.

Q: Why choose CP and not AP for inventory deduction? A: Overselling the last item creates refunds, customer anger, and ledger repair work. Brief unavailability is often cheaper than incorrect stock state. Common wrong answer to avoid: "CAP says modern systems must pick CP for every serious workload" — many serious workloads are intentionally AP where staleness is safer than downtime.

Apply now (5 min)¶

Exercise: Design a three-replica product catalog. Mark which fields can be eventual, which must be strong, and why. Then choose one replication style for each: price, inventory count, product description, review count.

Sketch from memory: Draw three warehouses holding versions 41, 42, and 41. Show a read quorum of 2 and a write quorum of 2. Then annotate where read repair happens.

Bridge. Data is consistent enough for the business now. But copies and quorums do not save you when a warehouse burns down or a road gets flooded. → 11-failure-modes-and-resilience.md