10. Concurrency Patterns — many hands, one pipe room

~15 min read. When work overlaps, timing becomes part of your design.

Built on the ELI5 in 00-eli5.md. The plumbing — shared flow touched by many workers — needs rules so parallel rooms do not crack the same hallway at once.


1) Thread pools and async/await solve different waiting shapes

Concurrency does not begin when you write Thread. It begins when tasks can overlap in time. Some overlap is CPU-heavy. Some overlap is just waiting on network or disk. See. Those cases need different tools.

A quick picture helps.

Incoming work
   ├─▶ CPU-bound tasks ──▶ thread pool
   └─▶ I/O-bound waits  ──▶ async/await

A thread pool limits active workers. This avoids creating unbounded fresh threads during bursts. It also makes memory and scheduling costs predictable.

ExecutorService pool = Executors.newFixedThreadPool(8);
pool.submit(() -> thumbnailService.generate(imageId));

Worked example. Suppose one thumbnail job burns 180 ms of CPU. If 64 uploads arrive together on an 8-core machine, a pool of 8 or 12 workers is sensible. With 8 workers the batch needs about 64 × 180 ms ÷ 8 = 1,440 ms at best. Creating 64 hot threads cannot beat that number; it only adds context switching. More workers do not magically create more cores. Simple, no?
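To see the bound in action, here is a minimal Python sketch of the same idea. It assumes a hypothetical generate_thumbnail function standing in for thumbnailService.generate, and it replaces the 180 ms of CPU work with a short sleep. One caveat: in standard CPython, threads do not run Python bytecode in parallel, so truly CPU-bound work would usually go to a process pool instead. The sketch only shows that a fixed pool never runs more than its configured number of jobs at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 8   # matches the 8-core machine in the worked example
UPLOADS = 64    # burst size from the worked example

active = 0      # jobs running right now
peak = 0        # highest value `active` ever reached
guard = threading.Lock()


def generate_thumbnail(image_id: int) -> str:
    """Hypothetical stand-in for thumbnailService.generate(imageId)."""
    global active, peak
    with guard:
        active += 1
        peak = max(peak, active)
    time.sleep(0.005)  # placeholder for the real CPU work
    with guard:
        active -= 1
    return f"thumb-{image_id}"


# All 64 jobs are submitted at once, but only POOL_SIZE run at any moment.
with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    thumbnails = list(pool.map(generate_thumbnail, range(UPLOADS)))

print(len(thumbnails), "thumbnails, peak concurrency", peak)
```

The peak counter never exceeds POOL_SIZE, however large the burst. The remaining jobs simply wait in the pool's internal queue.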

Async/await is different. It shines when the task mostly waits. Network calls, file reads, and database round-trips are classic cases. The code looks sequential, but the thread is not blocked the whole time.

const [price, inventory] = await Promise.all([
  pricingClient.fetch(sku),
  inventoryClient.fetch(sku),
]);

Worked numbers again. Suppose the price call takes 90 ms and the inventory call takes 110 ms. Sequential waiting costs roughly 200 ms plus overhead. Concurrent awaiting can finish near 110 ms, the duration of the slower call. That is a real latency reduction without spawning extra threads.
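The same numbers can be reproduced with a small asyncio sketch. The two fetch functions are hypothetical stand-ins for the pricing and inventory clients, and the sleeps simulate the 90 ms and 110 ms round-trips; the returned values are made up for illustration.

```python
import asyncio
import time


async def fetch_price(sku: str) -> float:
    """Hypothetical pricing call; the sleep stands in for a 90 ms round-trip."""
    await asyncio.sleep(0.09)
    return 499.0


async def fetch_inventory(sku: str) -> int:
    """Hypothetical inventory call; the sleep stands in for a 110 ms round-trip."""
    await asyncio.sleep(0.11)
    return 12


async def load_product(sku: str):
    started = time.perf_counter()
    # Both waits are in flight at once, so the total tracks the slower call.
    price, inventory = await asyncio.gather(fetch_price(sku), fetch_inventory(sku))
    elapsed = time.perf_counter() - started
    return price, inventory, elapsed


price, inventory, elapsed = asyncio.run(load_product("sku-1"))
print(f"price={price} inventory={inventory} elapsed={elapsed * 1000:.0f} ms")
```

The printed elapsed time should land close to 110 ms rather than 200 ms, and all of it happens on a single thread.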

So what to do? Use thread pools for bounded execution. Use async/await for efficient waiting. Do not call one “modern” and the other “old.” They solve different bottlenecks in the same plumbing.

2) Mutexes, semaphores, and read-write locks guard shared mutable state

If two workers touch shared mutable data, coordination begins. Without it, correctness depends on lucky timing. That is never a strong business strategy.

Mutex

A mutex allows one entrant into a critical section. Use it when one update must be fully isolated.

Thread A ──lock──▶ [ wallet update ] ──unlock
Thread B ──wait──────────────────────────────┘

lock.lock();
try {
    wallet.debit(amount);
} finally {
    lock.unlock();
}
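For comparison, a minimal Python sketch of the same pattern follows. The Wallet class is hypothetical, and the with statement plays the role of the try/finally above: the lock is released even if the body raises.

```python
import threading


class Wallet:
    """Hypothetical wallet whose balance is shared mutable state."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._lock = threading.Lock()

    def debit(self, amount: int) -> None:
        # The read-modify-write below is the critical section.
        with self._lock:  # released even if the body raises
            self.balance = self.balance - amount


wallet = Wallet(10_000)


def spend() -> None:
    for _ in range(1_000):
        wallet.debit(1)


workers = [threading.Thread(target=spend) for _ in range(8)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(wallet.balance)  # 10,000 - 8 * 1,000 = 2,000
```

Keeping the lock inside the class means callers cannot forget it. The critical section stays as small as one subtraction.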

Semaphore

A semaphore allows limited concurrency instead of just one. This is excellent for scarce shared capacity. Think database connections, GPU slots, or third-party API quotas.

permits = 3
A enter ✓
B enter ✓
C enter ✓
D wait

If your PDF generator can safely run only 5 jobs at once, a semaphore models that nicely. See. You are protecting capacity, not one variable.
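A minimal sketch of that PDF case, assuming a hypothetical render_pdf job and a limit of 5. The peak counter is only there to make the cap visible; it is not part of the pattern.

```python
import threading
import time

MAX_PDF_JOBS = 5
pdf_slots = threading.BoundedSemaphore(MAX_PDF_JOBS)

running = 0        # PDF jobs inside the guarded region right now
peak_running = 0   # highest value `running` ever reached
stats_lock = threading.Lock()


def render_pdf(job_id: int) -> None:
    """Hypothetical job; the sleep stands in for real rendering work."""
    global running, peak_running
    with pdf_slots:  # blocks while all 5 permits are taken
        with stats_lock:
            running += 1
            peak_running = max(peak_running, running)
        time.sleep(0.01)
        with stats_lock:
            running -= 1


jobs = [threading.Thread(target=render_pdf, args=(i,)) for i in range(20)]
for job in jobs:
    job.start()
for job in jobs:
    job.join()

print("peak concurrent PDF jobs:", peak_running)
```

Twenty threads start, but no more than five are ever rendering at once. Swap the semaphore for a mutex and the same code would allow only one.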

Read-write lock

A read-write lock allows many readers together, but one writer alone. It helps when reads dominate heavily and writes are rare.

Readers: R1 R2 R3 can enter together
Writer : W must wait until readers leave

Worked example. Suppose a product cache sees 12,000 reads each second. Writes happen only 15 times each second. A plain mutex serializes all reads unnecessarily. A read-write lock may improve throughput. But if reads are tiny and contention is low, the extra lock machinery may not help much. Measure first. Do not marry complexity blindly.
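Python's standard library does not ship a read-write lock, so the sketch below hand-rolls a very small one on top of threading.Condition purely to show the rule "many readers or one writer". Treat it as an illustration under assumptions: the ReadWriteLock class and the cache contents are hypothetical, and this simple version favours readers, so a writer can starve under constant read traffic. In Java you would normally reach for ReentrantReadWriteLock instead.

```python
import threading


class ReadWriteLock:
    """Minimal reader-preference lock: many readers or one writer."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    def acquire_read(self) -> None:
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # a waiting writer may proceed

    def acquire_write(self) -> None:
        with self._cond:
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True

    def release_write(self) -> None:
        with self._cond:
            self._writing = False
            self._cond.notify_all()  # wake waiting readers and writers


cache = {"sku-1": 100}
rw = ReadWriteLock()
together = threading.Barrier(3)  # passes only if 3 readers are inside at once
seen = []


def reader() -> None:
    rw.acquire_read()
    try:
        together.wait(timeout=5)  # would time out if readers were serialized
        seen.append(cache["sku-1"])
    finally:
        rw.release_read()


readers = [threading.Thread(target=reader) for _ in range(3)]
for thread in readers:
    thread.start()
for thread in readers:
    thread.join()

rw.acquire_write()  # waits until no reader is inside
try:
    cache["sku-1"] = 120
finally:
    rw.release_write()

print(seen, cache)
```

The barrier is the interesting part. With a plain mutex the three readers could never meet inside the guarded region, so the wait would time out.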

So what to do? Prefer immutable data when possible. Use a mutex for tiny critical sections. Use a semaphore for bounded slots. Use a read-write lock only when the read-heavy pattern is proven.

3) Producer-consumer and concurrent data structures reduce direct collisions

Sometimes the best lock is no direct shared mutation at all. That is where queues and concurrent collections help. Instead of everyone poking the same object, they exchange work safely.

Producer-consumer first. One side creates jobs. Another side drains them. Bursts get absorbed instead of exploding user-facing latency.

Producers ──▶ Queue ──▶ Consumers

Worked example. Checkout finishes 6,000 orders in ten minutes, about 10 orders each second. Each order needs an email and an analytics event. If checkout sends both inline, latency grows and failure coupling increases. Push jobs into queues instead. Four email consumers and two analytics consumers can drain steadily. The user leaves fast. Background workers handle the rest.

Code-level sketch:

# Producer side: runs on the checkout request path.
email_queue.put({"order_id": order_id, "template": "confirmed"})

# Consumer side: runs in a background worker.
job = email_queue.get()
mailer.send(job)
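Expanded into something runnable, the sketch might look like the following. The send_email function is a hypothetical stand-in for mailer.send, the queue size of 100 and the 50 orders are arbitrary, and a None sentinel is one common way (not the only one) to tell consumers to stop.

```python
import queue
import threading

email_queue = queue.Queue(maxsize=100)  # bounded, so producers feel backpressure
sent = []                               # order ids the hypothetical mailer handled
sent_lock = threading.Lock()
STOP = None                             # sentinel telling a consumer to exit
CONSUMERS = 4


def send_email(job: dict) -> None:
    """Hypothetical stand-in for mailer.send(job)."""
    with sent_lock:
        sent.append(job["order_id"])


def consumer() -> None:
    while True:
        job = email_queue.get()  # blocks until a job arrives
        if job is STOP:
            break
        send_email(job)


workers = [threading.Thread(target=consumer) for _ in range(CONSUMERS)]
for worker in workers:
    worker.start()

# Producer side: checkout only enqueues and moves on.
for order_id in range(1, 51):
    email_queue.put({"order_id": order_id, "template": "confirmed"})

for _ in workers:
    email_queue.put(STOP)  # one sentinel per consumer
for worker in workers:
    worker.join()

print(len(sent), "emails sent")
```

The bounded queue matters. If consumers fall behind, put blocks and the producer slows down instead of memory growing without limit.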

Now concurrent data structures. These are collections designed for overlapping access. They reduce manual locking for common cases. Examples include concurrent queues, maps, ring buffers, and atomic counters.

ConcurrentHashMap<String, Session> sessions = new ConcurrentHashMap<>();
AtomicInteger inFlight = new AtomicInteger(0);

What do they buy you? Safer common operations, better scalability, and less hand-written lock code. What do they not buy you? Automatic business correctness across multiple dependent steps. If you need “check stock then reserve then charge,” a concurrent map alone is not enough. The data structure protects operations, not your whole workflow. Simple, no?

4) Race conditions and deadlocks are timing bugs wearing disguises

A race condition means correctness depends on the order in which operations interleave. Tests may pass a hundred times and fail once in production. That is why these bugs feel haunted.

Classic lost update:

balance = 1000
A reads 1000
B reads 1000
A writes 900   (debits 100)
B writes 800   (debits 200, overwrites A)
Final = 800, but expected = 700

Code sketch:

if (stock > 0) {
    stock = stock - 1;
}

Two threads can both observe stock > 0 before either one writes. Both then decrement. If only one unit was left, you have now sold it twice. See. The code looks innocent because the danger lives between lines.
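One way to close that gap is to make the check and the decrement a single guarded step. The sketch below is a minimal Python version under assumed numbers: 10 units of stock and 50 hypothetical buyers arriving at once.

```python
import threading

stock = 10
accepted = 0                  # reservations that succeeded
stock_lock = threading.Lock()


def try_reserve() -> None:
    global stock, accepted
    with stock_lock:          # check and decrement happen as one step
        if stock > 0:
            stock -= 1
            accepted += 1


buyers = [threading.Thread(target=try_reserve) for _ in range(50)]
for buyer in buyers:
    buyer.start()
for buyer in buyers:
    buyer.join()

print(stock, accepted)  # 0 10
```

Exactly ten buyers succeed and stock never goes negative, because no thread can slip in between the check and the write.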

Deadlock is a different monster. Now threads wait forever on each other.

Thread A holds Lock1, waits for Lock2
Thread B holds Lock2, waits for Lock1

Worked example. Suppose an account transfer locks the source wallet first and the destination wallet second. Another request locks the same two wallets in reverse order. Transfer A holds wallet 11 and wants wallet 29. Transfer B holds wallet 29 and wants wallet 11. Both sleep forever unless a lock timeout intervenes.

So what to do? Keep one global lock order. Keep critical sections short. Never call external services while holding a lock. Prefer timeouts where the platform allows. Reduce shared state rather than adding decorative locks everywhere.
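The "one global lock order" rule can be sketched for the wallet example like this. The balances, the transfer helper, and the choice of "lower wallet id first" as the ordering are all assumptions for illustration; any consistent ordering works.

```python
import threading

balances = {11: 1_000, 29: 1_000}
locks = {wallet_id: threading.Lock() for wallet_id in balances}


def transfer(source: int, destination: int, amount: int) -> None:
    # Always lock the lower wallet id first, whatever the transfer direction.
    # Assumes source != destination; the same lock would be taken twice otherwise.
    first, second = sorted((source, destination))
    with locks[first], locks[second]:
        balances[source] -= amount
        balances[destination] += amount


def repeat(source: int, destination: int, times: int) -> None:
    for _ in range(times):
        transfer(source, destination, 1)


# Two opposite transfer streams: the classic deadlock setup.
a = threading.Thread(target=repeat, args=(11, 29, 500))
b = threading.Thread(target=repeat, args=(29, 11, 500))
a.start()
b.start()
a.join(timeout=10)
b.join(timeout=10)

print(balances)  # {11: 1000, 29: 1000}
```

Because both directions take lock 11 before lock 29, neither thread can hold one lock while waiting for the other in the opposite order. The circular wait never forms.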

One final checklist for concurrent plumbing.

  • Is the shared thing truly mutable?
  • Can ownership move to one worker instead?
  • Can a queue replace direct touching?
  • Can a concurrent data structure replace manual locking?
  • Have you measured contention, not imagined it?

Where this lives in the wild

At Flipkart, an inventory backend engineer uses bounded thread pools for catalog image processing. At Uber, a marketplace engineer uses semaphores to cap expensive routing computations per host. At Swiggy, an order-platform engineer pushes notifications through producer-consumer queues to smooth dinner surges. At LinkedIn, a feed infrastructure engineer relies on concurrent maps and atomics for high-volume counters. At Zerodha, a trading systems engineer designs strict lock ordering so portfolio updates do not deadlock.


Pause and recall

  1. When would a thread pool help more than async/await?
  2. Why does a semaphore model capacity better than a mutex?
  3. What problem does producer-consumer solve besides “running in background”?
  4. How is a race condition different from a deadlock?

Interview Q&A

Q1) Why thread pool not one fresh thread per task?

Because bounded workers protect CPU, memory, and scheduler health during bursts. Fresh threads look simple until the machine spends more time juggling than working. Common wrong answer to avoid: “More threads always mean more throughput.”

Q2) Why semaphore not mutex for limited external capacity?

Because the goal is to allow some concurrent entries, not exactly one. A mutex would underuse safe capacity and slow healthy traffic. Common wrong answer to avoid: “Both are just locks, so either is fine.”

Q3) Why producer-consumer not direct inline processing?

Because queues decouple bursty producers from slower consumers and protect user latency. Inline work couples every downstream delay to the request path. Common wrong answer to avoid: “Backgrounding is only for non-important work.”

Q4) Why concurrent data structure not plain map plus one big lock?

Because common concurrent operations scale better with purpose-built structures. One giant lock becomes a bottleneck, and it tempts you to run extra logic while holding it, which invites deadlocks. Common wrong answer to avoid: “A big lock is always the safest and fastest answer.”


Apply now (5 min)

Exercise: Take one shared resource in your system. Decide whether it needs a thread pool, semaphore, queue, mutex, or concurrent collection. Write one sentence for the bottleneck you are controlling.

Sketch from memory: Draw a request flow that fetches price and stock concurrently, then enqueues email work. Mark where shared state exists and which concurrency pattern protects it.


Bridge. Once work overlaps safely, you still need disciplined storage access. Next, see how repositories, DAOs, and transactions carry data through the basement. → 11-data-access-patterns.md