01. Broker, worker, ack — what Celery actually does per task¶

~12 min read. One send_welcome_email.delay(user_id) looks like one task that runs once. It is not always. By the end of this chapter you will see the broker-worker protocol, why acks_late matters, and why your task should be idempotent.

Builds on: 00-eli5.md.

The kitchen analogy explains what Celery does. To debug duplicate emails or lost tasks, you need to see the protocol. We follow one task — send_welcome_email(user_id=42) — through the broker, the worker, and the ack.

1) The lifecycle of one task¶

TIME    LAYER                     ACTION
────    ────────────────────      ─────────────────────────────────
0ms     web app                   send_welcome_email.delay(42)
                                  Celery serialises args to JSON
0.5ms   broker (Redis)            LPUSH celery <task message>
                                  Message persisted in queue
                                  Web app returns to user
1ms     worker (separate process) BRPOP celery
                                  Pulls message, deserialises
                                  Adds to worker's prefetch pool
1.2ms   worker                    Picks up the task from pool
                                  Begins execution
50ms    worker                    Task finishes
50.1ms  worker                    Sends ACK to broker
50.2ms  broker                    Removes task from queue (gone)

The web app's part is microseconds (serialize, push). The worker's part is the full task duration. The broker is the persistence layer between them.

2) The message — what actually goes on the wire¶

A Celery message is JSON (default — also msgpack, yaml). Structure:

{
  "id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "task": "myapp.tasks.send_welcome_email",
  "args": [42],
  "kwargs": {},
  "retries": 0,
  "eta": null,
  "expires": null,
  "queue": "emails",
  "headers": { "task": "myapp.tasks.send_welcome_email", "id": "..." }
}

Key fields:

id — unique task ID. Used for ack, result lookup, deduplication.
task — fully-qualified Python path. Workers must have the same code.
args / kwargs — task parameters, JSON-serialised. Cannot include objects — no Django model instances, no datetime without conversion. Always serialisable primitives.
eta — earliest time to run. Used by apply_async(eta=...) for delayed execution.
retries — how many times this task has been retried.
queue — which queue to enqueue to (chapter 02).

3) Worker startup and the prefetch pool¶

When a worker starts:

celery -A myapp worker --loglevel=info --concurrency=4 --prefetch-multiplier=4

--concurrency=4 — four child processes (or threads/greenlets, depending on the pool).
--prefetch-multiplier=4 — each child pre-fetches up to 4 tasks. Worker total prefetch: 4 × 4 = 16.

The worker connects to the broker, declares its queues, and starts consuming. Prefetched tasks live in worker memory until a child is free to execute one.

The trade-off:

High prefetch. Children keep busy; no waiting on broker. But if the worker crashes, all prefetched tasks are at risk.
Low prefetch. Each task acked or returned to the queue quickly. But more broker round-trips; possible idle time.

For short tasks (subseconds), high prefetch is fine. For long tasks (seconds to minutes), prefetch=1 ensures fair distribution across workers.

celery -A myapp worker --concurrency=4 --prefetch-multiplier=1

4) `acks_early` vs. `acks_late` — the crucial config¶

Default Celery acknowledges tasks on receipt, not on completion. This is acks_early:

worker receives task → ACK to broker (task gone from queue) → execute task

If the worker crashes mid-execution, the task is lost. No retry, no record.

The fix: acks_late = True. The worker acks after execution completes:

worker receives task → execute task → ACK to broker (task gone) → done

If the worker crashes, the task is not acked; the broker eventually re-delivers it (after visibility_timeout for SQS-style brokers, or on consumer disconnect for AMQP). The task runs again.

The trade-off:

acks_early (default) — tasks lost on worker crash. Suitable for tasks that don't matter.
acks_late = True — tasks may run more than once on worker crash. Tasks must be idempotent.

In production, acks_late = True is the standard choice. Loss is worse than duplicate for most use cases. Set globally:

app.conf.task_acks_late = True
app.conf.worker_prefetch_multiplier = 1   # often paired with acks_late

5) Why tasks must be idempotent¶

With acks_late = True, a task may execute more than once on failure. Even without crashes, network glitches, timeouts, and broker bugs can cause re-delivery. Tasks must be designed to tolerate duplicate execution.

Bad — sends email twice if re-delivered:

@app.task
def send_welcome_email(user_id):
    user = User.objects.get(pk=user_id)
    send_email(user.email, subject="Welcome", body=render_email(user))

Good — checks before sending:

@app.task
def send_welcome_email(user_id):
    user = User.objects.get(pk=user_id)
    if user.welcome_email_sent_at:
        return  # already sent; idempotent skip
    send_email(user.email, subject="Welcome", body=render_email(user))
    user.welcome_email_sent_at = timezone.now()
    user.save()

Better — uses an idempotency token at the destination:

@app.task
def charge_payment(payment_id):
    payment = Payment.objects.get(pk=payment_id)
    # The payment ID is the idempotency token; Stripe dedupes if it sees the same ID twice.
    stripe.PaymentIntent.create(
        amount=payment.amount,
        currency='inr',
        idempotency_key=str(payment.id),
    )

The pattern: idempotency is a property of the task design, not a Celery feature. Celery provides at-least-once delivery; idempotency turns that into "exactly-once effect."

6) The result backend — when you need return values¶

@app.task
def add(x, y):
    return x + y

result = add.delay(2, 3)
print(result.id)         # 'f47ac10b-...'
print(result.get(timeout=10))  # 5 — blocks until done

For result.get() to work, Celery needs a result backend. Configured in:

app.conf.result_backend = 'redis://localhost:6379/1'
app.conf.result_expires = 3600  # results expire after 1 hour

Result backends:

Redis. Fast, common. Volatile (set expiry).
Database (django-celery-results). Persistent. Slower.
RPC. Results delivered via the broker. Each result is consumed once.

Most production teams skip the result backend. Reasons:

Results add load — every task completion is a write to the backend.
Most tasks are fire-and-forget — the web doesn't need the return value.
Tasks that need to "tell" the web something do it directly (database, websocket, push notification).

Use the result backend for chord patterns (chapter 02) and for explicit user-facing "task progress" UIs.

7) The broker choice — Redis vs. RabbitMQ vs. SQS¶

Redis.

Simplest setup; you probably already have Redis.
Low memory footprint; high throughput.
Lossy on hard crash unless persistence is configured.
Limited reliability features: no native dead-letter queue, no message TTL per-queue.
Best for: most apps where simplicity matters and you can tolerate occasional task loss.

RabbitMQ (AMQP).

Most feature-complete: dead-letter queues, priority queues, message TTL, durable queues, fanout exchanges.
More operational complexity (Erlang VM, clustering, plugin management).
Stronger guarantees with acks_late.
Best for: financial and regulated workloads, complex routing, multi-tenant queues.

SQS.

Managed service; no broker to operate.
14-day max message retention.
15-minute max visibility timeout (re-deliver if not acked).
Strong delivery guarantees; built-in dead-letter queues.
Higher per-message latency than Redis (~100ms vs. ~1ms).
Best for: AWS-heavy infra; long-running tasks; teams that don't want to operate a broker.

For most Python apps starting out, Redis with acks_late=True is the right default. Migrate to RabbitMQ or SQS only when you hit a constraint (loss is unacceptable, broker ops is a problem, etc.).

8) Worker pool types¶

celery -A myapp worker --pool=prefork --concurrency=4
celery -A myapp worker --pool=gevent --concurrency=100
celery -A myapp worker --pool=solo --concurrency=1
celery -A myapp worker --pool=threads --concurrency=8

prefork (default). One process per concurrency unit. True parallelism. Safest. Memory cost: each child has its own Django app instance.
gevent / eventlet. Greenlet-based concurrency. Many concurrent tasks per process for I/O-bound workloads (HTTP calls). Each blocking call must be greenlet-compatible.
threads. Threaded worker. Useful for I/O-bound tasks where greenlets are not viable.
solo. Single-threaded; useful for debugging.

The choice mirrors web-server pool choices. CPU-bound tasks: prefork. I/O-bound, large fan-out: gevent. Sync but I/O-heavy: threads.

9) The threaded example — a task that almost loses messages¶

A team runs Celery with default config (acks_early, prefetch_multiplier=4). Each Celery worker is a 4-CPU pod running 4 prefork children with prefetch 4 each = 16 prefetched tasks.

One morning, Kubernetes restarts the pod. The 16 prefetched tasks that hadn't started executing yet were already acked to the broker; they vanish. Among them: a payment confirmation email, two refund notifications, three audit-log entries.

The fix is structural:

app.conf.task_acks_late = True
app.conf.worker_prefetch_multiplier = 1
app.conf.task_reject_on_worker_lost = True

Now: each worker pulls one task at a time, executes, then acks. If the pod restarts mid-execution, the broker re-delivers. Tasks that completed don't re-deliver (they were acked). Tasks that started but didn't finish do re-deliver — which requires idempotency.

After the fix, monthly task loss drops from "occasionally" to "essentially never." The cost is slightly higher broker round-trips and the discipline of idempotency.

Operational signals¶

Healthy. Queue depth steady around zero or oscillating with traffic. Worker utilisation 50-80%. Task latency (p50, p99) within target. No deadletter accumulation.

First degrading metric. Queue depth climbing. Workers can't keep up; either traffic spiked or tasks are slower than usual. Add workers or speed up tasks.

Misleading metric. Tasks per second. High throughput can mask high error rate; tasks that error fast inflate the count.

Expert graph. Queue depth × task latency × failure rate, per task name. The cell that lights up is the next investigation.

Where this appears in production¶

Instagram (early Django + Celery) — large Celery deployments for media processing.
Pinterest — Celery for image pipelines; heavily tuned for prefetch and concurrency.
Mozilla (services) — Celery for backend processing; well-documented patterns.
Eventbrite — Celery + Redis broker at scale.
The Washington Post — Celery for CMS-side processing.
A Bengaluru fintech — Celery + SQS broker for payment workflows.
A Pune SaaS — switched from Redis to RabbitMQ when message loss became unacceptable.
A Mumbai e-commerce — Celery with acks_late=True + idempotency tokens at every destination API.

Recall / checkpoint¶

What is the lifecycle of a Celery task from delay() to ack?
What is the difference between acks_early and acks_late?
Why must Celery tasks be idempotent?
When should you use a result backend, and when should you skip it?
What are the trade-offs between Redis, RabbitMQ, and SQS as brokers?
What is worker_prefetch_multiplier and how does it interact with acks_late?
Which worker pool type would you choose for I/O-bound tasks?

Interview Q&A¶

Q1. The team is losing payment-confirmation emails when pods restart. Walk through the diagnosis and fix. Default Celery acks tasks on receipt. When the pod restarts, prefetched tasks are gone. Fix: set task_acks_late = True so the broker holds the task until execution completes; set worker_prefetch_multiplier = 1 so a worker pulls only what it can immediately execute; set task_reject_on_worker_lost = True so abrupt termination re-delivers. Make tasks idempotent — the trade-off for at-least-once delivery. Validate by simulating a pod restart with traffic. Common wrong answer to avoid: "switch brokers" — the loss is config, not broker.

Q2. Walk through how to design an idempotent task that charges a payment. The task ID itself (or the underlying payment ID) becomes the idempotency token at the destination. Pass it explicitly to the payment provider (Stripe idempotency_key, Razorpay idempotency_key). If the task retries, the provider sees the same key and returns the original result. The task is safe to run multiple times. Compare to a non-idempotent version that just calls stripe.PaymentIntent.create() — that creates a new payment per retry. The fix is one extra parameter; the bug is six-figure. Common wrong answer to avoid: "set retries to zero" — fails on real failures.

Q3. The team's queue depth is climbing over a deploy. Walk through diagnosis. Either workers stopped processing (deploy killed them; restart hasn't completed) or tasks suddenly take longer (deploy changed task code; new code is slower). Check worker count and worker utilisation. If workers are zero or at full capacity, scale them. If tasks are slow, profile the task: which step is slow, is it a new code path, is the database the constraint. Queue depth is the symptom; the cause is at the worker layer. Common wrong answer to avoid: "raise the broker capacity" — the broker isn't the problem; the consumer is.

Q4. The team uses a result backend and the database is slow. Walk through the assessment. The result backend writes on every task completion. At high task volume, this saturates the database. Fix: disable the result backend (app.conf.result_backend = None) unless explicitly needed; switch to Redis result backend; or use ignore_result=True per task to skip storing results for fire-and-forget tasks. The reason to use a result backend is narrow — chord patterns and explicit progress tracking. For most tasks, the result is not needed. Common wrong answer to avoid: "scale the database" — fixes a symptom, not the design.

Q5. A task is taking 10 minutes; another task is enqueued and waits behind it. Walk through the fix. The single-queue model has head-of-line blocking — a slow task delays everything behind it. Fix: separate queues by task profile. Fast tasks go on fast queue; slow tasks on slow queue. Workers per queue, sized for the workload. The fast queue's workers have prefetch=1 and concurrent=many; the slow queue's workers have prefetch=1 and concurrent=few. Tasks routed at submission: task.apply_async(queue='slow'). Common wrong answer to avoid: "scale workers" — without queue separation, slow tasks still block fast ones.

Q6. The team is choosing between Redis and RabbitMQ. What is the deciding factor? The reliability requirement. If task loss is tolerable (most non-financial workloads), Redis with acks_late=True is sufficient and operationally simpler. If task loss is unacceptable (payments, regulated workloads), RabbitMQ with durable queues, clustering, and publisher_confirms adds the durability guarantees Redis can't match. The operational cost: RabbitMQ requires Erlang familiarity and active monitoring; Redis is one process. Choose based on the cost of one lost task. Common wrong answer to avoid: "Redis is always faster" — broker choice is about reliability, not speed.

Operational memory¶

This chapter explained the broker-worker protocol: how tasks become messages, how workers consume, the difference between early and late ack, and why idempotency is the structural property tasks need under at-least-once delivery. The important idea is that Celery guarantees delivery, not execution — exactly-once is a property of task design, not configuration.

You learned to set acks_late=True, size prefetch, choose a pool type, design idempotent tasks, and pick a broker matched to reliability needs. That solves the opening problem of "why did this task run twice (or zero times)?"

Carry this diagnostic forward: when a task misbehaves, ask which layer is responsible — broker (delivery), worker (execution), or task code (idempotency). Each has structural fixes.

Remember:

acks_late = True + idempotency = production-safe.
Tasks are not exactly-once; they are at-least-once with idempotent design.
Prefetch trade-off: throughput vs. crash exposure.
Result backend: only when actually consumed; otherwise skip.
Broker choice is about reliability, not speed.

Bridge. The protocol is clear. Day-to-day, you write tasks, organise them into queues, schedule the periodic ones with beat. The next chapter is the working developer's surface. → 02-tasks-queues-day-to-day.md