03. Deployment, scaling, prod gotchas: the operational surface

~14 min read. Django in development is forgiving. Django in production reveals every implicit assumption — worker count, database pool, static files, async, caching, transactions. This chapter is the operational catalogue: what breaks, why, and how to size it.

Builds on: 02-models-views-routing-day-to-day.md.

A working Django app on a developer's laptop is one process, one database connection, one user, no concurrency. Production is many workers, a connection pool, hundreds of users per second, and concurrency at every layer. The previous chapters explained what Django does; this one explains what production does to Django.

1) The WSGI deployment shape

The reference production deployment:

internet → reverse proxy (nginx) → gunicorn (N workers) → Django app
                                                         ↓
                                                   Postgres / Redis

Reverse proxy. Terminates TLS, serves static files, handles slow-client buffering, rate-limits. gunicorn does none of these well; nginx does all of them well. Run them together.

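gunicorn workers. Each worker is a separate Python process holding its own Django app instance. Worker count rule of thumb, and the starting point gunicorn's own docs suggest: (2 × CPU cores) + 1. Below this you under-utilise the CPU; well above it, extra processes mostly add memory use and context-switching. Adjust based on the per-request CPU vs. I/O ratio: I/O-heavy requests tolerate more workers than CPU-heavy ones.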

Worker class. The default is sync: one request per worker at a time. A long-running view (a 5s+ database call or external API call) ties up its worker for the whole call, and a handful of slow requests can occupy every worker while fast requests queue. Alternatives: gevent or eventlet (greenlet-based; many concurrent requests per worker, but every blocking call must be greenlet-friendly); uvicorn workers for ASGI/async views; threaded workers (--threads 4, which makes gunicorn use its gthread worker) for I/O-heavy sync code.

Timeout. --timeout 30 (also gunicorn's default) kills workers that have been silent for 30 seconds. Set it higher than your slowest legitimate request and lower than you are willing to let a stuck worker hang. A worker killed mid-request loses the response; the client gets a 502 from nginx.
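The three knobs above can be collected in a gunicorn.conf.py, which gunicorn reads from the working directory by default (or via -c). This is a minimal sketch, not a tuned config: the bind address is an assumption, and threads = 1 keeps the plain sync worker.

```python
# gunicorn.conf.py: gunicorn reads module-level names as settings.
import multiprocessing

# Starting point from the rule of thumb: (2 x CPU cores) + 1 worker processes.
workers = multiprocessing.cpu_count() * 2 + 1

# "sync" handles one request per worker at a time. Setting threads > 1 makes
# gunicorn switch to its gthread worker, useful for I/O-heavy sync views.
worker_class = "sync"
threads = 1

# Kill and restart a worker that stays silent longer than this (seconds).
timeout = 30

# Assumption: nginx proxies to gunicorn on this local address.
bind = "127.0.0.1:8000"
```

One caveat on the design choice: inside a container, multiprocessing.cpu_count() reports the host's cores rather than the pod's CPU limit, so on Kubernetes it is often safer to set workers explicitly, for example from an environment variable.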

2) Database connections and pooling

Django opens a new database connection for each request by default (the default CONN_MAX_AGE = 0 closes it when the request ends). With 100 gunicorn workers per pod across 4 pods, you have 400 potential connections. Postgres caps connections (max_connections defaults to 100; production setups often configure 200-500). Connection exhaustion is one of the most common Django production incidents.
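The arithmetic is worth writing down, because the worst case multiplies every layer. A minimal sketch, assuming each worker thread holds at most one connection per database alias (Django keeps connections per thread) and that Postgres keeps superuser_reserved_connections at its default of 3; the function names are hypothetical.

```python
# Worst-case connection count for Django without a pooler (sketch).
# Assumption: each worker thread holds at most one connection per database alias.


def peak_connections(pods: int, workers_per_pod: int, threads_per_worker: int = 1) -> int:
    """Every thread in every worker on every pod holding a connection at once."""
    return pods * workers_per_pod * threads_per_worker


def fits_postgres(peak: int, max_connections: int = 100, reserved: int = 3) -> bool:
    """Compare against max_connections minus the slots Postgres keeps for superusers."""
    return peak <= max_connections - reserved


if __name__ == "__main__":
    peak = peak_connections(pods=4, workers_per_pod=100)
    print(peak, fits_postgres(peak))  # 400 False
```

With the numbers above, the peak is 400 against an effective ceiling of 97. Celery workers, cron jobs and migration runs draw from the same budget, so count them too.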

The defence:

Connection pooling. PgBouncer sits between Django and Postgres. Django opens one connection to PgBouncer per worker; PgBouncer multiplexes onto a smaller pool of real Postgres connections.

Django workers (1000s) → PgBouncer (transaction-pooling) → Postgres (100 connections)

Modes:

Session pooling. A worker gets a Postgres connection for the whole session. Same as no pooling for the worker.
Transaction pooling. A worker gets a Postgres connection only for the duration of a transaction. Anything that relies on session-level state can break: server-side cursors (Django uses them for QuerySet.iterator() on PostgreSQL), session-level SET commands, advisory locks, LISTEN/NOTIFY, and server-side prepared statements.
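Statement pooling. Even tighter: the connection goes back to the pool after every statement, so multi-statement transactions are disallowed. Almost never compatible with Django.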

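The right config: PgBouncer in transaction mode; in Django, set DISABLE_SERVER_SIDE_CURSORS: True in that database's DATABASES entry (Django's docs call for this under transaction pooling; the setting turns off server-side cursors, not prepared statements); avoid features that require session state.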

CONN_MAX_AGE. Without PgBouncer, set CONN_MAX_AGE = 60 (or higher) in DATABASES. Each worker reuses its connection for up to 60 seconds. Cuts connection-establishment overhead substantially.
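A settings sketch showing both shapes side by side. The hostnames, NAME/USER values and the USE_PGBOUNCER environment variable are assumptions for illustration, and credentials are omitted; 6432 is PgBouncer's default listen port, and CONN_HEALTH_CHECKS needs Django 4.1 or later.

```python
# settings.py fragment (sketch). Hostnames, NAME/USER and the USE_PGBOUNCER
# environment variable are illustrative assumptions; credentials are omitted.
import os


def database_settings(use_pgbouncer: bool) -> dict:
    common = {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app",
        "USER": "app",
    }
    if use_pgbouncer:
        return {
            **common,
            "HOST": "pgbouncer.internal",
            "PORT": "6432",  # PgBouncer's default listen port
            # Server-side cursors (used by QuerySet.iterator()) break under
            # transaction pooling, so disable them for this alias.
            "DISABLE_SERVER_SIDE_CURSORS": True,
        }
    return {
        **common,
        "HOST": "postgres.internal",
        "PORT": "5432",
        "CONN_MAX_AGE": 60,  # reuse each worker's connection for up to 60 s
        "CONN_HEALTH_CHECKS": True,  # check a reused connection before a request uses it
    }


DATABASES = {"default": database_settings(os.environ.get("USE_PGBOUNCER") == "1")}
```

Leaving CONN_MAX_AGE at its default of 0 behind PgBouncer is a reasonable starting point: reconnecting to the pooler is much cheaper than opening a fresh Postgres backend, and PgBouncer keeps the real server connections warm.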

Read replicas. For read-heavy workloads, route read queries to a replica:

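# Custom database router; enable it with DATABASE_ROUTERS = ['path.to.ReadReplicaRouter']
class ReadReplicaRouter:
    def db_for_read(self, model, **hints):
        return 'replica'
    def db_for_write(self, model, **hints):
        return 'default'
    def allow_relation(self, obj1, obj2, **hints):
        # Without this, Django refuses relations between objects loaded from
        # different aliases (e.g. a replica-read customer on a new order).
        return True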

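The trade-off: replicas lag the primary (often tens to hundreds of milliseconds, more under heavy write load), so a read that follows a write can see stale data. Application logic must tolerate that or pin specific reads to the primary, for example with Order.objects.using('default').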
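Pinning can also be scoped to a block of code rather than applied query by query. A minimal sketch, assuming DATABASES aliases named 'default' and 'replica'; read_from_primary and PinnableReplicaRouter are hypothetical names, and the contextvar keeps the pin local to the current thread or asyncio task.

```python
# Router that reads from the replica unless the current code path asked for
# the primary. read_from_primary and PinnableReplicaRouter are hypothetical.
import contextvars
from contextlib import contextmanager

_use_primary = contextvars.ContextVar("use_primary", default=False)


@contextmanager
def read_from_primary():
    """Route reads inside this block to the primary, e.g. right after a write."""
    token = _use_primary.set(True)
    try:
        yield
    finally:
        _use_primary.reset(token)  # restore the previous value, even on error


class PinnableReplicaRouter:
    def db_for_read(self, model, **hints):
        return "default" if _use_primary.get() else "replica"

    def db_for_write(self, model, **hints):
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        return True  # both aliases hold the same data
```

Inside a view you would write `with read_from_primary(): order = Order.objects.get(pk=order_id)` right after saving the order. The reset in `finally` matters because a sync gunicorn worker reuses the same thread for its next request.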

3) Static files and media

Static files (CSS, JS, images bundled with the app) live in STATIC_ROOT after python manage.py collectstatic. In production they should be served by:

nginx directly (cheapest).
A CDN (CloudFront, Cloudflare) with the origin pointing at nginx or S3.
whitenoise middleware as a fallback for simple deploys (serves static files from Django; cheap but slower).
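For the whitenoise option, the settings change is small. A sketch, assuming a project rooted at /srv/app and a trimmed middleware list; whitenoise's documentation asks for its middleware directly after SecurityMiddleware and above everything else.

```python
# settings.py fragment for the whitenoise option (sketch; paths are assumptions).
from pathlib import Path

BASE_DIR = Path("/srv/app")  # assumption: project root inside the container

STATIC_URL = "static/"
STATIC_ROOT = BASE_DIR / "staticfiles"  # collectstatic copies files here

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    # whitenoise goes directly after SecurityMiddleware, before everything else
    "whitenoise.middleware.WhiteNoiseMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
]
```

With nginx or a CDN serving the files instead, drop the whitenoise line and point nginx's static location at the same STATIC_ROOT that collectstatic fills.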

Media files (user uploads) are stored separately. Best practice: cloud object storage (S3, GCS) via django-storages. Never the local filesystem on a horizontally-scaled app — each pod has its own filesystem; uploads on pod A are invisible to pod B.

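STORAGES = {  # Django 4.2+; replaces DEFAULT_FILE_STORAGE, which Django 5.1 removed
    'default': {'BACKEND': 'storages.backends.s3boto3.S3Boto3Storage'},
    'staticfiles': {'BACKEND': 'django.contrib.staticfiles.storage.StaticFilesStorage'},
}
AWS_STORAGE_BUCKET_NAME = 'my-media-bucket'
AWS_S3_CUSTOM_DOMAIN = 'media.example.com'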

4) Caching: the layers

Django's cache framework is multi-backend. Pick layers based on the access pattern:

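Per-view cache. @cache_page(60 * 15) caches the entire response for 15 minutes. Useful for anonymous, slow-changing pages. A poor fit for authenticated pages: the cache key is built from the URL plus the headers the response varies on, so per-user pages either fragment into one entry per session cookie or, if the response does not vary on Cookie, risk serving one user's page to another.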
Template fragment cache. {% cache 600 sidebar request.user.id %} (after {% load cache %}) caches one fragment for 600 seconds, keyed by the extra arguments, here the user ID.
Low-level cache. cache.get(key) / cache.set(key, value, timeout) for computed values — expensive aggregations, third-party API responses.
QuerySet cache. Tools like django-cachalot cache QuerySet results invalidated on model changes. Powerful and tricky — invalidation correctness is the catch.

Backends:

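Redis (Django 4.0+'s built-in RedisCache backend, or django-redis). Standard for production: shared across workers, optional persistence, clustering, atomic counters and multi-key operations.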
Memcached. Cheaper, simpler; no persistence; LRU eviction.
Database cache. For low-volume use; it puts cache traffic on the database you are usually trying to protect, so it rarely beats Redis.
Local-memory cache. Per-worker process only. Useless for shared cache; useful for "compute once per worker boot" patterns.
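A CACHES sketch pairing a shared Redis cache with a per-worker local-memory cache. The Redis URL and the "local" alias name are assumptions; the Redis backend shown is the one built into Django 4.0+, and django-redis plugs in the same way with its own BACKEND path.

```python
# settings.py fragment (sketch). The Redis URL and the "local" alias are assumptions.
CACHES = {
    "default": {
        # Shared by every worker on every pod: use this for anything workers must agree on.
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://redis.internal:6379/1",
        "TIMEOUT": 300,  # default TTL (seconds) when cache.set() gets no timeout
    },
    "local": {
        # One independent copy per gunicorn worker process; never for shared state.
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "per-worker",
    },
}
```

Code then picks the layer explicitly, e.g. caches["local"].get_or_set("feature_flags", load_flags, 60) for a "compute once per worker" value (load_flags being a hypothetical loader), and cache.get(...) on the default alias for everything else.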

The most-bitten gotcha: invalidation. Django's cache.set has no built-in invalidation tied to model changes. A view caches a list; an admin updates a row; the view still serves stale data until expiry. Patterns:

Versioned cache keys (f"orders:v{order.version}"). Bumping the version invalidates.
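post_save signals that cache.delete(...) the affected keys. QuerySet.update(), bulk_create() and bulk_update() do not send post_save, so those code paths must invalidate explicitly.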
Short TTLs and tolerance for staleness.
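One way to apply versioned keys is to keep the version counter in the cache itself, which can be sketched without Django. DictCache below is a hypothetical in-memory stand-in for the get/set subset of Django's cache API (timeout=None meaning "never expire", as in Django); in a project you would pass django.core.cache.cache instead. Key formats and function names are illustrative.

```python
# Versioned cache keys (sketch). DictCache is a hypothetical stand-in for the
# get/set subset of Django's cache API; pass django.core.cache.cache in a project.
import time


class DictCache:
    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        value, expires = self._data.get(key, (default, None))
        if expires is not None and expires < time.monotonic():
            return default  # entry has expired
        return value

    def set(self, key, value, timeout=None):
        # As in Django, timeout=None means "never expire".
        expires = None if timeout is None else time.monotonic() + timeout
        self._data[key] = (value, expires)


def orders_version(cache, customer_id):
    return cache.get(f"orders:ver:{customer_id}", 1)


def bump_orders_version(cache, customer_id):
    # Call wherever a customer's orders change, e.g. from a post_save handler.
    # The version key never expires, so old list keys can't be revived.
    new_version = orders_version(cache, customer_id) + 1
    cache.set(f"orders:ver:{customer_id}", new_version, timeout=None)


def cached_order_list(cache, customer_id, load_orders, timeout=600):
    key = f"orders:{customer_id}:v{orders_version(cache, customer_id)}"
    orders = cache.get(key)
    if orders is None:
        orders = load_orders(customer_id)  # the expensive query
        cache.set(key, orders, timeout)
    return orders
```

Old versions are never deleted; they age out through their TTL. The read-then-set bump is not atomic; on Redis or memcached, Django's cache.incr() gives an atomic increment (it raises ValueError for a missing key, so seed it with cache.add() first).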

5) Transactions: implicit and explicit

Django wraps each view in a transaction when ATOMIC_REQUESTS is True in a DATABASES entry (it defaults to False). The whole view runs as one transaction; an unhandled exception rolls everything back. Convenient and risky:

Long views hold long transactions, holding row locks.
External API calls inside the transaction make the transaction wait on the network.
The pattern is "commit early, side-effect later" — but ATOMIC_REQUESTS makes everything one transaction.

The mature pattern: leave ATOMIC_REQUESTS at its default of False and wrap explicitly with transaction.atomic() only where needed.

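from django.db import transaction
from django.http import JsonResponse

def transfer_funds(request, from_id, to_id, amount):
    # Account and notify_user (a Celery task) are app code; input validation is omitted
    with transaction.atomic():
        # Lock both rows in pk order so two opposite transfers cannot deadlock
        rows = Account.objects.select_for_update().filter(pk__in=[from_id, to_id]).order_by('pk')
        accounts = {acc.pk: acc for acc in rows}
        source, target = accounts[from_id], accounts[to_id]
        source.balance -= amount
        source.save(update_fields=['balance'])
        target.balance += amount
        target.save(update_fields=['balance'])
    # External call outside the transaction: runs after the block has committed
    notify_user.delay(source.user_id)
    return JsonResponse({'ok': True})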

select_for_update. When you need to read a row and update it without a race, lock it:

from django.db import transaction

with transaction.atomic():
    order = Order.objects.select_for_update().get(pk=order_id)
    if order.status == 'pending':
        order.status = 'paid'
        order.save(update_fields=['status'])

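The lock is released when the transaction commits or rolls back. Holding a SELECT FOR UPDATE across an external API call is a recipe for lock pile-ups under load: every other transaction that needs the row waits on the API, and with inconsistent lock ordering you get deadlocks as well.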

on_commit hooks. A side effect that should happen only if the transaction commits:

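from django.db import transaction

def order_paid(order):
    with transaction.atomic():
        order.status = 'paid'
        order.save(update_fields=['status'])
        # Runs only after the outermost transaction commits; called outside any
        # transaction, on_commit would run the callback immediately
        transaction.on_commit(lambda: send_receipt.delay(order.id))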

If the transaction rolls back, the Celery task is never enqueued. This pattern eliminates the "the email said you paid but the database says you didn't" class of bug.

6) Async Django: where it helps and where it doesn't

Django supports async views and async middleware (since 3.1) and an async ORM interface (since 4.1). The mental model:

An ASGI server (uvicorn, Daphne, Hypercorn) replaces the WSGI server; uvicorn workers can also run under gunicorn.
A view can be async def and use await for I/O.
The ORM has async methods (Order.objects.afirst(), order.asave()).

Where it helps: views that wait on external HTTP calls (third-party APIs), WebSocket and SSE endpoints, fan-out patterns.

Where it does not help: CPU-bound work (still bound by the GIL); simple ORM queries (Django's async ORM methods still execute the query synchronously in a thread, so they are not faster); and codebases built on sync libraries (many third-party SDKs are sync-only, and calling them directly from an async view blocks the event loop unless wrapped with sync_to_async).
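The fan-out case from "where it helps" is where the gain is easiest to see. A stdlib-only sketch: fetch_quote and the provider names are hypothetical, and asyncio.sleep stands in for network latency; in an async Django view the awaited call would be an async HTTP client.

```python
# Fan-out sketch: three independent "third-party calls" awaited concurrently.
import asyncio
import time


async def fetch_quote(provider: str, latency_s: float) -> dict:
    await asyncio.sleep(latency_s)  # yields to the event loop while "waiting on I/O"
    return {"provider": provider, "latency_s": latency_s}


async def fan_out() -> list:
    # gather() runs the coroutines concurrently and returns results in argument order.
    return await asyncio.gather(
        fetch_quote("carrier-a", 0.3),
        fetch_quote("carrier-b", 0.2),
        fetch_quote("carrier-c", 0.1),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(fan_out())
    # Wall time tracks the slowest call (~0.3 s), not the sum (~0.6 s).
    print([r["provider"] for r in results], round(time.perf_counter() - start, 2))
```

If fetch_quote called a sync-only SDK instead of awaiting, the calls would run one after another and block the event loop. asgiref's sync_to_async moves such a call onto a thread; pass thread_sensitive=False if the wrapped calls should run in parallel.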

The honest position: async Django is mature enough for new projects with specific async-heavy patterns. Migrating a sync codebase to async is rarely worth it; the gains are workload-specific.

7) Sessions, cookies, and the session backend

Default session backend is the database. Every authenticated request reads its django_session row, and writes it back whenever the session is modified (or on every request if SESSION_SAVE_EVERY_REQUEST = True). At scale, this is a hot table.

Alternatives:

Signed cookies (signed_cookies). Session data lives in the cookie; no database read. Limits: cookie size (about 4 KB), no server-side invalidation, all session data sent on every request, and the data is signed but not encrypted, so the client can read it.
Cached sessions (cached). Sessions in Redis or memcached. Fastest; lost on cache restart or eviction.
Cached + database (cached_db). Cache for reads, database for persistence. Good balance.

Production default for medium-scale: cached_db with Redis. For small apps, database is fine.
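Switching backends is a settings change. A sketch, assuming a shared Redis cache is already configured under the "default" alias; with a per-process LocMemCache, each worker would see different sessions.

```python
# settings.py fragment (sketch): cached_db sessions on the shared cache.
SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"

# Must name a cache every worker shares (Redis/memcached), not LocMemCache.
SESSION_CACHE_ALIAS = "default"

# Write the session only when it changes (Django's default behaviour).
SESSION_SAVE_EVERY_REQUEST = False
```

Database-backed sessions (db and cached_db) are not purged automatically when they expire; run python manage.py clearsessions on a schedule.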

8) Static analysis, type hints, and the team's third year

A Django codebase at year three benefits from:

mypy with django-stubs. Type-checks models, querysets, and views. Catches whole classes of bugs at PR time.
ruff or flake8. Linting; consistent style; catches unused imports and bare excepts.
pytest-django. Runs Django tests under pytest, with fixtures, parameterisation, and plugins.
django-migration-linter. Flags risky migrations in CI (non-nullable adds, dropping indexes, locking operations on large tables).
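django-silk. Per-request profiling and SQL inspection, typically in staging. For per-request query-count assertions in tests, use Django's assertNumQueries or pytest-django's django_assert_num_queries fixture.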

These are not optional at scale. A codebase without them accumulates debt that compounds.

9) Observability: what to track

Per-request:

Latency (p50, p95, p99 per endpoint).
Query count per request.
Error rate per endpoint.

Per-resource:

Database connection pool utilisation.
Slow-query log (queries > 100ms).
Cache hit rate per cache backend.
gunicorn worker memory and request count per worker.

System-level:

Request rate per endpoint.
4xx/5xx breakdown.
Background task lag (Celery queue depth).

Tools that produce these out of the box: Sentry, New Relic, Datadog, OpenTelemetry instrumentation. Without them, debugging a slow endpoint in production is grep-driven and painful.
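Why per-endpoint percentiles rather than one global number: a slow endpoint with little traffic disappears in the aggregate. A stdlib sketch over invented (endpoint, latency_ms) samples using statistics.quantiles, which needs at least two samples per endpoint; the function names are hypothetical.

```python
# Per-endpoint latency percentiles from (endpoint, latency_ms) samples (sketch).
from collections import defaultdict
from statistics import quantiles


def percentiles_by_endpoint(samples):
    by_endpoint = defaultdict(list)
    for endpoint, latency_ms in samples:
        by_endpoint[endpoint].append(latency_ms)
    report = {}
    for endpoint, latencies in by_endpoint.items():
        # n=100 returns the 99 cut points p1..p99; index k-1 is the k-th percentile.
        cuts = quantiles(latencies, n=100, method="inclusive")
        report[endpoint] = {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
    return report


def overall_p95(samples):
    return quantiles([ms for _, ms in samples], n=100, method="inclusive")[94]


if __name__ == "__main__":
    # 190 fast home-page hits and 10 slow checkouts (invented data).
    samples = [("/", 40)] * 190 + [("/checkout", 900)] * 10
    print(overall_p95(samples))                           # well under 100 ms
    print(percentiles_by_endpoint(samples)["/checkout"])  # p95 of 900 ms
```

The global p95 looks healthy while every checkout takes 900 ms, which is the "aggregates lie" point in concrete form.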

10) The threaded example: sizing a deployment

Take a Django app expecting 100 RPS sustained, p95 latency target 200ms, p99 target 500ms.

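Worker count. Each request takes ~80ms on average, so one sync worker handles about 1000 / 80 ≈ 12.5 RPS. 100 / 12.5 = 8 workers if every worker were busy 100% of the time, so roughly 9 is the practical minimum. With margin, 16 workers across 4 pods (4 workers each, each pod has 2 CPU cores), which puts average worker utilisation near 50%.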
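The worker arithmetic is Little's law: average busy workers equal arrival rate times time per request. A sketch with hypothetical function names; target_utilisation is a choice you make, and the 60-80% band in the Operational signals section is one reasonable target.

```python
# Sync-worker sizing from Little's law (sketch).
import math


def per_worker_rps(avg_latency_ms: float) -> float:
    """Requests per second one sync worker can serve back to back."""
    return 1000 / avg_latency_ms


def sync_workers_needed(rps: float, avg_latency_ms: float, target_utilisation: float = 1.0) -> int:
    busy = rps * avg_latency_ms / 1000  # workers busy on average
    return math.ceil(busy / target_utilisation)


if __name__ == "__main__":
    print(per_worker_rps(80))                 # 12.5
    print(sync_workers_needed(100, 80))       # 8 at 100% utilisation
    print(sync_workers_needed(100, 80, 0.5))  # 16 at 50% utilisation
```

Average latency understates the work done by slow requests, so feeding in a p95 duration instead gives a more conservative worker count.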

Database connections. 16 workers × 1 connection per worker = 16 sustained. With CONN_MAX_AGE = 60, this is stable. Add PgBouncer in transaction mode so that autoscaled pods and rolling deploys (when old and new pods are connected at the same time) don't exhaust Postgres during traffic spikes.

Cache. Redis with one connection per worker; redis-py's connection pool handles spikes. Cache hit rate target ≥ 80% for hot keys.

Static files. Served from nginx + S3 + CloudFront. Django never touches them in production.

Background tasks. Celery with Redis broker; 4 worker processes, autoscaling on queue depth. The Celery worker is a separate deployment from the web; sizing is independent.

Observability. Sentry for errors, OpenTelemetry for traces, structured JSON logs to stdout shipped to a log aggregator. Per-endpoint p95 latency on the leadership dashboard.

After two weeks of load, the actual numbers will diverge from the plan. The discipline is to size to the data, not to the plan, after the first traffic.

Operational signals

Healthy. Per-endpoint p95 within target. Worker utilisation 60-80% (room for spikes). Database connection pool utilisation < 60%. Cache hit rate stable. Error rate < 0.1%.

First degrading metric. Worker utilisation climbing past 85% sustained. Either traffic has grown or per-request cost has climbed; the workers are saturating.

Misleading metric. CPU utilisation alone. Workers can be at 30% CPU and saturated by I/O wait or by held database locks; CPU is not the constraint.

Expert graph. Per-endpoint latency × query count × cache hit rate; the cell that lights up is the cell to fix.

Where this appears in production

Instagram — multi-data-centre Django; their writeups on connection pooling and async I/O are foundational.
Pinterest — Django + PgBouncer; documented patterns on connection-pool sizing.
Eventbrite — Django for the web tier; their post on ATOMIC_REQUESTS = False and explicit transactions is widely cited.
The Washington Post (Arc Publishing) — Django CMS with heavy CDN use; static files entirely off Django.
Mozilla — extensive use of django-redis and cached sessions; their patterns on cache invalidation are public.
Dropbox (internal tools) — Django apps run with whitenoise for simplicity; per-tool sizing.
Goa-based fintech SaaS — transaction.on_commit for queue-after-commit pattern; eliminates double-debit bugs.
Bengaluru SaaS — async views for third-party API fan-out; sync for everything else.

Recall / checkpoint

What is the worker-count rule of thumb and why?
Why is connection pooling load-bearing at scale?
What is PgBouncer transaction mode and what does it break in Django?
When does cache_page work and when does it not?
What is transaction.on_commit and what bug class does it eliminate?
When is async Django worth migrating to?
What is the static file plan for a horizontally-scaled deployment?

Interview Q&A

Q1. The Django app exhausts Postgres connections at 200 RPS. Walk through the diagnosis and the fix. Diagnosis: each Django worker holds a connection per request; high worker count × low CONN_MAX_AGE × no pooler equals connection storms. Fix: install PgBouncer in transaction mode between Django and Postgres; set DISABLE_SERVER_SIDE_CURSORS: True in the DATABASES entry; tune CONN_MAX_AGE. PgBouncer multiplexes worker connections onto a smaller Postgres pool. Validate by watching the connection count in Postgres's pg_stat_activity. Common wrong answer to avoid: "increase max_connections"; it buys time but does not solve the structural problem.

Q2. A view that does a payment-then-email pattern is sending emails when the database rolls back. Walk through the fix. Diagnosis: the email send is inside the transaction; the transaction rolls back; the email was already sent (or queued). Fix: move the email side effect to transaction.on_commit(lambda: send_email.delay(...)). The closure is invoked only on successful commit. If the transaction rolls back, the email is never queued. The pattern eliminates "the receipt said you paid but the database doesn't show it." Common wrong answer to avoid: "send the email only if the save returns successfully"; a successful save() only means the statement ran, and an enclosing transaction can still roll back afterwards.

Q3. A team uses ATOMIC_REQUESTS = True. A long view that calls an external API now blocks other operations on the same rows. Walk through the trade-off. The trade-off: ATOMIC_REQUESTS = True is convenient (every view is atomic) but couples transaction lifetime to request lifetime. External API calls inside the transaction hold row locks for the API's duration. Other operations on those rows block. Fix: ATOMIC_REQUESTS = False; wrap the database-touching parts in explicit transaction.atomic(); move external calls outside the block. The cost is more explicit code; the benefit is bounded lock time. Common wrong answer to avoid: "make the API call faster" — the structural fix is the transaction shape, not the API.

Q4. The team's session reads are the slowest part of authenticated requests. What is the fix? Diagnosis: with the default db session backend, every request reads from django_session, which becomes a hot table. Fix: switch to cached_db (Redis cache + database persistence) or cached (Redis only, no persistence). For lightweight sessions, signed cookies eliminate the read entirely. Validate by measuring per-request session-backend latency. Common wrong answer to avoid: "scale the database"; sessions are a hot table, and the structural fix is to move them.

Q5. The team wants to use async views to "make Django faster." Walk through the assessment. Async helps where requests wait on non-database I/O: third-party APIs, WebSocket and SSE workloads. Async does not help CPU-bound work (still GIL-bound), ORM queries (Django's async ORM still executes them synchronously in a thread, so there is no wait to overlap), or codebases with heavy sync libraries. For a typical CRUD-heavy app, async adds operational complexity (uvicorn, ASGI-compatible middleware, async ORM) without proportional gains. Honest answer: profile the workload; if waiting on external I/O dominates, async helps; otherwise, optimise queries and add caches first. Common wrong answer to avoid: "async is always faster"; it depends on the workload's blocking profile.

Q6. The Django admin is being used as the operational dashboard; load is climbing. How do you respond? The admin is being asked to serve traffic it was not designed for. Options: (a) add caching at the admin's list views; (b) build a dedicated dashboard using DRF + a front-end; (c) move heavy queries to a read replica. For staff-only traffic, (a) is often enough. For high volume, (b) is the right path. The admin is for low-frequency staff work; high-frequency views deserve a designed UI. Common wrong answer to avoid: "scale the admin" — past a point, the structural fix is to not serve customer-facing or dashboard traffic from the admin.

Operational memory

This chapter explained the operational surface of Django in production: WSGI deployment, connection pooling, static files, caching layers, transactions, async, sessions, observability. The important idea is that Django's defaults are good for development; production maturity requires conscious sizing and explicit patterns at every layer.

You learned to size gunicorn workers, pool database connections, serve static files outside Django, choose cache layers per pattern, manage transactions with on_commit and explicit atomic(), and observe the right metrics. That solves the opening problem because every operational surface now has a default and a known evolution.

Carry this diagnostic forward: when Django is slow in production, the question is which operational layer is saturated — workers, connections, cache, or transactions. Each has a structural fix.

Remember:

Worker count: 2 × CPU + 1; size to traffic.
PgBouncer in transaction mode for connection scale; turn off server-side cursors.
Static files via nginx/CDN; media via S3.
transaction.on_commit for side effects that must wait for commit.
Async helps I/O-bound; not a magic speed-up.
Per-endpoint p95 is the truth; aggregates lie.

Bridge. Django on the application side is complete. The next module — 05_nginx — covers the reverse proxy that sits in front of Django, handles TLS, and serves static files faster than Django ever will. → ../05_nginx/00-eli5.md