00. Postgres — ELI5¶
Postgres is a strict ledger clerk. Every write goes into a journal first. Then into the books. Nothing is forgotten. Nothing is half-done.
Picture a busy bank with one head clerk. Customers walk in. They deposit, withdraw, transfer. The clerk does three things, in this order, every time.
First, he writes the request in a journal. Pen on paper. Permanent. This is the WAL — the write-ahead log.
Second, he updates the actual passbook page in his desk drawer. This page is in his fast scratchpad memory — shared buffers. Later, when he is free, he files the page back into the long-term filing cabinet on disk — the heap.
Third, he updates an index card at the front of the cabinet. So next time someone asks "show me account 42", he doesn't dig through every file. This is the B-tree index.
See. Journal first. Books second. Index third. That ordering is the whole game. Simple, no?
The picture¶
client (psql, app, ORM)
│
▼
┌──────────────────────┐
│ POSTMASTER (PID 1) │ listens on :5432
└──────────┬───────────┘
│ fork on each connection
▼
┌──────────────────────┐
│ backend process │ one OS process per session
│ (parses SQL, │
│ runs the query) │
└──────────┬───────────┘
│
┌──────────┼───────────┐
▼ ▼ ▼
┌───────────┐ ┌────────┐ ┌──────────┐
│ shared │ │ WAL │ │ index │
│ buffers │ │ buffer │ │ pages │
│ (scratch) │ └───┬────┘ └──────────┘
└─────┬─────┘ │ fsync on COMMIT
│ ▼
│ pg_wal/ (durable journal on disk)
▼
heap pages on disk (the actual table rows, 8KB each)
The postmaster sits and listens. A new client connects. Postmaster forks a fresh OS process for that session. That backend handles every query for that one connection until it disconnects. This is why too many connections kill Postgres — too many processes. That is also why we put PgBouncer in front to pool them.
What problem does it solve¶
Without a real database, life is sad. Two users update the same row. One write is lost. The server crashes mid-write. Half the row is on disk. Someone reads a balance while another transfers money. They see a phantom amount. No constraints. Garbage gets in. Reports lie.
Postgres fixes all four. ACID transactions — your INSERT either fully lands or fully rolls back. MVCC — each transaction sees a consistent snapshot of the world, without blocking writers. WAL — even on a power cut, every committed write is replayable from the journal. Constraints — foreign keys, CHECKs, UNIQUE. The database refuses bad data.
You pay for this with some write amplification and a process called vacuum. But you sleep at night.
The three things you actually deal with¶
Tables and schemas¶
A table is rows and columns with types the database enforces.
A schema is a namespace inside a database — public.users, billing.invoices.
Foreign keys glue tables together. The DB itself refuses orphans.
psql is the CLI you live in. \d users shows the table shape.
Migrations evolve the schema over time — add columns, add indexes, backfill.
Transactions¶
BEGIN; ... COMMIT; — a unit of work that either all happens or none of it does.
Default isolation is Read Committed — you see what is committed at the moment of each statement.
Repeatable Read and Serializable give stronger guarantees for ledgers and double-entry work.
Long transactions are dangerous — they hold snapshots open and block vacuum.
Indexes¶
A table without an index is a phone book without alphabetical order.
The default is B-tree — sorted, balanced, good for =, <, >, BETWEEN.
GIN is for JSONB and full-text search. GiST is for geometry and ranges.
EXPLAIN ANALYZE shows the plan and tells you if your index actually got used.
Too many indexes slow writes. Too few slow reads. So what to do? Measure, then add.
Where this lives in the wild¶
- Instagram — shards user feed and likes data across thousands of logical Postgres schemas mapped to fewer physical nodes, using PL/pgSQL for Snowflake-style ID generation.
- Notion — runs 480 logical Postgres shards spread across 96 physical instances, with
workspace_idas the shard key for every block of content. - Figma — vertically partitions tables onto separate Postgres clusters first, then horizontally shards via a custom
DBProxyservice that rewrites SQL at the application layer. - GitLab — runs a single 12 TiB Postgres instance handling 60-80k transactions per second, with PgBouncer cutting active connections from thousands down to 10-20.
- TripAdvisor — powers 250 million traveller reviews on Postgres, with content types (members, media, reviews) split across separate Postgres machines.
- Reddit — moved media metadata into AWS Aurora Postgres with
pg_partmanpartitioning andpg_cronfor partition maintenance, serving 100k requests/sec under 5ms p90. - Robinhood — runs core brokerage transactional data on Postgres with application-level sharding so each shard has its own DB and deployment, shrinking blast radius.
- Discord — used Postgres for early message storage before scaling forced a move to Cassandra and later ScyllaDB; smaller services still live on Postgres.
- Twitch — runs roughly 94% of its 125 OLTP database hosts on Postgres, with its oldest cluster handling 300k+ transactions per second.
- Disqus — uses Postgres recursive CTEs (
WITH RECURSIVE) to fetch threaded comment trees in a single query, replacing a 500%-slower N+1 pattern. - Skype — runs core call-record and presence data on Postgres with PL/Proxy hash-partitioning by username and PgBouncer-style pooling via Skytools.
- Datadog — stores host metadata (host_id, AZ, tags) in Postgres, handling 25,000 upserts per second to track when each agent host was last seen.
- Cloudflare — runs Postgres internally for the Stream product, Images, and the Cloudflare Dashboard itself, accessible only through Cloudflare Tunnel from bastion hosts.
- Heroku — operates the Heroku Postgres managed service, which migrated hundreds of thousands of customer databases onto Amazon Aurora Postgres underneath.
- Supabase — built a whole backend-as-a-service around a dedicated Postgres per project, exposing it as REST (PostgREST), auth (GoTrue), and realtime via logical replication.
- DoorDash — ran its Python monolith on Aurora Postgres until single-writer bottlenecks pushed migration to CockroachDB for the new-verticals fulfillment backend.
- OpenStreetMap — keeps the full planet of roads, nodes, and ways in Postgres with the PostGIS extension, and serves routing queries via
pgRouting's Dijkstra implementation. - TimescaleDB users (Cloudflare, Speedcast) — store IoT telemetry and observability time-series in Postgres hypertables that auto-partition by time chunk.
- OpenAI-embedding RAG apps — store document chunks and their vector embeddings side-by-side in Postgres via the
pgvectorextension with HNSW indexes for sub-20ms similarity search. - Sentry — stores error events, issues, and the
nodestore_nodeblob table in self-hosted Postgres, where vacuum tuning is the famous operational pain point. - Coinbase — uses RDS Postgres alongside MongoDB and Redis in its Golang/Java backend services for transactional financial data needs.
- Airbnb — partitioned its main Postgres database to cut writes by 33% after profiling revealed message inbox traffic dominated the master.
- Apple FoundationDB Record Layer adopters — moved off Postgres at iCloud scale, demonstrating where Postgres' single-writer limit becomes the wall; an instructive negative example.
Where to go next¶
- 01-mvcc-wal-indexes.md — the journal, the snapshots, and the index structures that make all this work.
- 02-psql-schema-queries-explain.md — daily life in
psql, schema design, and readingEXPLAINoutput without panic. - 03-vacuum-replication-connection-pooling.md — keeping a busy cluster healthy: vacuum, replicas, PgBouncer.
- 04-vs-mysql-nosql-sharding-interview.md — interview angles: Postgres vs MySQL vs NoSQL, isolation levels, sharding tradeoffs.
Bridge. First we open the journal and the index. How does Postgres really keep everyone seeing a consistent world without locking each other out? That is MVCC + WAL. → 01-mvcc-wal-indexes.md