00. Databases, Storage, and Caching — The Five-Year-Old Version

Every system produces data. Where it sits, how you find it, and how fast you reach it shape nearly every other design decision.


Imagine a huge library. Not a small one. One with millions of books, thousands of visitors per hour, and branches across the country.

Every book sits on a shelf. The shelf is physical storage. Big shelves hold many books but finding one takes time — you walk down long aisles, scan labels, pull the book. Small shelves near the front desk hold popular books for quick access.

To find a book, you check the card catalog. The card catalog is an index. It tells you exactly which shelf, which row, which position. Without it, you scan every book. With it, you walk straight to the right spot.
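The difference between scanning and using the catalog can be sketched in a few lines of Python. This is a toy illustration, not how a real database stores anything: the book list, the `find_by_scan` and `find_by_catalog` names, and the tuple layout are all made up for the example, and the catalog here is a plain dictionary (a hash index), whereas many databases use tree-shaped indexes instead.

```python
# Hypothetical toy data: each book is (title, shelf, row, position).
books = [
    ("Dune", 12, 3, 7),
    ("Emma", 4, 1, 2),
    ("Ulysses", 30, 5, 1),
]


def find_by_scan(title):
    """No catalog: look at every book until the title matches."""
    for book in books:
        if book[0] == title:
            return book[1:]  # (shelf, row, position)
    return None


# The card catalog: built once, maps a title straight to its location.
catalog = {title: (shelf, row, pos) for title, shelf, row, pos in books}


def find_by_catalog(title):
    """With the catalog: one dictionary lookup, no walking the aisles."""
    return catalog.get(title)
```

Both functions return the same location. The scan does work proportional to the number of books, while the catalog lookup does not grow that way. The price is that the catalog has to be built and kept up to date whenever books move.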

Some books are so popular that every visitor wants them. You make copies and put them at the reservation desk — right by the entrance. No walking to the shelves. Grab and go. That is caching. The reservation desk is faster but smaller than the main shelves.
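As a rough sketch, the reservation desk might look like this in Python. Everything here is an assumption made for illustration: the `ReservationDesk` class, the `MAIN_SHELVES` dictionary standing in for slow storage, and the choice to throw out the least recently used copy when the desk is full (one common eviction policy among several).

```python
from collections import OrderedDict

# Hypothetical stand-in for the slow main shelves.
MAIN_SHELVES = {
    "Dune": "text of Dune",
    "Emma": "text of Emma",
    "Ulysses": "text of Ulysses",
}


class ReservationDesk:
    """A small cache in front of the main shelves (least recently used out first)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.copies = OrderedDict()  # title -> book, oldest use first
        self.shelf_trips = 0         # how often we had to walk to the shelves

    def get(self, title):
        if title in self.copies:
            # Cache hit: grab and go, and mark the copy as recently used.
            self.copies.move_to_end(title)
            return self.copies[title]
        # Cache miss: walk to the shelves, then keep a copy at the desk.
        self.shelf_trips += 1
        book = MAIN_SHELVES[title]
        self.copies[title] = book
        if len(self.copies) > self.capacity:
            # The desk is full: drop the copy nobody has asked for longest.
            self.copies.popitem(last=False)
        return book
```

Asking for the same title twice costs only one trip to the shelves. Because the desk is smaller than the collection, a third distinct title on a desk of capacity two pushes the oldest copy out.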

Not every visitor comes to your main library. Some live on the other side of the country. So you build branch libraries: copies of the main collection in different cities. Each branch serves nearby visitors from its own copy. Whenever the main library adds or replaces a book, it sends the change to every branch, and that delivery can take a while. That is replication. (A branch that keeps only the most popular books and fetches the rest on request behaves more like a cache.)
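One way to picture the branch libraries in code is a main library that accepts every change and a branch that catches up later. This is a minimal sketch under stated assumptions: the `MainLibrary` and `BranchLibrary` classes are hypothetical, there is a single writer, and the branch pulls changes only when `sync()` is called, which is what makes the delay visible.

```python
class MainLibrary:
    """The leader: every change happens here first."""

    def __init__(self):
        self.books = {}    # title -> edition number
        self.changes = []  # ordered history of (title, edition)

    def add_book(self, title, edition):
        self.books[title] = edition
        self.changes.append((title, edition))


class BranchLibrary:
    """A follower: holds a copy and applies the leader's changes in order."""

    def __init__(self, main):
        self.main = main
        self.books = {}
        self.applied = 0  # how many of the leader's changes we have applied

    def sync(self):
        # Apply only the changes we have not seen yet, in their original order.
        for title, edition in self.main.changes[self.applied:]:
            self.books[title] = edition
        self.applied = len(self.main.changes)
```

Between two calls to `sync()` the branch can hand out an old edition. That gap is the replication lag: the branch is fast and nearby, but it may be slightly behind the main library.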

Books don't last forever. Outdated editions need replacing. Damaged books need repair. Lost books need tracking. You keep an overdue list — a record of every book that's been checked out, returned late, or gone missing. The overdue list is your transaction log. It tracks every change to the library's state.
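The idea of a log that records every change can be sketched as follows. This is a simplified illustration, not a real database log: the `Library` class, the JSON-lines file format, and the `checkout` and `return` operations are assumptions made for the example. The key step is that each change is written to the log file before it is applied in memory, so the state can be rebuilt by replaying the file.

```python
import json
import os


class Library:
    """Keeps loans in memory, with every change first appended to a log file."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.on_loan = {}  # title -> visitor
        self._replay()     # rebuild state from the log, if one exists

    def _apply(self, entry):
        if entry["op"] == "checkout":
            self.on_loan[entry["title"]] = entry["visitor"]
        elif entry["op"] == "return":
            self.on_loan.pop(entry["title"], None)

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as log:
            for line in log:
                self._apply(json.loads(line))

    def record(self, op, title, visitor=None):
        entry = {"op": op, "title": title, "visitor": visitor}
        # Write-ahead: the change reaches the disk before memory is updated.
        with open(self.log_path, "a") as log:
            log.write(json.dumps(entry) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self._apply(entry)
```

If the program stops and a new `Library` is created with the same log path, it ends up with the same loans as before, because every change is still in the log.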

Database design is library design. You choose shelves (storage engines), build card catalogs (indexes), set up reservation desks (caches), open branch libraries (replicas), and maintain overdue lists (transaction logs). Get any one wrong and the library either loses books or makes visitors wait in line for hours.


The placeholders you will see called back

| Placeholder | Meaning |
|---|---|
| shelf | the storage engine: where data physically lives (B-tree, LSM, heap) |
| card catalog | the index: the data structure that makes lookups fast |
| reservation desk | the cache layer: Redis, Memcached, or in-memory structures for hot data |
| branch library | replicas and partitions: copies (replicas) or slices (partitions) of the data in different locations |
| overdue list | the write-ahead log (WAL) / transaction log: a record of every change |

What's coming

  1. 01-relational-data-modeling.md — tables, normalization, foreign keys, and when to denormalize
  2. 02-nosql-document-keyvalue.md — MongoDB, DynamoDB, Redis — schema flexibility and access-pattern-first design
  3. 03-wide-column-and-graph.md — Cassandra, HBase, Neo4j — specialized shelves for specialized goods
  4. 04-storage-engines-btree-lsm.md — how data physically sits on disk and why it matters
  5. 05-indexing-and-query-plans.md — building the card catalog and reading EXPLAIN output
  6. 06-transactions-and-isolation.md — ACID, isolation levels, and what "consistent" really means
  7. 07-replication-strategies.md — leader-follower, multi-leader, leaderless, and the lag problem
  8. 08-partitioning-and-sharding.md — splitting the library by wing, floor, or letter
  9. 09-caching-patterns-deep-dive.md — cache-aside, write-through, invalidation, stampede, and TTL math
  10. 10-object-storage-and-data-lakes.md — S3, GCS, Parquet, and the data lake pattern
  11. 11-search-and-vector-stores.md — Elasticsearch, Pinecone, pgvector — full-text and semantic search
  12. 12-connection-pooling.md — PgBouncer, HikariCP, and why connections are expensive
  13. 13-cap-theorem-in-practice.md — what CAP actually means for real database choices
  14. 14-honest-admission.md — what we don't fully understand about data storage

Bridge. The library starts with its most fundamental design choice: how to organize the shelves. Tables, rows, and relationships — relational modeling. → 01-relational-data-modeling.md

One more thing. A smart librarian doesn't organize books randomly. High-demand books go on eye-level shelves near the entrance. Rare manuscripts go in the basement vault. The organization strategy depends on who's visiting and what they need. Read-heavy libraries look different from write-heavy archives. Understanding these access patterns is what separates a library from a pile of books on the floor.