11. Data Access Patterns — pipes to the storage basement¶

~15 min read. Good services fail when their storage pipes leak silently.

Built on the ELI5 in 00-eli5.md. The plumbing — internal data flow — connects a clean room to the warehouse through a safe hallway.

1) Start with one rule: business code should not babysit SQL¶

See. A service class should decide business rules. It should not decide cursor lifetime, retry loops, and row mapping together. That is how one neat method becomes forty nervous lines. So what to do? Push storage details behind a narrow contract. The order service asks for orders. It does not care whether rows came from Postgres, Redis, or a read replica. That separation makes tests smaller. It also makes production incidents easier to isolate. Use this mental picture. ┌──────────────┐ │ OrderService │ └──────┬───────┘ │ ▼ ┌──────────────┐ │ Repository │ └──────┬───────┘ │ ▼ ┌──────────────┐ │ Query/Mapper │ └──────┬───────┘ │ ▼ ┌──────────────┐ │ Database │ └──────────────┘ Worked example. Assume checkout needs three facts. Order total is 850. Inventory available is 2 units. User credit limit left is 500. Business code should only compare these numbers. Storage code should fetch them, shape them, and persist changes. Concrete code-level sketch.

interface OrderRepository {
  findDraft(orderId: string): Promise<Order | null>
  save(order: Order): Promise<void>
}
class CheckoutService {
  constructor(private repo: OrderRepository) {}
  async confirm(orderId: string) {
    const order = await this.repo.findDraft(orderId)
    if (!order) throw new Error('draft missing')
    order.confirm()
    await this.repo.save(order)
  }
}

Simple, no? The service speaks domain words. The repository speaks storage words. That clean split is the first data access pattern.

2) Repository, DAO, Active Record, Data Mapper: same basement, different pipe fittings¶

Students mix these up because all of them touch the database. But their responsibilities differ. Repository is collection-like. DAO is operation-like. Active Record mixes object and persistence. Data Mapper keeps domain objects persistence-ignorant. Quick comparison first. ┌──────────────┬─────────────────────────────┐ │ Pattern │ Main job │ ├──────────────┼─────────────────────────────┤ │ Repository │ fetch aggregate-like objects│ │ DAO │ run table-oriented methods │ │ ActiveRecord │ object saves itself │ │ DataMapper │ separate mapper handles SQL │ └──────────────┴─────────────────────────────┘ Take a ledger entry example. Row has id 9001, amount 2500, currency INR. With DAO, you may write methods like insertEntry and findByAccountId. With Repository, you ask for LedgerAccount and save LedgerAccount. With Active Record, entry.save() may emit SQL directly. With Data Mapper, Entry stays plain and EntryMapper does persistence. Concrete DAO style.

interface LedgerEntryDao {
  LedgerEntryRow findById(long id);
  List<LedgerEntryRow> findByAccount(long accountId);
  void insert(LedgerEntryRow row);
}

Concrete Repository style.

interface LedgerAccountRepository {
  LedgerAccount load(AccountId id);
  void save(LedgerAccount account);
}

Concrete Active Record feel.

entry = LedgerEntry.find(9001)
entry.amount = 2600
entry.save

Concrete Data Mapper feel.

entry = LedgerEntry(id=9001, amount=2600, currency='INR')
entry_mapper.update(entry)

Which one when? Use DAO when the model is table-shaped and simple. Use Repository when you care about domain aggregates and invariants. Use Active Record for fast CRUD-heavy apps with modest domain rules. Use Data Mapper when the domain is rich and must stay clean. Why this matters in interviews. Interviewer is checking whether you can separate domain language from table language.

3) Unit of Work and connection management: one business action, one transaction story¶

Repository alone is not enough. Many use cases touch multiple tables. You cannot half-save money movement. Suppose transfer amount is 700. Wallet A starts at 3000. Wallet B starts at 900. After transfer, A should become 2300 and B should become 1600. If debit commits and credit fails, the service lies. Unit of Work groups changes into one commit boundary. Think of it as one supervisor for all write pipes in a request. ┌──────────────┐ │ Transfer API │ └──────┬───────┘ │ opens ▼ ┌──────────────┐ │ UnitOfWork │ ├──────────────┤ │ WalletRepo │ │ LedgerRepo │ └──────┬───────┘ │ commit/rollback ▼ ┌──────────────┐ │ Connection │ └──────────────┘ Concrete service flow.

with uow.start() as tx:
    from_wallet = tx.wallets.get('A')
    to_wallet = tx.wallets.get('B')
    from_wallet.debit(700)
    to_wallet.credit(700)
    tx.ledger.add('A', 'B', 700)
    tx.commit()

Where should connection management live? Usually at request or job scope. Not inside every repository method separately. If each repository opens its own connection blindly, cross-table consistency becomes hard. Also connection pools are finite. If pool size is 20 and one slow report grabs 20 long-lived connections, checkout traffic stalls. So what to do? Borrow late. Release early. Keep transactions short. Never do remote HTTP calls while holding a database transaction open. Use read-only connections for read paths when the stack supports it. That is disciplined connection management inside a service.

4) ORM tradeoffs, query builders, and raw SQL: choose by clarity, not fashion¶

ORMs remove repetitive mapping code. They also hide expensive queries when used carelessly. Classic failure is N+1. Suppose you load 100 orders. Then lazy-load items per order. One query becomes 101 queries. Latency jumps from 40 ms to 900 ms. Simple, painful, common. Diagram of the mistake. ┌──────────────┐ │ load orders │──→ 1 query └──────────────┘ │ ▼ ┌──────────────┐ │ load items │──→ 100 more queries └──────────────┘ Query builders sit in the middle. They keep SQL explicit but compose filters safely. Raw SQL is strongest when the query is analytics-heavy, vendor-specific, or performance critical. ORM is strongest when you need quick CRUD and predictable object mapping. One practical rule set. Use ORM for insert, update, and simple by-id reads. Use query builders for dynamic filters like status, city, date range, and pagination. Use raw SQL for window functions, CTE-heavy reports, and hand-tuned joins. Concrete comparison.

const rows = await db('orders')
  .select('id', 'status', 'total')
  .where('status', 'PAID')
  .whereBetween('created_at', [from, to])
  .orderBy('created_at', 'desc')
  .limit(50)

SELECT city, SUM(total) AS revenue,
       RANK() OVER (ORDER BY SUM(total) DESC) AS city_rank
FROM orders
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY city;

Interview maturity shows here. Do not sound religious. Say the abstraction level should match query complexity and observability needs. Also log query time, row count, and retry count. If you cannot see the pipes, you cannot fix the leak.

Where this lives in the wild¶

At Swiggy, a backend engineer designing the order service separates cart rules from MySQL transaction code. At Zerodha, a fintech platform engineer uses Unit of Work around ledger writes and reconciliation records. At Shopify, a staff engineer on inventory systems prefers Data Mapper for rich domain rules across warehouses. At Stripe, a payments engineer keeps raw SQL for high-risk settlement reports and uses repositories for payment objects. At PhonePe, a senior backend engineer tunes connection pool usage because peak traffic punishes idle transactions.

Pause and recall¶

When does a Repository communicate better than a DAO? Why can Active Record become messy in a rule-heavy domain? What problem does Unit of Work solve during multi-table writes? Why might raw SQL beat an ORM for a ranking report?

Interview Q&A¶

Why Repository not DAO?¶

Repository speaks in aggregates and domain intent, while DAO usually mirrors tables more directly. Common wrong answer to avoid: Repository is just a fancy DAO with better naming.

Why Unit of Work not separate save calls?¶

Unit of Work gives one transaction boundary, one connection story, and one rollback point. Common wrong answer to avoid: Multiple saves are fine if each repository method is small.

Why Data Mapper not Active Record?¶

Data Mapper protects a rich domain model from persistence concerns and test friction. Common wrong answer to avoid: Active Record is always simpler, so it is always better.

Why query builder not raw SQL everywhere?¶

Query builders help with safe dynamic filtering, but raw SQL stays clearer for very complex queries. Common wrong answer to avoid: Never write SQL by hand because ORMs already optimize everything.

Apply now (5 min)¶

Take a checkout use case with Order, Inventory, and Payment tables. Write one paragraph saying what belongs in the service, repository, and Unit of Work. Then list one query you would keep in an ORM and one you would write as raw SQL. Sketch from memory: draw the request flow from service to repository to mapper to database, and mark where the transaction begins and ends.

Bridge. Good storage pipes move data. Next, we study the special behavior wiring AI systems need around models. → 12-design-patterns-for-ai.md