06. How fresh is fresh enough — lambda, kappa, and the bill for data nobody queries

~24 min read.

Finance pulls the platform's cloud invoice and circles one number: the streaming layer costs $41k/month and rising. Then they ask the question that has no slide in your architecture deck — "which of these always-on consumers, transforms, and compactions is serving a query a human is waiting on, and which is keeping cold data warm out of habit?" You do not have the answer, because you optimized every path for the same freshness. This file is how you stop doing that.

Built on the freshness gap, replay log, incremental indexing, modality cost asymmetry, and lambda vs kappa named in 00-first-principles.md. Chapters 02–05 made everything fresh; this file asks how fresh each path needs to be, and what maintaining one or two code paths costs to deliver it.

What chapters 02–05 settled, and the bill they quietly ran up

Chapter 02 gave us a replay log that absorbs bursts and lets us reprocess history. Chapter 03 split storage into cheap immutable raw and queryable derived artifacts. Chapter 04 ran ASR, vision, and embedding models in the stream with idempotent writes. Chapter 05 kept the index fresh with a growing segment and background compaction. Every one of those chapters optimized for the same thing: shrink the freshness gap. And they succeeded — the copilot sees a screenshot eight seconds after upload.

But each mechanism runs continuously and bills continuously. The consumers never sleep. The transform layer holds GPU warm. Compaction burns CPU around the clock to hold recall on segments that may never be queried again. The freshness gap is closed, but the platform is now paying the peak freshness price on every path — including the analytics dashboard that nobody looks at before 9 a.m., and the per-customer history that gets queried once a quarter for a billing dispute. The pressure has flipped. It is no longer "can we make it fresh?" It is "we made everything fresh; what are we paying to over-deliver, and how many separate code paths do we maintain to deliver it?"

What this file solves

A platform that makes every path equally fresh pays peak-freshness cost on data that is queried rarely or never. The team must then decide whether to run one always-on streaming path or a fast streaming path plus a separate cheap batch path, a choice that trades cloud spend against the number of code paths and the reconciliation logic they maintain. This file shows how to turn "fresh enough" into a per-path latency budget, how the lambda (two paths) and kappa (one path) architectures answer the duplication question differently, and how the modern lakehouse-as-log variant lets you collapse toward kappa without losing the cheap, correct reprocessing that batch used to give you for free.

1) Why "make it fresh" stops being the goal — the need for a per-path freshness budget

The earlier chapters treated freshness as a single dial to turn up. That works until you notice the platform has many read paths with wildly different urgency, all paying the same always-on cost.

Look at the actual queries hitting the platform:

READ PATH                          WHO WAITS              HOW FRESH IT NEEDS
 live copilot retrieval             a human, mid-call      seconds   (the original pressure)
 agent "what's the sentiment        a human, mid-call      seconds
   trend this call?"
 supervisor live dashboard          a human, glancing      ~minutes
 daily CSAT / volume report         nobody, runs at 6am    hours is fine
 quarterly billing dispute lookup   a human, occasionally  hours; queried once a quarter
 monthly model-retraining export    a batch job            a day is fine

Two of these need seconds and one needs minutes. Three are happy with hours or a day. The earlier chapters built one pipe and gave all six the seconds-fresh treatment, because that pipe was the only one we had. The visible symptom is the invoice: always-on consumers, warm GPU, and continuous compaction, charged against paths where a nightly run would be indistinguishable to the user.

So the real problem is not "is the data fresh?" It is that we are paying peak freshness cost on paths whose users would never notice a slower one. Which raises the natural question: how do you let the urgent paths stay seconds-fresh while the patient paths get cheap, without building and maintaining two completely separate platforms?

Why this rule exists. Freshness has a cost curve that is roughly inverse to latency: every halving of the freshness gap roughly doubles the always-on machinery you must keep running. Spending peak freshness on a path whose user tolerates hours is pure waste — you pay the steep part of the curve for a latency nobody perceives. A freshness budget per path is the only way to spend the expensive end of the curve only where a human is actually waiting.
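As a rough sketch of that curve (an assumed model, not a measurement): if always-on cost scales as 1/Δ, each halving of the budget doubles the cost. The function name, the 10-second reference point, and the unit cost are illustrative assumptions, and a real bill also carries fixed costs that this ignores.

```python
def always_on_cost(delta_seconds: float, reference_delta: float = 10.0,
                   reference_cost: float = 1.0) -> float:
    """Relative always-on cost under the assumed inverse curve: cost ~ 1/delta."""
    if delta_seconds <= 0:
        raise ValueError("freshness budget must be positive")
    return reference_cost * reference_delta / delta_seconds


# Compare a few budgets against the 10-second reference point.
for label, delta in [("5 s", 5), ("10 s", 10), ("2 min", 120), ("6 h", 6 * 3600)]:
    print(f"{label:>6}: {always_on_cost(delta):.4f}x the 10-second cost")
```

Under this assumed model a path that tolerates six hours sits so far down the curve that keeping it at ten seconds buys nothing its user can see.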

2) The naive repair, and where it breaks: "just turn off freshness for the cheap paths"

The obvious fix: keep the streaming pipe for the copilot, and for the cheap reports, go back to a nightly batch job over the raw data. Two pipes. The streaming one for the urgent paths, the batch one for everything else. This is exactly the lambda architecture — a speed layer (streaming, fast, approximate) and a batch layer (slow, complete, authoritative), with a serving layer that merges them.

It works, and for years it was the standard answer. But it breaks in a specific, expensive way: you now maintain the same business logic twice. The transcript-cleaning rule, the PII-masking step, the way you chunk a conversation, the embedding model version — all of it must exist in both the streaming code and the batch code, and the two must stay in sync. They never do. The streaming path gets a hotfix; the batch path doesn't; the nightly report disagrees with the live dashboard, and now you are debugging why two pipelines that should agree don't.

So the root cause of the pain is not "freshness is expensive." It is that two code paths computing the same thing drift apart, and reconciling them costs more than the compute ever did. The real question becomes: can we have one code path that serves both the urgent and the patient reads (fresh when needed, cheap to reprocess) without maintaining a second pipeline?

LAMBDA (two paths)                               THE DRIFT FAILURE
 speed layer  ─▶ fast, approximate ─┐            chunking rule changes in streaming code
 batch layer  ─▶ slow, complete  ───┼─▶ merge    but NOT in batch code
                                    │            → nightly report disagrees with live view
 same logic written twice ──────────┘            → days lost finding "why don't they match?"
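A toy illustration of the drift, assuming a hypothetical chunking rule that splits on periods and a hotfix that also splits on question marks. The function names and the sample transcript are made up; the point is only that once the hotfix lands in one layer, the two layers report different counts for the same input.

```python
def chunk_v1(text: str) -> list[str]:
    """Original rule: split on periods only."""
    return [s.strip() for s in text.split(".") if s.strip()]


def chunk_v2(text: str) -> list[str]:
    """Hotfixed rule: question marks also end a chunk."""
    return [s.strip() for s in text.replace("?", ".").split(".") if s.strip()]


def speed_layer_chunk_count(text: str) -> int:
    # The streaming code received the hotfix.
    return len(chunk_v2(text))


def batch_layer_chunk_count(text: str) -> int:
    # The batch code was never updated.
    return len(chunk_v1(text))


transcript = "Can you reset it? I tried twice. Still broken."
live = speed_layer_chunk_count(transcript)
nightly = batch_layer_chunk_count(transcript)
print(f"live dashboard: {live} chunks, nightly report: {nightly} chunks")
```

Neither layer is broken in isolation. The disagreement only shows up when someone compares the two outputs, which is why it tends to be found late.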

3) The running example: the copilot's six read paths, budgeted

Take the platform's real read paths and assign each a freshness budget and the machinery it justifies.

READ PATH                     BUDGET (Δ)    PATH IT JUSTIFIES                         COST POSTURE
 live copilot retrieval        ≤10 s         streaming: consumer→transform→index       peak (worth it)
 live sentiment on this call   ≤10 s         streaming: same pipe                      peak (shared)
 supervisor live dashboard     ≤2 min        streaming or 1-min micro-batch            medium
 daily CSAT report             ≤6 h          batch / scheduled query over lakehouse    cheap
 quarterly billing lookup      ≤6 h          batch query over cold tier (ch.03)        cheap, cold
 monthly retraining export     ≤24 h         batch read of raw + derived               cheapest

The three live paths share one streaming pipe — the same consumers, transforms, and index from chapters 02–05. The three patient paths do not need their own pipeline; they can read the same derived tables the streaming path already wrote, on a schedule. That is the key move: the streaming path lands its results in the lakehouse (chapter 03's derived zone), and the cheap paths query that landed data later instead of recomputing it. One write, many reads at different freshness.

Now trace the cost. The copilot's path justifies always-on consumers and continuous compaction — a human is mid-call. The daily CSAT report, if built as a second streaming aggregation, would run a stateful Flink job 24/7 to produce a number read once at 6 a.m. Built instead as a scheduled query over the already-landed transcripts table, it costs one query's compute per day. The freshness budget told us the report does not justify an always-on path; the landed derived table let the cheap path reuse the streaming path's output instead of duplicating its logic.

This is kappa thinking: one streaming path computes and lands everything; slower consumers read the landed data at whatever freshness they need, including a batch query. No second pipeline of business logic.
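The budget table can be written down as data, with the path choice derived from the budget rather than hard-coded per report. This is a minimal sketch: the `ReadPath` type, the `posture` function, and the 2-minute cut-off are assumptions for illustration; the budgets themselves are the ones in the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPath:
    name: str
    budget_seconds: int  # freshness budget (delta)


# Budgets copied from the table above.
READ_PATHS = [
    ReadPath("live copilot retrieval", 10),
    ReadPath("live sentiment on this call", 10),
    ReadPath("supervisor live dashboard", 120),
    ReadPath("daily CSAT report", 6 * 3600),
    ReadPath("quarterly billing lookup", 6 * 3600),
    ReadPath("monthly retraining export", 24 * 3600),
]

# Assumed cut-off: at or under 2 minutes a human is watching, so stay on the stream.
STREAMING_MAX_SECONDS = 120


def posture(path: ReadPath) -> str:
    """Pick the cheapest machinery that still meets the path's budget."""
    if path.budget_seconds <= STREAMING_MAX_SECONDS:
        return "streaming"
    return "scheduled query over landed tables"


for p in READ_PATHS:
    print(f"{p.name:<30} budget <= {p.budget_seconds:>6}s  {posture(p)}")
```

Keeping the budget next to the path name makes "fresh by default" impossible to express: a new read path has to declare a number before it gets any machinery.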

4) Rule: one source of logic, many freshness tiers reading from it

The chapter's invariant: maintain one code path that computes derived artifacts and lands them durably; let each read path choose its freshness by when it reads, not by recomputing in a parallel pipeline. Freshness becomes a read-time choice over a single written truth, not a compute-time fork into two pipelines that drift.

Kappa achieves this by making the streaming pipeline the only pipeline. Reprocessing history is not a separate batch layer — it is replaying the log (chapter 02) or re-reading the landed lakehouse tables through the same code. There is one chunking rule, one PII step, one embedding version, because there is one pipeline. A report that wants yesterday's data reads yesterday's landed rows; the copilot that wants the last ten seconds reads the growing index segment. Same logic produced both.

KAPPA (one path, many freshness tiers)
                                            ┌─ live copilot       reads growing seg     (Δ ≤ 10s)
 log ─▶ ONE streaming pipeline ─▶ derived ──┼─ live dashboard     reads recent landed   (Δ ≤ 2m)
        (transform, embed, land)   tables   ├─ daily report       queries landed @6am   (Δ ≤ 6h)
                                            └─ retraining export  reads landed monthly  (Δ ≤ 24h)
        reprocess = replay the log / re-read landed tables through the SAME code
        no second pipeline, no drift, freshness is a read-time choice
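A minimal sketch of the invariant in code, assuming an in-memory list stands in for both the log and the landed table. `transform`, `run_pipeline`, and `read_as_of` are hypothetical names; what matters is that live processing and replay call the same function, and that readers differ only in when they read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derived:
    event_time: float  # seconds since an arbitrary start
    text: str


def transform(raw: str) -> str:
    """The single source of logic: every cleaning rule lives here."""
    return " ".join(raw.split()).lower()


def run_pipeline(log: list[tuple[float, str]]) -> list[Derived]:
    """Used for live processing and for replay: same code, same output."""
    return [Derived(ts, transform(raw)) for ts, raw in log]


def read_as_of(landed: list[Derived], as_of: float) -> list[Derived]:
    """A reader sees whatever had landed by the moment it chose to read."""
    return [row for row in landed if row.event_time <= as_of]


log = [(0.0, "  Hello   THERE "), (50.0, "Refund  please"), (100.0, "Thanks")]
landed = run_pipeline(log)

copilot_view = read_as_of(landed, as_of=100.0)  # reads right now
report_view = read_as_of(landed, as_of=60.0)    # reads at its scheduled time
replayed = run_pipeline(log)                    # reprocessing: same code over the same log

print(len(copilot_view), len(report_view), replayed == landed)
```

The report and the copilot can disagree only about how much data they have seen, never about how a row was computed.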

Teacher voice. Lambda asks "fast or correct?" and answers "both, in two pipelines." Kappa asks "why are there two pipelines?" and answers "there aren't — there is one pipeline and many readers." The thing that made lambda necessary was that early stream processors couldn't reprocess history correctly, so you needed a batch layer to be the authority. Once the log is durable and replayable and the lakehouse holds the landed truth, the batch layer's only job — correct reprocessing — is something the streaming code can now do by replaying. The second pipeline becomes redundant logic you maintain for no reason. If you find yourself writing the same transform twice, stop and ask whether replay can do what the batch layer was doing.

5) Lambda vs Kappa vs lakehouse-as-log — which, under this workload

Three architectures answer the duplication question differently. The choice decides how many code paths you maintain and how reprocessing works.

Lambda — speed layer plus batch layer, merged at serve

Helps: the batch layer is an unarguable authority; if the streaming path has a bug, the nightly batch recomputes the correct answer from scratch over all raw data. Reprocessing is trivial because batch is reprocessing.

Hurts: every transform exists twice (stream + batch) and drifts; the serving layer must merge two answers that disagree; two systems to operate, monitor, and reason about. The operational tax is the duplicated logic, not the compute.

Use when: your stream processor genuinely cannot reprocess history correctly (legacy, no durable replayable log), or regulators demand a separate authoritative batch recomputation independent of the streaming path.

Kappa — one streaming pipeline, replay for reprocessing

Helps: one code path, no drift, one system of logic; reprocessing is replaying the log or re-reading landed tables through the same code; freshness is a read-time choice. The 2026 default for new builds.

Hurts: reprocessing large history by replaying a long log can be slow and expensive (re-running every model call over months of audio); the log's retention bounds how far back you can replay unless you also keep landed history.

Use when: you have a durable replayable log (chapter 02) and want one source of logic — almost every new streaming-AI platform, including this one.

Lakehouse-as-log: a table format as both sink and source

Helps: an Iceberg/Delta/Hudi table acts as both sink and source. The streaming path lands into Bronze tables; downstream Silver/Gold transforms read those tables in micro-batch or continuous mode. The object store (S3/ADLS) becomes an infinite-retention log, so you replay from the lakehouse instead of a retention-bounded Kafka topic. ACID upserts make this safe. You get kappa's single logic path and cheap long-history reprocessing without keeping months in Kafka.

Hurts: micro-batch latency between table layers (seconds to minutes) if you chain table-to-table transforms, so the most latency-sensitive hop (the copilot's index) may still read from the stream directly, not from a downstream table.

Use when: you want kappa's simplicity but need to reprocess months of history cheaply — which is this platform, since retraining exports and audits reach far back.

The choice for the running platform

Kappa, refined toward lakehouse-as-log. One streaming pipeline (chapters 02–05) lands transcripts, captions, and embeddings into Iceberg/Delta derived tables and the vector index. The live copilot reads the fresh index directly (seconds). The patient paths query the landed Iceberg tables on a schedule (hours). Reprocessing a changed embedding model reads raw audio from the cold tier (chapter 03) and re-runs the same transform code — no second pipeline. Lambda's two-path drift is avoided; the batch layer's old job (authoritative reprocessing) is done by replay.
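The replay decision can be sketched as a small routing function. The function names and the string labels are hypothetical, and the 7-day retention in the usage lines is an example value, not a recommendation.

```python
from typing import Callable


def replay_source(history_days: int, kafka_retention_days: int) -> str:
    """Choose where a reprocessing job reads from (labels are illustrative)."""
    if history_days < 0 or kafka_retention_days < 0:
        raise ValueError("day counts must be non-negative")
    if history_days <= kafka_retention_days:
        return "kafka-topic"       # history is still inside the hot log
    return "lakehouse-tables"      # older than retention: read the object store


def reprocess(records: list[str], transform: Callable[[str], str]) -> list[str]:
    """Whatever the source, reprocessing runs the same transform as the live path."""
    return [transform(r) for r in records]


print(replay_source(history_days=3, kafka_retention_days=7))
print(replay_source(history_days=180, kafka_retention_days=7))
print(reprocess(["  Hi ", "OK"], str.strip))
```

Only the source changes with the age of the history; the transform passed to `reprocess` is the one the streaming path already uses.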

Mini-FAQ. "If kappa is the default, why does lambda still exist?" Two reasons. First, a lot of platforms predate durable replayable logs and lakehouse table formats, so their stream processor can't reprocess correctly and the batch layer is still load-bearing. Second, some regulated environments want an independent batch recomputation as a cross-check on the streaming path — the duplication is the point, an audit control, not an accident. For a greenfield AI platform with a durable log and a lakehouse, neither reason applies.

6) The property that changes the design: query rate decides what stays warm

The dimension that reshapes cost is not freshness alone — it is freshness × query rate. A path that is both urgent and frequently queried (the copilot) justifies peak always-on machinery. A path that is urgent but rarely queried is the trap: you kept it hot for a query that almost never comes.

The clearest example is the index (chapter 05). Recent data is queried constantly by the live copilot, so it must stay in the hot, compacted, always-fresh growing-and-sealed segments. But a customer's interaction from 200 days ago is queried maybe once, for a dispute. Keeping that vector in a hot HNSW segment, compacted forever to hold recall nobody is checking, is pure waste.

So freshness must decay with age, matching the query rate:

DATA AGE        QUERY RATE      FRESHNESS / STORAGE TIER                COST
 0–7 days        very high       hot index, compacted, fresh (seconds)   peak — worth it
 7–90 days       low             warm index, compaction relaxed          medium
 90 days–1 yr    rare            cold: vectors in cheap store, no live    low; rehydrate on demand
                                  HNSW; lakehouse query path only
 > 1 yr          almost never    raw in Glacier (ch.03), derived dropped  cheapest; re-derive if needed

This is the same tiering chapter 03 applied to storage, now applied to freshness machinery. The pressure evolution: tiering relieves always-on cost (you stop compacting cold segments) but creates a rehydration pressure — a rare query against 200-day-old data must rebuild or scan a cold structure, paying a latency spike that one query can afford. The cold tier's slow read is the cost the rare query absorbs so the common query stays cheap.
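The tier table maps directly onto a lookup by age. A minimal sketch, assuming boundary days (7, 90, 365) fall into the hotter tier, which the table leaves open; the tier labels and the `COMPACTION` mapping are illustrative names.

```python
def freshness_tier(age_days: float) -> str:
    """Map data age to a tier from the table above (boundaries are an assumption)."""
    if age_days < 0:
        raise ValueError("age must be non-negative")
    if age_days <= 7:
        return "hot"
    if age_days <= 90:
        return "warm"
    if age_days <= 365:
        return "cold"
    return "archive"


# Compaction effort per tier: the machinery decays along with the query rate.
COMPACTION = {"hot": "continuous", "warm": "relaxed", "cold": "none", "archive": "none"}

for age in (1, 30, 200, 400):
    tier = freshness_tier(age)
    print(f"{age:>3} days -> {tier:<7} compaction: {COMPACTION[tier]}")
```

A 200-day-old interaction lands in the cold tier, so the dispute lookup against it pays the rehydration cost instead of every day of compaction in between.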

Teacher voice. The cost mistake that survives longest is uniform freshness — treating ten-second-old data and two-hundred-day-old data with the same always-on machinery because the pipeline doesn't know the difference. The fix is to let freshness and the machinery that maintains it decay with query rate. Hot data gets the expensive treatment; cold data gets a slow read path it pays for only when someone actually asks. If your compaction CPU is flat across all data ages, you are paying to keep cold data warm.

7) Cost table: freshness paths under this workload

Order-of-magnitude, monthly, for the running platform (~12k calls + 80k chats + uploads/day). Verify against your bill.

PATH / POSTURE                              FRESHNESS (Δ)              ALWAYS-ON COST                CODE PATHS   WHEN TO USE
 Everything streaming, uniform freshness    seconds for all            highest (~$41k, the symptom)  one          the accidental default that overspends
 Lambda: stream + nightly batch             seconds / 24h              high + drift tax              two (drift)  only when batch must be an independent authority
 Kappa: one stream, readers pick freshness  seconds / hours by reader  medium                        one          default — new AI platforms
 Kappa + freshness tiering by age           seconds hot / slow cold    lowest sustainable            one          this platform — decay machinery with query rate
 Pure batch (nightly only)                  24h                        lowest                        one          when no human waits — analytics-only, no copilot

Row four is the target: one code path (no lambda drift), freshness budgeted per read path, and the always-on machinery decaying with data age so cold data isn't compacted into the void. The savings come from not doing work — not compacting cold segments, not running a second pipeline, not keeping warm what nobody queries.

Concrete: the daily CSAT report built as an always-on streaming aggregation runs a stateful job 24/7 (~$1.5–3k/month) to produce a number read once. Built as a scheduled query over the already-landed Iceberg transcripts table, it costs one query's compute per day (cents to a few dollars). Same number, ~99% less spend, because the freshness budget said hours was fine and the landed table let the cheap path reuse the streaming path's output.
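The arithmetic behind that comparison, as a sketch. The specific dollar points below are assumptions picked from inside the ranges quoted above, and a 30-day month is assumed; the saving depends on where in those ranges a real bill lands.

```python
def savings_fraction(streaming_per_month: float, query_cost_per_day: float,
                     days_per_month: int = 30) -> float:
    """Fraction of spend saved by replacing a 24/7 job with one scheduled query per day."""
    if streaming_per_month <= 0:
        raise ValueError("streaming cost must be positive")
    batch_per_month = query_cost_per_day * days_per_month
    return 1.0 - batch_per_month / streaming_per_month


# Assumed points inside the ranges quoted above.
mid_case = savings_fraction(2000.0, 0.50)    # $15/month instead of $2,000/month
least_case = savings_fraction(1500.0, 3.00)  # $90/month instead of $1,500/month

print(f"mid case:   {mid_case:.1%}")
print(f"least case: {least_case:.1%}")
```

Even the least favorable pairing here (the cheapest streaming job against a $3 daily query) removes about 94% of the spend, and the mid-range pairing is just over 99%.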

8) Operational signals: watching freshness-vs-cost

Healthy: cost per read path roughly proportional to that path's query rate × freshness need; compaction CPU concentrated on hot/recent segments and near-zero on cold; the live paths meet their seconds budget while batch paths run on schedule and idle between runs.
First metric to degrade: cost-per-useful-query, i.e., always-on spend divided by queries actually served. It climbs when freshness machinery runs for data that is rarely or never queried — the number rises silently while every latency SLO still looks green. This is the leading indicator of over-freshness.
Misleading metric people watch: end-to-end freshness (Δ) and latency. They stay excellent precisely because you are over-spending; great freshness on a never-queried path is the symptom, not the health. Low Δ everywhere reassures while the bill climbs.
First graph an expert opens: always-on cost broken down by read path, overlaid with queries-served per path. They look for paths with high cost and near-zero queries (warm cold-data, redundant streaming aggregations) and for two pipelines computing the same aggregate (lambda drift) — the spend with no reader is the waste.
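Cost-per-useful-query is simple enough to compute from a cost breakdown and a query count per path. The sketch below uses hypothetical monthly numbers and a hypothetical $1 threshold purely for illustration; `PathSpend` and `over_fresh` are assumed names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathSpend:
    name: str
    always_on_cost: float  # dollars per month
    queries_served: int    # queries per month


def cost_per_useful_query(p: PathSpend) -> float:
    """Always-on spend divided by queries actually served; infinite if nobody queried."""
    if p.queries_served == 0:
        return float("inf")
    return p.always_on_cost / p.queries_served


def over_fresh(paths: list[PathSpend], max_cost_per_query: float) -> list[str]:
    """Names of paths whose spend per served query exceeds the threshold."""
    return [p.name for p in paths if cost_per_useful_query(p) > max_cost_per_query]


# Hypothetical numbers, for illustration only.
paths = [
    PathSpend("live copilot retrieval", 20_000.0, 2_000_000),
    PathSpend("daily CSAT streaming aggregation", 2_000.0, 30),
    PathSpend("compaction of cold segments", 5_000.0, 0),
]
flagged = over_fresh(paths, max_cost_per_query=1.0)
print(flagged)
```

The copilot has the largest bill in this example and is still the healthiest path, because the metric divides by readers rather than ranking by spend.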

9) Boundary: where streaming-everything fits, and where it's over-engineering

Strong fit: a platform with genuinely urgent, frequently-queried read paths — the live copilot — where seconds of freshness change the answer. Kappa with one logic path and tiered freshness is the clean answer; the always-on cost is justified by the human waiting.
Pathological: running an always-on streaming pipeline for read paths that no human waits on — nightly reports, monthly exports. This is where streaming is over-engineering: you pay 24/7 to serve a query that runs once at 6 a.m. A scheduled batch query over landed data is correct and cheaper.
Scale/workload limit that breaks intuition: at low query rates, batch beats streaming on cost by a wide margin, and the "real-time everything" instinct is exactly wrong. The intuition "streaming is the modern way, batch is legacy" fails: streaming is the modern way for urgent frequent reads; for patient or rare reads, batch is not legacy, it is the right cost posture. Kappa does not mean "stream every read" — it means "one logic path, readers choose freshness," and some readers choose batch.

10) Wrong model to drop: "kappa means everything is real-time"

The seductive idea is that adopting kappa means every read path is streaming and seconds-fresh, and batch is obsolete. It feels modern. The correct model: kappa means one code path computes and lands the truth; freshness is a read-time choice, and many readers legitimately choose batch. Kappa eliminates the second pipeline of logic, not the batch read pattern. A daily report querying a landed Iceberg table on a schedule is a perfectly kappa-compatible batch read — it reuses the single streaming pipeline's output instead of recomputing it. The thing kappa kills is duplicated business logic (lambda's drift), not the existence of slow, cheap, scheduled queries. Treating "kappa" as "stream everything" reintroduces the over-freshness cost the architecture was supposed to remove.

11) Other freshness-vs-cost failure shapes

Uniform over-freshness — every path gets seconds-fresh machinery; cost scales with peak need, not query rate; the invoice is the symptom.
Lambda drift — same transform in stream and batch code diverges; nightly report disagrees with live dashboard; days lost reconciling.
Warm cold-data — old segments compacted forever to hold recall nobody checks; compaction CPU flat across all data ages.
Redundant streaming aggregation — a report built as an always-on stateful job to produce a number read once a day; should be a scheduled query over landed data.
Retention-bounded replay — kappa reprocessing fails because the Kafka log aged out the history; needed lakehouse-as-log for long replay.
Rehydration storm — too many rare cold queries hit at once, each rebuilding a cold structure; the cost moved to read time and now spikes.
Micro-batch chain latency — chaining Bronze→Silver→Gold table transforms adds minutes; the copilot's latency-critical hop must read the stream directly, not a downstream table.
Freshness theater — paying for seconds-fresh on a path whose UI refreshes every 30 seconds anyway; the user can't perceive the freshness you bought.

12) Pattern transfer

Freshness tiering = storage tiering (chapter 03) — decaying freshness machinery with query rate is the same hot/warm/cold shape chapter 03 applied to bytes; same constraint (cost ∝ keeping things ready), now applied to compute and compaction instead of storage class.
One-logic-path = single source of truth — kappa's no-drift property is the same invariant as a single source of truth in any system: two copies of logic diverge exactly like two copies of data; reconciliation cost dominates compute cost.
Replay-to-reprocess = the replay log's payoff (chapter 02) — kappa replaces lambda's batch layer with replay, which only works because chapter 02 made the log durable and replayable; the storage chapter's immutable raw (03) makes re-deriving safe.
Query rate decides warm/cold = working-set / caching — keeping hot data fresh and cold data slow is the same working-set principle as a CPU cache or a CDN: spend on what is accessed, let the rest pay a slow path on demand.

13) Design test

Does every read path have an explicit freshness budget tied to who waits and how often it's queried — not "fresh by default"?
Is there exactly one code path computing derived artifacts, with readers choosing freshness at read time (kappa), rather than two pipelines computing the same thing (lambda drift)?
Do patient/rare read paths query landed data on a schedule instead of running their own always-on streaming aggregation?
Does freshness machinery (compaction, hot index) decay with data age and query rate, so cold data isn't kept warm for queries that never come?
Can you reprocess history by replaying the log or re-reading landed tables through the same code — and does retention reach far enough (lakehouse-as-log) for your oldest audit?

Where this appears in production

Architecture patterns and the lambda→kappa shift:
- Apache Kafka + Apache Flink — the canonical kappa stack: durable replayable log plus stateful stream processor, one pipeline for real-time and replayed-historical work.
- LinkedIn — origin of the kappa idea (Kreps): one log, reprocess by replay, no separate batch layer.
- Uber — moved heavy real-time feature computation onto streaming (Flink) while keeping batch for the patient analytics paths — freshness budgeted per use.
- Netflix — streaming for personalization freshness, scheduled batch for the reporting paths that no human waits on.
- RisingWave / Materialize — streaming databases that serve real-time materialized views by SQL, collapsing speed + serve into one system.
- Databricks Delta Live Tables — Bronze/Silver/Gold table chain that is the lakehouse-as-log pattern: stream into Bronze, micro-batch downstream.

Lakehouse-as-log and freshness tiering:
- Apache Iceberg / Delta Lake / Apache Hudi — ACID table formats acting as both streaming sink and source, making S3/ADLS an infinite-retention replay log for cheap long-history reprocessing.
- Confluent Tableflow — materializes Kafka topics directly as Iceberg tables, unifying the log and the lakehouse for kappa reprocessing.
- Snowflake / BigQuery scheduled queries — patient reports run as scheduled queries over landed tables instead of always-on streaming aggregations.
- Milvus / Qdrant tiered storage — hot vectors in RAM/SSD, cold vectors on cheap storage, matching the freshness-decays-with-query-rate pattern at the index.
- Pinterest / Spotify — real-time embedding/feature freshness for serving, batch recomputation for training exports — different freshness tiers off shared landed data.
- AWS Glue + scheduled jobs — batch reads of landed lakehouse data for retraining exports where 24h freshness is fine and streaming would be waste.
- Stripe / fraud platforms — seconds-fresh streaming for the scoring path that blocks a transaction, batch for the analytics no one waits on.

Pause and recall

Why does "make it fresh" stop being the goal once the platform has many read paths?
What is the specific failure of the lambda architecture, and why is it about code paths rather than compute?
State the chapter's invariant. How does kappa make freshness a read-time choice instead of a compute-time fork?
What does the lakehouse-as-log variant add over plain kappa, and what problem of Kafka retention does it solve?
Why does query rate (not freshness alone) decide what machinery stays warm, and what new pressure does freshness tiering create?
Why is running an always-on streaming aggregation for a daily report over-engineering, and what's the cheaper shape?
Which metric rises silently as you over-deliver freshness, and which comforting metric hides it?
Why does "kappa means everything is real-time" misread the architecture, and what does kappa actually eliminate?

Interview Q&A

Q1. The streaming platform's bill is climbing and finance wants to know why. How do you diagnose over-freshness? A. Compute cost-per-useful-query per read path: always-on spend divided by queries actually served. Paths with high always-on cost and near-zero query rate are the waste — a daily report running as a 24/7 streaming aggregation, cold segments compacted forever, or two pipelines computing the same aggregate (lambda drift). The fix is a freshness budget per path: only urgent, frequently-queried paths justify always-on machinery; patient paths query landed data on a schedule. Common wrong answer to avoid: "Check latency and freshness SLOs." They're all green — that's the symptom. Great freshness on a never-queried path is the over-spend, not the health.

Q2. Lambda or kappa for a new support-copilot platform — defend a choice. A. Kappa, refined toward lakehouse-as-log. One streaming pipeline lands transcripts/embeddings into Iceberg tables and the vector index; the copilot reads the fresh index, patient reports query landed tables on a schedule. Reprocessing is replay through the same code — no second pipeline to drift. Lambda's only real advantage (an independent authoritative batch layer) isn't needed here because the durable replayable log plus lakehouse already give correct reprocessing. Common wrong answer to avoid: "Lambda, so we always have a correct batch fallback." That fallback costs you maintaining every transform twice; with a durable log, replay gives the same correctness without the drift tax.

Q3. Your daily CSAT report and your live dashboard show different numbers. What's the architecture smell? A. Lambda drift: the same aggregation logic exists in a streaming path (dashboard) and a batch path (report), and they diverged — a rule changed in one and not the other. The fix is to collapse to one logic path (kappa): compute the aggregate once in the streaming pipeline, land it, and have both the dashboard and the report read the landed result at their own freshness. One computation, two reads. Common wrong answer to avoid: "Reconcile the two pipelines and add tests." That treats the symptom; maintaining two computations of the same thing guarantees recurring drift. Remove the second computation.

Q4. Why does freshness need to decay with data age, and what does that cost? A. Query rate drops sharply with age — recent data is hit constantly by the copilot, 200-day-old data maybe once a quarter. Keeping old vectors in a hot, continuously-compacted index pays peak machinery for queries that almost never come. Decaying freshness (warm then cold tiers) stops compacting cold data; the cost moves to rehydration — a rare query against cold data pays a latency spike, which a once-a-quarter query can afford. Common wrong answer to avoid: "Keep everything hot so any query is fast." That pays peak cost for the rarest queries; the right trade is a slow read path that the rare query pays for itself.

Q5. When is streaming the wrong choice entirely? A. When no human waits on the read path — nightly reports, monthly retraining exports, quarterly audits. An always-on streaming pipeline for a query that runs once at 6 a.m. is pure over-spend; a scheduled batch query over landed data is correct and far cheaper. "Streaming is modern, batch is legacy" is wrong: batch is the right cost posture for patient and rare reads. Common wrong answer to avoid: "Stream everything for consistency." Uniform streaming pays peak cost on paths nobody waits on; kappa means one logic path, not one freshness for all readers.

Q6. (Cumulative) The copilot is fresh and correct, but the monthly retraining export is missing six months of audio because the Kafka log only retains 7 days. Is this a chapter-02, chapter-03, or chapter-06 issue? A. It surfaces at chapter 06 (reprocessing/replay) but the root is the chapter-02 log's bounded retention colliding with a long-history read. The fix is the lakehouse-as-log refinement: land raw and derived into Iceberg/Delta (chapter 03's immutable raw) so replay reads months of history from the object store, not the retention-bounded Kafka topic. Kappa reprocessing needs an infinite-retention source for long history — that's what chapter 03's storage provides. Common wrong answer to avoid: "Increase Kafka retention to a year." Expensive and fragile; the lakehouse-as-log pattern is the standard fix — replay from cheap object storage, not from the hot log.

Design/debug exercise (10 min)

Step 1 — Modeled example. Freshness budget for one read path:

Read path:    daily CSAT report
Who waits:    nobody (runs 6 a.m., read by analysts later)
Query rate:   1/day
Budget (Δ):   ≤ 6 h
Path chosen:  scheduled query over landed Iceberg transcripts table (NOT a streaming job)
Logic source: reuses the one streaming pipeline's landed output (kappa, no second pipeline)
Cost posture: one query's compute/day; no always-on machinery
Anti-pattern: building this as a 24/7 stateful Flink aggregation → ~99% wasted spend
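The modeled budget above can also be written as a small record, which makes the "~99% wasted" figure easy to sanity-check. This is a sketch under stated assumptions: the `FreshnessBudget` type and `idle_fraction` helper are hypothetical, and the 15 minutes of daily compute is an assumed figure, not one given in the chapter.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class FreshnessBudget:
    read_path: str
    who_waits: str
    queries_per_day: float
    max_staleness: timedelta  # the budget Δ
    path_chosen: str


csat = FreshnessBudget(
    read_path="daily CSAT report",
    who_waits="nobody",
    queries_per_day=1,
    max_staleness=timedelta(hours=6),
    path_chosen="scheduled query over landed Iceberg transcripts table",
)


def idle_fraction(busy_per_day: timedelta) -> float:
    """Share of a 24/7 job's day that is not needed to produce the report."""
    return 1 - busy_per_day / timedelta(days=1)


# Assumption: the scheduled query needs about 15 minutes of compute per day.
wasted = idle_fraction(timedelta(minutes=15))
print(f"{csat.read_path}: {wasted:.1%} of an always-on job would be idle")
```

Under that assumption the always-on job is idle for roughly 99% of the day, which is consistent with the anti-pattern line in the table.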

Step 2 — Your turn. Build the freshness-budget table for the supervisor live dashboard (refreshes every ~60 s on screen, watched intermittently during business hours) and the quarterly billing-dispute lookup (a human searches one customer's full history, ~once a quarter, can wait minutes). For each decide: budget Δ, who waits, query rate, streaming vs landed-query vs cold-tier read, and what would make it over-engineered.

Step 3 — Reproduce from memory. Redraw the section-4 kappa diagram (one streaming pipeline → derived tables → many readers at different freshness, reprocess-by-replay), label which reader reads the live index vs landed tables, and write one sentence connecting freshness tiering here to chapter 03's storage tiering and one connecting kappa's replay to chapter 02's durable log.

Operational memory

This chapter explained why a platform that makes every read path equally fresh ends up paying peak always-on cost on data that is queried rarely or never — and why the tempting fix, a separate batch pipeline (lambda), trades that cost for a worse one: maintaining the same business logic twice and watching the two copies drift. The important idea is one logic path, many freshness tiers reading from it (kappa), not two pipelines computing the same thing.

You learned to give each read path an explicit freshness budget tied to who waits and how often it's queried, to land the single streaming pipeline's output durably so patient readers query it on a schedule instead of recomputing it, to refine kappa toward lakehouse-as-log so long-history reprocessing replays from cheap object storage, and to decay freshness machinery with data age so cold data isn't compacted into the void. That turns the $41k uniform-freshness invoice into spend that tracks query rate, with no lambda drift.

Carry this diagnostic forward: when the bill climbs while every SLO is green, compute cost-per-useful-query and hunt for high-cost, low-query paths; when two reports disagree, suspect lambda drift and collapse to one computation; when compaction CPU is flat across all data ages, you're keeping cold data warm. Freshness is a read-time choice over one written truth — spend the expensive end of the curve only where a human is waiting.
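The first diagnostic, cost-per-useful-query, is simple enough to sketch. The path names, dollar figures, query counts, and the threshold below are illustrative assumptions only; none of them come from the chapter's invoice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathUsage:
    name: str
    monthly_cost_usd: float
    monthly_queries: int


def cost_per_query(p: PathUsage) -> float:
    """Monthly cost divided by queries served; infinite if nothing queried it."""
    if p.monthly_queries == 0:
        return float("inf")
    return p.monthly_cost_usd / p.monthly_queries


def over_delivered(paths, threshold_usd: float):
    """Names of paths whose cost per query exceeds the threshold, worst first."""
    flagged = [p for p in paths if cost_per_query(p) > threshold_usd]
    return [p.name for p in sorted(flagged, key=cost_per_query, reverse=True)]


# Illustrative numbers only.
paths = [
    PathUsage("copilot live index", 20_000, 2_000_000),
    PathUsage("daily CSAT aggregation (always-on)", 6_000, 30),
    PathUsage("cold-history compaction", 9_000, 0),
]
print(over_delivered(paths, threshold_usd=1.0))
```

With these made-up inputs the copilot path costs about a cent per query and passes, while the always-on CSAT aggregation and the never-queried compaction are flagged as candidates for a landed-query or cold-tier read path.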

Remember:

Give every read path a freshness budget tied to who waits and how often it's queried; "fresh by default" overspends.
Lambda's cost is duplicated logic that drifts, not compute; kappa keeps one code path and lets readers choose freshness at read time.
Kappa eliminates the second pipeline, not the batch read — a scheduled query over landed data is perfectly kappa.
Decay freshness machinery with data age and query rate; the rare cold query pays its own slow read instead of keeping everything warm.
Reprocess by replaying the log or re-reading landed tables; lakehouse-as-log gives the long retention plain Kafka can't.

Bridge. We can now spend freshness only where it's worth it — one logic path, budgeted reads, machinery that decays with query rate. But that single pipeline now carries every interaction through every layer, which means it also carries every customer's name, card number, and face through ASR, embeddings, and an index that a cross-customer query can reach. The schema those events use will change under us, an auditor will eventually ask "where did this retrieved chunk come from and can you delete it," and PII hides differently in audio, an image, and text. So the question shifts from "how fresh and how cheap?" to "can we trace, validate, and erase what flows through this pipe — across modalities?" The next file builds governance, lineage, and data quality on streams. → 07-governance-lineage-and-quality.md