01. Why the nightly warehouse makes a copilot lie: freshness as a budget, not a slogan
~20 min read. Your support copilot just treated a chat that happened an hour ago as if it never happened. The data was in storage. It just was not in the retrieval index yet. This file turns that failure into a number you can defend in a design review.
This file builds on the freshness gap named in 00-first-principles.md. That overview claimed the gap is the dominant pressure of the whole module. This file makes you feel it on a real copilot answer and then forces you to write the gap down in seconds, not adjectives.
What you already know about pipelines, and what a copilot breaks
You have built batch pipelines before, or read the data-platform module next door. You know the shape: sources drop files, an orchestrator wakes up on a schedule, a transform engine crunches the pile, and a warehouse exposes clean tables by morning. That shape is correct for payroll, for the weekly revenue report, for the model-training dataset you rebuild every night. It is cheap, it recovers cleanly, and a few hours of lag costs nobody anything.
A GenAI copilot breaks that shape on one assumption: that lag is free. The copilot does not read a dashboard a human will glance at after coffee. It answers a live human during the interaction, and it grounds every answer in retrieval over recent data. If the retrieval index is eighteen hours behind the conversation, the copilot is not "a little stale." It is confidently wrong, because it will happily synthesize a fluent answer from whatever it can see and present it with the same tone it uses for fresh facts. This chapter is about why that happens and how to size the fix.
What this file solves
A copilot can answer fluently while citing data that is hours out of date, because the warehouse it retrieves from refreshes on a nightly schedule and nobody wrote down how fresh the data actually needs to be. This file shows how to trace a stale answer back to the exact scheduling decision that caused it, how to express the required freshness as a latency budget per data source, and how to decide — with numbers, not fashion — when streaming earns its complexity and when the nightly batch was right all along.
The mystery: a correct pipeline, a wrong answer
Start with the artifact. Here is a real copilot turn, reconstructed from a trace, on the angry-customer call from the overview.
14:32 Customer (voice): "I've tried this payment three times, it keeps failing,
I already chatted with your bot an hour ago about it."
14:32 Copilot retrieval query: "payment failure customer_id=88213 recent"
14:32 Retriever returns 4 chunks:
- chunk A: a help-center article on card declines (indexed 6 months ago)
- chunk B: this customer's order history (warehouse, refreshed 02:00)
- chunk C: a chat from 9 days ago about a refund (warehouse, refreshed 02:00)
- chunk D: —— (the 13:30 chat about THIS failure is not in the index)
14:32 Copilot: "I don't see any recent contact about a payment issue on your
account. Can you walk me through what happened?"
The customer hears "I don't see any recent contact" about an hour after she chatted about exactly this. She repeats herself, furious. The copilot did nothing wrong by its own logic: it retrieved, it grounded, it answered. The pipeline did nothing wrong either: every job succeeded, no alert fired, the warehouse tables are correct. The 13:30 chat is in storage; it will not land in the retrieval index until the 02:00 run tomorrow.
So the real problem is not a bad model and not a broken pipeline. The real problem is that the freshest data the copilot can retrieve is whatever the last batch run loaded — and the batch run is scheduled for the middle of the night. The data exists; it is simply not reachable in time.
Why this rule exists. A copilot's quality is bounded by its retrieval index, and a retrieval index is only as fresh as the last write into it. Batch scheduling decides that write cadence. So the scheduler — a thing nobody thinks of as an "AI" component — silently sets a ceiling on copilot intelligence. The model can be GPT-class and still answer like an amnesiac if the index it reads is twelve hours behind the conversation.
So how do we stop the scheduler from capping the copilot? Before reaching for any streaming technology, we have to know the actual number we are trying to hit.
1) The freshness gap, defined so you can measure it
The freshness gap is the wall-clock time between an event happening and that event being retrievable by the copilot. It is not one number; it is a chain of delays, and the total is what the customer feels.
event      ingest     store      transform  index (retrievable)
t0 ──Δi──▶ t1 ──Δs──▶ t2 ──Δt──▶ t3 ──Δx──▶ t4
│                                            │
└────────── freshness gap = t4 − t0 ─────────┘
Δi arrival → landed in ingestion log
Δs log → durable storage
Δt raw → derived artifact (transcript, caption, embedding) ← biggest variable
Δx artifact → searchable in the index
For the nightly warehouse, the chat hits the database in milliseconds (Δi, Δs small), but Δx is dominated by when the batch runs, not how long it takes. A chat at 13:30 waits 12.5 hours for the 02:00 job. The gap is ~12.5 hours, almost entirely scheduling latency. The processing was never the bottleneck; the cadence was.
This is the first correction to make. Engineers reach for "make the batch faster" — bigger cluster, tuned SQL — and the gap barely moves, because the gap is mostly waiting for the next scheduled run, not the run's duration. Shaving a 40-minute batch to 20 minutes turns a 12.5-hour gap into a 12.5-hour gap. The cadence is the lever.
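The arithmetic above can be checked with a small sketch. It assumes a once-a-day batch that starts on the hour and counts the gap as waiting time plus run duration; the function name `batch_freshness_gap` and the calendar date are illustrative, not part of the running platform.

```python
from datetime import datetime, timedelta

def batch_freshness_gap(event_time: datetime, run_hour: int, run_minutes: float) -> timedelta:
    """Gap from the event until a once-a-day batch starting at run_hour has finished."""
    next_run = event_time.replace(hour=run_hour, minute=0, second=0, microsecond=0)
    if next_run <= event_time:
        next_run += timedelta(days=1)  # today's run already started; wait for tomorrow's
    waiting = next_run - event_time    # scheduling latency: the cadence-driven part
    return waiting + timedelta(minutes=run_minutes)

chat = datetime(2024, 3, 4, 13, 30)    # hypothetical date; only the 13:30 matters
slow = batch_freshness_gap(chat, run_hour=2, run_minutes=40)
fast = batch_freshness_gap(chat, run_hour=2, run_minutes=20)
print(slow, fast)                      # halving the run removes 20 minutes out of ~13 hours
```

With run duration included, the 13:30 chat waits about 13 hours either way. The 12.5 hours of waiting for 02:00 do not change when the run gets faster.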
Teacher voice. Write the freshness gap as a chain, never as a single adjective. When someone says "we need it real-time," ask which Δ they mean. Most "real-time" requests are really "Δx is killing us because the batch runs once a night" — and the fix is a smaller cadence or a streaming index update, not a rewrite of the whole platform.
2) Picture: the same event through two platforms
NIGHTLY BATCH STREAMING
(gap ≈ hours) (gap ≈ seconds)
13:30 chat ─┐ 13:30 chat ─┐
14:01 chat ─┤ pile up in DB 14:01 chat ─┤ each flows immediately
14:32 chat ─┘ all day 14:32 chat ─┘
│ │
╔═════▼═════╗ wakes at 02:00 ┌─────▼─────┐ continuous
║ BATCH ║ next day │ STREAM │ consumer
║ JOB ║ │ PROCESSOR│ always running
╚═════╤═════╝ └─────┬─────┘
│ │ ~2-30 s later
┌─────▼─────┐ ┌─────▼─────┐
│ INDEX │ fresh at 02:40 │ INDEX │ fresh at 14:32:08
└───────────┘ for yesterday └───────────┘ for right now
The difference is not speed of computation. Both can crunch a chat in milliseconds. The difference is when work starts. Batch starts on a clock. Streaming starts on an event. That single shift — from clock-triggered to event-triggered work — is the entire conceptual move of this module. Everything after this chapter is the machinery that makes event-triggered work survive bursts, failures, cost, and four different modalities.
3) The running example: live support interactions for a copilot
This is the example threaded through every file in the module. Hold it in your head now.
The platform. A SaaS company runs customer support across three channels. Voice calls produce audio (~12,000 calls/day, average 6 minutes). The chat widget produces text (~80,000 messages/day, bursty). Customers upload screenshots of errors (~9,000 images/day). All three must become retrievable for a support copilot within seconds of happening, so that whichever channel the customer touches next, the copilot already knows the story.
Notice the shape of the problem before any technology: three modalities, three volumes, three costs-to-usable. Text is nearly free and already searchable. Audio is useless until transcribed — and that transcription is the single most expensive, slowest step in the pipe. Images need a caption or an embedding before a retriever can find them. The freshness gap for each modality is dominated by a different Δ. For text, Δx (indexing cadence). For audio, Δt (running ASR). For images, Δt (embedding). One platform, three different bottlenecks.
Attempt A: the tempting fix, just schedule the batch more often
The on-call engineer's first move after the angry-customer incident is obvious: run the batch every 15 minutes instead of nightly. It even works for a week. The freshness gap for chat drops from ~12 hours to ~15 minutes. Everyone relaxes.
Then Monday morning arrives. Monday 09:00–11:00 is the weekly call surge: a weekend's worth of problems hits at once. The 15-minute batch now has to transcribe a backlog of calls that piled up since the last run, and transcription is slow. The 09:00 run does not finish until 09:22. The 09:15 run cannot start because the 09:00 run still holds the cluster. Runs queue. By 10:00 the "15-minute" freshness gap is 50 minutes and growing, exactly when load is highest and the copilot is needed most. The mini-batch did not remove the problem; it moved it to the worst possible time.
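The queueing can be reproduced with a toy simulation. It assumes a single cluster that runs one batch at a time and a flat 22 minutes per run during the surge (matching the 09:00 run that finishes at 09:22); `simulate_micro_batch` is a hypothetical helper, not a scheduler API.

```python
def simulate_micro_batch(interval_min, run_minutes):
    """Runs are due every interval_min minutes but cannot overlap on one cluster.

    Returns a list of (due, start, finish) in minutes after the first scheduled run.
    """
    timeline = []
    cluster_free_at = 0
    for i, duration in enumerate(run_minutes):
        due = i * interval_min
        start = max(due, cluster_free_at)  # wait while the previous run holds the cluster
        finish = start + duration
        timeline.append((due, start, finish))
        cluster_free_at = finish
    return timeline

# Five consecutive runs from 09:00, each needing 22 minutes of transcription.
surge = simulate_micro_batch(15, [22, 22, 22, 22, 22])
lateness = [finish - due for due, _start, finish in surge]
print(lateness)  # minutes between when a run was due and when its data is indexed
```

Each run finishes 7 minutes later relative to its schedule than the one before it, so the run due at 10:00 completes 50 minutes late. Nothing in a fixed clock trigger can pull that back while the surge lasts.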
So the real problem is not the cadence value either; it is that clock-triggered work cannot adapt to load. A fixed schedule processes a trickle and a flood with the same trigger, so the flood backs up. So how do we start work the instant an event arrives and let the system absorb a flood without falling behind?
Attempt B: event-triggered streaming
Put the three channels onto a durable, replayable log the moment they happen. A continuously running consumer pulls events as they land, routes audio to ASR, images to an embedder, text straight to indexing, and writes derived artifacts into the retrieval index. The chat at 14:32:03 is searchable by 14:32:08. The Monday flood does not queue behind a clock — it streams through processors that scale with the partition count, and any backlog is bounded by consumer throughput, which we can provision, not by an arbitrary schedule. The freshness gap for chat is now seconds. (How we keep that promise under a flood is the entire next chapter; here we only establish that event-triggered work is the right shape.)
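A minimal sketch of that shape follows. An in-memory `queue.Queue` stands in for the durable log (it is neither durable nor replayable), a plain list stands in for the retrieval index, and `transcribe` and `caption` are hypothetical placeholders for ASR and image processing.

```python
import queue
import threading

# Placeholders for real services; both are assumptions for illustration only.
def transcribe(audio: str) -> str:
    return f"transcript of {audio}"

def caption(image: str) -> str:
    return f"caption of {image}"

ROUTES = {
    "audio": transcribe,        # audio is useless to retrieval until transcribed
    "image": caption,           # images need a caption or an embedding
    "text": lambda text: text,  # text goes straight to indexing
}

def consume(log: queue.Queue, index: list) -> None:
    """Handle each event as it arrives; there is no schedule anywhere in this loop."""
    while True:
        event = log.get()       # blocks until an event lands
        if event is None:       # sentinel used here to stop the demo
            break
        modality, payload = event
        index.append((modality, ROUTES[modality](payload)))

log = queue.Queue()             # stand-in for a durable, replayable log
index = []
worker = threading.Thread(target=consume, args=(log, index))
worker.start()

log.put(("text", "payment failed three times"))
log.put(("audio", "call-88213.wav"))
log.put(("image", "error-screenshot.png"))
log.put(None)
worker.join()
print(index)
```

The point of the sketch is the trigger: work starts when `log.get()` returns, not when a clock fires. Scaling, ordering, and crash recovery are deliberately left out.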
4) Rule: a copilot's intelligence is capped by its index freshness
The load-bearing invariant of this chapter, in one sentence: a retrieval-grounded model can be no fresher than the last write into the index it queries. Everything else (ingestion design, transform placement, index update strategy) is in service of keeping that staleness below the human's patience threshold for the interaction.
This is why "which model" is the wrong first question for a copilot. Swapping a better model onto a 12-hour-stale index produces a more eloquent amnesiac. The freshness gap is a property of the pipeline, not the model, and you fix it in the pipeline.
Mini-FAQ. "Isn't this just caching? Put the recent chats in a cache in front of the warehouse." A cache helps if you know the exact key to look up — but the copilot retrieves by semantic similarity, not by key. The 13:30 chat has to be embedded and added to the vector index before similarity search can find it. That is a write into the index, and a cache in front of a stale index does not perform that write. Freshness at the retrieval layer means incremental index updates, which chapter 05 builds.
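The cache-versus-index point can be seen in a toy example. The bag-of-words `embed` below is a stand-in for a real embedding model, `TinyIndex` is a hypothetical in-memory index, and the document texts are invented for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class TinyIndex:
    def __init__(self):
        self.vectors = {}

    def upsert(self, doc_id: str, text: str) -> None:
        self.vectors[doc_id] = embed(text)  # the write that makes a document reachable

    def search(self, query: str, k: int = 1) -> list:
        q = embed(query)
        ranked = sorted(self.vectors, key=lambda d: cosine(q, self.vectors[d]), reverse=True)
        return ranked[:k]

index = TinyIndex()
index.upsert("help-article", "why a card is declined at checkout")
index.upsert("chat-9-days-ago", "customer asked about a refund for an order")

# The 13:30 chat sits in a key-value cache, but similarity search never reads the cache.
cache = {"chat-13-30": "my payment keeps failing on the third try"}
before = index.search("payment keeps failing")

index.upsert("chat-13-30", cache["chat-13-30"])  # incremental index update
after = index.search("payment keeps failing")
print(before, after)
```

Before the upsert, the search can only return one of the two older documents, however relevant the cached chat is. After the upsert, the 13:30 chat ranks first.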
5) Why streaming and not "just a faster batch", under this workload
The honest comparison is not "streaming good, batch bad." It is: under this workload — interactive copilot, sub-minute freshness target, bursty multimodal load — which architecture hits the target at acceptable cost?
Faster batch (micro-batch on a schedule)
Helps: keeps your existing tools, simple recovery (rerun the failed window), cheap when load is steady.
Hurts: cadence sets a hard floor on the freshness gap; cannot adapt to bursts, so the gap blows out exactly during surges; transcription backlogs queue behind the clock.
Use when: the freshness target is comfortably larger than the batch interval and load is steady — e.g., a daily trend dashboard over support volume.
Event-triggered streaming
Helps: freshness gap measured in seconds; work starts on arrival; absorbs bursts by scaling consumers, not by waiting; one path for trickle and flood.
Hurts: always-on infrastructure (you pay for idle consumers at 03:00), exactly-once and ordering become real concerns, more moving parts to operate, late and out-of-order data needs explicit handling.
Use when: a live consumer (human or model) acts on the data and the freshness target is below a few minutes — the copilot, fraud blocking, live personalization.
The decision rule is blunt: if the freshness target is larger than your batch interval and load is steady, stay batch. Streaming is not a maturity badge. It is a cost you take on to buy seconds, and you should only buy seconds someone is actually paying for.
6) A concrete cost and freshness table
Numbers for the running example, comparing the nightly warehouse, a 15-minute micro-batch, and event-triggered streaming. Costs are order-of-magnitude for ~100k events/day, mixed modalities; adjust for your stack and provider.
| Approach | Freshness gap (chat) | Freshness gap (audio) | Behavior under Monday surge | Steady-state infra cost | First thing that breaks |
|---|---|---|---|---|---|
| Nightly batch (02:00) | up to ~24 h | up to ~24 h | unaffected (already slow) | low — cluster runs 40 min/day | copilot cites stale data |
| 15-min micro-batch | ~15 min, p99 worse | ~15–50 min | runs queue, gap blows to ~50 min | medium — cluster spins up 96×/day | runs overlap, transcription backlog |
| Event-triggered streaming | ~5–10 s | ~30 s–3 min (ASR-bound) | bounded by consumer throughput | higher — always-on consumers + processors | consumer lag if under-provisioned |
Read the audio column carefully. Even with full streaming, audio freshness is minutes, not seconds, because Whisper-class ASR is designed for ~30-second chunks and streaming variants land around 3 seconds of latency per utterance; the call must finish (or be chunked) before its transcript is complete. Modality cost asymmetry shows up here as freshness asymmetry: text is seconds-fresh, audio is minutes-fresh, no matter how good your streaming layer is. The platform cannot make audio fresher than ASR allows — it can only avoid adding more delay on top.
That asymmetry is a memory hook worth keeping: streaming makes text fresh; it cannot make audio fast. The bottleneck for audio is the model, not the pipeline.
7) Operational signals: knowing the gap is opening
You cannot fix a freshness gap you cannot see. The single most important metric in a streaming platform is end-to-end lag: the distribution of t_retrievable − t_event, measured per modality.
- Healthy: chat p50 lag ~6 s, p99 ~20 s; image p50 ~10 s; audio p50 ~90 s (call length dominated). Flat lines over the day.
- First metric to degrade: consumer lag — events sitting in the log unprocessed — climbs during the Monday surge before end-to-end lag does. Lag in the log is the leading indicator; stale answers are the lagging one.
- Misleading metric people watch: batch job success / job duration. The job can succeed in 18 minutes every single run and the copilot can still be answering from data that is, by design, a full cadence-interval stale. Green jobs hide a structural freshness ceiling.
- First graph an expert opens: end-to-end lag percentiles per modality, overlaid with event arrival rate. The expert is looking for the moment lag and arrival rate diverge — the signature of consumers falling behind a burst.
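End-to-end lag per modality can be computed from nothing more than two timestamps per event. The sketch below assumes each sample carries an event time and a retrievable time in seconds, uses a nearest-rank percentile, and runs on invented sample values; `lag_by_modality` is an illustrative name.

```python
import math
from collections import defaultdict

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def lag_by_modality(samples):
    """samples: iterable of (modality, t_event, t_retrievable), times in seconds."""
    lags = defaultdict(list)
    for modality, t_event, t_retrievable in samples:
        lags[modality].append(t_retrievable - t_event)
    return {m: {"p50": percentile(v, 50), "p99": percentile(v, 99)} for m, v in lags.items()}

# Hypothetical samples: (modality, event time, time it became retrievable).
samples = [
    ("chat", 0, 4), ("chat", 10, 15), ("chat", 20, 26), ("chat", 30, 37), ("chat", 40, 60),
    ("audio", 0, 80), ("audio", 100, 190), ("audio", 200, 300),
]
report = lag_by_modality(samples)
print(report)
```

Plotting these percentiles over time next to the event arrival rate gives the graph described in the last bullet.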
8) Boundary: where the nightly batch was right all along
Streaming is over-engineering more often than teams admit, and this module ends (file 08) on exactly that boundary. Establish the fit now.
- Strong fit for streaming: interactive grounding (copilot), where a human or model acts on data within seconds and a stale answer is a visible failure.
- Pathological: streaming a feed that only ever powers a weekly executive report. You pay always-on infra to make data fresh that nobody reads until Friday. The freshness has no consumer.
- Scale/workload limit that breaks intuition: at very low volume, streaming's always-on cost dominates and a cron job is cheaper and simpler. At very high volume with loose freshness needs (training-set rebuilds over months of interactions), batch over a lakehouse wins on cost by an order of magnitude. Streaming wins in the middle: meaningful volume and tight freshness.
The intuition to discard: "modern means streaming." Modern means matching the freshness target to the cheapest architecture that hits it. A platform that streams everything because streaming is fashionable burns money making cold data warm.
9) Wrong model to drop: "the model is the bottleneck"
The seductive wrong idea after the angry-customer incident is that the copilot needs a better model or a better prompt. It feels right because the failure surfaced in the model's answer. But the model answered correctly given what it could retrieve. The correct model: the bottleneck is the freshness of the index, set by the pipeline's trigger cadence, not the intelligence of the model. Upgrading the model without fixing the pipeline produces a more articulate wrong answer. The fix lives upstream, in how and when data becomes retrievable.
10) Other ways the freshness gap bites
- The "it's in the warehouse" fallacy — data present in storage but not yet in the retrieval index. Present ≠ reachable.
- Schedule overlap — a mini-batch run that has not finished when the next is due; runs queue and the gap compounds during load.
- Modality freshness skew — text fresh in seconds, audio stale for minutes; a cross-channel copilot answers inconsistently depending on which channel the data came from.
- Reprocessing blindness — you fix an embedding bug but the old (wrong) vectors stay in the index because there is no replay path; freshness is fine, correctness is not.
- Cold-start staleness — a brand-new customer has zero history in any index; the copilot has nothing to retrieve and over-relies on the live turn alone.
- Silent index write failure — the pipeline runs, but writes to the vector index are silently dropping; lag looks fine because the events left the log, but they never landed where the copilot reads.
- Late-arriving data — a mobile chat buffered offline arrives 40 minutes late; if the index does not handle out-of-order events, it lands "in the past" and is never surfaced for the session it belonged to.
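The silent index write failure in the list above is one that lag metrics alone will not show, so it needs its own check. A minimal reconciliation sketch, assuming you can list the event ids the consumer acknowledged (with timestamps) and the ids present in the index; the function name and the 30-second grace period are assumptions.

```python
def unindexed_events(consumed, indexed_ids, now, grace_s=30.0):
    """Ids the consumer acknowledged more than grace_s seconds ago that never reached the index.

    consumed: dict of event id -> time the consumer finished with it (seconds).
    indexed_ids: set of event ids currently present in the retrieval index.
    """
    return sorted(
        event_id
        for event_id, consumed_at in consumed.items()
        if event_id not in indexed_ids and now - consumed_at > grace_s
    )

consumed = {"chat-1": 100.0, "chat-2": 105.0, "chat-3": 170.0}
indexed = {"chat-1"}
dropped = unindexed_events(consumed, indexed, now=180.0)
print(dropped)  # chat-3 is still inside the grace period, so only chat-2 is flagged
```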
11) Pattern transfer
- Backpressure (chapter 02) — the Monday-surge queue-up is a backpressure problem: producers outrun consumers. The same failure geometry appears in any log-structured system when consumer throughput is under-provisioned.
- Cache invalidation (data-platform module, warehouse/lakehouse) — "present but not reachable" is the same shape as a stale cache: the truth exists, the read path can't see it yet. Both are freshness-at-the-read-layer problems.
- Eval/serving skew (RAG fundamentals) — a copilot grounded on a stale index is a retrieval-quality failure that no amount of generation tuning fixes, the same way a chat-template mismatch is a protocol failure no prompt tweak fixes.
- Lambda vs Kappa (chapter 06) — the "faster batch vs streaming" choice here is the seed of the lambda/kappa debate: how many code paths do you maintain to serve both fresh-and-approximate and slow-and-correct?
12) Design test
Five yes/no questions to audit any "we need streaming" claim:
- Is there a live consumer (human or model) that acts on the data within minutes? (No → batch.)
- Have you written the freshness target as a number per source, in seconds? (No → you cannot size anything yet.)
- Is your required freshness target smaller than your current batch interval? (No → faster batch may suffice.)
- Is the load bursty enough that a fixed schedule would queue up during peaks? (Yes → event-triggered helps.)
- Does the slowest modality (audio via ASR) still meet the target after its irreducible processing latency? (No → no architecture fixes it; renegotiate the target.)
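The five questions can be turned into a small audit function. This is one possible encoding, not a definitive rule engine: the field names are hypothetical, it audits a single source at a time, and it checks the irreducible-latency question before recommending any architecture.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StreamingClaim:
    live_consumer: bool         # does a human or model act on the data within minutes?
    target_s: Optional[float]   # freshness target in seconds; None if nobody wrote it down
    batch_interval_s: float     # current batch cadence
    bursty: bool                # would a fixed schedule queue up during peaks?
    processing_floor_s: float   # irreducible latency, e.g. ASR for audio

def audit(claim: StreamingClaim) -> str:
    if not claim.live_consumer:
        return "batch"
    if claim.target_s is None:
        return "write the target down first"
    if claim.processing_floor_s > claim.target_s:
        return "renegotiate the target"
    if claim.target_s >= claim.batch_interval_s and not claim.bursty:
        return "faster batch may suffice"
    return "event-triggered streaming"

copilot_audio = StreamingClaim(True, 180.0, 900.0, True, 90.0)
weekly_report = StreamingClaim(False, None, 86400.0, False, 0.0)
print(audit(copilot_audio), "|", audit(weekly_report))
```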
Where this appears in production
Freshness-critical (streaming earns its keep):
- Intercom Fin: grounds support answers on the current conversation and recent account events, not yesterday's snapshot, so a copilot mid-chat sees what just happened.
- Stripe Radar: scores each card swipe in milliseconds; an 18-hour-stale fraud model would approve fraud all day.
- Uber: surge pricing and ETA depend on event-time location streams; a batch lag of minutes would price rides for traffic that already cleared.
- DoorDash: live order and courier streams feed dispatch; staleness means assigning a courier who already left the area.
- LinkedIn: feed and notification relevance updated from real-time activity streams so a connection's post is rankable within seconds.
- Netflix Keystone: streams playback and interaction events to power near-real-time personalization and operational dashboards.
- Robinhood / trading copilots: market-data streams where a multi-second freshness gap is a wrong-price answer.
- PagerDuty / incident copilots: ground responses on the alert that fired seconds ago, not the last batch load.
Freshness-tolerant (batch is correct, and streaming would be waste):
- Payroll and billing reconciliation: correctness over freshness; a nightly batch is the right tool and streaming adds only risk.
- Model-training dataset builds: months of interactions rebuilt periodically over a lakehouse; batch wins on cost by an order of magnitude.
- Weekly executive KPI reports: the consumer reads on Friday; always-on streaming infra would make data fresh that nobody queries until then.
- Compliance archival: write-once, read-rarely; freshness is irrelevant, durability and cost are everything.
- Churn-model feature backfills: large historical recompute jobs where a 12-hour gap is invisible to the consumer.
- A/B experiment readouts: aggregated over days; sub-second freshness buys nothing the experiment design can use.
- Data-warehouse marts for BI: dashboards refreshed hourly or daily; the human glances after coffee, not mid-second.
- SEO / content recommendation indexes rebuilt nightly: freshness target is a day, batch is cheaper and simpler.
Pause and recall
Close the file. Answer from memory.
- Write the freshness gap as a chain of delays. Which delay dominates a nightly batch?
- The angry-customer copilot answered "I don't see any recent contact." Whose fault was it — model, pipeline, or scheduler? Why?
- Why does making the batch compute faster barely move the freshness gap?
- Attempt A ran the batch every 15 minutes and it broke on Monday. What broke, and why is the cause "clock-triggered work cannot adapt to load"?
- State the chapter's core invariant in one sentence.
- Why is audio freshness measured in minutes even with perfect streaming?
- Name two workloads where streaming is over-engineering.
- What is the leading indicator that the freshness gap is about to open, and what is the lagging one?
Interview Q&A
Q1. A PM says "the copilot feels dumb, let's try a smarter model." You suspect freshness. How do you prove it in one query?
A. Pull a trace where the copilot missed recent context, and check the index timestamps of the retrieved chunks against the event time of the missing data. If the missing event exists in storage but its index write postdates the query (or never happened), it is a freshness gap, not a model gap. Swapping models won't change which chunks are retrievable.
Common wrong answer to avoid: "Run an eval on a bigger model." That measures generation quality on whatever was retrieved; it cannot detect that the right document was never in the index.
Q2. Why doesn't running the nightly batch every 15 minutes solve the freshness problem cleanly?
A. A fixed schedule processes a trickle and a flood with the same trigger, so during a burst the runs queue, overlap, and the gap blows out exactly when load is highest: the Monday-surge failure. Worse, the cadence still sets a floor: even with no load, data waits up to one interval. Mini-batch reduces the floor but does not make work event-triggered.
Common wrong answer to avoid: "Just use a bigger cluster for the mini-batch." That shortens the run's duration, which is not the bottleneck; the bottleneck is the trigger cadence and overlap under burst.
Q3. How do you express a freshness requirement so it can drive architecture?
A. As a per-source latency budget in seconds, broken down by the chain Δi+Δs+Δt+Δx, with a percentile (p99, not just p50). "Chat retrievable within 10 s p99; audio within 3 min p99 given ASR." That tells you which Δ to attack and whether any architecture can hit it.
Common wrong answer to avoid: "Real-time." It is not a number, it hides which delay matters, and it usually means "the batch cadence hurts," which is a Δx problem with a cheap fix.
Q4. The freshness gap for chat is 6 seconds but for audio is 90 seconds on the same platform. Is the platform broken?
A. No. That is modality cost asymmetry, not a bug. Audio is useless until transcribed, and ASR is the irreducible bottleneck; even streaming ASR adds seconds per utterance and a call must largely complete before its transcript is whole. The platform's job is to add no extra delay beyond ASR, not to make audio as fresh as text.
Common wrong answer to avoid: "Scale the audio consumers to fix the gap." More consumers reduce queuing, not the per-call ASR latency floor.
Q5. When would you argue against streaming for a data feed in a design review?
A. When no consumer acts within minutes, e.g., the feed only powers a weekly report or a periodic training-set rebuild. Streaming's always-on cost and operational surface buy freshness nobody consumes. The test is: is there a live human or model whose decision changes if the data is seconds-fresh vs hours-fresh?
Common wrong answer to avoid: "Stream it anyway, freshness never hurts." Freshness you don't consume is pure cost and added failure surface.
Q6. (Cumulative) The copilot cites stale data. Is this a chapter-01 freshness-gap problem, a future ingestion/backpressure problem, or an index-update problem?
A. You diagnose by where the lag lives. If the event never entered the log → ingestion. If it is in the log but consumers are behind → backpressure (chapter 02). If it was processed but never written to the vector index → incremental indexing (chapter 05). Chapter 01 gives you the frame (measure end-to-end lag per modality and locate which Δ is large) before assigning blame to a subsystem.
Common wrong answer to avoid: "It's always the index." Often the event is stuck in the log or never ingested; jumping to the index skips the leading indicator (consumer lag).
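The diagnosis in Q1 and Q6 amounts to a short decision sequence over facts you can read from a trace. The sketch below assumes those facts are already collected; the function name, parameters, and return labels are illustrative.

```python
from typing import Optional

def locate_stall(in_log: bool, consumed: bool,
                 index_write_at: Optional[float], query_at: float) -> str:
    """Classify why an event was missing from a copilot answer (times in seconds)."""
    if not in_log:
        return "ingestion"            # the event never entered the log
    if not consumed:
        return "backpressure"         # in the log, but consumers are behind
    if index_write_at is None:
        return "index write missing"  # processed, never written to the index
    if index_write_at > query_at:
        return "freshness gap"        # indexed, but only after the copilot asked
    return "retrievable"              # the index had it: look at retrieval or generation

# The opening incident: the chat was processed, but its index write is due
# about 11.5 hours after the 14:32 query (times as seconds since 13:30).
verdict = locate_stall(in_log=True, consumed=True,
                       index_write_at=45_000.0, query_at=3_720.0)
print(verdict)
```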
Design/debug exercise (10 min)
Step 1 — Modeled example. Here is the freshness budget for chat, filled in for the running platform:
Source: chat message
Target: retrievable within 10 s p99
Δi ingest → log: < 0.5 s (HTTP → broker)
Δs log → durable: included in Δi (log is durable)
Δt raw → embedding: ~1.5 s (one embedding call)
Δx artifact → index: ~2 s (vector upsert + index visibility)
budget left: ~6 s headroom → OK
Verdict: streaming index update path; no faster model needed.
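The same budget can be written as a small data structure so the headroom is computed rather than eyeballed. A sketch using the chat figures above, with Δi taken at its 0.5 s upper bound and Δs as zero because it is included in Δi; the class and field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class FreshnessBudget:
    source: str
    target_s: float     # p99 target: event to retrievable
    ingest_s: float     # Δi
    store_s: float      # Δs
    transform_s: float  # Δt
    index_s: float      # Δx

    def total_s(self) -> float:
        return self.ingest_s + self.store_s + self.transform_s + self.index_s

    def headroom_s(self) -> float:
        return self.target_s - self.total_s()

    def dominant_stage(self) -> str:
        stages = {"Δi": self.ingest_s, "Δs": self.store_s,
                  "Δt": self.transform_s, "Δx": self.index_s}
        return max(stages, key=stages.get)

chat = FreshnessBudget("chat message", target_s=10.0,
                       ingest_s=0.5, store_s=0.0, transform_s=1.5, index_s=2.0)
print(chat.total_s(), chat.headroom_s(), chat.dominant_stage())
```

A negative headroom means the target is unreachable with the current stage estimates, and `dominant_stage` names the Δ to attack first.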
Step 2 — Your turn. Fill in the same budget for audio on the running platform (12k calls/day, 6 min average). Target: transcript retrievable within 3 min p99. Estimate Δt for ASR (remember the 30-second-chunk and ~3-second-streaming-latency facts) and decide whether the target is achievable, and what you'd renegotiate if it isn't.
Step 3 — Reproduce from memory. Without looking, redraw the freshness-gap chain diagram (event → ingest → store → transform → index → retrievable, with the four Δ labels), and write one sentence explaining why the nightly-batch gap is dominated by Δx (cadence) rather than Δt (compute). Connect it to the chapter invariant.
Operational memory
This chapter explained why a perfectly healthy nightly pipeline still makes a copilot answer "I don't see any recent contact" about an hour after the contact happened. The important idea is the freshness gap: the wall-clock delay from event to retrievable, which for batch is dominated by scheduling cadence (Δx), not compute. The model's intelligence and the prompt are not the cause.
You learned to write freshness as a per-source latency budget in seconds with a percentile, trace a stale answer to the trigger cadence that caused it, and decide between faster-batch and event-triggered streaming by comparing the target to the batch interval and the load's burstiness. That solves the opening failure because the 13:30 chat was in storage but not in the index — and only an event-triggered write into the index closes the gap.
Carry this diagnostic forward: when a copilot seems dumb, measure end-to-end lag per modality before touching the model. If you see consumer lag climbing during a surge, suspect the pipeline trigger, not the LLM. And remember that streaming makes text fresh but cannot make audio fast — the ASR floor is real.
Remember:
- A retrieval-grounded model is only as fresh as the last write into its index; the scheduler caps the copilot.
- Faster compute does not shrink a cadence-dominated gap; smaller cadence or event-triggered work does.
- Write freshness as Δi+Δs+Δt+Δx in seconds, per source, at p99 — never as the word "real-time."
- Streaming makes text seconds-fresh but cannot beat the ASR latency floor on audio — modality freshness skew is structural.
- Streaming is over-engineering when no consumer acts within minutes; match the freshness target to the cheapest architecture that hits it.
Bridge. We established that work must be event-triggered, and that the freshness gap blows out under load when consumers fall behind. But "put it on a stream" hides the hard question: what happens to events when a Monday-morning flood arrives and the ASR consumers cannot keep up? Do we drop calls, slow the source, or buffer — and how does a log keep order and survive a crash while doing it? The next file builds the replay log and confronts backpressure head-on. → 02-ingestion-and-backpressure.md