03. Conversation History: Windows, buffers, and rolling memory

~14 min read. The first practical memory layer is plain chat history, but stored with discipline.

Builds on the ELI5 in 00-eli5.md. The summary-card (the folded recap of older turns) lets the agent keep continuity without drowning the desk-note.

1) Raw transcripts are useful, but not enough

See the shape.

full transcript
      │
      ├── latest 4 turns ─────────────→ desk-note now
      │
      ├── older resolved turns ───────→ summary-card
      │
      └── selected durable facts ─────→ filing-cabinet / address-book

A transcript is the default history format. It preserves wording. It preserves tone. It preserves unresolved references. That is valuable.

But raw transcripts grow linearly. Soon the desk-note fills. Soon irrelevant detail crowds current work. So we need policies.

Not all turns deserve equal status. Some turns are still live. Some are context. Some should become a summary-card. Some should become structured facts. Some should simply disappear.

Conversation history is the first memory compression layer. It is still close to the original chat. But it is more curated than a raw log.

2) Three common history strategies

A. Sliding window

Keep only the most recent N turns.

turns: 1  2  3  4  5  6  7  8
keep:              [5][6][7][8]

This is cheap and common. It works well for short tasks. It fails when an early constraint matters later.
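
The window above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation; the function name `sliding_window` and the "turn N" strings are placeholders chosen for this example.

```python
from collections import deque


def sliding_window(turns, n):
    """Return the last n turns, oldest first (hypothetical helper)."""
    # deque(maxlen=n) silently drops the oldest item once it is full.
    window = deque(maxlen=n)
    for turn in turns:
        window.append(turn)
    return list(window)


turns = [f"turn {i}" for i in range(1, 9)]
print(sliding_window(turns, 4))  # ['turn 5', 'turn 6', 'turn 7', 'turn 8']
```

Nothing in this sketch looks at what a turn says, which is exactly why an early constraint falls out of the window once N newer turns arrive.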

B. Summary buffer

Keep a rolling summary of old turns. Keep recent turns verbatim.

┌──────────────────────┐
│ summary-card         │  ← turns 1-5 compressed
└──────────────────────┘
           │
           ├── turn 6 raw
           ├── turn 7 raw
           └── turn 8 raw

This is a strong default. It preserves trajectory. It saves tokens. But summaries can drift.
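
One possible shape for this, as a sketch under stated assumptions: the class `SummaryBuffer`, its `summarize` hook, and `naive_summarize` are all hypothetical names. In a real system the hook would probably call a model; here it only concatenates, so the mechanics stay visible. For simplicity the sketch folds turns in as soon as they overflow.

```python
from typing import Callable, List


class SummaryBuffer:
    """Rolling summary of old turns plus the last `keep_raw` turns verbatim."""

    def __init__(self, summarize: Callable[[str, List[str]], str], keep_raw: int = 3):
        self.summarize = summarize  # hypothetical hook; likely an LLM call in practice
        self.keep_raw = keep_raw
        self.summary = ""
        self.recent: List[str] = []

    def add(self, turn: str) -> None:
        self.recent.append(turn)
        overflow = len(self.recent) - self.keep_raw
        if overflow > 0:
            # Fold the oldest turns into the summary-card, keep the rest raw.
            evicted, self.recent = self.recent[:overflow], self.recent[overflow:]
            self.summary = self.summarize(self.summary, evicted)

    def context(self) -> List[str]:
        head = [f"[summary-card] {self.summary}"] if self.summary else []
        return head + self.recent


def naive_summarize(old_summary: str, evicted: List[str]) -> str:
    # Placeholder only: joins text, performs no real compression.
    parts = ([old_summary] if old_summary else []) + evicted
    return " | ".join(parts)


buf = SummaryBuffer(naive_summarize, keep_raw=3)
for i in range(1, 9):
    buf.add(f"turn {i}")
print(buf.context())
```

After eight turns the context holds one summary-card entry covering turns 1-5 and turns 6-8 raw, matching the diagram. Every pass through `summarize` is a chance for drift, since each new summary is built from the previous one.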

C. Branch-aware history

Store separate threads for subtopics. Bring back only the branch relevant now. This is powerful for long workflows. It is harder to build. The librarian must know which branch matters.

So what to do? Start with summary buffer plus last-N turns. Add branch awareness only when workflows justify it.
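
The storage side of branch-aware history is simple; a rough sketch follows. `BranchHistory` and the branch labels "billing" and "deploy" are invented for illustration. The sketch assumes the caller already knows which branch a turn belongs to, which is the hard part and is not shown.

```python
from collections import defaultdict
from typing import Dict, List


class BranchHistory:
    """Keeps one list of turns per subtopic (hypothetical structure)."""

    def __init__(self) -> None:
        self.branches: Dict[str, List[str]] = defaultdict(list)

    def add(self, branch: str, turn: str) -> None:
        self.branches[branch].append(turn)

    def recall(self, branch: str, last_n: int = 3) -> List[str]:
        if last_n <= 0:
            return []
        # .get avoids creating an empty branch as a side effect of a lookup.
        return self.branches.get(branch, [])[-last_n:]


history = BranchHistory()
history.add("billing", "Invoice totals differ for EUR accounts.")
history.add("deploy", "Staging deploy failed on migration step.")
history.add("billing", "Rounding service version is unconfirmed.")
print(history.recall("billing"))
```

Only the two billing turns come back; the deploy turn stays out of the desk-note until that branch is asked for.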

3) What belongs in a good summary buffer

A weak summary says, "We discussed billing and debugging." That is useless. A good summary-card preserves:

- current goal
- constraints and preferences
- key facts already established
- open questions
- promises the agent made
- important tool results

Look at the structure.

summary-card
├── goal: fix invoice mismatch
├── constraints: no DB writes in prod
├── facts: issue appears only on EUR accounts
├── open loop: confirm rounding service version
└── promise: assistant will draft SQL for staging only

This format is compact. It is also actionable. Notice how the summary-card is not just prose. It is organized memory. That makes later retrieval easier. It also makes refresh cheaper.
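
To make "organized memory" concrete, here is one way the card could be typed. This is a sketch, not the required schema: the class name `SummaryCard`, its field names, and `render` are assumptions that mirror the labels in the diagram above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SummaryCard:
    """Structured recap of older turns (hypothetical schema)."""

    goal: str
    constraints: List[str] = field(default_factory=list)
    facts: List[str] = field(default_factory=list)
    open_loops: List[str] = field(default_factory=list)
    promises: List[str] = field(default_factory=list)

    def render(self) -> str:
        # One "label: value" line per entry, ready to paste into the desk-note.
        lines = [f"goal: {self.goal}"]
        for label, items in [
            ("constraints", self.constraints),
            ("facts", self.facts),
            ("open loop", self.open_loops),
            ("promise", self.promises),
        ]:
            lines += [f"{label}: {item}" for item in items]
        return "\n".join(lines)


card = SummaryCard(
    goal="fix invoice mismatch",
    constraints=["no DB writes in prod"],
    facts=["issue appears only on EUR accounts"],
    open_loops=["confirm rounding service version"],
    promises=["assistant will draft SQL for staging only"],
)
print(card.render())
```

Because each field is a separate list, a refresh can close one open loop or add one fact without rewriting the rest of the card.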

4) Worked example: converting turns into a rolling buffer

Suppose we have six turns.

Turn 1: "Help debug invoice mismatch."
Turn 2: "Only EUR customers are affected."
Turn 3: "Do not touch production."
Turn 4: "You suggested checking tax service version."
Turn 5: "Version 3.2 is running in staging."
Turn 6: "Now draft the investigation plan."

A naive last-2 strategy keeps only turns 5 and 6. It loses the EUR constraint. It loses the production rule. Bad result.

A rolling summary does this. Summary-card after turn 4:

- goal: debug invoice mismatch
- scope: EUR customers only
- safety rule: no production changes
- hypothesis: inspect tax service version

Then keep turns 5 and 6 raw. On turn 6 the desk-note contains:

- summary-card above
- raw turn 5: staging runs 3.2
- raw turn 6: draft plan

Now the answer can say: "I will draft a staging-only plan for the EUR-only mismatch, starting with comparison against tax service 3.2." See. Very small buffer. Still strong continuity.
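
The same comparison can be run as code. This is an illustrative sketch: the helper names `last_n` and `desk_note` are made up, and the summary-card is written by hand here, where a real system would generate it.

```python
turns = {
    1: "Help debug invoice mismatch.",
    2: "Only EUR customers are affected.",
    3: "Do not touch production.",
    4: "You suggested checking tax service version.",
    5: "Version 3.2 is running in staging.",
    6: "Now draft the investigation plan.",
}

# Summary-card after turn 4, hand-written for this example.
summary_card = {
    "goal": "debug invoice mismatch",
    "scope": "EUR customers only",
    "safety rule": "no production changes",
    "hypothesis": "inspect tax service version",
}


def last_n(turns, n):
    """Most recent n turns as raw lines (assumes n >= 1)."""
    keys = sorted(turns)[-n:]
    return [f"turn {k}: {turns[k]}" for k in keys]


def desk_note(summary_card, turns, n):
    """Summary-card fields first, then the last n raw turns."""
    card = [f"{k}: {v}" for k, v in summary_card.items()]
    return card + last_n(turns, n)


naive = last_n(turns, 2)
rolling = desk_note(summary_card, turns, 2)

print("EUR" in " ".join(naive))    # False: the scope is gone
print("EUR" in " ".join(rolling))  # True: the summary-card kept it
```

The rolling desk-note is six short lines, yet it still carries the scope and the safety rule that the naive window dropped.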

5) Failure modes to watch

Summary drift is the big one. The buffer may overstate certainty. It may delete a caveat. It may merge two branches incorrectly.

Another failure is stale carryover. A resolved issue may keep appearing in every turn. That wastes tokens and biases the model.

A third failure is assistant self-fiction. The summary-card may store a plan the agent never actually confirmed.

So what helps? Keep summaries grounded in actual turns. Prefer structured fields. Refresh summaries after meaningful milestones, not after every tiny reply. Log when the summary changes. And test the summary against the raw transcript sometimes. That is boring engineering. It saves you later.
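
Testing a summary against the raw transcript can start very crudely. The sketch below flags any summary field that shares no keyword with any raw turn. It is a heuristic with obvious blind spots (paraphrases, short words such as "EUR"), and the names `keywords` and `ungrounded_fields` are hypothetical.

```python
import re


def keywords(text):
    """Lowercased words of 4+ letters; a crude stand-in for real matching."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def ungrounded_fields(summary_card, transcript):
    """Return summary fields that share no keyword with any raw turn."""
    transcript_words = set()
    for turn in transcript:
        transcript_words |= keywords(turn)
    return [
        name
        for name, value in summary_card.items()
        if not keywords(value) & transcript_words
    ]


transcript = [
    "Help debug invoice mismatch.",
    "Only EUR customers are affected.",
    "Do not touch production.",
]
card = {
    "goal": "debug invoice mismatch",
    "safety rule": "no production changes",
    "promise": "assistant will email finance",  # never said in any turn
}
print(ungrounded_fields(card, transcript))  # ['promise']
```

A flagged field is not proof of self-fiction, only a prompt for a human or a stricter check to compare that field with the transcript.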

Where this lives in the wild

Claude conversation threads: an analyst relies on rolling history so a long document discussion stays coherent without replaying every sentence.
Slack AI thread summaries: a manager uses condensed thread state so follow-up questions do not require replaying the whole channel.
Zendesk AI Agent: a support rep benefits from buffered conversation state so the bot remembers verified identity, issue scope, and promised actions.
Notion AI Q&A: a knowledge worker can compress prior back-and-forth into a working summary when drafting or revising a document.
Duolingo Max: a learner needs recent correction history plus a compact session summary so repeated mistakes stay visible across prompts.

Pause and recall

What is the main trade-off between a sliding window and a summary buffer?
Name four things a useful summary-card should preserve.
In the worked example, what crucial facts would a last-2 strategy lose?
Why is branch-aware history more powerful and more dangerous?

Interview Q&A

Q: Why use a summary buffer instead of storing the entire transcript in every prompt?
A: Full replay preserves everything, but wastes budget and lowers relevance. A good summary buffer keeps task-critical state while letting the desk-note stay small.
Common wrong answer to avoid: "Because summaries are always more accurate than transcripts"; summaries are cheaper, not automatically more accurate.

Q: Why should conversation summaries be structured, not purely free-form prose?
A: Structured summaries separate goals, constraints, facts, and open loops. That makes refresh, auditing, and retrieval more reliable.
Common wrong answer to avoid: "Because JSON is easier for developers"; the deeper value is semantic stability and less summary drift.

Q: Why refresh summaries at milestones instead of every turn?
A: Turn-by-turn summarization adds cost and noise. Milestone-based refresh captures meaningful state changes with less churn.
Common wrong answer to avoid: "Because models are bad at frequent summarization"; the real issue is unnecessary churn and compounding drift.

Q: Why can conversation history alone not replace long-term memory?
A: History is session-shaped. Long-term memory must survive sessions, support retrieval, and apply different retention rules.
Common wrong answer to avoid: "Because history is unstructured"; even structured history is still not the same as cross-session memory.

Apply now (5 min)

Exercise: Take any five-turn chat. Write a six-line summary-card for it. Force yourself to include goal, constraints, facts, open loops, and one promise. Then compare your summary against the raw chat. What nuance did you lose?

Sketch from memory: Draw the pipeline from full transcript to summary-card, raw recent turns, and durable stores. Label where desk-note and filing-cabinet sit.

Bridge. A rolling buffer helps, but old turns still pile up. Next we learn how to compress memory on purpose, not just clip it. → 04-memory-compression.md