
04. Memory Compression — Shrink without losing the point

~16 min read. Old conversation is expensive, so we must compress it on purpose, not by accident.

Built on the ELI5 in 00-eli5.md. The summary-card — the folded recap of older work — teaches us how to keep meaning while clearing the desk-note.


1) Why compression exists at all

See. The raw transcript is usually too large. The desk-note cannot hold everything. So we need smaller forms. Compression is how we make them.

raw chat ──→ summary-card ──→ fact list ──→ filing-cabinet
   │             │               │
   │             │               └── stable truths
   │             └── short recap of prior flow
   └── original wording and detail
Compression does not mean random shortening. It means preserving what future turns need.

Good compression keeps:
- goals and constraints
- decisions and open loops
- source links when stakes are high

Bad compression:
- keeps fluffy phrasing
- drops caveats
- invents certainty

Now what is the problem? Different memory consumers need different shapes:
- The desk-note wants a small summary-card.
- The filing-cabinet may want embeddings plus metadata.
- The address-book wants stable facts only.
- The cleanup-bell wants permission to discard noise.

So compression is not one operation. It is a routing step. It is a meaning-preserving rewrite. Simple, no?
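Two of the bad habits named above, dropping caveats and inventing certainty, can be partly caught with a cheap string check. The sketch below is a tripwire under stated assumptions, not a faithfulness test: the marker lists `CAVEAT_MARKERS` and `CERTAINTY_MARKERS` and the function names are hypothetical, and it only looks for literal English phrases.

```python
import re

# Hypothetical marker lists; tune them for your own domain and language.
CAVEAT_MARKERS = ["only if", "unless", "except", "at most", "at least"]
CERTAINTY_MARKERS = ["always", "never", "definitely", "guaranteed", "every"]


def has_phrase(phrase: str, text: str) -> bool:
    # Whole-word, case-insensitive match, so "every" does not fire on "everyone".
    return re.search(rf"\b{re.escape(phrase)}\b", text, re.IGNORECASE) is not None


def lint_summary(source: str, summary: str) -> dict[str, list[str]]:
    """Flag caveats the summary dropped and certainty words it added."""
    dropped = [p for p in CAVEAT_MARKERS if has_phrase(p, source) and not has_phrase(p, summary)]
    invented = [p for p in CERTAINTY_MARKERS if has_phrase(p, summary) and not has_phrase(p, source)]
    return {"dropped_caveats": dropped, "invented_certainty": invented}


source = "Refund is allowed only if delivery was more than seven days late."
summary = "Refunds are always allowed."
print(lint_summary(source, summary))
# → {'dropped_caveats': ['only if'], 'invented_certainty': ['always']}
```

It misses paraphrased caveats such as "provided that", so a clean result proves nothing. It is useful as a cheap first filter before a more careful review.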


2) Lossy and lossless compression

Look at the comparison.

┌───────────────────────┐      ┌────────────────────────┐
│ lossless              │      │ lossy                  │
├───────────────────────┤      ├────────────────────────┤
│ keep exact wording    │      │ keep the gist          │
│ keep every field      │      │ drop low-value detail  │
│ larger output         │      │ smaller output         │
│ safer for audits      │      │ better for prompt fit  │
└───────────────────────┘      └────────────────────────┘
Lossless compression is rare in language systems. But structured extraction can be close. For example: "User prefers bullet summaries" can become a field. That loses wording, not meaning.

Lossy compression is more common. A ten-turn debate becomes six lines. That is useful for the summary-card. But lossy means risk. A missing caveat may change behaviour.

So what to do? Use lossless or near-lossless forms for:
- permissions
- numbers
- policy thresholds
- cited tool outputs

Use lossy forms for:
- old brainstorming
- phatic chatter
- repeated restatements
- long assistant explanations already accepted

Compression quality depends on the target use. Not on a universal rule.
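The two lists above can be written down as a per-item policy. This is a minimal sketch, assuming each item already carries a kind label (the label strings are hypothetical), and using "keep the first sentence" as a crude stand-in for a real model-written gist.

```python
from typing import Optional

# Item kinds taken from the two lists above; the label strings are hypothetical.
EXACT_KINDS = {"permission", "number", "policy_threshold", "tool_output"}
GIST_KINDS = {"brainstorm", "restatement", "accepted_explanation"}
DROP_KINDS = {"phatic"}


def first_sentence(text: str) -> str:
    # Crude stand-in for a model-written gist: keep only the first sentence.
    head, sep, _ = text.partition(". ")
    return head + "." if sep else text


def compress_item(kind: str, text: str) -> Optional[str]:
    """Return the stored form of one item, or None if it can be discarded."""
    if kind in EXACT_KINDS:
        return text                  # near-lossless: keep the exact wording
    if kind in DROP_KINDS:
        return None                  # lossy: nothing worth keeping
    if kind in GIST_KINDS:
        return first_sentence(text)  # lossy: keep the gist only
    return text                      # unknown kind: keep it rather than guess


items = [
    ("policy_threshold", "Refund allowed after seven days late."),
    ("phatic", "Thanks so much, have a great day!"),
    ("accepted_explanation", "We use the carrier timestamp. It is the most reliable clock we have."),
]
kept = [c for kind, text in items if (c := compress_item(kind, text)) is not None]
print(kept)
# → ['Refund allowed after seven days late.', 'We use the carrier timestamp.']
```

The default branch is the design choice worth noticing: an unrecognised kind is kept verbatim, because wrongly keeping a line costs a few tokens while wrongly dropping a threshold can change behaviour.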


3) What a good compression pipeline does

A practical pipeline usually asks four questions. First, what is the target store? Second, what future question should this answer? Third, what detail must remain exact? Fourth, what can safely disappear?

conversation chunk
      ├── extract facts ──────────→ address-book
      ├── record event ───────────→ diary-page
      ├── write recap ────────────→ summary-card
      └── archive searchable text ─→ filing-cabinet
Notice the same source chunk can create multiple outputs. One old turn may yield:
- a summary line
- one stable preference
- one dated event

That is normal. Compression is not only shrinking. It is decomposition.

See the mental move. We stop treating the transcript as sacred. We treat it as raw material. That is how memory systems scale.
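The fan-out in the diagram can be sketched as a small routing function. This assumes an upstream extractor (often a model call, not shown) has already turned the chunk into `(kind, text)` pairs; `STORE_FOR_KIND` and `decompose` are hypothetical names. Every record keeps the id of the chunk it came from.

```python
from collections import defaultdict

# Hypothetical routing table: which store wants which kind of extracted item.
STORE_FOR_KIND = {
    "recap": "summary-card",       # short recap of prior flow
    "preference": "address-book",  # stable truth
    "event": "diary-page",         # dated happening
}


def decompose(chunk_id: str, raw_text: str, items: list[tuple[str, str]]) -> dict[str, list[dict]]:
    """Fan one conversation chunk out into several smaller memory records.

    `items` are (kind, text) pairs assumed to come from an upstream extractor;
    the extraction step itself is not shown here.
    """
    outputs: dict[str, list[dict]] = defaultdict(list)
    for kind, text in items:
        store = STORE_FOR_KIND.get(kind)
        if store is not None:  # kinds with no home store are not routed
            outputs[store].append({"text": text, "source": chunk_id})
    # The raw chunk is also archived whole, so it stays searchable later.
    outputs["filing-cabinet"].append({"text": raw_text, "source": chunk_id})
    return dict(outputs)


out = decompose(
    "turn-17",
    "Please keep replies as bullets. Also, the v2 fix shipped today.",
    [
        ("recap", "User asked for bullet replies; v2 fix shipped."),
        ("preference", "prefers bullet-point replies"),
        ("event", "v2 fix shipped"),
    ],
)
print(sorted(out))
# → ['address-book', 'diary-page', 'filing-cabinet', 'summary-card']
```

One turn produced four records, one per store. A real diary-page record would also carry a timestamp; it is left out here to keep the sketch short.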


4) Worked example: compressing a support thread

Suppose the raw turns say this:
1. Customer wants refund for order 4481.
2. Delivery arrived nine days late.
3. Refund allowed after seven days late.
4. Customer is angry but still polite.
5. Team tone must stay warm and firm.
6. Assistant promised to draft the reply.

A poor summary says: "Refund conversation with upset customer." Useless, yes?

A better summary-card says:
- goal: reply to refund request for order 4481
- fact: delivery delay was nine days
- policy: refund allowed after seven days late
- tone: warm but firm
- open loop: draft customer reply

Now extract durable facts.

Address-book entry:
- support team style = warm but firm

Diary-page entry:
- case 4481 checked for late delivery on this date

Searchable archive chunk:
- raw thread text with order id metadata

This is compressed memory. We kept what matters. We changed the shape. We lowered token load. We did not lose the point.
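Here is the same worked example as data. The values come from the six raw turns; the field names, `build_memory`, `refund_eligible`, and the example date are hypothetical. Keeping the delay and the threshold as exact numbers means the next turn can still check the refund rule without rereading the thread.

```python
from datetime import date


def build_memory(raw_thread: str, checked_on: date) -> dict[str, dict]:
    """Compress the order-4481 thread into the four shapes described above.

    Values are copied from the six raw turns; field names are hypothetical.
    """
    summary_card = {
        "goal": "reply to refund request for order 4481",
        "delay_days": 9,          # fact: kept as an exact number
        "refund_after_days": 7,   # policy threshold: kept as an exact number
        "tone": "warm but firm",
        "open_loop": "draft customer reply",
    }
    return {
        "summary-card": summary_card,
        "address-book": {"support team style": "warm but firm"},
        "diary-page": {
            "date": checked_on.isoformat(),
            "event": "case 4481 checked for late delivery",
        },
        "filing-cabinet": {"text": raw_thread, "metadata": {"order_id": 4481}},
    }


def refund_eligible(card: dict) -> bool:
    # Assumes "after seven days late" means strictly more than seven days.
    return card["delay_days"] > card["refund_after_days"]


memory = build_memory("<raw thread text>", date(2024, 5, 1))  # example date only
print(refund_eligible(memory["summary-card"]))  # → True
```

Turn 4 ("angry but still polite") was deliberately dropped: the tone rule in turn 5 already tells the next reply how to behave.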


5) Failure modes and fixes

The first failure is omission. The summary drops a hard constraint. Fix: use templates that force goals, constraints, facts, and open loops.

The second failure is hallucinated compression. The model adds a preference never stated. Fix: store source turn ids beside extracted items.

The third failure is over-compression. Everything becomes so tiny that future retrieval is useless. Fix: keep one richer archive form in the filing-cabinet.

The fourth failure is stale summary drift. A resolved issue remains in every summary-card. Fix: mark items as resolved or expired.

The fifth failure is privacy leakage. Compression preserves personal data too well. Fix: run the cleanup-bell before long retention.

So what to remember? Compression saves cost only if it preserves utility. Otherwise you bought smaller garbage.
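Three of these fixes (templates, source turn ids, resolved or expired items) fit in one small data model. This is a minimal sketch; `MemoryItem`, `REQUIRED_SECTIONS`, `validate_card`, and `live_items` are hypothetical names, and the archive and cleanup-bell fixes are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Fix 1: a template that forces these sections onto every summary-card.
REQUIRED_SECTIONS = ("goals", "constraints", "facts", "open_loops")


@dataclass
class MemoryItem:
    text: str
    source_turn_ids: list[int]               # Fix 2: where this claim came from
    status: str = "open"                     # Fix 4: "open" or "resolved"
    expires_at: Optional[datetime] = None    # Fix 4: optional expiry time


def validate_card(card: dict[str, list[MemoryItem]]) -> list[str]:
    """Return a list of problems; an empty list means the card passes."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in card]
    for section, items in card.items():
        for item in items:
            if not item.source_turn_ids:  # no grounding, so possibly invented
                problems.append(f"ungrounded item in {section}: {item.text!r}")
    return problems


def live_items(items: list[MemoryItem], now: datetime) -> list[MemoryItem]:
    """Drop resolved or expired items before building the next summary-card."""
    return [
        i for i in items
        if i.status != "resolved" and (i.expires_at is None or i.expires_at > now)
    ]


now = datetime(2024, 1, 10)
open_item = MemoryItem("draft customer reply", source_turn_ids=[6])
done_item = MemoryItem("fix login bug", source_turn_ids=[2], status="resolved")
old_item = MemoryItem("promo code valid", source_turn_ids=[3], expires_at=datetime(2024, 1, 1))
print([i.text for i in live_items([open_item, done_item, old_item], now)])
# → ['draft customer reply']

card = {"goals": [open_item], "constraints": [], "facts": [MemoryItem("prefers bullets", source_turn_ids=[])]}
print(validate_card(card))
# → ['missing section: open_loops', "ungrounded item in facts: 'prefers bullets'"]
```

An empty `constraints` list passes on purpose: the template insists the question was asked, not that it had an answer.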


Where this lives in the wild

  • Intercom Fin: for support operations, long ticket threads are compressed into compact state so later turns do not replay the whole case.
  • Slack AI thread summaries: a team lead turns dozens of messages into a short state update with decisions and open loops.
  • Notion AI meeting notes: a founder compresses discussion into action items, owners, and deadlines for later recall.
  • GitHub Copilot coding agent: a software engineer benefits when verbose tool traces become short state summaries before the next step.
  • OpenAI ChatGPT Memory: a frequent user benefits when stable preferences are extracted from long conversations rather than every sentence being stored forever.

Pause and recall

  1. Why is compression a routing problem, not only a shortening problem?
  2. When should you prefer near-lossless extraction over lossy summarization?
  3. In the support example, which parts became summary-card, address-book, and diary-page?
  4. Why can over-compression hurt later retrieval quality?

Interview Q&A

Q: Why compress conversation state instead of simply storing all prior turns in a vector database? A: Retrieval still needs concise, high-signal state that can be placed directly in the prompt. Raw storage alone does not create a usable working memory.

Common wrong answer to avoid: "Because vector databases are expensive" — the real issue is not only storage cost, but live usability and precision.

Q: Why separate summaries from extracted facts? A: Summaries preserve narrative flow. Facts preserve stable assertions. Mixing both makes updates and deletion harder.

Common wrong answer to avoid: "Because summaries are unstructured" — summaries can be structured, but they still serve a different job from facts.

Q: Why can lossy compression be the correct choice? A: Many old details no longer affect the next decision. Keeping only the actionable gist improves prompt fit and speed.

Common wrong answer to avoid: "Because exact wording never matters" — wording matters for audits, policies, and commitments.

Q: Why attach source references to compressed memory? A: Source links let you audit, refresh, and dispute memory. Without them, false compressed facts become hard to unwind.

Common wrong answer to avoid: "Only for debugging logs" — source grounding also protects trust and deletion workflows.
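The deletion point can be made concrete. This is a minimal sketch, assuming every compressed record stores the set of turn ids it was built from; the store names mirror this chapter, while the record shape and `forget_turn` are hypothetical. With sources attached, forgetting a turn becomes a filter instead of a guess.

```python
# Hypothetical stores: every compressed record keeps the turn ids it came from.
memory = {
    "summary-card": [{"text": "goal: reply to refund request", "sources": {1, 6}}],
    "address-book": [{"text": "prefers bullet summaries", "sources": {3}}],
    "diary-page": [{"text": "case checked for late delivery", "sources": {2}}],
}


def forget_turn(stores: dict[str, list[dict]], turn_id: int) -> dict[str, list[dict]]:
    """Return new store lists without any record derived from turn_id.

    Dropping a whole summary line is blunt; a gentler system might regenerate
    the line from its remaining sources instead.
    """
    return {
        name: [r for r in records if turn_id not in r["sources"]]
        for name, records in stores.items()
    }


after = forget_turn(memory, 3)
print({name: len(records) for name, records in after.items()})
# → {'summary-card': 1, 'address-book': 0, 'diary-page': 1}
```

Without the `sources` field, the only safe answer to "forget what I said in turn 3" would be to rebuild every store from scratch.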

Apply now (5 min)

Exercise: Take a ten-message chat. Write a five-line summary-card. Then extract three durable facts and one diary-page event. Circle one sentence you deliberately threw away. Explain why it was safe to drop.

Sketch from memory: Draw the pipeline from raw chat to summary-card, address-book, diary-page, and filing-cabinet. Label where the cleanup-bell may fire.


Bridge. Compression keeps old meaning alive. Good. But once memory leaves the desk-note, how do we search it later at scale? → 05-long-term-vector-memory.md