06. Episodic Memory — Remember the event, not just the fact

~15 min read. Some questions are about time and sequence, not only meaning.

Built on the ELI5 in 00-eli5.md. The diary-page — the dated record of what happened — is how agents answer, "What happened last time?"


1) What episodic memory means

See the difference first.

semantic memory           episodic memory
┌───────────────────┐     ┌────────────────────────┐
│ user likes bullets│     │ on Tuesday, user asked │
│ team uses Python  │     │ for a bullet summary   │
└───────────────────┘     └────────────────────────┘
stable fact                event tied to time/context

Episodic memory stores experiences. Not just truths.

An episode usually includes:

  • what happened
  • when it happened
  • in which task or session
  • what outcome followed

This matters because many work questions are event-shaped.

"What did we try already?" "Which patch failed yesterday?" "How did the customer react last call?"

Those are diary-page questions. The filing-cabinet may hold the text. But the diary-page adds order and time.

That makes retrieval much sharper.

2) A good episode schema

A useful event record is compact. It still needs structure.

episode
├── actor: assistant / user / tool
├── action: ran migration dry-run
├── object: billing-service
├── time: 2026-02-12T10:40Z
├── outcome: failed on currency mismatch
└── source: session_84 turn_19

This is not academic purity. It helps real retrieval. You can now ask:

  • latest failure for billing-service
  • all events before the successful deploy
  • last user correction about tone
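
The schema and the first two queries can be sketched roughly like this. It is a minimal illustration, not a prescribed design. The `Episode` class, the `failed` flag, the helpers `latest_failure` and `events_before`, and the second sample record are assumptions added for illustration. Only the field names and the billing-service record come from the diagram above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Episode:
    actor: str            # "assistant", "user", or "tool"
    action: str
    object: str
    time: datetime
    outcome: str
    source: str           # pointer back to the raw transcript
    failed: bool = False  # assumption: an explicit flag is easier to query than free text


def latest_failure(episodes: list[Episode], obj: str) -> Optional[Episode]:
    """Return the most recent failed episode for one object, or None."""
    failures = [e for e in episodes if e.object == obj and e.failed]
    return max(failures, key=lambda e: e.time, default=None)


def events_before(episodes: list[Episode], cutoff: datetime) -> list[Episode]:
    """Return episodes strictly before a cutoff, oldest first."""
    return sorted((e for e in episodes if e.time < cutoff), key=lambda e: e.time)


log = [
    Episode("assistant", "ran migration dry-run", "billing-service",
            datetime(2026, 2, 12, 10, 40, tzinfo=timezone.utc),
            "failed on currency mismatch", "session_84 turn_19", failed=True),
    # Hypothetical second record, invented so the queries have something to separate.
    Episode("assistant", "deployed", "billing-service",
            datetime(2026, 2, 12, 11, 5, tzinfo=timezone.utc),
            "succeeded", "session_84 turn_31"),
]

hit = latest_failure(log, "billing-service")
print(hit.outcome if hit else "no failures recorded")
```

Because `time` is a real timestamp and `failed` is a real field, both queries are plain filters and sorts. No text search is needed.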

The diary-page should capture milestones. Not every tiny token. If you log every microscopic action, the diary becomes noise. If you log only final outcomes, you lose the path that explains them.

So what to do? Log state-changing events. Log errors. Log decisions. Log commitments. That is usually enough.
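
The logging rule above can be written as a small allow-list. This is only a sketch. The `EventKind` names, including the two noise kinds, are hypothetical labels chosen for illustration, and a real system would need its own way to classify events.

```python
from enum import Enum


class EventKind(Enum):
    STATE_CHANGE = "state_change"
    ERROR = "error"
    DECISION = "decision"
    COMMITMENT = "commitment"
    TOKEN_STREAM = "token_stream"  # hypothetical example of noise
    HEARTBEAT = "heartbeat"        # hypothetical example of noise


# The four kinds the text says are usually enough.
LOGGED_KINDS = frozenset({
    EventKind.STATE_CHANGE,
    EventKind.ERROR,
    EventKind.DECISION,
    EventKind.COMMITMENT,
})


def should_log(kind: EventKind) -> bool:
    """Return True only for events worth a diary-page."""
    return kind in LOGGED_KINDS


print([k.value for k in EventKind if should_log(k)])
```

An allow-list keeps the default safe: a new, unclassified kind of event is not logged until someone decides it matters.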


3) Why episodic memory beats generic history for certain tasks

Raw history is linear text. Episodic memory is queryable experience. Look at the flow.

chat turn ──→ detect state change ──→ write diary-page
                          ├── timestamp
                          ├── task id
                          ├── outcome
                          └── link to raw source

Now suppose the user asks, "Why did the previous rollout stop?" A sliding conversation window may miss it.

A semantic retriever may return vague related text. A diary-page query can say: "Most recent rollout stopped at 10:40Z because currency mismatch validation failed in billing-service dry-run."

That answer is event-aware. It preserves sequence. It preserves causality better.

Simple, no?
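
The write path in the diagram can be sketched as below. Treat it as an illustration under stated assumptions. `DiaryPage`, `maybe_write`, `most_recent`, the task id `"rollout"`, and the keyword list are all hypothetical. The keyword check is a crude stand-in for real state-change detection, which the text does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DiaryPage:
    timestamp: datetime
    task_id: str
    outcome: str
    source: str  # link to the raw turn, so the summary can be checked


# Assumption: a keyword match stands in for a real state-change detector.
STATE_CHANGE_MARKERS = ("failed", "succeeded", "stopped", "deployed", "rolled back")


def maybe_write(turn_text: str, task_id: str, source: str,
                now: datetime, store: list[DiaryPage]) -> Optional[DiaryPage]:
    """Append a diary-page only when the turn reports a state change."""
    lowered = turn_text.lower()
    if not any(marker in lowered for marker in STATE_CHANGE_MARKERS):
        return None
    page = DiaryPage(timestamp=now, task_id=task_id, outcome=turn_text, source=source)
    store.append(page)
    return page


def most_recent(store: list[DiaryPage], task_id: str) -> Optional[DiaryPage]:
    """Return the latest diary-page for a task, or None."""
    pages = [p for p in store if p.task_id == task_id]
    return max(pages, key=lambda p: p.timestamp, default=None)


store: list[DiaryPage] = []
when = datetime(2026, 2, 12, 10, 40, tzinfo=timezone.utc)
maybe_write("Thanks, looks good.", "rollout", "session_84 turn_18", when, store)
maybe_write("Rollout stopped: currency mismatch validation failed in "
            "billing-service dry-run", "rollout", "session_84 turn_19", when, store)

last = most_recent(store, "rollout")
print(last.outcome if last else "no record")
```

The chatty turn writes nothing. The turn that reports a stop writes one page with its timestamp, task id, outcome, and source link attached.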

4) Worked example: yesterday's debugging trail

Suppose an agent helped with an outage.

Episode 1:
  • 09:00 alert fired for checkout latency

Episode 2:
  • 09:08 assistant suggested cache flush

Episode 3:
  • 09:10 tool showed cache hit rate was normal

Episode 4:
  • 09:15 assistant shifted hypothesis to DB connection pool

Episode 5:
  • 09:22 pool limit increased in staging

Episode 6:
  • 09:28 latency dropped

Today the user asks, "What fixed it last time?"

If you only stored semantic facts, you may recall, "Latency issue involved checkout and staging." Too fuzzy.

If you stored diary-pages, you can answer, "The successful intervention was increasing the DB connection pool in staging at 09:22, after cache flush was ruled out by normal hit-rate data."

See the power.

The diary-page gives sequence and failed attempts. That is what users often want.
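
Here is one way the six episodes could be stored and queried. It is a sketch, not the only approach. The `kind` labels and the `what_fixed_it` helper are assumptions added for illustration. The helper treats the last change before the resolution as the fix, which is an ordering heuristic and not proof of cause.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass(frozen=True)
class Episode:
    at: time
    event: str
    kind: str  # assumption: "alert", "hypothesis", "evidence", "change", or "resolved"


trail = [
    Episode(time(9, 0), "alert fired for checkout latency", "alert"),
    Episode(time(9, 8), "assistant suggested cache flush", "hypothesis"),
    Episode(time(9, 10), "tool showed cache hit rate was normal", "evidence"),
    Episode(time(9, 15), "assistant shifted hypothesis to DB connection pool", "hypothesis"),
    Episode(time(9, 22), "pool limit increased in staging", "change"),
    Episode(time(9, 28), "latency dropped", "resolved"),
]


def what_fixed_it(episodes: list[Episode]) -> tuple[Optional[Episode], list[Episode]]:
    """Return the last change before the resolution, plus the path that led to it."""
    ordered = sorted(episodes, key=lambda e: e.at)
    resolved_at = next((e.at for e in ordered if e.kind == "resolved"), None)
    if resolved_at is None:
        return None, ordered
    changes = [e for e in ordered if e.kind == "change" and e.at < resolved_at]
    if not changes:
        return None, ordered
    fix = changes[-1]
    return fix, [e for e in ordered if e.at < fix.at]


fix, path = what_fixed_it(trail)
if fix is not None:
    print(f"Fix at {fix.at:%H:%M}: {fix.event}")
for step in path:
    print(f"  earlier, {step.at:%H:%M}: {step.event}")
```

The returned path still contains the cache-flush suggestion and the hit-rate evidence, so the answer can say what was ruled out as well as what worked.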


5) Design rules and limits

Do not turn episodic memory into a full surveillance log.

Store useful events, not life exhaust. Attach time, actor, and outcome. Keep source references.

Allow expiration when episodes are low value. Use the cleanup-bell for sensitive events. And remember this.

An episode is sometimes a model-generated representation, not a direct recording. The event may be mis-summarized. So high-stakes systems should keep raw supporting logs too.

Diary-page plus raw evidence is a strong pair. Diary-page alone can mislead. Look.

Memory quality improves when the event store is boring and explicit.
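
The expiration rule can be sketched as a simple filter. This is a minimal example under stated assumptions. The `low_value` flag, the 30-day default, the `expire` helper, and the second sample record are hypothetical. Handling of sensitive events is left out because it depends on how the cleanup-bell is defined elsewhere.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Episode:
    time: datetime
    actor: str
    outcome: str
    source: str              # reference to the raw supporting log
    low_value: bool = False  # assumption: set by whatever scores episodes


def expire(episodes: list[Episode], now: datetime,
           ttl: timedelta = timedelta(days=30)) -> list[Episode]:
    """Keep every episode except low-value ones older than the TTL."""
    return [e for e in episodes if not (e.low_value and now - e.time > ttl)]


now = datetime(2026, 3, 20, tzinfo=timezone.utc)
episodes = [
    Episode(datetime(2026, 2, 12, 10, 40, tzinfo=timezone.utc), "assistant",
            "failed on currency mismatch", "session_84 turn_19"),
    # Hypothetical low-value record, invented for the example.
    Episode(datetime(2026, 2, 1, 8, 0, tzinfo=timezone.utc), "user",
            "opened dashboard", "session_80 turn_2", low_value=True),
]

kept = expire(episodes, now)
print([e.outcome for e in kept])
```

Every kept episode still carries its `source`, so the summary can be checked against the raw log when the stakes are high.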

Where this lives in the wild

  • PagerDuty incident copilots: SREs need timestamped event recall so the assistant can summarize which mitigation steps already happened.
  • GitHub Actions troubleshooting assistants: developers benefit from episode logs that remember which fix attempts failed in previous runs.
  • Sales call copilots: account executives need last-meeting events, objections, and commitments tied to dates.
  • Clinical documentation assistants: doctors rely on episode timelines such as symptoms, interventions, and outcomes across visits.
  • Warehouse robotics supervisors: operators need event memory for stoppages, resets, and successful recovery actions.

Pause and recall

  1. What distinguishes a diary-page from a semantic fact?
  2. Which event fields make "what happened last time" queries easier?
  3. Why is raw chat history weaker than episodic memory for failure analysis?
  4. In the outage example, which failed attempt should be preserved and why?

Interview Q&A

Q: Why keep episodic memory when you already store conversation transcripts?
A: Transcripts are hard to query by event semantics and time. Episodic memory turns experience into structured, retrievable milestones.

Common wrong answer to avoid: "Because transcripts are too large" — size matters, but the bigger gain is event-oriented retrieval.

Q: Why not collapse episodes directly into semantic facts?
A: Facts erase sequence. Many operational questions depend on order, failed attempts, and outcomes over time.

Common wrong answer to avoid: "Because facts are unstructured" — facts may be structured, but they still lose temporal shape.

Q: Why should episodes capture failed actions, not only successful ones?
A: Knowing what already failed prevents repetition and improves diagnosis. Failure history is often more informative than the final fix.

Common wrong answer to avoid: "Only for auditing" — it directly improves future decision quality.

Q: Why can episodic memory create privacy risk faster than semantic memory?
A: Episodes often contain sensitive actions, timestamps, and context. Rich timelines reveal behaviour patterns.

Common wrong answer to avoid: "Because episode stores are larger" — the deeper risk is behavioural detail, not only size.

Apply now (5 min)

Exercise: Take one recent task. Write four diary-page events for it. Each must have action, time, and outcome. Then ask yourself one question that only the timeline can answer.

Sketch from memory: Draw semantic memory on one side and diary-page memory on the other. Write one example query under each.


Bridge. Episodes tell us what happened. Good. But products also need stable truths about the user that outlive any one event. → 07-semantic-memory.md