Agent Memory Systems - Interview Questions
The "your agent forgets between sessions / has stale facts / leaks user data across sessions" round. Different from the conversation-memory question in ai-system-design.md (system-design framing) — this file is the mechanism-level interview: short-term vs long-term, episodic vs semantic vs procedural, write/read paths, consolidation, eviction, staleness, privacy.
The senior tell is naming three memory layers explicitly (short-term working context, episodic per-session summary, semantic per-user facts), giving each one a read trigger and a write trigger, and stating a staleness-handling rule for the long-term layer.
Foundations
Q: "What types of memory does an LLM agent need?"
Tags: mid · very-common · conceptual · source: AEM Institute 25 Advanced Agentic AI Questions 2026; A Practical Guide to Memory for LLM Agents 2026 (Towards Data Science)
Answer outline:
- Three core layers plus an optional fourth, each with a distinct write/read pattern:
  - Short-term (working memory): the current conversation context, meaning recent turns, intermediate scratchpad, and tool results in flight. Lives in the LLM's context window. Cleared when the session ends or the context fills.
  - Episodic memory: structured records of what happened in past sessions. Conversation summaries, completed tasks, sequences of events. Retrievable; preserves temporal flow.
  - Semantic memory: factual knowledge about the user or domain, such as preferences, profile, and learned facts. Retrievable; updated when facts change.
  - Procedural memory (sometimes called the fourth type): learned how-to patterns, such as tool-use habits, workflow shortcuts, and common operations the user prefers. Stored as instructions or example trajectories; influences future planning.
- Why this matters in 2026: long-lived agents (assistants, copilots) become useful only when they remember context across sessions. Without persistent memory, every session restarts from zero, which is bad UX.
- Implementation: short-term in the model's context window (no extra infra); episodic + semantic in an external store (vector DB + key-value DB combo); procedural often emerges from few-shot examples or stored playbooks.
- Numbers to drop: "three memory layers minimum in production: short-term, episodic, semantic", "procedural memory rare in 2026 but growing", "per-user memory budget: 100-2000 stored items typical"
Common follow-ups: - "Walk me through write triggers for each layer." - "What's the difference between episodic and semantic with a concrete example?"
Traps: - Treating "memory" as one thing. The layers have different lifecycles. - Storing everything as raw conversation history. Doesn't scale; PII liability.
Related cross-cutting: Architecture choices
Related module: learning/01_ai_engineering/11_long_term_memory_state/
Q: "Explain the difference between episodic memory and semantic memory in the context of an AI agent. How would you implement each?"
Tags: senior · very-common · conceptual · source: AEM Institute 25 Advanced Agentic AI Questions 2026
Answer outline:
- Episodic memory = a record of events with temporal context. "On 2026-05-10, the user asked about their refund and we processed it. The conversation lasted 8 turns." Preserves narrative.
- Semantic memory = abstracted facts. "The user is on the Pro plan. The user prefers email over SMS. The user's company is X." Atomic, retrievable, no inherent timestamp (a write timestamp is still worth storing for staleness handling and audit).
- Implementation:
  - Episodic: store per-session summaries (LLM-generated at session end), plus key events with timestamps. Stored in a session log (Postgres / Mongo) keyed by user_id + session_id, often also embedded for retrieval.
  - Semantic: key-value or knowledge-graph store, keyed by (user_id, attribute). Updates overwrite. Embed values for similarity retrieval over facts.
- Retrieval at the start of a new session: pull (a) the most recent episodic summary (for continuity), (b) semantic facts matching the current query intent.
- The interplay: facts are abstracted from episodes. After several conversations where the user mentions liking concise answers, a semantic fact "user prefers concise responses" emerges. This consolidation step is the trickiest part; see the consolidation question.
- Numbers to drop: "episodic store: 1 summary per session, 200-500 tokens each", "semantic store: 50-500 facts per user typical", "retrieval at session start: 1 episodic summary + top-3 semantic facts by relevance"
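The different update semantics of the two stores can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: `EpisodicLog` and `SemanticStore` are hypothetical names, and plain dicts stand in for the Postgres / key-value backends named in the outline.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class EpisodicLog:
    """Append-only session summaries keyed by (user_id, session_id)."""
    rows: dict = field(default_factory=dict)

    def append(self, user_id: str, session_id: str, summary: str, ended_at: datetime) -> None:
        self.rows[(user_id, session_id)] = {"summary": summary, "ended_at": ended_at}

    def most_recent(self, user_id: str, n: int = 1) -> list:
        # Temporal order is the access pattern: newest sessions first.
        mine = [row for (uid, _), row in self.rows.items() if uid == user_id]
        mine.sort(key=lambda row: row["ended_at"], reverse=True)
        return [row["summary"] for row in mine[:n]]


@dataclass
class SemanticStore:
    """Atomic facts keyed by (user_id, attribute); a newer write overwrites."""
    facts: dict = field(default_factory=dict)

    def upsert(self, user_id: str, attribute: str, value: str) -> None:
        self.facts[(user_id, attribute)] = value

    def get(self, user_id: str, attribute: str) -> Optional[str]:
        return self.facts.get((user_id, attribute))
```

Episodes accumulate and are read by recency; facts are overwritten and read by key, which is why the outline keeps them in separate stores.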
Common follow-ups: - "How do you consolidate episodes into facts?" - "What if episodic and semantic memory disagree?"
Traps: - Lumping episodes and facts in one store. Different access patterns; different update semantics. - Storing raw transcripts forever. PII liability.
Related cross-cutting: Architecture choices
Related module: learning/01_ai_engineering/11_long_term_memory_state/
Q: "What's procedural memory and when do you need it?"
Tags: staff · occasional · conceptual · source: MachineLearningMastery 3 Types of Long-term Memory 2026; emerging 2026 frontier topic
Answer outline:
- Procedural memory = learned how-to patterns. Workflow shortcuts, common operations, tool-use sequences the user has come to expect.
- Examples:
  - Coding agent learns "this user always wants Black formatting and pytest tests" and applies it without being asked.
  - Customer-support agent learns "for tenant X, always check the SLA tier before quoting response time".
  - Calendar assistant learns "user prefers 25-min meetings with 5-min buffers".
- Implementation: extracted patterns stored as instructions injected into the system prompt at the start of a session, or as few-shot examples retrieved by similarity.
- Update: emerges from feedback signals (user accepts a pattern repeatedly → promote to procedural). Adjacent to fine-tuning territory but cheaper to iterate.
- 2026 reality: most production agents don't have explicit procedural memory yet. Episodic and semantic dominate. Procedural is the emerging frontier.
- Risk: procedural patterns can ossify. If the user's preference changes, the agent should detect and update, using the same staleness handling as semantic memory.
- Numbers to drop: "procedural patterns per user: 5-30 typical when implemented", "promotion threshold: 3-5 consistent instances of the same pattern"
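The promotion rule above can be sketched as a streak counter. This is an illustrative sketch only: `ProceduralMemory` is a hypothetical class, the threshold of 3 is taken from the low end of the outline's "3-5 consistent instances", and demoting on a single rejection is an assumed policy chosen to keep patterns from ossifying.

```python
from collections import defaultdict

PROMOTION_THRESHOLD = 3  # low end of the "3-5 consistent instances" range


class ProceduralMemory:
    def __init__(self, threshold: int = PROMOTION_THRESHOLD):
        self.threshold = threshold
        self._streak = defaultdict(int)    # (user_id, pattern) -> consecutive accepts
        self._promoted = defaultdict(set)  # user_id -> promoted patterns

    def record_feedback(self, user_id: str, pattern: str, accepted: bool) -> None:
        key = (user_id, pattern)
        if accepted:
            self._streak[key] += 1
            if self._streak[key] >= self.threshold:
                self._promoted[user_id].add(pattern)
        else:
            # Assumed policy: one rejection resets the streak and demotes.
            self._streak[key] = 0
            self._promoted[user_id].discard(pattern)

    def system_prompt_block(self, user_id: str) -> str:
        """Render promoted patterns as instructions for the system prompt."""
        return "\n".join(f"- {p}" for p in sorted(self._promoted[user_id]))
```

A real system would likely weigh rejections less bluntly (for example, requiring two in a row), but the shape stays the same: count consistent signals, promote past a threshold, and keep a path back down.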
Common follow-ups: - "How is this different from fine-tuning?" - "How do you avoid procedural drift?"
Traps: - Treating procedural as future tech without naming a concrete production example.
Related cross-cutting: Architecture choices
Related module: learning/01_ai_engineering/11_long_term_memory_state/
Write & consolidation
Q: "Walk me through the write path for agent memory."
Tags: senior · common · design · source: standard senior memory-design probe; 2026 AI engineer loops
Answer outline:
- Three write events:
  - Session end: a summarization LLM call produces an episodic summary (~200-500 tokens) capturing key facts, decisions, action items, and conversation tone. Stored to the episodic log.
  - Fact extraction: in parallel or right after the episodic summary, a fact-extractor LLM call examines the transcript and emits structured facts: [{"subject": "user", "attribute": "preferred_format", "value": "concise", "confidence": 0.8, "source_turn": 14}]. New / updated facts written to the semantic store.
  - Explicit user input: when the user says "remember that I'm vegetarian", that's a high-confidence write to semantic memory. Skip the extractor; trust the explicit signal.
- Conflict resolution: when a new fact contradicts an existing one (user said vegan, now says vegetarian), the newer wins by default, but with logging. For low-confidence facts, prefer not to update; for high-confidence, update with a timestamped audit trail.
- Async vs sync: writes happen at session end, on a worker, not in the user-facing path. The user shouldn't wait for memory consolidation.
- Privacy: PII redaction during extraction; user-scoped storage; encryption at rest. Explicit user controls: view, export, delete, opt-out.
- Numbers to drop: "session-end summarization: 1 LLM call, ~500 tokens output", "fact extraction: 1 LLM call, ~5-20 facts typical", "writes async; not in user-facing critical path"
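The conflict-resolution rule in this outline (newer wins, low-confidence contradictions are ignored, every change is audited) might look like the sketch below. `Fact`, `write_fact`, and the 0.8 update threshold are illustrative assumptions; the store and audit log are plain Python containers standing in for real backends.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

UPDATE_THRESHOLD = 0.8  # assumed cutoff for overwriting an existing fact


@dataclass
class Fact:
    attribute: str
    value: str
    confidence: float
    source_turn: Optional[int] = None


def write_fact(store: dict, audit: list, user_id: str, new: Fact, explicit: bool = False) -> bool:
    """Apply one semantic-memory write; return True if the store changed."""
    key = (user_id, new.attribute)
    old = store.get(key)
    if old is not None and old.value == new.value:
        return False  # same value already stored
    if old is not None and not explicit and new.confidence < UPDATE_THRESHOLD:
        return False  # low-confidence contradiction: keep the existing fact
    store[key] = new
    # Timestamped audit record so the change can be traced later.
    audit.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "attribute": new.attribute,
        "old": old.value if old else None,
        "new": new.value,
        "confidence": new.confidence,
        "source_turn": new.source_turn,
        "explicit": explicit,
    })
    return True
```

Explicit user statements bypass the threshold, matching the "trust the explicit signal" rule; in practice this function would run on the async worker, not in the user-facing path.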
Common follow-ups: - "What if the summarizer hallucinates a fact?" - "How do you handle the privacy case where the user wants to delete?"
Traps: - Synchronous writes that block the user. - No conflict resolution policy. New facts silently overwrite without audit.
Related cross-cutting: Production patterns
Related module: learning/01_ai_engineering/11_long_term_memory_state/
Q: "How do you consolidate episodes into semantic facts without hallucinating?"
Tags: staff · common · design · source: State of AI Agent Memory 2026 (mem0); standard staff-tier memory probe
Answer outline:
- Consolidation is the fragile step: an LLM reads the transcript and emits "user is vegetarian" — but the user might have said "I'm thinking about going vegetarian", or it was the LLM hallucinating context.
- Mitigations:
  - Structured extraction prompt: explicit schema with confidence and source_quote per fact. The model must point to the transcript span supporting the claim.
  - NLI verification: a second model entailment-checks each extracted fact against the cited transcript span. Drops claims not directly entailed.
  - Confidence thresholds: low-confidence facts (<0.7) flagged for review or skipped.
  - Spot-check sampling: human review of 1-5% of extractions; calibrate the extractor against humans.
  - Updates require evidence: don't overwrite an existing fact unless the new fact has high confidence + clear contradiction in the transcript.
- Anti-pattern: free-form "summarize what you learned about the user" prompts. Produces fluent fabrications.
- Eval: a labeled set of (transcript, ground-truth-facts) pairs. Measure extraction precision (don't hallucinate facts) and recall (don't miss real facts). Precision matters more here — wrong facts compound across sessions.
- Numbers to drop: "extraction precision target: 95%+ (hallucinated facts < 5%)", "confidence threshold for write: 0.7-0.8", "human spot-check 1-5% of extractions"
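A cheap first gate before any NLI model is to check that each extracted fact's `source_quote` really occurs in the transcript and that its confidence clears the write threshold. The sketch below shows only that gate; `filter_extracted_facts` is a hypothetical helper, and a verbatim-quote match is an assumption that stands in for, and does not replace, the entailment check described above.

```python
import re

CONFIDENCE_THRESHOLD = 0.7  # from the outline's 0.7-0.8 write threshold


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_extracted_facts(transcript: str, facts: list) -> list:
    """Keep facts that cite text present in the transcript and clear the threshold."""
    haystack = _normalize(transcript)
    kept = []
    for fact in facts:
        quote = _normalize(fact.get("source_quote", ""))
        if not quote or quote not in haystack:
            continue  # ungrounded: the cited span is not in the transcript
        if fact.get("confidence", 0.0) < CONFIDENCE_THRESHOLD:
            continue  # grounded but too uncertain to write
        kept.append(fact)
    return kept
```

Note what this does not catch: a fact "user is vegetarian" citing "I'm thinking about going vegetarian" passes the quote check, so the entailment step is still needed for claims that over-read a real span.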
Common follow-ups: - "What if NLI itself is wrong?" - "How do you detect a contradicting fact that's actually correct?"
Traps: - Free-form summarization that produces facts not in the source.
Related cross-cutting: Production patterns
Related module: learning/01_ai_engineering/11_long_term_memory_state/, learning/03_ai_security_safety/00_safety_guardrail_design/
Read & retrieval
Q: "How do you retrieve memory at the start of a session?"
Tags: senior · common · design · source: standard senior memory-design probe; 2026 AI engineer loops
Answer outline:
- Two retrieval calls happen before the user's first message gets to the LLM:
  - Episodic recency: pull the most recent N session summaries (often N=1-3). Gives the model "we last talked about X yesterday" context.
  - Semantic relevance: embed an early signal (the user's first message, or just the user's identity context), pull top-K most-relevant semantic facts. Filters out irrelevant facts so context isn't polluted.
- Inject into the system prompt: a structured block like
<user_memory>
Recent session summary: ...
Known facts: [list of relevant fact strings]
</user_memory>
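Building that block could look like the following sketch. The word-overlap score is a deliberately crude stand-in for embedding similarity, and `overlap_score` and `build_memory_block` are hypothetical names; only the block layout comes from the example above.

```python
import re


def _words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, fact: str) -> float:
    """Toy relevance: Jaccard overlap of word sets (stand-in for embeddings)."""
    q, f = _words(query), _words(fact)
    union = q | f
    return len(q & f) / len(union) if union else 0.0


def build_memory_block(summaries: list, facts: list, first_message: str, k: int = 3) -> str:
    """Combine the newest summary with the top-k relevant facts."""
    scored = [(overlap_score(first_message, fact), fact) for fact in facts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = [fact for score, fact in scored[:k] if score > 0]  # drop irrelevant facts
    return "\n".join([
        "<user_memory>",
        "Recent session summary: " + (summaries[0] if summaries else "none"),
        "Known facts: " + "; ".join(top),
        "</user_memory>",
    ])
```

`summaries` is assumed to be ordered newest first. When the first message is too short to score meaningfully, a fallback such as pinned high-importance facts would be needed; that case is not handled here.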
Common follow-ups: - "What if the user's first message is too short to embed meaningfully?" - "How do you avoid wasting tokens on irrelevant facts?"
Traps: - Dumping all facts on every turn. Wastes context. - No re-retrieval on topic shift.
Related cross-cutting: Cost & latency
Related module: learning/01_ai_engineering/11_long_term_memory_state/
Q: "How does memory retrieval interact with conversation context?"
Tags: senior · common · conceptual · source: standard senior memory-architecture probe; 2026 AI engineer loops
Answer outline:
- Three sources of context flow into the LLM prompt: (1) the static system prompt, (2) retrieved memory (per-user facts + episodic summary), (3) the in-session conversation history.
- Order matters. Typical assembly:
  - System prompt (cacheable; static across sessions).
  - User memory block (cacheable for a session; changes per user but rarely mid-session).
  - In-session conversation history (changes every turn).
  - Current user message.
- Caching: provider-side prompt cache helps most when the prefix is stable. Put memory after the static system prompt but before the conversation history.
- Token budget: memory shouldn't dominate. Soft cap at 10-20% of context window for memory; rest for conversation + tools + answer.
- As conversation grows, you may need to compact older turns. Strategy: summarize the first N turns into a "conversation so far" block; keep the most recent K turns verbatim.
- Lost-in-the-middle (see llm-fundamentals.md): memory is usually placed in the prefix where attention is strongest. Important facts go first.
- Numbers to drop: "memory block: 10-20% of context", "in-session conversation: bounded; compact older turns past 10-20 turns"
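The assembly order and the memory budget can be sketched together. Everything here is illustrative: `assemble_prompt` is a hypothetical function, whitespace word counts stand in for a real tokenizer, and the older-turns placeholder marks where an LLM-written "conversation so far" summary would go.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: whitespace-separated words."""
    return len(text.split())


def assemble_prompt(system: str, memory_facts: list, history: list, user_message: str,
                    context_window: int = 1000, memory_percent: int = 15,
                    keep_recent: int = 4) -> list:
    # Memory: add facts in priority order until the soft cap is reached.
    budget = context_window * memory_percent // 100
    kept, used = [], 0
    for fact in memory_facts:
        cost = count_tokens(fact)
        if used + cost > budget:
            break
        kept.append(fact)
        used += cost

    # History: keep the last K turns verbatim, compact everything older.
    split = max(len(history) - keep_recent, 0)
    older, recent = history[:split], history[split:]

    # Stable prefix first (system, then memory), volatile parts last.
    parts = [system, "<user_memory>\n" + "\n".join(kept) + "\n</user_memory>"]
    if older:
        parts.append(f"[Conversation so far: {len(older)} earlier turns summarized]")
    parts.extend(recent)
    parts.append(user_message)
    return parts
```

Because `memory_facts` is consumed in order, the caller decides priority; putting the most important facts first also matches the lost-in-the-middle advice.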
Common follow-ups: - "What happens when memory plus conversation exceeds the context window?" - "Does the order of facts matter?"
Traps: - Putting memory at the end of context. Less attention; possibly truncated first when context fills.
Related cross-cutting: Cost & latency
Related module: learning/01_ai_engineering/11_long_term_memory_state/
Staleness, eviction, conflict
Q: "An agent is using a vector database for long-term memory. How do you manage memory staleness? What if a user's preferences change?"
Tags: senior · very-common · scenario · source: AEM Institute 25 Advanced Agentic AI Questions 2026
Answer outline:
- Memory items go stale in two ways:
  - Facts change: the user used to live in Bangalore and is now in Mumbai. The old fact must be overwritten.
  - Facts become irrelevant: a year-old preference about a feature that no longer exists.
- Strategies:
  - TTL by category: time-sensitive facts (current employer, current address) have short TTL; identity-level facts (name, birthday) effectively don't expire.
  - Conflict detection: when extracting new facts, check against existing ones; on explicit contradiction, update with the new value and log the change with timestamp + source turn.
  - Confidence decay: each fact's confidence decays over time; below a threshold, re-confirm with the user or drop.
  - Explicit refresh prompts: occasionally ask "we have you down as preferring X, still right?", especially for high-stakes facts (allergies, dietary restrictions in a food-ordering agent).
  - User-driven update: the user says "I no longer work at X" → high-confidence write, overwrites instantly.
- Eviction policy when the memory store grows: LRU (least recently accessed) or LFU (least frequently accessed) for non-critical facts. Pinned critical facts are never evicted.
- Audit trail: every update logged. Lets you debug "the agent thinks I live in Bangalore": when did it learn that, from what conversation, and why hasn't it updated.
- Numbers to drop: "TTL by category: identity-fact infinite, preference-fact 6-12 months", "confidence decay: linear over months", "evict at 1000-5000 facts per user typical"
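TTL-by-category and linear confidence decay combine into a single status check per fact. The sketch below assumes specific numbers that the outline does not fix: a 365-day TTL for preferences, 90 days for time-sensitive facts, a decay of 0.05 per 30 days, and a re-confirm threshold of 0.5. `fact_status` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumed TTLs in days; None means the category never expires.
TTL_DAYS = {"identity": None, "preference": 365, "time_sensitive": 90}


def fact_status(category: str, confidence: float, written_at: datetime, now: datetime,
                decay_per_month: float = 0.05, threshold: float = 0.5) -> str:
    """Return 'fresh', 'reconfirm', or 'expired' for one stored fact."""
    age_days = (now - written_at).days
    ttl = TTL_DAYS[category]
    if ttl is None:
        return "fresh"  # identity-level facts effectively don't expire
    if age_days > ttl:
        return "expired"
    # Linear decay: confidence drops by a fixed amount per 30 days of age.
    decayed = confidence - decay_per_month * (age_days / 30)
    return "fresh" if decayed >= threshold else "reconfirm"
```

A "reconfirm" result is where the explicit refresh prompt fits; "expired" facts can be dropped or demoted out of retrieval.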
Common follow-ups: - "What happens if the conflict detection is wrong?" - "How do you handle a user who corrects an old fact?"
Traps: - No update policy. The agent confidently uses stale facts forever. - No audit trail. Can't debug memory bugs.
Related cross-cutting: Production patterns
Related module: learning/01_ai_engineering/11_long_term_memory_state/
Q: "What's the eviction policy when your per-user memory store fills?"
Tags: senior · common · design · source: standard senior memory-management probe; 2026 AI engineer loops
Answer outline:
- Memory should be bounded. Unbounded growth blows up storage cost and degrades retrieval performance.
- Eviction strategies:
  - LRU (least recently retrieved): facts not retrieved in N months get evicted. Simple, works well.
  - LFU (least frequently retrieved): facts rarely matched by queries get evicted.
  - Tiered importance: critical facts (user-explicit, identity, allergies) never evict; preference facts evict after disuse; ambient facts (mentioned once, never relevant again) evict aggressively.
  - Time-windowed: episodic summaries roll up. Keep the last N raw summaries and compress older ones into a "history of last quarter" meta-summary.
- Soft cap before hard eviction: when approaching the budget, raise the bar for writing new facts (for example, require higher extraction confidence).
- User-visible controls: let the user see their stored memory and explicitly delete items. Required for GDPR / similar regulations.
- The honest answer: most production agents in 2026 don't hit memory eviction yet because users are still mostly single-digit-session. As long-lived agents become common, eviction policy becomes more visible.
- Numbers to drop: "soft cap per user: 500-2000 facts", "hard cap: 5000 facts", "episodic raw retention: last 10-30 sessions; older compressed"
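Tiered importance combined with LRU can be sketched as a sort over eviction candidates. The tier names follow the outline; `select_evictions` and the dict-based fact shape are assumptions made for illustration.

```python
from datetime import datetime

# Lower number = evicted sooner. Critical facts are never candidates.
TIER_ORDER = {"ambient": 0, "preference": 1}


def select_evictions(facts: list, cap: int) -> list:
    """Return ids to evict so at most `cap` facts remain, if possible.

    Each fact is a dict with 'id', 'tier', and 'last_retrieved' (datetime).
    """
    overflow = len(facts) - cap
    if overflow <= 0:
        return []
    candidates = [f for f in facts if f["tier"] != "critical"]
    # Ambient before preference; within a tier, least recently retrieved first.
    candidates.sort(key=lambda f: (TIER_ORDER[f["tier"]], f["last_retrieved"]))
    return [f["id"] for f in candidates[:overflow]]
```

If critical facts alone exceed the cap, this returns fewer ids than the overflow rather than evicting them, which is the intended behavior for pinned facts.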
Common follow-ups: - "How does this interact with GDPR right-to-be-forgotten?" - "What if evicting a fact later proves wrong?"
Traps: - No eviction. Costs and retrieval performance degrade. - Aggressive eviction of critical facts.
Related cross-cutting: Production patterns
Related module: learning/01_ai_engineering/11_long_term_memory_state/
Architecture choices
Q: "What are the trade-offs of using a pure LLM's context window as the primary memory store versus an external vector database?"
Tags: senior · very-common · scenario · source: AEM Institute 25 Advanced Agentic AI Questions 2026
Answer outline:
- Context-window-only (long-context model with full history in context):
  - Pros: simple, no external infra, no staleness or conflict logic (it's all there), the model sees everything coherently.
  - Cons: cost scales linearly with conversation length, latency suffers on long contexts, hits hard limits past 100k-1M tokens, lost-in-the-middle bites, no persistence after the context fills, no cross-session continuity unless you stuff the full history into every turn (cost-prohibitive).
- External vector DB (memory in a retrieval system):
  - Pros: scales beyond the context window, persistent across sessions, low context cost (retrieve only what matters), can do TTL / eviction / staleness, structured update path.
  - Cons: retrieval is fallible (a relevant fact may not be retrieved), extra infra to operate, complexity around consolidation and conflict.
- Hybrid (the production answer):
  - Recent context (last 10-20 turns) in raw form in the prompt.
  - Older context summarized into the prompt's memory block.
  - Long-term semantic facts and episodic summaries in an external store, retrieved by relevance.
- For most production chat assistants in 2026: hybrid wins. Pure-context works for short-lived workflows; pure-vector-DB without a recent-context buffer feels disconnected.
- Numbers to drop: "1M-context model with full history: $5-50 per turn for an active user", "hybrid: $0.05-0.50 per turn typical", "roughly 100x cost difference at scale (comparing like-for-like ends of those two ranges)"
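The per-turn cost gap is simple arithmetic on prompt size. The sketch below uses a purely hypothetical price of $5 per million input tokens and an assumed 10k-token hybrid prompt; both numbers are illustrations chosen to land on the low ends of the ranges above, not provider quotes.

```python
def input_cost_usd(prompt_tokens: int, usd_per_million_tokens: float) -> float:
    """Input-side cost of one turn; output tokens are ignored for simplicity."""
    return prompt_tokens / 1_000_000 * usd_per_million_tokens


PRICE = 5.0  # hypothetical USD per million input tokens

full_history = input_cost_usd(1_000_000, PRICE)  # whole history in context
hybrid = input_cost_usd(10_000, PRICE)           # recent turns + memory block
ratio = full_history / hybrid

print(f"full history: ${full_history:.2f}/turn, hybrid: ${hybrid:.2f}/turn, ratio: {ratio:.0f}x")
```

The price cancels out of the ratio, so the gap is driven entirely by how many tokens each design sends per turn. Provider-side prompt caching would narrow it, which this sketch does not model.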
Common follow-ups: - "When would pure-context be enough?" - "What's the failure mode of vector-DB-only?"
Traps: - Recommending pure-context "because models have 1M context now". Cost makes this prohibitive at scale. - Recommending pure-vector-DB without recognizing the recent-context need.
Related cross-cutting: Cost & latency, Architecture choices
Related module: learning/01_ai_engineering/11_long_term_memory_state/
Q: "When would you use a knowledge graph instead of a vector DB for agent memory?"
Tags: staff · occasional · scenario · source: standard staff-tier memory-architecture probe; 2026 AI engineer loops
Answer outline:
- Vector DB: stores facts as embedded text. Retrieval by similarity. Good for fuzzy / paraphrase queries; weak on multi-hop reasoning ("who is X's manager's manager?").
- Knowledge graph (KG): explicit entities and relations. Retrieval by graph traversal. Good for multi-hop, structured queries, and answering questions that require connecting facts.
- For agent memory:
  - Vector DB wins for unstructured personal facts ("user likes Italian food", "user mentioned project X").
  - KG wins when memory represents structured relationships: corporate org charts, family relationships, entity-and-attribute domains.
  - Hybrid is common: vector DB for general facts, small KG for high-value structured relationships (e.g., a copilot for a sales team that needs to know account-contact relationships).
- Operations: KG is more complex (schema design, traversal queries); vector DB is simpler (embed and search).
- 2026 reality: most agent memory in production is vector-DB-based. KG-augmented memory is a frontier for specialized agents.
- Numbers to drop: "vector DB: default for most personal memory", "KG: only when structured relationships are core to the agent's task", "hybrid KG + vector: emerging pattern for enterprise agents"
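The multi-hop case is where graph traversal is easiest to see. This toy sketch stores edges in a dict and walks a chain of relations; the names and the `EDGES` data are made up, and a real KG would sit in a graph database with its own query language.

```python
from typing import Optional

# Hypothetical org-chart edges: (subject, relation) -> object
EDGES = {
    ("alice", "manager"): "bob",
    ("bob", "manager"): "carol",
}


def follow(entity: str, relations: list) -> Optional[str]:
    """Walk a chain of relations from `entity`; None if any hop is missing."""
    current = entity
    for relation in relations:
        current = EDGES.get((current, relation))
        if current is None:
            return None
    return current


# "Who is alice's manager's manager?" is two exact hops, not a similarity search.
skip_level = follow("alice", ["manager", "manager"])
```

A vector store would have to retrieve both "alice reports to bob" and "bob reports to carol" by similarity and leave the join to the model, which is the weakness the outline points at.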
Common follow-ups: - "Give a concrete example where KG memory wins." - "How do you keep the KG fresh?"
Traps: - Reaching for KG by default. Vector DB suffices for most personal memory.
Related cross-cutting: Architecture choices
Related module: learning/01_ai_engineering/11_long_term_memory_state/, learning/01_ai_engineering/10_knowledge_graph_retrieval/
Privacy & user controls
Q: "How do you handle PII and user privacy in agent memory?"
Tags: senior · very-common · design · source: standard senior privacy probe; 2026 regulated-industry loops
Answer outline:
- Memory holds the most PII-dense data in the agent stack: full conversation history, extracted personal facts, behavioral patterns.
- Required controls:
  - Per-user isolation: storage keyed by user_id; access controls enforce that one user's data is never returned in another user's session. Test with red-team queries.
  - Encryption at rest: managed DB encryption (disk- or storage-level encryption for Postgres, since community PostgreSQL does not ship built-in TDE; vector DB encryption modes); per-tenant key for enterprise.
  - Encryption in transit: TLS everywhere.
  - PII redaction at extraction: structured facts go through PII detection; explicit PII fields (SSN, payment, health data) either stored under stricter access controls or not at all.
  - User-visible controls: view stored memory, export, delete (per-fact or all-memory), opt out of memory storage entirely.
  - Retention policy: bounded retention per data category. Conversation transcripts: 30-90 days. Episodic summaries: longer. Identity facts: indefinite with opt-out.
  - Audit log: every memory access logged (who, when, for what request). Compliance auditors will ask.
- Regulations: GDPR right-to-be-forgotten requires the delete control to fully purge; CCPA / DPDPA equivalents. Map controls to specific regulations.
- For sensitive domains (health, finance, legal): consider not persisting memory at all, or strict opt-in with explicit consent flow.
- Numbers to drop: "audit log retention: 7 years for regulated industries, 90 days for general", "PII redaction recall target: 95%+", "delete latency from request to purge: <24h typical"
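Per-user isolation, redaction at write time, and access logging can be combined in one thin wrapper. The sketch below is illustrative only: `UserScopedMemory` is a hypothetical class, the single US-SSN regex is a placeholder for a real PII detector, and an in-memory dict stands in for an encrypted store.

```python
import re

# Placeholder PII rule: US SSN shape only. A real detector covers far more.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class UserScopedMemory:
    def __init__(self):
        self._facts_by_user = {}
        self.access_log = []

    def write(self, user_id: str, fact: str) -> None:
        # Redact before anything is persisted.
        redacted = SSN_PATTERN.sub("[REDACTED]", fact)
        self._facts_by_user.setdefault(user_id, []).append(redacted)

    def read(self, session_user_id: str, request_id: str) -> list:
        # Every access is logged; reads are scoped to the session's own user.
        self.access_log.append({"user_id": session_user_id, "request_id": request_id})
        return list(self._facts_by_user.get(session_user_id, []))

    def delete_all(self, user_id: str) -> int:
        """Hard-delete a user's facts; returns how many were removed."""
        return len(self._facts_by_user.pop(user_id, []))
```

The key design choice is that `read` takes no free-form user filter: the only data reachable is whatever belongs to the authenticated session user, which makes cross-user leaks a structural impossibility in this layer rather than a policy.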
Common follow-ups: - "What happens to a deleted fact in an LLM judge's training data?" - "How do you handle a tenant's data residency requirement?"
Traps: - Storing PII in memory without explicit redaction. - No user-controls for view/delete. Compliance fail.
Related cross-cutting: Production patterns
Related module: learning/01_ai_engineering/11_long_term_memory_state/, learning/03_ai_security_safety/00_safety_guardrail_design/
Q: "A user says 'delete everything you know about me'. What happens?"
Tags: senior · common · scenario · source: standard senior privacy probe; 2026 regulated loops
Answer outline:
- This is the GDPR right-to-be-forgotten case (Article 17, right to erasure) and similar regimes. The delete must be real, complete, and verifiable.
- Steps:
  - Authenticate the request: confirm it's actually the user.
  - Identify all storage layers holding the user's data: episodic memory, semantic memory, raw conversation logs, traces, backups, anywhere PII could land.
  - Issue delete to each layer. Synchronous for the primary stores; async with confirmation for backups.
  - Acknowledge to the user with a deletion confirmation including a timeline (most layers in seconds, full purge of backups within X days).
  - Log the request (without the PII) for audit purposes.
- Edge cases:
  - Aggregated data: usage stats that don't include PII can usually be retained.
  - LLM provider's logs: if you sent the user's data to OpenAI/Anthropic, the provider-side retention also needs a deletion request (Enterprise tier usually supports this).
  - Backups: most backup systems retain for 30-90 days. Either purge immediately (expensive) or wait for natural expiration with the user informed.
  - Audit logs: some regulations require retaining them even when the user requests deletion; redact PII from the logs while keeping the audit record.
- Test the delete pathway. Periodic red-team: request delete on a test user, verify nothing remains.
- Numbers to drop: "primary delete: seconds-to-minutes", "full backup purge: 30-90 days", "audit log: PII redacted, request record retained"
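The fan-out step can be sketched as a loop over registered storage layers that records a per-layer outcome. `delete_user` and the layer callables are hypothetical; a real pipeline would also queue retries for failed layers and handle backups asynchronously.

```python
def delete_user(user_id: str, layers: dict) -> dict:
    """Issue a delete to every layer and report what happened.

    `layers` maps a layer name to a callable(user_id) -> rows deleted.
    The report holds statuses and counts only, so it can be retained
    as the audit record without containing the user's data.
    """
    report = {}
    for name, delete_fn in layers.items():
        try:
            report[name] = {"status": "deleted", "rows": delete_fn(user_id)}
        except Exception as exc:  # one failing layer must not stop the others
            report[name] = {"status": "failed", "error": type(exc).__name__}
    complete = all(entry["status"] == "deleted" for entry in report.values())
    return {"complete": complete, "layers": report}
```

Keeping the layer registry explicit is the point: a store that is not in `layers` (traces, provider logs, backups) is exactly the one that gets forgotten, so the periodic red-team check should compare this registry against where data can actually land.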
Common follow-ups: - "What about data the user shared with the LLM provider?" - "How do you handle backups?"
Traps: - Soft delete only. Doesn't satisfy compliance. - Forgetting LLM provider logs.
Related cross-cutting: Production patterns
Related module: learning/01_ai_engineering/11_long_term_memory_state/, learning/03_ai_security_safety/00_safety_guardrail_design/
Production scenarios
Q: "Your agent confidently uses an outdated fact about a user. How do you debug?"
Tags: senior · common · debugging · source: standard senior memory-debug probe; 2026 AI engineer loops
Answer outline:
- Pull the audit trail for that user's memory. When was the fact written? From which session / turn? With what confidence?
- Compare to recent transcripts. Did the user actually contradict it in a recent session? If so, why didn't consolidation pick up the update?
- Common causes:
  - Low-confidence contradicting fact ignored: the extractor didn't write the update because confidence was too low; the old fact persists.
  - Semantic mismatch in retrieval: the new fact got written but isn't retrieved (different embedding cluster than the query).
  - Conflict resolution policy: the system kept the old fact because the new one didn't meet the update threshold.
- Fix paths:
  - Recalibrate the extractor's confidence scores if real updates are being missed.
  - Add explicit user-correction tooling ("update memory: user moved to Mumbai").
  - Increase retrieval breadth for recently written facts to ensure they surface.
- Add to regression suite: the offending transcript becomes an eval case for the memory pipeline.
- Numbers to drop: "memory audit log per fact: write timestamp, source, confidence, source turn; required for debugging", "regression case promotion: 5-20/week typical in mature systems"
Common follow-ups: - "What if the user never explicitly says the fact changed?" - "How do you proactively detect stale facts?"
Traps: - No audit trail. Can't debug; can't fix.
Related cross-cutting: Production patterns
Related module: learning/01_ai_engineering/11_long_term_memory_state/, learning/01_ai_engineering/03_agent_observability_debugging/
Q: "Walk me through designing memory for an enterprise copilot used across an organization."
Tags: staff · common · design · source: standard staff-tier memory-design probe; 2026 AI engineer loops
Answer outline:
- Layered scope:
  - Per-user memory: personal preferences, conversation history. Strict user-scoped storage.
  - Per-team memory: team conventions, jargon, shared workflows. Optional; opt-in by team.
  - Per-tenant memory: org-wide settings, policies, shared facts (org chart, product catalog). Accessible to all users in the tenant.
- Access control: each memory layer has its own permissions. Per-user is private; per-team is accessible to team members; per-tenant is accessible to all in the org.
- Retrieval at session start: pull per-user + per-team (if the user is in a team) + per-tenant facts relevant to the query.
- Writes:
  - Per-user: extracted from the user's conversations.
  - Per-team: extracted only when multiple users in the team share a pattern; or explicit team-admin curation.
  - Per-tenant: typically admin-curated, not auto-extracted (avoids one user's quirks becoming organizational truth).
- Privacy: extracted patterns should not leak personal info from one user into team/tenant memory. PII redaction before promotion.
- Audit: per-tenant memory access logged for compliance.
- Versioning: per-tenant memory changes (policy updates, jargon evolution) versioned; rollback possible.
- Numbers to drop: "per-user typical 100-1000 facts", "per-team 50-500 patterns", "per-tenant 100-10000 facts (corporate KB)", "promotion threshold: pattern must appear in 3+ users before considered for team-level"
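The "3+ users" promotion rule reduces to counting distinct users per pattern. In this sketch `team_candidates` is a hypothetical helper, and the patterns are assumed to be already normalized and PII-redacted before they reach it.

```python
from collections import defaultdict


def team_candidates(observations, min_users: int = 3) -> list:
    """Return patterns seen in at least `min_users` distinct users.

    `observations` is an iterable of (user_id, pattern) pairs drawn
    from per-user memory within one team.
    """
    users_by_pattern = defaultdict(set)
    for user_id, pattern in observations:
        users_by_pattern[pattern].add(user_id)  # a set ignores repeats by one user
    return sorted(p for p, users in users_by_pattern.items() if len(users) >= min_users)
```

Counting distinct users rather than raw occurrences is what stops one prolific user's habit from being promoted; the output is a candidate list for review, not an automatic write to team memory.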
Common follow-ups: - "What if a user wants their data not to be promoted to team memory?" - "How do you handle org changes (mergers, restructures)?"
Traps: - Single-layer memory. Doesn't fit enterprise reality.
Related cross-cutting: Architecture choices, Production patterns
Related module: learning/01_ai_engineering/11_long_term_memory_state/
Q: "Compare mem0, Letta (MemGPT), and Zep as agent memory frameworks."
Tags: staff · occasional · conceptual · source: State of AI Agent Memory 2026 (mem0); standard staff-tier tooling probe
Answer outline:
- mem0: structured fact extraction + storage + retrieval. SDK and managed service. Focused on the consolidation problem (turn conversations into clean semantic facts). Best for production agents wanting drop-in memory.
- Letta (formerly MemGPT): OS-style memory paging. Agent has explicit memory management as tool calls (read, write, swap). Inspired by virtual memory in operating systems. Best when you want the agent to have explicit control over its own memory.
- Zep: temporal knowledge graph + vector store hybrid. Strong on chat-history management and temporal reasoning. Targets conversational AI.
- All three are active in 2026. Choice depends on:
  - Want managed service? mem0 or Zep cloud.
  - Want explicit agent-driven memory ops? Letta.
  - Building from scratch with full control? Custom on vector DB + key-value store.
- 2026 default: most teams either roll their own (vector DB + Postgres + custom extractor) or use mem0. Letta is more research-y; Zep is conversation-focused.
- Numbers to drop: "mem0: managed service, drop-in", "Letta: OS-style memory paging, ~5-15 explicit memory tool calls per session", "Zep: chat-focused temporal KG"
Common follow-ups: - "When does building your own beat using a framework?" - "How does mem0's fact extraction work?"
Traps: - Treating these as interchangeable. They optimize for different memory models.
Related cross-cutting: Architecture choices
Related module: learning/01_ai_engineering/11_long_term_memory_state/