MemMachine: Open-Source Ground-Truth-Preserving Memory System Achieves 93% Accuracy on Long-Term Agent Memory Benchmarks
LLM agents suffer from memory degradation across sessions. MemMachine, a new open-source system, integrates short-term, long-term episodic, and profile memory to solve this problem with a ground-truth-preserving architecture.
The Problem
Standard context-window and RAG pipelines degrade over multi-session interactions:
- Context windows — Limited size, expensive to fill
- RAG — Lossy extraction, retrieval quality degrades
- No episodic memory — Cannot reference specific past conversations
MemMachine's Architecture
| Memory Type | Function | Storage |
|---|---|---|
| Short-term | Current conversation | Context window |
| Long-term episodic | Past conversations | Full episodes (not extracted summaries) |
| Profile | User preferences | Structured profile |
Key innovation: MemMachine stores entire conversational episodes verbatim rather than lossy, LLM-extracted summaries.
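
The three tiers in the table can be sketched as a small data structure. This is a minimal illustration, assuming hypothetical names (`Turn`, `Episode`, `MemoryStore`, `end_session`) that are not taken from MemMachine's actual API; it only shows the idea of archiving whole episodes instead of summaries.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    speaker: str
    text: str


@dataclass
class Episode:
    session_id: str
    turns: list[Turn]  # raw turns, stored verbatim


@dataclass
class MemoryStore:
    """Hypothetical three-tier store; not MemMachine's real interface."""
    short_term: list[Turn] = field(default_factory=list)   # current conversation
    episodes: list[Episode] = field(default_factory=list)  # past conversations, unsummarized
    profile: dict[str, str] = field(default_factory=dict)  # structured user preferences

    def add_turn(self, speaker: str, text: str) -> None:
        self.short_term.append(Turn(speaker, text))

    def end_session(self, session_id: str) -> None:
        # Archive the whole conversation as-is; there is no LLM extraction step.
        self.episodes.append(Episode(session_id, list(self.short_term)))
        self.short_term.clear()
```

Because each episode keeps the original turns, any later question can be answered against the exact wording the user used, which is what "ground-truth preserving" refers to here.
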
Results
- LoCoMo benchmark: 0.9169 accuracy (using gpt-4.1-mini)
- LongMemEvalS (ICLR 2025): 93.0% accuracy, reached through a six-dimension ablation study
Retrieval Optimizations
The paper found that retrieval-stage optimizations produced larger accuracy gains than ingestion-stage changes:
| Optimization | Accuracy Gain |
|---|---|
| Retrieval depth tuning | +4.2% |
| Context formatting | +2.0% |
| Search prompt design | +1.8% |
| Query bias correction | +1.4% |
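
Retrieval depth tuning, the largest gain in the table, amounts to choosing how many retrieved items to pass to the model. A minimal sketch of such a sweep follows, assuming a hypothetical `evaluate(k)` callback that runs a held-out question set at depth `k` and returns accuracy; the function name and tie-breaking rule are illustrative choices, not the paper's procedure.

```python
from typing import Callable, Iterable


def tune_retrieval_depth(
    evaluate: Callable[[int], float],
    depths: Iterable[int],
) -> tuple[int, float]:
    """Return (best_k, best_accuracy), preferring the smaller k on ties."""
    best_k, best_acc = -1, float("-inf")
    for k in sorted(depths):
        acc = evaluate(k)
        # Strict comparison keeps the shallowest depth when accuracies tie,
        # which also keeps the prompt shorter.
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc
```

Preferring the smaller depth on ties is a design choice: deeper retrieval costs more context tokens, so it should only be selected when it measurably helps.
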
Why It Matters
- Open source — Available for integration into any agent framework
- Ground-truth preserving — Stores actual conversations, not summaries
- Practical impact — Directly improves personalized AI assistant quality
- Contextualized retrieval — Expands nucleus matches with surrounding dialogue context
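
Contextualized retrieval can be illustrated with a toy example: find the best-matching turn (the nucleus), then return it together with its neighbouring turns. The sketch below assumes a word-overlap `score` function as a stand-in for a real retriever, and the `window` and `top_k` parameters are hypothetical names, not MemMachine settings.

```python
def score(query: str, text: str) -> int:
    """Toy relevance: number of shared lowercase words (stand-in for a real retriever)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_with_context(
    turns: list[str], query: str, top_k: int = 1, window: int = 1
) -> list[str]:
    # 1. Rank turns and keep the top_k nucleus matches that have a non-zero score.
    ranked = sorted(range(len(turns)), key=lambda i: score(query, turns[i]), reverse=True)
    nuclei = [i for i in ranked[:top_k] if score(query, turns[i]) > 0]
    # 2. Expand each nucleus by `window` turns on both sides; the set removes overlaps.
    keep: set[int] = set()
    for i in nuclei:
        keep.update(range(max(0, i - window), min(len(turns), i + window + 1)))
    # 3. Return the expanded span in original conversational order.
    return [turns[i] for i in sorted(keep)]
```

With `window=0` only the nucleus is returned; a larger window brings in the question that prompted the matched turn and the reply that followed it, which is often where the disambiguating detail lives.
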
As AI agents become persistent companions, memory systems like MemMachine become critical infrastructure.