TLDR: Nemori is a novel self-organizing memory architecture for Large Language Models (LLMs) inspired by human cognitive principles. It addresses LLM amnesia through a Two-Step Alignment Principle, which autonomously organizes conversational streams into semantically coherent episodes and thereby resolves the problem of memory granularity, and a Predict-Calibrate Principle, which enables proactive learning from prediction gaps and moves beyond predefined heuristics toward adaptive knowledge evolution. Nemori significantly outperforms existing memory systems on benchmarks, especially in longer contexts, while remaining highly token-efficient.
Large Language Models (LLMs) have shown incredible abilities, but they struggle with a fundamental limitation: they forget past interactions. This ‘amnesia’ prevents them from acting as truly autonomous agents capable of genuine long-term learning. Existing memory systems have tried to address this, but they typically define the basic memory unit arbitrarily and extract knowledge through passive, rule-based methods, which limits their ability to truly learn and evolve.
Introducing Nemori: A New Approach to AI Memory
To tackle these core issues, researchers have introduced Nemori, a groundbreaking self-organizing memory architecture. Nemori draws inspiration directly from principles of human cognition, offering a more natural and effective way for AI agents to remember and learn from their experiences. Its innovation lies in two key principles:
The Two-Step Alignment Principle: Organizing Conversations into Coherent Episodes
Inspired by Event Segmentation Theory, which explains how humans break down continuous experiences into meaningful events, Nemori uses a principled, top-down method to organize raw conversational data. This solves the critical problem of memory granularity. It works in two steps (a code sketch follows the list):
- Boundary Alignment: Nemori intelligently detects semantic shifts in a conversation. Think of it like a human recognizing when a new topic begins or an old one ends. This allows the system to autonomously group the raw conversational stream into semantically coherent ‘episodes’ – meaningful chunks of interaction rather than arbitrary segments.
- Representation Alignment: Once an episode is identified, Nemori transforms it into a rich, narrative memory. This simulates how humans naturally recount past events, preserving salient information and context in a structured format, complete with a concise title and a detailed third-person narrative.
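To make the two steps concrete, here is a minimal Python sketch of how they could be wired together. Everything here is illustrative: the paper does not publish this interface, and the `llm` callable, the prompt wordings, and the `Episode` fields are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    title: str            # concise episode title
    narrative: str        # third-person narrative of the episode
    raw_turns: list[str]  # original conversation turns, kept for later calibration

def detect_boundary(llm, buffer: list[str], new_turn: str) -> bool:
    """Step 1 (boundary alignment): ask the model whether the new turn
    starts a semantically distinct topic. Prompt wording is a guess."""
    prompt = (
        "Conversation so far:\n" + "\n".join(buffer) +
        f"\nNew turn:\n{new_turn}\n"
        "Does the new turn begin a new topic? Answer YES or NO."
    )
    return llm(prompt).strip().upper().startswith("YES")

def build_episode(llm, turns: list[str]) -> Episode:
    """Step 2 (representation alignment): rewrite the closed segment
    as a titled, third-person narrative."""
    prompt = (
        "Rewrite the following conversation as a third-person narrative that "
        "preserves salient facts and context. First line: a concise title. "
        "Remaining lines: the narrative.\n\n" + "\n".join(turns)
    )
    title, _, narrative = llm(prompt).partition("\n")
    return Episode(title=title.strip(), narrative=narrative.strip(), raw_turns=turns)

def segment_stream(llm, turns: list[str]) -> list[Episode]:
    """Group a conversational stream into semantically coherent episodes."""
    episodes, buffer = [], []
    for turn in turns:
        if buffer and detect_boundary(llm, buffer, turn):
            episodes.append(build_episode(llm, buffer))
            buffer = []
        buffer.append(turn)
    if buffer:
        episodes.append(build_episode(llm, buffer))
    return episodes
```

The design point the sketch tries to capture is that episode boundaries are decided semantically by the model itself, not by fixed turn counts or token windows.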
The Predict-Calibrate Principle: Proactive Learning from Prediction Gaps
Moving beyond predefined rules, Nemori enables agents to proactively learn from their own ‘prediction gaps.’ This principle is inspired by the Free-energy Principle in cognitive science, which holds that genuine learning is driven by the mismatch between what an agent predicts and what it actually observes. Here’s how it works (sketched in code after the list):
- Prediction: When a new episode is generated, Nemori first tries to predict its content based on its existing knowledge. It retrieves relevant information from its semantic memory to make an informed forecast.
- Calibration: The predicted content is then compared not to a summarized version, but to the original, unprocessed conversation. The difference between what was predicted and the actual conversation reveals a ‘prediction gap’ – new or surprising information. Nemori then distills this gap into new, actionable knowledge statements.
- Integration: Finally, these newly validated knowledge statements are integrated into Nemori’s main Semantic Memory Database, enriching the agent’s understanding of the world and refining its internal model.
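Continuing the sketch above, the predict-calibrate loop might look like the following. The `semantic_db` object with `search` and `add` methods is hypothetical, as are the prompts; only the three-phase structure (predict, calibrate against the raw turns, integrate) comes from the description above.

```python
def predict_calibrate(llm, semantic_db, episode: Episode) -> list[str]:
    """Learn from the gap between what existing knowledge predicts
    and what the raw conversation actually contained."""
    # Prediction: forecast the episode from retrieved semantic knowledge.
    known = semantic_db.search(episode.title, top_k=5)  # hypothetical API
    prediction = llm(
        "Given these known facts:\n" + "\n".join(known) +
        f"\nPredict the content of an episode titled '{episode.title}'."
    )

    # Calibration: compare the prediction against the *raw* conversation,
    # not the narrative summary, and distill the surprising differences.
    gap_prompt = (
        "Prediction:\n" + prediction +
        "\n\nActual conversation:\n" + "\n".join(episode.raw_turns) +
        "\n\nList the facts in the conversation that the prediction missed "
        "or got wrong, one concise statement per line."
    )
    new_knowledge = [s for s in llm(gap_prompt).splitlines() if s.strip()]

    # Integration: fold the distilled statements into semantic memory.
    for statement in new_knowledge:
        semantic_db.add(statement)  # hypothetical API
    return new_knowledge
```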
This dual-memory system, comprising detailed episodic memories and abstracted semantic knowledge, allows Nemori to learn in a way that mirrors human cognitive processes.
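Tying the two sketches together, a toy dual-memory container could look like this (again an illustrative assumption, not the paper's code):

```python
class NemoriMemory:
    """Toy dual-memory container: episodic narratives plus semantic facts."""
    def __init__(self, llm, semantic_db):
        self.llm = llm
        self.episodes: list[Episode] = []  # episodic memory
        self.semantic_db = semantic_db     # semantic memory

    def ingest(self, turns: list[str]) -> None:
        # Segment the stream into episodes, then learn from each one.
        for ep in segment_stream(self.llm, turns):
            self.episodes.append(ep)
            predict_calibrate(self.llm, self.semantic_db, ep)
```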
Performance and Efficiency
Extensive experiments on challenging benchmarks like LoCoMo and LongMemEvalS demonstrate Nemori’s effectiveness. It significantly outperforms prior state-of-the-art memory systems, especially in longer conversational contexts. In some cases, Nemori even surpasses the ‘Full Context’ baseline (where the LLM sees the entire conversation history), proving the power of its self-organizing memory. For instance, Nemori showed exceptional performance in temporal reasoning tasks, transforming complex questions into simple information retrieval by combining episodic context with pre-reasoned semantic facts.
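As an illustration of how query answering might combine the two memory types, here is a deliberately naive sketch. A real system would use embedding-based retrieval; the keyword-overlap matching below is purely for readability.

```python
def answer(llm, memory: NemoriMemory, question: str) -> str:
    """Answer a question by pairing retrieved episodic narratives with
    pre-reasoned semantic facts, so the model retrieves rather than re-derives."""
    words = set(question.lower().split())
    episodic = [
        ep.narrative for ep in memory.episodes
        if words & set(ep.narrative.lower().split())
    ][:3]
    facts = memory.semantic_db.search(question, top_k=5)  # hypothetical API
    context = "\n".join(episodic + facts)
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```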
Beyond accuracy, Nemori also boasts remarkable efficiency. It uses significantly fewer tokens on average compared to the Full Context baseline, demonstrating that it not only improves performance but does so with substantial computational savings.
An ablation study confirmed that both the episodic and semantic memory components are crucial and complementary to Nemori’s overall success. The study also directly validated the Predict-Calibrate Principle, showing that proactively learning from prediction gaps creates a much more effective knowledge base than simple, reactive extraction.
Nemori’s ability to generalize to significantly longer and more challenging conversational contexts, such as those found in the LongMemEvalS dataset (averaging 105K tokens), further highlights its robustness. It particularly excels in understanding user preferences, as its concise, high-quality structured memory allows the model to focus more effectively on user habits and inclinations, which might otherwise get lost in vast amounts of raw text.
The Future of Autonomous Agents
By reframing memory construction as an active learning process, Nemori offers a principled solution to the long-standing problem of AI amnesia. This innovative architecture, detailed further in the research paper available at arXiv.org, provides a foundational component for developing autonomous agents capable of genuine, human-like learning and evolution.