TLDR: Memp is a framework that gives AI agents a learnable, updatable, lifelong procedural memory. It distills past experiences into step-by-step instructions and higher-level scripts, retrieves them by semantic similarity to the new task, and updates them according to whether execution succeeded or failed. In the paper’s experiments this raises task success rates, reduces execution steps, and lets memory built by one model be transferred to another, making agents more efficient and adaptable.
Large Language Models (LLMs) have become incredibly powerful, enabling AI agents to perform complex tasks like web research, data analysis, and navigating software interfaces. However, these agents often struggle with a fundamental challenge: remembering how to perform multi-step procedures efficiently and adapting to unexpected changes. Imagine an agent needing to restart a long task from scratch every time a small error occurs – it’s inefficient and frustrating. This is where the concept of “procedural memory” becomes crucial.
A new research paper titled “Memp: Exploring Agent Procedural Memory” by researchers from Zhejiang University and Alibaba Group introduces a novel framework called Memp. This framework aims to equip LLM-based agents with a learnable, updatable, and lifelong procedural memory, allowing them to continuously improve their performance on diverse tasks. The core idea is to enable agents to distill and reuse their past experiences, much like humans learn from practice.
What is Procedural Memory for AI Agents?
Think of procedural memory as the “how-to” knowledge. For humans, it’s the unconscious ability to perform skills like riding a bike or typing. For an AI agent, it’s about internalizing and automating repetitive tasks, decision-making processes, and interaction patterns. Instead of figuring out every step for a similar task anew, an agent with procedural memory can recall and apply successful past approaches, leading to faster and more accurate execution.
How Memp Builds and Uses Memory
Memp focuses on three key aspects of procedural memory: Build, Retrieval, and Update.
Building Memory: Memp learns from an agent’s past successful task executions, called “trajectories.” It can store these trajectories verbatim, or it can abstract them into higher-level, script-like instructions. The research found that combining both approaches – using concrete examples (trajectories) and abstract guidance (scripts) – yielded the best performance. This combined method is termed “Proceduralization.”
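To make the combined approach concrete, here is a minimal sketch of a memory entry that keeps both the verbatim trajectory and an abstract script. The names `MemoryEntry`, `build_entry`, and `naive_script` are hypothetical and not taken from the paper, and `naive_script` is only a stand-in for whatever summarization step actually produces the script.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MemoryEntry:
    task: str              # description of the task that was solved
    trajectory: List[str]  # concrete steps, stored verbatim
    script: str            # abstract, script-like guidance


def naive_script(task: str, trajectory: List[str]) -> str:
    # Placeholder abstraction (an assumption for illustration):
    # keep only the first word of each non-empty step.
    verbs = [step.split()[0] for step in trajectory if step.strip()]
    return f"To '{task}': " + " -> ".join(verbs)


def build_entry(
    task: str,
    trajectory: List[str],
    summarize: Callable[[str, List[str]], str] = naive_script,
) -> MemoryEntry:
    # Store the concrete example and the abstract guidance side by side.
    return MemoryEntry(
        task=task,
        trajectory=list(trajectory),
        script=summarize(task, trajectory),
    )
```

Passing a different `summarize` callable is where a real abstraction step would plug in; the entry format itself does not change.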
Retrieving Memory: When a new task arrives, Memp needs to find the most relevant past experience. Rather than sampling memories at random, it matches on semantic similarity. One method uses the task’s query description as the key for matching, while another extracts keywords and averages their similarities. Targeted retrieval of this kind improves the agent’s ability to access helpful knowledge.
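The two retrieval ideas can be sketched as follows. This is an illustration under stated assumptions: a bag-of-words cosine similarity stands in for a real embedding model, and the function names (`embed`, `retrieve_by_query`, `retrieve_by_keywords`) are hypothetical rather than the paper's own.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Toy "embedding": word counts. A real system would use a sentence encoder.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_by_query(query: str, memory_keys: List[str]) -> str:
    # Use the whole task description as the key for matching.
    q = embed(query)
    return max(memory_keys, key=lambda k: cosine(q, embed(k)))


def retrieve_by_keywords(keywords: List[str], memory_keys: List[str]) -> str:
    # Score each stored key by the average similarity over the keywords.
    if not keywords:
        raise ValueError("at least one keyword is required")

    def avg_sim(key: str) -> float:
        k = embed(key)
        return sum(cosine(embed(w), k) for w in keywords) / len(keywords)

    return max(memory_keys, key=avg_sim)
```

Both functions return the single best-matching key; returning several candidates is a small extension of the same scoring.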
Updating Memory: Unlike simpler systems that only ever add new memories, Memp introduces dynamic update mechanisms. This is vital for agents to adapt to changing environments and correct past mistakes. Strategies include simply appending new trajectories, validating them and keeping only the successful ones, or adjusting an existing memory when it was retrieved and led to a failed execution. This “adjustment,” or reflection-based, update proved to be the most effective, allowing the agent to continuously refine its knowledge base.
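A rough sketch of the adjustment idea is shown below, assuming memory is a plain dictionary from task description to steps. The names `update_memory` and `note_failure` are hypothetical, and `note_failure` is a trivial placeholder for the reflection step, which in practice would be done by a model rather than by string concatenation.

```python
from typing import Callable, Dict, List, Optional

Memory = Dict[str, List[str]]  # task description -> stored steps


def note_failure(old_steps: List[str], failed_steps: List[str]) -> List[str]:
    # Placeholder for reflection (an assumption for illustration):
    # keep the old steps and flag the last step of the failed attempt.
    if not failed_steps:
        return list(old_steps)
    return list(old_steps) + [f"AVOID: {failed_steps[-1]}"]


def update_memory(
    memory: Memory,
    task: str,
    steps: List[str],
    success: bool,
    retrieved_key: Optional[str] = None,
    revise: Callable[[List[str], List[str]], List[str]] = note_failure,
) -> Memory:
    if success:
        # Successful executions are stored as new memories.
        memory[task] = list(steps)
    elif retrieved_key is not None and retrieved_key in memory:
        # The retrieved memory led to a failure, so revise it in place.
        memory[retrieved_key] = revise(memory[retrieved_key], steps)
    return memory
```

Failed runs that used no retrieved memory are simply dropped in this sketch; whether to keep them is a design choice the source does not settle.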
Real-World Impact and Benefits
The researchers evaluated Memp on two challenging datasets: TravelPlanner, for complex planning, and ALFWorld, for long-horizon household tasks. Agents equipped with Memp consistently achieved higher success rates while needing fewer steps and fewer tokens to complete tasks. For instance, in an “egg heating” task, an agent without Memp might wander aimlessly and either fail or take many steps. With Memp, guided by prior experience, it could quickly locate the egg, use the correct appliance (a microwave), and complete the task in far fewer steps, saving time and computational resources.
Another exciting finding is the transferability of procedural memory. Memory built by a powerful model like GPT-4o could be successfully transferred to a weaker model (e.g., Qwen2.5-14B), giving the smaller model a substantial boost in task-solving ability. This suggests that high-quality procedural knowledge can be distilled and shared, making AI agents more adaptable and efficient across different models.
The study also explored how the quantity of retrieved memories affects performance. While more relevant memories generally help, retrieving too many can actually hinder performance by introducing less accurate information or exceeding context limits.
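One common way to act on this trade-off is to cap the number of retrieved memories and drop weak matches. The sketch below is an assumption for illustration, not the paper's method: `top_k_memories`, the Jaccard token overlap used as a similarity score, and the `min_score` threshold are all hypothetical.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    # Token-overlap similarity, standing in for an embedding-based score.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def top_k_memories(
    query: str, memory_keys: List[str], k: int = 3, min_score: float = 0.1
) -> List[str]:
    # Score every stored key, discard weak matches, keep at most k.
    scored = [(jaccard(query, key), key) for key in memory_keys]
    relevant = [(score, key) for score, key in scored if score >= min_score]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [key for _, key in relevant[:k]]
```

Tuning `k` and `min_score` is how the balance between useful context and noise would be adjusted in this sketch.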
In conclusion, Memp is a step towards more robust, efficient, and self-improving AI agents. By treating procedural memory as a core optimization target, the framework enables agents to learn from their experiences, adapt to new situations, and complete complex tasks more accurately and in fewer steps. For more details, see the full research paper, “Memp: Exploring Agent Procedural Memory.”


