TLDR: LEGOMem is a modular procedural memory framework for multi-agent Large Language Model (LLM) systems, aimed at workflow automation. It addresses the statelessness of LLM agents by decomposing past successful task executions into reusable memory units, which are flexibly allocated to both central orchestrators (for high-level planning) and individual task agents (for execution guidance). Experiments on the OfficeBench benchmark show that LEGOMem substantially boosts task success rates across LLM team configurations. Even smaller language models benefit, leveraging prior experience for better planning and tool use and, ultimately, more efficient and reliable task execution.
Large Language Models (LLMs) are becoming increasingly vital for automating complex, multi-step workflows, especially in productivity settings such as document editing, email management, and calendar scheduling. To handle the diversity and intricacy of these tasks, many systems now adopt multi-agent designs in which several LLM-based agents collaborate, specialize, or delegate responsibilities. This mirrors real-world work, which is inherently multi-agent and depends on coordinated decision-making across varied roles.
However, a significant limitation of current multi-agent systems is their stateless nature. Each task is typically solved from scratch, without learning from previous experiences. This absence of memory, particularly procedural memory—the knowledge of how to perform tasks—hinders their ability to improve over time. While some memory modules exist for single-agent LLMs, they don’t address the unique coordination and specialization challenges of multi-agent setups.
Introducing LEGOMem: A Memory Framework for LLM Teams
To bridge this gap, researchers have introduced LEGOMem, a modular procedural memory framework specifically designed for multi-agent LLM systems. LEGOMem focuses on a common architecture where a central orchestrator plans tasks and delegates subtasks to specialized tool-using agents. The goal is to equip both the orchestrator and individual task agents with memory derived from past successful task executions, leading to better planning, coordination, and task execution.
LEGOMem works by distilling successful task executions into structured memory units. These include ‘full-task memories’ (covering high-level plans and reasoning) and ‘subtask memories’ (detailing agent behavior and tool interactions). These modular memories are stored in a memory bank, indexed by semantic embeddings, and then reused when new tasks arise to enhance planning and execution.
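The paper doesn't ship a reference implementation, but the two memory unit types are easy to picture as simple records. The sketch below is illustrative; the field names are assumptions, not the authors' schema:

```python
from dataclasses import dataclass, field

@dataclass
class FullTaskMemory:
    """Orchestrator-level unit: what the task was and how it was planned.
    (Illustrative schema; field names are assumptions.)"""
    task_description: str   # natural-language task statement
    plan: list[str]         # ordered subtask descriptions
    reasoning: str          # high-level rationale behind the plan

@dataclass
class SubtaskMemory:
    """Agent-level unit: how one delegated subtask was executed."""
    subtask_description: str  # what the agent was asked to do
    agent_role: str           # e.g. "email_agent", "calendar_agent"
    tool_calls: list[dict] = field(default_factory=list)  # tool, args, observation
```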
How LEGOMem Operates
The framework operates in two main phases:
- Offline Memory Construction: Successful task trajectories are analyzed and converted into reusable memory units. Full-task memories capture the overall task description and plan, while subtask memories encapsulate specific agent actions, tool use, and observations. These are stored in a vector database (a sketch of both phases follows this list).
- Online Memory-Augmented Inference: When a new task is presented, LEGOMem retrieves relevant memories. The orchestrator receives full-task memories to guide its planning and agent selection, while each task agent is given subtask memories relevant to its delegated responsibilities. This allows orchestrators to leverage past solutions for informed planning and error recovery, and task agents to improve their accuracy and efficiency in using tools.
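To make the two phases concrete, here is a minimal sketch of a memory bank, assuming a sentence-embedding model and brute-force cosine similarity in place of the paper's vector database (both are stand-ins, as are the class and method names):

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # stand-in embedder

class MemoryBank:
    """Toy vector store: embeds memory keys, retrieves by cosine similarity."""

    def __init__(self):
        self.encoder = SentenceTransformer("all-MiniLM-L6-v2")
        self.keys = []         # indexing text (task or subtask description)
        self.values = []       # FullTaskMemory / SubtaskMemory units
        self.embeddings = None

    def add(self, key: str, memory) -> None:
        """Offline phase: index one memory unit by its description."""
        self.keys.append(key)
        self.values.append(memory)
        vec = self.encoder.encode([key], normalize_embeddings=True)
        self.embeddings = vec if self.embeddings is None else np.vstack([self.embeddings, vec])

    def retrieve(self, query: str, k: int = 3):
        """Online phase: return the k most similar memory units."""
        q = self.encoder.encode([query], normalize_embeddings=True)[0]
        scores = self.embeddings @ q  # cosine similarity: vectors are unit-norm
        return [self.values[i] for i in np.argsort(-scores)[:k]]
```

At inference time, the orchestrator would query a bank of full-task memories with the new task description, while each task agent queries a bank of subtask memories with its delegated subtask.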
LEGOMem also explores different memory retrieval strategies, including a ‘vanilla’ approach, ‘LEGOMem-Dynamic’ (which retrieves subtask memories dynamically during execution), and ‘LEGOMem-QueryRewrite’ (which uses an LLM to rewrite queries for more precise subtask memory retrieval before execution). These variants allow for a systematic study of how memory placement and retrieval affect multi-agent performance.
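The QueryRewrite variant can be sketched as a single pre-retrieval step: an LLM rephrases the orchestrator's subtask assignment into a query better matched to how stored subtask memories are described. The prompt wording, model choice, and use of the OpenAI client below are all illustrative assumptions:

```python
from openai import OpenAI  # any chat-capable LLM client would do here

client = OpenAI()

def rewrite_query(subtask: str, agent_role: str) -> str:
    """LEGOMem-QueryRewrite (sketch): rephrase a subtask into a retrieval
    query before searching the subtask memory bank."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{
            "role": "user",
            "content": (
                f"Rewrite this subtask for a {agent_role} agent as a short "
                f"search query over past tool-use traces:\n{subtask}"
            ),
        }],
    )
    return resp.choices[0].message.content.strip()

# e.g.: subtask_bank.retrieve(rewrite_query("Forward the budget email", "email_agent"))
```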
Key Findings and Impact
Evaluations on the OfficeBench benchmark, which comprises multi-step office automation tasks, showed that LEGOMem variants consistently and significantly improved task success rates over memory-free systems and other baseline methods. The framework was tested with various team configurations: teams composed entirely of large LLMs, hybrid teams (a large-model orchestrator with smaller LLM agents), and teams of only smaller LLMs.
A crucial finding was that orchestrator memory is vital for effective high-level planning and task decomposition. Fine-grained agent memory, on the other hand, significantly improves execution accuracy, especially for smaller language models. This means that even teams made up of less powerful language models can substantially benefit from procedural memory, narrowing the performance gap with stronger agents by using prior execution traces for more accurate planning and tool use. Furthermore, LEGOMem led to a reduction in the number of execution steps required and a lower rate of failed steps, indicating more efficient and reliable task completion.
In essence, LEGOMem serves as both a practical framework for building memory-augmented agent systems and a valuable research tool for understanding memory design in multi-agent workflow automation. For more in-depth information, you can refer to the original research paper.