TLDR: ELMUR (External Layer Memory with Update/Rewrite) is a transformer architecture that gives robotic agents a structured, layer-local external memory, extending memory horizons up to 100,000 times beyond the attention window. It combines bidirectional interaction between tokens and memory with a Least Recently Used (LRU) update rule. ELMUR achieves 100% success on T-Maze tasks with corridors up to one million steps, nearly doubles the performance of strong baselines on MIKASA-Robo manipulation tasks, and outperforms baselines on more than half of the POPGym tasks, demonstrating long-term recall and generalization under partial observability.
Imagine a robot trying to cook pasta. It adds salt, stirs, and then later, adds salt again, making the dish inedible. The problem isn’t a lack of cooking skill, but a fundamental inability to remember if salt was already added, especially since it dissolves and becomes invisible. This scenario highlights a critical challenge in robotics: partial observability and the need for long-term memory. While humans effortlessly recall past actions, robots often struggle with retaining information over extended periods, especially when key cues appear long before they are needed for decision-making.
Most modern sequence models struggle with this. Transformers see only a fixed context window, and recurrent neural networks tend to lose information over long sequences, so cues that appeared long ago are effectively forgotten. This is where ELMUR (External Layer Memory with Update/Rewrite) steps in, equipping robots with efficient and persistent long-term memory.
What is ELMUR?
Developed by Egor Cherepanov, Alexey K. Kovalev, and Aleksandr I. Panov, ELMUR is a transformer architecture augmented with a structured external memory system. Unlike models that can only use what falls inside their current context window, ELMUR integrates memory directly into each layer of the transformer, allowing it to store past information and retrieve it later. This design extends the effective memory horizon up to 100,000 times beyond the attention window of the transformer.
How ELMUR Works
ELMUR operates with two main components within each transformer layer: a ‘token track’ and a ‘memory track’. The token track processes current observations and generates actions, similar to a standard transformer. The memory track, however, runs in parallel and is designed to persist information across different segments of a task. These two tracks interact bidirectionally through a mechanism called cross-attention:
- Memory to Token (mem2tok): The tokens (representing current observations) can ‘read’ from the external memory, enriching their understanding with insights from the past.
- Token to Memory (tok2mem): The tokens can also ‘write’ new information or update existing entries in the memory, ensuring that salient events are retained.
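The two directions above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses single-head dot-product attention with no learned projections, and it omits the relative bias and layer normalization that a real layer would include. The function names `cross_attend`, `mem2tok`, and `tok2mem` are hypothetical helpers chosen to mirror the terms in the text.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, sources):
    # Each query vector attends over all source vectors.
    # Assumption: sources serve as both keys and values (no projections).
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, s)) / math.sqrt(d) for s in sources]
        weights = softmax(scores)
        out.append([sum(w * s[i] for w, s in zip(weights, sources)) for i in range(d)])
    return out

def mem2tok(tokens, memory):
    # Read: tokens query the memory and add the result as a residual.
    read = cross_attend(tokens, memory)
    return [[t + r for t, r in zip(tok, rd)] for tok, rd in zip(tokens, read)]

def tok2mem(tokens, memory):
    # Write: memory slots query the tokens to form candidate updates.
    return cross_attend(memory, tokens)

tokens = [[1.0, 0.0]]
memory = [[0.5, 0.5]]
enriched = mem2tok(tokens, memory)     # tokens enriched with memory content
candidates = tok2mem(tokens, memory)   # one candidate update per memory slot
```

In this sketch the read path changes the tokens while the write path only proposes candidates; how those candidates are committed to the slots is a separate decision.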
A crucial element of ELMUR is its Least Recently Used (LRU) memory module. This module intelligently manages memory slots. Initially, it fills empty slots with new information. Once all slots are occupied, it selectively rewrites the least recently used slot. This rewrite can happen either by completely replacing the old content or by ‘convex blending,’ which mixes new content with the previous memory. This blending mechanism, controlled by a hyperparameter called lambda (λ), allows for a balance between fast adaptation and long-term stability. Additionally, a ‘relative bias’ mechanism helps ELMUR understand the temporal distance between current observations and memory entries, ensuring that memory interactions are contextually grounded.
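The slot-management rule described above can be sketched as follows. This is a simplified illustration, not the paper's code: the class name `LRUMemory` is hypothetical, recency is assumed to be tracked by the step of the last write, and λ is assumed to weight the new content, so that λ = 1 reduces to full replacement and smaller values preserve more of the old slot.

```python
from dataclasses import dataclass, field

@dataclass
class LRUMemory:
    num_slots: int
    lam: float                      # assumed: weight on NEW content in the blend
    slots: list = field(default_factory=list)
    last_used: list = field(default_factory=list)

    def write(self, vector, step):
        # Phase 1: fill empty slots first.
        if len(self.slots) < self.num_slots:
            self.slots.append(list(vector))
            self.last_used.append(step)
            return len(self.slots) - 1
        # Phase 2: all slots occupied, so rewrite the least recently used one.
        idx = min(range(self.num_slots), key=lambda i: self.last_used[i])
        old = self.slots[idx]
        # Convex blend: lam * new + (1 - lam) * old.
        self.slots[idx] = [self.lam * n + (1 - self.lam) * o
                           for n, o in zip(vector, old)]
        self.last_used[idx] = step
        return idx

mem = LRUMemory(num_slots=2, lam=0.5)
mem.write([0.0, 0.0], step=0)   # fills slot 0
mem.write([2.0, 2.0], step=1)   # fills slot 1
mem.write([4.0, 4.0], step=2)   # blends into slot 0, the oldest
```

With λ = 0.5, each rewrite halves the contribution of whatever the slot held before, which illustrates the trade-off between fast adaptation and long-term stability mentioned above.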
Unprecedented Performance
ELMUR’s innovative design has led to remarkable results across various benchmarks:
- T-Maze Task: On a synthetic T-Maze task, which requires recalling an early cue after navigating a very long corridor, ELMUR achieved a 100% success rate even with corridors up to one million steps long. This demonstrates its ability to retain information over extremely long durations.
- MIKASA-Robo: In sparse-reward manipulation tasks with visual observations, ELMUR nearly doubled the performance of strong baselines, showing its effectiveness in complex robotic scenarios like remembering colors or reversing actions after a delay.
- POPGym: Across 48 diverse partially observable puzzles and control tasks, ELMUR outperformed baselines on more than half of the tasks, achieving the best overall score. This highlights its robust generalization capabilities across different types of memory-intensive challenges.
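To make the T-Maze setup concrete, here is a toy version of the task. It is a hypothetical sketch, not the benchmark used in the paper: the class `TMaze`, its observation strings, and the two policies are all invented for illustration. The cue is visible only at the first step, and the reward depends on recalling it after the corridor.

```python
import random

class TMaze:
    # Toy T-Maze: corridor_length must be at least 1.
    def __init__(self, corridor_length, seed=None):
        self.length = corridor_length
        self.rng = random.Random(seed)

    def reset(self):
        self.pos = 0
        self.cue = self.rng.choice(["left", "right"])
        return self.cue                      # the cue is observed only here

    def step(self, action):
        # Returns (observation, reward, done).
        if self.pos < self.length:
            if action == "forward":
                self.pos += 1
            obs = "junction" if self.pos == self.length else "corridor"
            return obs, 0.0, False
        # At the junction: reward 1 only if the turn matches the early cue.
        return "end", (1.0 if action == self.cue else 0.0), True

def remembering_policy(obs, state):
    # Stores the cue in its state and uses it at the junction.
    if obs in ("left", "right"):
        state = obs
    action = state if obs == "junction" else "forward"
    return action, state

def memoryless_policy(obs, state):
    # Cannot recall the cue, so it always turns left at the junction.
    return ("left" if obs == "junction" else "forward"), state

def run_episode(env, policy):
    obs, state, done, reward = env.reset(), None, False, 0.0
    while not done:
        action, state = policy(obs, state)
        obs, reward, done = env.step(action)
    return reward

reward = run_episode(TMaze(corridor_length=1000, seed=0), remembering_policy)
```

A policy that carries the cue forward succeeds regardless of corridor length, while a memoryless policy succeeds only when the cue happens to match its fixed choice.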
The research also provides a theoretical analysis of ELMUR’s LRU-based memory dynamics, establishing formal bounds on how information is forgotten or retained. This analysis confirms that ELMUR’s memory system ensures stability and predictable retention horizons.
A Step Towards More Capable Robots
ELMUR represents a significant advancement in equipping AI agents with efficient long-term memory. By integrating structured, layer-local external memory with intelligent update mechanisms, it allows robots to overcome the limitations of short context windows and tackle complex, long-horizon tasks under partial observability. This approach is not only effective but also efficient, with ELMUR running faster per step than some baselines despite its enhanced capabilities.
This work paves the way for more capable and adaptable robotic agents that can operate reliably in real-world scenarios where remembering past events is crucial for successful decision-making. For more in-depth details, see the full research paper.


