TL;DR: IMAC (Imagined Autocurricula) combines diffusion-based world models with automatic curriculum learning to train AI agents. It leverages offline data to generate diverse ‘imagined’ environments and uses Prioritized Level Replay (PLR) to create an emergent curriculum that adapts task difficulty to the agent’s capabilities. This approach enables agents to learn robustly and generalize significantly better to new, unseen environments, outperforming state-of-the-art offline reinforcement learning methods on the Procgen benchmark.
In the quest to develop truly intelligent agents capable of operating in complex, real-world environments, a significant hurdle has always been the need for vast amounts of training data or highly accurate simulations. For many real-world scenarios, neither is readily available. A new approach called Imagined Autocurricula (IMAC) tackles this gap by leveraging ‘world models’ to generate diverse simulated environments for training robust AI agents that can generalize to novel situations.
World models are a promising technology that can learn from large quantities of passively collected data, such as internet videos, and then use this knowledge to create ‘imagined’ environments. This allows AI agents to train within these generated worlds, exploring potential outcomes without direct interaction with the real environment. While powerful, a key challenge is ensuring that the agent trains on useful and progressively challenging data within these imagined worlds.
Introducing Imagined Autocurricula (IMAC)
IMAC addresses this challenge by integrating Unsupervised Environment Design (UED) into the world model framework. Specifically, it uses a technique called Prioritized Level Replay (PLR) to induce an automatic curriculum over the generated worlds. This means the system intelligently selects and generates training scenarios that are most beneficial for the agent’s learning, gradually increasing in difficulty as the agent improves.
How IMAC Works
The IMAC approach involves three main components:
First, a **diffusion world model** is trained using a diverse offline dataset. This model learns the dynamics of the environment, essentially understanding how the world works from observed interactions. Unlike some previous methods, IMAC’s world model uses full-image observations, allowing it to capture fine visual details crucial for complex tasks.
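To make this concrete, the snippet below sketches a standard DDPM-style training objective for an action-conditioned world model: a denoiser learns to predict the noise added to the next observation, given the current observation and action. The architecture, observation and action dimensions, and noise schedule are illustrative assumptions, not the paper’s actual design.

```python
# Minimal sketch of a diffusion world-model training step in PyTorch.
# The network, dimensions, and schedule are illustrative assumptions.
import torch
import torch.nn as nn

T = 1000  # number of diffusion timesteps
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class EpsModel(nn.Module):
    """Toy denoiser: predicts the noise added to the next observation,
    conditioned on the current observation, the action, and the timestep."""
    def __init__(self, obs_dim=64 * 64 * 3, act_dim=15):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim + act_dim + 1, 512),
            nn.ReLU(),
            nn.Linear(512, obs_dim),
        )

    def forward(self, noisy_next_obs, obs, act, t):
        # Concatenate the corrupted next frame with the conditioning signals.
        x = torch.cat(
            [noisy_next_obs, obs, act, t.float().unsqueeze(-1) / T], dim=-1
        )
        return self.net(x)

def diffusion_loss(eps_model, obs, act, next_obs):
    """Standard DDPM objective: corrupt next_obs with Gaussian noise at a
    random timestep, then regress the injected noise."""
    b = obs.shape[0]
    t = torch.randint(0, T, (b,))
    noise = torch.randn_like(next_obs)
    a_bar = alphas_bar[t].unsqueeze(-1)
    noisy = a_bar.sqrt() * next_obs + (1 - a_bar).sqrt() * noise
    pred = eps_model(noisy, obs, act, t)
    return ((pred - noise) ** 2).mean()
```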
Second, an **AI agent is trained within these imagined environments**. Once the world model is trained, it’s used to generate imaginary trajectories—sequences of states, rewards, and termination signals. The agent then learns by rolling out its policy through these imagined scenarios. A crucial aspect here is that the imagined episode lengths are randomly sampled, introducing greater diversity into the training experiences and preventing the agent from becoming over-specialized to fixed-length tasks.
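A rollout inside the learned model might look like the sketch below. `world_model.step` and `policy` are assumed interfaces standing in for the paper’s components; the point is that the horizon is sampled fresh for each imagined episode rather than fixed.

```python
# Illustrative sketch of imagining one episode inside the world model,
# with a randomly sampled horizon. All interfaces here are assumptions.
import random

def imagine_episode(world_model, policy, init_obs, min_len=8, max_len=64):
    """Roll the policy through the world model for a sampled horizon."""
    horizon = random.randint(min_len, max_len)
    obs, trajectory = init_obs, []
    for _ in range(horizon):
        action = policy(obs)
        # The world model hallucinates the next observation, reward,
        # and termination signal from the current state and action.
        obs, reward, done = world_model.step(obs, action)
        trajectory.append((obs, action, reward, done))
        if done:
            break
    return trajectory
```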
Third, the **autocurriculum** mechanism, powered by Prioritized Level Replay (PLR), guides the agent’s learning. Instead of training on random or fixed sets of imagined worlds, PLR maintains a buffer of previously encountered initial states and prioritizes those that offer the highest learning potential for the agent. This prioritization is based on the agent’s temporal difference errors, which indicate where its value estimates were unexpectedly low, suggesting valuable learning opportunities. This process naturally creates an emergent curriculum, exposing the agent to increasingly challenging tasks as its capabilities grow. This dynamic difficulty scaling is particularly beneficial for procedurally generated environments where the difficulty landscape is complex and unknown beforehand.
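The prioritization logic can be sketched as a small buffer that scores each imagined initial state by its positive TD error and samples replays from PLR’s rank-based distribution. The scoring and sampling follow the published PLR recipe, but the buffer capacity, temperature, and replay probability below are illustrative choices rather than values from the paper.

```python
# Compact sketch of PLR-style prioritization over imagined initial states.
# Hyperparameters are illustrative, not values from the paper.
import numpy as np

class PLRBuffer:
    def __init__(self, capacity=256, temperature=0.3, replay_prob=0.5):
        self.capacity = capacity
        self.temperature = temperature
        self.replay_prob = replay_prob
        self.states, self.scores = [], []

    def score(self, td_errors):
        # A large positive TD error means the value estimate was too low,
        # so the level still has something to teach the agent.
        return float(np.mean(np.maximum(td_errors, 0.0)))

    def add(self, init_state, td_errors):
        s = self.score(td_errors)
        if len(self.states) < self.capacity:
            self.states.append(init_state)
            self.scores.append(s)
        else:  # evict the lowest-scoring entry if the new one beats it
            i = int(np.argmin(self.scores))
            if s > self.scores[i]:
                self.states[i], self.scores[i] = init_state, s

    def sample(self, sample_new_state):
        # With some probability explore a fresh imagined world; otherwise
        # replay a buffered one, preferring high-score levels.
        if not self.states or np.random.rand() > self.replay_prob:
            return sample_new_state()
        ranks = np.argsort(np.argsort(self.scores)[::-1]) + 1
        weights = (1.0 / ranks) ** (1.0 / self.temperature)
        probs = weights / weights.sum()
        return self.states[np.random.choice(len(self.states), p=probs)]
```

Rank-based sampling makes the curriculum robust to the absolute scale of TD errors, which drifts as the agent improves.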
Experimental Validation
The researchers evaluated IMAC on a challenging subset of the Procgen Benchmark, a collection of procedurally generated environments designed to test generalization in reinforcement learning. They used a mixed offline dataset comprising expert, medium-quality, and random trajectories to train the world model. IMAC consistently outperformed state-of-the-art offline reinforcement learning algorithms across all tested Procgen environments, showing significant improvements in generalization to unseen levels. For instance, it achieved up to a 48% improvement on Jumper and 35% on Maze compared to the best model-free baselines.
A key finding was the demonstration of the emergent curriculum. Initially, IMAC’s PLR selected episode lengths similar to random sampling. However, as training progressed, PLR began prioritizing substantially longer and more complex imagined episodes, indicating that the system was automatically discovering and focusing on more challenging scenarios as the agent improved. This progression from simpler environments, such as direct paths, to more intricate ones featuring complex obstacle arrangements directly correlated with the method’s strong performance on test environments.
Looking Ahead
The work presented in this paper, available at arxiv.org/pdf/2509.13341, demonstrates that strong transfer performance on new, unseen environments is achievable by training agents solely within imagined trajectories generated by a world model, even when the world model itself is learned from a limited offline dataset. This opens up exciting possibilities for utilizing larger-scale ‘foundation world models’ to develop highly capable agents that can adapt to a wide range of novel task variations.
While promising, the approach has acknowledged limitations, including its dependence on the diversity of the training data and substantial computational requirements. Future work aims to address these through efficiency optimizations, improved uncertainty quantification, and extensions to more complex control domains.


