TLDR: A new research paper introduces a method to create autonomous AI agents that combine the strengths of model-based planning and model-free behaviour. Using Meta-Interpretive Learning, a ‘Solver’ agent first learns to plan from a complete model of its environment. The solutions generated by this Solver are then used to train a ‘Controller’ agent, which learns to act and explore without needing a complete map. The study demonstrates that the two types of agents achieve equivalent problem-solving ability on grid navigation tasks, particularly when the Controller is enhanced with techniques like Simultaneous Localisation and Mapping (SLAM) to avoid getting stuck in complex environments.
In the realm of artificial intelligence, autonomous agents often face a dilemma: should they rely on a complete understanding of their environment to plan their actions, or should they be able to act and explore without such a detailed map? A new research paper titled “From model-based learning to model-free behaviour with Meta-Interpretive Learning” by Stassa Patsantzis from the University of Surrey, UK, tackles this very challenge, proposing a novel way to combine both capabilities in a single agent.
The paper introduces two types of agents: a “model-based Solver” and a “model-free Controller.” A Solver is like a meticulous planner; it needs a full map or theory of its environment to predict the outcomes of its actions and devise a step-by-step plan to reach a goal. Think of it as having a detailed blueprint before starting construction. On the other hand, a Controller is more like an explorer; it doesn’t need a complete map and can act by observing only its immediate surroundings. It learns to react to situations as they arise, without a grand plan.
The core idea is to leverage Meta-Interpretive Learning (MIL), a form of Inductive Logic Programming, to first teach a Solver how to navigate. MIL is particularly powerful because it can learn recursive programs, which are essential for general problem-solving. Once the Solver has mastered planning in various environments, its successful navigation paths are used as examples to train the model-free Controller. This approach allows the Controller to learn effective behaviours without ever needing a full model of the environment.
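To make that pipeline concrete, here is a minimal Python sketch of the idea (the paper's actual implementation is in Prolog, and its Solver is learned with MIL; the breadth-first search below is only an illustrative stand-in for a learned planner). The grid, symbols, and helper names are my own assumptions, not the paper's.

```python
# Sketch: a model-based "Solver" stand-in plans over the full map,
# and its plan is replayed to produce (observation, action) training
# pairs for a model-free Controller.
from collections import deque

GRID = ["#######",
        "#s..#.#",
        "#.#.#.#",
        "#.#...#",
        "#...#e#",
        "#######"]

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def passable(r, c):
    return 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] != "#"

def find(ch):
    return next((r, c) for r, row in enumerate(GRID)
                for c, x in enumerate(row) if x == ch)

def solve(start, goal):
    """Solver stand-in: BFS over the *full* map returns an action plan."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        (r, c), plan = frontier.popleft()
        if (r, c) == goal:
            return plan
        for action, (dr, dc) in MOVES.items():
            nxt = (r + dr, c + dc)
            if passable(*nxt) and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [action]))

def observe(r, c):
    """Local observation only: which of the four neighbours is passable."""
    return tuple(passable(r + dr, c + dc) for dr, dc in MOVES.values())

# Replay the Solver's plan to collect (observation, action) examples
# for training the model-free Controller.
pos, examples = find("s"), []
for action in solve(find("s"), find("e")):
    examples.append((observe(*pos), action))
    dr, dc = MOVES[action]
    pos = (pos[0] + dr, pos[1] + dc)
print(examples[:3])
```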
The Solver, once learned, can generate a sequence of actions to move from a starting point to a goal, much like finding a path through a maze. The Controller, in contrast, operates using what are called Finite State Controllers (FSCs). These FSCs are essentially sets of rules that map a current internal state and an observation (e.g., what’s passable around it) to an action and a next internal state. They don’t hold a map; they just react to what they perceive.
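As a rough illustration, an FSC can be written down as a lookup table from (internal state, observation) pairs to (action, next state) pairs. The states, observation labels, and rules below are hand-written assumptions for the sake of the example; in the paper such controllers are learned from the Solver's solutions and expressed in Prolog.

```python
# Sketch of a Finite State Controller: a transition table from
# (internal state, observation) to (action, next internal state).
# No map is stored; the controller only reacts to what it perceives.

# (state, observation) -> (action, next_state)
FSC = {
    ("q0", "clear_ahead"): ("forward", "q0"),     # keep moving while the way is open
    ("q0", "blocked"):     ("turn_right", "q1"),  # obstacle: change heading
    ("q1", "clear_ahead"): ("forward", "q0"),     # resume moving in the new direction
    ("q1", "blocked"):     ("turn_right", "q1"),  # keep turning until a way opens
}

def run_fsc(observations, state="q0"):
    """Execute the controller on a stream of observations."""
    actions = []
    for obs in observations:
        action, state = FSC[(state, obs)]
        actions.append(action)
    return actions

print(run_fsc(["clear_ahead", "clear_ahead", "blocked", "clear_ahead"]))
# -> ['forward', 'forward', 'turn_right', 'forward']
```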
A significant challenge for model-free agents, especially in environments with open areas or ambiguous paths, is getting stuck in loops. To address this, the research extends FSCs to “Nondeterministic FSCs” and introduces specialized “executors” that run these controllers. The executors add features such as backtracking (letting the agent retrace its steps in a simulated environment) and “Simultaneous Localisation and Mapping” (SLAM). SLAM lets the agent build a map as it explores, marking visited locations so it does not circle endlessly in open spaces; this makes the Controller more robust in complex environments.
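The loop-avoidance idea can be sketched as an executor that records visited cells while running a nondeterministic controller and prefers moves into unvisited cells. This is only a schematic stand-in for the paper's SLAM-equipped executors; the grid, the candidate-action policy, and the fallback behaviour are my own illustrative assumptions.

```python
# Sketch: a SLAM-style executor keeps a set of visited cells and
# filters the controller's candidate moves against it, so the agent
# does not circle endlessly in open areas.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

GRID = ["#####",
        "#...#",
        "#.#.#",
        "#..e#",
        "#####"]

def passable(r, c):
    return GRID[r][c] != "#"

def slam_execute(start, goal, max_steps=50):
    pos, visited, path = start, {start}, [start]
    for _ in range(max_steps):
        if pos == goal:
            return path
        # Degenerate nondeterministic controller: every passable
        # neighbour is a candidate; the executor's map filters out
        # already-visited cells first.
        candidates = [(pos[0] + dr, pos[1] + dc) for dr, dc in MOVES.values()
                      if passable(pos[0] + dr, pos[1] + dc)]
        fresh = [c for c in candidates if c not in visited]
        pos = (fresh or candidates)[0]  # if boxed in, revisit (a crude stand-in for backtracking)
        visited.add(pos)
        path.append(pos)
    return None  # step budget exhausted

print(slam_execute((1, 1), (3, 3)))
```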
The researchers implemented two new Prolog libraries: “Controller Freak,” for learning FSCs from Solvers, and “Grid Master,” for managing grid-based navigation problems. They ran experiments on two kinds of grid environments: randomly generated mazes and “Lake maps” (open areas with obstacles). The results were compelling: the learned model-free Controller, especially when paired with SLAM-enabled executors, solved the same navigation problems as the model-based Solver, demonstrating that the two kinds of agent are equivalent in problem-solving ability. This points to a promising path toward autonomous agents that are both capable planners and adaptable explorers.
Also Read:
- Navigating the Future: How Deep Reinforcement Learning is Reshaping Autonomous Path Planning
- Guiding Autonomous Agents with Self-Awareness: The Constitutional Controller for Safe Navigation
For more in-depth details, you can read the full research paper here.