TLDR: ReCode is a novel paradigm for AI agents that unifies planning and action within a single code representation. Unlike traditional methods that rigidly separate high-level plans from low-level actions, ReCode treats plans as abstract functions that are recursively broken down into executable steps. This allows agents to dynamically adjust their decision-making detail, from broad strategies to specific actions, enabling universal granularity control. Experiments show ReCode significantly improves performance and data efficiency across various complex tasks, laying the groundwork for more adaptive and capable AI.
In the rapidly evolving field of artificial intelligence, a new research paper introduces ReCode, a groundbreaking paradigm designed to enhance how AI agents make decisions. Authored by Zhaoyang Yu and a team of researchers from institutions including DeepWisdom and The Hong Kong University of Science and Technology (Guangzhou), ReCode addresses a fundamental limitation in current Large Language Model (LLM)-based agents: their inability to fluidly adapt decision-making across varying levels of detail.
The Challenge of Decision Granularity
Humans naturally excel at shifting between high-level plans and fine-grained actions. For instance, when preparing breakfast, one might decide on a broad plan like “making bacon and eggs” and then seamlessly transition to precise actions such as “cracking an egg.” This fluid adaptability is crucial for navigating the complexities of the real world. However, existing LLM-based agents often struggle with this. Traditional approaches typically separate planning from action, leading to rigid decision processes.
For example, agents following the ReAct paradigm strictly alternate between reasoning and basic actions, limiting them to fine-grained steps without strategic foresight. Other agents with dedicated planner modules also maintain a rigid boundary between high-level plans and low-level execution. These fixed structures prevent AI from dynamically adjusting its decision granularity as task complexities change, often resulting in brittle performance in dynamic environments.
ReCode’s Unified Approach: Planning as High-Level Action
The core insight behind ReCode is that planning and action are not distinct cognitive processes but rather decisions made at different levels of detail. A plan, in essence, is a higher-level action. ReCode, which stands for Recursive Code Generation, unifies these within a single code representation. It treats high-level plans as abstract “placeholder functions” that the agent then recursively breaks down into more detailed sub-functions until it reaches primitive, executable actions.
Imagine a task like “prepare breakfast.” ReCode might initially represent this as a placeholder function. This function would then be expanded into sub-functions like `get_ingredients()` and `cook_meal()`. These, in turn, would be further decomposed until they become concrete actions such as `run('open refrigerator')` or `run('turn on stove')`.
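To make the idea concrete, here is a minimal sketch of what such a recursive decomposition could look like in Python. This is an illustration only: `run` stands in for the primitive action interface, and every function name is a hypothetical placeholder, not the paper's actual API.

```python
# Illustrative sketch of a ReCode-style decomposition (assumed names).
LOG = []

def run(action: str) -> None:
    """Primitive action: in a real agent this would act on the environment."""
    LOG.append(action)

# Level 0: the task itself starts as an abstract placeholder function.
def prepare_breakfast():
    get_ingredients()
    cook_meal()

# Level 1: placeholders expand into more detailed sub-functions...
def get_ingredients():
    run("open refrigerator")
    run("take eggs")
    run("take bacon")

# ...until every call bottoms out in primitive `run` actions.
def cook_meal():
    run("turn on stove")
    run("fry bacon and eggs")

prepare_breakfast()
```

Calling the top-level function walks the whole hierarchy, so the "plan" and the "actions" are just the same code viewed at different depths.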
How ReCode Works
The ReCode system begins by converting a natural language task instruction into an initial placeholder function. The agent’s policy model then expands this function into a block of child code. An executor processes this code sequentially: if it encounters a primitive action, it executes it directly in the environment. If it finds another placeholder function, it triggers a recursive expansion, asking the LLM to generate more detailed code for that specific sub-task. This process continues, building a hierarchical decision tree, until all parts of the task are broken down into executable primitive actions.
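The execution loop described above can be sketched as a small recursive interpreter. The design below is an assumption for illustration, not the paper's implementation: a "policy" (an LLM in ReCode) maps each placeholder name to a list of child steps, and the executor recurses on placeholders while executing primitives directly.

```python
# Minimal sketch of a ReCode-style executor (assumed design).
# Each step is a (kind, payload) pair: a primitive action string,
# or the name of a placeholder function to expand further.
PRIMITIVE = "primitive"
PLACEHOLDER = "placeholder"

def execute(step, policy, env_log):
    kind, payload = step
    if kind == PRIMITIVE:
        # Primitive action: execute directly in the environment.
        env_log.append(payload)
    else:
        # Placeholder: trigger a recursive expansion by asking the
        # policy (the LLM) for more detailed child code.
        for child in policy[payload]:
            execute(child, policy, env_log)

# Toy policy standing in for LLM generations.
policy = {
    "prepare_breakfast": [(PLACEHOLDER, "get_ingredients"),
                          (PRIMITIVE, "turn on stove")],
    "get_ingredients":   [(PRIMITIVE, "open refrigerator"),
                          (PRIMITIVE, "take eggs")],
}

log = []
execute((PLACEHOLDER, "prepare_breakfast"), policy, log)
```

The recursion terminates exactly when every branch of the decision tree has been broken down into primitive actions, mirroring the process described above.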
This dynamic approach also offers significant training advantages. Unlike traditional methods that produce flat sequences of actions, ReCode naturally generates structured decision trees. This rich, multi-granularity training data helps agents learn complex task decompositions and adaptive decision-making strategies more effectively, leading to better generalization and data efficiency.
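One way to picture this training advantage: each node of the decision tree can yield one supervised example pairing a placeholder with the child code generated for it, giving examples at every level of granularity rather than a single flat action sequence. The data layout below is a hypothetical illustration, not the paper's format.

```python
# Hypothetical sketch: turning a ReCode decision tree into
# multi-granularity training pairs (parent placeholder -> child code).
tree = {
    "prepare_breakfast": ["get_ingredients()", "cook_meal()"],
    "get_ingredients": ["run('open refrigerator')", "run('take eggs')"],
    "cook_meal": ["run('turn on stove')"],
}

# Each node becomes one example: "expand this placeholder into this code."
training_pairs = [
    (parent, "\n".join(children)) for parent, children in tree.items()
]
```

A flat trajectory would supply only the leaf actions; the tree also supervises the intermediate decompositions.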
Impressive Results and Efficiency
The researchers conducted extensive experiments across diverse environments, including ALFWorld (household tasks), WebShop (online shopping), and ScienceWorld (scientific experiments). ReCode consistently outperformed advanced baseline methods like ReAct and CodeAct. For instance, using GPT-4o mini, ReCode achieved an average performance improvement of over 20.9% compared to the best baseline. This robust performance was observed across various LLMs, including Gemini 2.5 Flash and DeepSeek-V3.1, demonstrating the paradigm’s broad applicability.
Beyond performance, ReCode proved to be remarkably cost-efficient. On average, a ReCode trajectory cost 78.9% less than ReAct and 84.4% less than CodeAct. This efficiency comes from ReCode’s structured exploration, which leads to shorter, more direct reasoning paths and fewer API calls. Furthermore, ReCode demonstrated superior data efficiency in training, achieving better results with significantly less training data compared to other methods.
Looking Ahead
While ReCode presents a powerful new direction for AI agents, its success still depends on the underlying LLM’s reasoning capabilities and the quality of initial examples. Future work will focus on enhancing models’ ability to understand and operate within the ReCode framework, potentially through specialized training or reinforcement learning to optimize hierarchical planning. These advancements promise to pave the way for more autonomous, capable, and adaptive AI systems.
For more technical details, you can read the full research paper here.


