TLDR: EvoCurr is a novel framework that enables Large Language Models (LLMs) to master complex decision-making tasks by generating their own adaptive learning curriculum. It uses two LLM agents: a ‘curriculum designer’ that creates progressively difficult tasks based on performance feedback, and a ‘behavior coder’ that generates Python decision-tree scripts to solve these tasks. Tested in StarCraft II, EvoCurr significantly improves task success rates and solution efficiency by allowing the solver LLM to incrementally acquire skills, demonstrating a promising path for enhancing AI reasoning in high-complexity domains.
Large Language Models, or LLMs, have shown strong abilities in many areas, from writing code to making complex decisions. However, they often struggle with problems that require deep reasoning over many steps. Such problems rarely come with clear, structured guidance, so an LLM that attacks them directly often works inefficiently or fails outright.
To tackle this challenge, researchers have introduced a new framework called EvoCurr. This innovative system is designed to help LLMs progressively learn and master complex decision-making tasks by creating its own learning path, much like a human student would learn by starting with easier concepts and gradually moving to harder ones.
EvoCurr operates with two main components: a ‘solver’ LLM (the behavior coder) and a ‘curriculum-generation’ LLM (the curriculum designer). The solver LLM is responsible for generating Python decision-tree scripts, which are essentially sets of rules that guide its decisions. The curriculum LLM, on the other hand, acts as a dynamic teacher. It designs a sequence of problem instances, starting simple and gradually increasing in difficulty. What makes EvoCurr unique is that this curriculum isn’t static; it adapts in real time based on how well the solver LLM is performing.
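To make the idea of a decision-tree script concrete, here is a minimal sketch of a rule-based policy for a single unit. The `UnitState` fields, the thresholds, and the action names are all hypothetical, chosen for illustration only; they are not taken from the paper or from any StarCraft II API.

```python
from dataclasses import dataclass


@dataclass
class UnitState:
    """Hypothetical snapshot of one unit (not a real StarCraft II structure)."""
    health: float          # fraction of maximum health, 0.0 to 1.0
    enemy_in_range: bool   # True if any enemy is within weapon range
    weapon_ready: bool     # True if the weapon cooldown has finished


def decide(unit: UnitState) -> str:
    """Walk a fixed tree of rules and return a hypothetical action name."""
    if unit.health < 0.3:
        # Badly damaged units disengage first.
        return "retreat"
    if unit.enemy_in_range:
        # Shoot when possible, otherwise step back while the weapon reloads.
        return "attack" if unit.weapon_ready else "kite"
    # Nothing to shoot at: close the distance.
    return "advance"
```

Each branch is an explicit, readable rule, which makes a script of this kind easy to inspect and to revise after a failed attempt.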
If the solver LLM successfully completes a task, the curriculum LLM makes the next task a bit harder, pushing the solver to learn more. If the solver struggles or fails, the curriculum LLM eases the challenge, allowing the solver to reinforce its understanding before moving forward. This continuous feedback loop ensures that the solver LLM is always learning at an optimal pace, never overwhelmed by too much difficulty too soon, nor bored by tasks that are too easy.
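The feedback loop described above can be sketched in a few lines. This is a simplified illustration, not EvoCurr's actual mechanism: in the framework a curriculum LLM designs each new task, whereas here a single integer difficulty level stands in for the task, and the function names and the 1 to 10 range are assumptions.

```python
def next_difficulty(current: int, solved: bool, lowest: int = 1, highest: int = 10) -> int:
    """Raise the level after a success, lower it after a failure, within bounds."""
    step = 1 if solved else -1
    return max(lowest, min(highest, current + step))


def run_curriculum(outcomes, start: int = 1):
    """Replay a sequence of solver outcomes and record the level after each one."""
    level = start
    history = [level]
    for solved in outcomes:
        level = next_difficulty(level, solved)
        history.append(level)
    return history
```

For example, two successes followed by a failure and another success move the level from 1 up to 3, back down to 2, and up to 3 again.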
The framework breaks down the learning process into three interconnected stages. First, the curriculum design process generates a series of tasks that gradually increase in complexity towards a final, challenging objective. Each task specifies details like unit configurations, environmental settings, and win conditions. Second, the code synthesis stage involves the ‘behavior coder’ LLM translating each curriculum task into executable Python code for the decision tree. This stage includes planning the strategy, generating the code, and compiling it. Finally, the third stage involves the actual game interaction, where the compiled decision tree acts as the decision-making policy within the StarCraft II environment.
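The three stages can be illustrated with a toy pipeline. Everything here is a stand-in under stated assumptions: `CurriculumTask`, `synthesize_policy_source`, `compile_policy`, and `run_episode` are hypothetical names, the "LLM" is a function returning a fixed string, and the "game" is a single comparison rather than a StarCraft II episode.

```python
from dataclasses import dataclass


@dataclass
class CurriculumTask:
    """Stage 1 output: a task description (field names are assumptions)."""
    units: dict          # e.g. {"ally": 3, "enemy": 2}
    environment: dict    # e.g. {"map": "flat"}
    win_condition: str


def synthesize_policy_source(task: CurriculumTask) -> str:
    """Stage 2 stand-in for the behavior coder LLM: returns Python source text."""
    return (
        "def policy(ally_count, enemy_count):\n"
        "    return 'attack' if ally_count >= enemy_count else 'retreat'\n"
    )


def compile_policy(source: str):
    """Compile generated source and pull out the `policy` function."""
    namespace = {}
    exec(compile(source, "<generated_policy>", "exec"), namespace)
    return namespace["policy"]


def run_episode(task: CurriculumTask, policy) -> str:
    """Stage 3 stand-in: ask the compiled policy for one decision."""
    return policy(task.units["ally"], task.units["enemy"])


task = CurriculumTask(
    units={"ally": 3, "enemy": 2},
    environment={"map": "flat"},
    win_condition="eliminate all enemy units",
)
policy = compile_policy(synthesize_policy_source(task))
result = run_episode(task, policy)
```

In a real system the generated source would come from a model, so running it through `exec` would call for sandboxing; the sketch skips that for brevity.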
A key part of the behavior coder is its ‘planner–coder–critic’ loop. The planner creates high-level strategies. The coder then translates these strategies into actual Python code. The critic analyzes the performance of the generated code, identifying errors or areas for improvement, and provides feedback to refine the strategy and code in subsequent attempts. This iterative refinement helps the solver LLM to develop increasingly sophisticated and context-aware decision-making structures.
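The loop above can be sketched as a small refinement function. This is an illustrative skeleton under assumptions, not the paper's implementation: the three roles are passed in as plain callables, `refine` and the toy role functions are hypothetical names, and the critic's check is a simple string match instead of an analysis of game results.

```python
def refine(plan_fn, code_fn, critic_fn, task: str, max_rounds: int = 3):
    """Run plan -> code -> critique until the critic accepts or rounds run out.

    Returns (code, rounds_used, accepted).
    """
    feedback = ""
    code = ""
    for round_index in range(1, max_rounds + 1):
        plan = plan_fn(task, feedback)      # planner: high-level strategy
        code = code_fn(plan)                # coder: strategy -> source text
        accepted, feedback = critic_fn(code)  # critic: verdict plus feedback
        if accepted:
            return code, round_index, True
    return code, max_rounds, False


# Toy roles standing in for LLM calls.
def toy_planner(task: str, feedback: str) -> str:
    return f"{task}; {feedback}" if feedback else task


def toy_coder(plan: str) -> str:
    return f"# plan: {plan}"


def toy_critic(code: str):
    if "focus fire" in code:
        return True, ""
    return False, "add focus fire"


code, rounds, accepted = refine(toy_planner, toy_coder, toy_critic, "hold the ramp")
```

Here the first attempt is rejected, the critic's feedback is folded into the second plan, and the second attempt is accepted, so the loop stops after two rounds.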
The effectiveness of EvoCurr was tested on challenging micro-management scenarios in StarCraft II, a complex real-time strategy game known for its intricate decision-making requirements. The experiments showed that EvoCurr significantly improved task success rates and solution efficiency compared to traditional direct problem-solving methods. While not every experimental run achieved complete mastery of the final task, one path successfully navigated all stages, demonstrating the framework’s capability to generate advanced multi-unit strategies.
The research highlights that LLM-driven curriculum learning holds strong potential for enhancing automated reasoning in complex real-world domains. Although the current single-agent architecture might show a bias towards certain unit types, future work aims to address this by exploring a multi-agent framework where different agents specialize in controlling specific unit types, leading to more balanced and robust performance. For more details, see the full research paper.


