TLDR: AgentGym-RL is a new open-source framework for training LLM agents to make multi-turn decisions in diverse real-world environments using reinforcement learning (RL), without needing supervised fine-tuning. It features a modular architecture and introduces ScalingInter-RL, a progressive training method that balances exploration and exploitation by gradually increasing interaction turns. Experiments show that AgentGym-RL-trained agents, even smaller ones, can match or exceed the performance of larger commercial models across various tasks, demonstrating that smart RL training and test-time compute can be more effective than just scaling model size.
The development of autonomous AI agents capable of navigating complex, real-world tasks is a rapidly advancing field. These agents are expected to learn and adapt through interaction with their environments, much like humans do. However, a significant challenge has been the lack of a unified, interactive framework for training these agents from scratch using reinforcement learning (RL), without relying on initial supervised fine-tuning.
Addressing this gap, researchers from Fudan University and ByteDance Seed have introduced AgentGym-RL, a novel framework designed to train Large Language Model (LLM) agents for multi-turn interactive decision-making through reinforcement learning. This framework is built with a modular and decoupled architecture, ensuring it is highly flexible and can be easily extended for various research needs.
A Comprehensive Framework for Agent Training
AgentGym-RL stands out by encompassing a wide array of realistic scenarios, making it suitable for diverse applications. These include:
- Web Navigation: Agents learn to interact with dynamic websites for tasks like booking flights or extracting information.
- Deep Search: Agents perform multi-step, goal-directed queries using tools like browsers and Python interpreters, requiring strong information-seeking and reasoning skills.
- Digital Games: Agents explore and solve problems in interactive game environments, focusing on real-time decision-making and strategy.
- Embodied Tasks: Agents control virtual or physical bodies for navigation and manipulation, demanding spatial reasoning and goal-directed planning.
- Scientific Tasks: Agents conduct experiments and solve problems in knowledge-intensive settings, requiring precise execution and evidence-based reasoning.
The framework supports mainstream RL algorithms such as PPO, GRPO, and REINFORCE++, alongside other training paradigms like Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). It also incorporates extensive engineering optimizations for scalability and reliability, including improved rollout parallelization and memory-leak mitigation. To foster collaboration and transparency, AgentGym-RL is open-source and includes a visualized user interface for detailed analysis of agent behavior.
ScalingInter-RL: A Progressive Training Approach
A key innovation within AgentGym-RL is ScalingInter-RL, a training approach designed to achieve a better balance between exploration and exploitation during RL optimization. Traditional RL training can struggle with stability when agents are given too many interaction turns early on, leading to unproductive exploration and training collapse. Conversely, too few turns can limit an agent’s ability to discover diverse problem-solving strategies.
ScalingInter-RL addresses this by progressively extending the agent-environment interaction horizon. It begins by restricting the number of interactions in early stages, encouraging the agent to exploit its current knowledge and master basic skills efficiently. As training progresses, the interaction horizon gradually increases, promoting deeper exploration, refinement of behaviors, and the ability to tackle more complex challenges. This phased approach helps agents develop more diverse behaviors and reduces the likelihood of performance collapse in long-horizon tasks.
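The progressive horizon described above can be pictured as a simple staged schedule. The sketch below is a minimal illustration, not the authors' implementation: the function names, the step-wise schedule shape, and every numeric value (initial turns, increment, phase length, cap) are assumptions chosen for the example, since this article does not give the actual settings.

```python
from typing import Callable, List, Optional, Tuple


def max_turns_for_step(
    step: int,
    initial_turns: int = 5,        # hypothetical starting budget
    turn_increment: int = 5,       # hypothetical growth per phase
    steps_per_phase: int = 100,    # hypothetical phase length
    turn_cap: Optional[int] = None,
) -> int:
    """Return the interaction-turn budget for a training step.

    The budget starts small and grows by a fixed increment each phase,
    so early training favors exploitation and later training allows
    longer-horizon exploration.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    phase = step // steps_per_phase
    budget = initial_turns + phase * turn_increment
    if turn_cap is not None:
        budget = min(budget, turn_cap)
    return budget


def run_episode(
    policy: Callable[[str], str],
    env_step: Callable[[str], Tuple[str, float, bool]],
    initial_obs: str,
    max_turns: int,
) -> List[Tuple[str, float]]:
    """Roll out one episode, stopping at max_turns or when the env is done."""
    obs = initial_obs
    trajectory: List[Tuple[str, float]] = []
    for _ in range(max_turns):
        action = policy(obs)
        obs, reward, done = env_step(action)
        trajectory.append((action, reward))
        if done:
            break
    return trajectory


if __name__ == "__main__":
    # Toy policy and environment (both hypothetical) that never terminate,
    # so the episode length is set entirely by the current turn budget.
    toy_policy = lambda obs: "noop"
    toy_env_step = lambda action: ("obs", 0.0, False)
    for step in (0, 100, 250):
        budget = max_turns_for_step(step, turn_cap=12)
        episode = run_episode(toy_policy, toy_env_step, "start", budget)
        print(step, budget, len(episode))
```

In a real training loop, the rollout collector would read the current budget before each batch of episodes, so the policy update itself does not need to change as the horizon grows.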
Impressive Performance and Key Insights
Extensive experiments validate the stability and effectiveness of both the AgentGym-RL framework and the ScalingInter-RL approach. Agents trained with the framework match or even surpass commercial models across 27 tasks in diverse environments. For instance, an open-source 7B-scale model trained with AgentGym-RL improved by 33.65 points on average, reaching performance that rivals or exceeds larger proprietary models like OpenAI-o3 and Gemini-2.5-Pro.
A significant insight from the research is that strategic investment in post-training and test-time computation can be more impactful than simply increasing a model’s parameter count. The 7B ScalingInter-RL model, for example, significantly outperformed much larger models (e.g., Llama3.1-70B and Qwen2.5-72B) in average success rate, highlighting the power of targeted training and inference-time optimization. The study also found that the effectiveness of reinforcement learning varies with environmental structure: gains were more substantial in simulated worlds with clear rules (like TextCraft, BabyAI, and SciWorld) than in more open-ended environments (like WebArena and Deep Search).
The AgentGym-RL framework, along with the ScalingInter-RL method, represents a significant step forward in training LLM agents for complex decision-making. The researchers plan to open-source the complete framework, including code and datasets, to empower the broader research community in developing the next generation of intelligent agents. For more details, you can refer to the original research paper.


