AgentGym-RL: A New Framework for Training LLM Agents in Complex Environments

TLDR: AgentGym-RL is a new open-source framework for training LLM agents to make multi-turn decisions in diverse real-world environments using reinforcement learning (RL), without needing supervised fine-tuning. It features a modular architecture and introduces ScalingInter-RL, a progressive training method that balances exploration and exploitation by gradually increasing interaction turns. Experiments show that AgentGym-RL-trained agents, even smaller ones, can match or exceed the performance of larger commercial models across various tasks, demonstrating that smart RL training and test-time compute can be more effective than just scaling model size.

The development of autonomous AI agents capable of navigating complex, real-world tasks is a rapidly advancing field. These agents are expected to learn and adapt through interaction with their environments, much like humans do. However, a significant challenge has been the lack of a unified, interactive framework for training these agents from scratch using reinforcement learning (RL), without relying on initial supervised fine-tuning.

Addressing this gap, researchers from Fudan University and ByteDance Seed have introduced AgentGym-RL, a novel framework designed to train Large Language Model (LLM) agents for multi-turn interactive decision-making through reinforcement learning. This framework is built with a modular and decoupled architecture, ensuring it is highly flexible and can be easily extended for various research needs.

A Comprehensive Framework for Agent Training

AgentGym-RL stands out by encompassing a wide array of realistic scenarios, making it suitable for diverse applications. These include:

Web Navigation: Agents learn to interact with dynamic websites for tasks like booking flights or extracting information.
Deep Search: Agents perform multi-step, goal-directed queries using tools like browsers and Python interpreters, requiring strong information-seeking and reasoning skills.
Digital Games: Agents explore and solve problems in interactive game environments, focusing on real-time decision-making and strategy.
Embodied Tasks: Agents control virtual or physical bodies for navigation and manipulation, demanding spatial reasoning and goal-directed planning.
Scientific Tasks: Agents conduct experiments and solve problems in knowledge-intensive settings, requiring precise execution and evidence-based reasoning.

The framework supports mainstream RL algorithms such as PPO, GRPO, and REINFORCE++, alongside other training paradigms like Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). It also incorporates extensive engineering optimizations for scalability and reliability, including improved rollout parallelization and memory-leak mitigation. To foster collaboration and transparency, AgentGym-RL is open-source and includes a visualized user interface for detailed analysis of agent behavior.
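Of the algorithms listed above, GRPO is the easiest to illustrate briefly: it scores each sampled trajectory relative to the other trajectories sampled for the same task, rather than using a learned value function. The sketch below shows that group-relative normalization as it is commonly described for GRPO. It is not taken from the AgentGym-RL code; the function name, the `eps` term, and the use of the population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of rollouts for the same task.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` is an assumed stabilizer for groups where every reward is equal.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Trajectories that beat their group's average receive a positive advantage and are reinforced; those below average are discouraged. When every rollout in a group earns the same reward, all advantages are zero and the group contributes no learning signal.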

ScalingInter-RL: A Progressive Training Approach

A key innovation within AgentGym-RL is ScalingInter-RL, a training approach designed to achieve a better balance between exploration and exploitation during RL optimization. Traditional RL training can struggle with stability when agents are given too many interaction turns early on, leading to unproductive exploration and training collapse. Conversely, too few turns can limit an agent’s ability to discover diverse problem-solving strategies.

ScalingInter-RL addresses this by progressively extending the agent-environment interaction horizon. It begins by restricting the number of interactions in early stages, encouraging the agent to exploit its current knowledge and master basic skills efficiently. As training progresses, the interaction horizon gradually increases, promoting deeper exploration, refinement of behaviors, and the ability to tackle more complex challenges. This phased approach helps agents develop more diverse behaviors and reduces the likelihood of performance collapse in long-horizon tasks.
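The staged schedule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the stage boundaries, the turn caps, and the names `HorizonStage`, `max_turns_at`, and `rollout` are all assumptions chosen for the example, as is the simple `(observation, reward, done)` environment interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class HorizonStage:
    start_step: int  # first training step at which this stage applies
    max_turns: int   # cap on agent-environment turns per rollout


# Hypothetical schedule: the numbers are illustrative, not from the paper.
SCHEDULE = (
    HorizonStage(start_step=0, max_turns=5),
    HorizonStage(start_step=100, max_turns=10),
    HorizonStage(start_step=200, max_turns=15),
)


def max_turns_at(step: int, schedule: Sequence[HorizonStage] = SCHEDULE) -> int:
    """Return the turn cap of the latest stage that has started by `step`."""
    started = [s for s in schedule if s.start_step <= step]
    if not started:
        raise ValueError(f"no stage covers training step {step}")
    return max(started, key=lambda s: s.start_step).max_turns


def rollout(
    policy: Callable[[str], str],
    env_step: Callable[[str], Tuple[str, float, bool]],
    observation: str,
    turn_cap: int,
) -> List[Tuple[str, float]]:
    """Collect one episode, stopping at task completion or at the turn cap."""
    trajectory = []
    for _ in range(turn_cap):
        action = policy(observation)
        observation, reward, done = env_step(action)
        trajectory.append((action, reward))
        if done:
            break
    return trajectory
```

In a training loop, each rollout would be collected with `turn_cap=max_turns_at(step)`. Early steps then produce short episodes that favor exploitation, and the same `rollout` function later receives a larger cap, leaving room for longer exploratory trajectories.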

Impressive Performance and Key Insights

Extensive experiments validate the stability and effectiveness of both the AgentGym-RL framework and the ScalingInter-RL approach. Agents trained with this framework have demonstrated performance that matches or even surpasses commercial models across 27 tasks in diverse environments. For instance, an open-source 7B-scale model trained with AgentGym-RL achieved an average improvement of 33.65 points, rivaling or outperforming larger proprietary models like OpenAI-o3 and Gemini-2.5-Pro.

A significant insight from the research is that strategic investment in post-training and test-time computation can be more impactful than simply increasing a model’s parameter count. The 7B ScalingInter-RL model, for example, significantly outperformed much larger models (e.g., Llama3.1-70B and Qwen2.5-72B) in average success rate, highlighting the power of targeted training and inference-time optimization. The study also found that the effectiveness of reinforcement learning varies with environmental structure, yielding more substantial gains in simulated worlds with clear rules (like TextCraft, BabyAI, and SciWorld) compared to more open-ended environments (like WebArena and Deep Search).

The AgentGym-RL framework, together with the ScalingInter-RL method, is a significant step forward in training LLM agents for complex decision-making. The researchers plan to open-source the complete framework, including code and datasets, to support the broader research community in developing the next generation of intelligent agents. For more details, refer to the original research paper.
