New AI Agent Masters Generals.io, Reaching Top 0.003% on Human Leaderboard

TLDR: Researchers developed a new reinforcement learning environment for Generals.io and trained an AI agent that achieved top 0.003% ranking on the human 1v1 leaderboard. The agent uses supervised pre-training and self-play with reward shaping and memory features, demonstrating advanced strategic behaviors and outperforming previous state-of-the-art bots.

A new research paper, “Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning,” presents a new reinforcement learning environment and a highly competitive AI agent for the popular online real-time strategy game Generals.io. The paper was authored by Matej Straka and Martin Schmid from Charles University and EquiLibre Technologies, Inc., and was published on July 9, 2025.

Generals.io is a browser-based real-time strategy game played on a grid, where players aim to be the last one standing by capturing their opponent’s general. The game involves expanding territory, managing armies, and strategic maneuvering under partial observability, meaning players can only see their owned cells and their immediate surroundings. This complexity, combined with an active human player base, makes it an ideal benchmark for multi-agent reinforcement learning research, offering challenges comparable to larger games like StarCraft II but with a lighter computational footprint.
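To make the partial-observability rule concrete, here is a minimal Python sketch of a fog-of-war mask. It is an illustration only, not code from the paper: the function names are hypothetical, and it assumes that "immediate surroundings" means the eight cells around each owned cell.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, col)


def visible_cells(owned: Set[Cell], height: int, width: int) -> Set[Cell]:
    """Return every cell a player can see: owned cells plus their neighbours.

    Assumption: "immediate surroundings" means the 8 cells around each owned
    cell (including diagonals), clipped to the grid.
    """
    visible: Set[Cell] = set()
    for row, col in owned:
        for d_row in (-1, 0, 1):
            for d_col in (-1, 0, 1):
                r, c = row + d_row, col + d_col
                if 0 <= r < height and 0 <= c < width:
                    visible.add((r, c))
    return visible


def render_fog(owned: Set[Cell], height: int, width: int) -> List[str]:
    """Draw the grid: 'O' owned, '.' visible but not owned, '#' hidden by fog."""
    seen = visible_cells(owned, height, width)
    rows = []
    for r in range(height):
        rows.append("".join(
            "O" if (r, c) in owned else "." if (r, c) in seen else "#"
            for c in range(width)
        ))
    return rows
```

Calling `render_fog({(0, 0)}, 4, 4)` shows a player who owns only the top-left corner: that cell and its three in-grid neighbours are visible, and the rest of the map is hidden.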

The core contributions of this research are twofold. Firstly, the authors developed a new real-time strategy environment that is vectorized, compatible with popular reinforcement learning frameworks like Gymnasium and PettingZoo, and capable of running thousands of frames per second on standard hardware. This environment is designed to be a flexible and customizable testbed for AI experimentation, even allowing trained agents to be deployed directly to official Generals.io servers for real-world comparison.
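As a rough illustration of what "vectorized" means in practice, the Python sketch below steps several games with a single call. `ToyBatchedEnv` is a hypothetical stand-in, not the authors' environment or its API: the game logic is replaced by a simple counter, and only the Gymnasium-style return shapes, `(observations, info)` from `reset` and a five-element tuple from `step`, are borrowed.

```python
from typing import Dict, List, Tuple


class ToyBatchedEnv:
    """Toy stand-in for a vectorized environment (hypothetical, not the paper's API).

    Each of the `num_envs` games is just a counter that terminates once it
    reaches `horizon`; real game logic is deliberately omitted.
    """

    def __init__(self, num_envs: int, horizon: int = 3) -> None:
        self.num_envs = num_envs
        self.horizon = horizon
        self.steps = [0] * num_envs

    def reset(self) -> Tuple[List[int], Dict]:
        """Gymnasium-style reset: returns (observations, info)."""
        self.steps = [0] * self.num_envs
        return list(self.steps), {}

    def step(
        self, actions: List[int]
    ) -> Tuple[List[int], List[float], List[bool], List[bool], Dict]:
        """Advance every game by one frame with one call.

        Returns the Gymnasium-style 5-tuple
        (observations, rewards, terminated, truncated, info), batched per game.
        """
        if len(actions) != self.num_envs:
            raise ValueError("expected one action per environment")
        self.steps = [s + 1 for s in self.steps]
        terminated = [s >= self.horizon for s in self.steps]
        rewards = [1.0 if done else 0.0 for done in terminated]
        truncated = [False] * self.num_envs
        return list(self.steps), rewards, terminated, truncated, {}
```

Because every call advances all games at once, a training loop pays the per-step overhead once per batch rather than once per game. That is one common way environments reach high frame rates, although the paper's own implementation may achieve its speed differently.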

Secondly, they trained a Proximal Policy Optimization (PPO)-based agent that, after just 36 hours of training on a single H100 GPU, reached the top 0.003% of the 1v1 human leaderboard. Training had two stages: an initial phase of behavior cloning, in which the agent learned from a curated dataset of expert human replays, followed by self-play fine-tuning, in which it continuously improved by competing against a pool of its own past versions. To improve learning efficiency and steer the agent toward more robust strategies, the researchers added potential-based reward shaping and memory features that let the agent retain crucial information about the game state over time.
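Two of these ingredients can be sketched in a few lines of Python. The code below is a hedged illustration, not the authors' implementation. The shaping term uses the standard potential-based form, r + γΦ(s') − Φ(s). The potential function (share of army units), the state fields, and the `OpponentPool` class with its uniform sampling and oldest-first eviction are all assumptions made for this example.

```python
import random
from typing import Callable, List

State = dict  # hypothetical state: {"my_army": int, "enemy_army": int}


def potential(state: State) -> float:
    """Hypothetical potential: the player's share of all army units on the map."""
    total = state["my_army"] + state["enemy_army"]
    return state["my_army"] / total if total else 0.5


def shaped_reward(reward: float, state: State, next_state: State,
                  gamma: float = 0.99,
                  phi: Callable[[State], float] = potential) -> float:
    """Potential-based shaping: r + gamma * phi(s') - phi(s)."""
    return reward + gamma * phi(next_state) - phi(state)


class OpponentPool:
    """Keeps frozen snapshots of past policies to play against."""

    def __init__(self, max_size: int = 10, seed: int = 0) -> None:
        self.max_size = max_size
        self.snapshots: List[str] = []  # snapshot identifiers, e.g. checkpoint names
        self.rng = random.Random(seed)

    def add(self, snapshot: str) -> None:
        """Store a snapshot, dropping the oldest once the pool is full."""
        self.snapshots.append(snapshot)
        if len(self.snapshots) > self.max_size:
            self.snapshots.pop(0)

    def sample(self) -> str:
        """Pick an opponent uniformly at random (the sampling scheme is assumed)."""
        return self.rng.choice(self.snapshots)
```

Shaping terms of this form telescope along a trajectory: their discounted sum depends only on the first and last states. This is why potential-based shaping can add a denser learning signal without changing which policies are optimal.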

The evaluation of the agent demonstrated its superior performance against both human experts and existing bots. Named “zero v3,” the agent consistently ranked among the top 25 players globally. In head-to-head matches, it achieved a 54.82% win-rate against “Human.exe,” which was previously considered the state-of-the-art community-developed bot, engineered without machine learning. The research also highlighted several emergent strategic behaviors exhibited by the AI, including sophisticated feints and sidesteps, effective “snowballing” (converting small leads into larger ones), and “backdooring” tactics where it creates isolated pockets within enemy territory for surprise attacks. While highly effective, the agent did show some limitations, such as occasionally getting stuck in dead ends or focusing too narrowly on one aspect of the game without balancing offense, defense, and resource acquisition.

This work establishes a new, accessible, yet strategically rich benchmark for the reinforcement learning community, paving the way for further innovations in multi-agent AI. Future research directions include extending the benchmark to multi-team and free-for-all game modes, adopting the JAX framework for higher performance, and exploring graph neural networks for agent policies to better capture the game’s inherent graph structure. For more technical details, refer to the full research paper.
