TLDR: Researchers developed a new reinforcement learning environment for Generals.io and trained an AI agent that achieved top 0.003% ranking on the human 1v1 leaderboard. The agent uses supervised pre-training and self-play with reward shaping and memory features, demonstrating advanced strategic behaviors and outperforming previous state-of-the-art bots.
A new research paper titled “Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning” applies reinforcement learning to the real-time strategy game Generals.io. Authored by Matej Straka and Martin Schmid from Charles University and EquiLibre Technologies, Inc., the work presents a new environment and a highly competitive AI agent for the popular online game. The paper was published on July 9, 2025.
Generals.io is a browser-based real-time strategy game played on a grid, where players aim to be the last one standing by capturing their opponent’s general. The game involves expanding territory, managing armies, and strategic maneuvering under partial observability, meaning players can only see their owned cells and their immediate surroundings. This complexity, combined with an active human player base, makes it an ideal benchmark for multi-agent reinforcement learning research, offering challenges comparable to larger games like StarCraft II but with a lighter computational footprint.
The core contributions of this research are twofold. Firstly, the authors developed a new real-time strategy environment that is vectorized, compatible with popular reinforcement learning frameworks like Gymnasium and PettingZoo, and capable of running thousands of frames per second on standard hardware. This environment is designed to be a flexible and customizable testbed for AI experimentation, even allowing trained agents to be deployed directly to official Generals.io servers for real-world comparison.
Secondly, they trained a Proximal Policy Optimization (PPO)-based agent that, after just 36 hours of training on a single H100 GPU, reached the top 0.003% of the 1v1 human leaderboard. Training had two stages: an initial phase of behavior cloning, where the agent learned from a curated dataset of expert human replays, followed by self-play fine-tuning, where the agent kept improving by competing against a pool of its own past versions. To make learning more efficient and steer the agent toward more robust strategies, the researchers added potential-based reward shaping and memory features, which let the agent retain crucial information about the game state over time.
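To make the idea of potential-based reward shaping concrete, here is a minimal Python sketch of the general technique, which adds the term γΦ(s′) − Φ(s) to the environment reward. It is not the paper's implementation: the potential function, its weights, the state fields (`my_land`, `opp_land`, `my_army`, `opp_army`), and the discount value are all illustrative assumptions.

```python
from typing import Callable, Dict

# Hypothetical state summary; the field names are assumptions for illustration.
State = Dict[str, float]

GAMMA = 0.99  # assumed discount factor, not taken from the paper


def potential(state: State) -> float:
    """Hypothetical potential: weighted land and army advantage over the opponent."""
    land_lead = state["my_land"] - state["opp_land"]
    army_lead = state["my_army"] - state["opp_army"]
    return 0.5 * land_lead + 0.1 * army_lead


def shaped_reward(
    env_reward: float,
    state: State,
    next_state: State,
    done: bool,
    gamma: float = GAMMA,
    phi: Callable[[State], float] = potential,
) -> float:
    """Return env_reward + gamma * phi(next_state) - phi(state).

    The potential of a terminal state is treated as zero, so the shaping
    terms telescope over an episode and do not change which policy is optimal.
    """
    next_phi = 0.0 if done else phi(next_state)
    return env_reward + gamma * next_phi - phi(state)


if __name__ == "__main__":
    before = {"my_land": 10, "opp_land": 8, "my_army": 30, "opp_army": 25}
    after = {"my_land": 12, "opp_land": 8, "my_army": 30, "opp_army": 25}
    # Capturing two cells yields a positive shaping bonus even though the
    # environment reward (win/loss) is still zero at this step.
    print(shaped_reward(0.0, before, after, done=False))
```

Because the bonus is a difference of potentials, the agent gets dense feedback for intermediate progress such as gaining land, while the sparse win/loss signal still determines which policies are optimal.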
The evaluation of the agent demonstrated its superior performance against both human experts and existing bots. Named “zero v3,” the agent consistently ranked among the top 25 players globally. In head-to-head matches, it achieved a 54.82% win-rate against “Human.exe,” which was previously considered the state-of-the-art community-developed bot, engineered without machine learning. The research also highlighted several emergent strategic behaviors exhibited by the AI, including sophisticated feints and sidesteps, effective “snowballing” (converting small leads into larger ones), and “backdooring” tactics where it creates isolated pockets within enemy territory for surprise attacks. While highly effective, the agent did show some limitations, such as occasionally getting stuck in dead ends or focusing too narrowly on one aspect of the game without balancing offense, defense, and resource acquisition.
This work establishes a new, accessible, yet strategically rich benchmark for the reinforcement learning community, paving the way for further innovations in multi-agent AI. Future research directions include extending the benchmark to multi-team and free-for-all game modes, adopting the JAX framework for higher performance, and exploring graph neural networks for agent policies to better capture the game’s inherent graph structure. For more technical details, refer to the full research paper.


