TLDR: HeMAC is a new benchmark environment for Heterogeneous Multi-Agent Reinforcement Learning (HeMARL), where agents have different capabilities. It offers three challenges with increasing complexity and agent diversity (Quadcopter, Observer, Provisioner) to test how well AI algorithms coordinate diverse teams. Initial tests show that current MARL algorithms struggle significantly as heterogeneity increases, highlighting HeMAC’s value as a testbed and the need for new HeMARL research.
Multi-Agent Reinforcement Learning (MARL) is a rapidly expanding field that extends the capabilities of Deep Reinforcement Learning to scenarios involving multiple intelligent agents. While much of the early research focused on homogeneous agents—teams of identical robots or entities—the real world is often far more complex, featuring diverse agents with unique abilities, sensors, and resources. This is the realm of Heterogeneous Multi-Agent Reinforcement Learning (HeMARL).
Imagine a disaster response team comprising aerial drones, ground robots, and human operators, each with distinct roles and limitations. Coordinating such a diverse team effectively is a prime example of a HeMARL problem. Despite the prevalence of such scenarios in real-world applications, HeMARL has remained relatively underexplored, largely due to a significant gap: the lack of standardized environments to test and benchmark new algorithms.
Benchmarks such as the Arcade Learning Environment (ALE) and the StarCraft Multi-Agent Challenge (SMAC) have driven progress in single-agent RL and homogeneous MARL, but cooperative HeMARL lacked a comparably rigorous testbed. As a result, researchers often relied on overly simplistic environments where most algorithms performed well, or on weakly heterogeneous settings that didn’t capture the full complexity of the challenge.
Introducing the Heterogeneous Multi-Agent Challenge (HeMAC)
To address this need, researchers from THALES, cortAIx Labs Canada, have introduced the Heterogeneous Multi-Agent Challenge (HeMAC). This benchmarking environment, detailed in their paper "The Heterogeneous Multi-Agent Challenge", is built upon the PettingZoo standard, a multi-agent extension of the popular Gymnasium framework. HeMAC provides a suite of challenges designed to evaluate MARL algorithms in settings where both task complexity and agent heterogeneity can be varied and controlled.
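For readers unfamiliar with PettingZoo, the sketch below shows the general shape of its Parallel API interaction loop: every live agent submits an action in a dictionary, and the environment returns per-agent dictionaries of observations, rewards, terminations, truncations, and infos. `ToyParallelEnv`, the agent names, and the observation sizes are hypothetical stand-ins written for this example; they are not HeMAC's actual classes, identifiers, or dimensions.

```python
import random


class ToyParallelEnv:
    """Hypothetical stand-in that mimics the shape of PettingZoo's Parallel API."""

    def __init__(self, max_steps=5):
        # Assumed agent names; the real environment's identifiers may differ.
        self.possible_agents = ["quadcopter_0", "observer_0"]
        self.max_steps = max_steps
        self.agents = []

    def _observe(self, agent):
        # Different agent types receive observation vectors of different sizes.
        size = 4 if agent.startswith("quadcopter") else 6
        return [self._rng.random() for _ in range(size)]

    def reset(self, seed=None):
        self._rng = random.Random(seed)
        self._t = 0
        self.agents = list(self.possible_agents)
        observations = {a: self._observe(a) for a in self.agents}
        infos = {a: {} for a in self.agents}
        return observations, infos

    def step(self, actions):
        self._t += 1
        done = self._t >= self.max_steps
        observations = {a: self._observe(a) for a in self.agents}
        rewards = {a: 0.0 for a in self.agents}
        terminations = {a: False for a in self.agents}
        truncations = {a: done for a in self.agents}
        infos = {a: {} for a in self.agents}
        if done:
            # An empty agent list signals the end of the episode.
            self.agents = []
        return observations, rewards, terminations, truncations, infos


def rollout(env, seed=0):
    """Run one episode with a placeholder policy and return the step count."""
    observations, infos = env.reset(seed=seed)
    steps = 0
    while env.agents:
        actions = {agent: 0 for agent in env.agents}  # placeholder action
        observations, rewards, terminations, truncations, infos = env.step(actions)
        steps += 1
    return steps, observations
```

Because the loop only touches the dictionary-based interface, the same `rollout` function would in principle work with any environment that follows the Parallel API, regardless of how many agent types it contains.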
HeMAC features a 2D physics-based environment where a team of autonomous agents with distinct capabilities must coordinate to find and reach moving targets on a randomly generated map. The environment introduces three primary agent types, each with specialized roles:
- Quadcopter: Agile, low-altitude flying agents capable of reaching targets, but with limited energy and carry capacity in more complex scenarios.
- Observer: High-speed, high-altitude flying agents with a large field of view, primarily responsible for target detection and communication to guide Quadcopters. They have no energy constraints but must stay within range of buildings for communication.
- Provisioner: Autonomous ground vehicles that navigate road networks. In the most complex challenges, they can recharge Quadcopters and retrieve targets, acting as crucial logistical support.
The challenge in HeMAC stems from two main components: role specialization, driven by these capability differences, and the necessity for effective coordination among agents to achieve optimal performance. The tasks are inspired by complex real-world problems like the multi-depot vehicle routing problem.
The HeMAC Challenges
HeMAC offers three progressively complex challenges:
- Simple Fleet: Two types of agents (Quadcopter and Observer) cooperate to reach a single moving target as many times as possible. Observers guide Quadcopters, which are the only ones that can physically reach the target.
- Fleet: Extends the task to multi-target search with increased heterogeneity. Quadcopters now have limited energy and must recharge, while Observers have no energy constraints. Obstacles are introduced: Quadcopters must navigate around them, while Observers fly above.
- Complex Fleet: The most challenging testbed, introducing the Provisioner agent. In addition to limited energy, Quadcopters now have limited carry capacity and must bring targets back to a gathering point one at a time. Provisioners recharge Quadcopters and retrieve targets while navigating a road network. This challenge involves multi-modal mobility and resource transfer, demanding sophisticated coordination.
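The progression across the three challenges can be summarized as data. The sketch below is only a reading of the descriptions above: the `Challenge` class, its field names, and the `added_features` helper are hypothetical and are not part of the HeMAC codebase. Flags marked as assumed are carried over from the previous challenge on the premise that each level builds on the last.

```python
from dataclasses import dataclass

FLAG_NAMES = (
    "multi_target",
    "obstacles",
    "energy_limit",
    "carry_limit",
    "resource_transfer",
)


@dataclass(frozen=True)
class Challenge:
    name: str
    agent_types: tuple
    multi_target: bool = False
    obstacles: bool = False
    energy_limit: bool = False       # applies to Quadcopters
    carry_limit: bool = False        # applies to Quadcopters
    resource_transfer: bool = False  # recharging and target hand-over


SIMPLE_FLEET = Challenge("simple_fleet", ("quadcopter", "observer"))
FLEET = Challenge(
    "fleet",
    ("quadcopter", "observer"),
    multi_target=True,
    obstacles=True,
    energy_limit=True,
)
COMPLEX_FLEET = Challenge(
    "complex_fleet",
    ("quadcopter", "observer", "provisioner"),
    multi_target=True,   # implied by "one at a time"
    obstacles=True,      # assumed to carry over from Fleet
    energy_limit=True,
    carry_limit=True,
    resource_transfer=True,
)


def added_features(prev, nxt):
    """Return the flags and agent types that nxt enables on top of prev."""
    flags = [f for f in FLAG_NAMES if getattr(nxt, f) and not getattr(prev, f)]
    types = [t for t in nxt.agent_types if t not in prev.agent_types]
    return flags, types
```

Calling `added_features(FLEET, COMPLEX_FLEET)` lists what the hardest level adds under these assumptions: the carry limit, resource transfer, and the Provisioner type.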
A key aspect of HeMAC is its deliberate introduction of mixed (continuous and discrete) observation and action spaces with differing dimensions for each agent type. This pushes research towards HeMARL techniques that don’t rely on padding or homogenizing these spaces, which can often lead to inefficiencies.
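To make the cost of padding concrete, the sketch below compares two ways of handling agents whose observation vectors differ in length: padding everything to the largest size so one network can serve all agents, or grouping agents by type so each type gets its own policy. All dimensions, the action layout, and the `type_index` naming scheme are hypothetical values chosen for illustration, not HeMAC's actual spaces.

```python
# Hypothetical per-type observation sizes, chosen only for illustration.
OBS_DIMS = {"quadcopter": 12, "observer": 20, "provisioner": 8}

# Hypothetical mixed action layout: (continuous dims, discrete choices).
ACTION_SPEC = {
    "quadcopter": (2, 0),
    "observer": (2, 0),
    "provisioner": (0, 5),
}

TEAM = ["quadcopter_0", "quadcopter_1", "observer_0", "provisioner_0"]


def agent_type(agent_id):
    """Strip the trailing index, assuming ids look like 'type_index'."""
    return agent_id.rsplit("_", 1)[0]


def pad(vec, size):
    """Zero-pad a vector so every agent presents the same input size."""
    return list(vec) + [0.0] * (size - len(vec))


def padding_overhead(team):
    """Fraction of input entries that are padding when sharing one network."""
    dims = [OBS_DIMS[agent_type(a)] for a in team]
    total = max(dims) * len(dims)
    return (total - sum(dims)) / total


def group_by_type(team):
    """Alternative to padding: route each agent to a per-type policy."""
    groups = {}
    for agent in team:
        groups.setdefault(agent_type(agent), []).append(agent)
    return groups
```

With these illustrative numbers, `padding_overhead(TEAM)` evaluates to 0.35, meaning roughly a third of the shared network's inputs would carry no information. Grouping by type avoids that waste at the price of training several networks.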
Performance of Current Algorithms
The researchers benchmarked several state-of-the-art MARL algorithms on these challenges, including Independent Proximal Policy Optimization (IPPO), Multi-Agent Proximal Policy Optimization (MAPPO), and QMIX. The results were telling: while advanced algorithms like MAPPO performed well in simpler cooperative tasks, their effectiveness declined significantly as heterogeneity increased. IPPO sometimes even outperformed MAPPO in highly diverse scenarios. QMIX, which assumes shared action values and agent homogeneity, struggled considerably, even in simpler settings.
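One structural difference between IPPO and MAPPO is what the value function (critic) sees. In IPPO each agent's critic conditions only on that agent's local observation, while MAPPO trains a centralized critic on global information, for which a concatenation of all agents' observations is a common choice. The sketch below illustrates that difference with heterogeneous observation sizes; the function names and toy values are hypothetical, and it makes no claim about how the benchmarked implementations built their critic inputs or why one method outperformed the other.

```python
def ippo_critic_inputs(observations):
    """IPPO-style: each agent's critic sees only its own local observation."""
    return {agent: list(obs) for agent, obs in observations.items()}


def mappo_critic_input(observations):
    """MAPPO-style (one common choice): concatenate all observations.

    Agents are visited in sorted order so the layout of the joint vector
    stays fixed from step to step.
    """
    joint = []
    for agent in sorted(observations):
        joint.extend(observations[agent])
    return joint


# Toy observations with different lengths per agent type.
observations = {
    "quadcopter_0": [0.1, 0.2, 0.3],
    "observer_0": [0.4, 0.5, 0.6, 0.7, 0.8],
}

local_inputs = ippo_critic_inputs(observations)
joint_input = mappo_critic_input(observations)
```

Here the local critic inputs have lengths 3 and 5, while the joint input has length 8. As more agent types with larger observations join the team, the centralized input grows with the sum of their sizes.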
These findings underscore HeMAC’s value as a rigorous testbed, demonstrating that current MARL methods often lack the reliability to effectively tackle problems with high levels of agent heterogeneity. The benchmark highlights the urgent need for further research into novel HeMARL approaches that can handle such complexity and diversity.
Future Directions
The HeMAC environment is open-sourced, inviting community contributions to expand its scenarios and agent types. This initiative aims to establish HeMAC as a standard benchmark, fostering the development of innovative methods for heterogeneous multi-agent reinforcement learning and ultimately bridging the gap between current research and the demands of real-world multi-agent systems.


