TLDR: This research introduces a single-agent Reinforcement Learning (RL) model for regional adaptive traffic signal control (ATSC), departing from common multi-agent frameworks. The model uses a centralized agent to coordinate signal timings across multiple intersections, defining state and reward functions based on queue length, which can be reliably estimated using probe vehicle data. Evaluated using the SUMO simulation platform, the proposed model effectively mitigates large-scale regional congestion and significantly reduces total travel time, demonstrating a scalable and practical solution for urban traffic management.
Urban traffic congestion is a persistent challenge affecting quality of life, safety, environment, and economic performance in cities worldwide. A major contributor to this issue is queuing at intersections. Effective Traffic Signal Control (TSC) systems are crucial for managing traffic flow and alleviating congestion.
Traditionally, traffic engineers have relied on optimization models with assumptions about traffic dynamics. While these models simplify the problem, they often struggle to adapt to the complex and unpredictable nature of real-world urban traffic. This limitation has paved the way for more advanced, learning-based approaches.
Reinforcement Learning (RL), a branch of artificial intelligence, has emerged as a promising solution for developing smart TSC systems. RL allows systems to learn optimal strategies through trial and error, making it well-suited for dynamic traffic environments. When dealing with multiple intersections, many researchers have adopted multi-agent frameworks, where several AI agents work together. However, these multi-agent systems can introduce challenges related to scalability and coordination overhead.
A recent study proposes a novel approach: a single-agent Reinforcement Learning model for regional adaptive traffic signal control. This model challenges the prevailing multi-agent paradigm by advocating for a centralized control system, much like how a single control center typically manages traffic across an entire urban area. This single agent can monitor traffic conditions across all roads and coordinate all intersections, simplifying the policy optimization process by removing the need for complex inter-agent coordination.
A key innovation of this model is its compatibility with probe vehicle technology. Probe vehicles, such as ride-hailing cars or navigation system users, provide real-time travel time data across most urban roads without requiring additional infrastructure. This data is particularly valuable for estimating queue length, a critical metric for assessing congestion that traditional sensors often struggle to capture accurately. By defining the state and reward functions based on queue length, and designing actions to regulate queue dynamics, the model enhances policy learning efficiency and its potential for widespread deployment.
The RL model’s design includes three core components: state, action, and reward. The state observed by the agent comprises the congestion state (quantified by queue length for each road segment), the signal phase scheme (time allocation for traffic movements at each intersection), and a regional state representation using an adjacency matrix. This matrix effectively unifies signal parameters and traffic states, preserving the network’s topological connectivity.
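One way to picture such a regional state is sketched below. The summary does not give the paper's exact matrix layout, so the layout here is an assumption made purely for illustration: off-diagonal entry (i, j) holds the queue length on the link from intersection i to intersection j, and diagonal entry (i, i) holds the signal split of intersection i. The function name and the sample values are hypothetical.

```python
def build_state_matrix(num_intersections, links, queue_len, phase_split):
    """Return an N x N nested list combining topology, queues, and splits.

    Assumed layout (not taken from the paper):
      state[i][i] = signal split of intersection i
      state[u][v] = queue length on the link u -> v, 0.0 where no link exists
    """
    n = num_intersections
    state = [[0.0] * n for _ in range(n)]
    for i in range(n):
        state[i][i] = phase_split[i]
    for (u, v) in links:
        # Links without a reported queue default to 0.0. In this simple layout
        # an empty link is indistinguishable from a missing link.
        state[u][v] = queue_len.get((u, v), 0.0)
    return state


# Hypothetical three-intersection corridor: 0 <-> 1 <-> 2
links = [(0, 1), (1, 0), (1, 2), (2, 1)]
queues = {(0, 1): 35.0, (1, 2): 80.0, (2, 1): 12.5}  # metres, made-up values
splits = [0.5, 0.6, 0.4]                              # made-up green splits
state = build_state_matrix(3, links, queues, splits)
```

Because the zero pattern of the matrix follows the road network, intersections that are not directly connected never share an entry, which is the sense in which the representation keeps the topology.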
The action space is designed for scalability. Instead of adjusting all intersections simultaneously, the agent first selects an intersection and then chooses one of three actions: increase its signal phase split by a predefined increment, decrease it by a predefined decrement, or leave it unchanged. The action space therefore grows linearly with the number of intersections, avoiding the exponential growth seen in approaches that adjust every intersection at once.
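The scaling argument can be made concrete with a small sketch. It assumes a flat action index that encodes both the chosen intersection and the adjustment, which is one possible encoding rather than the paper's confirmed one. The step size and split bounds are hypothetical values.

```python
STEP = 0.05                      # hypothetical split increment/decrement
MIN_SPLIT, MAX_SPLIT = 0.1, 0.9  # hypothetical bounds on the split


def decode_action(action, num_intersections):
    """Map a flat index in [0, 3N) to (intersection, split change)."""
    if not 0 <= action < 3 * num_intersections:
        raise ValueError("action index out of range")
    intersection, choice = divmod(action, 3)
    delta = (STEP, -STEP, 0.0)[choice]  # increase, decrease, no change
    return intersection, delta


def apply_action(splits, action):
    """Return a new list of splits with one intersection adjusted and clamped."""
    intersection, delta = decode_action(action, len(splits))
    updated = list(splits)
    updated[intersection] = min(MAX_SPLIT, max(MIN_SPLIT, updated[intersection] + delta))
    return updated


def action_space_sizes(num_intersections):
    """Compare select-then-adjust (3N) with adjusting all at once (3^N)."""
    return 3 * num_intersections, 3 ** num_intersections


sequential, simultaneous = action_space_sizes(10)
```

With ten intersections, the select-then-adjust design has 30 discrete actions, while choosing one of three adjustments for every intersection at once would give 59,049 joint actions.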
The reward function is crucial for guiding the agent’s learning. It is defined based on the queue length of each link in the region, aiming to alleviate congestion and minimize total travel time. The model applies penalties for light and heavy congestion, with a higher penalty for severe congestion, encouraging the agent to proactively manage traffic flow.
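A tiered penalty of this kind might look like the sketch below. The summary does not state the thresholds or penalty magnitudes, so the values here are assumptions: congestion is measured as the ratio of queue length to link length, and the two thresholds and two penalties are hypothetical.

```python
LIGHT_THRESHOLD = 0.3  # hypothetical: queue fills 30% of the link
HEAVY_THRESHOLD = 0.7  # hypothetical: queue fills 70% of the link
LIGHT_PENALTY = -1.0   # hypothetical penalty for light congestion
HEAVY_PENALTY = -3.0   # hypothetical, larger penalty for heavy congestion


def link_penalty(queue_len, link_len):
    """Penalty for one link, based on how much of the link the queue occupies."""
    ratio = queue_len / link_len
    if ratio >= HEAVY_THRESHOLD:
        return HEAVY_PENALTY
    if ratio >= LIGHT_THRESHOLD:
        return LIGHT_PENALTY
    return 0.0


def regional_reward(queue_lens, link_lens):
    """Sum the per-link penalties over every link in the region."""
    return sum(link_penalty(q, l) for q, l in zip(queue_lens, link_lens))


# Three 100 m links with queues of 10 m, 50 m, and 90 m (made-up values)
reward = regional_reward([10.0, 50.0, 90.0], [100.0, 100.0, 100.0])
```

In this example the first link is uncongested, the second is lightly congested, and the third is heavily congested, so the regional reward is -4.0. Making the heavy penalty larger than the light one gives the agent an incentive to act before queues become severe.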
The researchers implemented the TSC environment using Gymnasium, an open-source Python library, and utilized SUMO, a traffic simulation software, for experiments. The DreamerV3 algorithm was chosen for policy training due to its data-efficient learning capabilities, which significantly reduce the need for extensive environment interaction compared to traditional RL methods.
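The interaction loop such an environment exposes can be sketched as follows. This is a toy stand-in, not the authors' environment. It mirrors Gymnasium's `reset` and `step` return conventions but deliberately imports neither Gymnasium nor SUMO, so that it runs on its own. The class name, the queue dynamics, and the reward (negative total queue) are all simplifying assumptions, and a random policy takes the place of DreamerV3.

```python
import random


class ToyRegionalSignalEnv:
    """Toy single-agent environment with Gymnasium-style reset/step returns."""

    def __init__(self, num_intersections=3, horizon=20, seed=0):
        self.n = num_intersections
        self.horizon = horizon
        self.rng = random.Random(seed)

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            self.rng = random.Random(seed)
        self.t = 0
        self.splits = [0.5] * self.n
        self.queues = [self.rng.uniform(20.0, 60.0) for _ in range(self.n)]
        return self._obs(), {}

    def step(self, action):
        # Flat action index: which intersection, then which adjustment.
        intersection, choice = divmod(action, 3)
        split = self.splits[intersection] + (0.05, -0.05, 0.0)[choice]
        self.splits[intersection] = min(0.9, max(0.1, split))
        # Made-up dynamics: random arrivals grow a queue, green time drains it.
        for i in range(self.n):
            arrivals = self.rng.uniform(0.0, 10.0)
            served = 15.0 * self.splits[i]
            self.queues[i] = max(0.0, self.queues[i] + arrivals - served)
        self.t += 1
        reward = -sum(self.queues)          # simplification of a queue-based reward
        terminated = False                  # no terminal state in this toy task
        truncated = self.t >= self.horizon  # episode ends at the time limit
        return self._obs(), reward, terminated, truncated, {}

    def _obs(self):
        return list(self.queues) + list(self.splits)


env = ToyRegionalSignalEnv()
obs, info = env.reset(seed=42)
total_reward, done = 0.0, False
while not done:
    action = env.rng.randrange(3 * env.n)  # random policy as a placeholder
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
```

In a real setup, the body of `step` would advance the traffic simulator and read queue lengths from it, and the random action would be replaced by the trained policy.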
Experimental results across two traffic scenarios demonstrated the effectiveness of the proposed single-agent model. Compared with a baseline using a fixed signal-timing scheme, the RL-based TSC model significantly reduced queue lengths and cut average travel time to 63% of the base-case value, a reduction of roughly 37%. This indicates successful congestion control and improved traffic flow.
This research marks a significant step towards more adaptive and efficient urban traffic management. By leveraging a centralized single-agent reinforcement learning framework and integrating with readily available probe vehicle data, the model offers a practical and scalable solution for mitigating urban congestion. Future work will focus on expanding experiments to larger road networks and integrating graph neural networks for even greater data efficiency and policy learning in complex traffic environments.


