Optimizing Urban Mobility: A New Multi-Agent Reinforcement Learning Approach for Resource Allocation

TLDR: HAG-PS is a new multi-agent reinforcement learning system designed to dynamically allocate urban mobility resources like shared bikes. It uses a hierarchical structure, adaptive agent grouping, and learnable identity embeddings to efficiently share policies and manage resources in large city environments. Tested with NYC bike sharing data, HAG-PS significantly improved bike availability and rebalancing compared to existing methods.

Urban environments face a constant challenge in balancing the demand and supply of mobility resources like shared bikes, e-scooters, and ride-sharing vehicles. Efficient allocation of these resources is vital for smooth urban mobility. Traditional methods often struggle with the dynamic nature of city environments and the sheer scale of operations.

A new research paper introduces Hierarchical Adaptive Grouping-based Parameter Sharing (HAG-PS), a multi-agent reinforcement learning (MARL) approach to these problems. The core idea is to let regional coordinators (agents) share policies for distributing mobility resources dynamically and adaptively, while keeping memory use low enough for city-wide deployment.

Addressing Key Challenges

The researchers identified two primary challenges in applying MARL to mobility resource allocation:

How to dynamically and adaptively share the mobility resource allocation policy among various coordinating agents.
How to achieve scalable and memory-efficient parameter sharing in a large urban setting.

HAG-PS addresses these by incorporating several innovative designs. It uses a hierarchical approach that considers both global and local information about mobility resource states, such as their distribution across different regions. This allows for more dynamic and adaptive policy sharing. Furthermore, the system employs an adaptive agent grouping mechanism that can split or merge groups of agents based on how similar their encoded trajectories (states, actions, and rewards) are. This ensures that agents with similar needs or behaviors can share policies effectively. To allow for individual agent specialization beyond simple policy copying, HAG-PS also includes learnable identity (ID) embeddings for each agent.
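To make the role of identity embeddings concrete, here is a minimal sketch, not the paper's implementation. It assumes a single linear layer shared by a group and a small per-agent vector appended to each observation; the class name `SharedPolicy`, the weight values, and the embedding size are all hypothetical. In a real system both the shared weights and the embeddings would be learned by gradient descent rather than set by hand.

```python
from typing import Dict, List


def linear(weights: List[List[float]], x: List[float]) -> List[float]:
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class SharedPolicy:
    """One set of weights shared by a group, specialized per agent via an ID embedding.

    Hypothetical illustration: the source only states that each agent has a
    learnable identity embedding; the concatenation scheme is an assumption.
    """

    def __init__(self, weights: List[List[float]], id_embeddings: Dict[str, List[float]]):
        self.weights = weights              # shared by every agent in the group
        self.id_embeddings = id_embeddings  # one small vector per agent

    def scores(self, agent_id: str, observation: List[float]) -> List[float]:
        # Append the agent's embedding so identical observations can still
        # yield different action scores for different agents.
        return linear(self.weights, observation + self.id_embeddings[agent_id])


policy = SharedPolicy(
    weights=[[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]],  # 2 actions, 2 obs features + 1 ID feature
    id_embeddings={"r1": [0.5], "r2": [-0.5]},
)
same_observation = [1.0, 2.0]
print(policy.scores("r1", same_observation))
print(policy.scores("r2", same_observation))
```

The two agents use the same weights, so memory grows only by one short vector per agent, yet their action scores differ for the same observation.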

How HAG-PS Works

The system discretizes the urban service area into numerous rectangular regions and divides the time horizon into intervals. Each agent, acting as a re-allocator, manages resources within a specific region at each time interval. The system considers a global state, which includes temporal information (time of day, day of week), distribution of available resources, historical pickup statistics, and urban environment features like roads and points of interest.
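The discretization step can be sketched as follows. This is an illustrative assumption about how a location and a timestamp might be mapped to a region and a time interval; the function names, the planar coordinate system, and the cell and interval sizes are hypothetical and do not come from the paper.

```python
from typing import Tuple


def region_index(x: float, y: float, x_min: float, y_min: float,
                 cell_w: float, cell_h: float, cols: int, rows: int) -> Tuple[int, int]:
    """Map a planar coordinate to the (row, col) of its rectangular region."""
    col = int((x - x_min) // cell_w)
    row = int((y - y_min) // cell_h)
    if not (0 <= col < cols and 0 <= row < rows):
        raise ValueError("coordinate lies outside the service area")
    return row, col


def time_interval(seconds_since_midnight: int, interval_seconds: int) -> int:
    """Map a time of day to the index of the interval that contains it."""
    return seconds_since_midnight // interval_seconds


# A 3 x 4 grid of 1 km cells, with coordinates in kilometers from the grid origin.
print(region_index(2.5, 0.5, 0.0, 0.0, 1.0, 1.0, cols=4, rows=3))
print(time_interval(3 * 3600 + 10, 3600))
```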

Agents determine actions by deciding how many mobility resources to relocate to adjacent regions (north, south, east, and west). The system then updates resource availability based on these actions, pickup requests, and drop-offs. A reward function guides the learning process, favoring high service ratios (fulfilled demand), penalizing unfulfilled demand, and discouraging excessive relocation costs.
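The transition and reward described above might look like the sketch below. It is a simplified reading of the text, not the paper's formulation: the order of operations (relocate, then serve pickups, then add drop-offs), the reward weights `w_service`, `w_unmet`, and `w_move`, and all function names are assumptions.

```python
from typing import Dict, Tuple

Region = Tuple[int, int]  # (row, col) of a grid cell

# Grid offsets for the four relocation directions named in the text.
DIRECTIONS: Dict[str, Region] = {
    "north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1),
}


def step(supply: Dict[Region, int], moves: Dict[Region, Dict[str, int]],
         pickups: Dict[Region, int], dropoffs: Dict[Region, int],
         rows: int, cols: int) -> Tuple[Dict[Region, int], int, int, int]:
    """Apply relocations, serve pickups, add drop-offs.

    Returns (new_supply, served, unmet, relocated).
    """
    new_supply = dict(supply)
    relocated = 0
    for (r, c), per_direction in moves.items():
        for name, count in per_direction.items():
            dr, dc = DIRECTIONS[name]
            target = (r + dr, c + dc)
            if not (0 <= target[0] < rows and 0 <= target[1] < cols):
                continue  # ignore moves that would leave the grid
            count = min(count, new_supply[(r, c)])  # cannot move more than is available
            new_supply[(r, c)] -= count
            new_supply[target] += count
            relocated += count
    served = unmet = 0
    for region, demand in pickups.items():
        fulfilled = min(demand, new_supply[region])
        new_supply[region] -= fulfilled
        served += fulfilled
        unmet += demand - fulfilled
    for region, count in dropoffs.items():
        new_supply[region] += count
    return new_supply, served, unmet, relocated


def reward(served: int, unmet: int, relocated: int,
           w_service: float = 1.0, w_unmet: float = 0.1, w_move: float = 0.01) -> float:
    """Favor a high service ratio; penalize unmet demand and relocation volume."""
    total = served + unmet
    service_ratio = served / total if total else 1.0
    return w_service * service_ratio - w_unmet * unmet - w_move * relocated
```

For example, on a 2 x 2 grid with five bikes in one cell, moving two of them east before demand arrives serves three of four pickup requests, and the reward reflects both the unmet request and the two relocated bikes.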

The hierarchical adaptive grouping is central to HAG-PS. It dynamically assigns roles to agents by forming global groups for macro-coordination (e.g., for a district) and local groups for micro-coordination (e.g., for neighborhoods). Agents within a global group share a feature network, while local groups maintain compact actor-critic networks. After each learning episode, agents encode their recent trajectories, and these embeddings are used to decide whether to split or merge groups. For instance, if agents within a group become too dissimilar in their behavior, the group might split. Conversely, similar groups might merge. The system also adaptively adjusts how frequently these regrouping operations occur, making them less frequent when the system’s behavior stabilizes.
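The split and merge logic can be illustrated with a small sketch. It assumes each agent's trajectory has already been encoded as a vector and uses cosine similarity with fixed thresholds; the thresholds, the greedy merge order, the period-doubling rule, and every function name here are hypothetical stand-ins, since the article does not give the paper's exact criteria.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def regroup(groups: List[List[str]], emb: Dict[str, Vector],
            split_below: float = 0.5, merge_above: float = 0.9) -> List[List[str]]:
    """Split dissimilar members out of each group, then merge similar groups."""
    after_split: List[List[str]] = []
    for members in groups:
        center = centroid([emb[m] for m in members])
        stay = [m for m in members if cosine(emb[m], center) >= split_below]
        leave = [m for m in members if cosine(emb[m], center) < split_below]
        after_split.extend(g for g in (stay, leave) if g)
    merged: List[List[str]] = []
    for members in after_split:
        center = centroid([emb[m] for m in members])
        for target in merged:
            # Greedy: join the first existing group whose centroid is close enough.
            if cosine(center, centroid([emb[m] for m in target])) >= merge_above:
                target.extend(members)
                break
        else:
            merged.append(list(members))
    return merged


def next_period(period: int, groups_changed: bool,
                min_period: int = 1, max_period: int = 32) -> int:
    """Regroup less often while grouping is stable; reset when it changes."""
    return min_period if groups_changed else min(period * 2, max_period)


embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [-1.0, 0.2]}
print(regroup([["a", "b", "c"]], embeddings))  # agent c behaves differently and is split off
print(regroup([["a"], ["b"]], embeddings))     # a and b are similar and are merged
```

The doubling schedule in `next_period` is only one way to make regrouping rarer as behavior stabilizes; the paper may use a different rule.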

Experimental Validation

The researchers conducted extensive experiments using real-world NYC bike sharing data, comprising over 1.2 million trips from January 2024. The study area covered 106 one-square-kilometer regions in central Manhattan. HAG-PS was compared against several baseline approaches, including methods with no sharing, full sharing, selective sharing, and dynamic sharing.

HAG-PS outperformed every baseline, achieving a fulfilled service ratio of 77.21% and rebalancing 472,212 bikes. Ablation studies, in which individual components of HAG-PS were removed, showed that each design element contributed to overall performance: the identity embeddings, the split-merge operations, the hierarchical grouping, and the adaptive regrouping period. For example, removing the hierarchical adaptive grouping led to a 4% decrease in fulfilled service ratio.

Conclusion and Future Directions

HAG-PS offers a robust solution for dynamic mobility resource allocation, effectively addressing the challenges of adaptive policy sharing and memory efficiency in urban-scale settings. The successful application to NYC bike sharing data validates its potential. Future work will involve expanding experimental studies and evaluating the system with multi-city data. For more technical details, you can refer to the full research paper available at arXiv.
