A New Approach to Balancing Delivery Routes: The Generate-and-Split Framework for Multiple Traveling Salesmen

TLDR: The Generate-and-Split (GaS) framework is a novel two-stage reinforcement learning approach for the Min-Max Multiple Traveling Salesmen Problem (m3-TSP). It uses an LSTM-enhanced path generator to create a single route for all cities, which is then optimally split among multiple salesmen by an efficient algorithm to minimize the longest tour. Experiments show GaS significantly outperforms existing learning-based methods in solution quality and transferability across different problem scales and distributions.

The Min-Max Multiple Traveling Salesmen Problem (m3-TSP) is a complex challenge that extends the classic Traveling Salesman Problem. Instead of one salesman, it involves multiple salesmen who start from a common depot, visit all cities exactly once, and return to the depot. The primary goal of m3-TSP is to minimize the length of the longest individual tour among all salesmen, aiming for better load balance and reduced worst-case latency in real-world applications like logistics and drone delivery.

The Challenge of m3-TSP

Solving m3-TSP is notoriously difficult because it belongs to a class of problems known as NP-hard. This means that finding an exact solution becomes computationally impractical as the number of cities grows. Traditional methods, such as mathematical programming or heuristic search, often require significant time to find even a feasible solution. In recent years, learning-based approaches, particularly those using reinforcement learning (RL), have emerged as promising alternatives for quickly generating high-quality approximate solutions.

Existing learning-based methods often fall into two categories: end-to-end training, where the model directly generates solutions, and two-stage methods, which decompose the problem. While end-to-end training aims for optimal solutions directly, it can be complex. Two-stage methods simplify the learning objective by breaking the problem into parts, like dividing cities among salesmen or generating a single path and then splitting it. However, this decomposition can sometimes lead to inconsistent optimization and potentially lower solution quality compared to end-to-end approaches.

Introducing the Generate-and-Split (GaS) Framework

To overcome these limitations, researchers Wen Wang, Xiangchen Wu, Liang Wang, Hao Hu, Xianping Tao, and Linghao Zhang from Nanjing University and State Grid Sichuan Electric Power Research Institute have proposed a novel two-stage framework called Generate-and-Split (GaS). This framework intelligently combines reinforcement learning with an optimal splitting algorithm within a joint training process. The core idea is to simplify the learning task for the RL component while ensuring the splitting part is handled optimally and efficiently.

The GaS framework operates in two main steps. First, an RL-based path generator creates a single sequence of all cities to be visited. Unlike methods that assign cities to salesmen directly, this generator focuses on finding an efficient overall path. Second, an optimal splitting algorithm takes this generated path and divides it into individual tours for each salesman. The objective of this splitting is to minimize the maximum tour length, aligning directly with the m3-TSP goal.

How GaS Works: The Path Generator

The path generator is designed using reinforcement learning, specifically modeled as a Partially Observable Markov Decision Process (POMDP). This formulation acknowledges that the generator doesn’t have complete information about how the path will eventually be split among salesmen while it’s making decisions. To address this “partial observability,” the framework incorporates an LSTM (Long Short-Term Memory) enhanced decoder.

The encoder part of the generator processes the city coordinates and the ratio of cities to salesmen, creating embeddings that capture the problem’s structure. The decoder then uses these embeddings, along with a hidden state maintained by the LSTM, to sequentially select the next unvisited city. The LSTM helps the model remember past decisions and infer the true state of the problem, even with incomplete information. The reward for the RL agent is the negative of the maximum tour cost calculated by the splitting algorithm at the very end, encouraging the generator to produce paths that lead to balanced tours.

The Optimal Splitting Algorithm

A crucial component of GaS is its optimal splitting algorithm. After the path generator creates a sequence of all cities, this algorithm efficiently determines the best way to divide that sequence into `m` tours for the salesmen. A naive approach of trying every possible split point would be incredibly slow, especially with many salesmen. The GaS framework, however, introduces a highly efficient, deterministic algorithm that can approximate the optimal splitting strategy with near-linear scalability.

This algorithm works by converting the splitting problem into a decision problem: “Can the path be split among `m` salesmen such that no tour exceeds a certain cost threshold?” This decision is quickly answered using a greedy check. By repeatedly applying this greedy check within a binary search framework, the algorithm rapidly finds the minimum possible maximum tour cost for any given path. This efficient and accurate splitting mechanism provides a strong, consistent reward signal for training the RL-based path generator.

Also Read:

Performance and Transferability

Extensive experiments have shown that the GaS framework significantly outperforms existing learning-based approaches, such as Equity-Transformer (Eqt) and Dpn, in terms of solution quality. For instances with 50 and 100 cities, GaS achieved better results in most test configurations, demonstrating a statistically significant improvement. While learning-based methods still have some ground to cover compared to highly optimized classical heuristics like LKH3 for specific scenarios (e.g., very few salesmen), GaS shows a clear advantage in inference efficiency, solving problems in under one second.

Furthermore, GaS exhibits excellent transferability. When tested on problems with different city distributions (Rotation, Gaussian, Explosion), the trained GaS model performed as well as or better than baselines in over 90% of cases. It also demonstrated better performance when fine-tuned and applied to larger-scale problems (200 and 500 cities), especially when the number of salesmen varied significantly from the training settings. This indicates that GaS can adapt well to new and larger problem instances, a critical feature for real-world deployment.

In conclusion, the Generate-and-Split framework offers a powerful and efficient solution to the Min-Max Multiple Traveling Salesmen Problem. By decoupling path generation from splitting and integrating an LSTM-enhanced RL model with an optimal splitting algorithm, it achieves superior solution quality and impressive transferability. This work highlights a promising direction for combining algorithmic design with reinforcement learning to tackle complex combinatorial optimization problems. For more details, you can refer to the full research paper available here.

Financial Sector Fortifies Against Surging AI-Powered Scams

Deloitte’s 2025 Outlook: Navigating Escalating AI Challenges in Human Capital

Salesforce Study Reveals Data Quality is Pivotal for Employee Trust in AI Adoption

Top Executives Sidestep Company AI Guidelines, Fueling Shadow AI Risks

Intel’s Evolving IP Strategy: A Calculated Shift Towards Core AI Innovation

Generative AI Prompts Increased Workforce Surveillance in Indian IT Sector

Financial Sector Fortifies Against Surging AI-Powered Scams

Deloitte’s 2025 Outlook: Navigating Escalating AI Challenges in Human Capital

Salesforce Study Reveals Data Quality is Pivotal for Employee Trust in AI Adoption

Top Executives Sidestep Company AI Guidelines, Fueling Shadow AI Risks

Intel’s Evolving IP Strategy: A Calculated Shift Towards Core AI Innovation

Generative AI Prompts Increased Workforce Surveillance in Indian IT Sector

A New Approach to Balancing Delivery Routes: The Generate-and-Split Framework for Multiple Traveling Salesmen

The Challenge of m3-TSP

Introducing the Generate-and-Split (GaS) Framework

How GaS Works: The Path Generator

The Optimal Splitting Algorithm

Performance and Transferability

Gen AI News and Updates

Amazon Bedrock’s A2A Protocol: The Catalyst for Next-Gen Cross-Framework Multi-Agent AI Systems

AZTECH Introduces Comprehensive AI Training Series to Propel Regional Digital Transformation

Financial Sector Fortifies Against Surging AI-Powered Scams

Boosting Business Efficiency: A New AI and Big Data Model for Process Optimization

AlphaCast: A New Approach to Time Series Prediction Through Human-AI Collaboration

New Graph Neural Networks Improve Reasoning in Assumption-Based Argumentation

Enhancing AI Reasoning: How Recursive Refinement and Multi-Agent Systems Improve Language Model Performance

ARGUS: A Proactive Framework for Enhancing Autonomous Driving Safety

Generative AI Powers Next-Gen Autonomous Emergency Response

OR-R1: Advancing Automated Optimization with Smart, Data-Efficient AI

Enhancing GUI Agents with Memory: A New Framework for History-Aware Reasoning

ProBench: A Deeper Look into How We Evaluate AI Agents for Mobile Apps

Enhancing Large Language Model Reasoning with Concise Outputs

Ensuring Trust in Autonomous AI: A Two-Layered Monitoring Approach for Agentic Systems

MedFuse: A Multiplicative Approach to Understanding Irregular Clinical Time Series Data

HyperD: A New Framework for More Accurate and Robust Traffic Predictions

Beyond Training: Researchers Propose ‘Model Raising’ for AI with Intrinsic Values

Bridging the Divide: Why AI Needs a Qualitative Revolution

Language Models Enhance Safety Certificate Synthesis for Dynamic Systems

Subscribe to get the latest news and updates