spot_img
HomeResearch & DevelopmentA New Approach to Balancing Delivery Routes: The Generate-and-Split...

A New Approach to Balancing Delivery Routes: The Generate-and-Split Framework for Multiple Traveling Salesmen

TLDR: The Generate-and-Split (GaS) framework is a novel two-stage reinforcement learning approach for the Min-Max Multiple Traveling Salesmen Problem (m3-TSP). It uses an LSTM-enhanced path generator to create a single route for all cities, which is then optimally split among multiple salesmen by an efficient algorithm to minimize the longest tour. Experiments show GaS significantly outperforms existing learning-based methods in solution quality and transferability across different problem scales and distributions.

The Min-Max Multiple Traveling Salesmen Problem (m3-TSP) is a complex challenge that extends the classic Traveling Salesman Problem. Instead of one salesman, it involves multiple salesmen who start from a common depot, visit all cities exactly once, and return to the depot. The primary goal of m3-TSP is to minimize the length of the longest individual tour among all salesmen, aiming for better load balance and reduced worst-case latency in real-world applications like logistics and drone delivery.

The Challenge of m3-TSP

Solving m3-TSP is notoriously difficult because it belongs to a class of problems known as NP-hard. This means that finding an exact solution becomes computationally impractical as the number of cities grows. Traditional methods, such as mathematical programming or heuristic search, often require significant time to find even a feasible solution. In recent years, learning-based approaches, particularly those using reinforcement learning (RL), have emerged as promising alternatives for quickly generating high-quality approximate solutions.

Existing learning-based methods often fall into two categories: end-to-end training, where the model directly generates solutions, and two-stage methods, which decompose the problem. While end-to-end training aims for optimal solutions directly, it can be complex. Two-stage methods simplify the learning objective by breaking the problem into parts, like dividing cities among salesmen or generating a single path and then splitting it. However, this decomposition can sometimes lead to inconsistent optimization and potentially lower solution quality compared to end-to-end approaches.

Introducing the Generate-and-Split (GaS) Framework

To overcome these limitations, researchers Wen Wang, Xiangchen Wu, Liang Wang, Hao Hu, Xianping Tao, and Linghao Zhang from Nanjing University and State Grid Sichuan Electric Power Research Institute have proposed a novel two-stage framework called Generate-and-Split (GaS). This framework intelligently combines reinforcement learning with an optimal splitting algorithm within a joint training process. The core idea is to simplify the learning task for the RL component while ensuring the splitting part is handled optimally and efficiently.

The GaS framework operates in two main steps. First, an RL-based path generator creates a single sequence of all cities to be visited. Unlike methods that assign cities to salesmen directly, this generator focuses on finding an efficient overall path. Second, an optimal splitting algorithm takes this generated path and divides it into individual tours for each salesman. The objective of this splitting is to minimize the maximum tour length, aligning directly with the m3-TSP goal.

How GaS Works: The Path Generator

The path generator is designed using reinforcement learning, specifically modeled as a Partially Observable Markov Decision Process (POMDP). This formulation acknowledges that the generator doesn’t have complete information about how the path will eventually be split among salesmen while it’s making decisions. To address this “partial observability,” the framework incorporates an LSTM (Long Short-Term Memory) enhanced decoder.

The encoder part of the generator processes the city coordinates and the ratio of cities to salesmen, creating embeddings that capture the problem’s structure. The decoder then uses these embeddings, along with a hidden state maintained by the LSTM, to sequentially select the next unvisited city. The LSTM helps the model remember past decisions and infer the true state of the problem, even with incomplete information. The reward for the RL agent is the negative of the maximum tour cost calculated by the splitting algorithm at the very end, encouraging the generator to produce paths that lead to balanced tours.

The Optimal Splitting Algorithm

A crucial component of GaS is its optimal splitting algorithm. After the path generator creates a sequence of all cities, this algorithm efficiently determines the best way to divide that sequence into `m` tours for the salesmen. A naive approach of trying every possible split point would be incredibly slow, especially with many salesmen. The GaS framework, however, introduces a highly efficient, deterministic algorithm that can approximate the optimal splitting strategy with near-linear scalability.

This algorithm works by converting the splitting problem into a decision problem: “Can the path be split among `m` salesmen such that no tour exceeds a certain cost threshold?” This decision is quickly answered using a greedy check. By repeatedly applying this greedy check within a binary search framework, the algorithm rapidly finds the minimum possible maximum tour cost for any given path. This efficient and accurate splitting mechanism provides a strong, consistent reward signal for training the RL-based path generator.

Also Read:

Performance and Transferability

Extensive experiments have shown that the GaS framework significantly outperforms existing learning-based approaches, such as Equity-Transformer (Eqt) and Dpn, in terms of solution quality. For instances with 50 and 100 cities, GaS achieved better results in most test configurations, demonstrating a statistically significant improvement. While learning-based methods still have some ground to cover compared to highly optimized classical heuristics like LKH3 for specific scenarios (e.g., very few salesmen), GaS shows a clear advantage in inference efficiency, solving problems in under one second.

Furthermore, GaS exhibits excellent transferability. When tested on problems with different city distributions (Rotation, Gaussian, Explosion), the trained GaS model performed as well as or better than baselines in over 90% of cases. It also demonstrated better performance when fine-tuned and applied to larger-scale problems (200 and 500 cities), especially when the number of salesmen varied significantly from the training settings. This indicates that GaS can adapt well to new and larger problem instances, a critical feature for real-world deployment.

In conclusion, the Generate-and-Split framework offers a powerful and efficient solution to the Min-Max Multiple Traveling Salesmen Problem. By decoupling path generation from splitting and integrating an LSTM-enhanced RL model with an optimal splitting algorithm, it achieves superior solution quality and impressive transferability. This work highlights a promising direction for combining algorithmic design with reinforcement learning to tackle complex combinatorial optimization problems. For more details, you can refer to the full research paper available here.

Nikhil Patel
Nikhil Patelhttps://blogs.edgentiq.com
Nikhil Patel is a tech analyst and AI news reporter who brings a practitioner's perspective to every article. With prior experience working at an AI startup, he decodes the business mechanics behind product innovations, funding trends, and partnerships in the GenAI space. Nikhil's insights are sharp, forward-looking, and trusted by insiders and newcomers alike. You can reach him out at: [email protected]

- Advertisement -

spot_img

Gen AI News and Updates

spot_img

- Advertisement -