TLDR: A new unsupervised, non-autoregressive framework for the Traveling Salesman Problem (TSP) learns solutions directly as permutations, without explicit search. The model approximates permutation matrices through continuous relaxations and converts them into tours by transforming a fixed Hamiltonian cycle. This method, especially with an ensemble approach, achieves competitive performance against classical heuristics, demonstrating that inherent problem structure can guide combinatorial optimization effectively without sequential decision-making or ground-truth supervision.
The Traveling Salesman Problem (TSP) is a classic challenge in computer science, asking for the shortest possible route that visits each city in a given set exactly once and returns to the starting city. Traditionally, solving TSP has relied heavily on search methods, from simple greedy approaches to complex local improvement algorithms and exact solvers like Concorde, which can become computationally very expensive for larger problems.
Recent advancements in artificial intelligence have explored using neural networks to tackle such combinatorial optimization problems. These often fall into categories like reinforcement learning (RL), where models learn to build tours step-by-step, or supervised learning (SL), which requires pre-computed optimal solutions for training. However, RL can struggle with sparse rewards, and SL demands expensive ground-truth data. Crucially, both still typically involve some form of search.
A new research paper, titled “Structure As Search: Unsupervised Permutation Learning for Combinatorial Optimization,” introduces a fundamentally different approach. Instead of relying on explicit search or supervision, this framework allows solutions to the TSP to emerge directly from learned permutations. The core idea is that the inherent structure of the problem itself can guide the optimization process.
The researchers propose a non-autoregressive method, meaning it doesn’t build the tour one city at a time. Instead, it starts from a fixed Hamiltonian cycle (a closed tour that visits every city exactly once and returns to the start) and reorders it with a similarity transformation. The model learns to approximate the permutation matrices driving that transformation, which encode the order in which cities are visited, through continuous relaxations. This makes the whole pipeline differentiable, enabling standard gradient-based optimization.
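To make the representation concrete, here is a minimal numpy sketch (an illustration, not the paper’s code) that assumes the fixed cycle is the canonical one, 0 → 1 → … → n−1 → 0, encoded as a cyclic-shift matrix C. Conjugating C by a permutation matrix P, i.e. computing P C Pᵀ, yields the adjacency matrix of another Hamiltonian cycle, from which the tour length follows directly:

```python
import numpy as np

n = 5
# C encodes the canonical cycle 0 -> 1 -> ... -> n-1 -> 0:
# row i has a single 1 in column (i + 1) mod n.
C = np.roll(np.eye(n), shift=1, axis=1)

# A permutation matrix P relabels the cities; P @ C @ P.T is then the
# adjacency matrix of a (generally different) Hamiltonian cycle.
perm = np.random.permutation(n)
P = np.eye(n)[perm]
tour_adj = P @ C @ P.T

# The tour length is the distance matrix summed over the cycle's edges.
coords = np.random.rand(n, 2)
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
tour_length = (tour_adj * dist).sum()
print(tour_length)
```

Because P C Pᵀ is a similarity transformation of a cycle, every choice of P yields a valid Hamiltonian cycle by construction, which is the structural guarantee the method exploits.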
At a high level, the model uses a Graph Neural Network (GNN) to generate a ‘soft’ permutation matrix. During training, a technique called Gumbel-Sinkhorn relaxation keeps this matrix differentiable so that gradients can flow through it. At inference time, a ‘hard’ permutation matrix is extracted with the Hungarian algorithm and applied to the fixed cycle, directly yielding a valid Hamiltonian cycle. This construction guarantees that the generated solution is always a feasible TSP tour, regardless of the quality of the learned permutation.
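To make that train/inference split concrete, here is a hedged numpy/scipy sketch (not the authors’ implementation): Gumbel noise plus Sinkhorn normalization produces the differentiable soft permutation used during training, and scipy’s Hungarian solver, `linear_sum_assignment`, rounds it to a hard permutation at inference. The GNN itself is omitted; `logits` is a stand-in for its raw n×n output.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.special import logsumexp

def gumbel_sinkhorn(logits, tau=1.0, n_iters=20, rng=None):
    """Approximate a doubly stochastic ('soft' permutation) matrix from noisy logits."""
    rng = rng or np.random.default_rng()
    # Gumbel(0, 1) noise makes the relaxation stochastic during training.
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-20) + 1e-20)
    log_alpha = (logits + gumbel) / tau
    for _ in range(n_iters):
        # Alternate row and column normalization in log space (Sinkhorn iterations).
        log_alpha = log_alpha - logsumexp(log_alpha, axis=1, keepdims=True)
        log_alpha = log_alpha - logsumexp(log_alpha, axis=0, keepdims=True)
    return np.exp(log_alpha)

def hard_permutation(soft_perm):
    """Round a soft permutation to a hard one via the Hungarian algorithm."""
    rows, cols = linear_sum_assignment(-soft_perm)  # maximize total assignment weight
    P = np.zeros_like(soft_perm)
    P[rows, cols] = 1.0
    return P  # exactly one 1 per row and column: always a valid permutation
```

Lower temperatures `tau` and more Sinkhorn iterations push the soft matrix closer to a hard permutation, trading gradient quality for sharper assignments.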
The unsupervised nature of this method is a significant advantage, as it doesn’t require any optimal training data, which is often difficult and expensive to obtain for NP-hard problems like TSP. It also avoids the complexities of sequential decision-making or explicit search procedures.
Experiments were conducted on TSP instances with 20, 50, and 100 cities. The results show that this unsupervised approach consistently outperforms classical baselines like the Greedy Nearest Neighbor algorithm. For example, on 100-node instances, the new method achieved significantly shorter tour lengths and smaller optimality gaps compared to Greedy NN and even some beam search variants.
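For context, the Greedy Nearest Neighbor baseline mentioned above is the textbook heuristic that repeatedly hops to the closest unvisited city. A minimal sketch (a standard formulation, not code from the paper):

```python
import numpy as np

def greedy_nn_tour(coords, start=0):
    """Build a tour by always moving to the nearest unvisited city."""
    n = len(coords)
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        nearest = min(unvisited, key=lambda j: dist[last, j])
        tour.append(nearest)
        unvisited.remove(nearest)
    return tour  # the edge back to `start` closes the cycle
```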
To further enhance performance and address instances where the model might yield suboptimal solutions, the researchers introduced an ensemble approach. This involves training multiple separate models, each biased towards a different valid Hamiltonian cycle structure. At inference time, the best solution from this ensemble of models is selected for each specific TSP instance. This strategy effectively leverages diverse structural priors, leading to more robust and consistently better solutions, especially for larger problems.
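A sketch of that selection step, with `models` and `predict_tour` as hypothetical stand-ins for the trained ensemble members and their decoding routine:

```python
import numpy as np

def tour_length(coords, tour):
    """Total length of the closed tour visiting cities in the given order."""
    nxt = np.roll(tour, -1)  # successor of each city, wrapping back to the start
    return np.linalg.norm(coords[tour] - coords[nxt], axis=1).sum()

def ensemble_solve(coords, models, predict_tour):
    """Query every ensemble member on one instance and keep its shortest tour."""
    tours = [predict_tour(m, coords) for m in models]
    lengths = [tour_length(coords, t) for t in tours]
    return tours[int(np.argmin(lengths))]
```

Since every member outputs a feasible tour, the minimum over the ensemble can only improve on any single model for a given instance.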
The Hamiltonian cycle ensemble approach achieved impressive optimality gaps of 3.52% on TSP-20, 4.41% on TSP-50, and 7.10% on TSP-100, demonstrating competitive performance with other learning-based methods and even classical heuristics like farthest insertion. While it doesn’t yet match the very best reinforcement learning models that incorporate post-hoc local search, its ability to achieve strong results without supervision or explicit search is a notable achievement.
This research suggests that integrating problem structure as an inductive bias into the learning process offers a promising alternative to traditional search-based methods in combinatorial optimization. For more details, see the full paper.


