TLDR: A new deep reinforcement learning framework, UD3RL, has been developed to solve the Close-Enough Traveling Salesman Problem (CETSP). Unlike traditional methods, UD3RL uses a dual-decoder architecture to simultaneously decide which node to visit next and the precise waypoint within its neighborhood. It demonstrates superior solution quality, faster execution times, and strong generalization across various problem sizes, spatial distributions, and dynamic environments, making it highly effective for complex routing challenges like drone deliveries and robotic inspections.
The classic Traveling Salesman Problem (TSP) challenges us to find the shortest route that visits a set of cities exactly once and returns to the starting point. It’s a fundamental problem in computer science and logistics, but real-world scenarios often present a more flexible variant: the Close-Enough Traveling Salesman Problem (CETSP). Instead of visiting a precise location, an agent only needs to pass through a designated ‘neighborhood’ around each target. Imagine a drone inspecting solar panels or an autonomous robot reading RFID tags – they don’t need to hit an exact point, just get close enough to perform their task.
While this ‘close-enough’ requirement makes the problem more practical, it also significantly increases its complexity. CETSP combines two tough challenges: deciding the sequence of neighborhoods to visit (a discrete choice) and then picking the exact point within each neighborhood to pass through (a continuous optimization). This hybrid nature has made it notoriously difficult for traditional methods and even many modern deep reinforcement learning (DRL) approaches.
A recent research paper, A Unified Deep Reinforcement Learning Approach for Close Enough Traveling Salesman Problem, by Mingfeng Fan, Jiaqi Cheng, Yaoxin Wu, Yifeng Zhang, Yibin Yang, Guohua Wu, and Guillaume Sartoretti, introduces a groundbreaking solution called UD3RL (Unified Dual-Decoder Deep Reinforcement Learning). This new framework is designed to tackle the CETSP head-on, offering both high-quality solutions and remarkable efficiency.
How UD3RL Works
The core idea behind UD3RL is to break down the complex CETSP into two manageable, sequential decisions. First, the continuous neighborhoods are transformed into a finite set of candidate waypoints using a technique called Perimetral Discretization Scheme (PDS). This involves placing a fixed number of points uniformly along the circumference of each circular neighborhood, as optimal paths tend to touch the boundaries.
Once discretized, UD3RL employs a clever ‘dual-decoder’ architecture:
- Encoder: This component acts like the brain, processing all the input information – the locations of the depot, target points, and the radii of their neighborhoods – to extract meaningful features.
- Node-Decoder: After the encoder, this part of the system decides which target neighborhood to visit next in the sequence.
- Loc-Decoder: Once a neighborhood is chosen, the loc-decoder takes over to determine the precise ‘waypoint’ within that neighborhood’s discretized points that the agent should pass through. To make this decision even smarter, it incorporates a ‘k-nearest neighbors subgraph interaction strategy,’ which helps the system understand the local spatial context and choose the best waypoint.
This dual-decoder approach allows UD3RL to make decisions sequentially, first selecting the general area (node) and then the specific point (waypoint). The entire system is trained using a customized REINFORCE algorithm, enabling it to learn and generalize across different problem sizes and varying neighborhood radius types (constant or random).
Impressive Results and Generalization
The researchers conducted extensive experiments, comparing UD3RL against several conventional and hybrid methods. The results were compelling:
- Superior Solution Quality: UD3RL consistently outperformed all baselines in finding shorter routes, especially when combined with instance augmentation (UD3RL-Aug), which further refines solutions.
- Exceptional Efficiency: One of UD3RL’s most striking advantages is its speed. It solved test cases in seconds, a stark contrast to the minutes or even hours required by traditional heuristics. This efficiency is crucial for real-world applications.
- Strong Generalization: A key strength of UD3RL is its ability to generalize. It performed exceptionally well on larger problem sizes (up to 500 nodes) than it was trained on, different spatial distributions of nodes (clustered or mixed), and various radius ranges. This means a single trained model can handle a wide array of CETSP instances without needing retraining.
- Dynamic Environments: The efficiency of UD3RL makes it highly suitable for dynamic scenarios where new requests appear during a tour. The model can quickly re-encode remaining nodes and new requests, then generate an updated, optimized route in real-time.
- Benchmark Performance: On established benchmark instances, UD3RL also showed strong performance, achieving gaps generally below 7% relative to the best-known solutions, and even lower with augmentation.
An ablation study confirmed that both the k-NN subgraph interaction strategy and the adapted Transformer encoder are vital components, significantly contributing to UD3RL’s superior performance without adding noticeable computational overhead.
Also Read:
- Optimizing Container Stowage: A Deep Dive into Reinforcement Learning Benchmarks
- Smart Manufacturing: Automating Job Scheduling with AI and Specialized Languages
Conclusion
UD3RL represents a significant advancement in solving the Close-Enough Traveling Salesman Problem. By intelligently decoupling node selection and waypoint determination, and leveraging a robust deep reinforcement learning framework, it delivers high-quality solutions with unprecedented speed and adaptability. This research paves the way for more efficient and intelligent routing solutions in diverse applications, from drone deliveries to robotic inspections, even in complex and dynamic environments. Future work aims to extend UD3RL to even larger problem instances and explore cooperative routing for multiple agents.


