TLDR: A new autonomous driving framework, the Uncertainty-Weighted Decision Transformer (UWDT), improves navigation in complex traffic like roundabouts. It uses a ‘teacher’ model to identify uncertain, safety-critical situations and then weights the ‘student’ model’s learning to focus more on these high-impact scenarios. This approach significantly reduces collisions and enhances driving efficiency and stability compared to other methods.
Autonomous driving systems face significant challenges, particularly in dense and dynamic environments like busy intersections and roundabouts. These scenarios demand sophisticated decision-making that can understand both the immediate surroundings and predict future events over a longer time horizon, all while being robust to uncertainties inherent in real-world traffic.
Traditional approaches to autonomous driving decision-making often fall short. Modular rule-based systems require extensive manual design, while imitation learning struggles with situations not seen in training data. Search-based methods can be computationally expensive, and standard reinforcement learning often involves unsafe exploration or is limited to simpler scenarios.
A promising technique called Decision Transformers (DTs) has emerged from offline reinforcement learning. DTs reframe decision-making as a sequence modeling problem, using transformer architectures to capture long-term dependencies without needing risky online exploration. However, standard DTs can sometimes struggle with rare but critical safety situations, tending to overfit to more common, low-risk driving patterns.
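To make the "sequence modeling" idea concrete, here is a minimal sketch of how a Decision Transformer style input is usually assembled: each timestep contributes a return-to-go token, a state token, and an action token. The function names and the tuple-based token format are illustrative assumptions, not taken from the UWDT paper.

```python
def returns_to_go(rewards):
    """Undiscounted sum of future rewards from each timestep onward."""
    rtg = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


def build_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) tokens per timestep.

    The ("kind", value) tuples are a hypothetical stand-in for the
    embeddings a real transformer would consume.
    """
    tokens = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        tokens.extend([("rtg", g), ("state", s), ("action", a)])
    return tokens


if __name__ == "__main__":
    seq = build_sequence(["s0", "s1", "s2"], [1, 0, 2], [1.0, 0.0, 2.0])
    print(seq[:3])
```

A transformer trained on such sequences learns to predict the next action given the preceding tokens, which is why no online exploration is needed: everything comes from logged trajectories.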
Addressing this crucial limitation, researchers have introduced the Uncertainty-Weighted Decision Transformer (UWDT). This novel framework integrates multi-channel bird’s-eye-view occupancy grids, which provide a rich spatial understanding of the environment, with transformer-based sequence modeling. The key innovation lies in its uncertainty-aware training mechanism.
The UWDT operates in three stages. First, a ‘teacher’ transformer model is trained on offline driving data and its parameters are then frozen. Second, the frozen teacher estimates the predictive uncertainty at each decision point by calculating the entropy of its action distribution; higher entropy indicates greater uncertainty in the teacher’s prediction. Third, a ‘student’ transformer model is trained with a loss that is weighted by these uncertainty estimates. The student therefore learns more intensely from situations where the teacher was less certain, which amplifies the learning signal from rare, high-impact, safety-critical states. This approach enhances robustness without requiring any changes to the model’s architecture.
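The entropy-based weighting can be sketched in a few lines. This is only an illustration under stated assumptions: the weight formula `1 + alpha * normalized_entropy`, the `alpha` parameter, and the normalization by the sum of weights are hypothetical choices, and the article does not give the exact form used in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def uncertainty_weights(teacher_logits, alpha=1.0):
    """Per-sample weights from the frozen teacher's action entropy.

    Assumed form: 1 + alpha * H / log(K), so weights lie in [1, 1 + alpha].
    """
    weights = []
    for logits in teacher_logits:
        h = entropy(softmax(logits)) / math.log(len(logits))
        weights.append(1.0 + alpha * h)
    return weights


def weighted_cross_entropy(student_logits, actions, weights):
    """Cross-entropy of the student's predictions, weighted per sample."""
    total = 0.0
    for logits, a, w in zip(student_logits, actions, weights):
        total += w * -math.log(softmax(logits)[a])
    return total / sum(weights)


if __name__ == "__main__":
    teacher = [[0.0, 0.0, 0.0, 0.0, 0.0],   # teacher is maximally unsure
               [10.0, 0.0, 0.0, 0.0, 0.0]]  # teacher is confident
    w = uncertainty_weights(teacher)
    student = [[0.2, 0.1, 0.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0, 0.0]]
    print(w, weighted_cross_entropy(student, [1, 0], w))
```

With five discrete actions, a uniform teacher distribution yields the maximum weight and a sharply peaked one yields a weight close to 1, so the uncertain sample contributes more to the student's loss.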
UWDT was evaluated in a high-fidelity roundabout simulator under various traffic densities. The ego vehicle’s mission was to navigate a four-arm, two-lane roundabout safely and efficiently, dealing with circulating, interacting, and exiting traffic. The system uses a compact representation of the environment through occupancy grids, capturing spatial layout and dynamic context. The vehicle’s actions are high-level commands such as lane changes, acceleration, deceleration, and cruising, which are then translated into continuous control signals.
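As a rough illustration of what a multi-channel bird’s-eye-view occupancy grid looks like, the sketch below rasterizes nearby vehicles into an ego-centered grid. The two channels (presence and speed), the grid size, and the cell resolution are assumptions chosen for the example; the article does not specify the channels used in the paper.

```python
def rasterize(vehicles, grid_size=11, cell_m=4.0):
    """Build a 2-channel ego-centered occupancy grid.

    vehicles: list of (dx, dy, speed) tuples, offsets in metres
    relative to the ego vehicle (hypothetical input format).
    Returns [presence, speed], each a grid_size x grid_size list.
    """
    half = grid_size // 2
    presence = [[0.0] * grid_size for _ in range(grid_size)]
    speed = [[0.0] * grid_size for _ in range(grid_size)]
    for dx, dy, v in vehicles:
        col = half + round(dx / cell_m)
        row = half + round(dy / cell_m)
        # Vehicles outside the grid's field of view are dropped.
        if 0 <= row < grid_size and 0 <= col < grid_size:
            presence[row][col] = 1.0
            speed[row][col] = v
    return [presence, speed]


if __name__ == "__main__":
    grid = rasterize([(8.0, -4.0, 5.0), (100.0, 0.0, 3.0)])
    print(grid[0][4])  # row containing the one in-range vehicle
```

Each grid is a fixed-size snapshot, so a sequence of them can be fed to the transformer as state tokens regardless of how many vehicles are present.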
The results were compelling. UWDT consistently outperformed the baseline methods, including Conservative Q-Learning (CQL), Soft Actor-Critic (SAC), standard Decision Transformer (DT), and a Transformer-based Behavior Cloning (BC Transformer) model. UWDT achieved the highest average reward, highest average speed, greatest travel distance, and a near-perfect exit rate of 98.75%. Crucially, it also recorded the lowest collision rate at just 1.25%. While methods like SAC were sometimes overly cautious, and standard DT performed well but lacked explicit uncertainty handling, UWDT struck a better balance between efficiency and safety.
The research highlights that by explicitly incorporating epistemic uncertainty into the decision-making process, UWDT can choose high-reward trajectories when confident and adopt safer maneuvers when predictions are less certain. This makes it particularly effective in scenarios where critical situations are rare in the training data but have significant consequences during real-world deployment.
This work represents a significant step forward for autonomous navigation, offering a promising approach for safety-critical driving applications by delivering safer and more efficient decision-making in complex traffic environments.


