TLDR: The M-Diffusion Planner is a novel AI framework that enables autonomous vehicles to generate diverse driving behaviors based on human preferences. It uses a multi-head diffusion model fine-tuned with Group Relative Policy Optimization (GRPO) to create distinct driving styles (e.g., aggressive, conservative, comfortable). An integrated Large Language Model (LLM) interprets natural language commands, allowing real-time strategy switching without retraining. This approach addresses the rigidity of current autonomous systems, achieving state-of-the-art performance while offering personalized and adaptable motion planning.
Autonomous driving technology has advanced rapidly, with vehicles now capable of navigating complex environments and generating highly accurate paths. However, most planners share a limitation: a rigid, one-size-fits-all driving style. These systems often default to a single, dominant policy, producing predictable but impersonal trajectories that don’t account for individual human preferences or dynamic, instruction-driven demands.
Imagine a future where your autonomous car understands if you’re in a hurry and want to drive more aggressively, or if you prefer a smooth, comfortable ride. A new research paper, titled “Drive As You Like: Strategy-Level Motion Planning Based on A Multi-Head Diffusion Model,” takes on this challenge. Authored by Fan Ding, Xuewen Luo, Hwa Hui Tew, Ruturaj Reddy, Junn Yong Loo, and Xikun Wang, the work proposes the M-Diffusion Planner, a framework designed to bring personalized, strategy-level motion planning to autonomous vehicles.
The Core Idea: Flexible Driving with AI
At its heart, the M-Diffusion Planner leverages a multi-head diffusion model. Diffusion models are powerful generative AI tools known for their ability to create diverse and realistic outputs. In this context, they are used to generate a variety of high-quality driving trajectories. What makes the M-Diffusion Planner unique is its ability to go beyond simply generating paths; it learns to generate paths that align with specific driving strategies.
During initial training, the model learns to produce high-quality trajectories. It is then fine-tuned with Group Relative Policy Optimization (GRPO), a reinforcement learning method that scores each sampled output relative to the others in its group. This lets different “heads” (output layers) specialize in distinct driving styles – think aggressive, conservative, or comfortable – without compromising the vehicle’s fundamental planning capabilities. As a result, one model can drive in multiple ways, adapting to your mood or the situation.
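To make the "group relative" part of GRPO concrete, here is a minimal sketch of how rewards within one sampled group are commonly turned into advantages. It is not the paper's training code: the function name, the reward values, and the use of the population standard deviation are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward against the mean and spread of its own group.

    Assumption: population standard deviation; implementations differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical style rewards for four trajectories sampled in the same scene
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f}  advantage={a:+.2f}")
```

Trajectories that beat their group's average get a positive advantage and are reinforced; the rest are discouraged. Because the baseline comes from the group itself, no separate value network is needed.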
Bridging Human Intent and Machine Action
One of the most exciting aspects of this research is how it integrates human preferences. The M-Diffusion Planner incorporates a large language model (LLM) as an interpreter. This LLM acts as a bridge, translating natural language commands from a human user (like “please hurry up” or “drive carefully”) into specific planning strategy identifiers. This allows for dynamic, instruction-aware planning without needing to switch or retrain the entire model.
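The mapping from a spoken command to a strategy identifier can be sketched as follows. This is an illustration only: a simple keyword matcher stands in for the LLM, and the `Strategy` enum, the keyword lists, and the head names are hypothetical, not taken from the paper.

```python
from enum import Enum


class Strategy(Enum):
    BASELINE = 0
    AGGRESSIVE = 1
    CONSERVATIVE = 2
    COMFORTABLE = 3


# Hypothetical keyword lists; a real system would query a language model here
KEYWORDS = {
    Strategy.AGGRESSIVE: ("hurry", "fast", "quick"),
    Strategy.CONSERVATIVE: ("careful", "cautious", "safe"),
    Strategy.COMFORTABLE: ("smooth", "comfort", "gentle"),
}


def interpret_command(command: str) -> Strategy:
    """Stand-in for the LLM interpreter: keyword matching, not a language model."""
    text = command.lower()
    for strategy, words in KEYWORDS.items():
        if any(word in text for word in words):
            return strategy
    return Strategy.BASELINE  # fall back when no style is recognized


# Hypothetical names for the output heads, one per strategy
HEADS = {s: f"head_{s.name.lower()}" for s in Strategy}

for command in ("please hurry up", "drive carefully", "take me home"):
    strategy = interpret_command(command)
    print(f"{command!r} -> {strategy.name} -> {HEADS[strategy]}")
```

The point of the design is that only the identifier changes between commands. The planner itself stays loaded, and the identifier simply picks which head produces the trajectory.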
Unlike previous approaches that might require step-by-step, action-level user intervention, this system operates at a higher, strategy level. This means you can give a general command, and the car will adjust its overall driving behavior accordingly, making the interaction much more intuitive and aligned with real-world driving preferences.
Real-World Performance and Diverse Behaviors
The researchers evaluated the M-Diffusion Planner in closed-loop simulations on the nuPlan val14 benchmark, a standard for evaluating autonomous driving systems. According to the paper, the base M-Diffusion Planner achieved state-of-the-art performance. Crucially, even after fine-tuning for different strategies (aggressive, conservative, comfortable), the model maintained high performance scores, which suggests that GRPO preserves core planning abilities while enabling diverse behaviors.
Open-loop evaluations further highlighted the model’s success in generating distinct driving styles. For instance, trajectories generated under the “aggressive” strategy showed higher speeds, acceleration, and jerk, while the “conservative” strategy resulted in reduced speed and smoother dynamics. The “comfortable” strategy produced trajectories with the lowest jerk values, prioritizing a smooth ride.
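Speed, acceleration, and jerk can all be estimated from a planned trajectory by repeated finite differences. The sketch below shows one way to do that. The function names, the sample trajectories, and the choice to measure acceleration along the direction of travel are assumptions for illustration, not the paper's metric definitions.

```python
import math


def finite_diff(values, dt):
    """Forward difference of a uniformly sampled signal."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def motion_metrics(points, dt):
    """Summarize speed, acceleration, and jerk for (x, y) points spaced dt seconds apart.

    Needs at least four points so that one jerk value exists.
    """
    speeds = [
        math.hypot(x1 - x0, y1 - y0) / dt
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    accels = finite_diff(speeds, dt)  # rate of change of speed
    jerks = finite_diff(accels, dt)   # rate of change of acceleration
    return {
        "mean_speed": sum(speeds) / len(speeds),
        "max_abs_accel": max(abs(a) for a in accels),
        "max_abs_jerk": max(abs(j) for j in jerks),
    }


# Hypothetical trajectories sampled once per second
steady = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
abrupt = [(0, 0), (1, 0), (3, 0), (4, 0), (8, 0)]

print("steady:", motion_metrics(steady, dt=1.0))
print("abrupt:", motion_metrics(abrupt, dt=1.0))
```

The steady trajectory moves at constant speed, so its acceleration and jerk are zero. The abrupt one covers more ground but with large swings in speed, which is the pattern the article attributes to the “aggressive” strategy.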
A qualitative case study illustrated these differences vividly. In a highway lane-changing scenario, the baseline strategy executed a balanced lane change. The aggressive strategy prompted an earlier, sharper lane change to overtake, while the conservative strategy chose to maintain the current lane for safety. These behaviors were triggered by simple natural language commands, showcasing the system’s ability to generate diverse planning behaviors aligned with user intent in a zero-shot manner.
A Flexible Future for Autonomous Driving
The M-Diffusion Planner represents a significant step towards more flexible and human-centric autonomous driving. By combining the generative power of diffusion models with strategy-aware fine-tuning and an LLM interpreter, it allows autonomous vehicles to adapt to various driving styles and user instructions in real time, without requiring model retraining or reloading. This adaptability and efficiency pave the way for a future where autonomous cars truly “drive as you like.”
You can read the full research paper for more technical details here: Drive As You Like: Strategy-Level Motion Planning Based on A Multi-Head Diffusion Model.


