
Orbis: Advancing Long-Term Driving Scene Prediction for Autonomous Vehicles

TL;DR: Orbis is a new driving world model that significantly improves long-horizon video prediction for autonomous driving, especially in complex scenarios like turns. It introduces a hybrid tokenizer to compare continuous (flow matching) and discrete modeling, finding continuous models to be superior. Orbis achieves state-of-the-art performance with fewer parameters (469M) and less training data (280 hours) than previous models, learning directly from raw video without extra sensors or supervision.

Autonomous driving systems rely heavily on their ability to predict future scenarios accurately, especially over long periods. This capacity, often referred to as “imagination” in the context of AI, is crucial for safe and effective navigation. However, existing world models for autonomous driving have struggled with generating realistic and consistent predictions for extended durations, particularly in challenging situations like turning maneuvers or dense urban traffic.

A new research paper introduces Orbis, a novel driving world model designed to overcome these limitations. Developed by researchers at the University of Freiburg, Orbis demonstrates state-of-the-art performance in long-horizon prediction, even with a relatively compact model size and less training data compared to its predecessors. The full research paper can be found here: Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models.

Addressing Key Challenges

Previous driving world models, often built on video diffusion techniques, tend to perform well only for a few frames. When faced with complex actions like turns, they frequently produce blurred content or unrealistic vehicle behaviors, such as stopping prematurely or drifting off course. This indicates a fundamental limitation in how these models capture state transitions—a core function of a world model.

Orbis tackles this by focusing on simple design choices and training exclusively on raw video data, without relying on additional supervision or sensors like maps, depth information, or multiple cameras. This approach makes the model more scalable and adaptable to new environments.

Continuous vs. Discrete Modeling

A central question in developing world models is whether they should process information in a continuous space (like diffusion models) or predict discrete tokens (similar to large language models). To answer this, the Orbis team developed a unique hybrid tokenizer compatible with both approaches, allowing for a direct, side-by-side comparison.

Their study concluded that the continuous autoregressive model, based on flow matching, significantly outperforms its discrete token counterpart. The continuous model proved to be more robust to individual design choices and more powerful, especially for long-term generation. While discrete models can achieve long rollouts, they often suffer from issues like content copying (where the model repeatedly generates the same token as the last context frame) and flickering artifacts, limiting their expressiveness for subtle motions crucial in driving scenarios.
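Flow matching, in brief, trains a model to predict the velocity of a straight-line path from Gaussian noise to the target data, rather than predicting discrete tokens. The following NumPy sketch of a single training-loss computation is illustrative only; the function names, shapes, and the omission of conditioning details are simplifications, not Orbis's actual implementation:

```python
import numpy as np

def flow_matching_loss(velocity_model, x1, rng):
    """Simplified conditional flow-matching loss for one batch.

    x1: target latents (e.g., next-frame features), shape (B, D).
    velocity_model: callable (xt, t) -> predicted velocity, shape (B, D).
    """
    x0 = rng.standard_normal(x1.shape)      # noise endpoint of the path
    t = rng.uniform(size=(x1.shape[0], 1))  # random time in [0, 1] per sample
    xt = (1 - t) * x0 + t * x1              # point on the straight path at time t
    v_target = x1 - x0                      # constant velocity of that path
    v_pred = velocity_model(xt, t)          # model's predicted velocity field
    return float(np.mean((v_pred - v_target) ** 2))
```

At generation time, the learned velocity field is integrated from noise to a sample, which is what allows the continuous model to express subtle motions that discrete token prediction tends to flatten into copying or flicker.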

Performance and Efficiency

Orbis stands out not only for its predictive capabilities but also for its efficiency. With only 469 million parameters, trained on just 280 hours of video data, it achieves superior performance on the nuPlan and Waymo benchmarks, whereas many existing models require substantially more training data and larger architectures.

The model particularly excels in difficult scenarios, maintaining stable and realistic predictions for up to 20 seconds, far surpassing models that degrade quickly over longer horizons. Orbis can also be readily adapted for ego-motion control, supporting generation conditioned on steering angle and speed and yielding more accurate trajectory tracking.
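Conditioning on steering angle and speed requires turning those scalars into features the network can consume. A common approach (not necessarily the one Orbis uses) is a sinusoidal embedding of each control value; the `embed_action` helper below is a hypothetical sketch of that idea:

```python
import numpy as np

def embed_action(steering_angle, speed, dim=16):
    """Hypothetical sinusoidal embedding of ego-motion controls.

    Maps two scalars (steering angle, speed) to a dim-length feature
    vector via sin/cos at geometrically spaced frequencies.
    """
    freqs = 2.0 ** np.arange(dim // 4)                   # frequency bands
    vals = np.array([steering_angle, speed])[:, None] * freqs[None, :]
    # sin/cos pairs per control, flattened into one conditioning vector
    return np.concatenate([np.sin(vals), np.cos(vals)], axis=-1).ravel()
```

Such a vector would then be injected into the generator (e.g., added to the context features) so that rollouts follow the commanded trajectory.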


Future Directions and Limitations

While Orbis marks a significant step forward, the researchers acknowledge several limitations. The model currently struggles with reliably generating detailed content such as traffic lights and street signs, and the simulated traffic actors do not always adhere to traffic rules. Although it shows good diversity in multiple rollouts, the generated trajectories may not perfectly represent the true probability distribution of real-world driving behaviors.

Future work will involve scaling the model further by increasing parameters, training data volume, image resolution, and context length. The team also plans to investigate why other larger public video diffusion models fail on long rollouts, potentially due to biases inherited from pre-trained models. Ultimately, Orbis contributes to building more reliable and cost-efficient autonomous driving systems, paving the way for more capable interactive robotics.

Ananya Rao
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
