TL;DR: A new benchmark, EgoTraj-Bench, has been introduced to evaluate trajectory prediction models under realistic, noisy first-person (ego-view) observations from robots, addressing the limitations of existing idealized bird's-eye-view datasets. The benchmark pairs noisy ego-view histories with clean future trajectories. Alongside it, the authors propose BiFlow, a dual-stream flow matching model that simultaneously denoises historical observations and predicts future motion, incorporating an EgoAnchor mechanism to distill intent priors. Experiments show BiFlow achieves state-of-the-art performance, with a 10–15% reduction in minADE and minFDE under real-world ego-view noise, underscoring the need for noise-aware modeling in autonomous systems.
Autonomous systems like mobile robots and self-driving cars rely heavily on accurately predicting the future movements of pedestrians and other agents in their environment. This is known as trajectory prediction. However, most existing methods for this task operate under idealized conditions, assuming perfect, clear observations from a bird’s-eye view (BEV).
In reality, robots perceive the world through first-person cameras, which introduce significant challenges. These ‘ego-view’ observations are often noisy and incomplete due to factors like occlusions (when one person blocks another), identity switches (when the tracking system confuses two individuals), tracking drift, and perspective distortion. These real-world imperfections severely limit the robustness of current trajectory prediction models.
To address this critical gap, researchers have introduced EgoTraj-Bench, the first real-world benchmark specifically designed for trajectory prediction under these noisy, first-person visual conditions. This benchmark is unique because it grounds these imperfect, ego-centric visual histories in clean, human-verified future trajectories observed from a bird’s-eye view. This allows for robust learning and evaluation under realistic perceptual constraints.
EgoTraj-Bench was constructed from the TBD dataset, which provides synchronized bird's-eye view and ego-view videos. The team extracted noisy historical trajectories from the real ego-view videos, capturing authentic imperfections. These noisy trajectories were then projected into world coordinates and paired with the corresponding clean, human-verified future trajectories from the BEV footage. This process ensures that the benchmark reflects real-world challenges while providing accurate ground truth for supervision.
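As a rough illustration of the projection step, a ground-plane homography can map tracked image points into world coordinates. The matrix values and helper below are hypothetical, not the benchmark's actual calibration:

```python
import numpy as np

# Hypothetical ground-plane homography mapping ego-view pixels to world metres;
# the real benchmark relies on the TBD dataset's own calibration.
H = np.array([[0.02, 0.00, -5.0],
              [0.00, 0.03, -3.0],
              [0.00, 0.00,  1.0]])

def image_to_world(pts_px, H):
    """Project tracked image points (N, 2) onto the ground plane via H."""
    homo = np.hstack([pts_px, np.ones((len(pts_px), 1))])  # homogeneous coords
    w = homo @ H.T
    return w[:, :2] / w[:, 2:3]  # de-homogenize

track_px = np.array([[100.0, 50.0], [110.0, 55.0]])  # noisy ego-view track
track_world = image_to_world(track_px, H)
# The world-frame history can then be paired with the clean BEV future
# whose position best matches the last observed point.
```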
Initial evaluations using EgoTraj-Bench revealed a significant finding: state-of-the-art BEV-based trajectory prediction models suffer substantial performance degradation when faced with ego-view perception noise. This underscores the urgent need for new frameworks that can handle these realistic challenges.
Introducing BiFlow: A Robust Solution
To tackle the problem highlighted by their benchmark, the researchers also propose BiFlow, a novel dual-stream flow matching model. BiFlow is designed to concurrently denoise historical observations and forecast future motion by leveraging a shared latent representation. This means it learns to clean up the messy past data while simultaneously predicting what will happen next.
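While the paper's exact formulation isn't reproduced here, the flow-matching recipe behind each stream can be sketched with the standard linear interpolant: sample a time t, blend source and target, and regress the constant velocity between them. All arrays below are illustrative stand-ins:

```python
import numpy as np

def flow_matching_target(x0, x1, t):
    """Linear interpolant x_t and its velocity target (x1 - x0),
    the regression target a flow matching model is trained on."""
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0

rng = np.random.default_rng(0)
noisy_hist = rng.normal(size=(8, 2))   # 8 observed steps, (x, y)
clean_hist = noisy_hist * 0.9          # hypothetical clean counterpart
future     = rng.normal(size=(12, 2))  # 12 future steps (ground truth)

t = rng.uniform()
# Denoising stream: flow from the noisy history toward the clean history.
xt_h, v_h = flow_matching_target(noisy_hist, clean_hist, t)
# Prediction stream: flow from Gaussian noise toward the future trajectory.
xt_f, v_f = flow_matching_target(rng.normal(size=future.shape), future, t)
```

In BiFlow the two streams share a latent representation; here they are shown side by side only to make the shared training recipe concrete.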
A key innovation within BiFlow is the EgoAnchor mechanism. This mechanism helps the model better understand the intent of agents by conditioning the prediction decoder on ‘distilled’ historical features. Essentially, it extracts compact, intent-aware representations from the agent’s and scene’s past, providing a robust prior to stabilize predictions even when the input is partial or corrupted.
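One way to picture the EgoAnchor idea is to compress per-step history features into a few compact anchor tokens and let the prediction decoder attend over them. The pooling scheme and names below are invented for illustration and are not the paper's architecture:

```python
import numpy as np

def ego_anchor(hist_feats, k=4):
    """Distill per-step history features (T, D) into k anchor tokens by
    temporal pooling -- a toy stand-in for learned feature distillation."""
    chunks = np.array_split(hist_feats, k, axis=0)
    return np.stack([c.mean(axis=0) for c in chunks])  # (k, D)

def condition_decoder(query, anchors):
    """Toy cross-attention: a decoder query (D,) attends over anchor tokens,
    yielding an intent-aware context vector that stabilizes prediction."""
    scores = anchors @ query / np.sqrt(len(query))
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ anchors

hist_feats = np.random.default_rng(1).normal(size=(16, 8))  # 16 steps, dim 8
anchors = ego_anchor(hist_feats)
context = condition_decoder(np.ones(8), anchors)
```

Because the anchors summarize the whole observed history, the decoder still receives a usable prior even when individual past steps are occluded or corrupted.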
Extensive experiments demonstrate that BiFlow achieves state-of-the-art performance. It significantly reduces common error metrics (minADE and minFDE) by 10–15% on average compared to existing methods, showcasing superior robustness in noisy environments. The model’s ability to jointly learn reconstruction and prediction, along with its EgoAnchor mechanism, proves highly effective in mitigating the impact of real-world ego-view perturbations.
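For reference, minADE and minFDE are best-of-K metrics: the average and final displacement errors of the closest of K sampled trajectories to the ground truth. A minimal NumPy version (array shapes assumed for illustration, not taken from the paper's code):

```python
import numpy as np

def min_ade_fde(pred, gt):
    """pred: (K, T, 2) sampled future trajectories; gt: (T, 2) ground truth.
    Returns the best-of-K average and final displacement errors."""
    dists = np.linalg.norm(pred - gt[None], axis=-1)  # (K, T) per-step errors
    return dists.mean(axis=1).min(), dists[:, -1].min()

gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
preds = np.stack([gt + 1.0, gt + 0.1])  # two hypothetical samples
ade, fde = min_ade_fde(preds, gt)       # the closer sample sets both scores
```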
The introduction of EgoTraj-Bench and BiFlow marks a significant step forward in developing trajectory forecasting systems that are truly resilient to the complexities of real-world, ego-centric perception. This work provides a critical foundation for future research aimed at making autonomous systems safer and more reliable in human-centric environments.


