TLDR: TRACE is a novel framework that learns 3D scene geometry, appearance, and physical dynamics directly from multi-view videos without human labels. It models each 3D point as a rigid particle and learns its translation-rotation dynamics, enabling accurate future motion prediction and automatic object segmentation. The method significantly outperforms existing techniques in future frame extrapolation and demonstrates strong continual learning capabilities.
Predicting how objects will move in a 3D environment, especially from just video footage, has long been a significant challenge in fields like robotics and mixed reality. Existing methods often struggle to accurately forecast future motion because they don’t fully grasp the underlying physics, or they require extensive human labeling of objects and their properties.
A new research paper, “Learning 3D Gaussian Physical Dynamics from Multi-view Videos,” introduces a framework called TRACE. This approach models a 3D scene’s geometry, appearance, and underlying physical information directly from dynamic multi-view videos, without the need for any additional human labels.
Understanding the Core Innovation
The key novelty of TRACE lies in how it represents motion. Instead of merely observing and interpolating movements, TRACE treats each 3D point in a scene as a rigid particle with its own size and orientation. For each of these particles, the system directly learns a “translation-rotation dynamics system”: it explicitly estimates the full set of physical parameters that govern how the particle moves over time, including its velocity and acceleration.
This is a significant departure from previous methods, which often rely on physics-informed neural networks (PINNs) that embed governing physical equations into the training objective, or on hand-designed physics models limited to specific object types. By directly learning physical parameters for individual particles, TRACE offers a more general and effective way to capture complex motion.
The framework naturally integrates with 3D Gaussian Splatting (3DGS), a recent and powerful 3D representation technique known for its high-fidelity reconstruction and real-time rendering capabilities. The particle-based nature of 3DGS aligns perfectly with TRACE’s concept of rigid particles, making it an ideal backbone for modeling both scene appearance and dynamics.
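To make the particle picture concrete, here is a minimal sketch of the state a 3DGS-style rigid particle might carry. The field names are illustrative only and are not taken from the TRACE codebase:

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical sketch of a 3D Gaussian "rigid particle" as used in
# 3D Gaussian Splatting: a center, per-axis size, orientation, and
# appearance attributes. Field names are illustrative, not TRACE's API.
@dataclass
class GaussianParticle:
    position: np.ndarray  # (3,) center of the Gaussian in world space
    scale: np.ndarray     # (3,) per-axis extent (the particle's "size")
    rotation: np.ndarray  # (4,) unit quaternion giving orientation
    opacity: float        # blending weight used during splatting
    color: np.ndarray     # (3,) RGB appearance

p = GaussianParticle(
    position=np.zeros(3),
    scale=np.full(3, 0.01),
    rotation=np.array([1.0, 0.0, 0.0, 0.0]),  # identity rotation
    opacity=0.9,
    color=np.array([0.5, 0.5, 0.5]),
)
```

Because each Gaussian already stores a position, size, and orientation, attaching per-particle motion parameters on top of this state is a natural fit.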
How TRACE Works
TRACE operates with two main components. First, a 3D scene representation module, based on 3DGS, learns the geometry and appearance of the scene at a reference time step. Second, the core translation-rotation dynamics system module, implemented with simple neural networks (MLPs), learns the physical parameters of each 3D rigid particle. From these parameters the system derives each particle’s velocity via classical mechanics, without needing extra physics constraints during training.
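The “derive motion from classical mechanics” step can be sketched as a plain kinematic update: given per-particle parameters (here, an initial velocity and a constant acceleration, with an angular rate for rotation), future positions follow from the standard equations of motion. This is an illustrative simplification, not the paper’s exact parameterization:

```python
import numpy as np

# Illustrative sketch: advance rigid particles under learned per-particle
# kinematic parameters. The parameter set (v0, a, omega) and the
# constant-acceleration assumption are ours, not TRACE's exact model.
def advance_particles(pos0, vel0, acc, omega, t):
    """Translate each particle by classical mechanics and accumulate rotation.

    pos0, vel0, acc: (N, 3) arrays of positions, velocities, accelerations.
    omega: (N, 3) axis-angle rotation rates (rad/s).
    Returns positions and accumulated axis-angle rotations at time t.
    """
    pos_t = pos0 + vel0 * t + 0.5 * acc * t**2  # x(t) = x0 + v0*t + a*t^2/2
    rot_t = omega * t                           # theta(t) = omega * t
    return pos_t, rot_t

# Two particles: one thrown along +x under gravity, one drifting along +y.
pos0 = np.zeros((2, 3))
vel0 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
acc = np.array([[0.0, 0.0, -9.8], [0.0, 0.0, 0.0]])
omega = np.zeros((2, 3))
pos_t, rot_t = advance_particles(pos0, vel0, acc, omega, t=1.0)
```

Because the update is an explicit function of time, extrapolating to future frames is as simple as evaluating it at a `t` beyond the training window, which is exactly the capability the paper stresses.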
An auxiliary deformation field is also used in parallel to help stabilize the learning process, especially in the early stages of training when the Gaussian kernels might be less accurate. This combined approach allows TRACE to truly learn physical parameters and predict future frames, a task where many existing methods fall short.
Impressive Results and Future Implications
Extensive experiments were conducted on three existing dynamic datasets and a newly created, challenging synthetic dataset called “Dynamic Multipart.” The results demonstrate TRACE’s “extraordinary performance” in future frame extrapolation, consistently outperforming baselines by a significant margin. This highlights the critical value of explicitly learning physical information for accurate future prediction.
A particularly interesting property of TRACE is its ability to segment multiple objects or parts within a scene. By simply clustering the learned physical parameters, the system can automatically identify distinct moving entities, a feature that prior works struggle to achieve without additional supervision or post-processing.
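The clustering idea can be sketched with a toy example: particles belonging to the same rigid part share similar motion parameters, so a simple k-means pass over those parameters separates the parts. Here per-particle velocity vectors stand in for TRACE’s full physical-parameter vectors:

```python
import numpy as np

# Illustrative sketch: segment particles into moving parts by clustering
# their learned motion parameters (a basic k-means; TRACE's actual
# clustering procedure may differ).
def cluster_particles(params, k=2, iters=20):
    # Deterministic, spread-out initialization for this sketch.
    centers = params[:: max(len(params) // k, 1)][:k].astype(float)
    for _ in range(iters):
        # Assign each particle to its nearest center.
        dists = np.linalg.norm(params[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned particles.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = params[labels == j].mean(axis=0)
    return labels

# Two rigid parts: five particles moving along +x, five along +y,
# each with small noise on the learned velocity.
rng = np.random.default_rng(0)
part_a = np.tile([1.0, 0.0, 0.0], (5, 1)) + 0.01 * rng.normal(size=(5, 3))
part_b = np.tile([0.0, 1.0, 0.0], (5, 1)) + 0.01 * rng.normal(size=(5, 3))
params = np.vstack([part_a, part_b])
labels = cluster_particles(params, k=2)
```

Because the segmentation falls out of parameters the model has already learned, no extra labels or segmentation networks are needed, which matches the paper’s claim that this comes essentially for free.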
Furthermore, TRACE shows strong capabilities in continual learning, meaning it can adapt to new observations and rapidly changing dynamics over time. This makes it a promising technology for applications in highly dynamic environments, such as those encountered in robot perception and manipulation. For more technical details, you can refer to the full research paper available at https://arxiv.org/pdf/2508.09811.