TLDR: Group Inertial Poser (GIP) is a novel method for tracking the full-body poses and global movements of multiple people using sparse wearable inertial sensors (IMUs) combined with Ultra-Wideband (UWB) ranging. It overcomes limitations of previous methods by leveraging inter-sensor distances to reduce drift and accurately estimate relative and global translations, even without calibrated starting positions. The method uses a structured state-space model for individual pose estimation, followed by two optimization steps for initial position alignment and trajectory refinement. GIP introduces a new dataset, GIP-DB, for multi-person IMU+UWB tracking and demonstrates superior accuracy and robustness compared to existing approaches.
Capturing the movements of multiple people in a shared space is a long-standing challenge in motion tracking. Camera-based methods often struggle with occlusion (objects or people blocking the view) and require extensive equipment setups. Wearable Inertial Measurement Units (IMUs) offer an alternative that does not depend on the environment, but they have their own problems: estimates of overall movement drift over time, and IMUs alone struggle to recover the precise relative positions between individuals.
A new research paper introduces Group Inertial Poser (GIP), which aims to overcome these limitations. GIP combines sparse IMU sensors with Ultra-Wideband (UWB) ranging to estimate the full-body poses and global translation (overall movement through space) of multiple people simultaneously. It uses the distances measured between sensors, both on the same person and across different individuals, to stabilize motion tracking and reduce drift.
How Group Inertial Poser Works
GIP’s method is built on a three-step pipeline. First, it performs individual pose estimation. Each person wears a sparse set of six sensors (on the head, pelvis, wrists, and knees), each containing an IMU and a UWB sensor. A specialized learning model, based on structured state-space models (SSMs), processes the acceleration, orientation, and same-person UWB distance data from these sensors to predict each individual’s full-body pose and their movement relative to their own starting point.
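To make the inputs of this first step concrete, here is a minimal sketch of how one frame of sensor data could be packed into a single feature vector before it reaches the learning model. The sensor names, the flattened 3x3 rotation matrix as the orientation format, and the function `build_frame_features` are assumptions for illustration; the article does not specify how the paper encodes its inputs.

```python
from itertools import combinations

# Six sensor placements as described in the article (names are hypothetical).
SENSORS = ["head", "pelvis", "left_wrist", "right_wrist", "left_knee", "right_knee"]


def build_frame_features(accel, rotation, distance):
    """Pack one frame of readings into a flat list of floats.

    accel:    sensor name -> (ax, ay, az)
    rotation: sensor name -> 9 floats (row-major 3x3 matrix, an assumed format)
    distance: frozenset({sensor1, sensor2}) -> same-person UWB range in metres
    """
    features = []
    for name in SENSORS:
        features.extend(accel[name])
    for name in SENSORS:
        features.extend(rotation[name])
    # One range per unordered pair of sensors on the same body: C(6, 2) = 15.
    for first, second in combinations(SENSORS, 2):
        features.append(distance[frozenset((first, second))])
    return features


# Dummy frame: zero acceleration, identity orientation, 0.5 m between all pairs.
identity = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0)
accel = {name: (0.0, 0.0, 0.0) for name in SENSORS}
rotation = {name: identity for name in SENSORS}
distance = {frozenset(pair): 0.5 for pair in combinations(SENSORS, 2)}

frame = build_frame_features(accel, rotation, distance)
print(len(frame))  # 6*3 + 6*9 + 15 = 87 values under these assumptions
```

A sequence of such frames would then be fed to the sequence model, which is where the structured state-space layers come in.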
The second step is ‘Initial Position Optimization’. Since the individual pose estimations are initially separate, GIP needs to align everyone into a shared world frame. It does this by optimizing the initial relative positions between individuals, using the UWB distances measured between sensors on different people. This step is crucial because it eliminates the need for users to start at specific, calibrated positions.
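The idea behind this alignment step can be sketched with a small example. The code below is not the paper's optimizer: it is a simplified closed-form least-squares solve that assumes two people, a shared heading and ground plane for both local frames, and noise-free ranges. The function name `estimate_initial_offset` and all sensor coordinates are made up for illustration.

```python
import math


def estimate_initial_offset(observations):
    """Estimate person B's horizontal start offset (tx, ty) in person A's frame.

    observations: list of (a, b, d), where
      a = (x, y, z) of a sensor on A, relative to A's own start,
      b = (x, y, z) of a sensor on B, relative to B's own start,
      d = UWB distance measured between those two sensors.
    """
    # ||a - (b + t)|| = d is the same as ||t - c|| = d with c = a - b.
    anchors = [(tuple(ai - bi for ai, bi in zip(a, b)), d) for a, b, d in observations]
    c0, d0 = anchors[0]
    n0 = sum(v * v for v in c0)
    sxx = sxy = syy = bx = by = 0.0
    for c, d in anchors[1:]:
        # Subtracting the first squared-distance equation cancels |t|^2,
        # leaving one equation per pair that is linear in (tx, ty).
        rx, ry = 2 * (c0[0] - c[0]), 2 * (c0[1] - c[1])
        rhs = d * d - d0 * d0 - sum(v * v for v in c) + n0
        sxx += rx * rx
        sxy += rx * ry
        syy += ry * ry
        bx += rx * rhs
        by += ry * rhs
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("sensor geometry does not constrain the offset")
    # Solve the 2x2 normal equations.
    return (syy * bx - sxy * by) / det, (sxx * by - sxy * bx) / det


# Hypothetical sensor positions (metres), each in its wearer's own frame.
sensors_a = [(0.0, 0.0, 1.7), (0.0, 0.0, 1.0), (0.3, 0.1, 0.9), (-0.3, 0.1, 0.9)]
sensors_b = [(0.0, 0.0, 1.6), (0.0, 0.0, 0.95), (0.25, -0.1, 0.9), (-0.25, -0.1, 0.9)]
true_offset = (2.0, -1.0, 0.0)  # where B really started, unknown to the solver

observations = [
    (a, b, math.dist(a, tuple(bi + ti for bi, ti in zip(b, true_offset))))
    for a in sensors_a
    for b in sensors_b
]
estimate = estimate_initial_offset(observations)
print(estimate)
```

With real, noisy ranges collected over many frames, an iterative optimizer with a robust loss would likely be a better fit than this closed-form solve, but the principle is the same: enough between-person distances pin down where one person started relative to the other.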
Finally, ‘Trajectory Optimization’ refines the overall movements of all individuals. This step further integrates the between-person UWB distance constraints, ensuring that the predicted trajectories are consistent with the actual distances observed between people. It also incorporates regularization terms to promote smooth and physically realistic motion, preventing erratic or unstable movements.
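A toy version of this refinement is sketched below. It is only an illustration under strong assumptions: each person is reduced to a single 2D point, there is one measured distance per frame, and the objective has three hand-picked terms (distance consistency, staying close to the initial estimate, and preserving frame-to-frame displacement). The weights, the plain gradient descent, and the function names are all hypothetical and not taken from the paper.

```python
import math


def refine_trajectories(traj_a, traj_b, dists, w_prior=0.1, w_smooth=1.0,
                        lr=0.05, iters=500):
    """Adjust two 2D trajectories so their per-frame distance matches `dists`."""
    n = len(dists)
    init = [list(p) for p in traj_a] + [list(p) for p in traj_b]  # A frames, then B
    cur = [p[:] for p in init]
    for _ in range(iters):
        grad = [[0.0, 0.0] for _ in cur]
        # Distance term: (||a_t - b_t|| - d_t)^2 for every frame.
        for t in range(n):
            a, b = cur[t], cur[n + t]
            dx, dy = a[0] - b[0], a[1] - b[1]
            dist = math.hypot(dx, dy) or 1e-9
            r = dist - dists[t]
            gx, gy = 2 * r * dx / dist, 2 * r * dy / dist
            grad[t][0] += gx
            grad[t][1] += gy
            grad[n + t][0] -= gx
            grad[n + t][1] -= gy
        # Prior term: stay near the initial per-person estimate.
        for i in range(2 * n):
            for k in range(2):
                grad[i][k] += 2 * w_prior * (cur[i][k] - init[i][k])
        # Smoothness term: keep each frame-to-frame step close to the initial one.
        for start in (0, n):
            for t in range(n - 1):
                i, j = start + t, start + t + 1
                for k in range(2):
                    e = (cur[j][k] - cur[i][k]) - (init[j][k] - init[i][k])
                    grad[j][k] += 2 * w_smooth * e
                    grad[i][k] -= 2 * w_smooth * e
        for i in range(2 * n):
            for k in range(2):
                cur[i][k] -= lr * grad[i][k]
    return [tuple(p) for p in cur[:n]], [tuple(p) for p in cur[n:]]


def mean_distance_error(traj_a, traj_b, dists):
    """Mean absolute gap between trajectory distances and measured distances."""
    return sum(abs(math.dist(a, b) - d) for a, b, d in zip(traj_a, traj_b, dists)) / len(dists)


# Person A walks along x, person B stands still, but B's estimate drifts away.
frames = 20
true_a = [(0.1 * t, 0.0) for t in range(frames)]
true_b = [(0.0, 2.0)] * frames
measured = [math.dist(a, b) for a, b in zip(true_a, true_b)]
drifted_b = [(-0.03 * t, 2.0 + 0.03 * t) for t in range(frames)]

before = mean_distance_error(true_a, drifted_b, measured)
refined_a, refined_b = refine_trajectories(true_a, drifted_b, measured)
after = mean_distance_error(refined_a, refined_b, measured)
print(before, after)
```

Note that a single distance per frame cannot tell which of the two people drifted, so this toy spreads the correction across both. The many sensor-to-sensor distances used in the actual method provide far more constraints than this reduced example.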
The GIP-DB Dataset
To validate GIP in real-world scenarios, the researchers created GIP-DB, the first IMU+UWB dataset specifically designed for two-person tracking. The dataset includes over 200 minutes of motion recordings from 14 participants engaging in diverse activities, from everyday movements like walking and jogging to interactive scenarios such as conversations and handshakes. Each participant wore a commercial motion capture suit, which provided ground truth data, alongside the custom IMU+UWB sensors, yielding synchronized data for training and evaluation.
Key Achievements and Benefits
Evaluations on both synthetic and real-world data demonstrate that Group Inertial Poser significantly outperforms previous state-of-the-art methods like PIP and UIP. GIP shows marked improvements in accuracy and robustness across all pose and translation metrics. It effectively reduces the common problem of drift in translation estimates and provides more consistent relative translations between individuals over time. This capability is vital for understanding and reconstructing meaningful interaction dynamics between people.
A notable advantage of GIP is its ability to allow users to start at arbitrary locations, automatically determining their initial relative positions. The research also found that adding more people to the tracking process can further improve translation estimation for each individual, thanks to the increased spatial constraints provided by additional between-person distance measurements.
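A quick count shows why extra people can help. The sketch below assumes that every sensor on one person can range to every sensor on another person, which the article does not confirm; the function name is hypothetical.

```python
from math import comb


def between_person_ranges(num_people, sensors_per_person=6):
    """Number of cross-person sensor pairs, assuming all pairs can be ranged."""
    return comb(num_people, 2) * sensors_per_person ** 2


for people in (2, 3, 4):
    # Unknown start offsets grow linearly (people - 1),
    # while available distance constraints grow quadratically.
    print(people, people - 1, between_person_ranges(people))
# 2 people -> 36 ranges, 3 people -> 108 ranges, 4 people -> 216 ranges
```

Under this assumption, each added person contributes only one more unknown starting position but many more distance measurements per frame.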
Looking Ahead
While GIP improves on prior methods, the researchers acknowledge certain limitations, such as the impact of UWB noise, especially in crowded environments, and the current assumption of a mean body shape rather than individual variations. Still, the work highlights the potential of combining IMU and UWB technologies for multi-person motion tracking, with possible applications ranging from virtual reality to human-robot interaction. For more details, see the full research paper.


