TLDR: The Aria Gen 2 Pilot Dataset (A2PD) is a new, open, egocentric multimodal dataset captured with Meta’s advanced Aria Gen 2 glasses. It features comprehensive raw sensor data and machine perception outputs from daily activities of a primary subject and friends across five scenarios. A2PD showcases the enhanced capabilities of Aria Gen 2, including improved cameras, microphones, and physiological sensors, providing rich data for advancing research in machine perception, contextual AI, and robotics. It includes on-device and offline perception results like VIO, eye tracking, hand tracking, ASR, heart rate estimation, hand-object interaction, 3D object detection, and depth estimation, supported by open-source tools for researchers.
Meta’s Project Aria has been a cornerstone for researchers working on machine perception, contextual AI, and robotics. Building on the success of the first generation, which saw widespread adoption globally, Project Aria has now unveiled a significant step forward: the Aria Gen 2 glasses and an accompanying dataset, the Aria Gen 2 Pilot Dataset (A2PD).
The Aria Gen 2 Pilot Dataset is an open, egocentric multimodal dataset captured using the advanced Aria Gen 2 glasses. The dataset is being released incrementally, with ongoing enhancements, to give the research community timely access. The initial release centers on Dia’ane, the primary subject, who recorded her daily life alongside friends, all equipped with Aria Gen 2 glasses. The dataset covers five main scenarios: cleaning, cooking, eating, playing, and outdoor walking.
What makes A2PD particularly valuable is its breadth: for each scenario, it provides extensive raw sensor data alongside the outputs of various machine perception algorithms. This rich data illustrates the device’s capability to understand the wearer, the surrounding environment, and the interactions between them, while maintaining robust performance across different users and conditions. A2PD is freely available at projectaria.com, complemented by open-source tools and usage examples provided in Project Aria Tools.
The Aria Gen 2 glasses themselves represent a substantial technological upgrade over their predecessor. They feature an enhanced sensor suite, including four computer vision cameras (double the Gen 1’s two) with a wider field of view, a higher-resolution RGB camera, integrated contact and spatial microphones, and a photoplethysmography (PPG) sensor for physiological monitoring such as heart rate. Other advancements include ultra-low-power on-device machine perception, integrated speakers for real-time interaction, and sub-GHz radio technology for precise device time alignment.
The A2PD showcases these advanced capabilities, drawing inspiration from successful Aria Gen 1 datasets. It offers researchers a tangible resource for understanding the new hardware, the resulting improvements in data quality, temporal precision, and contextual richness, and the kinds of algorithms that can be run on this data.
The dataset captures a weekend of activities involving four participants – Dia’ane and three co-participants. The recordings document everyday scenarios such as cleaning, meal preparation, group lunch, playing “Simon Says,” and an outdoor walk. In total, it comprises five distinct scenarios and twelve sequences, each lasting approximately five minutes. These sequences are rich with diverse behaviors, longitudinal context, complex hand-object interactions, frequent social interactions, varied conversations, eye movement patterns, human movement dynamics, and exposure to different lighting conditions.
A2PD is structured around four primary data modalities:
Raw Sensor Streams
These are acquired directly from the Aria Gen 2 devices: high-fidelity, time-synchronized data suitable for multimodal learning and sensor fusion (see the loading sketch after this list).
- Visual: RGB video (10 fps, 2560×1920), four computer vision (CV) video streams (30 fps, 512×512), and binocular eye-tracking imagery (5 fps, 200×200 per eye).
- Motion and environment: dual IMU signals (800 Hz), magnetometer readings (100 Hz), barometric pressure (50 Hz), GPS (1 Hz), ambient temperature (1 Hz), and ambient light sensor measurements.
- Audio and physiology: eight-channel spatial audio (including contact microphones) and PPG signals (128 Hz).
- Connectivity: Bluetooth and Wi-Fi traces.

For multi-participant scenes, a sub-GHz radio ensures sub-millisecond time alignment across all devices.
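To make the stream layout concrete, here is a minimal loading sketch using Project Aria Tools. It assumes a locally downloaded VRS file (the file name is a placeholder) and Gen 1-style stream labels such as "camera-rgb" and "imu-right"; Gen 2 recordings may expose different labels:

```python
# Minimal sketch: read RGB frames and look up the time-nearest IMU sample.
from projectaria_tools.core import data_provider
from projectaria_tools.core.sensor_data import TimeDomain, TimeQueryOptions

# Hypothetical file name for one A2PD sequence.
provider = data_provider.create_vrs_data_provider("a2pd_sequence.vrs")

rgb_stream = provider.get_stream_id_from_label("camera-rgb")
imu_stream = provider.get_stream_id_from_label("imu-right")

for i in range(provider.get_num_data(rgb_stream)):
    image_data, record = provider.get_image_data_by_index(rgb_stream, i)
    t_ns = record.capture_timestamp_ns  # device-time timestamp of this frame
    # Query the IMU stream for the sample closest to the frame timestamp.
    imu = provider.get_imu_data_by_time_ns(
        imu_stream, t_ns, TimeDomain.DEVICE_TIME, TimeQueryOptions.CLOSEST
    )
    print(t_ns, image_data.to_numpy_array().shape, imu.accel_msec2)
```

Querying one stream by another stream’s device timestamp, as above, is the basic pattern for fusing any pair of the time-synchronized streams.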
Real-time Machine Perception Outputs
These are generated on-device by embedded algorithms during data collection, running on Meta’s energy-efficient custom coprocessor. They include Visual Inertial Odometry (VIO) for robust 6DOF tracking; eye tracking for gaze origin and direction, pupil position and diameter, and blink detection; and hand tracking for 3D wrist position and rotation plus 21 finger-joint landmarks per hand.
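As an illustration of how these outputs can be consumed, the sketch below wraps the hand-tracking fields named above in a hypothetical container (not the dataset’s actual schema) and derives a simple pinch gesture from two fingertip landmarks:

```python
# Hypothetical hand-tracking consumer; HandPose mirrors the fields the
# on-device tracker reports, but is not the dataset's actual schema.
from dataclasses import dataclass
import numpy as np

@dataclass
class HandPose:
    wrist_position: np.ndarray  # (3,) 3D wrist position in the device frame
    wrist_rotation: np.ndarray  # (3, 3) wrist rotation matrix
    landmarks: np.ndarray       # (21, 3) finger-joint landmarks

THUMB_TIP, INDEX_TIP = 4, 8     # assumed landmark indices

def is_pinching(pose: HandPose, threshold_m: float = 0.02) -> bool:
    """Return True if thumb and index fingertips are within threshold_m."""
    gap = np.linalg.norm(pose.landmarks[THUMB_TIP] - pose.landmarks[INDEX_TIP])
    return gap < threshold_m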
Offline Machine Perception Services (MPS) Results
All recordings are processed offline by Meta’s Machine Perception Services. The results include MPS SLAM, which produces accurate 6DOF poses, semi-dense point clouds, and online calibration, and MPS Hand Tracking, an offline algorithm that provides 3D hand poses with higher precision and recall than the on-device version.
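A minimal sketch of loading these MPS outputs with Project Aria Tools follows; the file names match the standard MPS folder layout, but the paths here are placeholders:

```python
# Sketch: load MPS SLAM outputs (trajectory and semi-dense point cloud).
from projectaria_tools.core import mps

# Closed-loop 6DOF trajectory: one pose per sample, in the world frame.
trajectory = mps.read_closed_loop_trajectory("mps/slam/closed_loop_trajectory.csv")
print(len(trajectory), "poses")
first = trajectory[0]
print(first.tracking_timestamp, first.transform_world_device)

# Semi-dense point cloud; recent tool versions detect the gzip compression.
points = mps.read_global_point_cloud("mps/slam/semidense_points.csv.gz")
print(len(points), "points; first position:", points[0].position_world)
```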
Outputs from Additional Offline Perception Algorithms
Beyond SLAM, hand tracking, and eye tracking, a suite of additional algorithms is applied. These include directional Automatic Speech Recognition (ASR) that distinguishes the wearer’s voice from other speakers’, heart rate estimation from the PPG signals, hand-object interaction recognition (segmenting hands and the objects they interact with), 3D object detection via Egocentric Voxel Lifting (EVL) for indoor scenes, and depth estimation using Foundation Stereo, which generates reliable depth maps from the CV cameras.
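The dataset’s heart rate pipeline is not detailed here, but a common baseline is to pick the dominant spectral peak of a PPG window within a plausible cardiac band. The sketch below implements that baseline against A2PD’s 128 Hz PPG rate; it is illustrative, not the algorithm used to produce the dataset’s estimates:

```python
# Illustrative heart-rate baseline: dominant FFT peak in the cardiac band.
import numpy as np

def estimate_heart_rate_bpm(ppg: np.ndarray, fs: float = 128.0) -> float:
    """Estimate heart rate from a PPG window sampled at fs Hz (128 Hz in A2PD)."""
    ppg = ppg - ppg.mean()                  # remove the DC component
    spectrum = np.abs(np.fft.rfft(ppg))
    freqs = np.fft.rfftfreq(len(ppg), d=1.0 / fs)
    band = (freqs >= 0.7) & (freqs <= 3.5)  # ~42-210 bpm
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return float(peak_hz * 60.0)

# Example: a synthetic 10 s PPG trace pulsing at 1.2 Hz.
t = np.arange(0, 10, 1 / 128.0)
print(estimate_heart_rate_bpm(np.sin(2 * np.pi * 1.2 * t)))  # ~72.0 bpm
```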
To support researchers, a toolkit is provided for easy download, loading, and visualization of the dataset. This includes a JSON file with download links, a Python-based data loader, and Jupyter notebooks demonstrating usage. Two visualizers are also available: one for raw sensor and on-device perception data, and another for MPS and offline perception algorithm results.
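A sketch of what consuming that download-links JSON might look like; the manifest schema (a flat sequence-name-to-URL map) and the file names are assumptions, since the published format may differ:

```python
# Sketch: fetch sequences listed in an assumed {name: url} JSON manifest.
import json
import urllib.request
from pathlib import Path

def download_sequences(manifest_path: str, out_dir: str = "a2pd") -> None:
    manifest = json.loads(Path(manifest_path).read_text())
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for name, url in manifest.items():  # assumed flat name -> URL layout
        target = Path(out_dir) / f"{name}.vrs"
        if not target.exists():         # skip sequences already on disk
            print(f"downloading {name} ...")
            urllib.request.urlretrieve(url, str(target))

download_sequences("A2PD_download_urls.json")  # hypothetical manifest name
```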
The Aria Gen 2 Pilot Dataset is a significant contribution to egocentric AI research, offering an unprecedented level of detail and multimodal coverage. Researchers can access the dataset and tools at projectaria.com, with future releases planned to add more data and perception algorithms, potentially including full-body human motion generation and activity recognition.


