TLDR: PhysDrive is a new large-scale, multimodal dataset for remote physiological monitoring of drivers. It includes synchronized data from RGB cameras, near-infrared cameras, and mmWave radar, along with six physiological ground truths (ECG, BVP, Respiration, HR, RR, and SpO2) collected from 48 drivers under diverse real-world driving conditions. The dataset aims to address the scarcity of comprehensive resources for in-vehicle physiological sensing, providing a benchmark for developing robust non-invasive driver monitoring systems.
Ensuring safety and enhancing user experience in modern vehicles increasingly relies on understanding a driver’s internal state. Traditional methods for monitoring physiological signals, such as attaching sensors directly to the body, can be intrusive and distracting. This is where remote physiological measurement (RPM) comes in, offering a promising non-invasive way to gather vital health information without physical contact.
However, the development of robust RPM systems for real-world driving scenarios has been significantly hampered by a lack of comprehensive datasets. Existing resources often fall short in terms of scale, the variety of sensing modalities, the range of biometric annotations, and the diversity of captured driving conditions. These limitations mean that current systems struggle to cope with the complex challenges inherent in actual driving environments.
Addressing this gap, researchers have introduced PhysDrive, the first large-scale, multimodal dataset specifically designed for contactless in-vehicle physiological sensing. PhysDrive covers multiple sensing modalities and key driving factors, making it a valuable resource for advancing driver monitoring technology.
What is PhysDrive?
PhysDrive is a groundbreaking dataset that includes data from 48 drivers. It features synchronized data from three contactless sensing modalities: RGB cameras (standard color cameras), near-infrared (NIR) cameras, and raw millimeter-wave (mmWave) radar data. Alongside these, it provides six synchronized ground truths, which are highly accurate measurements from contact-based sensors: Electrocardiography (ECG), Blood Volume Pulse (BVP), Respiration, Heart Rate (HR), Respiration Rate (RR), and Blood Oxygen Saturation (SpO2).
One of PhysDrive’s most significant contributions is its coverage of a wide spectrum of naturalistic driving conditions. This includes various driver motions, dynamic natural light changes, different vehicle types, and diverse road conditions. These real-world factors are crucial for developing systems that can perform reliably outside of controlled laboratory settings.
The Importance of Multimodal Sensing
Contactless in-vehicle physiological monitoring primarily uses vision-based approaches (RGB and NIR cameras) and radio-frequency (RF) sensing (mmWave radar). Each modality has strengths and weaknesses. RGB cameras are cost-effective but sensitive to lighting variations. NIR cameras provide more stable imaging under dynamic lighting but can yield lower signal quality for some measurements. Millimeter-wave radar is robust to illumination changes and better preserves privacy, since it senses minute chest displacements rather than capturing images of the driver. However, it can be affected by vehicle vibrations and is generally more expensive.
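The following minimal Python sketch makes the radar principle concrete. It recovers chest displacement from radar phase and converts it into respiration and heart rates. It relies on the standard relation that a displacement d changes the round-trip phase by 4πd/λ. Several details are assumptions rather than PhysDrive's actual setup:

- a 77 GHz carrier and a 20 Hz slow-time frame rate (the article does not specify either);
- a synthetic chest-motion signal rather than PhysDrive data;
- illustrative helper names (`phase_to_displacement`, `dominant_rate_per_min`).

Real in-cabin pipelines also have to select the chest's range bin and suppress vehicle vibration, which this sketch omits.

```python
import numpy as np

# Hypothetical radar settings; PhysDrive's actual configuration is not given in this article.
CARRIER_HZ = 77e9
SPEED_OF_LIGHT = 3e8
WAVELENGTH_M = SPEED_OF_LIGHT / CARRIER_HZ  # about 3.9 mm
FS = 20.0  # assumed slow-time (frame) rate in Hz


def phase_to_displacement(phase_rad):
    """Unwrap the observed phase and convert it to displacement in metres.

    A reflector that moves by d lengthens the round trip by 2d, which
    shifts the phase by 4*pi*d / wavelength.
    """
    return np.unwrap(phase_rad) * WAVELENGTH_M / (4 * np.pi)


def dominant_rate_per_min(signal, fs, f_lo, f_hi):
    """Strongest spectral peak inside [f_lo, f_hi] Hz, returned in cycles per minute."""
    x = (signal - np.mean(signal)) * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return 60.0 * freqs[band][np.argmax(spectrum[band])]


# Synthetic chest motion (not PhysDrive data): 4 mm breathing at 0.25 Hz
# plus 0.2 mm heartbeat motion at 1.2 Hz, observed for 60 s.
t = np.arange(int(60 * FS)) / FS
chest = 4e-3 * np.sin(2 * np.pi * 0.25 * t) + 0.2e-3 * np.sin(2 * np.pi * 1.2 * t)
# The radar only observes phase wrapped to (-pi, pi].
observed_phase = np.angle(np.exp(1j * 4 * np.pi * chest / WAVELENGTH_M))

displacement = phase_to_displacement(observed_phase)
rr = dominant_rate_per_min(displacement, FS, 0.1, 0.5)  # respiration band
hr = dominant_rate_per_min(displacement, FS, 0.8, 2.5)  # heartbeat band
print(f"RR ~ {rr:.1f} breaths/min, HR ~ {hr:.1f} beats/min")
```

Separating the two rates by frequency band works here because breathing and heartbeat occupy distinct bands. On a real road, vibration energy can fall into the same bands, which is consistent with the road-surface sensitivity reported for mmWave methods.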
The researchers behind PhysDrive recognize that a practical solution for diverse driving scenarios requires a collaborative analysis of all three modalities. By providing synchronized data from RGB, NIR, and mmWave sensors, PhysDrive enables the development of systems that can leverage the strengths of each, potentially overcoming individual limitations through sensor fusion.
Data Collection and Experimental Design
The data for PhysDrive was collected in Guangzhou City, China, through carefully designed real-world driving experiments. Lighting condition (Noon, Early Morning/Dusk, Rainy/Cloudy Days, Nighttime) and vehicle type (A0-segment, B-segment, C-segment SUV) were between-subject factors. Driver action (stationary vs. speaking) and road condition (Flat and Unobstructed, Flat but Congested, Bumpy and Congested) were within-subject variables. Each driver completed six driving segments, totaling about 30 minutes of recording.
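The factorial structure can be written out explicitly. The Python sketch below enumerates the conditions listed above. It makes two illustrative assumptions that do not come from the published protocol:

- the six segments per driver are the full crossing of the two driver actions with the three road conditions (2 × 3 = 6);
- segments are of equal length, which is how the per-segment duration is computed.

```python
from itertools import product

# Factor levels as listed in the article.
LIGHTING = ["Noon", "Early Morning/Dusk", "Rainy/Cloudy Days", "Nighttime"]  # between-subject
VEHICLES = ["A0-segment", "B-segment", "C-segment SUV"]                     # between-subject
ACTIONS = ["stationary", "speaking"]                                        # within-subject
ROADS = ["Flat and Unobstructed", "Flat but Congested", "Bumpy and Congested"]  # within-subject
TOTAL_MINUTES = 30  # approximate recording time per driver

# Between-subject: each driver experiences one lighting condition and one vehicle.
between_cells = list(product(LIGHTING, VEHICLES))

# Within-subject (assumed full crossing): every driver drives each action x road combination.
segments = list(product(ACTIONS, ROADS))

print(f"{len(between_cells)} possible lighting/vehicle assignments")
print(f"{len(segments)} segments per driver, ~{TOTAL_MINUTES / len(segments):.0f} min each if equal length")
for action, road in segments:
    print(f"  {action:<10} | {road}")
```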
The data acquisition platform integrated ECG sensors, a respiratory belt, RGB and NIR cameras, and a fingertip blood oxygen meter. An mmWave radar was configured to capture the subtle chest movements associated with breathing and heartbeat. All devices were temporally synchronized to maintain data integrity.
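Streams recorded at different rates can be put onto one timeline in software by interpolating each sensor onto a reference clock, such as the camera frame timestamps. The sketch below assumes that every device logs timestamps on a shared clock. The 30 fps and 250 Hz rates, the 0.1 s start offset, and the `align_to_reference` helper are hypothetical. This is not necessarily how PhysDrive was synchronized.

```python
import numpy as np


def align_to_reference(ref_ts, src_ts, src_values):
    """Linearly interpolate a sensor stream onto reference timestamps (seconds, shared clock)."""
    ref_ts = np.asarray(ref_ts, dtype=float)
    src_ts = np.asarray(src_ts, dtype=float)
    # Keep only reference samples inside the source's recorded span to avoid extrapolation.
    mask = (ref_ts >= src_ts[0]) & (ref_ts <= src_ts[-1])
    return ref_ts[mask], np.interp(ref_ts[mask], src_ts, src_values)


# Hypothetical streams: 10 s of 30 fps video and a 250 Hz ECG that starts 0.1 s later.
video_ts = np.arange(300) / 30.0
ecg_ts = 0.1 + np.arange(2500) / 250.0
ecg = np.sin(2 * np.pi * 1.2 * ecg_ts)  # stand-in waveform, not real ECG morphology

ts, ecg_on_frames = align_to_reference(video_ts, ecg_ts, ecg)
print(f"{len(ts)} video frames have an aligned ECG sample")
```

Interpolation only helps if the logged timestamps are accurate. Any clock jitter between devices passes straight into the aligned signals. This is why sharp waveforms such as ECG benefit from the hardware-level timestamping mentioned under future work.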
Key Findings and Benchmarks
The researchers extensively evaluated both traditional signal-processing and deep-learning methods on PhysDrive, establishing a comprehensive benchmark across all modalities. Key findings include:
- **Modality Performance:** Millimeter-wave methods consistently outperformed vision-based approaches (RGB and NIR) in directly estimating heart rate and respiration rate, especially in terms of Pearson’s correlation coefficient. Vision-based methods excelled at recovering BVP waveforms, while mmWave struggled to reconstruct ECG waveforms, likely because the fine morphology of ECG signals demands stricter synchronization.
- **Impact of Driving Conditions:** Driver motions (e.g., talking) and challenging road conditions (e.g., bumpy roads) significantly impacted the accuracy of all methods. mmWave performance was particularly affected by road surface smoothness, while vision-based methods were more sensitive to brightness and rapid lighting changes.
- **Model Generalization:** Supervised deep learning models, while effective within the dataset, showed poorer generalization across different vehicle, lighting, or road scenarios. Unsupervised pretraining on large volumes of unlabeled driving data led to more robust representations, suggesting a two-stage training strategy for future models.
- **STMap Effectiveness:** For RGB video, methods using Spatial-Temporal Maps (STMap) as input outperformed direct video input networks in multi-task estimation and generalization. STMap helps reduce noise from head motion and illumination variations, though it introduces some latency.
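The sketch below shows the basic STMap idea for readers unfamiliar with it. Pixel colors are averaged over a set of facial regions in every frame. The per-region time series are then stacked into a regions × time × channels map that a 2D CNN can consume. The sketch makes three assumptions:

- the frames are already face-cropped and aligned;
- a fixed rectangular grid stands in for the landmark-based regions that published STMap methods typically use;
- `build_stmap` is a hypothetical helper.

Averaging within each region suppresses pixel-level noise, and per-region normalization removes offset and scale differences between regions.

```python
import numpy as np


def build_stmap(frames, grid=(5, 5)):
    """Build a simple spatial-temporal map from face-cropped RGB frames.

    frames: array of shape (T, H, W, 3), assumed already cropped and aligned to the face.
    Returns an array of shape (R, T, 3) with R = grid[0] * grid[1] regions.
    """
    frames = np.asarray(frames, dtype=np.float64)
    _, h, w, _ = frames.shape
    rows, cols = grid
    regions = []
    for i in range(rows):
        for j in range(cols):
            block = frames[:, i * h // rows:(i + 1) * h // rows,
                           j * w // cols:(j + 1) * w // cols, :]
            regions.append(block.mean(axis=(1, 2)))  # (T, 3): mean color per frame
    stmap = np.stack(regions)  # (R, T, 3)
    # Min-max normalize each region/channel over time to remove per-region offset and scale.
    lo = stmap.min(axis=1, keepdims=True)
    hi = stmap.max(axis=1, keepdims=True)
    return (stmap - lo) / np.maximum(hi - lo, 1e-8)


# Usage with synthetic frames (3 s of 30 fps, 64x64 face crops).
rng = np.random.default_rng(0)
frames = rng.uniform(0, 255, size=(90, 64, 64, 3))
stmap = build_stmap(frames)
print(stmap.shape)  # (25, 90, 3)
```

The map summarizes a whole window of frames, so that window must be buffered before the network can run. This is one plausible source of the latency noted above.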
Limitations and Future Directions
Despite its comprehensiveness, PhysDrive has some limitations. The participant cohort primarily consists of individuals of East Asian descent, limiting the evaluation of camera-based methods across diverse skin tones. The dataset currently focuses only on drivers, and while SpO2 recordings are included, the lack of a dedicated acquisition protocol for varying SpO2 levels limits its utility. Future work will aim to address these limitations by including more diverse participants, exploring passenger monitoring, and implementing higher-precision hardware-level timestamping for better synchronization.
Conclusion
PhysDrive marks a significant step forward for remote physiological measurement in driver monitoring. By providing a large-scale, multimodal dataset with meticulously designed driving scenarios, it offers a much-needed public benchmark for researchers. This dataset is poised to accelerate the development of advanced sensor fusion techniques and robust physiological measurement algorithms, paving the way for the next generation of intelligent cockpits and enhanced driving safety. You can find more details about the dataset and access the open-source code at the research paper’s page.


