TLDR: This research introduces an early-stage self-supervised foundation model for critical care time series, based on the Bi-Axial Transformer (BAT) architecture. Trained on pooled electronic health record datasets (MIMIC-III, MIMIC-IV, eICU), the model demonstrates effective transfer learning for mortality prediction, outperforming supervised baselines, particularly with smaller datasets. The work highlights the potential of self-supervised learning to create robust and generalizable clinical applications in settings with limited data.
In the rapidly evolving landscape of healthcare technology, domain-specific foundation models have seen significant growth. Critical care time series data, however, has remained relatively underexplored, largely because of limited dataset sizes and availability. A recent research paper, “Towards Self-Supervised Foundation Models for Critical Care Time Series”, introduces an early-stage pre-trained foundation model designed to address exactly these issues.
Authored by Katja Naasunnguaq Jagd, Rachael DeVries, and Ole Winther, this work presents a novel approach using the Bi-Axial Transformer (BAT) architecture. The model is trained on a collection of electronic health record (EHR) datasets, leveraging a technique called self-supervised pre-training. This method allows the model to learn rich, generalized representations from unlabeled data, which is particularly valuable in healthcare where labeled datasets are scarce.
The Challenge of Critical Care Data
Traditional general-purpose foundation models often struggle with healthcare applications due to the complex nature of medical data and the lack of extensive publicly available labeled datasets. While some healthcare-specific models exist for areas like clinical natural language processing or medical imaging, critical care time series data, which involves continuous physiological measurements, has posed a unique challenge. Existing models in this domain often suffer from reproducibility issues, rely on simple supervised tasks, or are trained on small, homogeneous datasets, making them difficult to transfer to new clinical settings.
A New Approach: Bi-Axial Transformer and Self-Supervised Learning
The researchers modified the Bi-Axial Transformer (BAT) architecture for self-supervised pre-training. BAT is particularly well-suited for irregular multivariate time series data because it can attend to both the temporal (time) and clinical feature axes. Crucially, it explicitly accounts for missing values, a common characteristic of real-world clinical data. The self-supervised pre-training uses a forecasting task: given past observations drawn from auxiliary datasets, the model learns to predict future measurements.
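The paper's code is not reproduced here, but the following PyTorch sketch illustrates the two ideas in this paragraph: attention applied alternately along the time axis and the feature axis, and a forecasting loss that ignores missing entries. The class name `BiAxialBlock`, the `(batch, time, features, d_model)` tensor layout, and the loss function are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BiAxialBlock(nn.Module):
    """Illustrative bi-axial block: self-attention along the time axis,
    then along the clinical-feature axis."""
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.feat_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_t = nn.LayerNorm(d_model)
        self.norm_f = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, features, d_model) -- one embedding per (time step, variable)
        b, t, f, d = x.shape
        # Time-axis attention: treat each clinical variable as its own sequence.
        xt = x.permute(0, 2, 1, 3).reshape(b * f, t, d)
        xt = self.norm_t(xt + self.time_attn(xt, xt, xt)[0])
        x = xt.reshape(b, f, t, d).permute(0, 2, 1, 3)
        # Feature-axis attention: treat each time step as a sequence over variables.
        xf = x.reshape(b * t, f, d)
        xf = self.norm_f(xf + self.feat_attn(xf, xf, xf)[0])
        return xf.reshape(b, t, f, d)

def forecasting_loss(pred: torch.Tensor, target: torch.Tensor,
                     observed: torch.Tensor) -> torch.Tensor:
    """Self-supervised forecasting objective (assumed form): mean squared error
    on future measurements, computed only where a value was actually observed."""
    se = (pred - target) ** 2 * observed
    return se.sum() / observed.sum().clamp(min=1)
```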
After pre-training, the model is fine-tuned on a distinct dataset for a specific clinical task: mortality prediction. This process demonstrates effective transfer learning, meaning the knowledge gained during pre-training can be successfully applied to new, unseen data and tasks.
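As a rough sketch of that transfer step, the snippet below wraps a pre-trained encoder (for example, a stack of bi-axial blocks like the one above) with a freshly initialised binary head and fine-tunes everything on labelled mortality data. `pretrained_encoder`, `labelled_loader`, the pooling choice, and the hyperparameters are all assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn

class MortalityClassifier(nn.Module):
    """Pre-trained encoder plus a newly initialised binary classification head."""
    def __init__(self, pretrained_encoder: nn.Module, d_model: int):
        super().__init__()
        self.encoder = pretrained_encoder     # weights from self-supervised pre-training
        self.head = nn.Linear(d_model, 1)     # binary head for mortality prediction

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.encoder(x)                   # (batch, time, features, d_model)
        pooled = h.mean(dim=(1, 2))           # simple pooling over both axes (an assumption)
        return self.head(pooled).squeeze(-1)  # one logit per ICU stay

# Full fine-tuning: both encoder and head are updated on the labelled data.
model = MortalityClassifier(pretrained_encoder, d_model=64)  # pretrained_encoder: assumed
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()

for x, y in labelled_loader:                  # labelled_loader: hypothetical DataLoader
    optimizer.zero_grad()
    loss = criterion(model(x), y.float())
    loss.backward()
    optimizer.step()
```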
Key Findings and Performance
The experiments were conducted using three widely recognized ICU datasets: MIMIC-III, MIMIC-IV, and eICU. The pre-trained BAT model, particularly when pre-trained on the larger pooled datasets (eICU and MIMIC-IV), consistently outperformed supervised baseline models. This performance advantage was most significant in scenarios with smaller datasets (fewer than 5,000 samples), highlighting the model’s potential in resource-limited clinical environments where obtaining large amounts of labeled data is challenging.
An interesting finding was that fine-tuning only the binary classification head of the pre-trained model yielded performance comparable to fine-tuning the entire model. This suggests that the pre-trained model learns highly informative and transferable embeddings, which are valuable for various downstream tasks.
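To mirror that head-only setting, a minimal sketch (continuing the hypothetical `MortalityClassifier` above) freezes the encoder and optimises only the classification head:

```python
# Head-only fine-tuning: freeze the pre-trained encoder and update only the
# classification head on the labelled mortality data.
for p in model.encoder.parameters():
    p.requires_grad = False

head_optimizer = torch.optim.AdamW(model.head.parameters(), lr=1e-3)
# The training loop is otherwise unchanged; only the head receives gradient updates.
```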
Implications and Future Directions
This research underscores the feasibility and benefits of developing self-supervised foundation models for critical care time series data within a transparent and reproducible framework. Such models hold immense promise for creating robust and generalizable clinical applications, especially in settings with limited labeled data and computational resources.
The authors acknowledge limitations, primarily the reliance on a few specific datasets. Future work will aim to incorporate more diverse and larger datasets, potentially even from other domains like weather or electricity consumption, to further enhance the model’s generalizability and assess the necessity of domain-specific data for critical care applications.


