TLDR: This research introduces “Impute With Confidence,” a framework for multivariate time series imputation that quantifies and uses model uncertainty. By selectively imputing only values the model is confident about, it reduces imputation errors and improves performance in downstream tasks like 24-hour mortality prediction, especially in healthcare settings with challenging missing data patterns. The framework uses Monte Carlo Dropout for uncertainty estimation, which is shown to correlate highly with imputation error.
Time series data, which records measurements over time, is common across many fields, from finance to environmental science. However, it often comes with a significant challenge: missing values. The problem is particularly pronounced in healthcare, where sensor disconnections during patient procedures or other events can leave large, continuous gaps in vital sign data. With information this critical, simply filling in the blanks isn’t enough; knowing how confident we are in the filled-in values is crucial for reliable decision-making.
Most existing methods for imputing (filling in) missing time series data either ignore the uncertainty associated with their predictions or lack a clear way to measure it. To address this, researchers Addison Weatherhead and Anna Goldenberg from the University of Toronto have introduced a novel framework called “Impute With Confidence.” This framework not only quantifies the uncertainty of imputed values but also leverages it to make smarter imputation decisions.
The core idea behind “Impute With Confidence” is to avoid imputing values that the model is highly uncertain about. By focusing only on values where the model expresses high confidence, the framework aims to prevent unreliable imputations from negatively impacting subsequent analyses or predictions. This selective approach is particularly beneficial in sensitive domains like healthcare, where inaccurate data can have serious consequences.
The researchers conducted experiments using multiple Electronic Health Record (EHR) datasets, including MIMIC-IV, eICU, and HiRID. These datasets represent diverse types of missingness patterns, mimicking real-world scenarios such as data missing completely at random (MCAR), missing at random (MAR), missing not at random (MNAR), and specific patterns like “Black Out” and “Block Black Out” (where entire blocks of data are missing, common during patient transfers or sensor disconnections).
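To make the difference between scattered and block-style missingness concrete, here is a minimal sketch in plain Python. It is an illustration only, not the paper's masking procedure: the function names `mcar_mask` and `block_mask`, and the choice to drop every feature inside the block, are assumptions made for this example.

```python
import random


def mcar_mask(n_steps, n_features, missing_rate, rng):
    """Mark each cell missing independently with probability missing_rate (MCAR)."""
    return [
        [rng.random() < missing_rate for _ in range(n_features)]
        for _ in range(n_steps)
    ]


def block_mask(n_steps, n_features, start, length):
    """Mark every feature missing for a contiguous run of time steps.

    Hypothetical stand-in for a block-style gap such as a sensor
    disconnection; True means "missing".
    """
    return [
        [start <= t < start + length for _ in range(n_features)]
        for t in range(n_steps)
    ]


rng = random.Random(0)
scattered = mcar_mask(n_steps=10, n_features=3, missing_rate=0.3, rng=rng)
gap = block_mask(n_steps=10, n_features=3, start=4, length=3)
```

In the scattered case, neighboring observations usually survive around each hole. In the block case, the model has no nearby context for several consecutive steps, which is the kind of gap where a confidence estimate matters most.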
To quantify uncertainty, the framework uses Monte Carlo Dropout. Dropout is left active at inference time and the model is run several times on the same input, so it yields a distribution of possible values rather than a single imputed value. The standard deviation of this distribution serves as the measure of the model’s uncertainty. The study empirically demonstrated a strong correlation between this uncertainty measure and the actual imputation error, supporting its use as a confidence signal.
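The idea behind Monte Carlo Dropout can be sketched with a toy network in plain Python. This is a minimal illustration under stated assumptions, not the authors' model: the two-layer network, its weights, the dropout rate, and the helper names `forward_with_dropout` and `mc_dropout_impute` are all hypothetical.

```python
import random
import statistics


def forward_with_dropout(x, w_hidden, w_out, p_drop, rng):
    """One stochastic forward pass with dropout kept ON at inference time."""
    keep = 1.0 - p_drop
    hidden = []
    for weights in w_hidden:
        # ReLU hidden unit
        h = max(0.0, sum(w * xi for w, xi in zip(weights, x)))
        # Inverted dropout: zero the unit with probability p_drop,
        # otherwise rescale it by 1 / keep
        h = h / keep if rng.random() < keep else 0.0
        hidden.append(h)
    return sum(w * h for w, h in zip(w_out, hidden))


def mc_dropout_impute(x, w_hidden, w_out, p_drop=0.2, n_passes=100, seed=0):
    """Return (imputed value, uncertainty) as the mean and standard
    deviation over n_passes stochastic forward passes."""
    rng = random.Random(seed)
    samples = [
        forward_with_dropout(x, w_hidden, w_out, p_drop, rng)
        for _ in range(n_passes)
    ]
    return statistics.mean(samples), statistics.stdev(samples)


# Toy weights, chosen only for illustration
w_hidden = [[0.5, 0.1], [0.2, 0.3], [-0.1, 0.4]]
w_out = [1.0, 0.5, -0.3]
value, uncertainty = mc_dropout_impute([1.0, 2.0], w_hidden, w_out)
```

With `p_drop=0.0` every pass is identical and the standard deviation collapses to zero, which shows that the uncertainty signal comes entirely from the randomness of dropout.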
The impact of this uncertainty-aware imputation was evaluated on a critical downstream task: 24-hour mortality prediction. The results showed that selectively imputing less-uncertain values significantly reduced imputation errors and, more importantly, improved the performance of the mortality prediction model. For instance, in some cases, the downstream task performed best when only the 60% most confident missing values were imputed, outperforming both full imputation and no imputation at all.
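Selective imputation of this kind can be sketched as a ranking step: sort the missing positions by uncertainty and fill only the most confident fraction. The snippet below is a hedged illustration, not the paper's implementation; the function name `selective_impute`, the use of `None` for missing entries, and the example numbers are assumptions.

```python
def selective_impute(observed, imputed, uncertainty, keep_fraction=0.6):
    """Fill only the keep_fraction most confident missing entries.

    observed:    list of values, with None marking a missing entry
    imputed:     model output for every position
    uncertainty: per-position uncertainty (e.g. an MC Dropout std)
    Entries that are missing but too uncertain stay None.
    """
    missing = [i for i, v in enumerate(observed) if v is None]
    # Lowest uncertainty first, i.e. most confident first
    ranked = sorted(missing, key=lambda i: uncertainty[i])
    n_keep = round(keep_fraction * len(missing))
    accepted = set(ranked[:n_keep])
    return [
        imputed[i] if i in accepted else v
        for i, v in enumerate(observed)
    ]


observed = [1.0, None, None, 4.0, None, None, None]
imputed = [1.0, 2.1, 2.9, 4.0, 5.2, 6.1, 7.3]
uncertainty = [0.0, 0.1, 0.9, 0.0, 0.2, 0.8, 0.3]

print(selective_impute(observed, imputed, uncertainty, keep_fraction=0.6))
# [1.0, 2.1, None, 4.0, 5.2, None, 7.3]
```

Here three of the five missing values are filled and the two with the highest uncertainty are left missing, so a downstream model has to handle the remaining gaps itself. The right `keep_fraction` is something to tune on the downstream task rather than fix in advance.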
This work highlights a significant insight for machine learning applications in healthcare: a simple and widely applicable uncertainty measure can be computed for many time series imputation models. Communicating this uncertainty to end-users can build trust in the models, and incorporating uncertainty values into downstream tasks can lead to more accurate predictions. While the framework requires multiple forward passes of the model to compute uncertainty, which might be computationally intensive in some environments, the benefits of improved accuracy and reliability, especially in critical healthcare applications, are substantial.
For more detailed information, you can refer to the full research paper: Impute With Confidence: A Framework for Uncertainty Aware Multivariate Time Series Imputation.


