TLDR: Federated Markov Imputation (FMI) is a new privacy-preserving method for filling in missing time-series data in multi-centric ICU environments using federated learning. It allows ICUs to collaboratively build a global transition model without sharing raw patient data. Evaluated on a sepsis prediction task using the MIMIC-IV dataset, FMI outperforms local imputation methods, especially when ICUs collect data at varying temporal granularities, improving predictive performance while maintaining privacy.
Using data from multiple hospitals, particularly Intensive Care Units (ICUs), holds considerable potential for improving patient care and prediction models. Two obstacles stand in the way: time-series records are often incomplete, and institutions collect them in different ways. Both are compounded by the need to protect patient privacy, which makes traditional data sharing difficult.
A new research paper introduces Federated Markov Imputation (FMI), a method that lets ICUs work together to fill in missing time-series data without directly sharing sensitive patient information. FMI targets incomplete data in federated learning environments, where multiple institutions train a shared model without centralizing raw data.
The Challenge of Missing Data in ICUs
Clinical prediction tasks, such as predicting the onset of sepsis, heavily rely on continuous time-series data. However, this data is often incomplete due to irregular sampling intervals or clinical priorities. When multiple hospitals collaborate using federated learning, this missing data problem is amplified. Institutions often collect data at different temporal granularities – some hourly, others every two or three hours. This disparity makes it difficult to align data for a unified model, and existing local imputation techniques struggle when fine-grained temporal transitions are needed across varied collection schedules.
How Federated Markov Imputation Works
FMI offers a privacy-preserving approach to temporal imputation, consisting of three main steps:
Step 1: Local Transition Matrix. Each ICU first processes its own time-series data. It discretizes features into bins and then creates a “first-order Markov transition matrix.” This matrix essentially records the empirical probability of transitioning from one data state (bin) to another between adjacent time steps, using only the observed, non-missing data. This step is performed entirely locally, keeping individual ICU data private.
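The local step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the bin edges, the use of None to mark missing values, and the function names (to_bin, local_transition_counts, row_normalize) are all hypothetical.

```python
from bisect import bisect_right

def to_bin(value, edges):
    """Map a value to a bin index, given sorted interior bin edges."""
    return bisect_right(edges, value)

def local_transition_counts(series_list, edges):
    """Count bin-to-bin transitions between adjacent, observed time steps.

    series_list: one list per patient stay; None marks a missing value.
    edges: sorted interior bin edges (assumed; k edges give k + 1 bins).
    """
    n_bins = len(edges) + 1
    counts = [[0] * n_bins for _ in range(n_bins)]
    for series in series_list:
        for prev, curr in zip(series, series[1:]):
            if prev is None or curr is None:
                continue  # only pairs with both endpoints observed count
            counts[to_bin(prev, edges)][to_bin(curr, edges)] += 1
    return counts

def row_normalize(counts):
    """Convert counts to empirical transition probabilities, row by row."""
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs

# Hypothetical heart-rate readings with edges at 60 and 100 (three bins).
stays = [[55, 70, None, 105, 110], [80, 90, 120]]
counts = local_transition_counts(stays, edges=[60, 100])
probs = row_normalize(counts)
```

Keeping the raw counts separate from the normalized probabilities matters for the next step, since counts from different ICUs can be added directly while probabilities cannot.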
Step 2: Federated Transition Matrix. The local transition counts are then aggregated securely. Using a technique called Secure Aggregation, each participating ICU contributes masked transition counts, which are summed to create a “global transition matrix” (T_fed). Crucially, no individual ICU’s statistics are exposed during aggregation, which maintains patient data privacy.
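The idea behind masked summation can be shown with a toy, single-process simulation of pairwise additive masking. This is only a sketch of the general principle and not the protocol used in the paper: a real Secure Aggregation protocol also needs key agreement between sites and handling of dropouts. The modulus MOD, the seed, and the function names are assumptions, and one shared random generator stands in for the secrets that each pair of sites would agree on.

```python
import random

MOD = 2**32  # assumed modulus; must exceed any true total count

def mask_counts(site_counts, seed=0):
    """Add pairwise masks that cancel in the sum (toy simulation).

    site_counts: one flattened count vector per site, all the same length.
    For each pair (i, j), site i adds a random mask and site j subtracts it.
    """
    rng = random.Random(seed)  # stands in for per-pair shared secrets
    masked = [list(vec) for vec in site_counts]
    n_sites, size = len(masked), len(masked[0])
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            for k in range(size):
                m = rng.randrange(MOD)
                masked[i][k] = (masked[i][k] + m) % MOD
                masked[j][k] = (masked[j][k] - m) % MOD
    return masked

def aggregate(masked):
    """Sum the masked vectors; the masks cancel, leaving the true totals."""
    size = len(masked[0])
    return [sum(vec[k] for vec in masked) % MOD for k in range(size)]

# Three hypothetical sites, each holding a flattened count vector.
sites = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
global_counts = aggregate(mask_counts(sites))
```

The server in this sketch only ever sees the masked vectors, each of which looks random on its own, yet their sum equals the element-wise sum of the true counts.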
Step 3: Federated Markov Imputation. Once the global transition matrix is established, each ICU uses it to impute its own missing values. If a missing value has both a preceding and a succeeding known data point, FMI selects the bin that best fits the expected transitions on both sides, based on the global matrix. If only one neighbor is known, the transition in that single direction is used. For consecutive missing values, the most probable path is inferred recursively. The final imputed value is the midpoint of the selected bin.
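The imputation rule can be sketched as follows. This is a hedged illustration, not the authors' code: scoring a candidate bin by the product of the incoming and outgoing transition probabilities is one natural reading of "best fits the expected transitions on both sides", and the greedy left-to-right fill for consecutive gaps is a simplification that may differ from the paper's recursive procedure. The matrix T and the bin midpoints are made-up values.

```python
def best_bin(prev_bin, next_bin, T):
    """Pick the bin with the highest transition score given known neighbors."""
    def score(b):
        s = 1.0
        if prev_bin is not None:
            s *= T[prev_bin][b]   # probability of prev -> b
        if next_bin is not None:
            s *= T[b][next_bin]   # probability of b -> next
        return s
    return max(range(len(T)), key=score)

def impute_bins(bins, T):
    """Fill None entries in a sequence of bin indices (greedy sketch)."""
    out = list(bins)
    n = len(out)
    # Forward pass: an imputed bin acts as the left neighbor of the next gap.
    for t in range(n):
        if out[t] is None:
            prev_bin = out[t - 1] if t > 0 else None
            next_bin = bins[t + 1] if t + 1 < n else None
            if prev_bin is not None or next_bin is not None:
                out[t] = best_bin(prev_bin, next_bin, T)
    # Backward pass: fill any leading gap from the first known bin.
    for t in range(n - 2, -1, -1):
        if out[t] is None and out[t + 1] is not None:
            out[t] = best_bin(None, out[t + 1], T)
    return out

# Hypothetical 3-bin global matrix and bin midpoints.
T = [[0.7, 0.3, 0.0],
     [0.2, 0.5, 0.3],
     [0.0, 0.4, 0.6]]
midpoints = [50.0, 80.0, 110.0]

observed = [0, None, 2]
filled = impute_bins(observed, T)
# Only originally missing positions receive a bin midpoint as their value.
imputed_values = [midpoints[b] for o, b in zip(observed, filled) if o is None]
```

In the example, the gap between bin 0 and bin 2 is filled with bin 1, because it is the only bin that can be reached from bin 0 and can also lead to bin 2 under this matrix.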
Evaluating FMI: Sepsis Onset Prediction
The researchers evaluated FMI using the MIMIC-IV dataset, focusing on a real-world sepsis onset prediction task. The study involved data from 28,610 patients across seven different ICUs. They considered two main scenarios: a “Regular” setting where all ICUs had hourly data, and an “Irregular” setting where some ICUs were assigned 2-hour or 3-hour intervals to simulate heterogeneous sampling granularities.
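One simple way to simulate a coarser sampling interval from hourly data is to keep every k-th reading and mark the rest as missing. The sketch below assumes that approach; the article does not say exactly how the intervals were assigned, so the function name and the masking rule are illustrative only.

```python
def downsample_to_interval(hourly, interval_hours):
    """Keep readings at multiples of interval_hours; mark the rest missing."""
    return [v if t % interval_hours == 0 else None
            for t, v in enumerate(hourly)]

# Hypothetical hourly readings reduced to a 3-hour schedule.
hourly = [71, 74, 78, 83, 85, 90, 88]
three_hourly = downsample_to_interval(hourly, 3)
```

Under this masking, a 3-hour ICU never observes two adjacent hourly values, so it has no hourly transitions of its own to count.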
FMI was compared against two baselines: local mean imputation (filling missing values with the ICU’s feature-wise mean) and Local Markov Imputation (LMI), which uses only each ICU’s local transition matrix. LMI, however, was not feasible in the Irregular setting for ICUs with coarser intervals due to the lack of hourly resolution.
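For reference, the local mean baseline amounts to very little code. The sketch below handles a single feature and assumes None marks missing values; the function name is hypothetical.

```python
def feature_mean_impute(series_list):
    """Replace missing values with the mean of all observed local values.

    series_list: one list per patient stay for a single feature.
    """
    observed = [v for s in series_list for v in s if v is not None]
    if not observed:
        return [list(s) for s in series_list]  # nothing to compute a mean from
    mean = sum(observed) / len(observed)
    return [[mean if v is None else v for v in s] for s in series_list]

# Hypothetical stays from one ICU; the mean of 1.0, 3.0 and 5.0 is 3.0.
imputed = feature_mean_impute([[1.0, None], [3.0, None, 5.0]])
```

Every gap receives the same value regardless of its neighbors, which is why this baseline cannot capture temporal dynamics the way a transition-based method can.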
Key Findings
In the Regular setting, FMI showed moderate improvements in mean AUC (Area Under the Curve, a measure of predictive performance) over both baselines. All three imputation methods produced models with an AUC above 0.8, a level often treated as the threshold for clinical applicability.
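As a reminder of what the metric measures, AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting half. The sketch below computes it by direct pairwise comparison; the labels and scores are made up, and this is not the evaluation code from the study.

```python
def auc(labels, scores):
    """Pairwise AUC: share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = sepsis onset) and model scores.
value = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values above 0.8 indicate that positives are ranked above negatives in more than four out of five pairs.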
FMI’s strength became most apparent in the Irregular setting. Overall performance declined for all methods because of increased data scarcity and lower resolution, but FMI substantially reduced the negative impact of irregular sampling. It particularly improved performance for the ICUs with 3-hour intervals (MICU/SICU and NSICU), where local mean imputation degraded markedly and LMI was not applicable at all. This highlights FMI’s ability to harmonize heterogeneous clinical data and impute missing values effectively without compromising privacy.
Conclusion and Future Directions
Federated Markov Imputation represents a significant step forward in privacy-preserving time-series data imputation for federated clinical settings. Its ability to improve predictive performance, especially when ICUs collect data at different temporal granularities, underscores its potential to facilitate more robust and collaborative healthcare AI models. The full research paper can be found here.
Future work aims to conduct more detailed analyses, extend comparisons with additional baselines, and validate FMI in real-world clinical case studies.


