Bridging Data Gaps: A New Approach to Privacy-Preserving Imputation in ICU Federated Learning

TLDR: Federated Markov Imputation (FMI) is a new privacy-preserving method for filling in missing time-series data in multi-centric ICU environments using federated learning. It allows ICUs to collaboratively build a global transition model without sharing raw patient data. Evaluated on a sepsis prediction task using the MIMIC-IV dataset, FMI outperforms local imputation methods, especially when ICUs collect data at varying temporal granularities, improving predictive performance while maintaining privacy.

In the evolving landscape of healthcare, the use of data from multiple hospitals, particularly Intensive Care Units (ICUs), holds immense potential for improving patient care and prediction models. However, a significant hurdle in this collaborative effort is the presence of missing data, especially in time-series records, and the varying ways different institutions collect this information. This challenge is further complicated by the need to protect patient privacy, making traditional data sharing difficult.

A recent research paper introduces Federated Markov Imputation (FMI), a method that allows ICUs to work together to fill in missing time-series data without directly sharing sensitive patient information. FMI is designed to address incomplete data in federated learning environments, where multiple institutions train a shared model without centralizing raw data.

The Challenge of Missing Data in ICUs

Clinical prediction tasks, such as predicting the onset of sepsis, heavily rely on continuous time-series data. However, this data is often incomplete due to irregular sampling intervals or clinical priorities. When multiple hospitals collaborate using federated learning, this missing data problem is amplified. Institutions often collect data at different temporal granularities – some hourly, others every two or three hours. This disparity makes it difficult to align data for a unified model, and existing local imputation techniques struggle when fine-grained temporal transitions are needed across varied collection schedules.

How Federated Markov Imputation Works

FMI offers a privacy-preserving approach to temporal imputation, consisting of three main steps:

Step 1: Local Transition Matrix. Each ICU first processes its own time-series data. It discretizes each feature into bins and then builds a “first-order Markov transition matrix,” which records the empirical probability of moving from one state (bin) to another between adjacent time steps. The matrix is estimated only from observed, non-missing values. This step runs entirely locally, so each ICU's data stays private.
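A minimal sketch of Step 1 follows. It rests on a few assumptions that the article does not spell out:

- Each feature is a NumPy array in which NaN marks a missing value.
- Bin edges are fixed and, presumably, shared by all ICUs so that their counts can later be added together.
- The function names are illustrative, not the authors' code.

Only pairs of adjacent time steps where both values are observed contribute a count.

```python
import numpy as np

def discretize(values, edges):
    """Map observed values to bin indices 0..K-1 (K = len(edges) - 1); NaN becomes -1.

    Values below edges[0] or above edges[-1] are clipped into the first/last bin.
    """
    values = np.asarray(values, dtype=float)
    idx = np.digitize(values, edges[1:-1])  # interior edges give indices 0..K-1
    return np.where(np.isnan(values), -1, idx)

def local_transition_counts(series_list, edges):
    """Count bin-to-bin transitions between adjacent time steps within one ICU."""
    K = len(edges) - 1
    counts = np.zeros((K, K), dtype=np.int64)
    for series in series_list:
        b = discretize(series, edges)
        src, dst = b[:-1], b[1:]
        ok = (src >= 0) & (dst >= 0)           # both time steps must be observed
        np.add.at(counts, (src[ok], dst[ok]), 1)
    return counts

def normalize(counts):
    """Row-normalize counts into empirical transition probabilities.

    Rows with no observed transitions are left as all zeros.
    """
    row = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row, out=np.zeros(counts.shape), where=row > 0)
```

For example, with edges [0, 10, 20, 30] the series [5, 15, NaN, 25, 25] yields only two usable transitions: bin 0 to bin 1, and bin 2 to bin 2.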

Step 2: Federated Transition Matrix. Next, the local transition counts are aggregated securely. Using a technique called Secure Aggregation, each participating ICU contributes masked transition counts, which are summed to form a “global transition matrix” (T_fed). Crucially, no individual ICU's statistics are exposed during this aggregation, which protects patient privacy.
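To illustrate why a server can learn the sum without seeing any single ICU's counts, here is a toy version of pairwise additive masking, the core idea behind secure aggregation protocols. Each ICU's masked matrix looks random on its own, but the masks cancel in the sum.

This is a simplification, not the protocol used in the paper:

- Pairwise masks come from one seeded generator instead of key agreement between ICUs.
- Arithmetic is done modulo a hypothetical ring size MOD.
- Handling of ICUs that drop out mid-protocol is omitted.

```python
import numpy as np

MOD = 2**32  # hypothetical ring size; must exceed any true total count

def pairwise_masks(n_clients, shape, seed=0):
    """Toy stand-in for shared secrets: masks[(i, j)] would be known only to clients i and j."""
    rng = np.random.default_rng(seed)
    masks = {}
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            masks[(i, j)] = rng.integers(0, MOD, size=shape, dtype=np.int64)
    return masks

def mask_counts(i, counts, masks, n_clients):
    """Client i adds masks shared with higher-indexed peers and subtracts the others."""
    masked = np.asarray(counts, dtype=np.int64) % MOD
    for j in range(n_clients):
        if j == i:
            continue
        sign = 1 if i < j else -1
        masked = (masked + sign * masks[(min(i, j), max(i, j))]) % MOD
    return masked

def aggregate(masked_list):
    """Server-side sum of masked matrices; pairwise masks cancel modulo MOD."""
    total = np.zeros_like(masked_list[0])
    for m in masked_list:
        total = (total + m) % MOD
    return total
```

The server ends up with the global count matrix only. Row-normalizing it would give a transition matrix; that FMI normalizes in exactly this way is an assumption here.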

Step 3: Federated Markov Imputation. Once the global transition matrix is established, each ICU uses it to impute its own missing values. If a missing value has both a preceding and a succeeding known value, FMI selects the bin that best fits the transitions on both sides under the global matrix. If only one neighbor is known, only the transition from or to that neighbor is used. For consecutive missing values, the most probable path is inferred recursively. The final imputed value is the midpoint of the selected bin.
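One way to read Step 3 in code is sketched below. It assumes a row-stochastic matrix T, where T[a, b] is the probability of moving from bin a to bin b, and series already converted to bin indices with None for missing entries.

Consecutive gaps are filled greedily from left to right, and each imputed bin becomes the left neighbor of the next. This is our interpretation of “inferred recursively”; a Viterbi-style search over the whole gap would be a reasonable alternative. Names such as best_bin and impute_bins are hypothetical.

```python
import numpy as np

def best_bin(T, prev=None, nxt=None):
    """Return the bin that best fits the known neighbor(s); T[a, b] = P(b follows a).

    If every score is zero, argmax falls back to bin 0 (a real system needs a better fallback).
    """
    score = np.ones(T.shape[0])
    if prev is not None:
        score = score * T[prev, :]   # fit to the transition coming in from the left
    if nxt is not None:
        score = score * T[:, nxt]    # fit to the transition going out to the right
    return int(np.argmax(score))

def impute_bins(obs, T):
    """Fill None entries in a list of bin indices."""
    x = list(obs)
    n = len(x)
    observed = [t for t, v in enumerate(obs) if v is not None]
    if not observed:
        return x  # no anchor value: leave the series unchanged
    first = observed[0]
    # Forward pass: the left neighbor is observed or already imputed; the right
    # neighbor is used only if it was actually observed.
    for t in range(first + 1, n):
        if x[t] is None:
            nxt = obs[t + 1] if t + 1 < n else None
            x[t] = best_bin(T, prev=x[t - 1], nxt=nxt)
    # Backward pass for a leading gap: only the right-hand neighbor is available.
    for t in range(first - 1, -1, -1):
        x[t] = best_bin(T, nxt=x[t + 1])
    return x

def bins_to_values(bins, edges):
    """Map bin indices to bin midpoints, the final imputed values."""
    edges = np.asarray(edges, dtype=float)
    mids = (edges[:-1] + edges[1:]) / 2
    return [float(mids[b]) for b in bins]
```

With a matrix that favors staying in the same bin or moving one bin at a time, the gap in [0, None, 2] is filled with the middle bin 1. With edges [0, 10, 20, 30], bins 0, 1, 2 map back to the values 5, 15 and 25.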

Evaluating FMI: Sepsis Onset Prediction

The researchers evaluated FMI using the MIMIC-IV dataset, focusing on a real-world sepsis onset prediction task. The study involved data from 28,610 patients across seven different ICUs. They considered two main scenarios: a “Regular” setting where all ICUs had hourly data, and an “Irregular” setting where some ICUs were assigned 2-hour or 3-hour intervals to simulate heterogeneous sampling granularities.
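The Irregular setting can be mimicked on hourly data by keeping only every k-th reading. This sketch assumes rows are hourly time steps and that readings are kept starting from hour 0. The article does not describe the exact procedure used in the study, and the function name is hypothetical.

```python
import numpy as np

def simulate_coarse_sampling(hourly, interval_h):
    """Keep one reading every `interval_h` hours on the hourly grid; mark the rest as NaN.

    Works for 1-D series or 2-D arrays shaped (hours, features).
    """
    hourly = np.asarray(hourly, dtype=float)
    coarse = np.full_like(hourly, np.nan)
    coarse[::interval_h] = hourly[::interval_h]
    return coarse
```

An interval of 1 reproduces the Regular setting. Intervals of 2 or 3 leave one-half or two-thirds of the hourly grid empty.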

FMI was compared against two baselines: local mean imputation (filling missing values with the ICU’s feature-wise mean) and Local Markov Imputation (LMI), which uses only each ICU’s local transition matrix. LMI, however, was not feasible in the Irregular setting for ICUs with coarser intervals due to the lack of hourly resolution.
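The sketch below implements the local mean baseline and counts the adjacent hourly pairs that a local transition matrix would be estimated from. Function names are hypothetical, and NaN marks missing values.

For a series observed only every 3 hours on an hourly grid, that count is zero. This is why LMI has nothing to learn hourly transitions from in those ICUs.

```python
import numpy as np

def local_mean_impute(X):
    """Replace NaNs in each feature column with that ICU's mean of observed values.

    X is shaped (time steps, features); an all-NaN column stays NaN.
    """
    X = np.asarray(X, dtype=float)
    col_mean = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_mean, X)

def adjacent_observed_pairs(series):
    """Number of consecutive hourly steps where both values are observed (what LMI needs)."""
    s = np.asarray(series, dtype=float)
    ok = ~np.isnan(s)
    return int(np.sum(ok[:-1] & ok[1:]))
```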

Key Findings

In the Regular setting, FMI showed moderate improvements in mean AUC (Area Under the Curve, a measure of predictive performance) over both baselines. All three imputation methods yielded models with an AUC above 0.8, a level often cited as a threshold for clinical applicability.
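Assuming the reported AUC is the area under the ROC curve, it equals the probability that a randomly chosen positive example receives a higher risk score than a randomly chosen negative one, with ties counted as one half. A brute-force sketch of that definition (the function name is hypothetical):

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC above 0.8 therefore means the model ranks the positive case higher in more than 80% of positive-negative pairs. Library routines such as scikit-learn's roc_auc_score compute the same quantity more efficiently.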

The true strength of FMI became apparent in the Irregular setting. While overall performance declined across all methods due to increased data scarcity and lower resolution, FMI significantly reduced the negative impact of irregular sampling. It particularly improved performance for ICUs with 3-hour intervals (MICU/SICU and NSICU), where local mean imputation degraded significantly and LMI was not even applicable. This highlights FMI’s ability to harmonize heterogeneous clinical data and impute missing values effectively without compromising privacy.

Conclusion and Future Directions

Federated Markov Imputation represents a significant step forward in privacy-preserving time-series imputation for federated clinical settings. Its ability to improve predictive performance, especially when ICUs collect data at different temporal granularities, underscores its potential to support more robust and collaborative healthcare AI models. Further details are available in the full research paper.

Future work aims to conduct more detailed analyses, extend comparisons with additional baselines, and validate FMI in real-world clinical case studies.
