TLDR: Researchers have developed MBT-CB, a Multi-target Bayesian Transformer framework, to predict key cardiovascular disease (CVD) biomarkers (LDL-C, HbA1c, BMI, SysBP) from Electronic Health Records (EHR) data during pandemics. The model combines Bayesian Variational Inference for uncertainty estimation, a BERT-based transformer for temporal patterns, and DeepMTR for biomarker interdependencies. Evaluated on COVID-19 era data, MBT-CB outperformed the baseline models it was compared against, achieving low prediction error while also reporting a confidence level for each prediction, which is crucial for clinical decision-making.
The COVID-19 pandemic brought unprecedented challenges to healthcare systems globally, particularly affecting individuals with chronic conditions like cardiovascular disease (CVD). The disruptions, ranging from delayed medical care to changes in lifestyle, significantly impacted crucial CVD biomarkers such as LDL cholesterol (LDL-C), HbA1c, BMI, and systolic blood pressure (SysBP). Accurately predicting these changes is vital for understanding disease progression and guiding preventive care, especially during health crises.
Traditional approaches to predicting these biomarkers often fall short. They typically predict each biomarker independently, ignoring the complex interdependencies between them. Furthermore, they often fail to account for the temporal patterns in patient data or to quantify the inherent uncertainty in predictions, which is crucial for reliable clinical decision-making.
Introducing MBT-CB: A Novel Predictive Framework
To address these critical gaps, researchers have developed MBT-CB, a Multi-target Bayesian Transformer framework. This innovative model is designed to jointly predict LDL-C, HbA1c, BMI, and SysBP from Electronic Health Records (EHR) data. What makes MBT-CB stand out is its ability to simultaneously capture biomarker interdependencies, temporal patterns, and predictive uncertainty.
The MBT-CB model integrates several advanced techniques:
- Bayesian Variational Inference: This component helps estimate uncertainties, distinguishing between data noise (aleatoric uncertainty) and model-related uncertainty (epistemic uncertainty). This means the model not only makes a prediction but also tells us how confident it is in that prediction.
- Pre-trained BERT-based Transformer: Leveraging the power of large language models, MBT-CB uses a ClinicalBERT-based transformer. This allows the model to understand and learn complex temporal relationships within a patient’s EHR data, treating sequences of biomarker readings over time as ‘sentences’ of clinical information.
- Deep Multi-Target Regression (DeepMTR): This layer is crucial for capturing the intricate inter-relationships between different biomarkers. Instead of predicting each biomarker in isolation, DeepMTR enables the model to learn shared patterns and target-specific characteristics, leading to more coherent and accurate predictions.
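The split between data noise and model uncertainty described in the first bullet is often computed from several stochastic forward passes using the law of total variance. The sketch below illustrates that general recipe; it is not confirmed to be the paper's exact procedure. The function name, the input format (one predicted mean and one predicted noise variance per pass), and the numbers are all assumptions made for illustration.

```python
from statistics import fmean, pvariance

def decompose_uncertainty(means, variances):
    """Split predictive variance for one biomarker (law of total variance).

    means[i], variances[i]: predicted mean and predicted noise variance from
    the i-th stochastic forward pass (hypothetical model outputs).
    """
    aleatoric = fmean(variances)   # average predicted data noise
    epistemic = pvariance(means)   # spread of the predicted means across passes
    return {
        "mean": fmean(means),
        "aleatoric": aleatoric,
        "epistemic": epistemic,
        "total": aleatoric + epistemic,
    }

# Illustrative numbers only, not taken from the study.
result = decompose_uncertainty(
    means=[0.50, 0.52, 0.48, 0.50],
    variances=[0.010, 0.012, 0.009, 0.011],
)
print(result)
```

A large epistemic term relative to the aleatoric term suggests the model is unsure because it has seen little similar data, whereas a large aleatoric term points to noisy measurements.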
How MBT-CB Works
The framework processes patient EHR data by structuring each clinical visit’s biomarker values into a ‘sentence’. These sentences are then fed into the ClinicalBERT model, which has been pre-trained on a vast amount of clinical text, allowing it to generate contextualized embeddings. To these embeddings, the model adds positional information (to capture visit order), segment embeddings (to distinguish between pre-pandemic and pandemic visits), and demographic identifiers (like gender, race, and income).
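As a rough illustration of the preprocessing described above, the sketch below turns each visit into a text 'sentence' and attaches a position index and a pre-pandemic/pandemic segment flag. The sentence template, the field names, and the cutoff date (the WHO pandemic declaration on 11 March 2020) are assumptions chosen for the example; the article does not specify them.

```python
from datetime import date

# Assumption: the WHO pandemic declaration date serves as the segment cutoff.
PANDEMIC_START = date(2020, 3, 11)

def visit_to_sentence(visit):
    """Render one visit's biomarkers as text (hypothetical template)."""
    return (
        f"LDL-C {visit['ldl_c']} HbA1c {visit['hba1c']} "
        f"BMI {visit['bmi']} SysBP {visit['sysbp']}"
    )

def encode_history(visits):
    """Order visits by date and attach position and segment identifiers."""
    ordered = sorted(visits, key=lambda v: v["date"])
    return [
        {
            "position": i,                                 # visit order
            "segment": int(v["date"] >= PANDEMIC_START),   # 0 = pre-pandemic, 1 = pandemic
            "sentence": visit_to_sentence(v),
        }
        for i, v in enumerate(ordered)
    ]

# Made-up patient history for illustration.
history = [
    {"date": date(2021, 1, 5), "ldl_c": 118, "hba1c": 6.4, "bmi": 29.0, "sysbp": 138},
    {"date": date(2019, 6, 1), "ldl_c": 110, "hba1c": 6.1, "bmi": 28.4, "sysbp": 132},
]
for row in encode_history(history):
    print(row)
```

In a full pipeline, each sentence would be tokenized and passed through ClinicalBERT, with the position and segment values mapped to learned embeddings rather than used as raw integers.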
The core of MBT-CB’s innovation lies in its Variational Self-Attention mechanism. Unlike standard transformers, which compute a single deterministic set of attention weights for a given input, MBT-CB treats attention weights as probability distributions. This allows the model to quantify uncertainty directly within its attention mechanism. The output from this attention mechanism then goes into the DeepMTR head, which makes the final multi-biomarker predictions while considering their interdependencies.
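One plausible reading of "attention weights as probability distributions" is to place a Gaussian over the attention logits, sample them with the reparameterization trick, and summarize the resulting weights over many draws. The sketch below follows that reading for a single query; it is an illustrative assumption, not the paper's implementation, and the names `sample_attention`, `attention_stats`, and `log_sigma` are hypothetical.

```python
import math
import random

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sample_attention(query, keys, log_sigma, rng):
    """Draw one set of attention weights for a single query.

    Each logit is sampled as mu + sigma * eps, where mu is the usual scaled
    dot product and eps ~ N(0, 1) (reparameterization trick).
    """
    scale = math.sqrt(len(query))
    sigma = math.exp(log_sigma)
    logits = []
    for key in keys:
        mu = sum(q * k for q, k in zip(query, key)) / scale
        logits.append(mu + sigma * rng.gauss(0.0, 1.0))
    return softmax(logits)

def attention_stats(query, keys, log_sigma=-1.0, n_samples=200, seed=0):
    """Mean and variance of each attention weight over repeated draws."""
    rng = random.Random(seed)
    draws = [sample_attention(query, keys, log_sigma, rng) for _ in range(n_samples)]
    n = len(keys)
    mean = [sum(d[j] for d in draws) / n_samples for j in range(n)]
    var = [sum((d[j] - mean[j]) ** 2 for d in draws) / n_samples for j in range(n)]
    return mean, var

# Toy query attending over three toy key vectors.
mean, var = attention_stats([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(mean, var)
```

The per-weight variance gives a direct measure of how uncertain the attention pattern is. A trained variational model would learn the distribution parameters and include a KL regularization term in its loss, both of which this sketch omits.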
Impressive Performance and Clinical Relevance
MBT-CB was evaluated on retrospective EHR data from 3,390 CVD patient records (304 unique patients) in Central Massachusetts during the COVID-19 pandemic. The results were highly promising: MBT-CB significantly outperformed a wide range of baseline models, including other BERT-based machine learning models. It achieved superior accuracy metrics (MAE of 0.00887, RMSE of 0.0135, and MSE of 0.00027).
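For readers less familiar with the three error metrics quoted above, the following minimal sketch computes them over a multi-target prediction table. The function name and the toy values are illustrative assumptions and do not reproduce the study's evaluation code or data.

```python
import math

def regression_metrics(y_true, y_pred):
    """MAE, MSE, and RMSE pooled over all patients and all target biomarkers.

    y_true, y_pred: lists of rows, one row per record, one column per biomarker.
    """
    errors = [
        p - t
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    ]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse)}

# Toy example with two records and two targets (values are made up).
print(regression_metrics([[0.0, 1.0], [2.0, 3.0]], [[0.0, 1.5], [2.0, 2.0]]))
```

RMSE equals the square root of MSE only when both are computed over the same pooled set of errors; values averaged separately across targets or cross-validation folds need not satisfy that identity exactly.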
Beyond just accuracy, MBT-CB effectively captured data and model uncertainty, patient biomarker inter-relationships, and temporal dynamics. For instance, it showed narrow uncertainty intervals for SysBP and LDL-C, with occasional spikes indicating model uncertainty in less familiar data regions. For BMI and HbA1c, it displayed broader uncertainty bands, suggesting higher intrinsic variability or measurement noise.
The model’s ability to provide uncertainty estimates is particularly valuable in clinical settings, enabling healthcare providers to make more risk-aware decisions, especially during periods of healthcare disruption. By understanding not just what the prediction is, but also how confident the model is in that prediction, clinicians can better tailor preventive care strategies and identify at-risk patients proactively.
Future Directions
While MBT-CB represents a significant leap forward, the researchers acknowledge limitations, such as the generalizability of findings due to the training data primarily coming from two hospitals in Central Massachusetts with a predominantly White patient population. Future work aims to scale the model to larger, more diverse EHR datasets and to enhance its explainability further, for example, by integrating SHAP values to clarify feature contributions.
In conclusion, MBT-CB offers a robust and uncertainty-aware framework for predicting critical cardiovascular disease biomarkers. Its superior performance and ability to interpret complex clinical data make it a powerful tool for improving CVD management and supporting clinical decision-making, particularly in challenging environments like pandemics. You can read the full research paper here: A Multi-target Bayesian Transformer Framework for Predicting Cardiovascular Disease Biomarkers during Pandemics.


