Predicting Heart Health Risks with ASCENDgpt: A New AI Approach Using Patient Records

TLDR: ASCENDgpt is a new transformer-based AI model designed for cardiovascular risk prediction from electronic health records. It introduces a novel phenotype-aware tokenization scheme, mapping over 47,000 raw ICD codes to 176 clinically meaningful phenotype tokens, significantly reducing vocabulary size and enhancing interpretability. The model is pretrained using masked language modeling and fine-tuned for time-to-event prediction of five cardiovascular outcomes (MI, stroke, MACE, cardiovascular death, all-cause mortality), achieving an average C-index of 0.816. This approach demonstrates strong predictive performance, computational efficiency, and clinical interpretability, marking a significant advancement in EHR-based risk prediction.

Cardiovascular disease remains a global health crisis, responsible for millions of deaths annually. Identifying individuals at high risk early on is vital for effective prevention and improving patient outcomes. Traditionally, cardiovascular risk prediction has relied on established clinical scores like the Framingham Risk Score, which use a limited set of variables and assume linear relationships. However, these methods often fail to fully utilize the vast amount of information available in modern Electronic Health Records (EHRs), which contain rich, longitudinal data about a patient’s health journey.

Recent advancements in artificial intelligence, particularly deep learning, have shown great promise in extracting meaningful patterns from complex EHR data. Inspired by the success of transformer models in natural language processing, researchers have begun applying similar architectures to healthcare data, treating patient medical histories as sequences of events.

Introducing ASCENDgpt: A New Approach to Cardiovascular Risk Prediction

A new transformer-based model, ASCENDgpt, has been developed specifically for cardiovascular risk prediction using longitudinal electronic health records. This model introduces several key innovations to address the unique challenges of EHR-based prediction.

One of ASCENDgpt’s most significant contributions is its novel phenotype-aware tokenization scheme. Instead of treating each of the more than 47,000 raw ICD codes as its own token, ASCENDgpt maps them to a much smaller set of 176 clinically meaningful phenotype tokens. This consolidates the diagnosis codes by 99.6%, drastically reducing the vocabulary size while preserving crucial semantic information and clinical interpretability. The smaller vocabulary makes the model more computationally efficient and its predictions easier for clinicians to understand.
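To make the idea concrete, here is a minimal Python sketch of a code-to-phenotype lookup, together with the arithmetic behind the 99.6% figure. The mapping table, the fallback token `PHENO_OTHER`, and the function names are illustrative assumptions; the article does not publish ASCENDgpt’s actual mapping, and 47,000 is used only as the approximate code count quoted above.

```python
# Hypothetical ICD-10 category -> phenotype table. The real ASCENDgpt mapping
# is not given in this article, so these entries are illustrative only.
PHENOTYPE_BY_CATEGORY = {
    "I10": "PHENO_HYPERTENSION",
    "I21": "PHENO_MYOCARDIAL_INFARCTION",
    "I63": "PHENO_ISCHEMIC_STROKE",
    "E11": "PHENO_TYPE2_DIABETES",
}


def icd_to_phenotype(icd_code: str, unknown: str = "PHENO_OTHER") -> str:
    """Map a raw ICD code to a phenotype token via its 3-character category."""
    category = icd_code.replace(".", "").upper()[:3]
    return PHENOTYPE_BY_CATEGORY.get(category, unknown)


def consolidation_pct(n_raw: int, n_tokens: int) -> float:
    """Percentage by which a vocabulary shrinks from n_raw to n_tokens entries."""
    return 100.0 * (1.0 - n_tokens / n_raw)


codes = ["I21.4", "I21.09", "E11.9", "Z00.00"]
tokens = [icd_to_phenotype(c) for c in codes]
print(tokens)  # two distinct MI codes collapse to the same phenotype token

# Roughly 47,000 raw codes mapped to 176 phenotype tokens.
print(round(consolidation_pct(47_000, 176), 1))  # 99.6
```

Several granular codes sharing one token is what shrinks the vocabulary: the model sees a single "myocardial infarction" concept rather than each subtype code.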

The model is then pretrained on sequences derived from the EHRs of 19,402 unique individuals using a masked language modeling objective. This step allows ASCENDgpt to learn robust representations of cardiovascular disease patterns and temporal relationships within patient histories. Following pretraining, the model is fine-tuned for time-to-event prediction of five major cardiovascular outcomes: myocardial infarction (MI), stroke, major adverse cardiovascular events (MACE), cardiovascular death, and all-cause mortality. This fine-tuning uses survival analysis methods, which are essential for handling the inherent censoring in clinical data where not all patients experience an event.
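As a rough illustration of the masked language modeling objective, the Python sketch below hides a random subset of tokens and records the originals as prediction targets. The 15% masking rate, the `[MASK]` token name, and the function name are assumptions borrowed from common practice in language modeling; the article does not state ASCENDgpt’s actual settings.

```python
import random

MASK = "[MASK]"  # hypothetical mask token name


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (inputs, labels) for masked language modeling.

    labels holds the original token at masked positions and None elsewhere,
    so a training loss would be computed only where a token was hidden.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


sequence = ["EVT_DIAG", "PHENO_HYPERTENSION", "CTX_OUTPATIENT", "DAY_0", "AGE_45"]
inputs, labels = mask_tokens(sequence, mask_prob=0.15, seed=0)
print(inputs)
print(labels)
```

During pretraining the model would be asked to recover each hidden token from its surrounding context, which is how it can pick up co-occurrence and ordering patterns in patient histories.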

How ASCENDgpt Works

ASCENDgpt processes patient data by constructing sequences of medical events. Each event is encoded in a structured format that captures its type, the relevant phenotype, context (like outpatient or emergency), and temporal information (days from the first event, patient age). For example, a diagnosis of hypertension in an outpatient setting would be concisely represented as ‘EVT_DIAG PHENO_HYPERTENSION CTX_OUTPATIENT DAY_0 AGE_45’. This domain-optimized structure maintains semantic meaning while being computationally efficient.
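The event format above can be sketched in a few lines of Python. The `Event` class, its field names, and the helper functions are hypothetical; only the token layout and the hypertension example come from the article, and token names such as `CTX_EMERGENCY` and `PHENO_MYOCARDIAL_INFARCTION` are guesses at how other events might look.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_type: str  # e.g. "DIAG"
    phenotype: str   # e.g. "HYPERTENSION"
    context: str     # e.g. "OUTPATIENT"
    day: int         # days since the patient's first event
    age: int         # patient age in years at the event


def encode_event(e: Event) -> str:
    """Render one event in the five-token layout described in the article."""
    return (
        f"EVT_{e.event_type} PHENO_{e.phenotype} "
        f"CTX_{e.context} DAY_{e.day} AGE_{e.age}"
    )


def encode_history(events) -> list:
    """Order events chronologically and flatten them into one token sequence."""
    ordered = sorted(events, key=lambda e: e.day)
    return [tok for e in ordered for tok in encode_event(e).split()]


history = [
    # Hypothetical later event; these token names are assumed, not from the paper.
    Event("DIAG", "MYOCARDIAL_INFARCTION", "EMERGENCY", 400, 46),
    Event("DIAG", "HYPERTENSION", "OUTPATIENT", 0, 45),
]
print(encode_event(history[1]))
# EVT_DIAG PHENO_HYPERTENSION CTX_OUTPATIENT DAY_0 AGE_45
print(encode_history(history))
```

With five tokens per event, a 2,048-token window would hold roughly 400 events under this layout, assuming no additional special tokens.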

The model itself is a transformer encoder with 103.3 million parameters, designed to handle sequences up to 2,048 tokens long. After pretraining, task-specific layers are added to predict the risk scores for each of the five cardiovascular outcomes, using a specialized loss function for survival analysis.
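The article does not name the survival loss. One widely used choice for training a risk score on censored data is the negative Cox partial log-likelihood, sketched below in plain Python purely as an assumption about what such a loss could look like. The function name and the toy inputs are hypothetical, and tied event times are handled in the simple Breslow style.

```python
import math


def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk_scores: model outputs (higher means higher predicted risk)
    times:       follow-up time for each patient
    events:      1 if the outcome was observed, 0 if the patient was censored
    """
    n_events = sum(events)
    if n_events == 0:
        return 0.0  # censored patients alone contribute no likelihood terms
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue
        # Risk set: everyone still under observation at time t_i.
        at_risk = [s for s, t in zip(risk_scores, times) if t >= t_i]
        m = max(at_risk)  # subtract the max for a stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(s - m) for s in at_risk))
        total += risk_scores[i] - log_sum
    return -total / n_events


times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
good = cox_neg_log_partial_likelihood([2.0, 1.0, 0.0], times, events)
bad = cox_neg_log_partial_likelihood([0.0, 1.0, 2.0], times, events)
print(good < bad)  # True: ranking earlier events as riskier lowers the loss
```

Censored patients never contribute a term of their own, but they still appear in the risk sets of earlier events, which is how this kind of loss uses their partial follow-up.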

Impressive Performance and Efficiency

ASCENDgpt achieved excellent discrimination on a held-out test set, with an average C-index of 0.816 across all five outcomes. It performed best on cardiovascular death prediction (0.842), and strongly on stroke (0.824) and all-cause mortality (0.824). Because the test set was held out from training, these scores reflect performance on unseen patients, though the data come from the same institution as the training data.
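For readers unfamiliar with the metric, the C-index (concordance index) is the fraction of comparable patient pairs in which the patient who had the event earlier was also assigned the higher risk; 0.5 corresponds to random ranking and 1.0 to perfect ranking. The Python sketch below implements Harrell’s version on made-up toy data. It is an illustration of the metric only, not the paper’s evaluation code.

```python
def concordance_index(risk_scores, times, events):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when patient i had an observed event and
    did so strictly before patient j's follow-up time. Tied risk scores
    count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four patients, two observed events, two censored.
times = [5, 10, 12, 20]
events = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.1, 0.6]
print(concordance_index(scores, times, events))  # 0.75
```

In the toy data, the patient with the event at time 12 was scored below the patient followed to time 20, so one of the four comparable pairs is ranked wrongly.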

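Beyond its predictive accuracy, ASCENDgpt offers significant computational advantages. The phenotype-based approach leads to a 77.9% reduction in overall vocabulary size compared to using raw ICD codes, resulting in a smaller model and substantially faster training and inference times. This figure is lower than the 99.6% consolidation quoted earlier, presumably because that number covers diagnosis codes alone while the full vocabulary also includes other token types. This efficiency makes the model more practical for real-world clinical applications.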

Clinical Interpretability and Future Directions

A key benefit of ASCENDgpt’s phenotype-aware design is enhanced clinical interpretability. By operating on clinically meaningful concepts rather than thousands of granular codes, clinicians can better understand the model’s predictions in terms of familiar disease patterns. This interpretability is crucial for building trust and facilitating the adoption of AI in healthcare.

While ASCENDgpt represents a significant step forward, the researchers acknowledge limitations, including its reliance on data from a single institution and the need for further refinement of phenotype mappings. Future work will focus on integrating multi-modal data (like laboratory values and vital signs), refining phenotype groupings, external validation on independent healthcare systems, and ultimately, prospective validation in clinical settings.

This work highlights the importance of incorporating clinical knowledge into deep learning architectures for healthcare. By combining domain-specific tokenization with powerful transformer models, ASCENDgpt offers a promising path toward more accurate, efficient, and interpretable cardiovascular risk prediction. Further details are available in the full research paper.

Karthik Mehta
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
