Predicting Heart Health Risks with ASCENDgpt: A New AI Approach Using Patient Records

TLDR: ASCENDgpt is a transformer-based model for cardiovascular risk prediction from electronic health records. Its phenotype-aware tokenization scheme maps over 47,000 raw ICD codes to 176 clinically meaningful phenotype tokens, which shrinks the vocabulary and makes the model's inputs easier to interpret. The model is pretrained with masked language modeling and fine-tuned for time-to-event prediction of five cardiovascular outcomes (MI, stroke, MACE, cardiovascular death, all-cause mortality), reaching an average C-index of 0.816 on a held-out test set. The approach combines strong predictive performance with computational efficiency and clinical interpretability.

Cardiovascular disease remains a global health crisis, responsible for millions of deaths annually. Identifying individuals at high risk early on is vital for effective prevention and improving patient outcomes. Traditionally, cardiovascular risk prediction has relied on established clinical scores like the Framingham Risk Score, which use a limited set of variables and assume linear relationships. However, these methods often fail to fully utilize the vast amount of information available in modern Electronic Health Records (EHRs), which contain rich, longitudinal data about a patient’s health journey.

Recent advancements in artificial intelligence, particularly deep learning, have shown great promise in extracting meaningful patterns from complex EHR data. Inspired by the success of transformer models in natural language processing, researchers have begun applying similar architectures to healthcare data, treating patient medical histories as sequences of events.

Introducing ASCENDgpt: A New Approach to Cardiovascular Risk Prediction

A new transformer-based model, ASCENDgpt, has been developed specifically for cardiovascular risk prediction using longitudinal electronic health records. This model introduces several key innovations to address the unique challenges of EHR-based prediction.

One of ASCENDgpt’s most significant contributions is its phenotype-aware tokenization scheme. Instead of treating each raw ICD code (of which there are more than 47,000) as its own token, ASCENDgpt maps the codes to a much smaller set of 176 clinically meaningful phenotype tokens. This cuts the number of distinct diagnosis tokens by about 99.6% while preserving the clinically relevant meaning of each diagnosis. The smaller vocabulary makes the model more computationally efficient and its predictions easier for clinicians to understand.
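To make the idea concrete, here is a minimal sketch of phenotype mapping by ICD-10 code prefix. The article does not publish the actual mapping, so the table, the function names, and every phenotype name other than PHENO_HYPERTENSION are illustrative assumptions. The last helper simply checks the 99.6% figure from the two counts quoted above.

```python
# Hypothetical prefix table: the real ASCENDgpt mapping is not given in the article.
ICD10_PREFIX_TO_PHENOTYPE = {
    "I10": "PHENO_HYPERTENSION",           # essential hypertension
    "I21": "PHENO_MYOCARDIAL_INFARCTION",  # acute myocardial infarction
    "I63": "PHENO_ISCHEMIC_STROKE",        # cerebral infarction
    "E11": "PHENO_TYPE2_DIABETES",         # type 2 diabetes mellitus
}


def icd_to_phenotype(icd_code: str) -> str:
    """Map a raw ICD-10 code to a phenotype token using the longest matching prefix."""
    code = icd_code.replace(".", "").upper()
    for length in range(len(code), 0, -1):
        token = ICD10_PREFIX_TO_PHENOTYPE.get(code[:length])
        if token is not None:
            return token
    return "PHENO_UNKNOWN"  # assumed fallback for unmapped codes


def vocabulary_reduction(raw_codes: int, phenotype_tokens: int) -> float:
    """Fraction of distinct tokens removed by the consolidation."""
    return 1.0 - phenotype_tokens / raw_codes


if __name__ == "__main__":
    print(icd_to_phenotype("I21.4"))                    # PHENO_MYOCARDIAL_INFARCTION
    print(icd_to_phenotype("E11.9"))                    # PHENO_TYPE2_DIABETES
    print(f"{vocabulary_reduction(47_000, 176):.1%}")   # 99.6%
```

Many granular codes (for example every subtype of myocardial infarction) collapse onto one token, which is where the vocabulary savings come from.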

The model is then pretrained on sequences derived from the EHRs of 19,402 unique individuals using a masked language modeling objective. This step allows ASCENDgpt to learn robust representations of cardiovascular disease patterns and temporal relationships within patient histories. Following pretraining, the model is fine-tuned for time-to-event prediction of five major cardiovascular outcomes: myocardial infarction (MI), stroke, major adverse cardiovascular events (MACE), cardiovascular death, and all-cause mortality. This fine-tuning uses survival analysis methods, which are essential for handling the inherent censoring in clinical data where not all patients experience an event.
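The masked language modeling step can be sketched as follows: some tokens in a patient sequence are hidden, and the model is trained to recover them from the surrounding context. The mask token name and the 15% masking rate are assumptions borrowed from common BERT-style practice; the article does not state which values ASCENDgpt uses.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed name; not specified in the article


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels) for one sequence.

    labels[i] holds the original token where a mask was applied and
    None elsewhere, so a loss would only be computed on masked positions.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


if __name__ == "__main__":
    sequence = "EVT_DIAG PHENO_HYPERTENSION CTX_OUTPATIENT DAY_0 AGE_45".split()
    masked, labels = mask_tokens(sequence, mask_prob=0.3, seed=0)
    print(masked)
    print(labels)
```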

How ASCENDgpt Works

ASCENDgpt processes patient data by constructing sequences of medical events. Each event is encoded in a structured format that captures its type, the relevant phenotype, context (like outpatient or emergency), and temporal information (days from the first event, patient age). For example, a diagnosis of hypertension in an outpatient setting would be concisely represented as ‘EVT_DIAG PHENO_HYPERTENSION CTX_OUTPATIENT DAY_0 AGE_45’. This domain-optimized structure maintains semantic meaning while being computationally efficient.
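The event format described above can be sketched in a few lines of Python. The class and function names are hypothetical; only the token layout (EVT_, PHENO_, CTX_, DAY_, AGE_) comes from the example in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MedicalEvent:
    event_type: str  # e.g. "DIAG"
    phenotype: str   # e.g. "HYPERTENSION"
    context: str     # e.g. "OUTPATIENT"
    day: int         # days since the patient's first event
    age: int         # patient age in years at the event

    def to_tokens(self) -> list[str]:
        """Render the event as the five structured tokens."""
        return [
            f"EVT_{self.event_type}",
            f"PHENO_{self.phenotype}",
            f"CTX_{self.context}",
            f"DAY_{self.day}",
            f"AGE_{self.age}",
        ]


def encode_history(events) -> str:
    """Concatenate a patient's events, in chronological order, into one sequence."""
    ordered = sorted(events, key=lambda e: e.day)
    return " ".join(tok for event in ordered for tok in event.to_tokens())


if __name__ == "__main__":
    visit = MedicalEvent("DIAG", "HYPERTENSION", "OUTPATIENT", day=0, age=45)
    print(encode_history([visit]))
    # EVT_DIAG PHENO_HYPERTENSION CTX_OUTPATIENT DAY_0 AGE_45
```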

The model itself is a transformer encoder with 103.3 million parameters, designed to handle sequences up to 2,048 tokens long. After pretraining, task-specific layers are added to predict the risk scores for each of the five cardiovascular outcomes, using a specialized loss function for survival analysis.
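The article does not name the survival loss. One common choice for this kind of task is the Cox negative partial log-likelihood, sketched below in plain Python as an assumption rather than as ASCENDgpt's actual objective. Censored patients contribute no term of their own, but they still appear in the risk sets of patients who had events earlier.

```python
import math


def cox_neg_partial_log_likelihood(risk_scores, times, events):
    """Cox negative partial log-likelihood, averaged over observed events.

    risk_scores: model outputs (higher means higher predicted risk)
    times:       follow-up time for each patient
    events:      1 if the outcome was observed, 0 if the patient was censored
    """
    total = 0.0
    n_events = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients only appear inside other risk sets
        # Risk set: every patient still under observation at time t_i.
        risk_set = [risk_scores[j] for j, t_j in enumerate(times) if t_j >= t_i]
        peak = max(risk_set)  # subtract the max for a stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(r - peak) for r in risk_set))
        total += risk_scores[i] - log_sum_exp
        n_events += 1
    return -total / n_events if n_events else 0.0


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0]
    events = [1, 1, 0]  # the third patient is censored
    good = cox_neg_partial_log_likelihood([2.0, 1.0, 0.0], times, events)
    bad = cox_neg_partial_log_likelihood([0.0, 1.0, 2.0], times, events)
    print(good < bad)  # True: ranking early events as high risk lowers the loss
```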

Impressive Performance and Efficiency

ASCENDgpt achieved strong discrimination on a held-out test set, with an average C-index of 0.816 across all five outcomes (a C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk). It performed particularly well for cardiovascular death prediction (0.842), and strongly for stroke (0.824) and all-cause mortality (0.824). Because these scores come from patients the model did not see during training, they suggest that it generalizes to unseen records, at least within the same data source.
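For readers unfamiliar with the metric, the C-index (concordance index) is the fraction of comparable patient pairs in which the patient who had the event earlier was also given the higher risk score. The sketch below implements Harrell's version in plain Python, ignoring tied event times for simplicity. The function name and toy data are illustrative and are not taken from the paper.

```python
def concordance_index(risk_scores, times, events):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when patient i had an observed event
    and patient j was still being followed after that time.
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0]
    events = [1, 1, 0]  # the third patient is censored
    print(concordance_index([3.0, 2.0, 1.0], times, events))  # 1.0
    print(concordance_index([1.0, 2.0, 3.0], times, events))  # 0.0
```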

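Beyond its predictive accuracy, ASCENDgpt offers significant computational advantages. The phenotype-based approach leads to a 77.9% reduction in overall vocabulary size compared to using raw ICD codes, resulting in a smaller model and substantially faster training and inference times. This figure is lower than the 99.6% consolidation of diagnosis codes, presumably because the full vocabulary also contains non-diagnosis tokens such as event types, contexts, and temporal markers. This efficiency makes the model more practical for real-world clinical applications.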

Clinical Interpretability and Future Directions

A key benefit of ASCENDgpt’s phenotype-aware design is enhanced clinical interpretability. By operating on clinically meaningful concepts rather than thousands of granular codes, clinicians can better understand the model’s predictions in terms of familiar disease patterns. This interpretability is crucial for building trust and facilitating the adoption of AI in healthcare.

While ASCENDgpt represents a significant step forward, the researchers acknowledge limitations, including its reliance on data from a single institution and the need for further refinement of phenotype mappings. Future work will focus on integrating multi-modal data (like laboratory values and vital signs), refining phenotype groupings, external validation on independent healthcare systems, and ultimately, prospective validation in clinical settings.

This work highlights the importance of incorporating clinical knowledge into deep learning architectures for healthcare. By combining domain-specific tokenization with powerful transformer models, ASCENDgpt offers a promising path toward more accurate, efficient, and interpretable cardiovascular risk prediction.
