TLDR: DiagECG is a novel AI framework that enables large language models (LLMs) to interpret 12-lead ECG signals for clinical tasks such as question answering and diagnostic report generation. It achieves this by discretizing continuous ECG data into symbolic tokens, effectively allowing LLMs to process physiological signals and natural language in a unified manner. This approach avoids the need for paired ECG-text data for initial alignment and demonstrates state-of-the-art performance and strong generalization across various diagnostic tasks.
Electrocardiography, or ECG, is a cornerstone in diagnosing heart conditions. It provides vital information about the heart’s electrical activity, helping doctors identify various cardiovascular diseases. However, automating the interpretation of these complex signals has always presented challenges. Traditional automated systems often struggle to adapt to new diagnostic categories or perform open-ended reasoning, requiring extensive retraining for every new task.
In parallel, large language models (LLMs) have shown incredible capabilities in understanding and generating human language. The idea of extending these powerful AI models to interpret physiological data like ECGs is compelling, but it’s not straightforward. ECG signals are continuous, often noisy, and lack the clear, symbolic structure of text. This fundamental difference makes it difficult to integrate ECG data directly into language models.
To address these challenges, researchers have introduced DiagECG, a framework that allows LLMs to process 12-lead ECG signals for clinical text generation tasks. DiagECG aims to bridge the gap between continuous physiological data and discrete language representations, enabling more flexible and generalizable AI-driven diagnostic reasoning.
DiagECG rests on three contributions. First, it employs a lead-wise encoder. This component processes each of the 12 ECG leads independently, capturing fine-grained temporal patterns without interference between leads. Think of it as carefully examining each individual stream of heart data before combining them.
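To make the lead-wise idea concrete, here is a minimal sketch in plain Python. It is not the paper's encoder: the names `encode_lead` and `encode_leadwise`, the patch length, and the (mean, peak-to-peak) summary are all assumptions chosen for illustration, with the summary standing in for a learned neural encoder. The only property it is meant to show is that each lead is encoded without looking at any other lead.

```python
from typing import List

Signal = List[float]


def encode_lead(lead: Signal, patch_len: int) -> List[List[float]]:
    """Split one lead into non-overlapping patches and summarize each patch.

    The (mean, peak-to-peak) summary is a hypothetical stand-in for a
    learned encoder; it is not taken from the DiagECG paper.
    """
    features = []
    for start in range(0, len(lead) - patch_len + 1, patch_len):
        patch = lead[start:start + patch_len]
        mean = sum(patch) / patch_len
        features.append([mean, max(patch) - min(patch)])
    return features


def encode_leadwise(ecg: List[Signal], patch_len: int = 4) -> List[List[List[float]]]:
    """Encode every lead on its own; nothing mixes samples across leads."""
    return [encode_lead(lead, patch_len) for lead in ecg]


# Toy 12-lead recording: 8 samples per lead, lead k is a ramp scaled by k + 1.
ecg = [[(k + 1) * t for t in range(8)] for k in range(12)]
encoded = encode_leadwise(ecg)
```

Because each lead goes through `encode_lead` separately, changing the samples of one lead leaves the features of the other eleven untouched, which is the "no interference between leads" property described above.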
Second, DiagECG introduces a discretization-based tokenizer, which converts continuous ECG data into discrete, symbolic tokens. Imagine taking a continuous sound wave and breaking it down into individual musical notes that an AI can then ‘read’ like words. These ECG-specific tokens extend the LLM’s vocabulary, allowing it to handle both ECG and natural language inputs in a unified manner. This process avoids the need for complex, often unstable, alignment strategies that typically require paired ECG-text data for supervision.
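One common way to discretize continuous features is nearest-neighbour vector quantization against a codebook, sketched below. The article does not say which quantization scheme DiagECG uses, so treat this as an illustration of the general idea only. The codebook values, the `TEXT_VOCAB_SIZE` of 32000, and the function names are hypothetical.

```python
from typing import List, Sequence


def nearest_code(vector: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def sq_dist(code: Sequence[float]) -> float:
        return sum((v - c) ** 2 for v, c in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def tokenize(features: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]],
             text_vocab_size: int) -> List[int]:
    """Map each feature vector to a token ID placed after the text vocabulary."""
    return [text_vocab_size + nearest_code(f, codebook) for f in features]


# Hypothetical 3-entry codebook and text vocabulary size (assumptions).
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
TEXT_VOCAB_SIZE = 32000

features = [[0.1, -0.2], [0.9, 1.2], [4.0, 6.0], [0.6, 0.6]]
ecg_token_ids = tokenize(features, codebook, TEXT_VOCAB_SIZE)
print(ecg_token_ids)  # [32000, 32001, 32002, 32001]
```

Offsetting the codebook index by the text vocabulary size is what "extending the vocabulary" amounts to in this sketch: ECG tokens and word tokens share one ID space, so a single sequence can interleave both.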
Third, the framework utilizes autoregressive pretraining on these newly created ECG tokens. This means the LLM learns to predict the next ECG token in a sequence, much like it predicts the next word in a sentence. This pretraining step helps the LLM understand the temporal dynamics and patterns within ECG signals using its inherent language modeling capabilities. Following this, the model undergoes instruction tuning for specific clinical tasks, such as answering questions about ECGs or generating diagnostic reports, using efficient adaptation techniques.
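The next-token objective can be written down in a few lines. The sketch below computes the standard autoregressive cross-entropy loss, where the prediction at position t is scored against the token at position t + 1. The toy token sequence, the 3-token vocabulary, and the hand-made logits are assumptions for illustration; in practice the logits would come from the LLM itself.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def next_token_loss(logits_per_position: Sequence[Sequence[float]],
                    tokens: Sequence[int]) -> float:
    """Mean cross-entropy where position t predicts token t + 1."""
    predictions = logits_per_position[:-1]  # last position has no target
    targets = tokens[1:]                    # targets are the inputs shifted by one
    losses = [-log_softmax(l)[t] for l, t in zip(predictions, targets)]
    return sum(losses) / len(losses)


# Toy sequence of ECG token indices over a 3-token vocabulary (hypothetical).
tokens = [0, 2, 1, 2]

# A model with no knowledge assigns equal logits to every token.
uniform = [[0.0, 0.0, 0.0] for _ in tokens]
loss_uniform = next_token_loss(uniform, tokens)

# A model that strongly favours the true next token at each position.
confident = [[10.0 if v == tokens[i + 1] else 0.0 for v in range(3)]
             for i in range(len(tokens) - 1)] + [[0.0, 0.0, 0.0]]
loss_confident = next_token_loss(confident, tokens)
```

With uniform logits the loss equals ln 3, the entropy of a uniform guess over three tokens; pretraining pushes the loss below that baseline as the model picks up regularities in the ECG token stream.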
Performance and Generalization
DiagECG was evaluated on two ECG understanding tasks: question answering (ECG-QA) and diagnostic report generation (ECG-Report). It is reported to achieve state-of-the-art performance across multiple datasets. In ECG-QA, DiagECG consistently achieved the highest accuracy, especially in complex open-ended query scenarios. In diagnostic report generation, it also outperformed existing models across various metrics, producing clinically coherent and relevant reports.
A significant advantage of DiagECG is its strong generalization to out-of-distribution settings. This means the model performs well even on ECG data that differs from what it saw during training, indicating its robustness and adaptability in real-world clinical scenarios. Ablation studies confirmed that each component examined (the discretization module, fine-tuning, and the inclusion of tabular patient features) contributes meaningfully to overall performance.
Furthermore, analysis showed that DiagECG’s attention mechanism focuses on clinically meaningful regions of the ECG waveform depending on the query. For example, when asked about T-wave abnormalities, the model emphasized the T-wave segments, while for myocardial infarction queries, attention shifted to P and QRS complexes, aligning with diagnostic criteria. This context-dependent focus highlights the model’s ability to link ECG segments with specific clinical semantics.
Looking Ahead
DiagECG represents a significant step forward in integrating physiological signals with large language models for medical reasoning. By transforming continuous ECG data into a symbolic vocabulary, it overcomes a major hurdle in multimodal AI for healthcare. While currently designed for offline processing, future work may explore extending this approach to real-time ECG analysis and incorporating more medical knowledge to further enhance its interpretability and utility in clinical settings.


