TLDR: The George B. Moody PhysioNet Challenge 2025 focused on developing AI algorithms to detect Chagas disease from ECGs, addressing limited serological testing capacity. It utilized diverse ECG datasets, including hidden test sets to assess generalizability, and introduced a novel evaluation metric that prioritized true positives within a fixed referral capacity (top 5% of predictions). Over 630 participants contributed, and while models showed performance drops on unseen data, the best algorithms identified significantly more Chagas-positive patients than random screening, demonstrating the potential of ECG-based triage for early detection.
Chagas disease, a parasitic infection primarily transmitted by insects, is endemic to South and Central America and is increasingly found in the U.S. Chronic Chagas disease can lead to severe cardiovascular and digestive problems, yet serological testing capacity is often limited. Because Chagas cardiomyopathy frequently shows up on electrocardiograms (ECGs), ECGs offer an opportunity to identify and prioritize patients for confirmatory testing and treatment.
The George B. Moody PhysioNet Challenge 2025 was launched to address this critical need, inviting teams to develop algorithmic approaches for detecting Chagas disease from standard 12-lead ECGs. This initiative aimed to create open-source solutions that could help prioritize potential Chagas patients for diagnosis and care.
Innovations and Approach
The Challenge introduced several key innovations. Firstly, it brought together a diverse collection of datasets, including a large dataset with weak labels and smaller datasets with strong, serologically validated labels. Secondly, data augmentation techniques were employed to enhance model robustness and ensure generalizability across different, unseen data sources. Thirdly, a novel evaluation metric was designed to reflect real-world constraints, specifically the limited serological testing capacity in Chagas-endemic regions. This metric framed the machine learning problem as a triage task: ranking patients so that scarce confirmatory tests go to those most likely to be positive.
The Challenge attracted over 630 participants on 111 teams, who submitted more than 1300 entries and represented a wide range of approaches from academia and industry worldwide.
The Role of ECGs in Detection
Electrocardiography (ECG) stands out as a widely available, low-cost, and non-invasive tool for capturing the heart’s electrical activity. Early research has demonstrated that alterations in heart rate variability and ECG patterns can be present even before clinical symptoms of cardiac involvement become evident. More recently, advancements in artificial intelligence, particularly deep learning, have shown transformative potential for automated ECG analysis. These models have achieved high performance in classifying common arrhythmias and can even predict latent conditions not traditionally identifiable from ECGs alone. In the context of Chagas disease, deep learning has been applied to predict left ventricular systolic dysfunction and to develop screening models capable of distinguishing seropositive individuals using standard ECG recordings.
Challenge Data and Preprocessing
For the Challenge, a dataset of 378,624 12-lead ECG recordings was assembled from six different sources. Public datasets such as CODE-15%, SaMi-Trop, and PTB-XL formed part of the public training set. Hidden validation and test sets, comprising private datasets such as REDS-II, SaMi-Trop 3, and ELSA-Brasil, were used to check for overfitting and to assess how well models generalize to new data. The prevalence of Chagas disease was kept approximately equal across the training, validation, and test sets.
Data preprocessing involved reformatting ECG signals into a consistent WFDB format, truncating zero-padded signals, and removing empty ones. To deidentify data, ages above 89 were standardized to 90. For datasets like REDS-II and SaMi-Trop 3, which initially had balanced Chagas-positive and negative cases, Chagas-negative cases were oversampled to achieve a positivity rate of approximately 2%, mirroring the prevalence in the ELSA-Brasil data. Additionally, various forms of noise, filters, and resampling were applied to both positive and negative records to create augmented, highly similar records, further enhancing the dataset’s diversity and robustness.
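The preprocessing steps described above can be sketched in a few lines of Python. This is only an illustration, not the organizers' code: the function names are hypothetical, the signal is represented as a plain list of per-sample lead values rather than a WFDB record, and the sketch assumes that zero padding sits at the end of a recording.

```python
def trim_trailing_zero_padding(signal):
    """Drop trailing samples in which every lead is exactly zero.

    `signal` is a list of samples, each a list of per-lead values.
    Assumption: padding sits at the end of the recording.
    """
    end = len(signal)
    while end > 0 and all(value == 0 for value in signal[end - 1]):
        end -= 1
    return signal[:end]  # an empty list means the recording had no signal


def cap_age(age, threshold=89, capped_value=90):
    """Replace any age above `threshold` with `capped_value` (deidentification)."""
    return capped_value if age > threshold else age


def negatives_needed(num_positives, target_rate=0.02):
    """Number of negatives so that positives make up `target_rate` of the set.

    Solves num_positives / (num_positives + n) = target_rate for n.
    """
    return round(num_positives * (1 - target_rate) / target_rate)


if __name__ == "__main__":
    padded = [[0.1, -0.2], [0.3, 0.0], [0.0, 0.0], [0.0, 0.0]]
    print(len(trim_trailing_zero_padding(padded)))  # samples left after trimming
    print(cap_age(93), cap_age(64))
    print(negatives_needed(100))  # negatives to pair with 100 positives at 2%
```

The last function makes the resampling arithmetic explicit: at a 2% positivity rate, every Chagas-positive record has to be accompanied by 49 Chagas-negative records.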
Evaluation Metric: Prioritizing Patients
A distinctive feature of the 2025 PhysioNet Challenge was its scoring metric, which directly addressed the severe limitations in serological testing capacity prevalent in many Chagas-endemic regions. Unlike traditional machine learning metrics such as the area under the receiver operating characteristic curve (AUROC), this Challenge evaluated algorithms on their ability to prioritize true Chagas-positive patients within a fixed referral capacity. Specifically, patients were ranked by their predicted probability of having Chagas disease, and algorithms were scored by the true positive rate (TPR) at the top 5% of that ranking, that is, the fraction of all Chagas-positive patients who fall within the top 5%. This 5% threshold was chosen to represent an estimated bound for testing capacity in real-world scenarios in Brazil.
This formulation transformed the task into a constrained ranking problem, closely mirroring deployment conditions where testing resources are scarce. The metric emphasized high-precision triage and encouraged the development of models that are both discriminative and resource-aware, which are essential characteristics for scalable Chagas disease screening programs.
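A minimal sketch of such a metric is shown below, assuming binary labels (1 for Chagas-positive) and one predicted probability per patient. The function names, the rounding down of the referral count, and the tie-breaking by input order are assumptions made for illustration; the official scoring code may differ in these details.

```python
import math


def tpr_at_top_fraction(labels, scores, fraction=0.05):
    """Fraction of all positives that land in the top `fraction` of scores."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    num_positives = sum(labels)
    if num_positives == 0:
        return float("nan")  # the rate is undefined without any positives
    # Assumption: the referral capacity is rounded down to a whole patient.
    num_referred = math.floor(fraction * len(labels))
    # Rank patients from highest to lowest score; ties keep their input order.
    ranked = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:num_referred])
    return hits / num_positives


def lift_over_random(tpr, fraction=0.05):
    """Ratio of the achieved TPR to the TPR expected from random referral.

    Referring a random `fraction` of patients captures, in expectation,
    the same fraction of the positives.
    """
    return tpr / fraction


if __name__ == "__main__":
    # Toy cohort: 100 patients, 4 positives, 3 of them scored highly.
    labels = [0] * 100
    scores = [0.1] * 100
    for index, score in [(0, 0.9), (1, 0.8), (2, 0.7), (50, 0.01)]:
        labels[index] = 1
        scores[index] = score
    tpr = tpr_at_top_fraction(labels, scores)
    print(tpr, lift_over_random(tpr))
```

In this toy cohort the top 5% holds five patients, three of whom are positive, so the score is 3/4. Because only the ranking matters, the predicted probabilities do not need to be calibrated; any monotonic rescaling of the scores yields the same result.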
Outcomes and Future Implications
The Challenge received 1317 code submissions from 111 teams. After evaluation, 41 teams met all requirements to be ranked. The top-performing team was Biomed-Cardio. A key finding was the decrease in model performance on the hidden test sets, particularly on the ELSA-Brasil data. This dataset, which represents a more asymptomatic patient population and different ECG collection practices, highlighted how difficult it is to generalize models to unseen data and diverse clinical environments.
Despite these challenges, the highest-performing models identified nearly three times as many Chagas-positive patients as indiscriminate testing would. This suggests strong potential for ECG-based screening tools to facilitate early detection and intervention, potentially arresting Chagas disease development in patients while they are still largely asymptomatic. The Challenge provided a rich compendium of ECG data and a clinically relevant evaluation framework, fostering the development of machine learning models that can prioritize testing and improve outcomes for Chagas disease patients. For more details, refer to the original research paper.


