TLDR: This research compares CNN and BEATs transformer models for automated heart murmur detection using phonocardiograms (PCGs), evaluating fixed-length windowing and a novel heart cycle normalization method. While specialized CNNs show superior accuracy (AUROC 79.5%), BEATs transformers offer efficiency benefits. Heart cycle normalization improved BEATs performance but slightly degraded CNN performance, highlighting the impact of preprocessing on different AI architectures for cardiac diagnostics.
A recent study examines automated phonocardiogram (PCG) classification, a promising tool for diagnosing cardiovascular diseases. The research, titled “Comparative Analysis of CNN and Transformer Architectures with Heart Cycle Normalization for Automated Phonocardiogram Classification,” compares how different artificial intelligence models perform in detecting heart murmurs from PCG recordings.
Cardiovascular diseases remain a leading cause of death globally, with untreated valvular issues severely impacting quality of life. Early detection through heart sound analysis is crucial, but traditional methods relying on stethoscopes often lead to subjective assessments and require extensive clinical experience. This highlights the need for more accurate and efficient automated systems, especially for deployment in underserved regions.
The Challenge of Automated Classification
Automated classification of heart sounds faces several hurdles. The complex temporal structure and physiological variability of PCG signals, coupled with the necessity for high diagnostic accuracy, make it a challenging task. Signal quality can vary greatly due to recording equipment and environmental conditions, leading to inconsistent and noisy data. Additionally, diverse patient populations introduce a wide range of acoustic patterns, complicating model generalization. Resource constraints in many clinical settings also underscore the need for computationally efficient models.
Models and Methods Explored
The study systematically compared four models for heart murmur detection: two specialized convolutional neural networks (CNNs) and two zero-shot pipelines built on the BEATs universal audio transformer, with one model of each architecture per signal processing approach. The two approaches were fixed-length windowing and a novel heart cycle normalization method tailored to individual cardiac rhythms. The PhysioNet 2022 dataset, comprising 3163 PCG recordings from 816 patients, was used for this evaluation. It includes both murmur-positive and murmur-negative samples, with a pediatric focus.
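To make the first preprocessing option concrete, here is a minimal Python sketch of fixed-length windowing, assuming a mono PCG signal held in a NumPy array. The 4-second window, the 2-second hop, and the 4 kHz sampling rate used in the example are illustrative assumptions, not parameters reported for the study, and `fixed_length_windows` is a hypothetical helper name.

```python
import numpy as np


def fixed_length_windows(signal: np.ndarray, sr: int, window_s: float = 4.0,
                         hop_s: float = 2.0) -> np.ndarray:
    """Split a 1-D PCG signal into fixed-duration windows, ignoring cardiac timing.

    window_s and hop_s are illustrative defaults (assumptions), not the study's values.
    """
    win = int(round(window_s * sr))
    hop = int(round(hop_s * sr))
    if len(signal) < win:
        # Zero-pad short recordings so they yield exactly one window.
        signal = np.pad(signal, (0, win - len(signal)))
    # Window start positions; any tail shorter than a full window is dropped.
    starts = range(0, len(signal) - win + 1, hop)
    return np.stack([signal[s:s + win] for s in starts])
```

A 10-second recording at 4 kHz, for instance, yields four overlapping 4-second windows. Because the window boundaries ignore the cardiac rhythm, a single window can cut through the middle of a heartbeat, which is the limitation the cycle-based approach is meant to address.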
A key innovation in this research is the custom heart cycle normalization. Unlike fixed-duration windowing, this method aligns PCG signals based on physiological heart cycle boundaries, aiming to preserve critical temporal dynamics. This involves identifying complete heart cycles between S1 peaks and stretching or compressing them to achieve a uniform sample count. While this approach ensures physiological alignment, it comes at the cost of data utilization efficiency, as only about half of the signal length contains sufficient annotated cycle information for processing.
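The cycle normalization step can be sketched as follows, assuming S1 peak positions are already available as sample indices (for example, from the dataset's segmentation annotations). Linear interpolation via `np.interp` and the default `target_len` are assumptions; the study's exact resampling method may differ, and `normalize_heart_cycles` is a hypothetical name.

```python
import numpy as np


def normalize_heart_cycles(signal: np.ndarray, s1_peaks: np.ndarray,
                           target_len: int = 4000) -> np.ndarray:
    """Cut the signal into complete cycles between consecutive S1 peaks and
    resample each one to target_len samples (linear interpolation is an assumption)."""
    s1_peaks = np.sort(np.asarray(s1_peaks, dtype=int))
    cycles = []
    for start, end in zip(s1_peaks[:-1], s1_peaks[1:]):
        cycle = signal[start:end]
        if len(cycle) < 2:
            continue  # skip degenerate cycles caused by duplicate or adjacent peaks
        # Map the cycle onto a common [0, 1] time axis, then stretch or compress it.
        src_t = np.linspace(0.0, 1.0, num=len(cycle))
        dst_t = np.linspace(0.0, 1.0, num=target_len)
        cycles.append(np.interp(dst_t, src_t, cycle))
    if not cycles:
        # No complete cycle could be formed, so this recording contributes nothing.
        return np.empty((0, target_len))
    return np.stack(cycles)
```

Signal before the first annotated S1 peak and after the last one is discarded, and recordings with fewer than two annotated peaks yield no cycles at all, which illustrates why this approach uses only part of the available signal.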
The CNN models employed a deep architecture with sequential convolutional blocks, designed to extract local features from spectrogram representations of heart sounds. The BEATs transformer approach, on the other hand, leveraged a pre-trained audio transformer for embedding extraction combined with a k-Nearest-Neighbor (k-NN) classifier, offering a zero-shot classification pipeline without extensive fine-tuning on the PCG data.
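The classification stage of the BEATs pipeline can be illustrated with a small k-NN sketch in NumPy. It assumes each segment has already been converted into a single fixed-length embedding vector (for example, by mean-pooling the transformer's frame-level outputs). That pooling, cosine similarity as the distance measure, and `k=5` are all assumptions rather than details from the study. The BEATs feature extractor itself is not shown, and `knn_predict_proba` is a hypothetical helper.

```python
import numpy as np


def knn_predict_proba(train_emb: np.ndarray, train_labels: np.ndarray,
                      test_emb: np.ndarray, k: int = 5) -> np.ndarray:
    """Return, for each test embedding, the fraction of its k nearest training
    neighbours (by cosine similarity) that are murmur-positive (label 1)."""
    def l2_normalize(x: np.ndarray) -> np.ndarray:
        # Unit-length rows turn a dot product into cosine similarity.
        return x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)

    train_labels = np.asarray(train_labels)
    sims = l2_normalize(test_emb) @ l2_normalize(train_emb).T  # (n_test, n_train)
    k = min(k, train_emb.shape[0])
    # Indices of the k most similar training embeddings for each test row.
    nn_idx = np.argsort(-sims, axis=1)[:, :k]
    return train_labels[nn_idx].mean(axis=1)
```

Thresholding the returned fraction at 0.5 gives a hard murmur-positive/negative label, while the raw fraction can serve as a score for ranking metrics such as AUROC. Because the k-NN step only stores embeddings, "training" amounts to a single forward pass over the data, which is one plausible source of the efficiency advantage the study reports.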
Key Findings and Performance
The evaluation used 10-fold cross-validation, designed to give a robust performance estimate while preventing data leakage between training and test folds. Performance was measured with AUROC (Area Under the Receiver Operating Characteristic curve), MCC (Matthews Correlation Coefficient), Precision, Recall, and F2-Score, metrics chosen for their clinical relevance on an imbalanced dataset in which murmur-positive cases are the minority.
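For readers unfamiliar with these metrics, the sketch below computes them from scratch for a binary murmur task. The 0.5 decision threshold is an assumption, and `binary_metrics` is a hypothetical helper; libraries such as scikit-learn provide equivalent functions. AUROC is computed as the fraction of positive/negative pairs in which the positive case receives the higher score, and the F2-score uses β = 2, so recall counts more than precision. That weighting suits a screening task where a missed murmur is costlier than a false alarm.

```python
import numpy as np


def binary_metrics(y_true, y_score, threshold: float = 0.5) -> dict:
    """Compute AUROC, MCC, precision, recall and F2 for binary labels (1 = murmur)."""
    y_true = np.asarray(y_true, dtype=int)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = (y_score >= threshold).astype(int)  # threshold is an assumption
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    beta2 = 4.0  # beta = 2 for F2, so recall is weighted more heavily than precision
    denom = beta2 * precision + recall
    f2 = (1 + beta2) * precision * recall / denom if denom else 0.0
    mcc_den = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    # AUROC as the probability that a random positive outranks a random negative
    # (ties count half), i.e. the normalized Mann-Whitney U statistic.
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    if len(pos) and len(neg):
        diff = pos[:, None] - neg[None, :]
        auroc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))
    else:
        auroc = float("nan")
    return {"auroc": auroc, "mcc": mcc, "precision": precision,
            "recall": recall, "f2": f2}
```

For instance, labels `[1, 1, 0, 0, 0]` with scores `[0.9, 0.4, 0.6, 0.2, 0.1]` give a precision, recall, and F2 of 0.5, an MCC of 1/6, and an AUROC of 5/6.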
The results showed that the CNN model with fixed-length windowing achieved the highest AUROC of 79.5%, demonstrating superior discrimination ability. The CNN model with heart cycle normalization scored 75.4%. For the BEATs transformer, fixed-length windowing resulted in an AUROC of 65.7%, while heart cycle normalization improved its performance to 70.1%.
These findings show that the choice of normalization strategy, and the physiological constraints it imposes on the signal, substantially affects model performance. Specialized CNNs delivered the best accuracy, whereas the zero-shot transformer models traded lower classification accuracy for efficiency advantages such as faster training and evaluation cycles. Heart cycle normalization had a divergent effect: it lowered CNN AUROC by 4.1 percentage points (79.5% to 75.4%) but raised BEATs AUROC by 4.4 points (65.7% to 70.1%). This suggests that the pre-trained audio representations in BEATs may capture physiological patterns better when presented with cycle-normalized inputs, even though annotation inconsistencies reduced the amount of usable data.
Limitations and Future Directions
The study acknowledges several limitations, primarily the quality of PCG annotations in the dataset, where only 51.7% of recordings had complete heart cycle annotations. This limited the full potential of physiological signal normalization. Variability in recording quality across different clinical sites and the binary classification approach (murmur positive/negative) also presented challenges. Future research could focus on developing robust automated cycle detection methods, integrating signal quality assessment, exploring multi-class classification for specific heart conditions, and investigating ensemble methods combining both CNN and transformer approaches.
In conclusion, this research provides valuable insights into architecture selection for automated cardiac diagnostics, emphasizing the balance between accuracy and computational efficiency. It underscores the potential of these systems to enhance cardiac diagnostics and improve patient care, while also outlining a clear roadmap for future advancements in the field.


