TLDR: A new AI model combines CNN and Transformer networks to accurately classify pediatric lung sounds from scalogram images, outperforming previous methods in diagnosing respiratory diseases in children under six. This multi-stage hybrid framework offers a promising solution for scalable and objective respiratory health assessment, especially in resource-limited settings, by effectively handling data imbalance and enhancing feature discrimination.
Diagnosing respiratory diseases in children, especially those under six, presents unique challenges. Traditional methods like lung auscultation, where doctors listen to breathing sounds with a stethoscope, are simple and cost-effective but depend heavily on the clinician’s experience and can be inconsistent. This is particularly problematic in areas with limited access to skilled healthcare professionals. To address this, researchers have been exploring automated analysis of lung sounds using artificial intelligence (AI).
A new study introduces an AI framework designed specifically for pediatric lung sound classification. The system, called a multi-stage hybrid CNN-Transformer network, aims to give accurate and consistent results at two levels: individual breath events and entire recordings.
The Challenge of Pediatric Lung Sounds
Children’s developing lungs have different acoustic properties compared to adults, making their respiratory sounds unique and requiring specialized diagnostic approaches. Furthermore, a significant hurdle in developing AI systems for this age group has been the scarcity of publicly available datasets. The recent release of the SPRSound dataset, specifically curated for pediatric patients, has been a crucial step forward.
How the New AI Model Works
The proposed model transforms lung sound recordings into visual representations called scalogram images. These images are then fed into a two-part AI system:
- Feature Extraction: It uses MobileNetV2, a lightweight Convolutional Neural Network (CNN), to efficiently extract important features from the scalogram images. CNNs are excellent at identifying patterns in visual data.
- Feature Emphasizing: A Transformer-based self-attention mechanism then refines these extracted features. Unlike traditional CNNs that focus on local patterns, the Transformer captures global relationships across the sound’s temporal and spectral dimensions, helping the model to prioritize the most informative parts of the lung sounds.
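The pipeline above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors’ implementation, and it rests on several assumptions: the article does not name the wavelet, so a Morlet wavelet is assumed; MobileNetV2 is omitted, with each scalogram time column standing in for a CNN feature vector; and the attention step uses identity query/key/value projections instead of learned ones. The function names `morlet_scalogram` and `self_attention` are hypothetical.

```python
import cmath
import math


def morlet_scalogram(signal, scales, w0=6.0):
    """Magnitude of a Morlet continuous wavelet transform, one row per scale."""
    n = len(signal)
    rows = []
    for s in scales:
        half = int(4 * s)  # truncate the wavelet at +/- 4 scale widths
        # Complex Morlet wavelet sampled at integer offsets, scaled by 1/sqrt(s).
        wavelet = [
            cmath.exp(1j * w0 * k / s) * math.exp(-0.5 * (k / s) ** 2) / math.sqrt(s)
            for k in range(-half, half + 1)
        ]
        row = []
        for t in range(n):
            acc = 0j
            for j, w in enumerate(wavelet):
                idx = t + j - half
                if 0 <= idx < n:  # samples outside the signal count as zero
                    acc += signal[idx] * w.conjugate()
            row.append(abs(acc))
        rows.append(row)
    return rows


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V (an assumption)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)  # how much this token attends to every token
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


# Synthetic stand-in for a lung sound: a sine at 0.1 cycles per sample.
signal = [math.sin(2 * math.pi * 0.1 * t) for t in range(128)]
scales = [4, 8, 16]  # a Morlet scale s responds most near w0 / (2*pi*s) cycles/sample
scalogram = morlet_scalogram(signal, scales)

# Treat each time step as one token whose features are the per-scale magnitudes.
tokens = [list(col) for col in zip(*scalogram)]
refined = self_attention(tokens)
```

Because every attention output is a weighted average over all time steps, each refined token can draw on the whole recording, which is the "global relationships" property the article attributes to the Transformer stage. A real system would use a signal-processing library for the wavelet transform and a deep learning framework for the CNN and attention layers.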
To tackle the common issue of data imbalance in medical datasets (where normal sounds are far more common than abnormal ones), the model incorporates a special ‘class-weighted sparse categorical focal loss’ function. This function helps the AI to focus more on the harder-to-classify, rarer abnormal sounds, improving its ability to detect critical conditions.
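As a rough illustration of how such a loss behaves, here is a minimal sketch of a class-weighted focal loss over integer ("sparse") labels. It follows the standard focal loss form, -w_y * (1 - p_y)^gamma * log(p_y); the function name, the class weights, and gamma = 2.0 are assumptions chosen for the example, not values taken from the study.

```python
import math


def class_weighted_focal_loss(probs, labels, class_weights, gamma=2.0):
    """Mean class-weighted focal loss for integer class labels.

    probs:         list of per-class probability lists (each sums to 1)
    labels:        list of integer class indices
    class_weights: per-class weights, larger for rarer classes (hypothetical)
    gamma:         focusing parameter; gamma = 0 gives weighted cross-entropy
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p_true = min(max(p[y], 1e-12), 1.0)  # clip to avoid log(0)
        # (1 - p_true) ** gamma shrinks the loss of confidently correct samples.
        total += -class_weights[y] * (1.0 - p_true) ** gamma * math.log(p_true)
    return total / len(labels)


# Class 0 = normal (common), class 1 = abnormal (rare, so weighted more heavily).
weights = [1.0, 3.0]
easy = class_weighted_focal_loss([[0.9, 0.1]], [0], weights)  # confident, correct
hard = class_weighted_focal_loss([[0.7, 0.3]], [1], weights)  # rare class, mostly missed
print(f"easy sample loss: {easy:.5f}, hard sample loss: {hard:.5f}")
```

In this sketch the confidently classified normal sample contributes almost nothing to the total, while the poorly classified rare sample dominates it, which is the behavior the article describes.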
Impressive Performance
The research team conducted extensive experiments, comparing their model against existing state-of-the-art systems. The results were highly promising:
- At the event level (classifying individual breath events), the model achieved overall scores of 0.9039 for binary classification (normal vs. adventitious sounds) and 0.8448 for multiclass classification of specific sounds such as wheeze, crackle, and rhonchi.
- At the recording level (classifying the entire respiratory recording), the model attained scores of 0.720 for ternary classification (Normal, Adventitious, Poor Quality) and 0.571 for multiclass classification (Normal, Continuous Adventitious Sounds, Discontinuous Adventitious Sounds, CAS & DAS, or Poor Quality).
According to the study, these results outperform the previous best models by 3.81% and 5.94% respectively.
An analysis of the model’s learned representations showed that the Transformer-based feature-emphasizing block improved the separation between lung sound classes in feature space, making the classes easier for the model to tell apart.
Impact and Future Directions
This AI-powered approach offers a promising solution for scalable pediatric respiratory disease diagnosis, particularly valuable in resource-limited settings where access to specialized care is scarce. By providing objective and repeatable assessments, it can reduce reliance on expert interpretation and support clinical decision-making.
While the model shows great potential, the researchers acknowledge limitations, including the need for more diverse and larger datasets, and further optimization for real-time performance in clinical environments. Future work will explore integrating patient clinical history, symptoms, and even radiological images for a more holistic and personalized diagnostic approach.
For more technical details, you can refer to the full research paper: A Multi-Stage Hybrid CNN-Transformer Network for Automated Pediatric Lung Sound Classification.


