TLDR: A new research paper introduces a method for training lung ultrasound segmentation models that incorporates expert-supplied, per-pixel confidence values recorded during annotation. By modeling the inherent uncertainty in medical imaging, the approach significantly improves segmentation accuracy and, more importantly, improves performance on downstream clinical tasks such as S/F ratio estimation and 30-day patient readmission prediction. The study found that training models with a 60% confidence threshold yielded the best overall results, demonstrating the value of treating label confidence as a signal rather than noise.
In the complex world of medical imaging, interpreting scans can often be subjective, leading to variations in how different experts label the same image. This ‘label uncertainty’ is particularly challenging in modalities like lung ultrasound (LUS), where images can contain a mix of clear and ambiguous regions. Traditional artificial intelligence (AI) models often treat these labels as absolute truths, overlooking the inherent doubt or confidence an expert might have in a particular annotation.
A recent research paper tackles this issue by incorporating expert-supplied, per-pixel confidence values directly into both the labeling process and the training of AI models. Instead of simply marking a region as ‘present’ or ‘absent’, experts assign a confidence score to each pixel, reflecting their certainty. This method aims to model the natural uncertainty found in real-world clinical data.
A New Way to Label and Train
The core idea is to move beyond binary (yes/no) annotations. By capturing how confident an expert is about each pixel in a segmented region, the AI model can learn to better understand the nuances of medical images. The researchers developed a data annotation protocol that allows clinicians to express their confidence, for example, on a scale from 0 to 100 percent, for features like pleural lines, fascia bands, and vertical lines (B-lines) in lung ultrasound images.
The study utilized an in-house dataset of lung ultrasound videos from 42 patients with Congestive Heart Failure (CHF). CHF often leads to fluid accumulation in the lungs, creating distinct patterns visible in LUS. The first frame of each video was manually segmented by expert clinicians, who also assigned confidence scores to each pixel within the segmented regions.
Impact on AI Performance
The researchers trained a Feature Pyramid Network (FPN) segmentation model using these confidence-aware labels. They explored various ‘confidence thresholds’ (e.g., 0%, 20%, 60%, 100%), where pixels with confidence at or above the threshold were considered positive. They found that incorporating these confidence values during training significantly improved segmentation performance, especially when focusing on confidently labeled pixels.
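The thresholding step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `binarize_confidence`, the separate `annotated` mask, and the toy values are all assumptions made for the example. The `annotated` mask is included so that a 0% threshold reproduces a conventional label (every annotated pixel is positive) without turning unlabeled background into foreground.

```python
import numpy as np


def binarize_confidence(confidence, annotated, threshold_pct):
    """Turn a per-pixel confidence map (0-100) into a binary training mask.

    A pixel is positive when it lies inside the expert's annotated region
    and its confidence is at or above the threshold.
    (Hypothetical helper; names and interface are assumptions.)
    """
    confidence = np.asarray(confidence, dtype=float)
    annotated = np.asarray(annotated, dtype=bool)
    if confidence.shape != annotated.shape:
        raise ValueError("confidence and annotated must have the same shape")
    if not 0 <= threshold_pct <= 100:
        raise ValueError("threshold_pct must be between 0 and 100")
    return annotated & (confidence >= threshold_pct)


# Toy 3x4 confidence map in percent; zeros stand for unannotated background.
confidence = np.array([
    [0, 0, 0, 0],
    [30, 60, 90, 100],
    [0, 20, 55, 0],
])
annotated = confidence > 0  # assumption: annotated pixels carry nonzero confidence

# One binary mask per threshold explored in the study.
masks = {t: binarize_confidence(confidence, annotated, t) for t in (0, 20, 60, 100)}
positive_counts = {t: int(m.sum()) for t, m in masks.items()}
print(positive_counts)
```

Raising the threshold shrinks the positive region toward the pixels the expert was most sure about, which is the trade-off the study explores.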
More importantly, this enhanced segmentation quality translated into better performance on crucial downstream clinical tasks. The study evaluated three such tasks:
- S/F Ratio Change Prediction: Predicting whether a patient’s oxygenation ratio (S/F ratio) increased, decreased, or stayed the same between two LUS videos.
- S/F Ratio Estimation: Estimating a patient’s S/F ratio on a specific day, combining information from multiple lung views.
- 30-Day CHF Readmission Prediction: Predicting whether a patient would be readmitted to the hospital within 30 days of discharge.
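As a rough illustration of the first task's label, the sketch below computes an S/F ratio (SpO2 divided by FiO2) for two time points and assigns one of the three classes. The function names and the `tolerance` parameter are hypothetical; the article does not say how the study decided that a ratio "stayed the same", so the default here treats only exactly equal values as unchanged.

```python
def sf_ratio(spo2_percent, fio2_fraction):
    """S/F ratio: oxygen saturation (percent) over inspired oxygen fraction (0-1]."""
    if not 0 < fio2_fraction <= 1:
        raise ValueError("fio2_fraction must be in (0, 1]")
    return spo2_percent / fio2_fraction


def sf_change_label(sf_first, sf_second, tolerance=0.0):
    """Classify the change between two S/F ratios.

    `tolerance` is an assumed parameter: differences within it count as "same".
    """
    delta = sf_second - sf_first
    if delta > tolerance:
        return "increased"
    if delta < -tolerance:
        return "decreased"
    return "same"


# Made-up readings for two LUS exams of the same patient.
first = sf_ratio(92, 0.40)
second = sf_ratio(96, 0.30)
label = sf_change_label(first, second)
print(label)
```

In the study this label is predicted from the two LUS videos; the sketch only shows how a reference label could be derived from measured values.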
Across these tasks, models trained with higher confidence thresholds generally outperformed those trained with conventional labels (0% threshold) or a naive 50% threshold. Specifically, the model trained with a 60% confidence threshold consistently achieved the best results in estimating S/F ratios and predicting CHF readmission, and showed improved trends in S/F change prediction.
Why 60% Confidence Matters
The findings suggest that training AI models only on pixels where experts report at least 60% confidence leads to more reliable and clinically useful predictions. This suggests that including too many uncertain features in segmentations might be detrimental to diagnostic accuracy. The 60% threshold appears to strike a balance between excluding ambiguous pixels and retaining enough labeled structure for the model to learn from.
Looking Ahead
While the results are promising, the researchers acknowledge limitations, primarily the relatively small dataset size, which led to larger error bars in some results. Future work will involve larger, more diverse datasets and the development of standardized protocols for clinicians to assign confidence levels, ensuring greater consistency and generalizability.
This approach highlights that treating label uncertainty as a valuable signal, rather than noise, can significantly enhance the reliability and clinical utility of AI in medical imaging. By leveraging expert confidence, AI models can become more attuned to the complexities of human interpretation, which could ultimately support better patient outcomes.


