TLDR: A new multimodal deep learning framework combines breast ultrasound images with patients' clinical data to improve the preoperative classification of phyllodes tumors. The model, a dual-branch neural network, outperforms approaches that rely on a single data type, reaching an AUC-ROC of up to 0.9427 and an F1-score of up to 0.7294. The approach aims to reduce unnecessary surgeries by distinguishing benign from borderline/malignant phyllodes tumors more accurately.
Phyllodes tumors (PTs) are a rare type of breast lesion that is often difficult to classify accurately before surgery. On radiological scans they can look very similar to common, benign fibroadenomas, which frequently leads to unnecessary surgical procedures. These surgeries can cause complications and scarring and add to healthcare costs, which amount to millions of dollars annually in the United States alone.
To address this problem, researchers have developed a multimodal deep learning framework. It combines breast ultrasound (BUS) images with structured clinical data from patients to improve the accuracy of phyllodes tumor diagnosis. The goal is to better distinguish benign (non-cancerous) PTs from borderline/malignant (potentially cancerous) ones, thereby reducing the need for unnecessary excisional biopsies.
The Multimodal Approach
The core of this new framework is a dual-branch neural network. This network is designed to process two different types of information simultaneously: visual features from ultrasound images and patient-specific metadata from clinical records. By integrating these two distinct data sources, the model can capture a more comprehensive picture of the tumor, leveraging complementary diagnostic cues that might not be apparent from a single data type.
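The article does not say how the two branches are combined, so what follows is only a minimal sketch under stated assumptions. It assumes the image branch receives an embedding already produced by an image encoder (such as ResNet18 or ConvNeXt). Each branch is a single dense layer with ReLU, and the branches are fused by simple concatenation before a sigmoid output. All function names, parameter names, and weights are hypothetical. They were chosen so the arithmetic is easy to follow.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: y[j] = sum_i weights[j][i] * x[i] + bias[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dual_branch_forward(image_embedding: Vector, clinical: Vector, params: Dict[str, list]) -> float:
    """Hypothetical dual-branch forward pass returning P(borderline/malignant)."""
    # Image branch: projects an embedding assumed to come from a CNN image encoder.
    img = relu(linear(image_embedding, params["img_w"], params["img_b"]))
    # Clinical branch: projects numerically encoded tabular features (age, BMI, tumor size, ...).
    clin = relu(linear(clinical, params["clin_w"], params["clin_b"]))
    # Fusion (assumed): concatenate the two branch outputs into one vector.
    fused = img + clin
    # Classification head: a single logit, squashed to a probability.
    logit = linear(fused, params["head_w"], params["head_b"])[0]
    return sigmoid(logit)


# Toy weights (identity projections, all-ones head) so the result is easy to check by hand.
params = {
    "img_w": [[1.0, 0.0], [0.0, 1.0]], "img_b": [0.0, 0.0],
    "clin_w": [[1.0, 0.0], [0.0, 1.0]], "clin_b": [0.0, 0.0],
    "head_w": [[1.0, 1.0, 1.0, 1.0]], "head_b": [0.0],
}
p = dual_branch_forward([0.5, -1.0], [0.2, 0.3], params)  # sigmoid(1.0), about 0.731
```

Concatenation is the simplest way to let a single classification head weigh image and clinical evidence jointly. In a trained model, the weights would be learned end to end rather than set by hand.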
The study used a dataset of 81 subjects from Massachusetts General Hospital, all with confirmed phyllodes tumors. The dataset included both ultrasound images and clinical information such as age, BMI, tumor size, race, menopausal status, and tumor echogenicity. To make training robust and evaluation unbiased, the researchers used two safeguards. Class-aware sampling handled the imbalance between benign and borderline/malignant cases. Subject-stratified 5-fold cross-validation kept all of a subject's data within a single fold to prevent data leakage.
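The two safeguards described above can be illustrated with a short, hypothetical sketch. The article does not describe the exact procedure, so the code makes three assumptions. Each subject may contribute several images. Folds are built by dealing the subjects of each class round-robin across five folds. Class-aware sampling means drawing training images with inverse class-frequency weights. Subject IDs, image IDs, and function names are illustrative only.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def subject_stratified_folds(subject_labels: Dict[str, int], k: int = 5, seed: int = 0) -> Dict[str, int]:
    """Assign every subject (not every image) to one of k folds, spreading each class evenly."""
    rng = random.Random(seed)
    by_label: Dict[int, List[str]] = defaultdict(list)
    for subject, label in sorted(subject_labels.items()):
        by_label[label].append(subject)
    fold_of: Dict[str, int] = {}
    offset = 0
    for label in sorted(by_label):
        subjects = by_label[label]
        rng.shuffle(subjects)
        for i, subject in enumerate(subjects):
            # Deal subjects round-robin so each fold gets a similar share of this class.
            fold_of[subject] = (offset + i) % k
        offset += len(subjects)  # the next class continues where this one stopped
    return fold_of


def split_images(images: List[Tuple[str, str]], fold_of: Dict[str, int], test_fold: int):
    """images holds (image_id, subject_id) pairs; a subject's images never straddle the split."""
    train = [img for img, subj in images if fold_of[subj] != test_fold]
    test = [img for img, subj in images if fold_of[subj] == test_fold]
    return train, test


def class_aware_weights(labels: List[int]) -> List[float]:
    """Inverse-frequency weights: each class receives the same total sampling mass."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


# Toy data: 7 benign (0) and 3 borderline/malignant (1) subjects, 2 images each.
subject_labels = {f"S{i:02d}": (1 if i >= 7 else 0) for i in range(10)}
images = [(f"{s}_img{j}", s) for s in subject_labels for j in range(2)]
fold_of = subject_stratified_folds(subject_labels, k=5)
train_imgs, test_imgs = split_images(images, fold_of, test_fold=2)

train_labels = [subject_labels[img.split("_")[0]] for img in train_imgs]
weights = class_aware_weights(train_labels)
# Draw a class-balanced batch of 8 training images (sampling with replacement).
batch = random.Random(1).choices(train_imgs, weights=weights, k=8)
```

Folds are assigned per subject rather than per image. As a result, no subject's images can appear in both the training and test sets of the same fold, which is the leakage a subject-level split guards against.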
Key Findings and Performance
The results show that the proposed multimodal method consistently outperforms models that rely solely on ultrasound images or solely on clinical data. Among the image encoders tested, ConvNeXt and ResNet18 performed best in the multimodal setting. ConvNeXt achieved an AUC-ROC of 0.9427, while ResNet18 achieved the highest F1-score, 0.7294. Because the F1-score is the harmonic mean of precision and recall, it is especially informative on imbalanced datasets like this one.
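AUC-ROC and the F1-score measure different things, and the difference matters on imbalanced data. The following minimal sketch uses invented toy labels and scores, not the study's data. AUC-ROC is computed here as the probability that a randomly chosen positive case is scored above a randomly chosen negative one (ties count half), so it does not depend on any threshold. F1 depends on a decision threshold, assumed here to be 0.5. The final lines show how a classifier that labels everything benign can have high accuracy but an F1-score of zero.

```python
from typing import List


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (1 = borderline/malignant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc_roc(y_true: List[int], scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and model scores (invented, not from the study).
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
print(auc_roc(y_true, scores))   # 7/9, about 0.778
print(f1_score(y_true, y_pred))  # 2/3, about 0.667

# On imbalanced data, accuracy can look good while F1 exposes a useless classifier.
y_imb = [0] * 9 + [1]
all_benign = [0] * 10
accuracy = sum(t == p for t, p in zip(y_imb, all_benign)) / len(y_imb)
print(accuracy, f1_score(y_imb, all_benign))  # 0.9 0.0
```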
A comparison of different input modalities highlighted the benefits of fusion:
- Clinical data alone: AUC of 0.7846
- BUS image alone: AUC of 0.8919
- Multimodal (BUS image + clinical data): AUC of 0.9427
Combining both types of data therefore yields a substantial improvement in diagnostic performance. Ultrasound images contributed more to the predictions (about 63%), but the clinical data played an important complementary role. This mirrors how radiologists combine imaging with patient information in real-world assessments.
Explainability and Future Directions
To enhance the trustworthiness of the AI model, the researchers used Score-CAM, a visualization technique that highlights the regions within ultrasound images that the model focuses on when making a decision. This revealed that for correctly classified malignant cases, the model paid attention to relevant tumor areas. In misclassified cases, the attention patterns were often diffuse or ambiguous, suggesting that visual cues alone can sometimes be insufficient, further justifying the need for multimodal reasoning.
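The core idea of Score-CAM can be sketched in a few lines. This is a simplified, hypothetical illustration, not the study's implementation. It assumes the activation maps already match the input's resolution, whereas the real method upsamples feature maps taken from a convolutional layer. It also uses flattened lists instead of image tensors and replaces the classifier's class score with a toy `score_fn`. The steps are as follows. Each map is min-max normalized and used as a mask on the input. Each map is then weighted by how much the masked input raises the target score over a baseline. The saliency map is the ReLU of the weighted sum of maps.

```python
from typing import Callable, List, Optional

Vector = List[float]


def normalize(a: Vector) -> Vector:
    """Min-max scale a map to [0, 1] so it can act as a soft mask."""
    lo, hi = min(a), max(a)
    if hi == lo:
        return [0.0] * len(a)
    return [(v - lo) / (hi - lo) for v in a]


def score_cam(
    x: Vector,
    activation_maps: List[Vector],
    score_fn: Callable[[Vector], float],
    baseline: Optional[Vector] = None,
) -> Vector:
    """Simplified Score-CAM over flattened inputs (maps assumed already at input resolution)."""
    if baseline is None:
        baseline = [0.0] * len(x)  # assumed all-zero baseline input
    base_score = score_fn(baseline)
    weights = []
    for a in activation_maps:
        mask = normalize(a)
        masked = [xi * mi for xi, mi in zip(x, mask)]
        # Weight = increase in the target score when only this map's region is kept.
        weights.append(score_fn(masked) - base_score)
    # Weighted sum of maps, then ReLU to keep only positive evidence.
    return [
        max(0.0, sum(w * a[i] for w, a in zip(weights, activation_maps)))
        for i in range(len(x))
    ]


def toy_score(v: Vector) -> float:
    """Hypothetical classifier score that depends only on the top row of a 2x2 image."""
    return v[0] + v[1]


image = [1.0, 1.0, 1.0, 1.0]      # 2x2 image, flattened row by row
maps = [[1.0, 1.0, 0.0, 0.0],     # activation map covering the top row
        [0.0, 0.0, 1.0, 1.0]]     # activation map covering the bottom row
cam = score_cam(image, maps, toy_score)  # [2.0, 2.0, 0.0, 0.0]
```

The resulting map highlights only the top row, which is the region the toy classifier actually relies on. Diffuse weights spread across many maps would produce the kind of ambiguous attention pattern described for the misclassified cases.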
While promising, the study acknowledges limitations, primarily the relatively small dataset, especially for borderline and malignant cases. Future work aims to expand the dataset, incorporate radiologist annotations, and explore advanced AI architectures to further improve interpretability and performance. Prospective validation in real-world clinical settings will be essential to assess the system’s generalizability and its practical utility in guiding biopsy decisions.
This research marks a significant step towards developing a non-invasive diagnostic tool that could reduce unnecessary biopsies and improve clinical decision-making in breast tumor management. For more details, you can read the full research paper here.


