
Advancing Multimodal Medical Image Classification with Synergistic Learning

TLDR: This research introduces MICS, a novel ‘pretraining + fine-tuning’ framework for multimodal medical image classification, particularly effective with scarce expert-annotated data. It uses synergistic learning (consistency, reconstructive, and aligned learning) for robust feature representation during pre-training, and a distribution shift method with uncertainty-based evidential fusion during fine-tuning to improve accuracy and mitigate overfitting. Experiments on gastroscopy datasets show MICS outperforms existing methods, especially with limited labeled samples.

Medical imaging plays a crucial role in diagnosing and treating diseases. However, relying on a single type of medical image often provides limited information, making it challenging to fully understand complex conditions. Multimodal medical images, which combine different imaging techniques, offer a more comprehensive view, but effectively merging information from these diverse sources remains a significant challenge, especially when there’s a scarcity of expert-annotated data.

Traditional computer vision methods struggle with multimodal image analysis due to difficulties in fusing different modalities and the immense effort required for doctors to label data across multiple image types. To overcome these hurdles, researchers have developed a new approach called Multimodal Medical Image Classification via Synergistic Learning Pre-training (MICS).

A Novel Framework: Pre-training and Fine-tuning

MICS introduces a novel “pretraining + fine-tuning” framework designed for semi-supervised medical image classification, particularly effective when labeled data is scarce. The core idea is to treat one modality of an image as an enhanced version or ‘augmented sample’ of another, allowing the model to learn robust feature representations even without extensive expert annotations.

Synergistic Learning in Pre-training

The pre-training stage of MICS employs a synergistic learning framework that combines three key components:

  • Consistency Learning: This method ensures that the model learns consistent features from paired images across different modalities. It treats each modality as a distinct augmented sample, helping the model understand the underlying similarities.
  • Reconstructive Learning: Inspired by masked autoencoders, this component randomly masks parts of the original images. A unified decoder then reconstructs these images from features extracted by different encoders, helping the model focus on local details and interactions between modalities.
  • Aligned Learning: This part forces the features from different modalities to become more similar in a high-dimensional space. By using instance-level contrastive learning, it enhances the model’s ability to recognize paired image representations.
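The aligned-learning component can be illustrated with a minimal NumPy sketch of an instance-level contrastive (InfoNCE-style) loss. The function name, temperature, and demo features below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """Instance-level contrastive loss between paired modality features.

    Row i of z_a and row i of z_b come from the same underlying image
    (positives); every other row pairing serves as a negative.
    """
    # L2-normalize so the dot product is cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    # Softmax cross-entropy with the diagonal (paired sample) as the target.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z_wl = rng.normal(size=(8, 32))                   # e.g. white-light features
z_nbi = z_wl + 0.05 * rng.normal(size=(8, 32))    # paired-modality features
paired_loss = info_nce_loss(z_wl, z_nbi)          # low when pairs align
```

Driving this loss down pulls paired representations together in embedding space, which is the "become more similar in a high-dimensional space" behavior described above.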

Together, these learning strategies significantly boost the baseline model’s ability to extract meaningful features from multimodal medical images in a self-supervised manner, meaning it learns from unlabeled data.
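As a concrete illustration of the masking step that reconstructive learning builds on, here is a minimal MAE-style random patch-masking sketch in NumPy; the patch size, mask ratio, and zero-fill are generic assumptions, not the paper's settings:

```python
import numpy as np

def mask_patches(image, patch=4, mask_ratio=0.75, rng=None):
    """Zero out a random fraction of non-overlapping patches, MAE-style.

    Returns the masked image and a boolean keep-mask over the patch grid;
    a decoder would then be trained to reconstruct the hidden patches.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape
    ph, pw = h // patch, w // patch                # patch-grid dimensions
    n = ph * pw
    keep = np.ones(n, dtype=bool)
    drop = rng.choice(n, size=int(n * mask_ratio), replace=False)
    keep[drop] = False
    out = image.copy()
    for idx in np.flatnonzero(~keep):              # zero every dropped patch
        r, c = divmod(int(idx), pw)
        out[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0.0
    return out, keep

img = np.random.default_rng(1).uniform(1.0, 2.0, size=(16, 16))
masked, keep = mask_patches(img, patch=4, mask_ratio=0.75)
```

Because the decoder only ever sees the surviving patches, reconstructing the rest forces the encoders to capture local detail and cross-modal cues, as described above.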

Multimodal Fusion During Fine-tuning

After pre-training, the model enters the fine-tuning stage, where it learns to fuse the features from different modalities for classification. A dedicated multimodal fusion encoder is used to combine features extracted from the original modalities. To address the problem of overfitting and prediction uncertainty caused by limited labeled samples, MICS introduces a unique distribution shift method.

This method involves creating a Shift Vector Dictionary (SVD) from the pre-trained encoders. The SVD generates ‘shift vectors’ that perturb the fused features. These perturbed features act as implicit augmentations, expanding the training data and helping the model better understand the relationship between fused and original modalities, thereby reducing overfitting risks.
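In code, the mechanism could look roughly like the following NumPy sketch. The dictionary construction (unit-norm offsets of pre-trained features from their batch mean) and the perturbation scale are illustrative guesses, not the paper's actual Shift Vector Dictionary:

```python
import numpy as np

def build_shift_dictionary(pretrain_feats, n_atoms=8, rng=None):
    """Hypothetical SVD construction: unit-norm offsets of individual
    pre-trained features from the batch mean, capturing plausible
    directions of variation in feature space."""
    rng = rng or np.random.default_rng(0)
    offsets = pretrain_feats - pretrain_feats.mean(axis=0)
    idx = rng.choice(len(pretrain_feats), size=n_atoms, replace=False)
    atoms = offsets[idx]
    return atoms / np.linalg.norm(atoms, axis=1, keepdims=True)

def shift_augment(fused, dictionary, scale=0.1, rng=None):
    """Perturb each fused feature with a randomly chosen shift vector,
    acting as an implicit augmentation of the training set."""
    rng = rng or np.random.default_rng(1)
    picks = rng.integers(0, len(dictionary), size=len(fused))
    return fused + scale * dictionary[picks]

feats = np.random.default_rng(2).normal(size=(32, 16))  # pre-trained features
shift_dict = build_shift_dictionary(feats, n_atoms=8)
augmented = shift_augment(feats[:4], shift_dict, scale=0.1)
```

Training the classifier on both the original and the perturbed fused features adds variety without any extra labeled images, which is how a distribution-shift scheme like this reduces overfitting.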

Enhancing Reliability with Evidential Fusion

To further enhance classification reliability, MICS incorporates an uncertainty-based evidential fusion method. This technique, adapted from Trusted Multi-View Classification (TMC), combines the 'beliefs' and 'uncertainties' of different modalities at the decision level. The model thus not only fuses features but also weighs how confident each modality's prediction is, leading to more robust and trustworthy diagnoses.
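The decision-level combination in TMC-style evidential fusion can be sketched as follows: per-class evidence induces a subjective-logic opinion (per-class beliefs plus one shared uncertainty mass), and two modalities' opinions are merged with a reduced Dempster combination rule. The evidence values below are made up for illustration:

```python
import numpy as np

def opinion(evidence):
    """Turn non-negative per-class evidence into a subjective-logic opinion:
    per-class beliefs b and an uncertainty mass u, with sum(b) + u = 1."""
    k = len(evidence)
    strength = evidence.sum() + k        # Dirichlet strength (alpha = e + 1)
    return evidence / strength, k / strength

def combine(b1, u1, b2, u2):
    """Reduced Dempster combination of two opinions: agreement reinforces
    beliefs, conflicting mass is discarded, and uncertainty multiplies."""
    conflict = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)  # disagreeing mass
    scale = 1.0 - conflict
    b = (b1 * b2 + b1 * u2 + b2 * u1) / scale
    u = u1 * u2 / scale
    return b, u

# Two modality heads that both favor class 0, with different confidence.
b1, u1 = opinion(np.array([9.0, 1.0, 0.0]))   # e.g. white-light prediction
b2, u2 = opinion(np.array([6.0, 0.0, 1.0]))   # e.g. narrow-band prediction
b, u = combine(b1, u1, b2, u2)                # fused opinion
```

When the two modalities agree, the fused belief in the shared class grows while the overall uncertainty shrinks below either modality's own, which is what makes the final decision more trustworthy.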

Experimental Validation

The researchers conducted extensive experiments on publicly available gastroscopy image datasets, Kvasir and Kvasirv2. These datasets included white light images and paired narrow-band images generated by an algorithm. The results demonstrated that MICS significantly outperforms current state-of-the-art classification methods, especially when only a small percentage of labeled data (e.g., 5%) is available. Qualitative results, such as UMAP feature visualizations, showed that MICS could better differentiate between various gastric conditions and anatomical structures, even distinguishing between lesion locations and the lesions themselves.

In conclusion, MICS offers a promising solution for multimodal medical image classification, particularly in scenarios with limited expert annotations. By synergistically pre-training on unlabeled data and employing intelligent fusion and distribution shift techniques during fine-tuning, it enhances feature representation and classification accuracy. The full research paper can be accessed here.

Nikhil Patel
Nikhil Patel is a tech analyst and AI news reporter who brings a practitioner's perspective to every article. With prior experience working at an AI startup, he decodes the business mechanics behind product innovations, funding trends, and partnerships in the GenAI space. Nikhil's insights are sharp, forward-looking, and trusted by insiders and newcomers alike. You can reach him out at: [email protected]
