
Advancing Multimodal Medical Image Classification with Synergistic Learning

TLDR: This research introduces MICS, a novel ‘pretraining + fine-tuning’ framework for multimodal medical image classification, particularly effective with scarce expert-annotated data. It uses synergistic learning (consistency, reconstructive, and aligned learning) for robust feature representation during pre-training, and a distribution shift method with uncertainty-based evidential fusion during fine-tuning to improve accuracy and mitigate overfitting. Experiments on gastroscopy datasets show MICS outperforms existing methods, especially with limited labeled samples.

Medical imaging plays a crucial role in diagnosing and treating diseases. However, relying on a single type of medical image often provides limited information, making it challenging to fully understand complex conditions. Multimodal medical images, which combine different imaging techniques, offer a more comprehensive view, but effectively merging information from these diverse sources remains a significant challenge, especially when there’s a scarcity of expert-annotated data.

Traditional computer vision methods struggle with multimodal image analysis due to difficulties in fusing different modalities and the immense effort required for doctors to label data across multiple image types. To overcome these hurdles, researchers have developed a new approach called Multimodal Medical Image Classification via Synergistic Learning Pre-training (MICS).

A Novel Framework: Pre-training and Fine-tuning

MICS introduces a novel “pretraining + fine-tuning” framework designed for semi-supervised medical image classification, particularly effective when labeled data is scarce. The core idea is to treat one modality of an image as an enhanced version or ‘augmented sample’ of another, allowing the model to learn robust feature representations even without extensive expert annotations.

Synergistic Learning in Pre-training

The pre-training stage of MICS employs a synergistic learning framework that combines three key components:

  • Consistency Learning: This method ensures that the model learns consistent features from paired images across different modalities. It treats each modality as a distinct augmented sample, helping the model understand the underlying similarities.
  • Reconstructive Learning: Inspired by masked autoencoders, this component randomly masks parts of the original images. A unified decoder then reconstructs these images from features extracted by different encoders, helping the model focus on local details and interactions between modalities.
  • Aligned Learning: This part forces the features from different modalities to become more similar in a high-dimensional space. By using instance-level contrastive learning, it enhances the model’s ability to recognize paired image representations.
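The aligned-learning component can be illustrated with a minimal NumPy sketch of an instance-level contrastive (InfoNCE-style) loss. The function name, temperature, and demo features below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """Instance-level contrastive loss between paired modality features.

    Row i of z_a and row i of z_b come from the same underlying image
    (positives); every other row pairing serves as a negative.
    """
    # L2-normalize so the dot product is cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    # Softmax cross-entropy with the diagonal (paired sample) as the target.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z_wl = rng.normal(size=(8, 32))                   # e.g. white-light features
z_nbi = z_wl + 0.05 * rng.normal(size=(8, 32))    # paired-modality features
paired_loss = info_nce_loss(z_wl, z_nbi)          # low when pairs align
```

Driving this loss down pulls paired representations together in embedding space, which is the "become more similar in a high-dimensional space" behavior described above.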

Together, these learning strategies significantly boost the baseline model’s ability to extract meaningful features from multimodal medical images in a self-supervised manner, meaning it learns from unlabeled data.
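As a concrete illustration of the masking step that reconstructive learning builds on, here is a minimal MAE-style random patch-masking sketch in NumPy; the patch size, mask ratio, and zero-fill are generic assumptions, not the paper's settings:

```python
import numpy as np

def mask_patches(image, patch=4, mask_ratio=0.75, rng=None):
    """Zero out a random fraction of non-overlapping patches, MAE-style.

    Returns the masked image and a boolean keep-mask over the patch grid;
    a decoder would then be trained to reconstruct the hidden patches.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape
    ph, pw = h // patch, w // patch                # patch-grid dimensions
    n = ph * pw
    keep = np.ones(n, dtype=bool)
    drop = rng.choice(n, size=int(n * mask_ratio), replace=False)
    keep[drop] = False
    out = image.copy()
    for idx in np.flatnonzero(~keep):              # zero every dropped patch
        r, c = divmod(int(idx), pw)
        out[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0.0
    return out, keep

img = np.random.default_rng(1).uniform(1.0, 2.0, size=(16, 16))
masked, keep = mask_patches(img, patch=4, mask_ratio=0.75)
```

Because the decoder only ever sees the surviving patches, reconstructing the rest forces the encoders to capture local detail and cross-modal cues, as described above.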

Multimodal Fusion During Fine-tuning

After pre-training, the model enters the fine-tuning stage, where it learns to fuse the features from different modalities for classification. A dedicated multimodal fusion encoder is used to combine features extracted from the original modalities. To address the problem of overfitting and prediction uncertainty caused by limited labeled samples, MICS introduces a unique distribution shift method.

This method involves creating a Shift Vector Dictionary (SVD) from the pre-trained encoders. The SVD generates ‘shift vectors’ that perturb the fused features. These perturbed features act as implicit augmentations, expanding the training data and helping the model better understand the relationship between fused and original modalities, thereby reducing overfitting risks.
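In code, the mechanism could look roughly like the following NumPy sketch. The dictionary construction (unit-norm offsets of pre-trained features from their batch mean) and the perturbation scale are illustrative guesses, not the paper's actual Shift Vector Dictionary:

```python
import numpy as np

def build_shift_dictionary(pretrain_feats, n_atoms=8, rng=None):
    """Hypothetical SVD construction: unit-norm offsets of individual
    pre-trained features from the batch mean, capturing plausible
    directions of variation in feature space."""
    rng = rng or np.random.default_rng(0)
    offsets = pretrain_feats - pretrain_feats.mean(axis=0)
    idx = rng.choice(len(pretrain_feats), size=n_atoms, replace=False)
    atoms = offsets[idx]
    return atoms / np.linalg.norm(atoms, axis=1, keepdims=True)

def shift_augment(fused, dictionary, scale=0.1, rng=None):
    """Perturb each fused feature with a randomly chosen shift vector,
    acting as an implicit augmentation of the training set."""
    rng = rng or np.random.default_rng(1)
    picks = rng.integers(0, len(dictionary), size=len(fused))
    return fused + scale * dictionary[picks]

feats = np.random.default_rng(2).normal(size=(32, 16))  # pre-trained features
shift_dict = build_shift_dictionary(feats, n_atoms=8)
augmented = shift_augment(feats[:4], shift_dict, scale=0.1)
```

Training the classifier on both the original and the perturbed fused features adds variety without any extra labeled images, which is how a distribution-shift scheme like this reduces overfitting.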

Enhancing Reliability with Evidential Fusion

To further enhance classification reliability, MICS incorporates an uncertainty-based evidential fusion method. This technique, adapted from Trusted Multi-View Classification (TMC), combines the 'beliefs' and 'uncertainties' of different modalities at the decision level. The model thus not only fuses features but also weighs how confident each modality's prediction is, leading to more robust and trustworthy diagnoses.
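The decision-level combination in TMC-style evidential fusion can be sketched as follows: per-class evidence induces a subjective-logic opinion (per-class beliefs plus one shared uncertainty mass), and two modalities' opinions are merged with a reduced Dempster combination rule. The evidence values below are made up for illustration:

```python
import numpy as np

def opinion(evidence):
    """Turn non-negative per-class evidence into a subjective-logic opinion:
    per-class beliefs b and an uncertainty mass u, with sum(b) + u = 1."""
    k = len(evidence)
    strength = evidence.sum() + k        # Dirichlet strength (alpha = e + 1)
    return evidence / strength, k / strength

def combine(b1, u1, b2, u2):
    """Reduced Dempster combination of two opinions: agreement reinforces
    beliefs, conflicting mass is discarded, and uncertainty multiplies."""
    conflict = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)  # disagreeing mass
    scale = 1.0 - conflict
    b = (b1 * b2 + b1 * u2 + b2 * u1) / scale
    u = u1 * u2 / scale
    return b, u

# Two modality heads that both favor class 0, with different confidence.
b1, u1 = opinion(np.array([9.0, 1.0, 0.0]))   # e.g. white-light prediction
b2, u2 = opinion(np.array([6.0, 0.0, 1.0]))   # e.g. narrow-band prediction
b, u = combine(b1, u1, b2, u2)                # fused opinion
```

When the two modalities agree, the fused belief in the shared class grows while the overall uncertainty shrinks below either modality's own, which is what makes the final decision more trustworthy.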

Experimental Validation

The researchers conducted extensive experiments on publicly available gastroscopy image datasets, Kvasir and Kvasirv2. These datasets included white light images and paired narrow-band images generated by an algorithm. The results demonstrated that MICS significantly outperforms current state-of-the-art classification methods, especially when only a small percentage of labeled data (e.g., 5%) is available. Qualitative results, such as UMAP feature visualizations, showed that MICS could better differentiate between various gastric conditions and anatomical structures, even distinguishing between lesion locations and the lesions themselves.

In conclusion, MICS offers a promising solution for multimodal medical image classification, particularly in scenarios with limited expert annotations. By synergistically pre-training on unlabeled data and employing intelligent fusion and distribution shift techniques during fine-tuning, it enhances feature representation and classification accuracy. The full research paper can be accessed here.

Nikhil Patel
Nikhil Patel is a tech analyst and AI news reporter who brings a practitioner's perspective to every article. With prior experience working at an AI startup, he decodes the business mechanics behind product innovations, funding trends, and partnerships in the GenAI space. Nikhil's insights are sharp, forward-looking, and trusted by insiders and newcomers alike. You can reach him out at: [email protected]
