MEGAN: Enhancing AI Confidence in Ulcerative Colitis Assessment

TLDR: MEGAN is an AI framework for assessing Ulcerative Colitis severity from endoscopy videos more reliably. It addresses inter-rater variability by combining the predictions and uncertainty estimates of multiple AI ‘experts’ (Evidential Deep Learning models trained on different ground truths and with different modeling strategies) through a learned ‘gating network.’ The reported results are a higher F1-score and a lower Expected Calibration Error, meaning more trustworthy uncertainty estimates. That in turn helps separate cases where the AI is confident from those needing human review, reducing expert workload in clinical trials.

Artificial intelligence is rapidly transforming various fields, and medicine is no exception. From diagnosing diseases to assisting in surgical procedures, AI promises to enhance efficiency and accuracy. However, a critical challenge in medical AI is ensuring its reliability, especially when dealing with complex and subjective assessments. This is where the concept of ‘uncertainty quantification’ (UQ) becomes vital – understanding when an AI model is confident in its prediction and when it’s not.

In medical image analysis, particularly in areas like endoscopy, deep learning models often make predictions with high confidence even when the underlying clinical assessment is inherently ambiguous. A significant issue is ‘inter-rater variability’: different human experts may interpret the same medical data differently. Traditional AI training often relies on a single expert’s opinion as the ‘ground truth,’ overlooking this natural disagreement among clinicians. The resulting models can be overconfident on exactly the cases where human experts would disagree.

Existing methods for uncertainty quantification, such as Monte Carlo Dropout and Deep Ensembles, need either many forward passes or several separately trained models for each prediction, which makes them computationally expensive and less suitable for real-time clinical use. Evidential Deep Learning (EDL) offers a more efficient alternative, estimating both a prediction and its uncertainty in a single forward pass. However, even EDL is typically trained on single-expert annotations, limiting its ability to handle the real-world variability found in medical practice.
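To make the ‘single pass’ idea concrete, here is a minimal sketch of the usual EDL output head, in which the network's raw outputs are read as evidence for a Dirichlet distribution over classes. The article does not describe the exact formulation used in MEGAN, so the softplus evidence function, the function name `edl_outputs`, and the example logits are all assumptions made for illustration.

```python
import math

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def edl_outputs(logits):
    """Turn one forward pass of raw outputs into class probabilities
    and a single uncertainty value (hypothetical helper)."""
    # Non-negative evidence per class (softplus is an assumed choice).
    evidence = [softplus(z) for z in logits]
    # Dirichlet concentration parameters: alpha_k = evidence_k + 1.
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    # Expected class probabilities under the Dirichlet.
    probs = [a / strength for a in alpha]
    # Uncertainty mass u = K / S: close to 1 with no evidence,
    # shrinking toward 0 as total evidence grows.
    uncertainty = len(alpha) / strength
    return probs, uncertainty

# Four classes, matching the MES grades 0 to 3 (logits are made up).
confident_probs, confident_u = edl_outputs([10.0, 0.0, 0.0, 0.0])
vague_probs, vague_u = edl_outputs([-5.0, -5.0, -5.0, -5.0])
print(confident_u, vague_u)
```

In this sketch the second input carries almost no evidence for any class, so its uncertainty is much higher than that of the first, without any repeated sampling or additional models.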

Introducing MEGAN: A Multi-Expert Approach to Uncertainty

To tackle these challenges, researchers have developed MEGAN (Multi-Expert Gating Network), a novel framework designed to provide robust uncertainty estimates in medical AI, specifically for endoscopy videos. MEGAN’s core innovation lies in its ability to aggregate predictions and uncertainty estimates from multiple AI ‘experts.’ Each of these experts is an EDL model trained with different ground truths (e.g., annotations from various clinicians) and diverse modeling strategies.

Think of it like a panel of specialist doctors. Instead of relying on just one doctor’s opinion, MEGAN brings together several AI ‘doctors,’ each with their own perspective. A smart ‘gating network’ then optimally combines their individual predictions and uncertainties. This process helps MEGAN to reduce the impact of inter-rater variability, leading to more accurate predictions and, crucially, better-calibrated uncertainty estimates – meaning the AI’s confidence levels are more aligned with its actual accuracy.

How MEGAN Works

MEGAN operates in two main stages. First, individual EDL models are trained independently. These models learn to assess disease severity and quantify their own uncertainty. For instance, in the context of Ulcerative Colitis (UC) severity estimation using the Mayo Endoscopic Subscore (MES), these EDL models might be trained on annotations from different central or local readers, or with varying architectural configurations.

Once these individual EDL experts are trained, their weights are frozen. Then, MEGAN’s lightweight ‘gating network’ comes into play. This network learns to assign optimal weights to each expert’s predictions and uncertainties, effectively deciding how much to trust each AI expert for a given case. This dynamic weighting allows MEGAN to leverage the strengths of multiple experts, leading to a more refined and reliable overall assessment.
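The weighting step can be sketched as follows. The article does not give the exact combination rule, so this example assumes the simplest version: the gating network emits one score per expert, the scores are passed through a softmax, and the resulting weights form a weighted average of the experts' probabilities and uncertainties. The names `gate_combine` and `gate_scores` are hypothetical, and in a real system the scores would come from a small trained network rather than being passed in by hand.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gate_combine(gate_scores, expert_probs, expert_uncertainties):
    """Combine frozen experts' outputs for one case (illustrative only)."""
    # One non-negative weight per expert, summing to 1.
    weights = softmax(gate_scores)
    n_classes = len(expert_probs[0])
    # Weighted average of the experts' class probabilities.
    probs = [
        sum(w * p[k] for w, p in zip(weights, expert_probs))
        for k in range(n_classes)
    ]
    # Weighted average of the experts' uncertainty values.
    uncertainty = sum(w * u for w, u in zip(weights, expert_uncertainties))
    return weights, probs, uncertainty

# Two hypothetical experts disagreeing on a two-class case.
weights, probs, uncertainty = gate_combine(
    gate_scores=[2.0, 0.0],
    expert_probs=[[0.8, 0.2], [0.4, 0.6]],
    expert_uncertainties=[0.2, 0.6],
)
print(weights, probs, uncertainty)
```

Because the experts are frozen, only the parameters that produce `gate_scores` would be trained in this setup, which is what keeps the gating stage lightweight.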

Evaluating MEGAN on Ulcerative Colitis

The MEGAN framework was extensively evaluated on endoscopy videos for Ulcerative Colitis (UC) disease severity estimation. UC is a chronic inflammatory bowel disease, and its severity is assessed using the Mayo Endoscopic Subscore (MES), which is known for its high inter-rater variability among gastroenterologists. Accurate MES estimation is critical in clinical trials, both for deciding patient enrollment and for measuring treatment efficacy.

On data from large-scale prospective UC clinical trials, MEGAN demonstrated significant improvements. Compared to existing methods, MEGAN achieved a 3.5% improvement in F1-score (the harmonic mean of precision and recall) and a 30.5% reduction in Expected Calibration Error (ECE), which measures the gap between a model’s stated confidence and its actual accuracy. This means MEGAN is not only more accurate but also more trustworthy in its confidence assessments.
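For readers unfamiliar with ECE, the sketch below shows the standard binned computation: predictions are grouped by confidence, and the gap between average confidence and accuracy in each group is averaged, weighted by group size. The bin count of 10 and the half-open bin edges are assumptions; the article does not say which settings the MEGAN evaluation used, and the sample values are made up.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE for a list of confidences in [0, 1] and a matching
    list of booleans saying whether each prediction was right."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins [lo, hi); a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece

# A model that always claims full confidence but is right half the time.
overconfident = expected_calibration_error([1.0, 1.0, 1.0, 1.0],
                                           [True, False, True, False])
print(overconfident)  # 0.5
```

A perfectly calibrated model scores 0, so a reduction in ECE means stated confidence tracks observed accuracy more closely.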

Beyond just improving scores, MEGAN also facilitated ‘uncertainty-guided sample stratification.’ This practical application allows the system to identify cases where it is highly confident versus those where it is uncertain. By flagging uncertain cases for expert review, MEGAN can significantly reduce the annotation burden on human clinicians, potentially increasing efficiency and consistency in UC trials. For confident cases, MEGAN’s predictions were even more accurate than the consensus rating of human experts, while it successfully identified difficult cases that truly needed further expert evaluation.
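In its simplest form, this kind of stratification is a threshold on the uncertainty value. The sketch below assumes each case arrives as a tuple of identifier, predicted MES grade, and uncertainty; the function name, the case identifiers, and the threshold of 0.3 are all hypothetical, and the article does not state how MEGAN's cut-off was chosen.

```python
def stratify_by_uncertainty(cases, threshold=0.3):
    """Split cases into those the model handles alone and those routed
    to a human reader. `threshold` is an illustrative value only."""
    confident, needs_review = [], []
    for case_id, predicted_mes, uncertainty in cases:
        if uncertainty <= threshold:
            confident.append((case_id, predicted_mes))
        else:
            needs_review.append((case_id, predicted_mes))
    return confident, needs_review

# Made-up cases: (video id, predicted MES grade, uncertainty).
cases = [("video-01", 0, 0.10), ("video-02", 2, 0.70), ("video-03", 3, 0.30)]
confident, needs_review = stratify_by_uncertainty(cases)
print(confident)     # [('video-01', 0), ('video-03', 3)]
print(needs_review)  # [('video-02', 2)]
```

Raising the threshold sends fewer cases to human readers at the cost of accepting less certain predictions, so in practice the value would need to be tuned against held-out expert labels.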

Looking Ahead

The introduction of MEGAN represents a significant step forward in making medical AI more robust and reliable, especially in subjective domains like endoscopy. By effectively capturing and aggregating multi-expert uncertainty, MEGAN enhances prediction accuracy and provides better-calibrated uncertainty estimates. This framework has the potential to extend beyond UC, offering a valuable tool for broader clinical decision support systems. For more detailed information, you can refer to the full research paper: MEGAN: Mixture of Experts for Robust Uncertainty Estimation in Endoscopy Videos.
