TLDR: The Medico 2025 challenge focuses on developing Explainable AI (XAI) models for Visual Question Answering (VQA) in gastrointestinal imaging. It aims to create AI systems that not only accurately answer clinical questions based on endoscopy images but also provide clear, interpretable justifications aligned with medical reasoning. The challenge uses the Kvasir-VQA-x1 dataset and includes two subtasks: one for AI performance in VQA and another for generating clinician-oriented multimodal explanations, with human expert evaluation for the latter.
Artificial Intelligence (AI) continues to make significant strides in healthcare. A new initiative, the Medico 2025 challenge, aims to push the boundaries of AI in gastrointestinal (GI) imaging by focusing on a crucial and often overlooked aspect: explainability. Organized as part of the MediaEval task series, the challenge calls for AI models that answer clinically relevant questions based on GI endoscopy images while also providing clear, interpretable justifications that align with medical reasoning.
Gastrointestinal diseases are a major global health concern, and conditions such as colorectal cancer require early diagnosis. While AI-driven systems show great promise in assisting clinicians, their ‘black-box’ nature often limits their adoption in clinical practice. This is where Explainable Artificial Intelligence (XAI) comes in. XAI methods aim to make AI decisions transparent, building trust and enabling healthcare professionals to understand why a system makes a particular diagnosis or recommendation.
The Medico 2025 challenge, building on previous successful Medico editions, specifically addresses Visual Question Answering (VQA) in GI imaging with a strong emphasis on multimodal explanations. Medical VQA combines computer vision and natural language processing to answer questions directly from medical images. The challenge encourages participants to develop models that not only provide accurate answers but also offer clear justifications, ensuring the reliability of AI-generated insights.
The challenge is structured into two main subtasks. Subtask 1, titled ‘AI Performance on Medical Image Question Answering,’ challenges participants to create AI models that accurately interpret and respond to clinical questions based on GI images. This subtask uses the Kvasir-VQA-x1 dataset, a benchmark comprising 6,500 GI endoscopy images and 159,549 complex question-answer pairs. Questions in this dataset span six categories: Yes/No, Single-Choice, Multiple-Choice, Color-Related, Location-Related, and Numerical Count. Answering them requires models to process both visual and textual information. Performance is evaluated using standard language quality metrics such as BLEU, ROUGE, and METEOR, and results are reported overall, per question category, and per complexity level.
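To make the idea of stratified evaluation concrete, here is a minimal sketch of how per-category or per-level scores could be computed. It is not the official scoring code: the unigram-overlap F1 below is a simple stand-in for BLEU, ROUGE, and METEOR, and the record keys (`prediction`, `reference`, `category`, `level`) are hypothetical names chosen for illustration.

```python
from collections import Counter, defaultdict


def token_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1. A simple stand-in, NOT BLEU/ROUGE/METEOR."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def stratified_scores(records, key):
    """Average the score separately for each value of `key`.

    `key` is a hypothetical record field such as "category" or "level".
    """
    buckets = defaultdict(list)
    for record in records:
        score = token_f1(record["prediction"], record["reference"])
        buckets[record[key]].append(score)
    return {group: sum(scores) / len(scores) for group, scores in buckets.items()}


# Illustrative records only; these are not taken from the dataset.
records = [
    {"prediction": "yes", "reference": "yes", "category": "Yes/No", "level": 1},
    {"prediction": "two polyps", "reference": "one polyp",
     "category": "Numerical Count", "level": 1},
]
by_category = stratified_scores(records, "category")
by_level = stratified_scores(records, "level")
```

Reporting the same metric per group, rather than as a single average, shows whether a model is strong on one question type and weak on another.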
Subtask 2, ‘Clinician-Oriented Multimodal Explanations in GI,’ builds directly on the first. Here, participants are required to justify their model’s predictions using multiple complementary forms of reasoning. The goal is to generate rich, multimodal explanations that are transparent, understandable, and trustworthy for clinicians. At a minimum, explanations must include a detailed textual narrative in clinical language that directly supports the predicted answer. Participants are also strongly encouraged to provide an accompanying visual explanation, such as a heatmap, segmentation mask, or bounding box, that clearly links to the textual reasoning and highlights the relevant findings. Optional confidence scores can also be included. Crucially, all outputs in Subtask 2 are human-evaluated by domain experts and medical professionals based on criteria like clarity, coherence between modalities, and medical relevance, ensuring the explanations truly support clinical decision-making.
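The required and optional parts of an explanation can be pictured as a small record type. The sketch below is illustrative only: the challenge's actual submission format is not described here, so the class name, field names, the choice of a bounding box as the visual explanation, and the assumption that confidence lies in [0, 1] are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Explanation:
    """Hypothetical container for one Subtask 2 style explanation."""
    answer: str
    narrative: str  # required: textual reasoning in clinical language
    bounding_box: Optional[Tuple[int, int, int, int]] = None  # optional (x0, y0, x1, y1)
    confidence: Optional[float] = None  # optional; range [0, 1] is an assumption

    def problems(self) -> List[str]:
        """Return a list of structural issues; an empty list means none found."""
        issues = []
        if not self.narrative.strip():
            issues.append("missing textual narrative")
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            issues.append("confidence outside [0, 1]")
        if self.bounding_box is not None:
            x0, y0, x1, y1 = self.bounding_box
            if x1 <= x0 or y1 <= y0:
                issues.append("empty bounding box")
        return issues


# Illustrative values only.
complete = Explanation(
    answer="yes",
    narrative="A sessile polyp is visible in the lower left of the frame.",
    bounding_box=(40, 220, 180, 330),
    confidence=0.87,
)
incomplete = Explanation(answer="yes", narrative="  ", confidence=1.4)
```

A check like this only covers structure. Clarity, coherence between modalities, and medical relevance are judged by the human experts.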
The Kvasir-VQA-x1 dataset, central to this challenge, is an extension of the original Kvasir-VQA. It features GI endoscopic images from HyperKvasir and Kvasir-Instrument, with QA pairs stratified by reasoning complexity (Level 1 for single atomic QAs, Level 2 for two merged QAs, and Level 3 for synthesis across three atomic QAs). Each QA pair is also assigned one or more ‘question_class’ labels, such as polyp type or instrument presence, allowing for fine-grained analysis. The dataset is publicly available for researchers to access and use for reproducible experimentation.
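As a rough illustration of the fine-grained analysis these labels allow, the sketch below filters QA records by complexity level and counts their ‘question_class’ labels. The records are invented for the example, and the `complexity` key and the exact label strings are assumptions rather than the dataset's documented schema.

```python
from collections import Counter


def select_level(records, level):
    """Keep only the QA records at the given complexity level."""
    return [record for record in records if record["complexity"] == level]


def class_counts(records):
    """Count how often each question_class label occurs.

    A record may carry several labels, so each label is counted once per record.
    """
    counts = Counter()
    for record in records:
        counts.update(record["question_class"])
    return counts


# Invented records; field names other than question_class are assumptions.
qa_records = [
    {"complexity": 1, "question_class": ["polyp_type"]},
    {"complexity": 2, "question_class": ["polyp_type", "instrument_presence"]},
    {"complexity": 2, "question_class": ["instrument_presence"]},
]
level_two = select_level(qa_records, 2)
level_two_classes = class_counts(level_two)
```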
The Medico 2025 challenge represents a significant step towards integrating powerful deep learning models into clinical settings. By emphasizing explainable VQA for GI imaging, it promotes the development of AI models that are not only accurate but also provide transparent justifications aligned with medical reasoning. This initiative fosters interdisciplinary collaboration between AI and medical communities, paving the way for clinically viable AI tools that are both trusted and actionable in real-world healthcare scenarios. For more detailed information, you can refer to the research paper: Medico 2025: Visual Question Answering for Gastrointestinal Imaging.


