MEDGELLAN: Enhancing Medical Diagnosis with AI-Generated Clinical Guidance

TLDR: MEDGELLAN is a lightweight framework that uses a Large Language Model (LLM) to turn raw medical records (triage notes, radiology reports) into structured clinical guidance. The guidance, produced with a Bayesian-inspired prompting strategy, is meant to help physicians make more accurate diagnoses. In preliminary experiments, MEDGELLAN improves diagnostic performance, particularly recall and F1 score, without requiring LLM fine-tuning or extensive annotations. It is a step towards effective human-AI collaboration in high-stakes medical decision-making.

Medical diagnosis is a complex and critical task where accuracy is paramount. While fully automated systems for diagnosis are still a distant reality due to the high stakes involved, researchers are exploring hybrid approaches that combine the power of artificial intelligence with human oversight. A new framework called MEDGELLAN proposes an innovative way to assist physicians in this crucial process by leveraging Large Language Models (LLMs) to generate clinical guidance.

MEDGELLAN is designed as a lightweight and efficient system that doesn’t require extensive data annotation or fine-tuning of the underlying LLMs. Its core idea is to use an LLM to synthesize raw medical records, such as triage notes and radiology reports, into structured, evidence-weighted summaries that can then inform a physician’s diagnostic decisions.

How MEDGELLAN Works

The framework has two modules. The first is an ‘ASSISTANT LLM’ that takes in patient data and processes it with a Bayesian-inspired prompting strategy: it first treats the triage note as ‘prior knowledge’ to form an initial clinical suspicion, then incorporates the radiology report as ‘new evidence,’ updating its reasoning and confidence levels. This ordering is meant to mirror how a physician typically evaluates information as it arrives, and it yields a coherent, uncertainty-aware summary of the patient’s condition.
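The prior-then-evidence ordering can be sketched as a prompt builder. This is a minimal illustration, not the prompt used by the researchers: the function name `build_guidance_prompt` and all instruction wording are assumptions made for this example.

```python
def build_guidance_prompt(triage_note: str, radiology_report: str) -> str:
    """Assemble a prompt that shows the triage note first (the 'prior')
    and the radiology report second (the 'new evidence').

    The instruction text is hypothetical; only the ordering and the
    'no final diagnosis' constraint come from the article.
    """
    sections = [
        "You are assisting a physician. Do NOT give a final diagnosis or ICD codes.",
        "STEP 1 (prior): read the triage note and state an initial clinical suspicion.",
        f"TRIAGE NOTE:\n{triage_note.strip()}",
        "STEP 2 (new evidence): read the radiology report and update that suspicion.",
        f"RADIOLOGY REPORT:\n{radiology_report.strip()}",
        "STEP 3: summarize key abnormalities, trends, and risk factors, "
        "tagging each with a qualitative likelihood: high, moderate, or low.",
    ]
    # Blank lines between sections keep the two records visually separate.
    return "\n\n".join(sections)
```

Keeping the two records in separate, explicitly ordered sections is what makes the prompt "Bayesian-inspired" in the loose sense described above: the model is asked to commit to an initial suspicion before it sees the imaging findings.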

Crucially, the ASSISTANT LLM does not provide a final diagnosis or medical codes. Instead, its role is to generate comprehensive guidance, highlighting key abnormalities, trends, and risk factors with qualitative confidence levels (e.g., high, moderate, low likelihood). This guidance is then passed to the second module, where a ‘PHYSICIAN LLM’ (simulating a real doctor for experimental purposes) uses only this guidance to make the final diagnosis, represented by ICD-10 codes at the chapter and category levels.
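A rough sketch of this hand-off is shown below, under several assumptions: `assistant_llm` and `physician_llm` stand in for whatever model calls are used (here they are just functions from prompt text to response text), the prompt wording is invented, and the physician's answer is assumed to be a comma-separated list of codes. The one ICD-10 fact relied on is that a code's category is its first three characters (for example, `J18.9` belongs to category `J18`).

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to a text response.
LLM = Callable[[str], str]


def icd10_category(code: str) -> str:
    """Reduce an ICD-10 code to its category level (first three characters)."""
    return code.replace(".", "").strip().upper()[:3]


def run_pipeline(
    triage_note: str,
    radiology_report: str,
    assistant_llm: LLM,
    physician_llm: LLM,
) -> List[str]:
    """Two-module flow: guidance first, then diagnosis from the guidance alone."""
    # Module 1: the assistant sees the raw records and returns guidance only.
    guidance = assistant_llm(
        f"Triage note (prior):\n{triage_note}\n\n"
        f"Radiology report (new evidence):\n{radiology_report}\n\n"
        "Write clinical guidance with likelihood levels. No diagnosis, no codes."
    )
    # Module 2: the physician model receives the guidance, not the raw records.
    answer = physician_llm(
        f"Clinical guidance:\n{guidance}\n\n"
        "List the likely ICD-10 codes, comma-separated."
    )
    # Normalize to category level and drop duplicates.
    return sorted({icd10_category(c) for c in answer.split(",") if c.strip()})


if __name__ == "__main__":
    # Stub models so the sketch runs without any LLM backend.
    demo = run_pipeline(
        "fever and cough for three days",
        "right lower lobe consolidation",
        assistant_llm=lambda prompt: "High likelihood: lobar consolidation.",
        physician_llm=lambda prompt: "J18.9, I50.9",
    )
    print(demo)
```

Passing the models in as plain callables keeps the sketch independent of any particular LLM library and makes the key constraint easy to check: the second prompt is built from `guidance` only.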

Experimental Insights

To evaluate MEDGELLAN’s effectiveness, the researchers combined data from three MIMIC datasets (MIMIC-CXR, MIMIC-IV-ED, and MIMIC-IV), which provide chest radiographs, emergency department information, and patient diagnoses, respectively. They simulated the physician’s role using several state-of-the-art LLMs, including Llama 3, Gemma 2, and Qwen2.

The performance of MEDGELLAN was compared against two baselines: one where the PHYSICIAN LLM only received the triage note, and another where it received both the triage note and radiology report but without the intermediate guidance generated by MEDGELLAN. The results showed that incorporating MEDGELLAN’s guidance consistently improved diagnostic performance, particularly in terms of recall and F1 score. While there was a slight decrease in precision, the gains in recall and F1 are significant in medical contexts, as they indicate the model’s ability to be more comprehensive in identifying conditions and reducing the risk of missing a diagnosis (false negatives).
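The trade-off between precision and recall can be made concrete with a small calculation. The sketch below computes the three metrics for a single patient by comparing a set of predicted codes with a set of reference codes. The code sets are invented purely for illustration and are not results from the paper, and how the researchers aggregate scores across patients is not stated here, so this per-patient form is an assumption.

```python
from typing import Set, Tuple


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 for one patient's codes."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example (made-up codes, NOT data from the study).
gold = {"J18", "I50", "E11"}

# A cautious prediction: one code, and it is correct.
cautious = precision_recall_f1({"J18"}, gold)

# A more comprehensive prediction: two correct codes plus one wrong one.
comprehensive = precision_recall_f1({"J18", "I50", "R05"}, gold)
```

In this toy case the cautious prediction has precision 1.0 but recall 1/3 (F1 of 0.5), while the comprehensive one has precision and recall of 2/3 (F1 of 2/3). Precision drops, yet fewer true conditions are missed, which is the pattern the article describes.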

In essence, MEDGELLAN’s preliminary results suggest that well-crafted, LLM-generated guidance can improve the quality of predicted diagnoses, offering a practical route to human-AI collaboration in healthcare. Full details are available in the MEDGELLAN research paper.

Future Directions

The researchers plan to extend this work by investigating the impact of MEDGELLAN’s guidance when presented to actual human physicians. They also aim to incorporate rich non-textual information, such as radiology images themselves, into the framework, further enhancing its capabilities in supporting medical decision-making.
