TLDR: MEDGELLAN is a lightweight framework that uses a Large Language Model (LLM) to generate structured clinical guidance from raw medical records (triage notes, radiology reports). This guidance, created using a Bayesian-inspired prompting strategy, is intended to help physicians make more accurate diagnoses. In preliminary experiments with LLM-simulated physicians, the guidance improved diagnostic performance, particularly recall and F1 score, without requiring LLM fine-tuning or extensive annotations. The authors present it as a step towards effective human-AI collaboration in high-stakes medical decision-making.
Medical diagnosis is a complex and critical task where accuracy is paramount. While fully automated systems for diagnosis are still a distant reality due to the high stakes involved, researchers are exploring hybrid approaches that combine the power of artificial intelligence with human oversight. A new framework called MEDGELLAN proposes an innovative way to assist physicians in this crucial process by leveraging Large Language Models (LLMs) to generate clinical guidance.
MEDGELLAN is designed as a lightweight and efficient system that doesn’t require extensive data annotation or fine-tuning of the underlying LLMs. Its core idea is to use an LLM to synthesize raw medical records, such as triage notes and radiology reports, into structured, evidence-weighted summaries that can then inform a physician’s diagnostic decisions.
How MEDGELLAN Works
The framework consists of two modules. In the first, an ‘ASSISTANT LLM’ takes in the patient data and processes it with a Bayesian-inspired prompting strategy: it first treats the triage note as ‘prior knowledge’ to form an initial clinical suspicion, then incorporates the radiology report as ‘new evidence,’ updating its reasoning and confidence levels. This temporal ordering is intended to mirror how a physician would typically evaluate information, and it yields a coherent, uncertainty-aware summary of the patient’s condition.
Crucially, the ASSISTANT LLM does not provide a final diagnosis or medical codes. Instead, its role is to generate comprehensive guidance, highlighting key abnormalities, trends, and risk factors with qualitative confidence levels (e.g., high, moderate, low likelihood). This guidance is then passed to the second module, where a ‘PHYSICIAN LLM’ (simulating a real doctor for experimental purposes) uses only this guidance to make the final diagnosis, represented by ICD-10 codes at the chapter and category levels.
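The two-module flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ implementation: the prompt wording, the function names (`build_assistant_prompt`, `build_physician_prompt`, `run_pipeline`), and the idea of passing each LLM in as a plain callable are all assumptions made for the example. The actual prompts used by MEDGELLAN are not reproduced here.

```python
from typing import Callable, Tuple

# An "LLM" here is any function that maps a prompt string to a response string.
# This is a hypothetical interface chosen to keep the sketch model-agnostic.
LLM = Callable[[str], str]

# Hypothetical prompt wording. Triage note comes first (prior),
# radiology report second (new evidence), mirroring the described ordering.
ASSISTANT_TEMPLATE = (
    "You are assisting a physician. Do NOT give a final diagnosis or ICD codes.\n\n"
    "Step 1 (prior): read the triage note and state your initial clinical suspicions.\n"
    "TRIAGE NOTE:\n{triage}\n\n"
    "Step 2 (new evidence): read the radiology report and update those suspicions.\n"
    "RADIOLOGY REPORT:\n{radiology}\n\n"
    "Step 3: summarize key abnormalities, trends, and risk factors, each with a "
    "qualitative likelihood (high, moderate, or low)."
)

PHYSICIAN_TEMPLATE = (
    "Using only the clinical guidance below, list the most likely diagnoses as "
    "ICD-10 codes at the chapter and category levels.\n\n"
    "GUIDANCE:\n{guidance}"
)


def build_assistant_prompt(triage_note: str, radiology_report: str) -> str:
    """Fill the assistant template with the two raw records, prior first."""
    return ASSISTANT_TEMPLATE.format(triage=triage_note, radiology=radiology_report)


def build_physician_prompt(guidance: str) -> str:
    """The physician prompt contains the guidance only, never the raw records."""
    return PHYSICIAN_TEMPLATE.format(guidance=guidance)


def run_pipeline(
    triage_note: str,
    radiology_report: str,
    assistant_llm: LLM,
    physician_llm: LLM,
) -> Tuple[str, str]:
    """Run module 1 (guidance generation), then module 2 (diagnosis)."""
    guidance = assistant_llm(build_assistant_prompt(triage_note, radiology_report))
    diagnosis = physician_llm(build_physician_prompt(guidance))
    return guidance, diagnosis
```

Any chat-completion client can be wrapped as an `LLM` callable and passed to `run_pipeline`. Keeping the two prompt builders separate makes the key design constraint easy to check: the second module sees the guidance and nothing else.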
Experimental Insights
To evaluate MEDGELLAN’s effectiveness, the researchers combined data from several MIMIC datasets (MIMIC-CXR, MIMIC-IV-ED, and MIMIC-IV) which provide chest radiographs, emergency department information, and patient diagnoses, respectively. They simulated the physician’s role using various state-of-the-art LLMs, including Llama 3, Gemma 2, and Qwen2.
The performance of MEDGELLAN was compared against two baselines: one where the PHYSICIAN LLM received only the triage note, and another where it received both the triage note and the radiology report but without the intermediate guidance generated by MEDGELLAN. The results showed that incorporating MEDGELLAN’s guidance consistently improved diagnostic performance, particularly in terms of recall and F1 score. Precision decreased slightly, but the recall gain matters in medical contexts: higher recall means the model identifies more of a patient’s true conditions, reducing the risk of a missed diagnosis (a false negative), and the higher F1 score indicates that this gain outweighed the loss in precision.
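To make the precision and recall trade-off concrete, here is a small Python sketch that scores a set of predicted ICD-10 category codes against a reference set. The helper names (`to_category`, `precision_recall_f1`) and the example codes are illustrative assumptions, not data or code from the study, and the paper may aggregate its metrics differently (for example, averaged across patients or codes). The only fact relied on is that an ICD-10 category is the first three characters of a code.

```python
from typing import Set, Tuple


def to_category(code: str) -> str:
    """Reduce a full ICD-10 code to its three-character category, e.g. J18.9 -> J18."""
    return code.replace(".", "").strip().upper()[:3]


def precision_recall_f1(predicted: Set[str], actual: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 for one patient's code sets."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example: a patient with three true diagnosis categories.
actual = {to_category(c) for c in ["J18.9", "I50.9", "E11.9"]}

# A cautious prediction: one code, and it is correct.
narrow = {"J18"}
# A broader prediction: two correct codes plus one incorrect code.
broad = {"J18", "I50", "R05"}

print(precision_recall_f1(narrow, actual))  # precision 1.0, recall 1/3, F1 0.5
print(precision_recall_f1(broad, actual))   # precision 2/3, recall 2/3, F1 2/3
```

In this toy case the broader prediction loses some precision but doubles recall, and F1 rises from 0.5 to about 0.67. That is the same pattern the article reports for guidance-assisted predictions.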
In essence, MEDGELLAN suggests that well-crafted, LLM-generated guidance can improve the quality of predicted diagnoses, pointing to a practical route for human-AI collaboration in healthcare. Full details are available in the MEDGELLAN research paper.
Future Directions
The researchers plan to extend this work by investigating the impact of MEDGELLAN’s guidance when presented to actual human physicians. They also aim to incorporate rich non-textual information, such as radiology images themselves, into the framework, further enhancing its capabilities in supporting medical decision-making.


