TLDR: ProMed is a reinforcement learning framework designed to transform reactive medical Large Language Models (LLMs) into proactive ones. It enables LLMs to ask clinically valuable questions before making diagnoses by using a novel Shapley Information Gain (SIG) reward to quantify question utility. Through a two-stage training pipeline involving SIG-guided Monte Carlo Tree Search for initialization and SIG-augmented policy optimization, ProMed significantly improves diagnostic accuracy and generalizes robustly, outperforming existing methods and demonstrating the necessity of targeted training for interactive medical AI.
In the evolving landscape of artificial intelligence, Large Language Models (LLMs) have shown remarkable capabilities across various domains. However, when it comes to critical fields like medicine, their traditional ‘reactive’ approach poses significant risks. Imagine a doctor who, given only a few initial symptoms, immediately jumps to a diagnosis without asking any follow-up questions. This is often how current medical LLMs operate, leading to potential misdiagnoses due to insufficient information. The core challenge lies in transitioning these powerful AI tools from merely answering based on initial input to proactively seeking out crucial additional information, much like a human physician would during a clinical consultation.
Introducing ProMed: A Framework for Proactive Medical AI
To address this vital limitation, researchers have introduced ProMed, a novel framework that leverages reinforcement learning (RL) to empower medical LLMs with the ability to ask clinically valuable questions before making a diagnosis. This marks a significant shift from the reactive paradigm to a proactive one, aiming to enhance diagnostic accuracy and patient safety.
At the heart of ProMed is a sophisticated reward mechanism called Shapley Information Gain (SIG). Unlike simpler methods that might treat all pieces of information equally, SIG quantifies the true clinical utility of each question. It does this by combining the amount of newly acquired information with its contextual importance, estimated using Shapley values from cooperative game theory. This approach is crucial because, in medicine, facts are rarely isolated; their value often emerges when combined with others, and some facts are inherently more diagnostically significant than others. SIG ensures that the AI prioritizes questions that are both novel and clinically impactful.
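To make the idea concrete, here is a minimal sketch of how Shapley values could weight clinical facts and how a question's reward could be read off those weights. It is an illustration under stated assumptions, not the paper's exact formulation: the fact names, the `toy_value` function, and the rule "SIG = sum of Shapley weights of newly acquired facts" are all hypothetical stand-ins for this article.

```python
from itertools import permutations


def shapley_values(facts, value):
    """Exact Shapley value of each fact under a coalition value function.

    Averages each fact's marginal contribution over every ordering,
    so this is only practical for a handful of facts.
    """
    totals = {f: 0.0 for f in facts}
    orderings = list(permutations(facts))
    for order in orderings:
        seen = set()
        for fact in order:
            before = value(frozenset(seen))
            seen.add(fact)
            totals[fact] += value(frozenset(seen)) - before
    return {f: t / len(orderings) for f, t in totals.items()}


def toy_value(coalition):
    # Hypothetical "diagnostic confidence" for a set of known facts.
    # The interaction term models facts that matter most in combination.
    score = 0.0
    if "chest_pain" in coalition:
        score += 0.2
    if "st_elevation" in coalition:
        score += 0.3
    if {"chest_pain", "st_elevation"} <= coalition:
        score += 0.4
    if "mild_cough" in coalition:
        score += 0.1
    return score


def sig_reward(new_facts, known_facts, weights):
    """Assumed SIG: total Shapley weight of facts the question newly revealed."""
    return sum(weights[f] for f in set(new_facts) - set(known_facts))


FACTS = ["chest_pain", "st_elevation", "mild_cough"]
WEIGHTS = shapley_values(FACTS, toy_value)
```

In this toy setup the interaction bonus is split evenly between the two facts that produce it, so a question that reveals `st_elevation` earns more than one that reveals `mild_cough`, and a question that only repeats an already known fact earns nothing.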
A Two-Stage Training Approach
ProMed integrates SIG into a robust two-stage training pipeline:
1. SIG-Guided Model Initialization: In this initial stage, ProMed uses a technique called Monte Carlo Tree Search (MCTS). Guided by the SIG reward, MCTS systematically explores and constructs high-reward interaction trajectories – essentially, optimal doctor-patient dialogue paths. These high-quality trajectories then serve as supervision to initialize the LLM, teaching it effective information-seeking behaviors from the outset. This step is vital for mitigating instability and poor convergence often seen in reinforcement learning with weak initial policies, and it also helps overcome the scarcity of high-quality medical interaction data.
2. SIG-Augmented Policy Optimization: Building on the initialized model, this stage further refines the LLM’s proactive abilities using reinforcement learning. ProMed enhances a method called Group Relative Policy Optimization (GRPO) with a novel SIG-guided Reward Distribution Mechanism. While standard GRPO might assign uniform rewards to all parts of a conversation, ProMed’s mechanism allocates rewards proportionally to each question’s clinical utility. This means that questions that elicit more valuable information receive higher rewards, enabling more targeted and fine-grained optimization that truly reinforces the LLM’s proactive information-gathering strategy.
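The two stages can be sketched on a toy problem. The code below is an illustration under loud assumptions, not the authors' implementation: the question pool, the `TOY_SIG` table, the additive trajectory reward, and every function name are hypothetical. Stage 1 is a plain UCT-style Monte Carlo Tree Search over which questions to ask; stage 2 shows a GRPO-style group normalization plus one possible way to split a trajectory-level signal across turns in proportion to SIG.

```python
import math
import random
from statistics import mean, pstdev

# Hypothetical per-question SIG scores for a single toy case.
TOY_SIG = {"ecg": 0.5, "pain_location": 0.4, "cough": 0.1, "diet": 0.0}


class Node:
    """One partial dialogue: the questions asked so far."""

    def __init__(self, asked, parent=None):
        self.asked, self.parent = asked, parent
        self.children = {}  # question -> child Node
        self.visits, self.total = 0, 0.0


def untried(node, pool):
    return [q for q in pool if q not in node.asked and q not in node.children]


def uct_child(node, c):
    # Mean reward plus an exploration bonus (UCB1).
    return max(
        node.children.values(),
        key=lambda ch: ch.total / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def mcts_best_trajectory(pool, sig, budget, iterations=2000, c=1.0, seed=0):
    """Stage 1 (toy): search for a high-SIG sequence of `budget` questions."""
    rng = random.Random(seed)
    root = Node(())
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded, non-terminal nodes.
        while len(node.asked) < budget and node.children and not untried(node, pool):
            node = uct_child(node, c)
        # Expansion: add one unexplored question if the budget allows.
        options = untried(node, pool) if len(node.asked) < budget else []
        if options:
            q = rng.choice(options)
            node.children[q] = Node(node.asked + (q,), parent=node)
            node = node.children[q]
        # Rollout: fill the remaining turns with random questions.
        rest = [q for q in pool if q not in node.asked]
        rng.shuffle(rest)
        asked = node.asked + tuple(rest[: budget - len(node.asked)])
        reward = sum(sig[q] for q in asked)
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    # Read off the most-visited path as the supervision trajectory.
    path, node = [], root
    while node.children:
        q, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        path.append(q)
    return path


def group_relative_advantages(rewards, eps=1e-8):
    """Stage 2: GRPO-style normalization of rewards within one sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def distribute_by_sig(signal, sig_scores):
    """Stage 2 (assumed rule): split a trajectory signal across turns by SIG share."""
    n, total = len(sig_scores), sum(sig_scores)
    if n == 0:
        return []
    if total <= 0:
        return [signal / n] * n  # fall back to a uniform split
    return [signal * s / total for s in sig_scores]
```

Calling `mcts_best_trajectory(list(TOY_SIG), TOY_SIG, budget=2)` returns a two-question path, and `distribute_by_sig(1.0, [TOY_SIG[q] for q in path])` then gives the higher-SIG question the larger share. In a real system the reward would come from simulated patient dialogues and diagnostic outcomes rather than a lookup table, and the handling of negative advantages is a design choice this sketch leaves open.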
Demonstrated Superiority and Generalization
The authors evaluated ProMed on two newly curated medical benchmarks, MedQA and CMB, which simulate partial-information clinical scenarios. ProMed consistently outperformed a wide range of existing methods, including prompt-based and other fine-tuning approaches. On average, it improved on state-of-the-art methods by 6.29% and on the reactive paradigm, in which LLMs answer immediately without seeking more information, by 54.45%. It also generalized robustly to out-of-domain cases, which suggests that the learned proactive behavior transfers rather than being overfitted to the training data.
Furthermore, when compared against leading open-source medical LLMs like HuatuoGPT-o1-7B and OpenBioLLM-8B, ProMed-optimized models, even at comparable or smaller scales, showed superior interactive reasoning. This highlights that specialized medical pretraining alone does not equip LLMs with robust interactive diagnostic abilities; targeted training with frameworks like ProMed is essential for enabling clinically valuable interactions.
The Future of Medical AI
ProMed represents a significant step forward in making medical LLMs more reliable and effective for clinical consultations. By enabling these AI systems to proactively ask relevant questions and gather necessary information, the framework addresses a critical safety concern in AI-driven healthcare: the risk of misdiagnosis due to incomplete data. This research paves the way for future advancements in AI that can genuinely assist medical professionals by engaging in more human-like, information-seeking dialogues. For more detailed information, you can refer to the original research paper.


