TLDR: EAG-RL is a two-stage training framework that uses reinforcement learning, guided by expert EHR models, to improve large language models’ ability to reason over electronic health records. It first distills high-quality, step-by-step reasoning paths and then aligns the LLM’s attention with clinically important features, yielding more accurate, robust, and generalizable clinical predictions such as in-hospital mortality and readmission.
Large Language Models (LLMs) have shown incredible promise in understanding medical text, but they often struggle with the complex, time-sensitive data found in Electronic Health Records (EHR). This limitation prevents them from making accurate and widely applicable clinical predictions, which are crucial for assisting doctors with diagnoses and treatment plans.
Current approaches often treat LLMs as simple information retrievers, relying on separate deep learning models for the actual predictions. While this works to some extent, it doesn’t truly enhance the LLM’s inherent ability to reason through medical cases, and it inherits the limitations of traditional models in adapting to different healthcare systems.
Introducing EAG-RL: A New Training Framework
To address this, researchers have proposed a novel two-stage training framework called EAG-RL (Expert-Attention Guided Reinforcement Learning). The core idea behind EAG-RL is to intrinsically improve how LLMs reason with EHR data by guiding them with insights from specialized expert EHR models.
The framework is inspired by how physicians think: they break down complex cases into smaller questions, gather evidence step-by-step, and focus on the most important clinical features. EAG-RL aims to teach LLMs to do the same.
Stage 1: Learning from Expert-Guided Paths
The first stage, called Expert-Guided Trajectory Distillation, teaches the LLM to reason in a structured, step-by-step manner. It uses Monte Carlo Tree Search (MCTS), a guided trial-and-error search, to explore different reasoning paths. The search is steered by an existing, highly accurate expert EHR model (such as ConCare) that can identify clinically important features. The LLM learns to generate sub-questions and answers, mimicking a doctor’s thought process.
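To make the search concrete, here is a minimal sketch of what MCTS-style exploration over reasoning paths could look like. The `expand_fn` (proposing candidate next steps, e.g. sampled from the LLM) and `reward_fn` (scoring a path, e.g. with the combined reward described next) are hypothetical stand-ins, and the UCB1 selection rule is a standard choice rather than the paper’s exact formulation:

```python
import math
import random

class Node:
    """One partial reasoning path: the sub-question/answer steps taken so far."""
    def __init__(self, steps, parent=None):
        self.steps = steps
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0  # cumulative reward backed up through this node

    def ucb(self, c=1.4):
        # UCB1: balance average reward (exploitation) with visit count (exploration).
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts(root, expand_fn, reward_fn, n_iters=100):
    """Search over step-by-step reasoning paths.

    expand_fn(steps) -> non-empty list of candidate next steps
    reward_fn(steps) -> scalar reward for a path (e.g. the combined
    classification + attention-alignment reward described below)
    """
    for _ in range(n_iters):
        # 1. Selection: descend by UCB score until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.ucb)
        # 2. Expansion: attach candidate next reasoning steps as children.
        for step in expand_fn(node.steps):
            node.children.append(Node(node.steps + [step], parent=node))
        # 3. Evaluation: score one newly expanded path.
        leaf = random.choice(node.children)
        reward = reward_fn(leaf.steps)
        # 4. Backpropagation: push the reward up to the root.
        while leaf is not None:
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    # The most-visited child is the distilled first reasoning step.
    return max(root.children, key=lambda n: n.visits)
```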
During this stage, the LLM receives two types of feedback: a ‘classification reward’ for making accurate predictions, and an ‘attention alignment reward’ that measures how well the features the LLM focuses on match the features highlighted by the expert model. This helps the LLM learn not just to be correct, but to be correct for the right reasons.
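As an illustration, the two signals might be blended into a single scalar like this; the cosine-similarity alignment measure and the `lam` weight are assumptions of the sketch, not the paper’s stated formulation:

```python
import numpy as np

def combined_reward(pred_label, true_label,
                    llm_attention, expert_attention, lam=0.5):
    """Blend the two feedback signals into one scalar reward.

    llm_attention / expert_attention are importance scores over the
    same EHR features. Cosine similarity and the lam weight are
    illustrative assumptions, not the paper's exact formulation.
    """
    # Classification reward: 1 if the prediction is correct, else 0.
    r_cls = float(pred_label == true_label)

    # Attention alignment reward: cosine similarity between the
    # LLM's and the expert model's feature-importance vectors.
    a = np.asarray(llm_attention, dtype=float)
    e = np.asarray(expert_attention, dtype=float)
    r_align = a @ e / (np.linalg.norm(a) * np.linalg.norm(e) + 1e-8)

    return lam * r_cls + (1.0 - lam) * max(0.0, r_align)

# Example: correct prediction whose attention mostly matches the expert's.
print(combined_reward(1, 1, [0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))  # ~0.99
```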
Stage 2: Refining with Attention-Aligned Reinforcement Learning
The second stage, Attention-Aligned Policy Optimization, takes the LLM’s initial reasoning abilities and further refines them using reinforcement learning. This stage continues to use the combined reward system, encouraging the LLM to make accurate predictions while aligning its attention with clinically salient features identified by the expert model.
A key innovation in this stage is ‘Entropy-Aware Adaptive Up Clipping’. Standard policy-optimization methods clip how far the model can move in a single update; this mechanism adaptively raises that upper limit for reasoning paths the model is uncertain about. High-uncertainty paths that could lead to valuable insights therefore get more weight, which helps the LLM explore less obvious but potentially very informative clinical patterns instead of getting stuck on only the most common features.
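A rough sketch of the idea, expressed as a PPO-style clipped objective in which only the upper clip bound widens with token entropy; the linear `alpha * entropy` scaling and the epsilon values are illustrative assumptions, not the paper’s exact rule:

```python
import torch

def eaa_clipped_loss(logp_new, logp_old, advantages, entropy,
                     eps_low=0.2, eps_high=0.2, alpha=0.1):
    """PPO-style clipped surrogate with an entropy-aware upper bound.

    Standard clipping bounds the importance ratio to [1-eps, 1+eps].
    Here only the upper bound widens with the token's entropy, so
    high-uncertainty reasoning steps can receive larger positive
    updates. The linear alpha*entropy rule is an illustrative
    assumption, not the paper's exact formulation.
    """
    ratio = torch.exp(logp_new - logp_old)       # pi_new / pi_old per token
    upper = 1.0 + eps_high + alpha * entropy     # entropy-aware up-clip
    clipped = torch.minimum(torch.clamp(ratio, min=1.0 - eps_low), upper)
    # Usual pessimistic min over the unclipped and clipped surrogates.
    loss = -torch.min(ratio * advantages, clipped * advantages)
    return loss.mean()
```

Intuitively, the widened upper bound lets uncertain but promising steps be reinforced more strongly when they pay off, while the unchanged lower bound still guards against destructive updates.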
Promising Results in Real-World Scenarios
Extensive experiments were conducted on two real-world EHR datasets, MIMIC-IV and TJH, for tasks like predicting in-hospital mortality and patient readmission. EAG-RL consistently outperformed existing state-of-the-art methods, showing an average improvement of 14.62% across various models and tasks. This demonstrates that EAG-RL significantly enhances the LLM’s intrinsic ability to reason with EHR data.
Beyond just accuracy, EAG-RL also showed impressive robustness. It maintained strong performance even when the order of patient features was shuffled, which is a common challenge in real-world healthcare data due to varying data collection methods. This suggests that EAG-RL learns deeper, order-independent clinical reasoning strategies.
Furthermore, the framework demonstrated excellent generalization capabilities. When trained on one dataset (MIMIC-IV) and tested on another (TJH), EAG-RL still achieved superior results. This indicates that the model learns transferable clinical patterns rather than just memorizing dataset-specific quirks.
Looking Ahead
The success of EAG-RL highlights its practical potential for deployment in real-world clinical prediction tasks. The researchers plan to explore even richer forms of supervision beyond just attention from expert models and to incorporate insights from multiple expert models to capture a wider range of clinical reasoning patterns. You can find more details about this research in the paper: Toward Better EHR Reasoning in LLMs: Reinforcement Learning with Expert Attention Guidance.


