AI Streamlines Radiology Reports with Personalized LLM Impressions

TLDR: A new “Coarse-to-Fine” AI framework uses open-source Large Language Models (LLMs) to automatically generate and personalize the “Impression” section of radiology reports. This system first creates a draft and then refines it using machine learning and human feedback to match individual radiologist styles and ensure accuracy. It aims to reduce radiologist burnout and improve reporting efficiency while maintaining high clinical precision.

The demanding task of manually creating the “Impression” section in radiology reports is a significant contributor to radiologist burnout. This crucial part of a report summarizes clinical findings and guides referring physicians, but its creation is complex, time-consuming, and requires high personalization and domain-specific language. To address this, researchers have introduced a novel “Coarse-to-Fine” framework that leverages open-source Large Language Models (LLMs) to automate and personalize these impressions.

A New Approach to Radiology Reporting

The proposed framework aims to significantly reduce the administrative workload on radiologists and enhance reporting workflows while maintaining high standards of clinical precision. Unlike general-purpose LLMs, which often lack the specialized vocabulary, style, and clinical nuances required for medical reporting, this new system is designed for fine-grained control over content and structure, ensuring consistency and alignment with medical standards.

The Coarse-to-Fine framework operates in two main stages. It begins with a “coarse-grained” summary of the clinical findings, capturing essential information. This initial draft is then iteratively refined through a “fine-grained” customization process. This refinement incorporates patient-specific context, ensures clinical precision, and aligns the output with individual radiologists’ stylistic preferences. Reinforcement Learning from Human Feedback (RLHF) is a key component in this stage, ensuring the generated impressions are factually accurate and tailored to the needs of both clinicians and patients.

Under the Hood: Models and Data

The research involved fine-tuning prominent open-source LLMs, specifically LLaMA and Mistral models, on a vast dataset of 957,134 de-identified radiology reports from the University of Chicago Medicine. This extensive dataset, curated over 12 years, provides a rich source of clinical information, detailed findings, and concise impressions, making it ideal for training LLMs for summarization tasks in the medical domain.

During the model selection phase, LLaMA-3.1-8b consistently outperformed other models like Gemma-2-9b and Mistral-7b across several metrics: ROUGE and BLEU, which score n-gram overlap with the reference impression (recall-oriented and precision-oriented, respectively), and BERTScore, which measures semantic similarity using contextual embeddings. While Mistral-7b slightly edged out LLaMA-3.1-8b in factual consistency, LLaMA-3.1-8b demonstrated the most balanced performance overall, making it the chosen base model for the framework.
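To make the overlap-based metrics above more concrete, here is a minimal sketch of a ROUGE-1-style unigram score. It is a simplified illustration, not the scorer used in the study: the function name, the whitespace tokenization, and the two example sentences are all assumptions made for this example.

```python
from collections import Counter


def unigram_overlap(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1-style scores from lowercase whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: a token counts only as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example texts, not taken from the study's data.
scores = unigram_overlap(
    "no acute cardiopulmonary abnormality",
    "no acute cardiopulmonary process",
)
print(scores)
```

Three of the four words match in each direction, so precision, recall, and F1 all come out to 0.75. A score like this rewards shared wording only, which is why the study pairs overlap metrics with BERTScore and a separate factual consistency check.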

The model’s training involved parameter-efficient fine-tuning (PEFT) using Low-Rank Adaptation (LoRA), a technique that allows efficient adaptation to new tasks with minimal computational overhead. This approach, combined with Supervised Fine-Tuning (SFT), enabled the model to learn from domain-specific radiology datasets and generalize effectively even with limited examples.
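A rough sense of why LoRA is cheap: instead of updating a full weight matrix of shape d_out x d_in, it trains two low-rank factors of shapes d_out x r and r x d_in. The sketch below only counts parameters under that scheme. The layer size (4096 x 4096) and rank (16) are illustrative assumptions; the article does not state the values used in the study.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in            # updating the whole matrix
    lora = rank * (d_out + d_in)   # factors B (d_out x r) and A (r x d_in)
    return full, lora


# Hypothetical layer size and rank, chosen only for illustration.
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

Under these assumed dimensions the low-rank update trains fewer than 1% of the parameters of the matrix it adapts, which is the source of the "minimal computational overhead" described above.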

Personalization and Evaluation

A key feature of the Coarse-to-Fine framework is its ability to generate personalized impressions tailored to different target audiences. This is achieved through a sophisticated prompt engineering strategy that allows for three types of summaries:

Brief Summarization: Simplified for non-English speakers.
Bullet Point Summarization: Concise insights for quick review.
Comprehensive Summarization: Detailed summaries for experts.
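The three summary types above could be selected with a small prompt-building helper like the one sketched here. The instruction wording, the dictionary keys, and the function name are hypothetical; the article does not publish the actual prompts used by the framework.

```python
# Hypothetical instruction text for each summary type; not the study's prompts.
STYLE_INSTRUCTIONS = {
    "brief": "Summarize the findings in simple, plain language.",
    "bullet": "Summarize the findings as concise bullet points for quick review.",
    "comprehensive": "Write a detailed impression suitable for expert readers.",
}


def build_prompt(findings: str, style: str) -> str:
    """Combine a style instruction with the report findings into one prompt."""
    if style not in STYLE_INSTRUCTIONS:
        raise ValueError(f"unknown style: {style!r}")
    return (
        f"{STYLE_INSTRUCTIONS[style]}\n\n"
        f"Findings:\n{findings.strip()}\n\n"
        "Impression:"
    )


# Made-up findings text used only to show the output shape.
prompt = build_prompt("Mild bibasilar atelectasis. No pleural effusion.", "bullet")
print(prompt)
```

Keeping the style instructions in one table means a new audience can be supported by adding an entry rather than changing the model or the rest of the pipeline.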

The framework was evaluated by human readers: radiologists from University of Chicago Medicine and an independent board-certified radiologist. Out of 200 generated reports, 79.5% (159) received either “neutral” or “positive” ratings, indicating that radiologists considered the AI-generated impressions at least as accurate as human-written ones. Notably, the model sometimes captured incidental findings that were omitted from the original human impressions.

Furthermore, the model demonstrated remarkable stability against real-world data entry errors, showing minimal degradation in performance even with a simulated 3% typographical error rate. Its generalizability was also validated by successfully summarizing key clinical findings from an external dataset (CheXpert Plus), with radiologists rating 80% of these generated impressions as equal to or better than the originals.
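A robustness test of this kind can be approximated by corrupting input text at a fixed per-character rate before summarization. The sketch below shows one simple way to do that; the substitution scheme, function name, and sample sentence are assumptions, since the article does not describe how the study simulated its typographical errors.

```python
import random
import string


def inject_typos(text: str, rate: float = 0.03, seed: int = 0) -> str:
    """Replace each letter with a different random letter with probability `rate`."""
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < rate:
            # Pick a lowercase letter that differs from the original character.
            choices = [c for c in string.ascii_lowercase if c != ch.lower()]
            out.append(rng.choice(choices))
        else:
            out.append(ch)
    return "".join(out)


# Made-up findings sentence, not taken from the study's data.
sample = "Mild bibasilar atelectasis without focal consolidation."
print(inject_typos(sample, rate=0.03))
```

Comparing a model's scores on the clean and corrupted versions of the same reports gives a direct measure of how much performance degrades at a given error rate.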

Looking Ahead

This research marks a significant step towards integrating advanced AI into clinical workflows, offering a promising solution to alleviate radiologist burnout and improve the efficiency and quality of medical reporting. Future work aims to integrate visual data and explore advanced multi-modal models to further enhance clinical reasoning and support real-world diagnostic workflows. More details are available in the full research paper.
