TL;DR: A new framework, C-SRRG, significantly improves automated structured radiology report generation by incorporating rich clinical context: multi-view images, patient indications, imaging techniques, and prior studies. This approach enhances report quality and diagnostic accuracy, and drastically reduces “temporal hallucinations,” where AI models incorrectly reference non-existent prior exams. The research also demonstrates that clinical context becomes more crucial as multimodal large language models (MLLMs) increase in size.
Automated radiology report generation (RRG) has emerged as a promising solution to ease the demanding workload of radiologists and enhance diagnostic efficiency. These systems aim to create structured reports from medical images, ensuring clarity, consistency, and adherence to clinical standards. However, a significant challenge with existing automated systems is their tendency to overlook crucial clinical context, which radiologists routinely use in their diagnostic process. This oversight can lead to critical errors, such as ‘temporal hallucinations,’ where reports reference non-existent prior studies.
To address these limitations, researchers have proposed a novel framework called Contextualized Structured Radiology Report Generation (C-SRRG). This innovative approach comprehensively integrates rich clinical context into the report generation process, aiming to align AI systems more closely with the human diagnostic workflow.
What is Rich Clinical Context?
The C-SRRG framework incorporates four key clinical elements that radiologists typically consider:
- Multi-view images: These provide complementary perspectives from different angles (e.g., frontal and lateral X-rays), allowing for a more comprehensive assessment of abnormalities.
- Clinical indication: This conveys the clinical reason for the imaging, helping the AI model focus on specific diagnostic questions and tailor findings to patient concerns.
- Imaging technique: Details about examination parameters, such as protocols and contrast use, help the model account for technical caveats and avoid misinterpreting artifacts as pathology.
- Prior studies with corresponding comparisons: When available, previous imaging studies provide a historical context, enabling the detection of disease progression, treatment response, and interval changes. This is crucial for preventing temporal hallucinations.
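To make the idea concrete, here is a minimal sketch of how these four elements might be assembled into a single text prompt for an MLLM. The field names (`indication`, `technique`, `prior_reports`, etc.) are illustrative assumptions, not the paper's actual schema or prompt format:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical record for one imaging study with its clinical context.
# Field names are illustrative; the C-SRRG dataset's real format may differ.
@dataclass
class StudyContext:
    image_paths: List[str]                 # multi-view images (e.g., frontal + lateral)
    indication: Optional[str] = None       # clinical reason for the exam
    technique: Optional[str] = None        # protocol, views, contrast use
    prior_reports: List[str] = field(default_factory=list)  # earlier reports, oldest first

def build_prompt(ctx: StudyContext) -> str:
    """Assemble available clinical context into a text prompt.

    Sections are included only when present; when no priors exist, the
    prompt says so explicitly, discouraging fabricated comparisons.
    """
    parts = [f"Views provided: {len(ctx.image_paths)} image(s)."]
    if ctx.indication:
        parts.append(f"Indication: {ctx.indication}")
    if ctx.technique:
        parts.append(f"Technique: {ctx.technique}")
    if ctx.prior_reports:
        priors = "\n".join(f"- {r}" for r in ctx.prior_reports)
        parts.append(f"Prior studies (oldest first):\n{priors}")
    else:
        parts.append("No prior studies are available; do not describe interval change.")
    parts.append("Generate a structured report with FINDINGS and IMPRESSION sections.")
    return "\n\n".join(parts)
```

Note the explicit "no prior studies" statement: making the absence of history visible to the model, rather than simply omitting the section, is one plausible way such a framework can discourage temporal hallucinations.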
The researchers curated a large-scale dataset for C-SRRG by integrating information from MIMIC-CXR and CheXpert Plus datasets. This dataset provides multi-view images, clinical indications, imaging techniques, and variable-length prior studies, reflecting the diverse scenarios encountered in real-world clinical practice.
How C-SRRG Improves Report Generation
The C-SRRG framework was evaluated using state-of-the-art medical multimodal large language models (MLLMs), including CheXagent-3B, MedGemma-4B, and Lingshu-7B. The results demonstrated that incorporating clinical context consistently and significantly improves report quality across various metrics, for both the ‘findings’ and ‘impression’ sections of the reports.
One of the most impactful findings was the substantial reduction in temporal hallucinations. In baseline models without clinical context, temporal hallucination rates were as high as 22.9% for findings and 43.8% for impressions. With the C-SRRG framework, these rates dropped significantly to 10.7% and 25.8%, respectively. This indicates that providing relevant clinical history helps the models avoid fabricating temporal comparisons when no prior studies are available.
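One way to quantify this failure mode is to flag temporal-comparison language in reports generated for patients who have no prior studies. The sketch below illustrates the idea with a hypothetical phrase list; the paper's actual detection method for temporal hallucinations may differ:

```python
import re

# Illustrative (not exhaustive) set of temporal-comparison phrases.
TEMPORAL_PATTERNS = re.compile(
    r"\b(compared to (the )?prior|previous (study|exam|radiograph)|"
    r"interval (change|improvement|worsening)|unchanged from|since the last)\b",
    re.IGNORECASE,
)

def temporal_hallucination_rate(reports, has_prior_flags):
    """Fraction of no-prior cases whose report still references a prior study.

    reports: generated report texts.
    has_prior_flags: for each report, whether a prior study actually exists.
    """
    no_prior = [r for r, has_prior in zip(reports, has_prior_flags) if not has_prior]
    if not no_prior:
        return 0.0
    hallucinated = sum(1 for r in no_prior if TEMPORAL_PATTERNS.search(r))
    return hallucinated / len(no_prior)
```

Under a metric of this shape, the reported drop from 22.9% to 10.7% for findings means roughly half as many no-prior cases contained fabricated comparisons after clinical context was added.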
Interestingly, the study also revealed that the importance of clinical context increases as MLLMs scale up. Larger models showed even greater gains in performance when provided with comprehensive context, suggesting that sophisticated contextual integration is vital for achieving optimal performance with advanced AI models in radiology.
Furthermore, an analysis of organ-level performance showed that C-SRRG improved diagnostic accuracy across nearly all anatomical regions, from lungs and airways to cardiovascular structures and pleura.
Looking Ahead
While C-SRRG represents a significant step forward, the researchers acknowledge certain limitations, such as the reliance on synthetically generated annotations and constraints on processing extensive longitudinal histories due to current model architectures. Future work will focus on scaling to even larger foundation models, developing intelligent context selection policies, and incorporating feedback from radiologists to further refine the report generation process.
This research marks a crucial advancement in automated radiology report generation, promising to reduce radiologists’ workload, improve diagnostic accuracy, and ensure more clinically aligned reports. For more details, you can read the full research paper here.