
Assessing Synthetic Chest X-rays: A Radiologist’s Perspective on GANs and Diffusion Models

TLDR: A study evaluated Generative Adversarial Networks (GANs) and Diffusion Models (DMs) for creating synthetic chest X-rays conditioned on four abnormalities. Radiologists found DM outputs generally more realistic, but GANs more accurate for specific conditions. The readers often struggled to distinguish synthetic from real images, though they identified visual cues that aided detection. The research highlights that while generative AI is promising for medical imaging, generating clinically reliable synthetic X-rays remains a challenge, underscoring the need for continued refinement and validation by human experts.

Generative Artificial Intelligence (AI) has made significant strides in creating realistic images, and its potential in medical imaging, particularly for addressing data scarcity, is immense. However, a critical question remains: how faithful and clinically useful are these synthetic images? A recent study titled “Perceptual Evaluation of GANs and Diffusion Models for Generating X-rays” by Gregory Schuit, Denis Parra, and Cecilia Besa delves into this very question, focusing on chest X-rays.

The research investigates the effectiveness of two leading generative models, Generative Adversarial Networks (GANs) and Diffusion Models (DMs), in synthesizing chest X-rays. The models were tasked with generating images conditioned on the presence or absence of four common abnormalities: Atelectasis (AT), Lung Opacity (LO), Pleural Effusion (PE), and Enlarged Cardiac Silhouette (ECS).
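Conditioning on the presence or absence of these four findings can be pictured as a multi-hot label vector. The sketch below is purely illustrative (the label names, ordering, and encoding are assumptions, not the paper's actual model inputs):

```python
# Hypothetical sketch: encoding the four target abnormalities as a
# multi-hot conditioning vector, as a conditional GAN or DM might
# consume. Names and ordering are illustrative only.
ABNORMALITIES = ["AT", "LO", "PE", "ECS"]  # Atelectasis, Lung Opacity,
                                           # Pleural Effusion,
                                           # Enlarged Cardiac Silhouette

def condition_vector(present: set) -> list:
    """Return a multi-hot vector: 1 if the abnormality should appear
    in the generated image, 0 if it should be absent."""
    return [1 if a in present else 0 for a in ABNORMALITIES]

# e.g. request an X-ray showing Pleural Effusion only:
print(condition_vector({"PE"}))  # [0, 0, 1, 0]
```

A vector of all zeros would correspond to requesting an image with none of the four abnormalities, which is how the "absence" conditions in the study can be thought of.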

To evaluate these models, the researchers conducted a reader study involving three radiologists with varying levels of experience. These experts were presented with a benchmark dataset comprising real images from the MIMIC-CXR dataset and synthetic images generated by both GANs and DMs. The radiologists participated in two main tasks: first, distinguishing between real and synthetic images, and second, assessing whether the visual features in an image were consistent with the target abnormality.

The findings revealed nuanced differences between the two generative approaches. Overall, Diffusion Models were found to generate images that appeared more visually realistic. However, GANs demonstrated better accuracy for specific conditions, such as the absence of an Enlarged Cardiac Silhouette. Interestingly, radiologists often found it challenging to differentiate between real and synthetic images. In fact, they were undecided about which image was synthetic in 41.8% of cases. When they did make a decision, they were correct only about half the time (50.4%), suggesting that both models produce images realistic enough to be difficult to discern from genuine ones.
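The two reported figures can be tied together with a little arithmetic, assuming (as the wording suggests) that the 50.4% accuracy applies only to the cases where the radiologists actually made a call:

```python
# Assumed interpretation of the reported figures: 41.8% of
# comparisons were "undecided"; among the remaining decided cases,
# 50.4% of calls were correct.
undecided = 0.418
accuracy_when_decided = 0.504

decided = 1 - undecided                       # fraction with a real/synthetic call
correct_overall = decided * accuracy_when_decided
print(f"Decided: {decided:.1%}, correct overall: {correct_overall:.1%}")
```

Under this reading, fewer than a third of all comparisons ended in a correct identification, which is consistent with the authors' conclusion that both model families produce hard-to-discern images.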

The study also provided valuable insights into the visual cues radiologists used to identify synthetic images. These included characteristics like unusually high radiolucency (transparency to X-rays), incomplete pulmonary fields (lungs appearing cut off), abnormally large densities, and blurry lateral views. For instance, high radiolucency was a common indicator for DM-generated images without ECS, while cropped images often revealed DM-generated images without LO.

While GANs and DMs showed promise in generating conditioned abnormalities like Enlarged Cardiac Silhouette and Pleural Effusion with high accuracy, challenges emerged with Lung Opacity and Atelectasis. The study noted that radiologists themselves struggled with classifying Lung Opacity in real images, partly due to the lower resolution (256×256 pixels) used in the study compared to the high-resolution DICOM format typically used in clinical settings. Furthermore, the generative models showed a distinct lack of efficacy in accurately producing Lung Opacity, suggesting that critical information for generating proper contrasts might be lost during training.

In terms of overall performance, GANs showed a slight edge over DMs regarding conditional correctness, even though DMs offer the advantage of accepting natural language descriptions as conditions. This suggests that for precise binary conditionality, GANs might currently be more effective. However, the study acknowledges that different prompting techniques for DMs could potentially improve their results.

The researchers highlight several limitations, including a relatively small sample size of images and participants, and the use of a single prompt template for the Diffusion Model. The small image resolution also limits how representative the experiment is of a real clinical scenario. Despite these limitations, the study underscores that while generative AI holds immense promise for medical imaging, the creation of truly realistic and clinically reliable chest radiographs is still an evolving challenge. The findings emphasize the crucial role of human-in-the-loop validation in the development of trustworthy generative models for medical applications. You can read the full research paper here: Perceptual Evaluation of GANs and Diffusion Models for Generating X-rays.

Karthik Mehta
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
