
Assessing Synthetic Chest X-rays: A Radiologist’s Perspective on GANs and Diffusion Models

TLDR: A study evaluated Generative Adversarial Networks (GANs) and Diffusion Models (DMs) for creating synthetic chest X-rays conditioned on four abnormalities. Radiologists found DM outputs generally more realistic, but GANs more accurate for specific conditions. The readers often struggled to distinguish synthetic from real images, though they identified visual cues that aided detection. The research highlights that while generative AI is promising for medical imaging, generating clinically reliable synthetic X-rays remains a challenge, underscoring the need for continued refinement and validation by human experts.

Generative Artificial Intelligence (AI) has made significant strides in creating realistic images, and its potential in medical imaging, particularly for addressing data scarcity, is immense. However, a critical question remains: how faithful and clinically useful are these synthetic images? A recent study titled “Perceptual Evaluation of GANs and Diffusion Models for Generating X-rays” by Gregory Schuit, Denis Parra, and Cecilia Besa delves into this very question, focusing on chest X-rays.

The research investigates the effectiveness of two leading generative models, Generative Adversarial Networks (GANs) and Diffusion Models (DMs), in synthesizing chest X-rays. The models were tasked with generating images conditioned on the presence or absence of four common abnormalities: Atelectasis (AT), Lung Opacity (LO), Pleural Effusion (PE), and Enlarged Cardiac Silhouette (ECS).
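Conditioning on the presence or absence of these four findings can be pictured as a multi-hot label vector. The sketch below is purely illustrative (the label names, ordering, and encoding are assumptions, not the paper's actual model inputs):

```python
# Hypothetical sketch: encoding the four target abnormalities as a
# multi-hot conditioning vector, as a conditional GAN or DM might
# consume. Names and ordering are illustrative only.
ABNORMALITIES = ["AT", "LO", "PE", "ECS"]  # Atelectasis, Lung Opacity,
                                           # Pleural Effusion,
                                           # Enlarged Cardiac Silhouette

def condition_vector(present: set) -> list:
    """Return a multi-hot vector: 1 if the abnormality should appear
    in the generated image, 0 if it should be absent."""
    return [1 if a in present else 0 for a in ABNORMALITIES]

# e.g. request an X-ray showing Pleural Effusion only:
print(condition_vector({"PE"}))  # [0, 0, 1, 0]
```

A vector of all zeros would correspond to requesting an image with none of the four abnormalities, which is how the "absence" conditions in the study can be thought of.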

To evaluate these models, the researchers conducted a reader study involving three radiologists with varying levels of experience. These experts were presented with a benchmark dataset comprising real images from the MIMIC-CXR dataset and synthetic images generated by both GANs and DMs. The radiologists participated in two main tasks: first, distinguishing between real and synthetic images, and second, assessing whether the visual features in an image were consistent with the target abnormality.

The findings revealed nuanced differences between the two generative approaches. Overall, Diffusion Models were found to generate images that appeared more visually realistic. However, GANs demonstrated better accuracy for specific conditions, such as the absence of an Enlarged Cardiac Silhouette. Interestingly, radiologists often found it challenging to differentiate between real and synthetic images. In fact, they were undecided about which image was synthetic in 41.8% of cases. When they did make a decision, they were correct only about half the time (50.4%), suggesting that both models produce images realistic enough to be difficult to discern from genuine ones.
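The two reported figures can be tied together with a little arithmetic, assuming (as the wording suggests) that the 50.4% accuracy applies only to the cases where the radiologists actually made a call:

```python
# Assumed interpretation of the reported figures: 41.8% of
# comparisons were "undecided"; among the remaining decided cases,
# 50.4% of calls were correct.
undecided = 0.418
accuracy_when_decided = 0.504

decided = 1 - undecided                       # fraction with a real/synthetic call
correct_overall = decided * accuracy_when_decided
print(f"Decided: {decided:.1%}, correct overall: {correct_overall:.1%}")
```

Under this reading, fewer than a third of all comparisons ended in a correct identification, which is consistent with the authors' conclusion that both model families produce hard-to-discern images.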

The study also provided valuable insights into the visual cues radiologists used to identify synthetic images. These included characteristics like unusually high radiolucency (transparency to X-rays), incomplete pulmonary fields (lungs appearing cut off), abnormally large densities, and blurry lateral views. For instance, high radiolucency was a common indicator for DM-generated images without ECS, while cropped images often revealed DM-generated images without LO.

While GANs and DMs showed promise in generating conditioned abnormalities like Enlarged Cardiac Silhouette and Pleural Effusion with high accuracy, challenges emerged with Lung Opacity and Atelectasis. The study noted that radiologists themselves struggled with classifying Lung Opacity in real images, partly due to the lower resolution (256×256 pixels) used in the study compared to the high-resolution DICOM format typically used in clinical settings. Furthermore, the generative models showed a distinct lack of efficacy in accurately producing Lung Opacity, suggesting that critical information for generating proper contrasts might be lost during training.

In terms of overall performance, GANs showed a slight edge over DMs regarding conditional correctness, even though DMs offer the advantage of accepting natural language descriptions as conditions. This suggests that for precise binary conditionality, GANs might currently be more effective. However, the study acknowledges that different prompting techniques for DMs could potentially improve their results.

The researchers highlight several limitations, including a relatively small sample size of images and participants, and the use of a single prompt template for the Diffusion Model. The small image resolution also limits how representative the experiment is of a real clinical scenario. Despite these limitations, the study underscores that while generative AI holds immense promise for medical imaging, the creation of truly realistic and clinically reliable chest radiographs is still an evolving challenge. The findings emphasize the crucial role of human-in-the-loop validation in the development of trustworthy generative models for medical applications. You can read the full research paper here: Perceptual Evaluation of GANs and Diffusion Models for Generating X-rays.

Karthik Mehta
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
