Generative AI Models Show Variable Performance in Pulmonary CT Lung Cancer Diagnosis

TLDR: A recent study evaluated the diagnostic capabilities of advanced generative AI models, including GPT-4-turbo, Gemini-pro-vision, and Claude-3-opus, in interpreting pulmonary CT scans for lung cancer. While these models demonstrated potential, particularly with single-image inputs, their accuracy declined when presented with more complex multimodal information, such as multiple CT slices or patient clinical histories. The research highlights both the promise and the current limitations of AI in complex radiological interpretations, emphasizing the need for further refinement for successful clinical integration.

A comprehensive study has shed light on the performance of cutting-edge generative artificial intelligence (Gen-AI) models in the critical field of pulmonary computed tomography (CT) imaging for lung cancer diagnosis. The research, published on June 29, 2025, evaluated three prominent Gen-AI models: GPT-4-turbo, Gemini-pro-vision, and Claude-3-opus. The objective was to assess their diagnostic accuracy and identify their strengths and weaknesses when interpreting complex radiological data.

The study, a retrospective analysis, utilized chest CT scans from 404 patients, including those with lung neoplasms (184 cases) and non-malignant lung conditions (210 cases). External validation was performed using datasets from The Cancer Genome Atlas and the Medical Imaging and Data Resource Center.

The models were tested across various clinical scenarios, including single-image CT diagnostics, consecutive CT slices, and single images combined with patient clinical histories.

Initial findings revealed that in single-image CT diagnostics, Gemini and Claude demonstrated superior accuracy compared to GPT. However, a significant observation was the decline in diagnostic accuracy for all models when additional CT slices or clinical histories were incorporated. This suggests a challenge in integrating complex multimodal information effectively. For instance, Gemini’s accuracy dropped sharply with consecutive slices, indicating potential difficulties in interpreting lesion continuity and spatial relationships. Similarly, GPT struggled with tasks combining CT images and clinical history, often treating auxiliary text as interference.

Further analysis indicated that Gen-AI models primarily relied on morphology and margins for malignancy predictions. While features like “spiculated” and “irregular” margins, as well as “mixed,” “solid,” and “hyperdense” densities, were heavily weighted, the models occasionally failed to recognize critical imaging features and, concerningly, sometimes fabricated data. This “hallucination” of information poses a significant risk in clinical applications because it can lead to misleading diagnoses.

The research also explored the impact of prompt design on model performance. Simplifying prompts, which asked only for lesion identification and preliminary diagnosis, led to significant improvements in diagnostic accuracy, sensitivity, specificity, and F1 scores across all models. This suggests that the way information is presented to these AI systems can profoundly influence their diagnostic capabilities.
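For readers unfamiliar with the four metrics named above, the following is a minimal sketch of how each is typically derived from a binary confusion matrix, with "malignant" as the positive class. The `ConfusionCounts` class, the `diagnostic_metrics` function, and the counts in the usage lines are hypothetical illustrations; they are not taken from the study and do not reproduce its results or its exact evaluation procedure.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical container for binary outcomes (positive = malignant)."""
    tp: int  # malignant cases the model labeled malignant
    fp: int  # non-malignant cases the model labeled malignant
    tn: int  # non-malignant cases the model labeled non-malignant
    fn: int  # malignant cases the model labeled non-malignant


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Compute accuracy, sensitivity, specificity, and F1 from raw counts."""
    total = c.tp + c.fp + c.tn + c.fn
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # also called recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    precision = _ratio(c.tp, c.tp + c.fp)
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # F1 is the harmonic mean of precision and sensitivity.
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Illustrative counts only; these numbers are NOT from the study.
example = ConfusionCounts(tp=8, fp=2, tn=6, fn=4)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting all four together matters in a screening context: a model can post high accuracy while still missing many malignant cases (low sensitivity) or flagging many benign ones (low specificity).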

Despite these promising aspects, the study underscores the current limitations of Gen-AI in medical imaging. These include inconsistencies in diagnostic justifications, discrepancies between AI-generated parameters and actual image features, and a tendency for performance to degrade as information complexity increases. The authors emphasize that while Gen-AI holds potential for early tumor screening and streamlining diagnostic workflows, ongoing efforts are crucial to improve the models’ robustness, reliability, and ability to integrate diverse clinical information before they can be adopted successfully in healthcare. The findings highlight the need for developers to remain objective when describing their models’ performance in practical applications and specific tasks, and for continued research to close the gap in domain expertise and address issues like data fabrication.

Ananya Rao
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
