TL;DR: A new study published in the journal iScience, led by researchers from Binghamton University, has evaluated ChatGPT’s diagnostic capabilities. The AI demonstrated high accuracy in identifying disease terms, drug names, and genetic information, often exceeding the researchers’ expectations. However, it showed lower accuracy in symptom identification and a tendency to ‘hallucinate’ specific genetic accession numbers, pointing to areas for improvement in AI’s medical applications.
A recent study, spearheaded by Ahmed Abdeen Hamed, a research fellow at Binghamton University’s Thomas J. Watson College of Engineering and Applied Science, has delved into the diagnostic prowess of generative artificial intelligence, specifically ChatGPT. Published in the journal iScience, the research aimed to assess the accuracy of AI-generated medical information, a growing concern as individuals increasingly consult platforms like ChatGPT for health diagnoses.
Hamed, alongside collaborators from AGH University of Krakow, Poland, Howard University, and the University of Vermont, tested ChatGPT across various biomedical categories: disease terms, drug names, genetics, and symptoms. The findings presented a mix of impressive successes and notable limitations.
Remarkably, ChatGPT exhibited high accuracy in identifying disease terms (ranging from 88% to 97%), drug names (90% to 91%), and genetic information (88% to 98%). Hamed expressed his astonishment at these results, stating, ‘I thought it would be at most 25% accuracy.’ He further elaborated on the AI’s capabilities, noting, ‘The exciting result was ChatGPT said cancer is a disease, hypertension is a disease, fever is a symptom, Remdesivir is a drug and BRCA is a gene related to breast cancer. Incredible, absolutely incredible!’
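For readers curious how per-category accuracy figures like these might be computed, here is a minimal sketch of a term-classification harness in Python. The `ask_chatgpt` stub, the label set, and the sample terms are hypothetical stand-ins for illustration, not the study’s actual dataset or protocol.

```python
# Sketch of a term-classification accuracy harness, loosely mirroring
# how per-category accuracy figures like those above could be computed.
# ask_chatgpt() is a hypothetical stand-in, and the labeled terms are
# illustrative examples, not the study's actual dataset or protocol.

GOLD = [
    ("hypertension", "disease"),
    ("remdesivir", "drug"),
    ("BRCA1", "gene"),
    ("fever", "symptom"),
]

def ask_chatgpt(term: str) -> str:
    """Hypothetical model call: should return 'disease', 'drug',
    'gene', or 'symptom'. Replace with a real chat-model API call;
    this naive stub always guesses 'disease' so the script runs."""
    return "disease"

def accuracy(pairs) -> float:
    """Fraction of terms the model labels with the gold category."""
    hits = sum(1 for term, label in pairs if ask_chatgpt(term) == label)
    return hits / len(pairs)

print(f"accuracy: {accuracy(GOLD):.0%}")  # the naive stub yields 25% here
```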
However, the study identified a significant weakness in symptom identification, where ChatGPT’s accuracy dropped to between 49% and 61%. The researchers attribute this gap to the difference between the language of medical professionals and that of the general public: while doctors and researchers rely on precise biomedical ontologies, ChatGPT is trained largely on informal, everyday language and tends to mirror how ordinary users describe their health. Hamed explained, ‘The LLM is apparently trying to simplify the definition of these symptoms, because there is a lot of traffic asking such questions, so it started to minimize the formalities of medical language to appeal to those users.’
Another critical issue was ChatGPT’s tendency to ‘hallucinate’ information, particularly when asked for specific genetic accession numbers from databases like the National Institutes of Health’s GenBank. For instance, when prompted for the GenBank accession number of the Breast Cancer 1 gene (BRCA1), which is NM_007294.4, ChatGPT would generate made-up identifiers. Hamed views this as a major flaw despite the otherwise positive results.
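One practical defense against hallucinated identifiers is to verify every model-supplied accession number against GenBank itself before trusting it. The sketch below uses NCBI’s public E-utilities `efetch` endpoint, which for sequence databases accepts accession.version identifiers in the `id` parameter; the error handling is deliberately minimal, and the invalid test ID is deliberately fabricated to trigger a failure.

```python
# Sketch: verify a model-supplied accession number against NCBI GenBank
# via the public E-utilities efetch endpoint. Assumes the `requests`
# package is installed; accession.version IDs are accepted for nuccore.
import requests

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def accession_exists(accession: str) -> bool:
    """Return True if GenBank can resolve `accession` to a record."""
    resp = requests.get(
        EFETCH,
        params={
            "db": "nuccore",   # nucleotide sequence database
            "id": accession,   # e.g. "NM_007294.4" (a BRCA1 mRNA record)
            "rettype": "acc",  # ask only for the accession line back
            "retmode": "text",
        },
        timeout=10,
    )
    # NCBI returns a non-200 status or an error body for unknown IDs.
    return resp.ok and accession.split(".")[0] in resp.text

if __name__ == "__main__":
    print(accession_exists("NM_007294.4"))   # expected: True
    print(accession_exists("NM_0000000.9"))  # made-up ID, expected: False
```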
Looking ahead, Hamed sees an opportunity to enhance these AI tools. He suggests ‘introducing these biomedical ontologies to the LLMs to provide much higher accuracy, get rid of all the hallucinations and make these tools into something amazing.’ His ongoing research, which began in 2023 out of concern about fact-checking in large language models, aims to expose these flaws so that data scientists can refine and improve AI models for safer, more accurate biomedical applications.
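As a rough illustration of what ‘introducing these biomedical ontologies to the LLMs’ could look like in practice, the sketch below injects a formal ontology definition into the prompt before the model answers. The mini-ontology and the `build_grounded_prompt` helper are hypothetical; a real system would draw on resources such as UMLS or the Human Phenotype Ontology rather than a hard-coded dictionary.

```python
# Sketch: ground an LLM answer in a biomedical ontology by injecting
# the formal definition into the prompt (retrieval-augmented prompting).
# The mini-ontology below is a hypothetical stand-in for UMLS/HPO lookups.

ONTOLOGY = {
    "rhinorrhea": "Excess discharge of mucus from the nose (lay: runny nose).",
    "pyrexia": "Body temperature elevated above the normal range (lay: fever).",
}

def build_grounded_prompt(question: str, term: str) -> str:
    """Prepend the ontology definition of `term`, if known, to the question."""
    definition = ONTOLOGY.get(term.lower())
    context = f"Ontology definition of '{term}': {definition}\n" if definition else ""
    return f"{context}Using the definition above when relevant, answer:\n{question}"

prompt = build_grounded_prompt("Is pyrexia a symptom or a disease?", "pyrexia")
print(prompt)  # this string would be sent to the chat model of your choice
```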