AI's Predictive Power in Infant Eye Disease: A Look at Affective Biases and Prompt Engineering

TLDR: Researchers developed Affective-ROPTester and the CROP dataset to assess LLMs’ ability to predict Retinopathy of Prematurity (ROP) risk in infants using admission notes. Findings show that LLMs struggle when relying on intrinsic knowledge alone but improve with external inputs. They also exhibit a bias towards overestimating ROP risk, which can be mitigated by incorporating positive emotional framing in prompts. The results underline the importance of affect-sensitive prompt engineering for reliable AI in healthcare.

A new research paper introduces “Affective-ROPTester,” a framework for evaluating how well large language models (LLMs) predict the risk of Retinopathy of Prematurity (ROP) in infants. ROP is a serious eye condition affecting premature and low-birth-weight infants, and early prediction is crucial for effective intervention.

Traditionally, ROP diagnosis relies heavily on medical records and imaging. However, this research explores the potential of LLMs to predict ROP risk using only initial hospital admission notes. To facilitate this, the researchers developed a novel Chinese benchmark dataset called CROP, which contains 993 admission records categorized into low, medium, and high-risk labels for ROP.

The Affective-ROPTester framework employs three distinct prompting strategies to thoroughly examine LLMs’ predictive capabilities and biases: Instruction-based, Chain-of-Thought (CoT), and In-Context Learning (ICL). The Instruction scheme tests the LLMs’ inherent knowledge, while CoT and ICL integrate external medical knowledge to improve accuracy. A unique aspect of this framework is the inclusion of emotional elements at the prompt level, investigating how different emotional framings influence the models’ predictions and bias patterns.
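To make the three schemes and the emotional framing concrete, here is a minimal Python sketch of how such prompts could be assembled. It is not the paper's implementation: the function name `build_prompt`, the prompt wording, the affect prefixes, and the example risk factors are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical affect prefixes; the paper's actual wording is not given in this article.
AFFECT_PREFIX = {
    "neutral": "",
    "positive": "You are calm and confident in your clinical judgment. ",
    "negative": "You are anxious about making a mistake. ",
}

# Hypothetical task instruction shared by all three schemes.
TASK = (
    "Read the admission note and classify the infant's ROP risk "
    "as low, medium, or high."
)


def build_prompt(
    note: str,
    scheme: str = "instruction",
    affect: str = "neutral",
    risk_factors: Optional[Sequence[str]] = None,
    demos: Optional[Sequence[Tuple[str, str]]] = None,
) -> str:
    """Assemble a prompt for one admission note (illustrative only)."""
    parts = [AFFECT_PREFIX[affect] + TASK]
    if scheme == "cot":
        # CoT: supply known risk factors and ask for step-by-step reasoning.
        factors = "; ".join(risk_factors or [])
        parts.append(
            f"Known risk factors: {factors}. Reason step by step before answering."
        )
    elif scheme == "icl":
        # ICL: prepend labeled demonstration examples.
        for demo_note, demo_label in demos or []:
            parts.append(f"Note: {demo_note}\nRisk: {demo_label}")
    elif scheme != "instruction":
        raise ValueError(f"unknown scheme: {scheme}")
    # The note to classify always comes last, with an open label slot.
    parts.append(f"Note: {note}\nRisk:")
    return "\n\n".join(parts)
```

For example, `build_prompt(note, scheme="cot", affect="positive", risk_factors=["low gestational age", "low birth weight"])` would produce a positively framed CoT prompt. Keeping the affect prefix separate from the scheme means each scheme can be run under every framing and the results compared.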

The study yielded several significant findings. Firstly, LLMs showed limited effectiveness in ROP risk prediction when relying solely on their intrinsic knowledge. However, their performance significantly improved when augmented with structured external inputs, such as known risk factors in the CoT scheme or demonstration examples in the ICL scheme. For instance, the Qwen-2.5 model achieved an accuracy of 61.33% with the CoT scheme.

Secondly, the research revealed clear affective biases in the LLM outputs, with a consistent tendency to overestimate risk by assigning cases to the medium- and high-risk categories too often. This means that without proper guidance, LLMs might lean towards more pessimistic predictions. Interestingly, the study found that positive emotional framing in the prompts helped to mitigate this predictive bias, producing more balanced and accurate outcomes than negative or neutral framings.
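One simple way to quantify this kind of overestimation is to treat the three labels as an ordered scale and count how often a prediction lands above or below the true label. The sketch below does this; the metric names and the function `bias_summary` are assumptions for illustration, and the paper may measure bias differently.

```python
from typing import Dict, Sequence

# Ordinal ranks for the three CROP risk labels.
RANK = {"low": 0, "medium": 1, "high": 2}


def bias_summary(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Summarize accuracy and the direction of errors (illustrative metric)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    # Positive difference: predicted risk is higher than the true risk.
    diffs = [RANK[p] - RANK[t] for t, p in zip(y_true, y_pred)]
    n = len(diffs)
    return {
        "accuracy": sum(d == 0 for d in diffs) / n,
        "over_rate": sum(d > 0 for d in diffs) / n,
        "under_rate": sum(d < 0 for d in diffs) / n,
        "mean_signed_error": sum(diffs) / n,
    }
```

Running this separately on the predictions obtained under positive, neutral, and negative framing would show whether `over_rate` shrinks when the prompt is framed positively, which is the pattern the study reports.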

This research highlights the critical role of affect-sensitive prompt engineering in making LLM-powered diagnostic tools more reliable in clinical settings. Affective-ROPTester serves as a framework for both evaluating and reducing affective bias in clinical language modeling systems, paving the way for more dependable AI assistance in healthcare. The full paper has more details.
