AI’s Predictive Power in Infant Eye Disease: A Look at Affective Biases and Prompt Engineering

TLDR: Researchers developed Affective-ROPTester and the CROP dataset to assess LLMs’ ability to predict Retinopathy of Prematurity (ROP) risk in infants using admission notes. Findings show LLMs struggle with intrinsic knowledge but improve with external inputs. They also exhibit a bias towards overestimating ROP risk, which can be mitigated by incorporating positive emotional framing in prompts, emphasizing the importance of affect-sensitive prompt engineering for reliable AI in healthcare.

A new research paper introduces “Affective-ROPTester,” a framework for evaluating how well large language models (LLMs) predict the risk of Retinopathy of Prematurity (ROP) in infants. ROP is a serious eye condition affecting premature and low-birth-weight infants, and early prediction is crucial for effective intervention.

Traditionally, ROP diagnosis relies heavily on medical records and imaging. This research instead explores whether LLMs can predict ROP risk from initial hospital admission notes alone. To support this, the researchers developed a new Chinese benchmark dataset called CROP, which contains 993 admission records, each labeled as low, medium, or high risk for ROP.

The Affective-ROPTester framework employs three distinct prompting strategies to thoroughly examine LLMs’ predictive capabilities and biases: Instruction-based, Chain-of-Thought (CoT), and In-Context Learning (ICL). The Instruction scheme tests the LLMs’ inherent knowledge, while CoT and ICL integrate external medical knowledge to improve accuracy. A unique aspect of this framework is the inclusion of emotional elements at the prompt level, investigating how different emotional framings influence the models’ predictions and bias patterns.
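To make the three schemes concrete, here is a minimal Python sketch of how such prompts might be assembled. It is an illustration only: the function name `build_prompt`, the emotional prefixes, and all prompt wording are assumptions made for this example, not the prompts used in the paper.

```python
from typing import Sequence, Tuple

# Hypothetical emotional framings; the paper's actual wording is not reproduced here.
EMOTION_PREFIXES = {
    "neutral": "",
    "positive": "Your careful work is valued and the care team trusts your judgment. ",
    "negative": "A mistake here would be serious and the care team is worried. ",
}

# Hypothetical task statement shared by all three schemes.
TASK = (
    "Read the admission note and classify the infant's risk of "
    "Retinopathy of Prematurity (ROP) as low, medium, or high."
)


def build_prompt(
    note: str,
    scheme: str = "instruction",
    emotion: str = "neutral",
    risk_factors: Sequence[str] = (),
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble one prompt for the given scheme and emotional framing."""
    if scheme not in {"instruction", "cot", "icl"}:
        raise ValueError(f"unknown scheme: {scheme}")
    if emotion not in EMOTION_PREFIXES:
        raise ValueError(f"unknown emotion: {emotion}")

    parts = [EMOTION_PREFIXES[emotion] + TASK]

    if scheme == "cot":
        # CoT: supply external knowledge (risk factors) and ask for reasoning.
        if risk_factors:
            parts.append("Known risk factors: " + "; ".join(risk_factors) + ".")
        parts.append("Reason step by step before giving the final label.")
    elif scheme == "icl":
        # ICL: prepend labeled demonstration examples.
        for example_note, example_label in examples:
            parts.append(f"Admission note: {example_note}\nRisk: {example_label}")

    # The instruction scheme adds nothing extra: it relies on intrinsic knowledge.
    parts.append(f"Admission note: {note}\nRisk:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        "Placeholder admission note text.",
        scheme="cot",
        emotion="positive",
        risk_factors=["prematurity", "low birth weight"],
    ))
```

Keeping the emotional framing as a separate prefix means the same note can be run under every scheme and framing combination, which is what allows the framings to be compared against each other.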

The study yielded several significant findings. Firstly, LLMs showed limited effectiveness in ROP risk prediction when relying solely on their intrinsic knowledge. However, their performance significantly improved when augmented with structured external inputs, such as known risk factors in the CoT scheme or demonstration examples in the ICL scheme. For instance, the Qwen-2.5 model achieved an accuracy of 61.33% with the CoT scheme.

Secondly, the research revealed clear affective biases in the LLM outputs, with a consistent tendency to over-predict the medium- and high-risk categories. In other words, without proper guidance, LLMs tend to lean towards more pessimistic predictions. The study also found that positive emotional framing in the prompts helped to mitigate this bias, yielding more balanced and accurate outcomes than negative or neutral framings.
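One simple way to quantify this kind of bias is to count how often a model's predicted label is more severe than the true label. The sketch below is an assumption made for illustration: the function `risk_bias_summary` and its metric definitions are hypothetical and are not taken from the paper, whose exact bias measure is not described here.

```python
from typing import Dict, Sequence

# Ordinal encoding of the three CROP risk labels.
LEVELS = {"low": 0, "medium": 1, "high": 2}


def risk_bias_summary(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Return accuracy plus over- and underestimation rates for ordinal risk labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    n = len(y_true)
    pairs = [(LEVELS[t], LEVELS[p]) for t, p in zip(y_true, y_pred)]

    correct = sum(t == p for t, p in pairs)
    over = sum(p > t for t, p in pairs)   # predicted risk higher than the true risk
    under = sum(p < t for t, p in pairs)  # predicted risk lower than the true risk

    return {
        "accuracy": correct / n,
        "overestimation_rate": over / n,
        "underestimation_rate": under / n,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the study.
    truth = ["low", "low", "medium", "high"]
    preds = ["medium", "low", "high", "high"]
    print(risk_bias_summary(truth, preds))
```

Computing these rates separately for each emotional framing would show whether a given framing shifts predictions toward or away from the higher-risk labels.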

This research highlights the critical role of affect-sensitive prompt engineering in enhancing the reliability of LLM-powered diagnostic tools in clinical settings. The Affective-ROPTester serves as a framework for both evaluating and reducing affective bias in clinical language modeling systems, paving the way for more dependable AI assistance in healthcare. More details are available in the full paper.

Rhea Bhattacharya
Rhea Bhattacharya is an AI correspondent with a keen eye for cultural, social, and ethical trends in Generative AI. With a background in sociology and digital ethics, she delivers high-context stories that explore the intersection of AI with everyday lives, governance, and global equity. Her news coverage is analytical, human-centric, and always ahead of the curve. You can reach her at: [email protected]