
Adaptive Prompting Strategies Boost Biomedical Named Entity Recognition

TLDR: This research introduces a dynamic prompting strategy using Retrieval-Augmented Generation (RAG) to enhance few-shot biomedical Named Entity Recognition (NER) with Large Language Models (LLMs). By dynamically selecting in-context learning examples based on similarity to input texts, the method significantly improves performance over static prompting. Key findings show that TF-IDF and SBERT retrieval methods yield the best results, and GPT-4 consistently outperforms LLaMA 3, highlighting the utility of contextually adaptive prompts for biomedical NER, especially in data-scarce scenarios.

Named Entity Recognition (NER) is a fundamental task in natural language processing (NLP) that involves identifying and classifying predefined entities from text. In the biomedical field, NER is crucial for extracting information like diseases, treatments, and symptoms from medical texts. However, biomedical NER often faces a significant challenge: the scarcity of annotated training data, especially for rare medical concepts. Traditional deep neural network methods, while powerful, typically require large datasets, which are expensive and often impossible to obtain or share due to privacy concerns.

Large Language Models (LLMs) have shown great promise in few-shot learning (FSL) settings, where models can adapt to new tasks with minimal examples. This adaptability is particularly transformative for restricted domains like biomedicine. A common approach with LLMs is prompt engineering, where carefully designed prompts guide the model’s understanding and output. However, many prompt-driven methods use ‘static’ prompts, meaning the same fixed prompt and in-context examples are used for every input, regardless of its content. This lack of flexibility can lead to suboptimal performance, as the fixed examples might not always be relevant to the specific input text.

To overcome the limitations of static prompts and enhance few-shot biomedical NER, researchers have explored dynamic prompting strategies, particularly those involving Retrieval-Augmented Generation (RAG). RAG enriches the LLM’s context by retrieving query-relevant information before generating a response. This process is typically guided by similarity measures, ensuring the model accesses contextually relevant examples tailored to the input query. By introducing relevant information at inference time, RAG can significantly improve performance in specialized applications like biomedical text analysis, where precision and relevance are critical.

A Novel Dynamic Prompting Approach

A recent study, titled Retrieval augmented generation based dynamic prompting for few-shot biomedical named entity recognition using large language models, by Yao Ge, Sudeshna Das, Yuting Guo, and Abeed Sarker, delves into a dynamic prompting strategy that leverages RAG. In their approach, annotated in-context learning examples are selected based on their similarities with the input texts, and the prompt is dynamically updated for each instance during inference. This ensures that the LLM receives the most relevant examples for the specific text it is processing.

The researchers implemented and optimized both static and dynamic prompt engineering techniques, evaluating them on five diverse biomedical NER datasets: MIMIC-III, BC5CDR, NCBI-Disease, MedMentions, and Reddit-Impacts. For static prompting, they developed a structured framework incorporating task-relevant instructions, entity definitions, dataset contextualization, high-frequency instances, background knowledge from the Unified Medical Language System (UMLS), and error-analysis feedback. This comprehensive static prompt significantly boosted performance across various LLMs.
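As a rough sketch of how such a structured static prompt might be assembled, the snippet below concatenates the components the study describes. All section contents, the `build_static_prompt` name, and the heading labels are illustrative placeholders, not the authors' actual prompt.

```python
# Illustrative sketch of assembling a structured static prompt from the
# components described above; all section text here is placeholder.

def build_static_prompt(task_instructions: str,
                        entity_definitions: str,
                        dataset_description: str,
                        frequent_examples: list[str],
                        umls_background: str,
                        error_feedback: str,
                        input_text: str) -> str:
    """Concatenate fixed prompt sections with the input sentence.

    The same scaffold is reused for every input, which is what makes
    the prompt 'static'.
    """
    sections = [
        "## Task\n" + task_instructions,
        "## Entity definitions\n" + entity_definitions,
        "## Dataset context\n" + dataset_description,
        "## High-frequency entity instances\n" + "\n".join(frequent_examples),
        "## Background knowledge (UMLS)\n" + umls_background,
        "## Notes from error analysis\n" + error_feedback,
        "## Input\n" + input_text,
    ]
    return "\n\n".join(sections)

prompt = build_static_prompt(
    task_instructions="Extract all disease mentions from the input sentence.",
    entity_definitions="Disease: a disorder of structure or function.",
    dataset_description="Sentences drawn from PubMed abstracts.",
    frequent_examples=["'diabetes mellitus' -> Disease",
                       "'hypertension' -> Disease"],
    umls_background="UMLS concept: Diabetes Mellitus (a metabolic disorder).",
    error_feedback="Do not tag gene names as diseases.",
    input_text="The patient was diagnosed with type 2 diabetes.",
)
```

Because every section except the input is fixed, the prompt's relevance to any particular sentence depends entirely on how well the fixed examples happen to match it, which is the weakness dynamic prompting targets.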

For dynamic prompting, the core idea was to use a retrieval engine to select the most suitable training examples from a larger pool. Upon receiving an input sentence, the system retrieves the top ‘n’ annotated examples based on contextual similarity and embeds them into the prompt before passing it to the LLM. The study explored several retrieval methods, each with unique strengths:

  • TF-IDF (Term Frequency-Inverse Document Frequency): A simple yet efficient method for keyword matching, effective for datasets with well-defined biomedical terminologies.

  • Sentence-BERT (SBERT): Leverages pre-trained BERT models to encode sentences into dense embeddings, capturing semantic relationships and identifying contextually similar examples even with different phrasing.

  • ColBERT (Contextualized Late Interaction over BERT): Enhances retrieval by focusing on contextualized token representations, allowing for nuanced matching of query and document tokens.

  • Dense Passage Retrieval (DPR): Employs a dual-encoder architecture to learn dense embeddings, optimizing for maximum similarity between relevant query-document pairs.
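To illustrate the retrieval step, the sketch below selects the top-n in-context examples by TF-IDF cosine similarity using scikit-learn. The example pool, query, and `retrieve_top_n` function are hypothetical stand-ins, not the study's implementation; in practice each pool sentence would carry its entity annotations.

```python
# Minimal sketch of dynamic example selection with TF-IDF similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Pool of annotated training sentences (annotations omitted for brevity).
example_pool = [
    "Aspirin reduced the risk of myocardial infarction.",
    "The patient presented with chronic kidney disease.",
    "Metformin is first-line therapy for type 2 diabetes.",
    "Fever and cough are common symptoms of influenza.",
]

# Fit TF-IDF once over the pool; each sentence becomes a sparse vector.
vectorizer = TfidfVectorizer()
pool_vectors = vectorizer.fit_transform(example_pool)

def retrieve_top_n(query: str, n: int = 2) -> list[str]:
    """Return the n pool sentences most similar to the query."""
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, pool_vectors)[0]
    ranked = scores.argsort()[::-1][:n]  # indices, highest similarity first
    return [example_pool[i] for i in ranked]

query = "The patient was started on metformin for diabetes."
shots = retrieve_top_n(query, n=2)
```

The retrieved sentences are then embedded into the prompt as few-shot examples before it is passed to the LLM. Swapping the vectorizer for SBERT sentence embeddings would follow the same retrieve-then-prompt pattern, trading keyword overlap for semantic similarity.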


Key Findings and Performance Insights

The results demonstrated clear advantages for both static and dynamic prompting. Structured static prompting increased average F1-scores by 12% for GPT-4 and by 11% for GPT-3.5 and LLaMA 3-70B, compared to basic static prompting. The addition of high-frequency instances and dataset descriptions particularly improved recall, while token-level few-shot examples significantly increased precision.

Dynamic prompting further improved performance, with TF-IDF and SBERT retrieval methods yielding the best results. In 5-shot and 10-shot settings, these methods improved average F1-scores by 7.3% and 5.6%, respectively. GPT-4 consistently outperformed LLaMA 3 across most datasets and retrieval methods, especially in the 5-shot setting, highlighting its robustness in leveraging limited training data. While increasing the number of examples (shot size) generally improved F1-scores, the gains diminished from 10-shot to 20-shot settings, suggesting potential redundancy or input token limits affecting LLM performance.

The study’s findings suggest that TF-IDF is highly efficient for datasets with low noise, while SBERT is better suited for linguistically diverse data, such as social media texts. More advanced methods like ColBERT and DPR, despite their general-purpose strengths, did not provide substantial advantages in this biomedical domain and could introduce unnecessary computational overhead. This research underscores the significant utility of contextually adaptive prompts via RAG for improving biomedical NER, offering valuable insights for optimizing LLMs in healthcare applications and reducing the need for extensive manual data annotation.

Meera Iyer (https://blogs.edgentiq.com)
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist in a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She's particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
