
Adaptive Prompting Strategies Boost Biomedical Named Entity Recognition

TLDR: This research introduces a dynamic prompting strategy using Retrieval-Augmented Generation (RAG) to enhance few-shot biomedical Named Entity Recognition (NER) with Large Language Models (LLMs). By dynamically selecting in-context learning examples based on similarity to input texts, the method significantly improves performance over static prompting. Key findings show that TF-IDF and SBERT retrieval methods yield the best results, and GPT-4 consistently outperforms LLaMA 3, highlighting the utility of contextually adaptive prompts for biomedical NER, especially in data-scarce scenarios.

Named Entity Recognition (NER) is a fundamental task in natural language processing (NLP) that involves identifying and classifying predefined entities from text. In the biomedical field, NER is crucial for extracting information like diseases, treatments, and symptoms from medical texts. However, biomedical NER often faces a significant challenge: the scarcity of annotated training data, especially for rare medical concepts. Traditional deep neural network methods, while powerful, typically require large datasets, which are expensive and often impossible to obtain or share due to privacy concerns.

Large Language Models (LLMs) have shown great promise in few-shot learning (FSL) settings, where models can adapt to new tasks with minimal examples. This adaptability is particularly transformative for restricted domains like biomedicine. A common approach with LLMs is prompt engineering, where carefully designed prompts guide the model’s understanding and output. However, many prompt-driven methods use ‘static’ prompts, meaning the same fixed prompt and in-context examples are used for every input, regardless of its content. This lack of flexibility can lead to suboptimal performance, as the fixed examples might not always be relevant to the specific input text.

To overcome the limitations of static prompts and enhance few-shot biomedical NER, researchers have explored dynamic prompting strategies, particularly those involving Retrieval-Augmented Generation (RAG). RAG enriches the LLM’s context by retrieving query-relevant information before generating a response. This process is typically guided by similarity measures, ensuring the model accesses contextually relevant examples tailored to the input query. By introducing relevant information at inference time, RAG can significantly improve performance in specialized applications like biomedical text analysis, where precision and relevance are critical.

A Novel Dynamic Prompting Approach

A recent study, titled Retrieval augmented generation based dynamic prompting for few-shot biomedical named entity recognition using large language models, by Yao Ge, Sudeshna Das, Yuting Guo, and Abeed Sarker, delves into a dynamic prompting strategy that leverages RAG. In their approach, annotated in-context learning examples are selected based on their similarities with the input texts, and the prompt is dynamically updated for each instance during inference. This ensures that the LLM receives the most relevant examples for the specific text it is processing.

The researchers implemented and optimized both static and dynamic prompt engineering techniques, evaluating them on five diverse biomedical NER datasets: MIMIC-III, BC5CDR, NCBI-Disease, MedMentions, and Reddit-Impacts. For static prompting, they developed a structured framework incorporating task-relevant instructions, entity definitions, dataset contextualization, high-frequency instances, background knowledge from the Unified Medical Language System (UMLS), and error-analysis feedback. This comprehensive static prompt significantly boosted performance across various LLMs.
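As a rough sketch of how such a structured static prompt might be assembled, the snippet below concatenates the components the study describes. All section contents, the `build_static_prompt` name, and the heading labels are illustrative placeholders, not the authors' actual prompt.

```python
# Illustrative sketch of assembling a structured static prompt from the
# components described above; all section text here is placeholder.

def build_static_prompt(task_instructions: str,
                        entity_definitions: str,
                        dataset_description: str,
                        frequent_examples: list[str],
                        umls_background: str,
                        error_feedback: str,
                        input_text: str) -> str:
    """Concatenate fixed prompt sections with the input sentence.

    The same scaffold is reused for every input, which is what makes
    the prompt 'static'.
    """
    sections = [
        "## Task\n" + task_instructions,
        "## Entity definitions\n" + entity_definitions,
        "## Dataset context\n" + dataset_description,
        "## High-frequency entity instances\n" + "\n".join(frequent_examples),
        "## Background knowledge (UMLS)\n" + umls_background,
        "## Notes from error analysis\n" + error_feedback,
        "## Input\n" + input_text,
    ]
    return "\n\n".join(sections)

prompt = build_static_prompt(
    task_instructions="Extract all disease mentions from the input sentence.",
    entity_definitions="Disease: a disorder of structure or function.",
    dataset_description="Sentences drawn from PubMed abstracts.",
    frequent_examples=["'diabetes mellitus' -> Disease",
                       "'hypertension' -> Disease"],
    umls_background="UMLS concept: Diabetes Mellitus (a metabolic disorder).",
    error_feedback="Do not tag gene names as diseases.",
    input_text="The patient was diagnosed with type 2 diabetes.",
)
```

Because every section except the input is fixed, the prompt's relevance to any particular sentence depends entirely on how well the fixed examples happen to match it, which is the weakness dynamic prompting targets.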

For dynamic prompting, the core idea was to use a retrieval engine to select the most suitable training examples from a larger pool. Upon receiving an input sentence, the system retrieves the top ‘n’ annotated examples based on contextual similarity and embeds them into the prompt before passing it to the LLM. The study explored several retrieval methods, each with unique strengths:

  • TF-IDF (Term Frequency-Inverse Document Frequency): A simple yet efficient method for keyword matching, effective for datasets with well-defined biomedical terminologies.

  • Sentence-BERT (SBERT): Leverages pre-trained BERT models to encode sentences into dense embeddings, capturing semantic relationships and identifying contextually similar examples even with different phrasing.

  • ColBERT (Contextualized Late Interaction over BERT): Enhances retrieval by focusing on contextualized token representations, allowing for nuanced matching of query and document tokens.

  • Dense Passage Retrieval (DPR): Employs a dual-encoder architecture to learn dense embeddings, optimizing for maximum similarity between relevant query-document pairs.
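To illustrate the retrieval step, the sketch below selects the top-n in-context examples by TF-IDF cosine similarity using scikit-learn. The example pool, query, and `retrieve_top_n` function are hypothetical stand-ins, not the study's implementation; in practice each pool sentence would carry its entity annotations.

```python
# Minimal sketch of dynamic example selection with TF-IDF similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Pool of annotated training sentences (annotations omitted for brevity).
example_pool = [
    "Aspirin reduced the risk of myocardial infarction.",
    "The patient presented with chronic kidney disease.",
    "Metformin is first-line therapy for type 2 diabetes.",
    "Fever and cough are common symptoms of influenza.",
]

# Fit TF-IDF once over the pool; each sentence becomes a sparse vector.
vectorizer = TfidfVectorizer()
pool_vectors = vectorizer.fit_transform(example_pool)

def retrieve_top_n(query: str, n: int = 2) -> list[str]:
    """Return the n pool sentences most similar to the query."""
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, pool_vectors)[0]
    ranked = scores.argsort()[::-1][:n]  # indices, highest similarity first
    return [example_pool[i] for i in ranked]

query = "The patient was started on metformin for diabetes."
shots = retrieve_top_n(query, n=2)
```

The retrieved sentences are then embedded into the prompt as few-shot examples before it is passed to the LLM. Swapping the vectorizer for SBERT sentence embeddings would follow the same retrieve-then-prompt pattern, trading keyword overlap for semantic similarity.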


Key Findings and Performance Insights

The results demonstrated clear advantages for both static and dynamic prompting. Structured static prompting increased average F1-scores by 12% for GPT-4 and by 11% for GPT-3.5 and LLaMA 3-70B, compared to basic static prompting. The addition of high-frequency instances and dataset descriptions particularly improved recall, while token-level few-shot examples significantly increased precision.

Dynamic prompting further improved performance, with TF-IDF and SBERT retrieval methods yielding the best results. In 5-shot and 10-shot settings, these methods improved average F1-scores by 7.3% and 5.6%, respectively. GPT-4 consistently outperformed LLaMA 3 across most datasets and retrieval methods, especially in the 5-shot setting, highlighting its robustness in leveraging limited training data. While increasing the number of examples (shot size) generally improved F1-scores, the gains diminished from 10-shot to 20-shot settings, suggesting potential redundancy or input token limits affecting LLM performance.

The study’s findings suggest that TF-IDF is highly efficient for datasets with low noise, while SBERT is better suited for linguistically diverse data, such as social media texts. More advanced methods like ColBERT and DPR, despite their general-purpose strengths, did not provide substantial advantages in this biomedical domain and could introduce unnecessary computational overhead. This research underscores the significant utility of contextually adaptive prompts via RAG for improving biomedical NER, offering valuable insights for optimizing LLMs in healthcare applications and reducing the need for extensive manual data annotation.

Meera Iyer (https://blogs.edgentiq.com)
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist in a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She's particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
