TLDR: A scoping review of 251 studies on Retrieval-Augmented Generation (RAG) in medicine reveals that research heavily relies on public data and dense retrieval methods, often with English-centric models. Proprietary LLMs are common, but medical-specific LLMs are underutilized. Applications focus on question answering and report generation, primarily in Internal Medicine. A critical finding is the insufficient attention to ethical considerations like bias, safety, and deployment in low-resource settings, highlighting the need for clinical validation, transparency, and equitable adaptation for future implementation.
The medical field is constantly evolving, with new knowledge emerging at an unprecedented rate. This rapid expansion, coupled with the increasing complexity of patient care, presents significant challenges for healthcare professionals. Large Language Models (LLMs) have shown promise in assisting with these challenges, but they come with their own set of limitations, such as reliance on static training data, susceptibility to factual inaccuracies, limited explainability, and an inability to access private patient data.
A recent scoping review, titled “Retrieval-Augmented Generation in Medicine: A Scoping Review of Technical Implementations, Clinical Applications, and Ethical Considerations,” delves into how Retrieval-Augmented Generation (RAG) technologies are being applied in medicine to overcome these LLM limitations. This comprehensive review analyzed 251 studies to map the implementation pathways, application patterns, and ethical considerations of RAG in healthcare.
Understanding RAG in Medicine
RAG enhances LLMs by allowing them to access and incorporate information from external knowledge sources during the generation process. This means LLMs can provide more up-to-date, relevant, and fact-grounded outputs. The review highlights that RAG systems typically follow an “index-retrieve-generate” pipeline, where relevant information is first retrieved from sources like research literature or clinical guidelines, and then used to augment the LLM’s response.
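The “index-retrieve-generate” pipeline can be sketched in a few lines. This is a minimal toy illustration, not a production system: the corpus snippets are hypothetical, indexing uses simple bag-of-words vectors rather than dense embeddings, and the generation step is stubbed as prompt assembly where a real system would call an LLM.

```python
import math
from collections import Counter

# Hypothetical snippets standing in for a medical knowledge source
# (e.g., guideline text or literature abstracts).
DOCUMENTS = [
    "Metformin is a first-line treatment for type 2 diabetes.",
    "Hypertension guidelines recommend lifestyle changes before medication.",
    "PubMed indexes biomedical research literature.",
]

def embed(text: str) -> Counter:
    """Index step: a bag-of-words vector (real systems use dense embeddings)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Built once over the corpus, queried many times.
INDEX = [(doc, embed(doc)) for doc in DOCUMENTS]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Retrieve step: rank indexed documents by similarity to the query."""
    qv = embed(query)
    ranked = sorted(INDEX, key=lambda pair: cosine(qv, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def generate(query: str) -> str:
    """Generate step: here just prompt assembly; a real system would pass
    this augmented prompt to an LLM to produce a grounded answer."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

prompt = generate("What is the first-line treatment for type 2 diabetes?")
```

The key design point is that the knowledge base can be updated independently of the model: re-indexing new guidelines refreshes the system’s knowledge without retraining the LLM.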
Key Findings from the Review
The review uncovered several important trends in medical RAG research:
Data Sources: Most studies (over 80%) relied on publicly available data, such as biomedical scientific corpora (e.g., PubMed), clinical guidelines, and online information. Private data, like electronic health records, saw limited use due to privacy concerns and implementation complexities. This suggests that current RAG applications primarily focus on general medical knowledge rather than personalized healthcare.
Retrieval Methods: Dense retrieval methods were dominant, used in over 84% of studies. These methods often employ general or medical-specific embedding models (like BioBERT or MedCPT) to capture semantic relationships. However, a significant limitation identified was the reliance on English-centric embedding models, which restricts RAG’s effectiveness in non-English medical contexts and can exacerbate health inequities in low-resource languages.
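The advantage of dense retrieval over keyword matching is that it can connect semantically related phrases that share no words. The sketch below uses hand-assigned toy vectors purely for illustration; a real system would obtain embeddings from a model such as BioBERT or MedCPT.

```python
import math

# Hand-crafted 3-d vectors standing in for dense embeddings from a
# model like BioBERT or MedCPT (these numbers are illustrative only).
DOC_VECTORS = {
    "Myocardial infarction requires urgent reperfusion therapy.": [0.9, 0.1, 0.0],
    "Seasonal influenza vaccination is recommended annually.": [0.1, 0.9, 0.1],
}

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def dense_retrieve(query_vec: list[float], k: int = 1) -> list[str]:
    """Rank documents by embedding similarity, not lexical overlap."""
    ranked = sorted(DOC_VECTORS.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc for doc, _ in ranked[:k]]

# A query like "heart attack" shares no words with the top document, but
# its (assumed) embedding lies close to "myocardial infarction" in vector space.
heart_attack_vec = [0.85, 0.15, 0.05]
top_doc = dense_retrieve(heart_attack_vec)[0]
```

This is also where the English-centric limitation bites: if the embedding model was trained mostly on English text, queries in other languages map to poorly placed vectors, and retrieval quality degrades.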
Generative LLMs: Proprietary LLMs, mainly from OpenAI’s GPT series, were the most widely used, followed by open-weight LLMs (like DeepSeek, Gemma, LLaMA, and Qwen series). Interestingly, medical-specific LLMs were rarely applied, possibly due to limited public accessibility or slower development compared to general LLMs.
Medical Specialties and Applications: RAG applications were most concentrated in Internal Medicine, followed by Psychiatry, Neurology, and Radiology. The primary application scenario was medical question answering, supporting clinicians in evidence retrieval, diagnostic reasoning, and decision-making. Other notable applications included report generation (e.g., radiology or pathology reports), text summarization, and information extraction, all aimed at reducing clinician workload and improving information management.
Evaluation and Ethics: Evaluation methods showed a balance between automated metrics (for text generation quality and task performance) and human evaluation (for accuracy, completeness, relevance, and fluency). Crucially, the review found insufficient attention paid to ethical considerations such as bias (examined in less than 3% of studies), safety (addressed in less than 10%), and applications in low-resource settings (less than 3%). This highlights a significant gap in ensuring equitable and responsible deployment of RAG technologies.
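As one concrete example of the automated side of evaluation, token-overlap F1 (a common QA metric, used for instance in SQuAD-style benchmarks) scores a generated answer against a reference. This is a minimal sketch of that one metric, not the review’s full evaluation protocol:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    common = Counter(pred) & Counter(ref)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

score = token_f1("metformin is first-line",
                 "metformin is the first-line therapy")
```

Such surface metrics are cheap to run at scale but cannot judge clinical correctness or safety, which is why the review’s finding of sparse human and ethics-focused evaluation matters.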
Challenges and Future Directions
The review concludes that medical RAG is still in its early stages. To move towards real-world clinical implementation, several breakthroughs are needed. These include rigorous clinical validation to ensure factual accuracy and clinical actionability, establishing traceability and transparency mechanisms for outputs, and developing robust regulatory frameworks and ethical guidelines. Furthermore, significant progress is required in cross-linguistic and cross-cultural adaptation, as well as ensuring fairness in low-resource settings, to achieve safe, trustworthy, and responsible global use of RAG in healthcare.
The insights from this review by Rui Yang, Matthew Yu Heng Wong, Huitao Li, and their colleagues provide a critical roadmap for researchers and developers to address the current limitations and advance RAG technologies for the benefit of global healthcare.