Enhancing Drug Side Effect Detection with AI: A Look at RAG and GraphRAG Architectures

TLDR: This research introduces and evaluates two AI architectures, Retrieval Augmented Generation (RAG) and GraphRAG, designed to improve the accuracy of drug side effect retrieval using Large Language Models (LLMs). By integrating structured drug side effect knowledge from the SIDER 4.1 database, GraphRAG achieved near-perfect accuracy, significantly outperforming standalone LLMs and standard RAG methods in identifying drug-side effect associations. The study highlights the critical role of domain-specific knowledge augmentation for LLMs in pharmacovigilance.

Drug side effects are a significant global health concern, contributing to illness and death worldwide. Healthcare professionals often struggle to keep up with the rapid pace of new drug developments and their associated side effects, especially outside their primary areas of expertise. Existing tools like drug handbooks and electronic medical records can be time-consuming and limited in their search capabilities, highlighting a clear need for more efficient ways to assess drug side effects in clinical practice.

Large Language Models (LLMs) offer a promising solution with their intuitive, conversational interfaces, potentially streamlining clinical workflows and improving decision-making. However, off-the-shelf LLMs have limitations in specialized fields like pharmacovigilance. They often rely on black-box training data, are prone to ‘hallucinations’ (generating incorrect information), and lack specific domain knowledge, making them unreliable for nuanced medical data.

Introducing RAG and GraphRAG Architectures

To overcome these challenges, the researchers propose two architectures: Retrieval Augmented Generation (RAG) and GraphRAG. Both frameworks integrate comprehensive drug side effect knowledge into an LLM, specifically the Llama 3-8B model, with the goal of making its answers about drug side effects more reliable and accurate.

The first architecture, RAG, improves LLMs by retrieving relevant information from an external Pinecone vector database. In this system, drug side effect information is stored as feature vectors. When a user asks a question, the RAG system retrieves the most similar information from this database and uses it to inform the LLM’s response. The data was processed into two formats: Format A, which lists all known side effects for a drug in a comma-separated list, and Format B, which lists each drug-side effect pair on a new line, offering more detailed granularity.
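
To make the difference between the two data formats concrete, here is a minimal Python sketch. The sentence templates, function names, and sample pairs are assumptions made for illustration; the article only states that Format A is a comma-separated list per drug and Format B is one drug-side effect pair per line.

```python
from collections import defaultdict

# Hypothetical drug/side-effect pairs, used only to illustrate the two layouts.
PAIRS = [
    ("aspirin", "nausea"),
    ("aspirin", "tinnitus"),
    ("ibuprofen", "nausea"),
]


def to_format_a(pairs):
    """Format A: one text entry per drug, side effects comma-separated."""
    by_drug = defaultdict(list)
    for drug, effect in pairs:
        by_drug[drug].append(effect)
    # The sentence wording is an assumption, not the study's exact template.
    return [
        f"The drug {drug} may cause the following side effects: {', '.join(effects)}."
        for drug, effects in by_drug.items()
    ]


def to_format_b(pairs):
    """Format B: one text entry per drug-side effect pair."""
    return [
        f"The drug {drug} may cause {effect} as a side effect."
        for drug, effect in pairs
    ]


if __name__ == "__main__":
    for line in to_format_a(PAIRS):
        print("A:", line)
    for line in to_format_b(PAIRS):
        print("B:", line)
```

In Format B every embedded entry describes exactly one association, which is one plausible reason a similarity search can match a specific drug-side effect question more precisely.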

The second architecture, GraphRAG, takes a different approach by leveraging a Neo4j graph database. In this model, drugs and side effects are represented as distinct ‘nodes,’ and their known relationships (e.g., ‘may_cause_side_effect’) are encoded as ‘edges’ connecting these nodes. This graph structure allows for the storage and efficient handling of more complex relationships between drugs and their side effects. When a query is made, GraphRAG uses an entity recognition module to identify the drug and side effect, then constructs a precise query to search the graph database for a direct link between them.
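
The graph lookup described above can be sketched as a parameterized Cypher query. The node labels (`Drug`, `SideEffect`), the `name` property, and the helper names are assumptions for illustration; only the `may_cause_side_effect` relationship comes from the article. A small in-memory edge set stands in for the Neo4j database so the sketch runs without a server.

```python
# Hypothetical labels and property names; the relationship type follows the article.
CYPHER_TEMPLATE = (
    "MATCH (d:Drug {name: $drug})-[:may_cause_side_effect]->"
    "(s:SideEffect {name: $side_effect}) "
    "RETURN count(*) > 0 AS associated"
)


def build_graph_query(drug, side_effect):
    """Return a parameterized Cypher query and its parameter dictionary."""
    params = {
        "drug": drug.strip().lower(),
        "side_effect": side_effect.strip().lower(),
    }
    return CYPHER_TEMPLATE, params


# In-memory stand-in for the graph: a set of directed (drug, side effect) edges.
EDGES = {("aspirin", "nausea"), ("aspirin", "tinnitus")}


def has_edge(edges, params):
    """Mimic the graph lookup: is there a direct drug -> side effect edge?"""
    return (params["drug"], params["side_effect"]) in edges


if __name__ == "__main__":
    query, params = build_graph_query("Aspirin", "Nausea")
    print(query)
    print(params, "->", has_edge(EDGES, params))
```

Passing the drug and side effect as query parameters rather than formatting them into the query string keeps the lookup an exact match and avoids injection issues when the terms come from user input.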

How They Work: A Simplified Workflow

Both frameworks use custom functions that rewrite the user’s prompt before it reaches the model. In RAG, the user’s query is embedded and compared against the vector database, while an entity recognition module extracts the drug and side effect terms from the query. Depending on whether a match is found in the retrieved results, a modified prompt is generated for the Llama 3 model, asking for a simple YES or NO answer about the association. In GraphRAG, the extracted entities are used to build a specific query against the Neo4j graph database. The graph results then inform a modified prompt for the Llama 3 model, which likewise returns a binary YES/NO response.
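
The prompt-modification step shared by both workflows can be sketched as follows. The prompt wording and the function names are hypothetical; the article specifies only that the retrieval result is folded into a modified prompt and that the model is asked for a YES or NO answer.

```python
def build_prompt(drug, side_effect, match_found):
    """Compose a modified prompt from the retrieval outcome (wording is assumed)."""
    if match_found:
        context = f"The side effect {side_effect} is listed as an adverse effect of {drug}."
    else:
        context = f"No known association between {drug} and {side_effect} was retrieved."
    question = f"Is {side_effect} an adverse effect of {drug}?"
    return f"{context}\nQuestion: {question}\nAnswer with exactly one word: YES or NO."


def parse_binary_answer(text):
    """Map a model reply to True (YES), False (NO), or None if it is neither."""
    words = text.strip().upper().split()
    first = words[0].strip(".,!:;") if words else ""
    return {"YES": True, "NO": False}.get(first)


if __name__ == "__main__":
    print(build_prompt("aspirin", "nausea", match_found=True))
    print(parse_binary_answer("Yes."))
```

Constraining the reply to a single word makes the output easy to score automatically, at the cost of the richer explanations a conversational interface could provide.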

Remarkable Performance

The researchers conducted extensive evaluations on a balanced dataset of 19,520 drug-side effect associations, covering 976 drugs and 3,851 unique side effect terms from the Side Effect Resource (SIDER) 4.1 database. They compared the performance of GraphRAG and RAG frameworks against a standalone Llama 3-8B model, as well as larger models like ChatGPT 3.5 and ChatGPT 4 on a subset of data.

The results were striking: GraphRAG achieved near-perfect accuracy (0.9999), F1 score (0.9999), precision (0.9998), sensitivity (0.9999), and specificity (0.9998). This demonstrates its exceptional ability to accurately retrieve drug side effect information. In contrast, the standalone Llama 3-8B model performed poorly, with an accuracy of only 0.529, highlighting its limitations without specialized knowledge. Even ChatGPT 3.5 and 4 showed limited accuracy (around 0.55 and 0.63, respectively), underscoring that larger models still struggle without domain-specific augmentation.
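
For readers less familiar with these metrics, the short sketch below shows how each is derived from the counts of a binary YES/NO evaluation. The confusion-matrix counts in the example are hypothetical and are not the study's results, and the function assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp/fn: true associations answered YES/NO; tn/fp: non-associations answered NO/YES.
    Assumes all denominators are non-zero.
    """
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "f1": f1,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the formulas.
    for name, value in binary_metrics(tp=90, fp=10, tn=80, fn=20).items():
        print(f"{name}: {value:.4f}")
```

On a balanced dataset such as the one used here, an accuracy near 0.5 is roughly what guessing would produce, which puts the standalone model's 0.529 in context.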

The RAG models also showed significant improvements over the standalone LLM. RAG with Data Format B performed particularly well, achieving an accuracy of 0.998 and sensitivity of 0.999. This indicates that how the data is structured plays a crucial role in retrieval precision. GraphRAG consistently outperformed all other models across various drug categories and side effect classes, proving its robustness in handling complex drug-side effect relationships.

Looking Ahead

While these advancements are significant, the current framework has some limitations. It relies primarily on side effects reported in the SIDER 4.1 database, which may not capture unreported or emerging adverse events. The system currently supports only single-drug queries and does not accommodate reverse queries (e.g., “Which drugs cause hand-foot rash?”) or class-based queries. Future work aims to address these gaps by integrating real-world, self-reported data from sources like FAERS and social media platforms. The researchers also plan to develop a conversational assistant interface, remove the binary output constraint, incorporate semantic search for handling mistyped drug names and synonyms, and explore deploying models on encrypted databases to mitigate privacy concerns. For more technical details, refer to the full research paper.
