EHR-MCP: Bridging Large Language Models with Electronic Health Records

TLDR: A study evaluated EHR-MCP, a framework integrating large language models (LLMs) with hospital electronic health records (EHR) via the Model Context Protocol (MCP). Using GPT-4.1, the system autonomously retrieved clinically relevant information for infection control tasks in a real hospital setting. Simple tasks achieved near-perfect accuracy, while complex tasks showed challenges related to argument specification and interpretation of lengthy tool outputs. The research demonstrates the potential of LLMs for secure clinical data access and lays groundwork for hospital AI agents, highlighting areas for future development in reasoning and generation.

Large language models (LLMs) are rapidly advancing, showing immense potential across various fields, including medicine. However, integrating these powerful AI systems into real-world hospital environments, especially with sensitive electronic health record (EHR) systems, presents significant challenges. A recent study introduces EHR-MCP, a novel framework designed to bridge this gap by enabling LLMs to autonomously retrieve clinically relevant information from hospital EHRs.

The core idea behind EHR-MCP is the Model Context Protocol (MCP), a standardized interface that allows LLMs to interact with external tools. This protocol reduces the complexity and cost associated with integrating LLMs with diverse hospital information systems. The research aimed to evaluate the accuracy and effectiveness of an LLM, specifically GPT-4.1, connected to an EHR database via EHR-MCP in a live hospital setting.

How EHR-MCP Works

In the EHR-MCP framework, data from the hospital’s EHR system is synchronized daily to an in-hospital data warehouse. Custom MCP tools, implemented in Python, provide a secure way to query this data using SQL. An LLM client (GPT-4.1 in this study) interacts with these tools through a LangGraph ReAct agent. The agent lets the LLM dynamically select and execute the appropriate tools for a user’s query, interpret the results, and then generate a final answer. This iterative process mirrors how clinicians gather information, which makes the agent a better fit for human-AI collaboration.
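To make the tool layer concrete, here is a minimal sketch of what one such query tool might look like. It is not the study's code: the table name, columns, tool name, and dispatch function are all hypothetical, and an in-memory SQLite database stands in for the hospital data warehouse. A real deployment would register the function with an MCP server instead of a plain dictionary.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create an in-memory stand-in for the data warehouse (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE body_weight (patient_id TEXT, measured_on TEXT, weight_kg REAL)"
    )
    conn.executemany(
        "INSERT INTO body_weight VALUES (?, ?, ?)",
        [
            ("P001", "2024-01-03", 61.5),
            ("P001", "2024-01-10", 60.8),
            ("P002", "2024-01-05", 72.0),
        ],
    )
    return conn


def get_latest_body_weight(conn: sqlite3.Connection, patient_id: str):
    """Hypothetical tool: return the most recent weight for one patient, or None."""
    # A parameterized query keeps model-supplied arguments out of the SQL text.
    row = conn.execute(
        "SELECT measured_on, weight_kg FROM body_weight "
        "WHERE patient_id = ? ORDER BY measured_on DESC LIMIT 1",
        (patient_id,),
    ).fetchone()
    if row is None:
        return None
    return {"patient_id": patient_id, "measured_on": row[0], "weight_kg": row[1]}


# Hypothetical registry: the agent picks a tool by name and supplies arguments.
TOOLS = {"get_latest_body_weight": get_latest_body_weight}


def call_tool(conn: sqlite3.Connection, name: str, arguments: dict):
    """Dispatch a tool call the way an agent loop might, rejecting unknown names."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](conn, **arguments)
```

Because the model only ever supplies a tool name and arguments, the SQL itself stays fixed and reviewable, which is one plausible reading of the "secure" access the article describes.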

Evaluating Performance in a Real Hospital

The study tested EHR-MCP with six tasks derived from real-world use cases of an infection control team (ICT) at Keio University Hospital. These tasks were categorized into two types: simple tasks, requiring a single tool call (e.g., retrieving body weight or lab data), and complex tasks, demanding multi-step tool use and reasoning (e.g., calculating creatinine clearance or counting antibiotic administration days after a negative blood culture). Eight patient cases, discussed at ICT conferences, were retrospectively analyzed, and the LLM’s outputs were compared against physician-generated gold standards.
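The two complex tasks named above can be illustrated with a short sketch. The article does not say which creatinine clearance formula the study used; the code below assumes the widely used Cockcroft-Gault estimate. The rule for counting antibiotic days (distinct calendar days with an administration strictly after the culture's collection date) is likewise an assumption, and both function names are hypothetical.

```python
from datetime import date


def cockcroft_gault(age_years: float, weight_kg: float,
                    serum_creatinine_mg_dl: float, female: bool) -> float:
    """Estimate creatinine clearance in mL/min with the Cockcroft-Gault formula.

    Assumption: the study's exact formula is not stated in the article.
    """
    if serum_creatinine_mg_dl <= 0:
        raise ValueError("serum creatinine must be positive")
    crcl = (140 - age_years) * weight_kg / (72 * serum_creatinine_mg_dl)
    # The standard formula multiplies by 0.85 for female patients.
    return crcl * 0.85 if female else crcl


def antibiotic_days_after(negative_culture_date: date,
                          administration_dates: list) -> int:
    """Count distinct calendar days with an administration after the culture date.

    Assumption: "after" is read as strictly later than the collection date.
    """
    return len({d for d in administration_dates if d > negative_culture_date})
```

Both calculations depend on several retrieved values (age, weight, the latest creatinine, culture dates), so a single wrong tool argument upstream propagates into the final number.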

Key Findings

The results were promising. The LLM consistently demonstrated the ability to select and execute the correct MCP tools. For simple tasks, EHR-MCP achieved near-perfect accuracy. This indicates that LLMs can reliably retrieve straightforward clinical data when given the right tools.

However, performance was lower in complex tasks, particularly those requiring time-dependent calculations or multi-step interpretation. The study identified that most errors stemmed from two main areas: incorrect arguments passed to the tools (e.g., specifying an inappropriate data retrieval window) and misinterpretation of lengthy or complex tool results by the LLM. For instance, the model sometimes failed to restrict retrieval to the most recent results or included non-blood culture results when only blood cultures were requested.
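One way to reduce both error types is to enforce the constraints in code rather than leaving them to the model. The sketch below is an illustration under stated assumptions, not the study's approach: the record fields (`specimen`, `collected_on`, `result`), the 14-day default window, and the function names are all hypothetical.

```python
from datetime import date, timedelta


def filter_blood_cultures(results: list, as_of: date, window_days: int = 14) -> list:
    """Keep only blood culture results inside the retrieval window, newest first."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    start = as_of - timedelta(days=window_days)
    kept = [
        r for r in results
        # Drop non-blood specimens and anything outside [start, as_of].
        if r["specimen"] == "blood" and start <= r["collected_on"] <= as_of
    ]
    return sorted(kept, key=lambda r: r["collected_on"], reverse=True)


def most_recent(results: list):
    """Return the newest result from an already filtered, newest-first list."""
    return results[0] if results else None
```

With filtering done inside the tool, the model receives only blood culture rows in the intended window, so it has less opportunity to mix in urine or sputum results or to pick an outdated value.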

Despite these challenges, the responses from EHR-MCP were generally reliable. The researchers also noted that lengthy and repetitive data in tool responses sometimes risked exceeding the LLM’s context window, leading to potential degradation in response quality or increased API costs. Hallucinations were also observed when required information was unavailable, though the LLM sometimes recognized these failures.
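A simple mitigation for oversized tool responses is to compact them before they reach the model. The following sketch is one possible approach and is not described in the study: it removes exact duplicate rows and caps the output by a character budget, which is used here as a rough, assumed proxy for token count. The function name and return shape are hypothetical.

```python
import json


def compact_tool_output(rows: list, max_chars: int = 2000) -> dict:
    """Drop duplicate rows, then keep rows in order until the budget is used up."""
    seen = set()
    unique = []
    for row in rows:
        # Serialize with sorted keys so identical rows compare equal.
        key = json.dumps(row, sort_keys=True, default=str)
        if key not in seen:
            seen.add(key)
            unique.append(row)

    kept = []
    used = 0
    for row in unique:
        size = len(json.dumps(row, sort_keys=True, default=str))
        if used + size > max_chars:
            break
        kept.append(row)
        used += size

    # Report what was left out so the model can say data is incomplete.
    return {
        "rows": kept,
        "omitted": len(unique) - len(kept),
        "duplicates_removed": len(rows) - len(unique),
    }
```

Returning explicit `omitted` and `duplicates_removed` counts gives the model a signal that information may be missing, which may help it flag a gap rather than invent a value. The sketch assumes rows arrive sorted so that the most relevant ones come first.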

Implications and Future Directions

This research demonstrates that LLMs, when integrated with EHRs via MCP tools, can autonomously retrieve clinically relevant information in a real hospital setting. This capability is foundational for developing advanced clinical AI agents. EHR-MCP provides a secure and consistent infrastructure for data access, which can accelerate the deployment of generative AI projects across hospital departments.

While the study focused on tool-use capability, future work will expand to evaluate the LLM’s reasoning and generation abilities, as well as its clinical impact on patient outcomes and workflow efficiency. The goal is to move beyond simple retrieval to more comprehensive AI agents that can support complex decision-making in specialties like infectious disease management. The full research paper is titled “EHR-MCP: Real-world Evaluation of Clinical Information Retrieval by Large Language Models via Model Context Protocol.”

Meera Iyer
https://blogs.edgentiq.com
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist in a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She's particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
