Advancing Medical Translation: How MedCOD Enhances Language Models for English-to-Spanish Healthcare Communication

TLDR: MedCOD is a new framework that significantly improves English-to-Spanish medical translation by integrating structured medical knowledge from UMLS and an LLM-as-Knowledge-Base into large language models. Combined with fine-tuning, MedCOD enables open-source LLMs to outperform proprietary systems like GPT-4o in clinical translation accuracy, addressing critical language barriers in healthcare.

Language barriers in healthcare can significantly impact patient care, especially for the millions of individuals with limited English proficiency in the United States. Electronic Health Records (EHRs) are crucial for patient engagement and communication, but non-English speakers often do not get their full benefit. The challenge is particularly acute among Hispanic patients, a substantial share of whom have difficulty understanding medical forms, communicating with healthcare professionals, and following prescription guidelines.

Machine translation has been explored as a way to bridge this gap, but general-purpose systems often fall short of the clinical accuracy that complex medical texts require. Large Language Models (LLMs) have shown promise in general translation, yet their use in specialized biomedical translation, particularly for EHRs, remains underexplored.

A new framework called MedCOD (Medical Chain-of-Dictionary) has been developed to address these critical issues. MedCOD is a hybrid approach designed to significantly improve English-to-Spanish medical translation by integrating structured, domain-specific knowledge into LLMs. It builds upon the Chain-of-Dictionary Prompting (COD) framework and incorporates knowledge from two key sources: the Unified Medical Language System (UMLS) and an LLM-as-Knowledge-Base (LLM-KB).

The MedCOD framework works by enriching the translation process with multi-layered domain knowledge. This includes translated medical terms, synonyms, and multilingual mappings obtained from both UMLS and the LLM-KB. This structured context helps LLMs better understand the nuances of medical language. The framework also combines this structured prompting with a lightweight fine-tuning technique called Low-Rank Adaptation (LoRA), allowing open-source models to adapt more effectively to specialized biomedical content.
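To make the LoRA idea concrete, here is a minimal sketch in plain Python; it is not the authors' training code. LoRA freezes a weight matrix W and learns a low-rank update, so the effective weight is W + (alpha / r) * B A, where B has shape (d_out, r) and A has shape (r, d_in). The dimensions, rank, and scaling value below are illustrative assumptions, since the article does not report the settings used in the study.

```python
def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*right)]
            for row in left]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), the weight used once the adapter is merged."""
    rank = len(a)            # A has shape (r, d_in)
    scale = alpha / rank
    delta = matmul(b, a)     # shape (d_out, d_in)
    return [[wv + scale * dv for wv, dv in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d_out, d_in, rank):
    """Compare parameter counts: full fine-tuning vs. LoRA adapters."""
    return {"full": d_out * d_in, "lora": rank * (d_out + d_in)}


# Toy 2x3 frozen weight with a rank-1 adapter (all values are made up).
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]    # (r=1, d_in=3)
B = [[0.0], [0.0]]       # (d_out=2, r=1); zero init leaves W unchanged at the start

W_START = lora_effective_weight(W, A, B, alpha=2.0)
B_TRAINED = [[0.5], [-1.0]]
W_TRAINED = lora_effective_weight(W, A, B_TRAINED, alpha=2.0)
COUNTS = trainable_params(d_out=4096, d_in=4096, rank=8)
```

For a hypothetical 4096 x 4096 layer at rank 8, the adapter trains 65,536 parameters instead of roughly 16.8 million, which is why the technique is described as lightweight.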

Researchers constructed a parallel corpus of nearly 3,000 English-Spanish MedlinePlus articles and a 100-sentence test set, meticulously annotated with structured medical contexts. They evaluated four open-source LLMs: Phi-4, Qwen2.5-14B, Qwen2.5-7B, and LLaMA-3.1-8B. These models were tested using structured prompts that included multilingual variants, medical synonyms, and UMLS-derived definitions, combined with LoRA-based fine-tuning.
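The structured prompts described above could be assembled roughly as follows. This is a sketch under assumptions: the `TermContext` fields, the prompt wording, and the example entry are hypothetical, and the article does not reproduce the exact template. In practice the entries would come from UMLS lookups or from an LLM acting as a knowledge base rather than being hard-coded.

```python
from dataclasses import dataclass, field


@dataclass
class TermContext:
    """Hypothetical container for the knowledge attached to one medical term."""
    term: str
    translations: dict = field(default_factory=dict)  # language -> translation
    synonyms: list = field(default_factory=list)
    definition: str = ""


def build_prompt(sentence, contexts, target_language="Spanish"):
    """Prepend dictionary-style hints to a translation instruction."""
    lines = []
    for ctx in contexts:
        if ctx.translations:
            chain = ", ".join(f"{lang}: {word}" for lang, word in ctx.translations.items())
            lines.append(f'"{ctx.term}" means {chain}.')
        if ctx.synonyms:
            lines.append(f'Synonyms of "{ctx.term}": {", ".join(ctx.synonyms)}.')
        if ctx.definition:
            lines.append(f'Definition of "{ctx.term}": {ctx.definition}')
    lines.append(f"Translate the following English sentence into {target_language}:")
    lines.append(sentence)
    return "\n".join(lines)


# Illustrative entry only; not taken from the study's dataset.
hypertension = TermContext(
    term="hypertension",
    translations={"Spanish": "hipertensión", "French": "hypertension", "Portuguese": "hipertensão"},
    synonyms=["high blood pressure"],
    definition="Persistently elevated arterial blood pressure.",
)

PROMPT = build_prompt("Hypertension often has no symptoms.", [hypertension])
print(PROMPT)
```

Keeping each knowledge type on its own line makes it easy to switch individual hint types on or off when comparing prompt variants.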

The experimental results were highly encouraging. MedCOD significantly improved translation quality across all evaluated models. For instance, Phi-4 with MedCOD and fine-tuning achieved a BLEU score of 44.23, a chrF++ score of 28.91, and a COMET score of 0.863. These scores surpassed strong baseline models like GPT-4o and GPT-4o-mini, demonstrating that MedCOD-enhanced open-source models can rival or even outperform proprietary systems in clinical translation accuracy.
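Of the three metrics, BLEU is the simplest to illustrate. Its core ingredient is clipped (modified) n-gram precision: each candidate n-gram is credited at most as many times as it appears in the reference. The sketch below shows only that ingredient for a single reference. Full BLEU also combines several n-gram orders with a geometric mean and applies a brevity penalty, and scores like those reported above are normally computed with an established toolkit rather than hand-written code. The function name and example sentences are illustrative.

```python
from collections import Counter


def clipped_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate against a single reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate.split())
    ref = ngrams(reference.split())
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Credit each candidate n-gram no more often than the reference contains it.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


REFERENCE = "el paciente tiene presión arterial alta"
GOOD = "el paciente tiene presión arterial alta"
REPETITIVE = "el el el el el el"

P_GOOD = clipped_precision(GOOD, REFERENCE, n=2)
P_REPETITIVE = clipped_precision(REPETITIVE, REFERENCE, n=1)
```

The clipping step is what stops a degenerate output that repeats one correct word from scoring well: the repetitive candidate earns credit for only one of its six tokens.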

Ablation studies confirmed that both the MedCOD prompting strategy and the model adaptation through fine-tuning independently contributed to these performance gains, with their combination yielding the highest improvements. This highlights the complementary nature of providing external knowledge and task-specific model adaptation.

Further analysis revealed that multilingual translation prompts generally yielded the highest scores, especially for fine-tuned models. However, the optimal prompting strategy could vary depending on the specific LLM architecture and whether it was fine-tuned, suggesting potential for adaptive prompt selection in future work.
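Adaptive prompt selection could be as simple as picking, for each model configuration, the prompt variant that scores best on a held-out development set. The sketch below assumes such scores are already available. The model and variant names and all numbers are placeholders invented for illustration, not results from the study.

```python
def select_prompts(dev_scores):
    """Map each (model, fine_tuned) configuration to its best-scoring prompt variant.

    dev_scores: {(model, fine_tuned): {variant_name: score}}
    """
    return {config: max(scores, key=scores.get) for config, scores in dev_scores.items()}


# Placeholder scores, made up for this example.
DEV_SCORES = {
    ("model-a", True): {"multilingual": 0.82, "synonyms": 0.79, "definitions": 0.80},
    ("model-a", False): {"multilingual": 0.70, "synonyms": 0.73, "definitions": 0.71},
}

BEST = select_prompts(DEV_SCORES)
```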

The study also explored MedCOD’s applicability beyond English-to-Spanish translation, extending it to paragraph-level medical translation across six language pairs in the WMT24 Biomedical test set, and to multilingual summarization tasks using the MultiClinSum dataset. Consistent benefits were observed, indicating MedCOD’s robustness and generalizability across different tasks, diverse languages, and long, high-stakes medical texts.

Despite these advances, the researchers acknowledge limitations. The dataset is drawn from standardized MedlinePlus articles, which may not fully capture the linguistic complexity of all clinical domains. Future work will also explore adaptability to other language pairs and address persistent issues such as grammatical inconsistencies and stylistic awkwardness. More detail is available in the full research paper.

In conclusion, MedCOD offers a practical and scalable framework for enhancing biomedical translation. By equipping open-source LLMs with rich medical context, it paves the way for improved cross-lingual health communication, ultimately benefiting underrepresented populations and advancing healthcare accessibility.