CDrugRed: A New Dataset for Chinese Discharge Drug Recommendations in Metabolic Diseases

TLDR: CDrugRed is the first publicly available Chinese dataset for discharge drug recommendations in metabolic diseases, built from 5,894 de-identified real-world EHRs. It addresses the scarcity of non-English medical datasets and includes comprehensive patient information. Benchmarking with LLMs shows that supervised fine-tuning is crucial for effective drug recommendation and significantly outperforms prompt-based methods, which establishes CDrugRed as a valuable resource for developing accurate clinical decision support systems.

CDrugRed is a new dataset for advancing intelligent drug recommendation systems in China. It addresses a critical gap in the field: publicly available, real-world Electronic Health Record (EHR) datasets are scarce in languages other than English, and particularly so for Chinese patients.

The development of intelligent drug recommendation systems is vital for enhancing the quality and efficiency of clinical decision-making. These systems can help doctors select the most suitable medications by analyzing extensive patient data, including medical history, diagnoses, lab results, and co-existing conditions. However, the progress of such systems has been hindered by the lack of diverse and accessible datasets.

CDrugRed is the first publicly available Chinese drug recommendation dataset specifically designed for discharge medications in patients with metabolic diseases. Metabolic diseases, such as diabetes, hypertension, and fatty liver disease, are widespread chronic conditions with complex treatment plans. Ensuring continuity of care, especially with discharge medications, is crucial for managing these conditions and preventing readmissions.

The dataset comprises 5,894 de-identified medical records from 3,190 patients, along with 651 candidate drugs. These records were collected from a Grade A tertiary hospital in China between 2013 and 2023. The information within CDrugRed is comprehensive, covering patient demographics, medical history, clinical course during hospitalization, and discharge diagnoses. This rich detail is a key differentiator from other datasets, which often extract only partial information like diagnoses or surgery records.

Data collection followed strict ethical and privacy protection standards. Patient records were selected using inclusion criteria: age (18 or older), a diagnosis of a metabolic disease (hypertension, hyperlipidemia, hyperglycemia, or hyperuricemia), and data completeness. Records of patients with severe allergies, participation in other clinical studies, or severe comorbidities were excluded.
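The selection rules above can be read as a simple filter. The following is a minimal sketch, assuming a hypothetical record layout; the class name, field names, and condition labels are illustrative and are not CDrugRed's actual schema.

```python
from dataclasses import dataclass
from typing import Set

# Condition labels taken from the inclusion criteria described above.
METABOLIC_CONDITIONS = {
    "hypertension", "hyperlipidemia", "hyperglycemia", "hyperuricemia",
}


@dataclass
class PatientRecord:
    """Hypothetical record layout, used only for illustration."""
    age: int
    diagnoses: Set[str]
    is_complete: bool = True
    severe_allergy: bool = False
    in_other_study: bool = False
    severe_comorbidity: bool = False


def is_eligible(rec: PatientRecord) -> bool:
    """Apply the inclusion criteria first, then the exclusion criteria."""
    included = (
        rec.age >= 18
        and bool(rec.diagnoses & METABOLIC_CONDITIONS)
        and rec.is_complete
    )
    excluded = rec.severe_allergy or rec.in_other_study or rec.severe_comorbidity
    return included and not excluded
```

For example, `is_eligible(PatientRecord(age=56, diagnoses={"hypertension"}))` passes the filter, while the same record with `severe_allergy=True` does not.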

To ensure patient privacy, sensitive information like names and phone numbers was de-identified using a large language model (Qwen3-30B-A3B) deployed on a local server. The same model was also used to extract medication-related content from discharge instructions and to standardize drug names, correcting misspellings and inconsistent suffixes. This two-stage normalization process, which also involved cross-referencing with the DXY database, ensures consistency and alignment with clinical terminology.
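To make the two-stage idea concrete, here is a minimal rule-based sketch. It is not the authors' pipeline, which used an LLM and the DXY database; this stand-in swaps in a regular expression for suffix cleanup and fuzzy string matching from Python's standard `difflib` for the cross-referencing step. The reference vocabulary, the suffix list, and the function names are all assumptions, and English drug names are used only for readability.

```python
import difflib
import re

# Hypothetical reference vocabulary standing in for a curated drug database.
REFERENCE_NAMES = ["metformin", "acarbose", "atorvastatin", "aspirin"]

# Hypothetical dosage-form suffixes to strip before matching.
FORM_SUFFIXES = re.compile(
    r"\s+(enteric-coated tablets|sustained-release tablets|tablets|capsules)$"
)


def strip_form(raw):
    """Stage 1: lowercase, trim, and drop a trailing dosage-form suffix."""
    return FORM_SUFFIXES.sub("", raw.strip().lower())


def normalize(raw, cutoff=0.8):
    """Stage 2: map the cleaned name to the closest reference entry, or None."""
    cleaned = strip_form(raw)
    matches = difflib.get_close_matches(cleaned, REFERENCE_NAMES, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With these assumptions, `normalize("Aspirin enteric-coated tablets")` maps to `"aspirin"` and the misspelled `"Atorvastatine capsules"` maps to `"atorvastatin"`, while a name with no close reference entry returns `None` so it can be reviewed by hand.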

Statistical analysis of CDrugRed shows a patient population consistent with the epidemiology of metabolic disease. The majority of patients were middle-aged or elderly, in line with the higher prevalence of metabolic diseases in these age groups. Hospital admissions for chronic metabolic conditions showed a gradual upward trend from 2015 to 2023. The most common discharge diagnoses include Type 2 Diabetes Mellitus, Hypertension, and Fatty Liver, often accompanied by complications. Correspondingly, frequently prescribed discharge medications include atorvastatin, aspirin enteric-coated tablets, acarbose, and metformin, which are standard treatments for these conditions.

The research paper also details benchmarking experiments conducted on CDrugRed using several state-of-the-art large language models (LLMs), including GLM4-9B-Chat, Llama3.1-8B-Instruct, and Qwen2.5-7B-Instruct. The goal was to evaluate the models’ ability to understand clinical contexts and make drug recommendations. Various inference strategies were tested: 0-shot, 1-shot, chain-of-thought (CoT) prompting, and supervised fine-tuning (SFT).
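The three prompt-based strategies differ only in what is placed around the patient record. The sketch below shows one way this could look; the prompt wording, the function name `build_prompt`, and its parameters are hypothetical and are not the prompts used in the study.

```python
def build_prompt(record_text, candidate_drugs, strategy="0-shot", example=None):
    """Assemble a prompt for one discharge record under a given strategy.

    All wording here is hypothetical, not the prompts used in the study.
    """
    parts = [
        "You are assisting with discharge medication recommendation.",
        "Choose drugs only from this candidate list: " + ", ".join(candidate_drugs),
    ]
    if strategy == "1-shot":
        # One worked demonstration is shown before the real record.
        if example is None:
            raise ValueError("1-shot prompting needs a (record, drugs) example")
        demo_record, demo_drugs = example
        parts.append("Example record: " + demo_record)
        parts.append("Example answer: " + ", ".join(demo_drugs))
    elif strategy == "cot":
        # Chain-of-thought: ask for intermediate reasoning before the answer.
        parts.append("Reason step by step about the diagnoses before answering.")
    elif strategy != "0-shot":
        raise ValueError(f"unknown strategy: {strategy}")
    parts.append("Patient record: " + record_text)
    parts.append("Recommended discharge drugs:")
    return "\n".join(parts)
```

Supervised fine-tuning does not fit this function: rather than changing the prompt, it would pair an input like the 0-shot prompt with the reference drug list as a training target and update the model weights.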

The results clearly demonstrated that supervised fine-tuning significantly outperformed all other prompting strategies. This highlights that while general LLMs possess impressive capabilities, they require specialized training with labeled data to effectively handle complex, domain-specific tasks like drug recommendation. Simple prompt-based methods (0-shot, 1-shot, CoT) showed limited benefits, and in some cases, CoT even performed worse than 0-shot, suggesting that current LLMs don’t reliably leverage generative reasoning chains for this task.

Among the models tested, GLM4 achieved the best performance under the SFT strategy. The study also observed that increasing model size generally led to improved performance, further emphasizing the potential of larger models when properly fine-tuned. A case study illustrated how SFT produced recommendations that were much more clinically accurate and relevant compared to the other prompting methods, which often included irrelevant medications.
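This summary does not say which metrics the study reported. As an illustration of how "irrelevant medications" can be penalized, the sketch below scores one record's predicted drug set against the prescribed set using precision, recall, F1, and Jaccard similarity, which are common choices for set-valued predictions. The function name and the choice of metrics are assumptions.

```python
def set_scores(predicted, reference):
    """Return (precision, recall, f1, jaccard) for one record's drug sets."""
    pred, ref = set(predicted), set(reference)
    overlap = len(pred & ref)
    # Precision drops when the model adds drugs that were not prescribed.
    precision = overlap / len(pred) if pred else 0.0
    # Recall drops when the model misses drugs that were prescribed.
    recall = overlap / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    union = len(pred | ref)
    jaccard = overlap / union if union else 0.0
    return precision, recall, f1, jaccard
```

For instance, predicting four drugs of which two match a three-drug prescription gives a precision of 0.5 and a Jaccard similarity of 0.4, so extra irrelevant drugs lower the score even when the key drugs are present.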

CDrugRed has already been adopted in the 11th China Health Information Processing Conference (CHIP) Challenge, attracting over 500 participating teams. This underscores its value as a benchmark for future research in automated medication recommendation. A current limitation of the dataset is its single-hospital origin, which might affect the generalizability of models trained on it. Future work aims to expand the dataset with data from multiple hospitals and diverse clinical departments to enhance its representativeness and robustness. Full details are available in the CDrugRed research paper.
