TLDR: The CHIP 2025 Shared Task 2 focused on developing AI models for recommending discharge medications for metabolic diseases using Chinese Electronic Health Records (EHRs). Researchers created a new dataset, CDrugRed, for this multi-label classification challenge. Top-performing teams utilized advanced large language models (LLMs) with ensemble methods, data augmentation, and fine-tuning, significantly outperforming baseline models and demonstrating the potential of AI in personalized medication management for Chinese patients.
Ensuring patients receive the correct medications after leaving the hospital is crucial for their long-term health, especially for those managing chronic conditions like diabetes, hypertension, and fatty liver disease. These metabolic diseases often require complex, personalized treatment plans, and getting medication recommendations right at discharge can prevent readmissions and improve overall care. However, this process is challenging due to the multi-label nature of medication recommendations, the varied information found in clinical texts, and the unique needs of each patient.
To address these complexities, the China Health Information Processing Conference (CHIP) organized its 2025 Shared Task 2, focusing on discharge medication recommendation for metabolic diseases using real-world Chinese Electronic Health Records (EHRs). This initiative aimed to spur the development of advanced AI approaches to assist clinicians in optimizing personalized treatment strategies.
Introducing CDrugRed: A New Dataset for Chinese Healthcare
A significant contribution of this task was the creation of CDrugRed, a high-quality dataset specifically designed for Chinese discharge medication recommendations. CDrugRed comprises 5,894 de-identified hospitalization records from 3,190 patients in China, collected between 2013 and 2023 from a top-tier hospital. The dataset is rich in clinical information, including patient demographics, admission conditions, inpatient clinical course, laboratory results, past medical history, discharge diagnoses, and the actual medications prescribed at discharge. Recommendations are drawn from a predefined list of 651 candidate drugs.
The development of CDrugRed is particularly important because most existing drug recommendation datasets are based on English-language data, leaving a gap in resources for Chinese clinical practice. CDrugRed provides a valuable benchmark for developing and evaluating intelligent medication recommendation systems tailored to the unique characteristics of Chinese healthcare.
The Competition and Its Outcomes
The CHIP 2025 Shared Task 2 attracted considerable interest, with 526 teams registering for the competition. Of these, 167 teams submitted valid results for Phase A (development) and 95 teams participated in Phase B (final evaluation). Participants were encouraged to explore various technical solutions, including traditional machine learning, deep learning, pretrained language models, and retrieval-augmented generation methods, with a model size limit of 10 billion parameters.
The task was framed as a multi-label classification problem, evaluated using Jaccard and F1 scores, which measure the overlap and accuracy of predicted medication lists compared to actual prescriptions. The results demonstrated significant progress in the field. The top-performing team achieved a Jaccard score of 0.5102 and an F1 score of 0.6267 on the final test set, showcasing a notable improvement over the established baseline model.
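To make the two metrics concrete, here is a minimal sketch of how Jaccard and F1 can be computed for a single record and then averaged over records. The per-record averaging, the handling of empty sets, and the drug names are assumptions for illustration; the task's official scoring script may differ in these details.

```python
from typing import List, Set, Tuple


def jaccard(pred: Set[str], gold: Set[str]) -> float:
    """Intersection over union of predicted and prescribed drug sets."""
    if not pred and not gold:
        return 1.0  # assumption: two empty sets count as a perfect match
    return len(pred & gold) / len(pred | gold)


def f1(pred: Set[str], gold: Set[str]) -> float:
    """Harmonic mean of precision and recall for one record."""
    if not pred and not gold:
        return 1.0  # assumption: same convention as jaccard()
    true_positives = len(pred & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_scores(preds: List[Set[str]], golds: List[Set[str]]) -> Tuple[float, float]:
    """Average both metrics over all records (assumed per-record averaging)."""
    pairs = list(zip(preds, golds))
    mean_jaccard = sum(jaccard(p, g) for p, g in pairs) / len(pairs)
    mean_f1 = sum(f1(p, g) for p, g in pairs) / len(pairs)
    return mean_jaccard, mean_f1


# Hypothetical example: two of three predicted drugs match the prescription.
pred = {"metformin", "atorvastatin", "aspirin"}
gold = {"metformin", "atorvastatin", "insulin glargine"}
print(jaccard(pred, gold), f1(pred, gold))
```

In this example the two sets share two drugs out of four distinct drugs overall, so Jaccard is 0.5, while precision and recall are both 2/3, giving an F1 of 2/3. Jaccard is the stricter of the two because every wrong or missing drug enlarges the union.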
Advanced AI Strategies Lead the Way
The success of the top teams highlighted the potential of advanced large language model (LLM)-based ensemble systems. Common strategies employed by these high-ranking teams included:
- **Multi-dimensional Feature Enhancement:** Incorporating drug category annotations, patient meta-features, and disease-drug co-occurrence knowledge to enrich clinical understanding.
- **Data Augmentation:** Using techniques like order perturbations in diagnostic and medication lists, and pseudo-labeling to expand training data and improve model robustness.
- **Supervised Fine-tuning of LLMs:** Adapting powerful LLMs like the Qwen-series and GLM4-9B-Chat using methods like LoRA (Low-Rank Adaptation) to specialize them for medication recommendation.
- **Multi-scale Model Fusion and Ensemble:** Combining predictions from multiple models, often with hierarchical weighted-voting mechanisms, to balance bias and variance and achieve more stable and accurate recommendations.
- **Prompt Engineering:** Designing specific prompt templates to guide LLMs in clinical medication reasoning.
For instance, the top-ranked team, DeepDrug, developed a generative recommendation framework that integrated multi-dimensional feature enhancement and multi-scale model fusion, achieving the highest overall score. Other top teams like ZZUNLP and suxiao818 also leveraged sophisticated ensemble strategies and data augmentation to significantly boost their performance.
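Two of the strategies listed above, order perturbation and weighted voting, are simple enough to sketch in a few lines. The code below is an illustration under stated assumptions, not any team's actual implementation: the record fields (`diagnoses`, `medications`), the model names, the weights, and the 0.5 threshold are all hypothetical, and the voting shown is a flat single-level scheme rather than the hierarchical variants the teams describe.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set


def perturb_order(
    record: Dict[str, List[str]], n_copies: int = 2, seed: int = 0
) -> List[Dict[str, List[str]]]:
    """Return copies of a record with diagnosis and medication lists shuffled.

    The set of items is unchanged; only their order differs, so a model
    trained on the copies is discouraged from relying on list position.
    """
    rng = random.Random(seed)
    copies = []
    for _ in range(n_copies):
        copy = {key: list(values) for key, values in record.items()}
        rng.shuffle(copy["diagnoses"])
        rng.shuffle(copy["medications"])
        copies.append(copy)
    return copies


def weighted_vote(
    predictions: Dict[str, Set[str]],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> Set[str]:
    """Keep each drug whose share of the total model weight meets the threshold."""
    total = sum(weights[name] for name in predictions)
    scores: Dict[str, float] = defaultdict(float)
    for name, drugs in predictions.items():
        for drug in drugs:
            scores[drug] += weights[name]
    return {drug for drug, score in scores.items() if score / total >= threshold}


# Hypothetical record and model outputs, for illustration only.
record = {
    "diagnoses": ["type 2 diabetes", "hypertension", "fatty liver disease"],
    "medications": ["metformin", "amlodipine", "atorvastatin"],
}
augmented = perturb_order(record, n_copies=3)

predictions = {
    "model_a": {"metformin", "atorvastatin"},
    "model_b": {"metformin", "aspirin"},
    "model_c": {"metformin", "insulin glargine"},
}
weights = {"model_a": 3, "model_b": 2, "model_c": 1}
ensemble = weighted_vote(predictions, weights)
print(sorted(ensemble))
```

With these weights, a drug is kept only if the models voting for it hold at least half of the total weight, so a drug proposed by a single low-weight model is dropped. In practice the weights and threshold would be tuned on held-out data.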
Looking Ahead
The CHIP 2025 Shared Task 2 has established a crucial benchmark for discharge medication recommendation in metabolic diseases using Chinese EHR data. The high level of participation and the diverse, innovative solutions presented underscore the growing interest and potential of AI in clinical decision support. While significant advancements have been made, challenges remain, such as addressing label imbalance, rare medication usage, and ensuring generalizability across different institutions.
Future work aims to expand the CDrugRed dataset with more clinical contexts and multi-modal data (like laboratory and medical imaging results). The goal is to move beyond just recommending drug names to generating complete medication regimens, including dosage and instructions, ultimately fostering more explainable and trustworthy AI systems for clinical medication applications.


