TLDR: DoPI is a novel AI system designed to improve Traditional Chinese Medicine (TCM) diagnosis by enabling large language models (LLMs) to conduct proactive, multi-turn dialogues with patients. It uses a collaborative architecture with a guidance model for questioning, an expert model for diagnosis, and a knowledge graph to guide interactions. DoPI achieved 84.68% diagnostic accuracy, significantly outperforming other LLMs by effectively gathering critical symptom information and maintaining professional expertise, addressing a key limitation of current medical AI.
Artificial intelligence (AI) has made incredible strides in language understanding and generation, but applying these large language models (LLMs) to specialized fields like medicine, especially Traditional Chinese Medicine (TCM), comes with unique challenges. One major hurdle is the AI’s inability to conduct multi-turn dialogues and proactively ask patients questions, which is crucial for accurate diagnosis in real-world medical scenarios.
Current medical LLMs often fall short because they tend to provide diagnoses too early, based on incomplete initial information. Experiments with existing models like Sunsimiao, HuatuoGPT, and BianQue showed very low diagnostic accuracy rates (ranging from 14.73% to 21.27%) when given only partial symptom descriptions. This highlights a significant gap: while these models might be good at answering direct questions, they lack the ‘doctor-like’ ability to interrogate and gather more details from a patient.
To address this, researchers have introduced DoPI, a novel LLM system specifically designed for the TCM domain. DoPI stands for Doctor-like Proactive Interrogation LLM. Its innovative architecture features two main components working in harmony: a guidance model and an expert model.
How DoPI Works
The guidance model is responsible for engaging in multi-turn conversations with patients. It doesn’t just wait for information; it actively generates questions based on a sophisticated knowledge graph. This knowledge graph helps it efficiently extract critical symptom information from the patient. Think of it as the AI’s conversational front-end, guiding the patient through a series of relevant questions to build a comprehensive picture of their condition.
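The question-selection idea can be illustrated with a minimal Python sketch. It is not DoPI's implementation: the toy graph, its weights, and the functions `candidate_scores` and `next_question` are all hypothetical. It assumes the knowledge graph can be reduced to weighted disease-to-symptom links and that a sensible next question targets the strongest unasked symptom of the currently leading candidate.

```python
# Hypothetical toy knowledge graph: disease -> {symptom: association weight}.
# Names and weights are illustrative only, not taken from DoPI.
KNOWLEDGE_GRAPH = {
    "wind-cold": {"chills": 0.9, "runny nose": 0.6, "headache": 0.4},
    "wind-heat": {"sore throat": 0.9, "fever": 0.7, "headache": 0.4},
    "spleen qi deficiency": {"fatigue": 0.8, "poor appetite": 0.7, "loose stool": 0.6},
}


def candidate_scores(graph, present, absent):
    """Score each disease: add weights of confirmed symptoms, subtract denied ones."""
    scores = {}
    for disease, symptoms in graph.items():
        score = sum(w for s, w in symptoms.items() if s in present)
        score -= sum(w for s, w in symptoms.items() if s in absent)
        scores[disease] = score
    return scores


def next_question(graph, present, absent):
    """Return the unasked symptom most strongly tied to the leading candidate."""
    scores = candidate_scores(graph, present, absent)
    leading = max(scores, key=scores.get)  # ties go to the first disease listed
    asked = present | absent
    unasked = {s: w for s, w in graph[leading].items() if s not in asked}
    if not unasked:
        return None  # nothing left to ask about this candidate
    return max(unasked, key=unasked.get)
```

In this sketch, a patient who reports only a headache is first asked about chills. If they deny chills, the leading candidate shifts and the next question is about a sore throat.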
The expert model, which is built upon the Sunsimiao framework and fine-tuned on extensive, high-quality TCM datasets, supplies the deep medical expertise. Once the guidance model has gathered sufficient information, the expert model steps in to provide the final diagnosis and a tailored treatment plan. This separation of roles lets the system keep its professional medical knowledge while remaining effective in patient communication.
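The hand-off between the two roles can be sketched as a simple loop. This is only an illustration under stated assumptions: `CaseRecord`, `run_consultation`, the turn limit, and the toy stand-ins are hypothetical, and in the real system both roles are LLMs rather than small Python functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CaseRecord:
    """Symptoms confirmed or denied so far in one consultation."""
    present: set = field(default_factory=set)
    absent: set = field(default_factory=set)


def run_consultation(
    ask_next: Callable[[CaseRecord], Optional[str]],  # stand-in for the guidance model
    patient_answers: Callable[[str], bool],           # stand-in for the patient
    diagnose: Callable[[CaseRecord], str],            # stand-in for the expert model
    max_turns: int = 5,
) -> tuple:
    """Collect symptoms turn by turn, then hand the record to the expert stand-in."""
    record = CaseRecord()
    turns = 0
    while turns < max_turns:
        symptom = ask_next(record)
        if symptom is None:  # guidance side has nothing more to ask
            break
        if patient_answers(symptom):
            record.present.add(symptom)
        else:
            record.absent.add(symptom)
        turns += 1
    return diagnose(record), turns


# Toy stand-ins (illustrative only) so the sketch runs end to end.
CHECKLIST = ["chills", "fever", "sore throat"]
PATIENT_HAS = {"fever", "sore throat"}


def toy_ask_next(record):
    asked = record.present | record.absent
    remaining = [s for s in CHECKLIST if s not in asked]
    return remaining[0] if remaining else None


def toy_diagnose(record):
    if {"fever", "sore throat"} <= record.present:
        return "wind-heat pattern"
    return "undetermined"


diagnosis, turns = run_consultation(
    toy_ask_next, lambda s: s in PATIENT_HAS, toy_diagnose
)
```

Keeping the diagnosis function separate from the questioning loop mirrors the separation of roles described above: the questioning side can change without touching the component that holds the medical knowledge.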
A key innovation in DoPI is its ability to conduct these dialogues without sacrificing medical accuracy. Unlike some previous approaches that might reduce a model’s expertise when trying to improve its conversational skills, DoPI’s collaborative framework ensures that professional knowledge is retained. It also incorporates an update mechanism for its knowledge graph, allowing it to refine the relationships between symptoms and diseases based on successful diagnoses, much like a doctor learns from experience.
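The article does not spell out the update rule, so the following is only a sketch of one plausible form. It assumes each disease-to-symptom link carries a weight between 0 and 1, and that a successful diagnosis nudges the weights of the confirmed symptoms toward 1. The function `reinforce` and its `rate` parameter are hypothetical.

```python
def reinforce(graph, disease, confirmed_symptoms, rate=0.1):
    """Return a copy of `graph` with links from `disease` to confirmed symptoms strengthened.

    Assumed rule (not from the paper): w <- w + rate * (1 - w),
    which moves a weight toward 1 without ever exceeding it.
    """
    updated = {d: dict(symptoms) for d, symptoms in graph.items()}
    edges = updated.setdefault(disease, {})
    for symptom in confirmed_symptoms:
        old = edges.get(symptom, 0.0)  # unseen links start at 0
        edges[symptom] = old + rate * (1.0 - old)
    return updated


graph = {"wind-heat": {"fever": 0.5}}
new_graph = reinforce(graph, "wind-heat", {"fever", "thirst"}, rate=0.2)
```

Here the existing fever link moves from 0.5 to 0.6, and a new thirst link is created with weight 0.2. The original graph is left untouched, which makes it easy to compare the graph before and after an update.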
Furthermore, DoPI integrates tongue diagnosis as an auxiliary input. Patients can submit photographs of their tongue coating, which are then analyzed by a convolutional neural network (CNN) to classify their constitution type. This visual information is combined with the textual symptom data, providing the expert model with a more comprehensive understanding of the patient’s condition.
Building and Evaluating DoPI
Given the scarcity of high-quality multi-turn doctor-patient dialogue data in TCM, the researchers constructed a new dataset by having LLMs simulate doctor and patient roles. This allowed them to create realistic consultation scenarios where the ‘doctor’s’ questions were guided by the knowledge graph, mimicking a real TCM practitioner’s logical reasoning.
In evaluations, DoPI performed strongly. It achieved 84.68% diagnostic accuracy on interrogation outcomes, well ahead of other large-scale LLMs such as Qwen2.5-Max (32.31%), ChatGPT-4o (35.12%), and DeepSeek-v3 (58.74%). DoPI also scored better on ‘Q&A Ratio’ (how proactively it asks questions) and ‘Interrogation Distance’ (how closely its questioning process aligns with that of a professional physician).
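The article describes the two dialogue metrics only informally, so the definitions below are assumptions made for illustration and not the paper's formulas. The sketch treats Q&A Ratio as the share of model turns that end in a question mark, and Interrogation Distance as the edit (Levenshtein) distance between the sequence of topics the model asked about and a physician's reference sequence. The function names `qa_ratio` and `interrogation_distance` are hypothetical.

```python
def qa_ratio(model_turns):
    """Share of model turns that are questions (assumed: turn ends with a question mark)."""
    if not model_turns:
        return 0.0
    questions = sum(1 for t in model_turns if t.rstrip().endswith(("?", "？")))
    return questions / len(model_turns)


def interrogation_distance(model_topics, reference_topics):
    """Levenshtein distance between two sequences of asked-about topics."""
    prev = list(range(len(reference_topics) + 1))
    for i, m in enumerate(model_topics, start=1):
        curr = [i]
        for j, r in enumerate(reference_topics, start=1):
            cost = 0 if m == r else 1
            curr.append(min(prev[j] + 1,          # drop a model topic
                            curr[j - 1] + 1,      # add a missing reference topic
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


turns = ["Do you have chills?", "Any fever?", "You likely have a cold.", "How is your appetite?"]
ratio = qa_ratio(turns)
distance = interrogation_distance(
    ["chills", "fever", "appetite"], ["chills", "sleep", "appetite"]
)
```

Under these assumed definitions, a higher ratio means a more proactive questioner, and a smaller distance means the questioning followed the physician's path more closely.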
An additional evaluation using LLMs as judges further confirmed DoPI’s strengths in knowledgeability, professionalism, fluency, and respectfulness compared to other models. This indicates that DoPI not only gets the diagnosis right more often but also interacts with patients in a more natural and medically sound manner.
The DoPI system represents a significant step forward in applying AI to Traditional Chinese Medicine. By combining a conversational guidance model with a deep-expertise expert model through a dynamic knowledge graph, it offers a promising path towards more accurate, proactive, and patient-friendly AI diagnostic tools in healthcare. For more details, refer to the full research paper.


