ShizhenGPT: Advancing TCM Diagnostics with Multimodal AI

TLDR: ShizhenGPT is the first multimodal large language model (LLM) specifically designed for Traditional Chinese Medicine (TCM). It addresses the challenges of limited TCM data and the multimodal nature of TCM diagnostics by curating the largest TCM dataset to date (over 300GB of text and multimodal data). ShizhenGPT integrates deep TCM knowledge with the ability to interpret visual, auditory, olfactory, and pulse signals, aligning with TCM’s “Four Diagnostic Methods.” Evaluations show it outperforms comparable LLMs and competes with larger proprietary models in TCM expertise and visual understanding, paving the way for more holistic AI in TCM.

Traditional Chinese Medicine (TCM), a medical system with thousands of years of history, has remained largely separate from recent advancements in artificial intelligence (AI). This gap exists primarily due to two significant challenges: a scarcity of high-quality TCM data and the inherently multimodal nature of TCM diagnostics, which involve sensory-rich methods like looking, listening, smelling, and pulse-taking. Conventional large language models (LLMs) are typically limited to text, making them unsuitable for these complex diagnostic approaches.

To bridge this gap, researchers have introduced ShizhenGPT, the first multimodal LLM specifically designed for Traditional Chinese Medicine. The model aims to bring AI closer to real-world TCM clinical practice by understanding and reasoning across multiple sensory inputs.

Addressing Data Scarcity

One of ShizhenGPT’s foundational achievements is the creation of the largest TCM dataset to date. This extensive collection comprises over 100GB of text data, gathered from 3,256 TCM-specific books and various online sources. In addition to text, the dataset includes over 200GB of multimodal data, featuring 1.2 million annotated images, more than 200 hours of audio, and diverse physiological signals such as pulse and electrocardiograms (ECG).

This massive dataset is crucial for training a robust AI model, as previous TCM-specific LLMs often relied on less than 1GB of text, which is insufficient for the complexity of TCM theory.

Multimodal Capabilities for TCM Diagnostics

TCM diagnosis traditionally relies on the “Four Diagnostic Methods”: observing (e.g., tongue, visual cues), listening (e.g., voice, breath), smelling, and pulse-taking. ShizhenGPT is engineered to integrate these rich sensory modalities. Its architecture includes an LLM backbone for core reasoning, a vision encoder for visual inputs, and a signal encoder for continuous signals like voice, pulse, and smell.
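The component split described above can be pictured as a simple routing step. The following is a minimal sketch, not the authors' implementation: the modality labels, the `DiagnosticInput` type, and the `route` function are hypothetical names chosen for illustration, and only the three-component grouping (LLM backbone, vision encoder, signal encoder) comes from the article.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical modality labels; only the grouping into three components
# follows the article's description of the architecture.
VISION_MODALITIES = {"tongue_image", "palm_image", "herb_image"}
SIGNAL_MODALITIES = {"voice", "pulse", "smell"}


@dataclass
class DiagnosticInput:
    modality: str  # e.g. "text", "tongue_image", "pulse"
    payload: Any   # raw text, image bytes, or a sampled signal


def route(item: DiagnosticInput) -> str:
    """Return the name of the component that would handle this input first."""
    if item.modality == "text":
        return "llm_backbone"
    if item.modality in VISION_MODALITIES:
        return "vision_encoder"
    if item.modality in SIGNAL_MODALITIES:
        return "signal_encoder"
    raise ValueError(f"unsupported modality: {item.modality}")


if __name__ == "__main__":
    for item in (
        DiagnosticInput("text", "Patient reports fatigue."),
        DiagnosticInput("tongue_image", b""),
        DiagnosticInput("pulse", [0.1, 0.4, 0.2]),
    ):
        print(item.modality, "->", route(item))
```

In a real multimodal model the encoder outputs would be projected into the backbone's embedding space; the sketch stops at routing because the article does not describe that step.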

The model undergoes a two-stage pre-training process. The first stage focuses on infusing knowledge from extensive TCM text, while the second introduces multimodal alignment through image-text and audio-text data. Following pre-training, an instruction-tuning phase aligns the model for instruction-following and extends its capabilities to various downstream tasks, including adapting to less data-rich modalities like sound and smell.
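The training sequence described above can be written down as an ordered schedule. This is an illustrative sketch only: the `Stage` type, the stage names, and the data-source labels are assumptions made for readability, and the article gives no hyperparameters, so none are shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    name: str              # hypothetical stage label
    data: Tuple[str, ...]  # hypothetical data-source labels
    goal: str              # purpose of the stage, paraphrased from the article


# Order follows the article: two pre-training stages, then instruction tuning.
SCHEDULE = (
    Stage("pretrain_1", ("tcm_text",), "infuse TCM knowledge"),
    Stage("pretrain_2", ("image_text", "audio_text"), "multimodal alignment"),
    Stage("instruction_tuning", ("instructions",),
          "instruction following and downstream tasks"),
)


def describe(schedule: Tuple[Stage, ...]) -> List[str]:
    """Render each stage as a numbered, human-readable line."""
    return [
        f"{i}. {s.name}: {', '.join(s.data)} -> {s.goal}"
        for i, s in enumerate(schedule, start=1)
    ]


if __name__ == "__main__":
    print("\n".join(describe(SCHEDULE)))
```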

Performance and Evaluation

ShizhenGPT was evaluated on a benchmark suite covering text, vision, and physiological signals. For textual understanding, the model was tested on recent national TCM qualification exams, including licensing exams for pharmacists, physicians, and assistant physicians, as well as postgraduate entrance exams. The smaller version, ShizhenGPT-7B, achieved the highest average score among LLMs of comparable scale and outperformed some larger models.

In visual tasks, ShizhenGPT achieved state-of-the-art results, showing strong ability in medicinal recognition and visual diagnosis (e.g., interpreting tongue and palm images). It also showed effective perception across signal modalities such as smell, ECG, and pulse, consistently outperforming random baselines. Notably, it achieved 80% accuracy in pregnancy detection from pulse signals alone.
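To make the comparison against a random baseline concrete, the sketch below computes accuracy alongside two simple reference points. It is not the paper's evaluation code, and the labels are invented toy data rather than study results. The uniform-chance baseline assumes a guesser that picks each class with equal probability; the majority-class baseline is included because chance alone can mislead when classes are imbalanced.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the reference labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def uniform_chance(y_true: Sequence[Hashable]) -> float:
    """Expected accuracy of guessing uniformly among the observed classes."""
    return 1 / len(set(y_true))


def majority_baseline(y_true: Sequence[Hashable]) -> float:
    """Accuracy of always predicting the most frequent class."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


if __name__ == "__main__":
    # Invented toy labels for illustration only (1 = positive, 0 = negative).
    y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]
    print("accuracy:", accuracy(y_true, y_pred))
    print("uniform chance:", uniform_chance(y_true))
    print("majority baseline:", majority_baseline(y_true))
```

A reported accuracy is easier to interpret when both baselines are given, since the class balance of the test set determines how far above chance a figure such as 80% actually is.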

Human evaluations conducted by licensed TCM practitioners also indicated a higher preference for ShizhenGPT’s responses compared to other leading models, highlighting its clinical relevance and insight.

Future Outlook

ShizhenGPT represents a significant step towards more holistic medical AI systems in Traditional Chinese Medicine. By expanding diagnostic capabilities beyond text-based interaction to include direct analysis of visual cues, sounds, and physiological signals, it brings AI interaction closer to real-world clinical practice. The researchers have made the datasets, models, and code publicly available to encourage further research and collaboration in the field.

While ShizhenGPT shows promise, the researchers acknowledge limitations, including the scarcity of high-quality signal data for certain modalities and the need for real-world clinical testing. The model is currently intended for scientific research, not clinical deployment, because its outputs may be inaccurate. For more technical details, refer to the full research paper: ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine.
