TL;DR: A study evaluated Multimodal Large Language Models (MLLMs) for Adolescent Idiopathic Scoliosis (AIS) self-management using a “Divide and Conquer” framework. It found that MLLMs struggle with precise spinal X-ray interpretation (deformity location and direction) but improve markedly on domain knowledge and patient education tasks when enhanced with visual prompts and a specialized knowledge base (Retrieval-Augmented Generation). The research concludes that current MLLMs are not yet ready for automated AIS self-management but show promise with targeted improvements.
A recent study introduces a novel framework for evaluating Multimodal Large Language Models (MLLMs) in the context of Adolescent Idiopathic Scoliosis (AIS) self-management. This research, detailed in the paper “Adapting and Evaluating Multimodal Large Language Models for Adolescent Idiopathic Scoliosis Self-Management: A Divide and Conquer Framework”, addresses a critical gap in medical AI: the application of advanced language models to spinal deformities, an area often overlooked due to limited specialized data.
Adolescent Idiopathic Scoliosis is a common spinal deformity affecting young people, typically during growth spurts. While clinical treatments are vital, patient self-management, including exercise, therapy adherence, and mental well-being, plays a significant role in recovery and long-term quality of life. MLLMs have shown impressive capabilities in analyzing medical images such as chest radiographs and providing related advice, but their effectiveness for a complex spinal disease like AIS, which requires precise assessment of curve patterns and specialized knowledge, has been largely unexplored.
The “Divide and Conquer” Approach
To systematically assess MLLMs, the researchers developed a “Divide and Conquer” framework. This approach breaks down the complex requirements of AIS self-management into three distinct evaluation tasks:
- Visual Spinal Assessment (VSA): This task evaluates an MLLM’s ability to analyze spinal X-rays for disease progression. It includes three sub-tasks: AIS Diagnosis (determining presence or absence of scoliosis), Spinal Deformity Location Detection (identifying if the curve is in the thoracic, thoracolumbar, or lumbar segments), and Spinal Deformity Direction Detection (assessing if the curve is leftward or rightward).
- Domain Knowledge Assessment (DKA): This multiple-choice task gauges the MLLM’s understanding of AIS-specific professional knowledge, covering areas like basic knowledge, etiology, diagnosis, treatment options, and complications.
- Patient Education and Counseling Assessment (PECA): This patient-oriented question-answering task evaluates how well MLLMs can provide accurate and accessible information to patients, adapting responses based on the severity of their spinal deformity (mild, moderate, severe).
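To make the framework's structure concrete, the three tasks can be sketched as a small evaluation harness. This is an illustrative sketch, not the paper's actual schema: the `EvalItem` fields, the sample questions, and the `accuracy` scorer are assumptions for demonstration.

```python
from dataclasses import dataclass

# Hypothetical item schema for the three "Divide and Conquer" tasks.
# Field names and sample content are illustrative, not from the paper.
@dataclass
class EvalItem:
    task: str          # "VSA", "DKA", or "PECA"
    subtask: str       # e.g. "diagnosis", "location", "direction"
    question: str
    ground_truth: str

def accuracy(items, predictions):
    """Fraction of items whose prediction matches the ground truth."""
    correct = sum(p == it.ground_truth for it, p in zip(items, predictions))
    return correct / len(items)

# Toy VSA examples covering the three sub-tasks described above.
items = [
    EvalItem("VSA", "diagnosis", "Does this X-ray show scoliosis?", "yes"),
    EvalItem("VSA", "location", "Which spinal segment is curved?", "thoracic"),
    EvalItem("VSA", "direction", "Is the curve leftward or rightward?", "rightward"),
]
preds = ["yes", "lumbar", "rightward"]  # model got the location wrong
print(accuracy(items, preds))  # 2 of 3 correct
```

Scoring each sub-task separately in this way is what lets the study report distinct accuracies for diagnosis, location, and direction rather than a single aggregate number.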
Enhancing MLLM Performance
The study also explored methods to improve MLLM performance. A database of approximately 3,000 anteroposterior X-rays with diagnostic texts was constructed, representing the largest specialized image-text database for AIS to date. To enhance visual interpretation, three visual prompting strategies were introduced: Curved Spine Midline (CSM), Vertebral Connection Line (VCL), and Segmented Vertebrae Marks (SVM). These prompts provide critical anatomical information to the models.
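The idea behind these prompts is to burn explicit anatomical cues into the radiograph before the model sees it. Below is a minimal sketch in the spirit of the Curved Spine Midline: it assumes per-vertebra centroid estimates already exist (in practice these would come from an upstream segmentation step, not shown) and rasterizes a polyline through them onto a grayscale image. The grid size, centroid coordinates, and function name are all illustrative.

```python
# Sketch of a CSM-style visual prompt: overlay a midline polyline through
# assumed vertebral centroids onto a 2D grayscale image (nested lists here;
# a real pipeline would use an image library).

def draw_midline(image, centroids, value=255):
    """Mark pixels along line segments joining consecutive centroids."""
    for (x0, y0), (x1, y1) in zip(centroids, centroids[1:]):
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for t in range(steps + 1):
            x = round(x0 + (x1 - x0) * t / steps)
            y = round(y0 + (y1 - y0) * t / steps)
            image[y][x] = value
    return image

# 8x8 dummy "X-ray"; centroids trace a rightward-bulging curve.
img = [[0] * 8 for _ in range(8)]
centroids = [(3, 0), (5, 3), (4, 6)]
draw_midline(img, centroids)
for row in img:
    print("".join("#" if v else "." for v in row))
```

The Vertebral Connection Line and Segmented Vertebrae Marks variants would follow the same pattern, overlaying straight inter-vertebral segments or per-vertebra markers instead of a single continuous midline.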
For knowledge-intensive tasks (DKA and PECA), a Retrieval-Augmented Generation (RAG) framework was implemented. This involved compiling an AIS-specific knowledge base from authoritative sources like clinical guidelines, PubMed research, and patient education resources from organizations such as the Scoliosis Research Society. Gemini was used to generate structured knowledge graphs to optimize information retrieval.
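The retrieval step of such a RAG pipeline can be sketched in a few lines. This is a heavily simplified stand-in for the paper's approach: retrieval here is plain bag-of-words overlap rather than knowledge-graph-guided search, and the knowledge snippets are illustrative placeholders, not text from the actual AIS knowledge base.

```python
# Hedged sketch of RAG for DKA/PECA-style questions: retrieve the most
# relevant snippets, then prepend them to the prompt so the model answers
# from curated knowledge instead of parametric memory alone.

KNOWLEDGE_BASE = [
    "Bracing is commonly considered for moderate curves in growing adolescents.",
    "Adolescent idiopathic scoliosis has no single known cause.",
    "Scoliosis-specific exercises may support posture and therapy adherence.",
]

def retrieve(question, kb, top_k=2):
    """Rank snippets by word overlap with the question (toy retriever)."""
    q_words = set(question.lower().split())
    scored = sorted(kb, key=lambda s: len(q_words & set(s.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(question, kb):
    """Assemble a context-augmented prompt for the MLLM."""
    context = "\n".join(retrieve(question, kb))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

prompt = build_prompt("Is bracing an option for moderate scoliosis?", KNOWLEDGE_BASE)
print(prompt)
```

A production version would swap the toy retriever for embedding similarity or graph traversal over the Gemini-generated knowledge graphs, but the augment-then-generate structure is the same.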
Key Findings and Future Directions
The evaluation revealed mixed results. While structured visual cues generally improved diagnostic accuracy in the VSA task, their effectiveness varied significantly across different MLLM architectures. Notably, current MLLMs still face substantial challenges in accurately detecting spinal deformity locations (with a best accuracy of 0.55) and directions (best accuracy of 0.13). This indicates a fundamental limitation in their ability to interpret complex spinal radiographs precisely.
However, the RAG approach demonstrated significant improvements in both the DKA and PECA tasks. Models showed substantial gains in medical accuracy and safety when augmented with the AIS knowledge base. Interestingly, performance gaps between larger and smaller models narrowed with RAG, suggesting that retrieval augmentation can effectively compensate for limited parameters in specialized medical applications.
In conclusion, the research highlights that while current MLLMs show promise in specialized tasks and can be significantly enhanced with anatomical guidance and knowledge augmentation, they are not yet capable of fully realizing personalized assistance in AIS self-management. The study provides a clear roadmap for future improvements, emphasizing the need for advancements in foundational MLLM capabilities and deeper specialized medical understanding to support, rather than replace, human expertise in AIS care.