AI-Powered Shoulder Disorder Diagnosis with Everyday Cameras

TLDR: This research introduces a low-cost, accessible method for preliminary shoulder disorder diagnosis using videos from consumer-grade cameras and Multimodal Large Language Models (MLLMs). The Hybrid Motion Video Diagnosis (HMVDx) framework uses two models: an MLLM for action understanding (Gemini-1.5-Pro) and a reasoning LLM for disease diagnosis (DeepSeek-R1). The paper reports that this improves diagnostic accuracy by 79.6% compared to direct video diagnosis. The study also proposes a new “Usability Index” to evaluate MLLMs in medical contexts, highlighting the potential of this technology for early detection, especially in areas with limited medical resources.

Shoulder disorders, such as frozen shoulder, are common conditions that affect many people globally, particularly the elderly and those who perform repetitive shoulder tasks. In areas where medical resources are scarce, getting an early and accurate diagnosis can be very challenging. This highlights a critical need for affordable and easily scalable diagnostic solutions.

A recent research paper introduces an innovative approach to address this problem: using videos captured by everyday consumer-grade cameras as the foundation for diagnosis. This method significantly reduces costs for users and makes preliminary diagnosis more accessible. The core of this research lies in the application of Multimodal Large Language Models (MLLMs) for diagnosing shoulder disorders.

Introducing HMVDx: A Hybrid Approach to Diagnosis

The researchers propose a framework called Hybrid Motion Video Diagnosis (HMVDx). It divides the diagnostic task into two parts, action understanding and disease diagnosis, and assigns each part to a different model. According to the authors, this division of labor reduces the complexity each model must handle and improves diagnostic accuracy and reliability.

In the HMVDx framework, one MLLM (specifically, Gemini-1.5-Pro) is responsible for converting video information into detailed text descriptions of a patient’s actions. Following this, a separate reasoning large language model (DeepSeek-R1) takes these descriptions and, based on pre-set diagnostic rules, makes a judgment about the presence of a shoulder disorder. This sequential processing mimics how a medical professional might observe a patient’s movements before making a diagnosis.
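
The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `Diagnosis` type, the rule text, and the stub models are all hypothetical, and no real model API is called. In practice the `describe` step would wrap a video-capable MLLM and the `diagnose` step a reasoning LLM.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions, not from the paper):
# stage 1 maps a video path to a text description of the movement,
# stage 2 maps (description, rules) to a text verdict.
VideoDescriber = Callable[[str], str]
Diagnoser = Callable[[str, str], str]


@dataclass
class Diagnosis:
    description: str  # output of the action-understanding stage
    verdict: str      # output of the disease-diagnosis stage


def run_two_stage(video_path: str, describe: VideoDescriber,
                  diagnose: Diagnoser, rules: str) -> Diagnosis:
    """Run action understanding first, then rule-based diagnosis on its text."""
    description = describe(video_path)
    verdict = diagnose(description, rules)
    return Diagnosis(description=description, verdict=verdict)


# Stub stages standing in for the real models, for illustration only.
def fake_describe(video_path: str) -> str:
    return "The raised hand stays below shoulder height."


def fake_diagnose(description: str, rules: str) -> str:
    if "below shoulder height" in description:
        return "possible disorder"
    return "no sign of disorder"


# Placeholder rule text; the paper's actual diagnostic rules are not reproduced here.
RULES = "Flag the case if the hand cannot be raised above shoulder height."

result = run_two_stage("patient_clip.mp4", fake_describe, fake_diagnose, RULES)
print(result.verdict)
```

Keeping the two stages behind plain function signatures means either model could be swapped out without touching the other, which is the point of the split.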

Key Innovations and Contributions

Beyond the HMVDx framework itself, the research introduces several other important contributions:

Motion Trajectories Prompt Framework: This framework helps MLLMs understand human actions. It is built by analyzing popular-science orthopedic videos and summarizing the movements used for assessment and the criteria for judging them. It replaces numerical measurements with relative position descriptions to improve accuracy, making it easier for medical practitioners to adapt general LLMs for specific medical uses at a low cost.
Usability Index: Recognizing the limitations of traditional evaluation metrics in the medical field, the study proposes a new metric called the Usability Index. This index evaluates the effectiveness of MLLMs from the perspective of the entire medical diagnostic pathway, considering action recognition, movement diagnosis, and final diagnosis. It provides a more comprehensive tool for assessing the applicability of MLLMs in preliminary diagnosis.
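
To make the idea of relative position descriptions more concrete, here is a small sketch of how such a prompt might be assembled. The function name, the wording of the prompt, and the example checks are all hypothetical; they are not taken from the paper and are not clinical criteria. The only point illustrated is that the checks refer to body landmarks rather than angles or distances.

```python
def build_motion_prompt(action: str, checks: list[str]) -> str:
    """Assemble a text prompt asking a video model to describe one movement.

    The checks are phrased relative to body landmarks (head, ear, shoulder)
    instead of numeric angles, following the idea described in the article.
    """
    lines = [
        f"Watch the video of the patient performing: {action}.",
        "Describe the movement using body landmarks, not angles or distances.",
        "For each check below, answer yes or no and explain what you saw:",
    ]
    # Number the checks so the model's answers can be matched back to them.
    lines += [f"{i}. {check}" for i, check in enumerate(checks, start=1)]
    return "\n".join(lines)


# Hypothetical example checks, for illustration only.
prompt = build_motion_prompt(
    "raising the arm forward",
    [
        "Does the hand rise above the top of the head?",
        "Does the upper arm come close to the ear?",
    ],
)
print(prompt)
```

A question such as "does the hand rise above the head?" can be answered from what is visible in the frame, whereas an angle in degrees would have to be estimated.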

Performance and Future Potential

Experimental comparisons showed that HMVDx significantly improved the accuracy of diagnosing shoulder joint injuries by 79.6% compared to direct video diagnosis methods. This demonstrates a substantial technical advancement for applying MLLMs to video understanding in medicine.
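
The article does not state the underlying accuracies, nor whether the 79.6% figure is a relative gain or a difference in percentage points. The sketch below shows how the two readings differ, using made-up counts that are not results from the paper; the function names are likewise only illustrative.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of cases diagnosed correctly."""
    return correct / total


def absolute_gain(baseline: float, improved: float) -> float:
    """Difference between two accuracies (multiply by 100 for percentage points)."""
    return improved - baseline


def relative_gain(baseline: float, improved: float) -> float:
    """Improvement expressed as a fraction of the baseline accuracy."""
    return (improved - baseline) / baseline


# Hypothetical counts for illustration only, not taken from the study.
baseline = accuracy(20, 50)   # a direct video diagnosis baseline
improved = accuracy(30, 50)   # a two-stage pipeline

print(f"absolute gain: {absolute_gain(baseline, improved) * 100:.1f} percentage points")
print(f"relative gain: {relative_gain(baseline, improved) * 100:.1f}%")
```

With a low baseline, a modest gain in percentage points can correspond to a large relative gain, so the headline number is best read alongside the paper's reported accuracies.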

While HMVDx showed strong performance, especially in scenarios focusing on final judgment and logical consistency, the researchers acknowledge that current methods still fall short of the stringent requirements of real-world medical applications, particularly when the entire diagnostic process is evaluated (referred to as Scenario 3 in the paper). This points to areas for future optimization, such as improving dynamic video analysis, body positioning accuracy, and action decomposition.

The study highlights the immense potential of low-cost MLLMs in assisting medical practitioners with diagnosing shoulder disorders. Future research could integrate advanced techniques like Supervised Fine-Tuning (SFT), Retrieval Augmented Generation (RAG), and Agentic AI to further enhance performance. Expanding research into multilingual environments and leveraging more diverse medical imaging and video data will also broaden the application scope and improve the model’s generalization ability.

This research represents a significant step towards creating accessible and affordable preliminary diagnostic tools for shoulder disorders, particularly beneficial for regions with limited medical resources. For more details, see the full paper.


