Deep Learning Model Validated for Accurate Scoliosis Assessment

TLDR: A multi-centre study validated the Carebot AI Bones deep learning model for automated Cobb angle measurement in scoliosis. Compared against two expert radiologists on 103 X-rays from ten hospitals, the AI achieved accuracy and agreement comparable to the human experts for both continuous angle measurements and four-grade severity classification. This suggests the AI can help streamline scoliosis assessment and triage in clinical settings.

Scoliosis, a condition characterized by a lateral curvature of the spine, affects a significant portion of the population, particularly adolescents. Accurate assessment of scoliosis relies heavily on measuring the Cobb angle from X-ray images. This measurement is crucial for diagnosis and determining the appropriate treatment pathway, which can range from observation to bracing or even surgery.

Traditionally, Cobb angle measurement is performed manually by radiologists. However, this process is not only time-consuming but also prone to variations between different observers, leading to potential inconsistencies in diagnosis and treatment decisions. Recognizing these challenges, researchers have been exploring the potential of deep learning approaches to automate this critical assessment.

A recent study, titled “Multi-Centre Validation of a Deep Learning Model for Scoliosis Assessment”, conducted a comprehensive evaluation of a fully automated deep-learning software package, Carebot AI Bones, focusing on its Spine Measurement functionality. Developed by Carebot s.r.o., the software aims to streamline scoliosis reporting and triage in clinical workflows by providing precise, automated Cobb angle measurements.

How the AI Software Works

The Carebot AI Bones software employs a two-stage deep-learning approach. First, a YOLOv11 landmark detector, trained on a large dataset of expertly annotated X-ray images, locates the superior and inferior corners of each vertebra from C7 to L5. Next, a geometry-based algorithm computes the Cobb angles from these detected landmarks. The software then classifies scoliosis severity into four grades: no scoliosis (less than 10°), mild (10–24°), moderate (25–39°), or severe (40° or more). Designed for seamless integration, it connects directly to clinical Picture Archiving and Communication Systems (PACS), automating both image retrieval and the insertion of results.
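
The article does not publish Carebot’s geometry algorithm, so the following is only a minimal sketch of how a Cobb angle could be derived from corner landmarks. It assumes each vertebra comes with four corner points (left and right corners of its superior and inferior endplates), ordered from C7 down to L5. It takes the largest angle between the superior endplate of one vertebra and the inferior endplate of any vertebra below it, then maps that angle onto the four severity grades listed above. The `Vertebra` class, all function names and the synthetic three-vertebra spine are hypothetical illustrations, not the product’s API.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image pixel coordinates


@dataclass
class Vertebra:
    """Hypothetical container for the four corner landmarks of one vertebral body."""
    name: str
    sup_left: Point
    sup_right: Point
    inf_left: Point
    inf_right: Point


def endplate_tilt(left: Point, right: Point) -> float:
    """Tilt of the line through two endplate corners, in degrees from the image horizontal."""
    return math.degrees(math.atan2(right[1] - left[1], right[0] - left[0]))


def max_cobb_angle(spine: List[Vertebra]) -> Tuple[float, str, str]:
    """Largest angle between a superior endplate and the inferior endplate of a lower vertebra.

    `spine` must be ordered cranial to caudal (e.g. C7 ... L5).
    Returns (angle in degrees, upper end vertebra, lower end vertebra).
    """
    best = (0.0, "", "")
    for i, upper in enumerate(spine):
        top = endplate_tilt(upper.sup_left, upper.sup_right)
        for lower in spine[i + 1:]:
            bottom = endplate_tilt(lower.inf_left, lower.inf_right)
            angle = abs(top - bottom)
            if angle > best[0]:
                best = (angle, upper.name, lower.name)
    return best


def severity_grade(cobb_deg: float) -> str:
    """Four-grade scale from the article; boundaries assumed half-open (e.g. 24.6 counts as mild)."""
    if cobb_deg < 10:
        return "no scoliosis"
    if cobb_deg < 25:
        return "mild"
    if cobb_deg < 40:
        return "moderate"
    return "severe"


def tilted_endplate(y: float, tilt_deg: float, width: float = 40.0) -> Tuple[Point, Point]:
    """Synthetic endplate corners with a given tilt, used only for the example below."""
    rad = math.radians(tilt_deg)
    return (0.0, y), (width * math.cos(rad), y + width * math.sin(rad))


# Synthetic curve: T5 superior endplate tilted 15 degrees, T11 inferior endplate tilted -12 degrees.
spine = [
    Vertebra("T5", *tilted_endplate(0, 15), *tilted_endplate(20, 12)),
    Vertebra("T8", *tilted_endplate(40, 2), *tilted_endplate(60, -2)),
    Vertebra("T11", *tilted_endplate(80, -8), *tilted_endplate(100, -12)),
]
angle, top, bottom = max_cobb_angle(spine)
print(f"{angle:.1f} degrees between {top} and {bottom}: {severity_grade(angle)}")
# prints "27.0 degrees between T5 and T11: moderate"
```

Searching every upper/lower pair mirrors the “maximal Cobb angle” that the radiologists measured, and the quadratic cost is trivial for the 18 levels from C7 to L5.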

The Multi-Centre Validation Study

To rigorously test the software’s performance, a retrospective, multi-centre study was conducted. Researchers collected 103 standing anteroposterior whole-spine radiographs from ten different hospitals. This diverse dataset was crucial for assessing the model’s generalizability across various clinical settings and equipment.

Two experienced musculoskeletal radiologists independently measured the maximal Cobb angle on each X-ray, serving as the reference standard. In parallel, the AI software analyzed the same images without any manual intervention. The measurements from the AI were then compared against those of both radiologists using several statistical methods, including Bland–Altman analysis, mean absolute error (MAE), root-mean-squared error (RMSE), Pearson correlation coefficient, and Cohen’s kappa for severity classification.
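
For readers unfamiliar with these statistics, here is a minimal sketch of how they are commonly computed for paired readings, using only Python’s standard library. It assumes that the Bland–Altman bias is the mean of (reader A minus reader B), that the limits of agreement are bias ± 1.96 sample standard deviations, and that kappa is the unweighted form. The article does not say which kappa variant the authors used or in which direction differences were taken. The function names and toy readings are hypothetical.

```python
import math
from collections import Counter
from statistics import fmean, stdev
from typing import Dict, Sequence


def agreement_metrics(a: Sequence[float], b: Sequence[float]) -> Dict[str, float]:
    """Continuous agreement between two readers' Cobb angles (degrees), paired by image."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired sequences of equal length (at least 2)")
    diffs = [x - y for x, y in zip(a, b)]
    bias = fmean(diffs)          # Bland–Altman mean difference (a minus b)
    sd = stdev(diffs)            # sample standard deviation of the differences
    mean_a, mean_b = fmean(a), fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return {
        "bias": bias,
        "loa_low": bias - 1.96 * sd,   # 95% limits of agreement
        "loa_high": bias + 1.96 * sd,
        "mae": fmean(abs(d) for d in diffs),
        "rmse": math.sqrt(fmean(d * d for d in diffs)),
        "pearson_r": cov / math.sqrt(ss_a * ss_b),
    }


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa: chance-corrected agreement on categorical grades."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two label sequences of equal, non-zero length")
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Toy readings (hypothetical, not data from the study).
ai = [12.0, 30.0, 8.0, 45.0]
radiologist = [10.0, 33.0, 8.0, 41.0]
m = agreement_metrics(ai, radiologist)
print(round(m["bias"], 2), round(m["mae"], 2), round(m["rmse"], 2))  # 0.75 2.25 2.69

grades_a = ["none", "mild", "mild", "moderate", "severe", "moderate"]
grades_b = ["none", "mild", "moderate", "moderate", "severe", "severe"]
print(round(cohens_kappa(grades_a, grades_b), 2))  # 0.56
```

MAE weights every error equally, whereas RMSE squares errors before averaging. RMSE is therefore never smaller than MAE, and a wide gap between the two points to a few large disagreements.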

Key Findings and Performance

The study yielded promising results, demonstrating that the AI software achieved accuracy comparable to expert human radiologists. Against Radiologist 1, the AI had a mean absolute error (MAE) of 3.89° and a root-mean-squared error (RMSE) of 4.77°, with a small bias of 0.70°. Against Radiologist 2, the MAE was 3.90° and the RMSE 5.68°, with a bias of 2.14°. For comparison, the two radiologists differed from each other by an MAE of 3.30° and an RMSE of 4.25°. The AI’s deviation from either expert was therefore only slightly larger than the disagreement between the experts themselves.
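
The article reports bias, MAE and RMSE but gives no Bland–Altman limits of agreement. The mean of the squared differences equals the squared mean difference plus the variance of the differences, so RMSE² = bias² + σ², where σ is the population standard deviation of the differences. This means an approximate spread can be backed out from the reported figures. The sketch below does that arithmetic under two assumptions: that bias and RMSE were computed over the same 103 signed differences, and that the usual bias ± 1.96 SD limits apply. Its output is a back-of-the-envelope estimate for illustration, not a value reported by the authors. The inter-radiologist pair is omitted because its bias is not given, and the function name is hypothetical.

```python
import math
from typing import Tuple


def approx_limits_of_agreement(bias: float, rmse: float, n: int) -> Tuple[float, float]:
    """Estimate 95% Bland–Altman limits from a reported bias and RMSE.

    Uses RMSE^2 = bias^2 + sigma^2, where sigma is the population (1/n) SD of
    the differences, then rescales sigma to the sample (1/(n-1)) SD.
    Assumes bias and RMSE were computed on the same n signed differences.
    """
    if rmse < abs(bias):
        raise ValueError("RMSE cannot be smaller than |bias| for the same differences")
    sigma_pop = math.sqrt(rmse ** 2 - bias ** 2)
    sd = sigma_pop * math.sqrt(n / (n - 1))
    return bias - 1.96 * sd, bias + 1.96 * sd


# Figures reported in the article (degrees); n = 103 radiographs.
for label, bias, rmse in [("AI vs Radiologist 1", 0.70, 4.77),
                          ("AI vs Radiologist 2", 2.14, 5.68)]:
    low, high = approx_limits_of_agreement(bias, rmse, 103)
    print(f"{label}: approx. {low:.1f} to {high:.1f} degrees")
```

Under these assumptions, the disagreement with Radiologist 2 combines a larger systematic offset with a somewhat wider spread than the disagreement with Radiologist 1. This is consistent with its higher RMSE.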

Pearson correlation coefficients, which measure the linear association between two sets of measurements, were high across all comparisons. The AI showed correlations of 0.906 with Radiologist 1 and 0.880 with Radiologist 2, close to the inter-reader correlation of 0.928. For the four-grade severity classification, Cohen’s kappa values indicated moderate to substantial agreement: 0.51 for AI vs. Radiologist 1, 0.64 for AI vs. Radiologist 2, and 0.59 for the inter-radiologist comparison. In other words, the AI agreed with each radiologist on severity grades roughly as well as the radiologists agreed with each other, and most discrepancies fell into adjacent categories.
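
The article’s “moderate to substantial” wording matches the widely used Landis and Koch (1977) bands for kappa. However, the article does not name the scale it used, so treat that mapping as an assumption. The sketch below applies those bands to the three reported kappa values. It also adds a hypothetical helper that counts how many grades apart two severity labels are, which is one way to check the claim that most disagreements fall into adjacent categories.

```python
from collections import Counter
from typing import Sequence

GRADES = ["no scoliosis", "mild", "moderate", "severe"]  # ordered as in the article


def landis_koch(kappa: float) -> str:
    """Verbal label for a kappa value using the Landis and Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


def disagreement_distances(a: Sequence[str], b: Sequence[str]) -> Counter:
    """Count disagreements by how many grades apart they are (1 = adjacent grades)."""
    return Counter(abs(GRADES.index(x) - GRADES.index(y))
                   for x, y in zip(a, b) if x != y)


# Kappa values reported in the article.
for pair, k in [("AI vs R1", 0.51), ("AI vs R2", 0.64), ("R1 vs R2", 0.59)]:
    print(pair, k, landis_koch(k))  # moderate, substantial, moderate respectively

# Hypothetical grades: one adjacent disagreement and one two-grade disagreement.
print(disagreement_distances(["mild", "moderate", "severe", "no scoliosis"],
                             ["moderate", "moderate", "mild", "no scoliosis"]))
```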

Implications and Future Directions

The study’s findings suggest that the Carebot AI Bones software can reproduce expert-level Cobb angle measurements and categorical grading across multiple centres. This is a significant step forward, as many previous AI studies were limited to single institutions or narrowly defined patient populations. The multi-centre design and the inclusion of diverse cases in this study strengthen the case that these results will generalize to real-world clinical practice.

While the study highlights the AI’s potential to improve consistency and support efficient triage in scoliosis assessment, the authors acknowledge certain limitations. The cohort was predominantly pediatric and adolescent, which may limit generalizability to older adult populations. Additionally, the study did not assess the direct impact of AI integration on actual reporting time or downstream clinical decisions. Further prospective multi-centre validation will be needed to establish its full impact on clinical workflows and patient care.
