TLDR: A multi-centre study validated the Carebot AI Bones deep learning model for automated Cobb angle measurement in scoliosis. Comparing its performance against two expert radiologists using 103 X-rays from ten hospitals, the AI demonstrated accuracy and agreement comparable to human experts for both continuous angle measurements and four-grade severity classification. This suggests the AI can effectively streamline scoliosis assessment and triage in clinical settings.
Scoliosis, a condition characterized by a lateral curvature of the spine, affects a significant portion of the population, particularly adolescents. Accurate assessment of scoliosis relies heavily on measuring the Cobb angle from X-ray images. This measurement is crucial for diagnosis and determining the appropriate treatment pathway, which can range from observation to bracing or even surgery.
Traditionally, Cobb angle measurement is performed manually by radiologists. However, this process is not only time-consuming but also prone to variations between different observers, leading to potential inconsistencies in diagnosis and treatment decisions. Recognizing these challenges, researchers have been exploring the potential of deep learning approaches to automate this critical assessment.
A recent study, "Multi-Centre Validation of a Deep Learning Model for Scoliosis Assessment," evaluated a fully automated deep-learning software called Carebot AI Bones, specifically its Spine Measurement functionality. This software, developed by Carebot s.r.o., aims to streamline scoliosis reporting and triage in clinical workflows by providing automated Cobb angle measurements.
How the AI Software Works
The Carebot AI Bones software employs a two-stage deep-learning approach. First, it uses a YOLOv11 landmark detector, trained on a large dataset of expertly annotated X-ray images, to accurately locate the superior and inferior corners of vertebrae from C7 to L5. Following this, a geometry-based algorithm computes the Cobb angles from these detected landmarks. The software then classifies the scoliosis severity into four grades: no scoliosis (less than 10°), mild (10–24°), moderate (25–39°), or severe (40° or more). Designed for seamless integration, it connects directly into clinical Picture Archiving and Communication Systems (PACS) to automate image retrieval and result insertion.
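The second stage and the grading step can be illustrated with a minimal sketch. This is not Carebot's implementation: the function names, the endplate representation (a left and right corner per endplate), and the rule of taking the largest angle between any upper superior endplate and any lower inferior endplate are assumptions made for illustration. The article states the grade bands in whole degrees, so the handling of fractional values such as 24.5° is also an assumption.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Endplate = Tuple[Point, Point]  # (left corner, right corner), image coordinates


def endplate_tilt_deg(endplate: Endplate) -> float:
    """Tilt of an endplate line relative to horizontal, in degrees.

    Assumes the corners are ordered left to right, so the tilt stays
    within (-90, 90) and differences of tilts are meaningful.
    """
    (x1, y1), (x2, y2) = endplate
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def max_cobb_angle(superior: List[Endplate], inferior: List[Endplate]) -> float:
    """Largest angle between the superior endplate of an upper vertebra and
    the inferior endplate of the same or a lower vertebra.

    Both lists are ordered from the top of the spine to the bottom and hold
    one endplate per vertebra (hypothetical input layout).
    """
    sup_tilts = [endplate_tilt_deg(e) for e in superior]
    inf_tilts = [endplate_tilt_deg(e) for e in inferior]
    best = 0.0
    for i, upper_tilt in enumerate(sup_tilts):
        for lower_tilt in inf_tilts[i:]:
            best = max(best, abs(upper_tilt - lower_tilt))
    return best


def severity_grade(cobb_deg: float) -> str:
    """Map a Cobb angle to the four grades described in the article."""
    if cobb_deg < 10:
        return "no scoliosis"
    if cobb_deg < 25:  # assumption: 24.x degrees still counts as mild
        return "mild"
    if cobb_deg < 40:  # assumption: 39.x degrees still counts as moderate
        return "moderate"
    return "severe"
```

Only differences between tilts matter here, so the direction of the image y-axis does not affect the result.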
The Multi-Centre Validation Study
To rigorously test the software’s performance, a retrospective, multi-centre study was conducted. Researchers collected 103 standing anteroposterior whole-spine radiographs from ten different hospitals. This diverse dataset was crucial for assessing the model’s generalizability across various clinical settings and equipment.
Two experienced musculoskeletal radiologists independently measured the maximal Cobb angle on each X-ray, serving as the reference standard. In parallel, the AI software analyzed the same images without any manual intervention. The measurements from the AI were then compared against those of both radiologists using several statistical methods, including Bland–Altman analysis, mean absolute error (MAE), root-mean-squared error (RMSE), Pearson correlation coefficient, and Cohen’s kappa for severity classification.
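For readers unfamiliar with these statistics, the continuous-agreement measures can be sketched in a few lines of plain Python. The function name and the sign convention (first reader minus second reader) are assumptions for illustration, and the 1.96 multiplier is the conventional choice for Bland–Altman 95% limits of agreement; the study's exact computation may differ.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence


def agreement_metrics(a: Sequence[float], b: Sequence[float]) -> Dict[str, float]:
    """Pairwise agreement between two sets of Cobb angle measurements.

    `a` and `b` hold measurements of the same radiographs in the same order,
    for example AI output and one radiologist's readings.
    """
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)      # Bland-Altman mean difference
    sd = stdev(diffs)       # sample standard deviation of the differences

    # Pearson correlation computed from its definition.
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)

    return {
        "mae": mean([abs(d) for d in diffs]),
        "rmse": math.sqrt(mean([d * d for d in diffs])),
        "bias": bias,
        "loa_lower": bias - 1.96 * sd,  # 95% limits of agreement
        "loa_upper": bias + 1.96 * sd,
        "pearson_r": cov / math.sqrt(ss_a * ss_b),
    }
```

MAE and RMSE summarize the size of the disagreement, the bias shows whether one reader measures systematically higher than the other, and the limits of agreement bound the range in which most individual differences are expected to fall.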
Key Findings and Performance
The study yielded promising results: the AI's error against each radiologist was close to, though slightly larger than, the disagreement between the two radiologists themselves. Against Radiologist 1, the AI had a mean absolute error (MAE) of 3.89° and a root-mean-squared error (RMSE) of 4.77°, with a small bias of 0.70°. Against Radiologist 2, the MAE was 3.90° and the RMSE was 5.68°, with a bias of 2.14°. For comparison, the inter-radiologist MAE was 3.30° and the RMSE was 4.25°, so the AI’s deviation from either expert was of the same order as the variability between the two human readers.
Pearson correlation coefficients, which measure the linear association between measurements, were high across all comparisons. The AI showed correlations of 0.906 with Radiologist 1 and 0.880 with Radiologist 2, close to the inter-reader correlation of 0.928. For the four-grade severity classification, Cohen’s kappa values indicated moderate to substantial agreement: 0.51 for AI vs. Radiologist 1, 0.64 for AI vs. Radiologist 2, and 0.59 for the inter-radiologist comparison. In other words, the AI agreed with each expert on severity grade about as well as the experts agreed with each other, and most discrepancies fell into adjacent categories.
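Cohen’s kappa corrects the raw agreement rate for the agreement expected by chance. The sketch below shows the unweighted form together with a simple check of how many disagreements fall into adjacent grades. The grade labels, function names, and the choice of unweighted kappa are assumptions for illustration; the article does not say whether the study used a weighted variant.

```python
from collections import Counter
from typing import Sequence

# Hypothetical grade labels, ordered from least to most severe.
GRADES = ["none", "mild", "moderate", "severe"]


def cohens_kappa(r1: Sequence[str], r2: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two raters grading the same cases."""
    n = len(r1)
    observed = sum(x == y for x, y in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of the raters' marginal frequencies per grade.
    expected = sum(c1[g] * c2[g] for g in c1) / (n * n)
    if expected == 1.0:  # both raters used a single identical grade throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


def adjacent_disagreement_share(r1: Sequence[str], r2: Sequence[str]) -> float:
    """Fraction of disagreements that differ by exactly one grade."""
    gaps = [
        abs(GRADES.index(x) - GRADES.index(y))
        for x, y in zip(r1, r2)
        if x != y
    ]
    if not gaps:
        return 0.0
    return sum(g == 1 for g in gaps) / len(gaps)
```

A kappa of 0 means agreement no better than chance and 1 means perfect agreement; on the commonly used Landis and Koch scale, 0.41 to 0.60 is read as moderate and 0.61 to 0.80 as substantial.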
Implications and Future Directions
The study’s findings suggest that the Carebot AI Bones software can reproduce expert-level Cobb angle measurements and categorical grading across multiple centres. This is a significant step forward, as many previous AI studies were limited to single institutions or narrowly defined patient populations. The multi-centre design and inclusion of diverse cases in this study enhance the generalizability of these results to real-world clinical practice.
While the study highlights the AI’s potential to enhance consistency and support efficient triage in scoliosis assessment, the authors acknowledge certain limitations. The cohort was predominantly pediatric and adolescent, which might limit generalizability to older adult populations. Additionally, the study did not assess the direct impact of AI integration on actual reporting time or downstream clinical decisions. Further prospective multi-centre validation will be needed to establish its full impact on clinical workflows and patient care.


