TLDR: A new AI method combines YOLO for efficient object detection and SAM for precise image segmentation to accurately identify anatomical landmarks and complex structures on orthopaedic X-rays. This hybrid approach overcomes limitations of single models, requires fewer resources for training, and achieves high precision, making it a scalable solution for medical diagnostics.
Medical imaging, particularly X-rays in orthopaedics, is fundamental for diagnosing bone and joint conditions. A crucial step in analyzing these images is identifying specific anatomical points, known as landmarks. These landmarks help doctors measure angles and ratios, which are vital for accurate diagnoses and treatment planning. Traditionally, this process can be time-consuming and limited by available commercial software, creating a need for more flexible and scalable automated solutions.
Recent advancements in artificial intelligence, especially with “foundation models” in computer vision, offer new possibilities. These models are trained on vast amounts of image data, so they can be adapted to specific tasks with much smaller, task-specific datasets. One such model is the Segment Anything Model (SAM), which excels at segmenting (outlining) objects in images. However, neither SAM nor its medical variant MedSAM is inherently designed for the very precise, fine-grained landmark detection needed for orthopaedic images like pelvic X-rays. Both rely on “prompts” (such as points or bounding boxes) to guide their segmentation, and for medical landmarks these prompts have to be highly specific.
To overcome this challenge, researchers proposed a novel approach combining two general-purpose models: YOLO (You Only Look Once) and SAM. YOLO is renowned for its efficiency in object detection, meaning it can quickly identify and draw bounding boxes around objects. These bounding boxes can then serve as precise input prompts for SAM. While YOLO is fast at detection, SAM is superior at segmenting complex shapes with high accuracy. By combining them, the strengths of both models are leveraged.
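To make the "bounding box as prompt" idea concrete, here is a minimal sketch of the hand-off between the two models. It assumes a detection is given in the normalized centre format used by YOLO label files (x centre, y centre, width, height, all in the 0 to 1 range) and converts it to the pixel corner format that SAM box prompts use. The function name `yolo_to_sam_box` and the `pad` parameter are hypothetical and are not taken from the paper. Ultralytics result objects also expose pixel corner boxes directly (`boxes.xyxy`), so this conversion is only needed when starting from the normalized form.

```python
from typing import List


def yolo_to_sam_box(
    xc: float, yc: float, w: float, h: float,
    img_w: int, img_h: int, pad: float = 0.0,
) -> List[float]:
    """Convert a normalized YOLO box to a pixel [x_min, y_min, x_max, y_max] box.

    `pad` (hypothetical, in pixels) widens the box on every side, which can
    help when the detector crops a structure too tightly. The result is
    clamped to the image bounds.
    """
    x_min = (xc - w / 2.0) * img_w - pad
    y_min = (yc - h / 2.0) * img_h - pad
    x_max = (xc + w / 2.0) * img_w + pad
    y_max = (yc + h / 2.0) * img_h + pad
    return [
        max(0.0, x_min),
        max(0.0, y_min),
        min(float(img_w), x_max),
        min(float(img_h), y_max),
    ]


if __name__ == "__main__":
    # A detection centred in a 1000 x 400 pixel image.
    print(yolo_to_sam_box(0.5, 0.5, 0.25, 0.5, img_w=1000, img_h=400))
```

Each resulting box would then be passed to SAM as one prompt, so a radiograph with many detected landmarks and regions yields one segmentation request per box.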
The study used a dataset of 100 anonymized frontal pelvic radiographs, annotated with 72 individual landmarks and 18 additional regions and outlines. The goal was to develop a pipeline that could accurately detect these features, even when they overlapped, and be scalable and easy to use in a hospital setting without extensive AI expertise. Importantly, the chosen models needed to be fine-tunable without requiring prohibitively expensive, high-end computing resources.
The researchers opted for Ultralytics YOLOv11 for landmark localization due to its efficiency, speed, accuracy, and ability to be trained on modest hardware such as a laptop with an NVIDIA RTX 3050 GPU. For segmentation, they used the Hugging Face implementation of SAM loaded with MedSAM weights, so the model already recognized medical image features without extensive re-training of its core components. As a result, only SAM’s decoder needed fine-tuning, further reducing computational demands.
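Decoder-only fine-tuning usually comes down to switching off gradients for everything except the decoder. The sketch below shows one way this could be done. It is not the authors' code. It assumes a PyTorch-style model whose `named_parameters()` yields names prefixed by submodule, and that the decoder lives under the prefix `mask_decoder`, as in the Hugging Face SAM implementation. The helper name `freeze_all_but` is hypothetical.

```python
def freeze_all_but(model, trainable_prefix: str = "mask_decoder") -> int:
    """Leave gradients enabled only for parameters under `trainable_prefix`.

    `model` is any object with a PyTorch-style `named_parameters()` method
    whose parameters support `requires_grad_(bool)`. Returns the number of
    parameter tensors that remain trainable.
    """
    trainable = 0
    for name, param in model.named_parameters():
        keep = name.startswith(trainable_prefix)
        # Frozen parameters receive no gradients and are skipped by the optimizer.
        param.requires_grad_(keep)
        trainable += int(keep)
    return trainable
```

After calling it, the optimizer would be built only from parameters that still have `requires_grad` set, which keeps memory use and training time low enough for laptop-class hardware.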
The experiment was conducted in two stages. Initially, a smaller set of eight landmarks was used to match previous studies and reduce training time. In this phase, YOLOv11-n (a smaller version of YOLOv11) showed promising results for detecting landmark locations, with median errors within the acceptable 3 mm range for medical image analysis. However, YOLO’s segmentation capabilities alone were not accurate enough for the task.
In the second stage, the problem was scaled up to the full set of 72 landmarks and 18 patches/outlines. Here, YOLO was used to detect the locations and provide bounding boxes, which were then fed into SAM for precise segmentation. This hybrid approach proved highly effective. For the 15 unseen test cases, the median error for identified landmarks was 1.66 mm, and the mean error was 2.30 mm, both well within the 3 mm acceptable range. For patches and outlines, the median Intersection over Union (IoU) was 0.74, and the mean IoU was 0.77, indicating good segmentation accuracy.
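The two metrics reported above can be computed as in the following sketch. It assumes landmarks are (x, y) pixel coordinates, that a single `mm_per_pixel` spacing is known for the radiograph, and that masks are equally sized grids of 0/1 values. These input conventions and the function names are assumptions for illustration, not details from the paper.

```python
import math
from statistics import mean, median
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def landmark_errors_mm(
    pred: Sequence[Point], truth: Sequence[Point], mm_per_pixel: float
) -> List[float]:
    """Euclidean distance between predicted and true landmarks, in millimetres."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    return [
        math.hypot(px - tx, py - ty) * mm_per_pixel
        for (px, py), (tx, ty) in zip(pred, truth)
    ]


def mask_iou(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> float:
    """Intersection over Union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            inter += 1 if (va and vb) else 0
            union += 1 if (va or vb) else 0
    # Convention chosen here (an assumption): two empty masks match perfectly.
    return inter / union if union else 1.0


if __name__ == "__main__":
    errors = landmark_errors_mm([(3, 4), (0, 0)], [(0, 0), (0, 0)], mm_per_pixel=0.5)
    print("median error (mm):", median(errors))
    print("mean error (mm):", mean(errors))
    print("IoU:", mask_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]]))
```

Reporting both the median and the mean is useful because a few large misses pull the mean up while leaving the median largely unchanged, which is consistent with the mean error above being higher than the median.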
While some closely positioned landmarks were occasionally missed by YOLO in the expanded dataset (7% of landmarks and 11% of outlines/patches were not identified), the overall precision of the identified features remained high. The authors suggest that increasing the training dataset size could further enhance the pipeline’s performance, as the current 100 radiographs might be insufficient for YOLO to differentiate all closely located points reliably. The developed pipeline also allows for iterative improvement, where medical practitioners can review and correct AI-generated labels, which can then be used to continuously fine-tune the models.
In conclusion, this study demonstrates that combining YOLO for efficient detection and SAM for accurate segmentation creates a robust and scalable pipeline for anatomical landmark detection and segmentation in orthopaedic pelvic radiographs. This approach reduces the need for large, custom-trained datasets and expensive computing resources, making advanced AI tools more accessible for medical diagnostics. For more technical details, refer to the full research paper.


