TLDR: This research paper introduces a robust single-stage mitotic figure detection method using the YOLOv12 object detection architecture. Developed for the MIDOG 2025 challenge, the approach achieved an F1-score of 0.801 on the preliminary test set without external data. It leverages diverse human and canine tumor datasets, advanced pre-processing including stain normalization, and sophisticated post-processing techniques like Test-Time Augmentation and Weighted Boxes Fusion. The method demonstrates strong, balanced performance and efficient inference, making it suitable for clinical deployment, while also outlining areas for future improvement in generalization.
Detecting mitotic figures (MFs) in tumor pathology is a critical task for assessing tumor aggressiveness and proliferation. However, this process has traditionally been challenging, often relying on manual visual counting by pathologists, which can lead to inconsistencies. The rise of digital pathology and artificial intelligence has opened new avenues for automated detection, aiming to improve accuracy and reproducibility.
The MItosis DOmain Generalization (MIDOG) challenges have been instrumental in establishing a rigorous benchmark for evaluating models under realistic conditions. The 2025 edition of the MIDOG challenge is particularly significant, featuring the largest mitosis-annotated dataset to date and introducing two main tasks: detecting mitotic figures in arbitrary tumor tissue and classifying them as atypical or normal.
Researchers Raphaël Bourgade, Guillaume Balezo, and Thomas Walter have presented a robust approach for mitotic figure detection based on the YOLOv12 object detection architecture. Their method achieved an F1-score of 0.801 on the preliminary test set of the MIDOG 2025 challenge, without using any external data.
The Approach: YOLOv12 for Precision
The core of their solution is a one-stage YOLOv12-m model, designed to combine object detection and classification for two classes: true mitosis and hard negatives. This model was trained using a comprehensive dataset provided by the challenge organizers.
Diverse Datasets for Robustness
The study used three datasets. The first is the MIDOG++ dataset, which includes 503 manually selected regions of interest (ROIs) covering seven tumor types from both human and canine samples. This dataset is crucial for developing algorithms that can generalize across different species, tissue origins, and scanner variations. The other two are canine-specific: the Canine Mammary Carcinoma (CMC) dataset with 21 whole-slide images (WSIs) and the Canine Cutaneous Mast Cell Tumor (CCMCT) dataset with 32 WSIs. Both are fully annotated and contain a high frequency of mitotic figures.
Smart Pre-processing and Training
Before training, the canine datasets underwent tissue segmentation and tiling into ROIs similar in size to MIDOG++. Further tiling into smaller 640×640 pixel patches with overlap helped prevent mitotic figures from being cut off at tile borders. A significant addition was the inclusion of 80,000 background tiles (without mitotic figures) to enhance the model’s ability to distinguish between mitotic and non-mitotic regions.
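As a rough illustration of overlapping tiling, the sketch below computes the top-left corners of 640×640 tiles over an ROI. The 64-pixel overlap and the function names are assumptions made for this example; the article does not state the overlap the authors used.

```python
def tile_origins(length, tile=640, overlap=64):
    """Start offsets of tiles covering [0, length) along one axis.

    The overlap value is an illustrative assumption, not the authors' setting.
    """
    if length <= tile:
        return [0]
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # final tile sits flush with the border
    return origins


def tile_grid(width, height, tile=640, overlap=64):
    """Top-left (x, y) corners of all tiles covering a width x height ROI."""
    return [
        (x, y)
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


if __name__ == "__main__":
    print(tile_grid(1000, 640))
```

Because neighbouring tiles share a band of pixels, a mitotic figure cut by one tile's border can still appear whole in the adjacent tile.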
During training, a custom batch sampler was employed to ensure an equal representation of human and canine-derived images, mitigating species imbalance. Various data augmentations were applied, including horizontal and vertical flipping, small random rotations, and mosaic augmentation. Crucially, stain normalization using a Multi-target Macenko method was used to simulate natural H&E staining variations, improving the model’s generalization and robustness across different domains. Geometric scale-altering transforms were avoided to preserve the perceived size of mitoses.
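One simple way to picture a species-balanced batch sampler is shown below. This is a minimal sketch, not the authors' implementation: the function name, the sampling with replacement, and the even 50/50 split are all assumptions for illustration.

```python
import random


def balanced_batches(human_idx, canine_idx, batch_size, num_batches, seed=0):
    """Yield batches of dataset indices, half human-derived and half canine-derived.

    Sampling is done with replacement (an assumption), so the smaller
    group is simply revisited more often than the larger one.
    """
    rng = random.Random(seed)
    half = batch_size // 2
    for _ in range(num_batches):
        batch = rng.choices(human_idx, k=half)
        batch += rng.choices(canine_idx, k=batch_size - half)
        rng.shuffle(batch)  # avoid a fixed human-then-canine ordering
        yield batch


if __name__ == "__main__":
    human = list(range(10))          # hypothetical indices of human tiles
    canine = list(range(100, 1000))  # hypothetical indices of canine tiles
    for batch in balanced_batches(human, canine, batch_size=8, num_batches=2):
        print(batch)
```

In a PyTorch training loop, a generator like this could be wrapped as a batch sampler, so each optimization step sees both species regardless of how many tiles each contributes.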
Inference and Post-processing for Accuracy
At inference time, ROIs were again tiled into 640×640 patches with overlap and fed into the YOLOv12 detector. Several post-processing steps were then applied. Test-Time Augmentation (TTA) was used to evaluate each tile under different geometric transformations, with the predictions merged afterward to increase robustness and recall. To consolidate overlapping or redundant detections, a Weighted Boxes Fusion (WBF) strategy was implemented. WBF aggregates bounding boxes by computing a confidence-weighted average of their coordinates, which improves spatial precision and reduces duplicates.
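The confidence-weighted averaging at the heart of WBF can be sketched in a few lines. The version below is a simplified, single-class illustration and not the authors' code: the IoU threshold of 0.5, the use of the mean confidence as the fused score, and all function names are assumptions. The published WBF algorithm additionally rescales scores by the number of contributing models.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def unflip_horizontal(box, tile=640):
    """Map a box predicted on a horizontally flipped tile back to the original tile."""
    x1, y1, x2, y2 = box
    return (tile - x2, y1, tile - x1, y2)


def fuse(cluster):
    """Confidence-weighted mean of the box coordinates; mean confidence as score."""
    total = sum(score for _, score in cluster)
    box = tuple(sum(b[i] * s for b, s in cluster) / total for i in range(4))
    return box, total / len(cluster)


def weighted_boxes_fusion(boxes, scores, iou_thr=0.5):
    """Simplified single-class WBF over boxes already in ROI coordinates."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    clusters, fused = [], []
    for i in order:
        # Find the fused box that overlaps the current box the most.
        best_k, best_iou = -1, iou_thr
        for k, (fbox, _) in enumerate(fused):
            overlap = iou(boxes[i], fbox)
            if overlap >= best_iou:
                best_k, best_iou = k, overlap
        if best_k >= 0:
            clusters[best_k].append((boxes[i], scores[i]))
            fused[best_k] = fuse(clusters[best_k])
        else:
            clusters.append([(boxes[i], scores[i])])
            fused.append(fuse(clusters[-1]))
    return fused


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (2, 0, 12, 10), (100, 100, 110, 110)]
    scores = [0.75, 0.25, 0.5]
    for box, score in weighted_boxes_fusion(boxes, scores):
        print(box, score)
```

Unlike non-maximum suppression, which keeps only the highest-scoring box of a group, this averaging lets every overlapping prediction (for example, from different TTA views or neighbouring tiles) contribute to the final location.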
Promising Results and Future Directions
The proposed method achieved an F1-score of 0.801 on the MIDOG 2025 preliminary test set, with a precision of 0.808 and a recall of 0.794. The near-equal precision and recall indicate that the model does not buy fewer false positives at the cost of many missed mitoses, or the reverse. With an inference time of under 7 seconds per ROI on an NVIDIA A40 GPU, the algorithm is also fast enough to be practical for clinical deployment.
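The reported F1-score follows directly from the reported precision and recall, since F1 is their harmonic mean. A quick check (the function name is chosen for this example):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall as reported on the preliminary test set.
    print(round(f1_score(0.808, 0.794), 3))  # prints 0.801
```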
The researchers highlight that a single-stage detector like YOLOv12, when combined with stain normalization and a tile-to-ROI level fusion approach, can deliver strong performance even on images from previously unseen domains. A key advantage of this method is its simplicity, as it does not rely on external data or model ensembling, which contributes to faster inference speeds and easier integration into real-world applications.
While effective, the approach does have limitations, particularly its dependence on fixed heuristic thresholds for confidence and post-processing, which might be suboptimal under significant domain or scanner shifts. An experimental two-stage pipeline, which included a classifier for refinement, did not outperform the single-stage YOLOv12 detector in their tests, suggesting that the initial candidate proposals were already of high quality.
Future research will focus on incorporating well-curated external datasets to introduce broader variability in staining protocols, tissue types, and scanner characteristics, further enhancing the model’s robustness to domain shifts. Additionally, training classifiers on larger, more heterogeneous datasets, especially with hard negative candidates from the YOLOv12 detector, could improve discriminative power and reduce reliance on fixed confidence thresholds.


