Advancing Cancer Diagnosis: YOLOv12 for Automated Mitotic Figure Detection

TLDR: This research paper introduces a robust single-stage mitotic figure detection method using the YOLOv12 object detection architecture. Developed for the MIDOG 2025 challenge, the approach achieved an F1-score of 0.801 on the preliminary test set without external data. It leverages diverse human and canine tumor datasets, advanced pre-processing including stain normalization, and sophisticated post-processing techniques like Test-Time Augmentation and Weighted Boxes Fusion. The method demonstrates strong, balanced performance and efficient inference, making it suitable for clinical deployment, while also outlining areas for future improvement in generalization.

Detecting mitotic figures (MFs) in tumor pathology is a critical task for assessing tumor aggressiveness and proliferation. However, this process has traditionally been challenging, often relying on manual visual counting by pathologists, which can lead to inconsistencies. The rise of digital pathology and artificial intelligence has opened new avenues for automated detection, aiming to improve accuracy and reproducibility.

The MItosis DOmain Generalization (MIDOG) challenges have been instrumental in establishing a rigorous benchmark for evaluating models under realistic conditions. The 2025 edition of the MIDOG challenge is particularly significant, featuring the largest mitosis-annotated dataset to date and introducing two main tasks: detecting mitotic figures in arbitrary tumor tissue and classifying them as atypical or normal.

Researchers Raphaël Bourgade, Guillaume Balezo, and Thomas Walter have presented a robust approach for mitotic figure detection based on the YOLOv12 object detection architecture. Their method achieved an F1-score of 0.801 on the preliminary test set of the MIDOG 2025 challenge, without using any external data.

The Approach: YOLOv12 for Precision

The core of their solution is a one-stage YOLOv12-m model, designed to combine object detection and classification for two classes: true mitosis and hard negatives. This model was trained using a comprehensive dataset provided by the challenge organizers.

Diverse Datasets for Robustness

The study used three datasets. The first is MIDOG++, which includes 503 manually selected regions of interest (ROIs) covering seven tumor types from both human and canine samples. This dataset is crucial for developing algorithms that can generalize across different species, tissue origins, and scanner variations. The other two are canine-specific: the Canine Mammary Carcinoma (CMC) dataset with 21 whole-slide images (WSIs) and the Canine Cutaneous Mast Cell Tumor (CCMCT) dataset with 32 WSIs. Both are fully annotated and contain a high frequency of mitotic figures.

Smart Pre-processing and Training

Before training, the canine datasets underwent tissue segmentation and tiling into ROIs similar in size to MIDOG++. Further tiling into smaller 640×640 pixel patches with overlap helped prevent mitotic figures from being cut off at tile borders. A significant addition was the inclusion of 80,000 background tiles (without mitotic figures) to enhance the model’s ability to distinguish between mitotic and non-mitotic regions.
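The overlapping tiling described above can be sketched as follows. This is a minimal illustration, not the authors' code: the article gives the 640×640 tile size but not the overlap, so the 128-pixel overlap and the function names `tile_origins` and `tile_grid` are assumptions.

```python
def tile_origins(length: int, tile: int = 640, overlap: int = 128) -> list[int]:
    """Start offsets of tiles covering [0, length) along one axis.

    The overlap value is an assumption; the article does not state it.
    """
    if length <= tile:
        return [0]  # a single tile (padded downstream if the ROI is smaller)
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # last tile sits flush with the border
    return origins


def tile_grid(width: int, height: int, tile: int = 640, overlap: int = 128):
    """Top-left (x, y) corners of all tiles covering a width x height ROI."""
    return [
        (x, y)
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


corners = tile_grid(1000, 1000)
```

Because neighbouring tiles share a band of pixels, a mitotic figure cut by one tile border appears whole in the adjacent tile, provided the overlap is larger than the figure.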

During training, a custom batch sampler was employed to ensure an equal representation of human and canine-derived images, mitigating species imbalance. Various data augmentations were applied, including horizontal and vertical flipping, small random rotations, and mosaic augmentation. Crucially, stain normalization using a Multi-target Macenko method was used to simulate natural H&E staining variations, improving the model’s generalization and robustness across different domains. Geometric scale-altering transforms were avoided to preserve the perceived size of mitoses.
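One way to picture the species-balanced sampling is the sketch below. It is only an assumption about how such a sampler could work: the function name `balanced_batches`, the 50/50 split per batch, and sampling with replacement are all illustrative choices, not details given in the article.

```python
import random
from typing import Iterator, Sequence


def balanced_batches(
    human_idx: Sequence[int],
    canine_idx: Sequence[int],
    batch_size: int,
    num_batches: int,
    seed: int = 0,
) -> Iterator[list[int]]:
    """Yield batches of dataset indices, half human-derived, half canine-derived."""
    if batch_size % 2:
        raise ValueError("batch_size must be even for a 50/50 split")
    rng = random.Random(seed)
    half = batch_size // 2
    for _ in range(num_batches):
        # Sampling with replacement lets the smaller group keep pace
        # with the larger one (an assumption, not from the article).
        batch = rng.choices(human_idx, k=half) + rng.choices(canine_idx, k=half)
        rng.shuffle(batch)
        yield batch


batches = list(balanced_batches(range(10), range(100, 105), batch_size=8, num_batches=5))
```

In PyTorch, a list of index lists like `batches` has the shape that the `batch_sampler` argument of `DataLoader` expects.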

Inference and Post-processing for Accuracy

At inference time, ROIs were again tiled into 640×640 patches with overlap and fed into the YOLOv12 detector. Several post-processing steps were then applied. Test-Time Augmentation (TTA) was used to evaluate each tile under different geometric transformations, merging predictions afterward to increase robustness and recall. To consolidate overlapping or redundant detections, a Weighted Boxes Fusion (WBF) strategy was implemented, which aggregates bounding boxes by computing a confidence-weighted average of their coordinates, thereby improving spatial precision and reducing duplicates.
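The confidence-weighted averaging at the heart of WBF can be illustrated with a simplified single-class sketch. It is not the authors' implementation and omits parts of the full WBF algorithm, such as per-model weights and score rescaling. The IoU threshold of 0.5, the mean-confidence fused score, and all function names are assumptions, and confidences are assumed to be strictly positive.

```python
Box = tuple[float, float, float, float, float]  # x1, y1, x2, y2, confidence


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(cluster: list[Box]) -> Box:
    """Confidence-weighted average of coordinates; score is the mean confidence."""
    total = sum(b[4] for b in cluster)
    coords = [sum(b[i] * b[4] for b in cluster) / total for i in range(4)]
    return (coords[0], coords[1], coords[2], coords[3], total / len(cluster))


def weighted_boxes_fusion(boxes: list[Box], iou_thr: float = 0.5) -> list[Box]:
    """Greedily cluster boxes by IoU with the current fused boxes, then fuse."""
    clusters: list[list[Box]] = []
    fused: list[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        # Find the fused box that overlaps this detection the most.
        best = max(range(len(fused)), key=lambda k: iou(box, fused[k]), default=None)
        if best is not None and iou(box, fused[best]) > iou_thr:
            clusters[best].append(box)
            fused[best] = fuse(clusters[best])
        else:
            clusters.append([box])
            fused.append(box)
    return fused


detections = [
    (0.0, 0.0, 10.0, 10.0, 0.9),
    (1.0, 1.0, 11.0, 11.0, 0.3),        # overlaps the first box
    (100.0, 100.0, 110.0, 110.0, 0.8),  # a separate detection
]
merged = weighted_boxes_fusion(detections)
```

Unlike non-maximum suppression, which discards the lower-scoring duplicates, this approach lets every overlapping prediction contribute to the final box position in proportion to its confidence.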

Promising Results and Future Directions

The proposed methodology achieved an F1-score of 0.801 on the MIDOG 2025 preliminary test set, with a precision of 0.808 and a recall of 0.794. The closeness of these two values indicates that the YOLOv12 model keeps false positives and false negatives at similar levels rather than trading one for the other. Furthermore, with an inference time of under 7 seconds per ROI on an NVIDIA A40 GPU, the algorithm is fast enough to be practical for clinical deployment.
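As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so the reported figures can be recomputed directly. The helper name `f1_score` is only illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


reported_f1 = round(f1_score(0.808, 0.794), 3)
print(reported_f1)  # 0.801
```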

The researchers highlight that a single-stage detector like YOLOv12, when combined with stain normalization and a tile-to-ROI level fusion approach, can deliver strong performance even on images from previously unseen domains. A key advantage of this method is its simplicity, as it does not rely on external data or model ensembling, which contributes to faster inference speeds and easier integration into real-world applications.

While effective, the approach does have limitations, particularly its dependence on fixed heuristic thresholds for confidence and post-processing, which might be suboptimal under significant domain or scanner shifts. An experimental two-stage pipeline, which included a classifier for refinement, did not outperform the single-stage YOLOv12 detector in their tests, suggesting that the initial candidate proposals were already of high quality.

Future research will focus on incorporating well-curated external datasets to introduce broader variability in staining protocols, tissue types, and scanner characteristics, further enhancing the model’s robustness to domain shifts. Additionally, training classifiers on larger, more heterogeneous datasets, especially with hard negative candidates from the YOLOv12 detector, could improve discriminative power and reduce reliance on fixed confidence thresholds.
