TLDR: A new AI method significantly improves polyp counting in colonoscopy videos using a ‘temporally-aware supervised contrastive learning’ approach. It leverages both visual and temporal information to identify and count distinct polyps, reducing the fragmentation rate by a factor of 2.2 compared to previous methods.
Colonoscopy is a vital procedure for detecting and preventing colorectal cancer. A crucial aspect of this procedure is accurately counting the number of polyps observed, which helps in automated reporting and quality control. However, existing methods for polyp counting often struggle because they primarily focus on how polyps look visually and don’t fully use the time-based information from the video recordings.
A new research paper introduces a groundbreaking approach called “Temporally-Aware Supervised Contrastive Learning” to significantly improve polyp counting. This method addresses the limitations of previous techniques by incorporating temporal relationships, meaning it considers how polyps appear and behave over time in the video, not just their static appearance.
The core idea behind this new method is a “supervised contrastive loss” that is aware of time. In simpler terms, the system learns to recognize the same polyp even if its appearance changes slightly (intra-polyp variability) while still being able to tell different polyps apart (inter-polyp discriminability). It does this by giving more weight to tracklets (sequences of polyp detections) that are close in time, even if they look a bit different visually. This makes the learning process more robust and accurate.
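To make this concrete, here is a minimal PyTorch sketch of what a temporally weighted supervised contrastive loss could look like. The tensor names (`z`, `labels`, `times`), the exponential weighting with bandwidth `sigma`, and the temperature `tau` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def temporal_supcon_loss(z, labels, times, tau=0.07, sigma=30.0):
    """Supervised contrastive loss with temporal weighting (illustrative sketch).

    z      : (N, D) tracklet embeddings
    labels : (N,)   polyp identity for each tracklet
    times  : (N,)   tracklet timestamps as floats (e.g., seconds into the video)
    sigma  : temporal bandwidth; closer positives get larger weights (assumed form)
    """
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                                   # pairwise similarities
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)

    # Positives: other tracklets of the same polyp.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye

    # Temporal weights: positives close in time count more (one plausible choice).
    dt = (times.unsqueeze(0) - times.unsqueeze(1)).abs()
    w = torch.exp(-dt / sigma) * pos_mask

    # Standard SupCon log-softmax over all other samples.
    logits = sim.masked_fill(eye, float('-inf'))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    # Weighted average of positive log-probs per anchor (anchors with positives only).
    w_sum = w.sum(dim=1).clamp(min=1e-8)
    loss_per_anchor = -(w * log_prob).sum(dim=1) / w_sum
    valid = pos_mask.any(dim=1)
    return loss_per_anchor[valid].mean()
```

With `sigma` controlling how quickly the weight decays, temporally adjacent detections of the same polyp dominate the positive term, which is one simple way to encode the intuition described above.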
Furthermore, the researchers enhanced the tracklet clustering process by adding a “temporal adjacency constraint.” This means the system is less likely to incorrectly group together polyps that look similar but appeared at very different times in the colonoscopy video. This helps in reducing false positive associations, leading to a more precise count.
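A simple way to realize such a constraint is to add a penalty to the pairwise distance between tracklets that are far apart in time before clustering. The sketch below uses scikit-learn's agglomerative clustering on a precomputed distance matrix; the cosine distance, the fixed `gap` and `penalty` values, and the thresholding scheme are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_tracklets(embeddings, times, dist_threshold=0.5, gap=120.0, penalty=1.0):
    """Group tracklet embeddings into polyp entities (illustrative sketch).

    embeddings : (N, D) L2-normalized tracklet embeddings from the encoder
    times      : (N,)   tracklet timestamps in seconds
    gap        : beyond this temporal gap, pairs are penalized (assumed form)
    """
    # Cosine distance between tracklet embeddings.
    sim = embeddings @ embeddings.T
    dist = 1.0 - sim

    # Temporal adjacency constraint: discourage merging tracklets far apart in time.
    dt = np.abs(times[:, None] - times[None, :])
    dist = dist + penalty * (dt > gap)

    clustering = AgglomerativeClustering(
        n_clusters=None,
        metric="precomputed",
        linkage="average",
        distance_threshold=dist_threshold,
    )
    labels = clustering.fit_predict(dist)
    return labels  # number of unique labels = estimated polyp count
```

Because the penalty inflates the distance between temporally distant tracklets, look-alike polyps seen at very different points of the procedure are much less likely to be merged into a single entity.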
The method has two main components: a visual-temporal encoder and a clustering module. The encoder processes polyp tracklets from the video, extracting features that capture both visual and temporal information, and is trained with the novel temporally-aware supervised contrastive loss. The clustering module then groups these tracklet features into distinct polyp entities, using both visual similarity and the new temporal penalty term.
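The paper's exact architecture is not reproduced here, but the toy sketch below illustrates the general idea of an encoder that fuses pooled per-frame visual features with a simple encoding of the tracklet's temporal position; all layer choices, names, and dimensions are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class TrackletEncoder(nn.Module):
    """Toy visual-temporal tracklet encoder (illustrative sketch, not the paper's model)."""

    def __init__(self, frame_dim=512, embed_dim=128):
        super().__init__()
        self.visual_proj = nn.Linear(frame_dim, embed_dim)
        self.temporal_proj = nn.Linear(1, embed_dim)
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(embed_dim, embed_dim))

    def forward(self, frame_feats, t_norm):
        # frame_feats: (T, frame_dim) per-frame features from some visual backbone
        # t_norm:      scalar tracklet timestamp normalized to [0, 1]
        v = self.visual_proj(frame_feats).mean(dim=0)          # pool over the tracklet
        t = self.temporal_proj(t_norm.view(1, 1)).squeeze(0)   # temporal component
        z = self.head(v + t)
        return nn.functional.normalize(z, dim=0)               # unit-norm embedding
```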
Extensive experiments were conducted on publicly available datasets, including an expanded training set combining the REAL-Colon, LDPolyp, SUN, and PolypSet datasets. The evaluation used a leave-one-out cross-validation strategy to ensure reliable results. The findings are impressive: the new method achieved a 2.2-fold reduction in fragmentation rate compared to prior state-of-the-art approaches. Fragmentation rate measures how often a single polyp is incorrectly split into multiple counted entities, so a lower rate indicates more accurate counting.
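As a rough illustration (the paper's precise metric definition may differ), fragmentation can be quantified as the average number of predicted clusters that each ground-truth polyp is split into:

```python
from collections import defaultdict

def fragmentation_rate(true_polyp_ids, pred_cluster_ids):
    """Average number of predicted clusters each true polyp is split into.

    Illustrative metric sketch; the paper's exact definition may differ.
    true_polyp_ids   : ground-truth polyp identity per tracklet
    pred_cluster_ids : predicted cluster label per tracklet
    """
    clusters_per_polyp = defaultdict(set)
    for gt, pred in zip(true_polyp_ids, pred_cluster_ids):
        clusters_per_polyp[gt].add(pred)
    return sum(len(c) for c in clusters_per_polyp.values()) / len(clusters_per_polyp)

# Example: polyp "A" is split across two clusters, polyp "B" is kept intact -> rate 1.5
print(fragmentation_rate(["A", "A", "A", "B"], [0, 0, 1, 2]))
```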
This research marks a significant step forward in automated polyp counting, highlighting the critical role of temporal awareness in achieving higher accuracy. The code for the approach has also been made publicly available, encouraging further research and development in this area of medical imaging. You can find the full research paper here: Temporally-Aware Supervised Contrastive Learning for Polyp Counting in Colonoscopy.


