TLDR: EZ-Sort is a novel framework designed to make subjective data annotation more efficient. It achieves this by combining a zero-shot CLIP-based pre-ordering system, which uses hierarchical prompting to roughly sort items, with an uncertainty-aware human-in-the-loop MergeSort algorithm. This approach significantly reduces the number of human annotations required (up to 90.5% less than exhaustive comparisons and 19.8% less than previous sorting methods for n=100) while maintaining or improving inter-rater reliability across diverse tasks like face-age estimation, historical image chronology, and retinal image quality assessment.
Subjective data annotation tasks, such as assessing image quality or estimating age from faces, often rely on pairwise comparisons because they offer better reliability than simple ratings. However, traditional exhaustive pairwise comparisons demand a massive number of annotations, scaling quadratically with the number of items (O(n²)), making them impractical for large datasets.
Recent advancements have reduced this burden significantly by using sorting algorithms to actively sample comparisons, bringing the cost down to O(n log n). Building on this, a new framework called EZ-Sort further enhances efficiency by integrating artificial intelligence with human expertise.
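To make that gap concrete, here is a quick back-of-the-envelope comparison. The worst-case MergeSort count below is the standard textbook formula, not a figure from the paper:

```python
import math

def exhaustive_pairs(n: int) -> int:
    """Exhaustive annotation: every unordered pair is compared once, n*(n-1)/2."""
    return n * (n - 1) // 2

def mergesort_worst_case(n: int) -> int:
    """Worst-case MergeSort comparisons: n*ceil(log2 n) - 2**ceil(log2 n) + 1."""
    k = math.ceil(math.log2(n))
    return n * k - 2 ** k + 1

print(exhaustive_pairs(100))      # 4950 comparisons
print(mergesort_worst_case(100))  # 573 comparisons
```

Even before any comparisons are automated, sorting-based sampling needs roughly an order of magnitude fewer human judgments at n=100, and the gap widens as n grows.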
Developed by Yujin Park and Haejun Chung from Hanyang University, and Ikbeom Jang from Hankuk University of Foreign Studies, EZ-Sort introduces two key innovations: first, it roughly pre-orders items using the Contrastive Language-Image Pre-training (CLIP) model in a zero-shot manner, without any task-specific training; second, it automates easy, clear-cut comparisons and reserves human input for only the most uncertain cases. This hybrid approach drastically cuts human annotation cost while maintaining or even improving the quality of the results.
How EZ-Sort Works
The EZ-Sort framework operates in three main stages:
1. CLIP-based Zero-Shot Pre-Ordering: This initial stage uses CLIP, a powerful vision-language model, to perform a rough, semantic pre-ordering of images. It employs a hierarchical prompting strategy, which recursively groups unsorted images using binary prompts. This method mimics how humans might categorize items from coarse to fine, making decisions at multiple levels to improve accuracy.
2. Bucket-Aware Elo Score Initialization: After the hierarchical pre-ordering, the fine-grained groups are merged into a smaller number of coarse ‘buckets’. Each image is then assigned an Elo score, a rating system commonly used in competitive games, based on its bucket ID and the confidence level from the CLIP model. This provides a strong starting point for the sorting process.
3. Uncertainty-Guided Human-in-the-Loop MergeSort: The final stage employs an uncertainty-aware MergeSort algorithm. Instead of asking humans to compare every pair, EZ-Sort selectively routes only high-uncertainty comparisons to human annotators. Comparisons where the model is highly confident are resolved automatically. This intelligent allocation of human effort ensures that the overall process remains efficient, preserving the optimal O(n log n) complexity of MergeSort.
Significant Efficiency Gains and Reliability
EZ-Sort was validated across various datasets, including face-age estimation (FGNET), historical image chronology (DHCI), and retinal image quality assessment (EyePACS). The results were compelling:
- It reduced human annotation cost by an impressive 90.5% compared to exhaustive pairwise comparisons.
- Compared to prior state-of-the-art sorting-based methods, EZ-Sort achieved a 19.8% reduction in human annotation cost for datasets with 100 items.
- Crucially, these efficiency gains were achieved while improving or maintaining inter-rater reliability, especially in ambiguous tasks like retinal image quality assessment.
- An ablation study confirmed that the hierarchical prompting strategy significantly improved correlation with ground truth labels compared to simpler, flat prompting methods.
The framework’s ability to combine strong CLIP-based priors with an intelligent, uncertainty-aware sampling strategy makes it a highly efficient and scalable solution for pairwise ranking tasks, particularly in domains where expert annotation is scarce and costly.
Future Directions and Availability
While EZ-Sort offers substantial benefits, the authors acknowledge its limitations, such as its dependence on the reliability of the underlying vision-language model and potential struggles in domains with very subtle visual distinctions. Future work includes integrating annotator reliability models, validating scalability on even larger datasets, and exploring few-shot fine-tuning for less common domains.
The code for EZ-Sort is publicly available, allowing researchers and practitioners to implement and build upon this innovative approach. You can find more details about this research in the full paper: EZ-Sort: Efficient Pairwise Comparison via Zero-Shot CLIP-Based Pre-Ordering and Human-in-the-Loop Sorting.