TLDR: This research introduces a greedy algorithm for selecting a subset of human experts to collaborate with AI in classification tasks. By leveraging “conformal prediction sets” (AI-generated sets of highly probable labels), the algorithm helps human experts make more accurate decisions. The study demonstrates that this method significantly improves classification performance compared to naive approaches and other set-based predictors, especially in multi-expert scenarios, by focusing human expertise on relevant options.
In the evolving landscape of artificial intelligence, the collaboration between human experts and AI systems is becoming increasingly vital, especially in high-stakes fields like medicine, finance, and scientific discovery. This partnership, known as human-AI complementarity, aims to combine the strengths of both to achieve better outcomes than either could alone. While much research has focused on improving AI algorithms, a new study delves into how humans and AI can work together more effectively, particularly when multiple human experts are involved.
Traditional approaches to human-AI collaboration often focus on a single human expert or on systems where AI defers decisions to humans when it lacks confidence. However, real-world decision-making frequently involves multiple experts. This new research, titled “Conformal Set-based Human-AI Complementarity with Multiple Experts,” addresses this gap by proposing a framework for human-AI teamwork that draws on a pool of human experts.
The core idea revolves around “conformal prediction sets.” Instead of giving a single answer, the AI system returns a small set of labels that is calibrated to contain the correct one with a user-chosen probability. For example, in an image classification task, if an AI is unsure whether an image is a “cat” or a “dog,” it might present both options. Human experts then choose the most appropriate label from this narrowed set, which reduces the number of possibilities they have to weigh and lets them focus their expertise more effectively.
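To make the idea of a conformal prediction set concrete, here is a minimal sketch of standard split conformal prediction in plain Python. It is a generic textbook construction, not necessarily the exact procedure used in the paper; the function names, the toy calibration data, and the choice of score (one minus the probability assigned to the true label) are all assumptions made for illustration.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold from held-out calibration data (hypothetical helper)."""
    # Nonconformity score: one minus the probability given to the true label.
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: keep every label.
        return float("inf")
    return scores[rank - 1]

def prediction_set(probs, threshold):
    """Labels whose nonconformity score is within the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]

# Toy calibration data (made up): classifier probabilities and true labels.
cal_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
cal_labels = [0, 1, 2, 0]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
print(prediction_set([0.5, 0.45, 0.05], threshold))  # [0, 1]
```

Here `alpha` plays the role of the tolerance level: a smaller value asks for higher coverage and therefore tends to produce larger sets.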
The researchers, Helbert Paat and Guohao Shen from The Hong Kong Polytechnic University, highlight that existing studies often limit their scope to single-expert scenarios. Their work expands this by characterizing the conditions under which multiple experts can significantly benefit from these conformal sets. They introduce a novel greedy algorithm designed to select the most relevant subset of human experts for each specific instance. This is crucial because, for any given task, not all experts may be equally relevant or accurate.
The Greedy Algorithm for Expert Selection
The proposed greedy algorithm works by identifying which human experts, even without prior knowledge of the conformal set, are more likely to choose an answer from within that set. It leverages a “confusion matrix” for each expert, which records how often the expert predicts each label for each true label. By understanding these individual expert tendencies, the algorithm can strategically select a subset of experts whose combined insights are most likely to lead to an accurate final decision.
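The selection idea above can be illustrated with a simplified sketch. It is not the authors' algorithm: it only ranks experts by an estimated probability that their unrestricted answer falls inside the conformal set and keeps the top few. The scoring rule, the use of classifier probabilities as weights over true labels, the `max_experts` cutoff, and the toy confusion matrices are all assumptions for illustration.

```python
def in_set_probability(confusion, weights, conformal_set):
    """Estimated chance that an expert's unrestricted answer lands in the set.

    confusion[y][y_hat] approximates P(expert says y_hat | true label y).
    weights[y] is an assumed belief over true labels (here: classifier
    probabilities), renormalized over the labels in the conformal set.
    """
    total = sum(weights[y] for y in conformal_set)
    return sum(
        (weights[y] / total) * sum(confusion[y][y_hat] for y_hat in conformal_set)
        for y in conformal_set
    )

def select_experts(confusions, weights, conformal_set, max_experts):
    """Rank experts by in-set probability and keep the best max_experts."""
    scores = {
        name: in_set_probability(matrix, weights, conformal_set)
        for name, matrix in confusions.items()
    }
    # Highest score first; ties are broken alphabetically by expert name.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:max_experts], scores

# Made-up confusion matrices for three hypothetical experts (rows: true label).
confusions = {
    "A": [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]],
    "B": [[0.6, 0.1, 0.3], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]],
    "C": [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.2, 0.2, 0.6]],
}
classifier_probs = [0.5, 0.45, 0.05]
conformal_set = [0, 1]

chosen, scores = select_experts(confusions, classifier_probs, conformal_set, max_experts=2)
print(chosen)  # ['A', 'C']
```

In this toy example, expert B is dropped because a larger share of B's answers would fall outside the set, regardless of the true label.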
The algorithm’s running time scales linearly with both the number of human experts and the size of the conformal set. This means it can handle a growing number of experts and prediction options without becoming computationally prohibitive.
Experimental Validation and Key Findings
To validate their approach, the researchers conducted simulation studies using real expert predictions from two well-known datasets: CIFAR-10H and ImageNet-16H. These datasets contain natural images with human annotations, making them ideal for testing classification tasks.
The results were compelling. The proposed greedy algorithm consistently outperformed “naive” methods of human subset selection. These naive methods included simply using all available human experts (applying a majority decision rule) or selecting a random subset of experts. The study also showed that their method surpassed approaches based on “top-k” prediction sets, where experts choose from the k most probable labels identified by the AI, rather than the more rigorously defined conformal sets.
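For reference, the two kinds of baseline described above can be sketched in a few lines. This is an illustrative reading of “top-k prediction set” and “majority decision rule,” not code from the study; the function names and the tie-breaking rule (smallest label wins) are assumptions.

```python
from collections import Counter

def top_k_set(probs, k):
    """The k labels with the highest classifier probability, in label order."""
    ranked = sorted(range(len(probs)), key=lambda label: -probs[label])
    return sorted(ranked[:k])

def majority_vote(votes):
    """Most frequent label among expert votes; ties go to the smallest label."""
    counts = Counter(votes)
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)

print(top_k_set([0.5, 0.45, 0.05], k=2))  # [0, 1]
print(majority_vote([1, 0, 1]))           # 1
```

Unlike a conformal set, a top-k set always has the same size, so it cannot shrink for easy instances or grow for ambiguous ones.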
A significant finding was that multi-expert collaboration, guided by this conformal set-based approach, yielded higher success probabilities than relying on a single expert. Even with a limited set of options provided by the AI, the system demonstrated superior performance compared to previous baselines where humans had access to the full range of labels. This underscores the power of conformal predictors in identifying truly meaningful classes for each instance.
Furthermore, the research explored how the system performs as the number of human experts increases. The conformal set-based greedy selection approach continued to outperform both human-only expert teams and top-k predictor methods, indicating that it remains effective even with a large pool of human experts.
Considerations and Future Directions
While promising, the study acknowledges certain assumptions and limitations. The algorithm relies on an estimated confusion matrix for human performance, and more sophisticated estimation methods could be explored. The assumption of independence among experts might not always hold true in real-world scenarios, as human decisions can influence each other. Additionally, the current framework often sets the AI’s tolerance level very low, resulting in nearly 100% coverage by the conformal sets. Future work could investigate scenarios where a clearer trade-off exists between tolerance levels and prediction set sizes.
Despite these points, the research offers a significant step forward in designing human-AI collaborative systems. By strategically selecting human experts and leveraging the focused guidance of conformal prediction sets, this framework paves the way for more accurate and efficient decision-making in complex, high-risk environments. It highlights that the “who” in human-AI teams is just as important as the “how.”


