Enhancing Human-AI Collaboration: A New Approach to Expert Selection with Conformal Prediction

TLDR: This research introduces a greedy algorithm for selecting a subset of human experts to collaborate with AI in classification tasks. By leveraging “conformal prediction sets” (AI-generated sets of highly probable labels), the algorithm helps human experts make more accurate decisions. The study demonstrates that this method significantly improves classification performance compared to naive approaches and other set-based predictors, especially in multi-expert scenarios, by focusing human expertise on relevant options.

In the evolving landscape of artificial intelligence, the collaboration between human experts and AI systems is becoming increasingly vital, especially in high-stakes fields like medicine, finance, and scientific discovery. This partnership, known as human-AI complementarity, aims to combine the strengths of both to achieve better outcomes than either could alone. While much research has focused on improving AI algorithms, a new study delves into how humans and AI can work together more effectively, particularly when multiple human experts are involved.

Traditional approaches to human-AI collaboration often focus on a single human expert or on systems where AI defers decisions to humans when it lacks confidence. However, real-world decision-making frequently involves multiple experts. This new research, titled “Conformal Set-based Human-AI Complementarity with Multiple Experts,” addresses this gap by proposing a framework for human-AI teamwork that incorporates a pool of multiple human experts.

The core idea revolves around “conformal prediction sets.” Instead of giving a single answer, the AI system returns a small set of candidate labels that is calibrated to contain the correct one with a user-specified probability. For example, in an image classification task, if the AI cannot confidently distinguish a “cat” from a “dog,” the set might contain both labels. Human experts then choose the most appropriate label from this narrowed set. This reduces the number of possibilities the experts must weigh, allowing them to focus their expertise more effectively.
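
To make the idea concrete, here is a minimal sketch of one common way to build such sets, split conformal prediction, in plain Python. It is an illustration under stated assumptions, not the paper's code: the function names, the score (one minus the probability assigned to the true label), and the toy calibration data are all hypothetical.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold computed from a held-out calibration set.

    cal_probs[i][y] is the model's probability for label y on example i.
    alpha is the tolerated miscoverage rate (e.g. 0.1 for ~90% coverage).
    """
    # Nonconformity score: one minus the probability given to the true label.
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: keep every label.
        return float("inf")
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """Labels whose nonconformity score does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


# Toy calibration data (hypothetical): 4 examples, 3 labels.
cal_probs = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
cal_labels = [0, 1, 2, 0]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
print(prediction_set([0.5, 0.45, 0.05], threshold))  # [0, 1]
```

In this toy run the model is torn between labels 0 and 1, so both land in the set while the unlikely label 2 is dropped. A smaller `alpha` raises the threshold and tends to produce larger sets.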

The researchers, Helbert Paat and Guohao Shen from The Hong Kong Polytechnic University, highlight that existing studies often limit their scope to single-expert scenarios. Their work expands this by characterizing the conditions under which multiple experts can significantly benefit from these conformal sets. They introduce a novel greedy algorithm designed to select the most relevant subset of human experts for each specific instance. This is crucial because, for any given task, not all experts may be equally relevant or accurate.

The Greedy Algorithm for Expert Selection

The proposed greedy algorithm works by identifying which human experts, even without prior knowledge of the conformal set, are more likely to choose an answer from within that set. It leverages a “confusion matrix” for each expert, which essentially maps how an expert’s predictions align with the true labels. By understanding these individual expert tendencies, the algorithm can strategically select a subset of experts whose combined insights are most likely to lead to an accurate final decision.

The algorithm’s efficiency is notable, scaling linearly with both the number of human experts and the size of the conformal set. This means it can handle a growing number of experts and prediction options without becoming computationally prohibitive.
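
The article does not spell out the selection rule, so the following is only a simplified stand-in, not the paper's algorithm. It assumes each expert has an estimated confusion matrix, scores each expert by their average estimated accuracy on the labels in the conformal set, and keeps the top-scoring experts up to a hypothetical `budget`. The scoring pass touches each expert once per label in the set, which matches the linear scaling described above. All names and numbers are invented for illustration.

```python
def expert_score(confusion, conformal_set):
    """Average estimated accuracy of one expert on the candidate labels.

    confusion[y][y_hat] is the estimated probability that the expert
    answers y_hat when the true label is y (an assumed representation).
    """
    if not conformal_set:
        return 0.0
    return sum(confusion[y][y] for y in conformal_set) / len(conformal_set)


def select_experts(confusions, conformal_set, budget):
    """Rank experts by score and keep the best `budget` of them."""
    scores = {
        name: expert_score(matrix, conformal_set)
        for name, matrix in confusions.items()
    }
    # Highest score first; ties broken by name so the result is deterministic.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:budget]


# Hypothetical confusion matrices for three experts over three labels.
confusions = {
    "A": [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10], [0.30, 0.30, 0.40]],
    "B": [[0.60, 0.20, 0.20], [0.20, 0.60, 0.20], [0.05, 0.05, 0.90]],
    "C": [[0.70, 0.20, 0.10], [0.40, 0.40, 0.20], [0.20, 0.20, 0.60]],
}

print(select_experts(confusions, conformal_set=[0, 1], budget=2))  # ['A', 'B']
print(select_experts(confusions, conformal_set=[2], budget=2))     # ['B', 'C']
```

The two calls show why selection is done per instance: expert A is the strongest choice when the candidates are labels 0 and 1, but drops out when the only candidate is label 2.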

Experimental Validation and Key Findings

To validate their approach, the researchers conducted simulation studies using real expert predictions from two well-known datasets: CIFAR-10H and ImageNet-16H. These datasets contain natural images with human annotations, making them ideal for testing classification tasks.

The results were compelling. The proposed greedy algorithm consistently outperformed “naive” methods of human subset selection. These naive methods included simply using all available human experts (applying a majority decision rule) or selecting a random subset of experts. The study also showed that their method surpassed approaches based on “top-k” prediction sets, where experts choose from the k most probable labels identified by the AI, rather than the more rigorously defined conformal sets.
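
For reference, the two simplest ingredients of these baselines can be sketched in a few lines of Python. This is an illustrative sketch with hypothetical function names and tie-breaking rules (smallest label wins), not the study's evaluation code.

```python
from collections import Counter


def top_k_set(probs, k):
    """The k labels with the highest model probability, in label order."""
    ranked = sorted(range(len(probs)), key=lambda label: (-probs[label], label))
    return sorted(ranked[:k])


def majority_vote(votes):
    """Most frequent label among the experts' votes; ties go to the smallest label."""
    counts = Counter(votes)
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)


print(top_k_set([0.5, 0.45, 0.05], k=2))  # [0, 1]
print(majority_vote([1, 0, 1, 2]))        # 1
```

Unlike a conformal set, a top-k set always has exactly k labels, regardless of how confident the model is on a given instance.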

A significant finding was that multi-expert collaboration, guided by this conformal set-based approach, yielded higher success probabilities than relying on a single expert. Even with a limited set of options provided by the AI, the system demonstrated superior performance compared to previous baselines where humans had access to the full range of labels. This underscores the power of conformal predictors in identifying truly meaningful classes for each instance.

Furthermore, the research explored how the system performs as the number of human experts increases. The conformal set-based greedy selection approach continued to outperform both human-only expert teams and top-k predictor methods, proving its effectiveness even with a large pool of human expertise.

Considerations and Future Directions

While promising, the study acknowledges certain assumptions and limitations. The algorithm relies on an estimated confusion matrix for human performance, and more sophisticated estimation methods could be explored. The assumption of independence among experts might not always hold in real-world scenarios, as human decisions can influence each other. Additionally, the current framework often sets the tolerance level (the allowed miscoverage rate of the conformal sets) very low, resulting in nearly 100% coverage. Future work could investigate scenarios where a clearer trade-off exists between tolerance levels and prediction set sizes.

Despite these points, the research offers a significant step forward in designing human-AI collaborative systems. By strategically selecting human experts and leveraging the focused guidance of conformal prediction sets, this framework paves the way for more accurate and efficient decision-making in complex, high-risk environments. It highlights that the “who” in human-AI teams is just as important as the “how.”
