TL;DR: This paper introduces a context-aware, semi-supervised meta-learning framework for Learning to Defer (L2D) systems. It addresses data scarcity by generating expert-specific embeddings from limited demonstrations, which are then used to create a large corpus of pseudo-labels for training and to enable on-the-fly adaptation to new experts. Experiments show that a model trained on these synthetic labels achieves near oracle-level performance and generalizes effectively to unseen experts, making adaptive L2D systems more practical and scalable.
Artificial intelligence systems have made incredible strides, often achieving performance comparable to or even surpassing human capabilities in fields like computer vision and medical image analysis. However, in critical areas such as healthcare diagnostics, purely automated AI models still face limitations. This has led to the development of hybrid intelligence systems, which combine human expertise with AI to leverage the strengths of both.
One significant area within hybrid intelligence is “Learning to Defer” (L2D). L2D systems allow an AI model to either make a prediction independently or, when uncertain or facing high-risk decisions, defer to a human expert. This approach aims to enhance safety and reliability in decision-making processes.
A major challenge with conventional L2D systems is their inability to generalize effectively to new human experts they haven’t encountered during training. While adaptive L2D approaches have emerged to model diverse expert behaviors, they typically require extensive labeled datasets that capture a wide range of human decision-making patterns. Acquiring such vast amounts of expert-labeled data is often impractical and expensive, creating a significant barrier to their real-world deployment.
This research paper introduces a novel solution to this data scarcity problem: a context-aware, semi-supervised framework for L2D systems. The core idea is to enable L2D models to adapt to new, unseen experts even with very limited initial demonstrations of their behavior. The framework uses meta-learning to generate unique “expert-specific embeddings” from just a few examples of an expert’s past decisions. These embeddings essentially capture an individual expert’s unique behavioral style.
The expert-specific embeddings serve a dual purpose. First, during the training phase, they are used to generate a large collection of “pseudo-labels” for a diverse population of experts. This synthetically labeled data then provides the necessary supervision to train a robust L2D model. Second, at test time, these embeddings act as a context vector, allowing the trained L2D model to adjust its deferral strategy on-the-fly to any new expert it encounters.
The architecture involves three main modules: an Embedding Model that creates a shared feature representation for inputs, a Context Set Encoder that processes an expert’s history to create a behavioral embedding, and an Expert Predictor that uses this embedding to predict whether an expert will label a query correctly or incorrectly. The training process combines supervised learning on the small set of available labels with an unsupervised consistency loss, where the model learns to predict the same outcome for an image even when it’s heavily augmented.
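To make the three-module pipeline concrete, here is a minimal numpy sketch of how the pieces could fit together. This is an illustrative toy, not the paper's implementation: the single linear layers, the mean-pooling context encoder, and all function names are assumptions standing in for the learned networks described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, W):
    """Embedding Model (toy): one shared nonlinear layer over the inputs."""
    return np.tanh(x @ W)

def encode_context(ctx_feats, ctx_correct):
    """Context Set Encoder (toy): mean-pool (feature, correctness) pairs
    from an expert's past decisions into one behavioral embedding."""
    paired = np.concatenate([ctx_feats, ctx_correct[:, None]], axis=1)
    return paired.mean(axis=0)

def predict_correct(query_feat, expert_emb, V):
    """Expert Predictor (toy): probability that this expert labels
    the query correctly, conditioned on the expert's embedding."""
    z = np.concatenate([query_feat, expert_emb])
    return 1.0 / (1.0 + np.exp(-(z @ V)))

def consistency_loss(p_clean, p_aug):
    """Unsupervised term: penalize differing predictions for a clean
    image and a heavily augmented view of the same image."""
    return (p_clean - p_aug) ** 2
```

In training, the supervised loss on the few available expert labels would be combined with `consistency_loss` evaluated on augmented views, which is what lets the model exploit unlabeled data.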
Once trained, the model can generate a complete set of context-aware expert labels for an entire dataset. These generated pseudo-labels are then used to train a downstream L2D model, specifically adapting the L2D-Pop architecture. This downstream model learns to personalize deferral decisions by conditioning on each individual expert’s context-set embedding.
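The two downstream uses of the trained predictor can be sketched in a few lines. Again this is a simplified illustration under assumed names: the thresholding rule and the confidence-comparison deferral rule are plausible stand-ins for the pseudo-labeling step and the L2D-Pop-style context-conditioned deferral policy, not the paper's exact formulation.

```python
import numpy as np

def generate_pseudo_labels(p_correct, threshold=0.5):
    """Binarize the predicted per-expert correctness over a whole dataset
    into synthetic correct/incorrect pseudo-labels (assumed rule)."""
    return (np.asarray(p_correct) >= threshold).astype(int)

def defer(classifier_conf, p_expert_correct):
    """Simplified context-aware deferral: hand the query to the expert
    when their predicted correctness beats the classifier's confidence."""
    return p_expert_correct > classifier_conf
```

Because `p_expert_correct` comes from the expert-specific embedding, the same trained policy yields different deferral decisions for different experts on the same input.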
Experiments were conducted on three standard vision datasets: CIFAR-10, Fashion-MNIST, and GTSRB. The researchers created a population of ten synthetic experts, each with a defined "oracle set" of classes they label with 100% accuracy, simulating diverse but overlapping skills. The study varied the number of available ground-truth annotations per expert, covering scenarios with extremely limited data.
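A synthetic expert of this kind is easy to simulate. The sketch below assumes one simple behavior outside the oracle set (uniform guessing); the paper may model off-oracle behavior differently.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_expert(true_labels, oracle_set, n_classes=10):
    """Simulated expert: labels oracle-set classes with 100% accuracy,
    guesses uniformly at random on everything else (assumed behavior)."""
    preds = []
    for y in true_labels:
        if y in oracle_set:
            preds.append(int(y))
        else:
            preds.append(int(rng.integers(n_classes)))
    return np.array(preds)
```

Varying the oracle sets across the ten experts produces the diverse but overlapping skill profiles used in the experiments.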
The results were highly promising. Across all datasets, even with a modest number of initial labels, the system trained on synthetic labels rapidly approached the performance of an “oracle” system (one trained on all true expert labels). For instance, on CIFAR-10, the proposed L2D-Pop variants achieved substantial gains in system accuracy (e.g., 12.5 and 12.8 percentage points) over a standalone classifier. Similarly, expert accuracy on deferred instances significantly improved, indicating the high quality of the deferral policy. This performance held true for both experts seen during label generation and for completely novel experts, highlighting the generalization capability of the approach.
The framework demonstrates remarkable data efficiency, closing most of the performance gap to the oracle upper bound with as few as 50 labels per expert. This means that effective human-AI collaboration can be achieved with significantly less initial data. A key insight from the discussion is the framework's sensitivity to the quality of these initial limited annotations: a "clean" initial set with consistent expert behavior is crucial for optimal performance.

This work represents a significant step forward in making adaptive L2D systems more practical and scalable for real-world human-AI collaboration. You can find more details in the full paper: Learning To Defer To A Population With Limited Demonstrations.


