TL;DR: This paper introduces a context-aware, semi-supervised meta-learning framework for Learning to Defer (L2D) systems. It addresses data scarcity by generating expert-specific embeddings from limited demonstrations, which are then used to create a large corpus of pseudo-labels for training and to enable on-the-fly adaptation to new experts. Experiments show that a model trained on these synthetic labels achieves near oracle-level performance and generalizes effectively to unseen experts, making adaptive L2D systems more practical and scalable.
Artificial intelligence systems have made incredible strides, often achieving performance comparable to or even surpassing human capabilities in fields like computer vision and medical image analysis. However, in critical areas such as healthcare diagnostics, purely automated AI models still face limitations. This has led to the development of hybrid intelligence systems, which combine human expertise with AI to leverage the strengths of both.
One significant area within hybrid intelligence is “Learning to Defer” (L2D). L2D systems allow an AI model to either make a prediction independently or, when uncertain or facing high-risk decisions, defer to a human expert. This approach aims to enhance safety and reliability in decision-making processes.
A major challenge with conventional L2D systems is their inability to generalize effectively to new human experts they haven’t encountered during training. While adaptive L2D approaches have emerged to model diverse expert behaviors, they typically require extensive labeled datasets that capture a wide range of human decision-making patterns. Acquiring such vast amounts of expert-labeled data is often impractical and expensive, creating a significant barrier to their real-world deployment.
This research paper introduces a novel solution to this data scarcity problem: a context-aware, semi-supervised framework for L2D systems. The core idea is to enable L2D models to adapt to new, unseen experts even with very limited initial demonstrations of their behavior. The framework uses meta-learning to generate unique “expert-specific embeddings” from just a few examples of an expert’s past decisions. These embeddings essentially capture an individual expert’s unique behavioral style.
The expert-specific embeddings serve a dual purpose. First, during the training phase, they are used to generate a large collection of “pseudo-labels” for a diverse population of experts. This synthetically labeled data then provides the necessary supervision to train a robust L2D model. Second, at test time, these embeddings act as a context vector, allowing the trained L2D model to adjust its deferral strategy on-the-fly to any new expert it encounters.
The architecture involves three main modules: an Embedding Model that creates a shared feature representation for inputs, a Context Set Encoder that processes an expert’s history to create a behavioral embedding, and an Expert Predictor that uses this embedding to predict whether an expert will label a query correctly or incorrectly. The training process combines supervised learning on the small set of available labels with an unsupervised consistency loss, where the model learns to predict the same outcome for an image even when it’s heavily augmented.
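To make the three-module pipeline concrete, here is a minimal numpy sketch of how the pieces could fit together. This is an illustrative toy, not the paper's implementation: the single linear layers, the mean-pooling context encoder, and all function names are assumptions standing in for the learned networks described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, W):
    """Embedding Model (toy): one shared nonlinear layer over the inputs."""
    return np.tanh(x @ W)

def encode_context(ctx_feats, ctx_correct):
    """Context Set Encoder (toy): mean-pool (feature, correctness) pairs
    from an expert's past decisions into one behavioral embedding."""
    paired = np.concatenate([ctx_feats, ctx_correct[:, None]], axis=1)
    return paired.mean(axis=0)

def predict_correct(query_feat, expert_emb, V):
    """Expert Predictor (toy): probability that this expert labels
    the query correctly, conditioned on the expert's embedding."""
    z = np.concatenate([query_feat, expert_emb])
    return 1.0 / (1.0 + np.exp(-(z @ V)))

def consistency_loss(p_clean, p_aug):
    """Unsupervised term: penalize differing predictions for a clean
    image and a heavily augmented view of the same image."""
    return (p_clean - p_aug) ** 2
```

In training, the supervised loss on the few available expert labels would be combined with `consistency_loss` evaluated on augmented views, which is what lets the model exploit unlabeled data.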
Once trained, the model can generate a complete set of context-aware expert labels for an entire dataset. These generated pseudo-labels are then used to train a downstream L2D model, specifically adapting the L2D-Pop architecture. This downstream model learns to personalize deferral decisions by conditioning on each individual expert’s context-set embedding.
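The two downstream uses of the trained predictor can be sketched in a few lines. Again this is a simplified illustration under assumed names: the thresholding rule and the confidence-comparison deferral rule are plausible stand-ins for the pseudo-labeling step and the L2D-Pop-style context-conditioned deferral policy, not the paper's exact formulation.

```python
import numpy as np

def generate_pseudo_labels(p_correct, threshold=0.5):
    """Binarize the predicted per-expert correctness over a whole dataset
    into synthetic correct/incorrect pseudo-labels (assumed rule)."""
    return (np.asarray(p_correct) >= threshold).astype(int)

def defer(classifier_conf, p_expert_correct):
    """Simplified context-aware deferral: hand the query to the expert
    when their predicted correctness beats the classifier's confidence."""
    return p_expert_correct > classifier_conf
```

Because `p_expert_correct` comes from the expert-specific embedding, the same trained policy yields different deferral decisions for different experts on the same input.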
Experiments were conducted on three standard vision datasets: CIFAR-10, Fashion-MNIST, and GTSRB. The researchers created a population of ten synthetic experts, each with a defined "oracle set" of classes they label with 100% accuracy, simulating diverse but overlapping skills. The study varied the number of available ground-truth annotations per expert, covering scenarios with extremely limited data.
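A synthetic expert of this kind is easy to simulate. The sketch below assumes one simple behavior outside the oracle set (uniform guessing); the paper may model off-oracle behavior differently.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_expert(true_labels, oracle_set, n_classes=10):
    """Simulated expert: labels oracle-set classes with 100% accuracy,
    guesses uniformly at random on everything else (assumed behavior)."""
    preds = []
    for y in true_labels:
        if y in oracle_set:
            preds.append(int(y))
        else:
            preds.append(int(rng.integers(n_classes)))
    return np.array(preds)
```

Varying the oracle sets across the ten experts produces the diverse but overlapping skill profiles used in the experiments.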
The results were highly promising. Across all datasets, even with a modest number of initial labels, the system trained on synthetic labels rapidly approached the performance of an “oracle” system (one trained on all true expert labels). For instance, on CIFAR-10, the proposed L2D-Pop variants achieved substantial gains in system accuracy (e.g., 12.5 and 12.8 percentage points) over a standalone classifier. Similarly, expert accuracy on deferred instances significantly improved, indicating the high quality of the deferral policy. This performance held true for both experts seen during label generation and for completely novel experts, highlighting the generalization capability of the approach.
The framework demonstrates remarkable data efficiency, closing most of the performance gap to the oracle upper bound with as few as 50 labels per expert. This means that effective human-AI collaboration can be achieved with significantly less initial data. A key insight from the discussion is the framework's sensitivity to the quality of these initial limited annotations: a "clean" initial set with consistent expert behavior is crucial for optimal performance.

This work represents a significant step forward in making adaptive L2D systems more practical and scalable for real-world human-AI collaboration. You can find more details in the full paper: Learning To Defer To A Population With Limited Demonstrations.


