TLDR: This paper introduces ICL-GradSel, a novel algorithm for efficiently selecting demonstration examples for in-context learning (ICL) in large language models. It uses a first-order approximation based on gradients of the model output to estimate performance on demonstration subsets with less than 1% error. This approach results in a linear-time algorithm, achieving up to 37.7x speed-up and outperforming existing selection methods by 11% on average, while significantly reducing computational cost. The method enhances both the efficiency and effectiveness of ICL.
In the rapidly evolving landscape of artificial intelligence, large language models (LLMs) have demonstrated remarkable capabilities, particularly through a technique known as in-context learning (ICL). This allows models to adapt to new tasks by simply conditioning on a few examples provided within the prompt, rather than undergoing extensive fine-tuning. However, the effectiveness of ICL is highly sensitive to the quality and relevance of these demonstration examples. The challenge lies in efficiently selecting the best examples from a potentially vast pool, a problem that has significant implications for areas like prompt tuning and chain-of-thought reasoning.
Traditional methods for demonstration selection often fall into two categories: those based on the similarity of input embeddings and those that directly evaluate model losses. Similarity-based approaches, while identifying relevant examples, can overlook how the model’s output is conditioned on demonstration labels and treat examples independently, ignoring crucial interactions between them. On the other hand, methods that evaluate model losses directly, such as forward selection or random ensemble selection, can be computationally prohibitive, especially when dealing with a large number of demonstrations or very large models.
A recent research paper, titled “Linear-Time Demonstration Selection for In-Context Learning via Gradient Estimation,” introduces an algorithm that addresses this efficiency bottleneck. Authored by Ziniu Zhang, Zhenshuo Zhang, Dongyue Li, Lu Wang, Jennifer Dy, and Hongyang R. Zhang, the work leverages the gradients of the model output with respect to the input embeddings. The core idea is to use a first-order Taylor expansion to accurately estimate model outputs for various demonstration subsets without repeatedly running full model inference.
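To make the core idea concrete, here is a minimal sketch of a first-order Taylor estimate. The function `model_output` below is a hypothetical, smooth stand-in for an LLM's output as a function of an input embedding (not the authors' model); the point is only that a gradient computed once at a base point lets you estimate the output at a nearby, perturbed point without re-evaluating the function in full.

```python
import numpy as np

# Toy "model output": a smooth scalar function of an embedding vector.
# Hypothetical stand-in for an LLM output logit; the real method
# differentiates the model's output w.r.t. the input embeddings.
def model_output(x):
    return np.tanh(x).sum() + 0.1 * (x ** 2).sum()

def grad_model_output(x):
    # Analytic gradient of the toy function above.
    return (1 - np.tanh(x) ** 2) + 0.2 * x

rng = np.random.default_rng(0)
x0 = rng.normal(size=16)            # base embedding (e.g., bare prompt)
delta = 0.05 * rng.normal(size=16)  # small shift from adding a demonstration

exact = model_output(x0 + delta)
# First-order Taylor estimate: f(x0) + grad(x0) . delta
approx = model_output(x0) + grad_model_output(x0) @ delta

print(f"absolute error of the estimate: {abs(exact - approx):.6f}")
```

Because the perturbation is small, the estimation error is second-order in the size of the shift, which is why the paper can report sub-1% approximation error while skipping full inference for each candidate subset.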
How the New Approach Works
The algorithm, referred to as ICL-GradSel, operates in three main stages:
1. Pre-computing Gradients: Initially, the model’s functional outputs and gradients (with respect to the embedding vector) are computed once on the entire training set. This is a one-time cost that sets up the estimation process.
2. Gradient Estimation: Using the pre-computed gradients and a first-order approximation, the algorithm estimates the model’s outputs for multiple randomly sampled subsets of demonstrations. This stage avoids costly full inference for each subset, significantly speeding up the evaluation process. The researchers empirically validated that this gradient estimation yields approximations with less than 1% error across various LLMs and datasets, even for models with up to 34 billion parameters.
3. Demonstration Selection: Finally, an influence score is calculated for each demonstration example based on the aggregated estimated outcomes from the sampled subsets. The ‘k’ most relevant examples (those with the lowest scores, indicating better performance) are then selected to form the prompt for in-context learning.
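The three stages above can be sketched end-to-end in a few lines. This is a simplified illustration under toy assumptions, not the authors' implementation: each demonstration is represented by a small embedding shift it contributes to the prompt, and `loss` is a hypothetical smooth surrogate for the validation loss of the model output.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup (hypothetical stand-ins; the real method uses LLM outputs).
d, n, k, n_subsets = 8, 20, 4, 200
base = rng.normal(size=d)               # embedding of the bare prompt
demos = 0.05 * rng.normal(size=(n, d))  # per-demonstration embedding shifts

def loss(x):
    # Smooth surrogate for a validation loss on the model output.
    return float(np.log1p((x ** 2).sum()))

def grad_loss(x):
    return 2 * x / (1 + (x ** 2).sum())

# Stage 1: one-time pre-computation of output and gradient at the base point.
f0, g0 = loss(base), grad_loss(base)

# Stage 2: estimate the loss of random subsets with the first-order
# approximation -- no extra "model inference" per subset.
scores = np.zeros(n)
counts = np.zeros(n)
for _ in range(n_subsets):
    subset = rng.choice(n, size=k, replace=False)
    shift = demos[subset].sum(axis=0)
    est = f0 + g0 @ shift            # estimated loss for this subset
    scores[subset] += est
    counts[subset] += 1

# Stage 3: influence score = mean estimated loss over subsets containing
# each example; pick the k examples with the lowest scores.
influence = scores / np.maximum(counts, 1)
selected = np.argsort(influence)[:k]
print("selected demonstrations:", selected)
```

The expensive part (computing outputs and gradients) happens exactly once in Stage 1; every subset evaluation afterward is a cheap dot product, which is where the linear-time behavior comes from.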
Efficiency and Performance Gains
This gradient-based estimation procedure results in a linear-time algorithm relative to model and training set sizes. This is a significant improvement over existing methods, which can incur much higher computational costs. The paper demonstrates that ICL-GradSel achieves up to a 37.7x speed-up compared to full-model inference methods like forward selection and random ensemble selection, all while maintaining high accuracy.
Beyond efficiency, the selected demonstration sets also lead to superior in-context learning performance. Experiments across six diverse datasets (including sentiment classification and math reasoning tasks) show that ICL-GradSel outperforms strong baselines based on input embeddings by an average of 11%, using up to 49% less computation. In long-context scenarios, the method can match the performance of existing baselines with significantly shorter context lengths, demonstrating its ability to select highly impactful examples.
The research highlights that this gradient estimation framework is flexible and can be instantiated to accelerate various subset selection methods, such as gradient-based random ensemble (ICL-GradRE) and gradient-based forward selection (ICL-GradFS). The code to replicate these findings is available on GitHub, underscoring the practical applicability of this work. For more technical details, see the full paper.
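As an illustration of one such instantiation, greedy forward selection accelerated with the same estimator might look like the following sketch. This is an assumption-laden toy version, not ICL-GradFS itself: `loss` and the per-demonstration embedding shifts are hypothetical, and only the pattern of "precompute the gradient once, then score candidates with dot products" mirrors the paper's idea.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, k = 8, 20, 4
base = rng.normal(size=d)               # embedding of the bare prompt
demos = 0.05 * rng.normal(size=(n, d))  # per-demonstration embedding shifts

def loss(x):
    # Hypothetical smooth surrogate for a validation loss.
    return float(np.log1p((x ** 2).sum()))

def grad_loss(x):
    return 2 * x / (1 + (x ** 2).sum())

# Precompute once, then greedily grow the subset using estimated losses only.
f0, g0 = loss(base), grad_loss(base)
chosen, shift = [], np.zeros(d)
for _ in range(k):
    best_i, best_est = None, float("inf")
    for i in range(n):
        if i in chosen:
            continue
        est = f0 + g0 @ (shift + demos[i])  # first-order loss estimate
        if est < best_est:
            best_i, best_est = i, est
    chosen.append(best_i)
    shift += demos[best_i]
print("forward-selected demonstrations:", chosen)
```

Plain forward selection would call the model n times per greedy step; here each step costs n dot products against the precomputed gradient, which is the source of the reported speed-ups.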
This work represents a crucial step forward in making in-context learning more efficient and effective, especially as LLMs continue to grow in size and complexity. By providing a scalable method for demonstration selection, it paves the way for broader applications of ICL in real-world scenarios.