Optimizing In-Context Learning: A Kernelized and Information-Theoretic Approach to Example Selection

TLDR: KITE is an information-theoretic framework for selecting examples for In-Context Learning (ICL) in large language models. It models the LLM as a linear function over input embeddings, frames example selection as a query-specific optimization problem, and derives an approximately submodular objective that a greedy algorithm can optimize. It then adds the kernel trick to handle non-linear relationships and an optimal design-based regularizer to encourage diversity among the selected examples. Empirically, KITE consistently outperforms existing baselines across various classification tasks and LLMs.

In-context learning (ICL) has become a powerful method for adapting large language models (LLMs) to new tasks, especially when data is scarce. This approach involves providing the LLM with a few carefully chosen examples directly within the prompt. However, a critical challenge arises due to the limited context size of LLMs: how do we select the most effective examples to maximize performance for a given user query?

Traditional methods, such as the nearest-neighbor-based KATE, often fall short in high-dimensional embedding spaces: they can generalize poorly and tend to select examples that lack diversity. A recent research paper addresses these problems with a framework called KITE: Kernelized and Information Theoretic Exemplars for In-Context Learning.

KITE tackles the example selection problem from a principled, information-theoretic perspective. The researchers model an LLM as a linear function over input embeddings and frame example selection as an optimization problem: choose a subset of examples from a larger bank that minimizes the prediction error for a specific query. This differs from traditional approaches that focus on generalization across a distribution of test points; KITE targets accurate prediction for a single, specific query instance.
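To make the query-specific framing concrete, here is a minimal sketch under a ridge-regression assumption; it is not the paper's exact objective. For a linear model fit on the selected embeddings X_S, the quantity x_q^T (X_S^T X_S + reg * I)^{-1} x_q measures how uncertain the prediction at the query x_q remains. The function name `query_variance`, the `reg` parameter, and the toy vectors are illustrative assumptions.

```python
import numpy as np

def query_variance(X_S: np.ndarray, x_q: np.ndarray, reg: float = 1.0) -> float:
    """Ridge-style predictive uncertainty at x_q given selected rows X_S.

    Computes x_q^T (X_S^T X_S + reg * I)^{-1} x_q. A smaller value means the
    selected examples pin down the linear model better along the query.
    """
    d = x_q.shape[0]
    A = X_S.T @ X_S + reg * np.eye(d)
    return float(x_q @ np.linalg.solve(A, x_q))

# Toy 2-D embeddings (assumed values, for illustration only).
x_q = np.array([1.0, 0.0])
aligned = np.array([[1.0, 0.0]])     # example pointing the same way as the query
orthogonal = np.array([[0.0, 1.0]])  # example orthogonal to the query

v_empty = query_variance(np.empty((0, 2)), x_q)  # no examples selected
v_aligned = query_variance(aligned, x_q)
v_orth = query_variance(orthogonal, x_q)
print(v_empty, v_aligned, v_orth)
```

In this toy case the aligned example halves the uncertainty (1.0 to 0.5), while the orthogonal one leaves it unchanged. That is the intuition behind choosing examples for one query rather than for a whole test distribution.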

The framework derives a surrogate objective that is approximately submodular, which allows for the use of a greedy algorithm with a strong approximation guarantee. KITE further enhances its method through two key innovations:

Kernel Trick for Non-Linearity

First, KITE incorporates the well-known kernel trick. This allows the method to operate effectively in high-dimensional feature spaces without needing to explicitly map data into those spaces. Instead, it computes inner products via kernels, enabling the model to capture complex, non-linear relationships between data points. This is crucial because real-world data often exhibits intricate patterns that linear models cannot fully capture.
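A small sketch of what the kernel trick buys, using two textbook kernels rather than any kernel confirmed to be used in the paper. The degree-2 polynomial kernel (x·y)^2 equals an inner product in a d^2-dimensional feature space, yet it is computed from a single dot product. The RBF kernel goes further: its feature space is infinite-dimensional, so only the kernel form is practical. All function names here are illustrative.

```python
import numpy as np

def poly2_kernel(x: np.ndarray, y: np.ndarray) -> float:
    """Degree-2 polynomial kernel: one dot product, then a square."""
    return float(np.dot(x, y)) ** 2

def poly2_features(x: np.ndarray) -> np.ndarray:
    """Explicit feature map for poly2_kernel: all products x_i * x_j."""
    return np.outer(x, x).ravel()

def rbf_kernel(X: np.ndarray, Y: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Gram matrix of exp(-gamma * ||x - y||^2) between rows of X and Y."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])

implicit = poly2_kernel(x, y)                                  # 3 multiplications
explicit = float(np.dot(poly2_features(x), poly2_features(y)))  # 9-dim features
G = rbf_kernel(np.stack([x, y]), np.stack([x, y]), gamma=0.1)
print(implicit, explicit)
```

Both routes give the same number; the kernel version never builds the larger feature vectors.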

Optimal Design for Diversity

Second, KITE introduces an optimal design-based regularizer to actively encourage diversity among the selected examples. Inspired by maximum information gain theory, this component ensures that the chosen examples are not only relevant to the query but also sufficiently varied. Promoting diversity is vital for improving the generalizability of the model and enhancing the quality of LLM responses, especially in scenarios where many examples might be semantically similar and lead to redundancy.
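One common way to score diversity in the information-gain literature is the log-determinant 0.5 * log det(I + K_S / noise), where K_S is the kernel matrix of the selected examples. The sketch below assumes that form and a linear kernel purely for illustration; the paper's regularizer may differ in its details, and `log_det_information` and `noise` are hypothetical names.

```python
import numpy as np

def log_det_information(K_S: np.ndarray, noise: float = 1.0) -> float:
    """0.5 * log det(I + K_S / noise) for the kernel matrix of a selected set."""
    m = K_S.shape[0]
    _sign, logdet = np.linalg.slogdet(np.eye(m) + K_S / noise)
    return 0.5 * float(logdet)

# Two candidate sets of size 2 (assumed toy embeddings).
redundant = np.array([[1.0, 0.0], [1.0, 0.0]])  # the same example twice
diverse = np.array([[1.0, 0.0], [0.0, 1.0]])    # two orthogonal examples

# Linear kernel: K = X X^T.
info_redundant = log_det_information(redundant @ redundant.T)
info_diverse = log_det_information(diverse @ diverse.T)
print(info_redundant, info_diverse)
```

The duplicated pair scores 0.5 * log 3 while the orthogonal pair scores 0.5 * log 4, so a selector that maximizes this term is pushed away from near-identical examples.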

The combined objective in KITE balances both relevance (how similar an example is to the input query) and diversity (how varied the selected examples are from each other). The algorithm, called LITE (Linear Information Theoretic Exemplars) when using a linear kernel, efficiently selects examples by maximizing this combined score at each step.
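The trade-off between relevance and diversity can be sketched as a greedy loop. This is an illustrative stand-in, not the authors' implementation: relevance is taken to be the kernel value between a candidate and the query, diversity is the marginal gain of a log-determinant term, and `select_examples`, `lam`, `noise`, and `gamma` are assumed names and defaults.

```python
import numpy as np

def rbf(X: np.ndarray, Y: np.ndarray, gamma: float) -> np.ndarray:
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def set_information(K: np.ndarray, idx: list, noise: float) -> float:
    """0.5 * log det(I + K[idx, idx] / noise); 0.0 for the empty set."""
    if not idx:
        return 0.0
    sub = K[np.ix_(idx, idx)]
    return 0.5 * float(np.linalg.slogdet(np.eye(len(idx)) + sub / noise)[1])

def select_examples(bank, query, k, lam=1.0, noise=1.0, gamma=0.5):
    """Greedily pick k indices maximizing relevance + lam * diversity gain."""
    K = rbf(bank, bank, gamma)                  # example-example similarity
    rel = rbf(bank, query[None, :], gamma).ravel()  # example-query similarity
    chosen = []
    for _ in range(min(k, len(bank))):
        base = set_information(K, chosen, noise)
        best, best_score = None, -np.inf
        for i in range(len(bank)):
            if i in chosen:
                continue
            gain = set_information(K, chosen + [i], noise) - base
            score = rel[i] + lam * gain
            if score > best_score:  # strict: ties keep the earlier index
                best, best_score = i, score
        chosen.append(best)
    return chosen

# Toy bank: indices 0 and 1 are duplicates close to the query, 2 is distinct.
bank = np.array([[1.0, 0.0], [1.0, 0.0], [0.8, 0.6]])
query = np.array([1.0, 0.2])

relevance_only = select_examples(bank, query, k=2, lam=0.0)
with_diversity = select_examples(bank, query, k=2, lam=5.0)
print(relevance_only, with_diversity)
```

With `lam=0.0` the loop returns the two duplicates, which is the redundancy that pure nearest-neighbor retrieval suffers from. With `lam=5.0` the second pick switches to the distinct example.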

Empirical Validation

The researchers conducted extensive experiments across multiple classification datasets, including SST-5, CMSQA, MRPC, QNLI, and HellaSwag, using LLMs such as GPT-Neo-2.7B, Qwen 2.5-1.5B, and Llama-3.2-3B. KITE consistently outperformed strong retrieval baselines, namely random selection, BM25, dense-embedding retrieval, and DPP-based retrieval. On several datasets it surpassed DPP, the strongest baseline, by a notable margin in accuracy.

Ablation studies confirmed that the choice of kernel function is a critical hyperparameter, with no single kernel being universally optimal, highlighting the importance of the kernel trick for capturing non-linear relationships. The studies also demonstrated that incorporating diversity (controlled by a parameter λ) is crucial, especially for large and varied example banks, and that KITE maintains its superior performance even in low-resource settings with fewer in-context examples.

The authors also validate the objective function's approximate submodularity empirically, which further justifies the use of a greedy algorithm and supports near-optimal results in practice.
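For readers unfamiliar with the greedy procedure this argument relies on, here is a generic sketch on a toy coverage objective. Coverage is a standard monotone submodular function, for which greedy selection is known to reach at least a (1 - 1/e) fraction of the optimum; an approximately submodular objective carries a correspondingly weaker guarantee. The names `greedy_select`, `coverage`, and the `topics` data are illustrative assumptions, unrelated to the paper's datasets.

```python
from typing import Callable, Hashable, Iterable, List

def greedy_select(candidates: Iterable[Hashable],
                  objective: Callable[[List[Hashable]], float],
                  k: int) -> List[Hashable]:
    """Pick up to k items, each time adding the one with the largest marginal gain."""
    remaining = list(candidates)
    chosen: List[Hashable] = []
    for _ in range(min(k, len(remaining))):
        base = objective(chosen)
        best = max(remaining, key=lambda c: objective(chosen + [c]) - base)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy objective: how many distinct topics the selected examples cover.
topics = {"a": {1, 2}, "b": {2, 3}, "c": {1, 2, 3, 4}, "d": {5}}

def coverage(selected: List[str]) -> float:
    covered = set()
    for name in selected:
        covered |= topics[name]
    return float(len(covered))

picked = greedy_select(topics.keys(), coverage, k=2)
print(picked, coverage(picked))
```

After "c" is chosen, "a" and "b" add nothing new, so the second pick is "d" even though it is the smallest set on its own. Diminishing returns of this kind are what submodularity formalizes.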

In conclusion, KITE offers a robust and effective framework for in-context example selection by combining a principled, information-theoretic approach with kernel methods and diversity regularization. Its consistent outperformance across various benchmarks underscores its potential to significantly enhance the efficacy of in-context learning for LLMs. Future work aims to extend KITE to generative tasks.
