
DICE: A Dynamic Approach to In-Context Example Selection for LLM Agents

TLDR: DICE (Dynamic In-Context Example Selection) is a framework that improves Large Language Model (LLM) agents by dynamically selecting the most relevant demonstrations at each step of a task. It identifies and prioritizes the “transferable knowledge” in demonstrations, mitigating the harm caused by irrelevant information. Training-free and usable as a plug-in, it consistently improves agent performance across diverse reasoning and sequential decision-making tasks, achieving strong results with fewer examples and remaining robust even with lower-quality demonstrations.

Large Language Model (LLM) agents are becoming increasingly powerful, tackling complex tasks that involve reasoning and using various tools. A key technique enabling their capabilities is In-Context Learning (ICL), where the model learns how to behave by looking at a few examples provided directly in its prompt. However, a significant challenge with ICL is its sensitivity to the quality and relevance of these examples. If the examples aren’t optimal, the agent’s performance can become unstable or even degrade.

Traditional approaches to selecting these examples often rely on simple rules or are designed for very specific tasks. They typically lack a general, theoretically sound way to determine what makes an example truly effective across different steps of a complex reasoning process. This makes it difficult to develop a universal method for choosing examples that consistently help LLM agents perform better.

Addressing this challenge, researchers have introduced a new framework called DICE, which stands for Dynamic In-Context Example Selection. DICE is designed specifically for LLM agents and offers a theoretically grounded approach to ICL. Its core idea is to dynamically select the most relevant examples at each individual step of an agent’s reasoning process.

How DICE Works

DICE operates by looking at the knowledge contained within demonstration examples through a unique lens, separating it into two parts: transferable knowledge and non-transferable knowledge. Transferable knowledge is the useful information that can be applied to a new task, while non-transferable knowledge includes irrelevant or task-specific details that can actually mislead the agent and hinder its ability to generalize to new situations. The framework shows how this non-transferable information can create misleading connections that impair the agent’s performance.

To combat this, DICE proposes a step-by-step selection criterion. This criterion is formally guaranteed to improve agent performance by focusing on maximizing the transferable knowledge. Importantly, DICE is designed as a flexible, plug-in module that can be integrated into existing LLM agent frameworks without needing any additional training. This means it can enhance current agents without extra computational cost.
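The plug-in idea can be sketched in a few lines: wrap the agent’s existing LLM call so that demonstrations are re-selected from a pool before every step. The function names, selector, and prompt layout below are illustrative assumptions, not details from the paper.

```python
from typing import Callable, List

def make_dice_agent(llm: Callable[[str], str],
                    select: Callable[[List[str], str], List[str]],
                    demo_pool: List[str]) -> Callable[[str], str]:
    """Wrap an existing agent's LLM call so demonstrations are re-selected
    from the pool before every step. No model weights are touched, which
    is what makes the plug-in training-free."""
    def step(trajectory: str) -> str:
        demos = select(demo_pool, trajectory)
        prompt = "\n\n".join(demos + [trajectory])
        return llm(prompt)
    return step

# Stub LLM and a trivial selector, just to show the wiring
stub_llm = lambda prompt: "action: search[...]"
agent_step = make_dice_agent(stub_llm,
                             lambda pool, ctx: pool[:1],  # placeholder selector
                             ["demo: thought -> action -> observation"])
print(agent_step("Question: ..."))
```

Because the wrapper only rewrites the prompt, it can sit in front of any existing agent loop (ReAct-style or otherwise) without changing the underlying framework.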

At each step, DICE uses a pre-trained LLM, acting as a “Knowledge Retriever,” to identify the most relevant demonstrations. It approximates how much useful information (transferable knowledge) a demonstration provides for the agent’s next action, effectively filtering out irrelevant examples and adapting to the agent’s evolving context.
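As a rough illustration of per-step selection, the sketch below scores each demonstration against the agent’s current context and keeps the top-k. The paper’s Knowledge Retriever uses a pre-trained LLM to estimate transferable knowledge; the bag-of-words cosine similarity here is only a cheap stand-in, and the demonstration pool is invented.

```python
import math
from collections import Counter

def relevance_score(demo: str, context: str) -> float:
    # Stand-in for DICE's transferable-knowledge estimate: a simple
    # bag-of-words cosine similarity between a demonstration and the
    # agent's current context. DICE uses an LLM-based retriever instead.
    a, b = Counter(demo.lower().split()), Counter(context.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_demonstrations(pool, context, k=2):
    """Pick the k demonstrations most relevant to the current step."""
    ranked = sorted(pool, key=lambda d: relevance_score(d, context),
                    reverse=True)
    return ranked[:k]

# Re-selection happens at every step as the agent's context evolves
pool = [
    "Search for the director of the film, then look up their birthplace.",
    "Add the cheapest red shirt to the cart and check out.",
    "Compare the founding years of the two universities.",
]
context = ("Question: Where was the director of Inception born? "
           "Thought: I should search for the director first.")
print(select_demonstrations(pool, context, k=1))
```

Running the same selector again after the next observation would rank the pool against the updated context, which is the “dynamic” part of the method.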

Demonstrated Effectiveness

The effectiveness and generality of DICE have been rigorously tested across a variety of tasks and agent frameworks. Experiments were conducted on:

  • HotpotQA: A multi-hop question answering dataset that requires complex reasoning.
  • Webshop: An interactive web-based shopping environment, testing sequential decision-making.
  • AlfWorld: A text-based environment for embodied decision-making, involving multi-step household tasks.

DICE was integrated with popular agent frameworks like ReAct, Reflexion, and LATS. The results were consistently positive, showing significant improvements in performance across all benchmarks and baselines. For instance, on HotpotQA, DICE boosted the Exact Match score for ReAct by 9.3 percentage points. On AlfWorld, it led to a 10.4 percentage point increase in success rate for ReAct, particularly in more challenging subcategories.

A key finding from the research is the benefit of DICE’s stepwise demonstration selection. Unlike methods that pick a fixed set of examples for an entire task, DICE dynamically adjusts the examples at each reasoning step. This dynamic approach consistently outperformed static, task-level selection methods, highlighting the value of adapting demonstrations as the agent progresses.
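The gap between static and stepwise selection can be seen in a toy example: with a fixed, task-level choice the same demonstration accompanies every step, while re-selecting at each step lets the example track what the agent is currently doing. The word-overlap scorer and the household steps below are placeholders, not DICE’s actual criterion or data.

```python
def overlap(demo: str, context: str) -> int:
    # Placeholder relevance: shared-word count (not DICE's criterion)
    return len(set(demo.lower().split()) & set(context.lower().split()))

def select(pool, context, k=1):
    return sorted(pool, key=lambda d: overlap(d, context), reverse=True)[:k]

pool = ["go to the kitchen and open the fridge",
        "slice the apple with the knife",
        "put the cup in the sink"]
steps = ["I need to open the fridge in the kitchen",
         "Now I must slice the apple",
         "Finally put the cup in the sink"]

# Static (task-level): pick once from the first step, reuse everywhere
static = select(pool, steps[0], k=1) * len(steps)
# Dynamic (step-level): re-select as the context evolves, as DICE does
dynamic = [select(pool, s, k=1)[0] for s in steps]
print(dynamic)
```

Here the dynamic policy surfaces a different, better-matched demonstration at each step, while the static policy keeps showing the first-step example even after it has stopped being relevant.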

Further analysis revealed that DICE can achieve comparable or even better performance using fewer examples than standard ICL. This demonstrates its efficiency in reducing the reliance on large numbers of demonstrations. Moreover, DICE proved to be robust even when only low-quality demonstrations were available, suggesting its ability to extract valuable transferable knowledge even from suboptimal inputs.

In essence, DICE represents a significant step forward in making LLM agents more robust, efficient, and adaptable by providing a principled, context-aware method for selecting the right examples at the right time. To learn more about this innovative framework, you can read the full research paper here.

Karthik Mehta
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
