
Guiding Large Language Models Through Better Questioning

TLDR: A new method called Conformal Information Pursuit (C-IP) helps Large Language Models (LLMs) ask more effective questions in interactive conversations. Unlike traditional methods that struggle with LLMs' often-inaccurate confidence, C-IP uses "conformal prediction sets" to reliably measure uncertainty. This leads to LLMs asking fewer, more informative questions, improving their accuracy in tasks like 20 Questions and medical diagnosis.

Large Language Models (LLMs) are increasingly used for interactive question-answering, where they sequentially ask for information to arrive at a prediction. However, a significant challenge arises because LLMs often provide over- or under-confident probabilities for their outputs. This miscalibration makes it difficult to accurately estimate uncertainty, leading to suboptimal question selection and less efficient conversations.

Traditional methods, such as Information Pursuit (IP), aim to minimize the number of questions by selecting queries that maximize information gain or minimize uncertainty at each step. But when LLM probabilities are unreliable, these uncertainty estimates become inaccurate, hindering the model’s ability to ask the most informative questions.
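To make the entropy-based selection concrete, here is a minimal Python sketch of the greedy step IP performs. This is an illustration only, not the paper's implementation: the function names, the yes/no answer space, and the callback signatures are assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def select_query(candidate_queries, posterior_given_answer, answer_prob):
    """Greedy IP step: pick the query whose expected posterior entropy
    is lowest, i.e. whose expected information gain is highest.
    `posterior_given_answer(q, a)` and `answer_prob(q, a)` are
    hypothetical callbacks supplying the model's estimates."""
    best_q, best_h = None, float("inf")
    for q in candidate_queries:
        # Expected entropy of the label posterior, averaged over the
        # possible answers to query q.
        exp_h = sum(
            answer_prob(q, a) * entropy(posterior_given_answer(q, a))
            for a in ("yes", "no")
        )
        if exp_h < best_h:
            best_q, best_h = q, exp_h
    return best_q
```

The weakness the article points out lives in the callbacks: if the LLM's probabilities feeding `entropy` are miscalibrated, the "most informative" query chosen here can be the wrong one.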

To address this, researchers have proposed Conformal Information Pursuit (C-IP), a novel approach that leverages ‘conformal prediction sets’ to measure uncertainty. Unlike traditional conditional entropy, conformal prediction sets offer a robust and distribution-free way to estimate how uncertain an LLM is about its prediction. C-IP utilizes a mathematical relationship between these prediction sets and conditional entropy, allowing it to estimate uncertainty based on the average size of these sets. Essentially, the smaller the prediction set, the more confident the model is.
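As a rough sketch of what a conformal prediction set looks like, the snippet below shows the generic split-conformal construction with a simple "one minus softmax probability" nonconformity score. This is a textbook recipe assumed for illustration; the paper's exact score function and calibration scheme may differ.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))/n empirical
    quantile of the calibration nonconformity scores."""
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(q, 1.0), method="higher")

def prediction_set(probs, threshold):
    """All labels whose nonconformity score (1 - predicted probability)
    falls at or below the calibrated threshold. The fewer labels
    survive, the more confident the model is."""
    return [y for y, p in enumerate(probs) if 1.0 - p <= threshold]
```

The key property, and the reason C-IP can trust set size where it cannot trust raw probabilities, is that this construction covers the true label with probability at least 1 − alpha without assuming the scores are calibrated.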

C-IP works by greedily selecting the next question that is expected to minimize the size of the prediction set. This means the model is guided to ask questions that will most effectively narrow down the possibilities and reduce its uncertainty. The paper explores two ways to construct these prediction sets: by uniformly sampling historical query patterns or by simulating query patterns directly from LLM interactions.
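The greedy loop described above can be sketched as follows. All helper names here are hypothetical: `expected_set_size` stands in for either of the two estimation strategies the paper describes (sampling historical query patterns or simulating them from LLM interactions), and `ask` stands in for the interactive answer source.

```python
def cip_select(candidate_queries, expected_set_size, history):
    """Greedy C-IP step: pick the next query whose answer is expected
    to yield the smallest conformal prediction set."""
    return min(candidate_queries, key=lambda q: expected_set_size(q, history))

def cip_loop(queries, ask, expected_set_size, current_set_size, max_steps=20):
    """Ask queries until the prediction set narrows to a single label
    or the question budget runs out; returns the (query, answer) history."""
    history = []
    for _ in range(max_steps):
        remaining = [q for q in queries if q not in (h[0] for h in history)]
        if not remaining or current_set_size(history) <= 1:
            break
        q = cip_select(remaining, expected_set_size, history)
        history.append((q, ask(q)))
    return history
```

Swapping entropy for set size is the whole trick: the stopping rule and the selection rule both read off a quantity with a distribution-free coverage guarantee rather than a possibly miscalibrated probability.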


Real-World Applications and Performance

The effectiveness of C-IP was demonstrated through experiments on two distinct tasks. First, in a game of 20 Questions, C-IP showed superior predictive performance and achieved correct answers with shorter sequences of questions compared to previous IP methods and other uncertainty-based approaches. This held true for both pre-defined (closed) and free-form (open) question sets.

Second, C-IP was applied to an interactive medical question-answering task using the MediQ dataset, which simulates a conversation between a doctor LLM and a patient LLM. In this complex setting, C-IP achieved competitive performance with direct, single-turn predictions (where all information is given at once), while also providing greater interpretability of the diagnostic process. It consistently outperformed traditional IP in specialties like Internal Medicine and Pediatrics, indicating its ability to select more informative queries during the interactive diagnosis.

This research highlights that using prediction set sizes is an effective way to measure uncertainty in LLMs, leading to more efficient and accurate interactive AI systems. For more details, you can read the full research paper here.

Karthik Mehta
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
