
Training Proactive Language Models from Real-World Conversations

TLDR: Learn-to-Ask is a novel, simulator-free framework that teaches Large Language Models (LLMs) to be proactive and goal-oriented by learning directly from offline expert dialogue logs. It infers turn-by-turn rewards from observed expert actions, enabling LLMs to learn both what questions to ask and when to stop a conversation. Successfully deployed in a medical AI service, the framework achieved performance superior to human experts, demonstrating its ability to translate offline data into significant real-world impact.

Large Language Models (LLMs) have become incredibly powerful, but they typically act as passive responders, waiting for a prompt before generating text. Imagine an AI in a critical field like healthcare or finance that could proactively guide a conversation, gather necessary information, and know when to stop. This capability, known as proactivity, is a major challenge for current LLMs.

Existing methods to make LLMs proactive often fall short. Some focus on optimizing single-turn qualities like clarity, which is too narrow to build a coherent, long-term conversational strategy. Others rely on complex user simulators for training, but these simulators are notoriously difficult and costly to build for real-world, open-ended domains, leading to a significant “reality gap” where models trained in simulation fail in real interactions.

A new framework, Learn-to-Ask, bridges this gap by training proactive dialogue agents directly from offline expert data, bypassing user simulators entirely. The approach turns passive LLMs into goal-oriented conversational partners.

How Learn-to-Ask Works

The core idea behind Learn-to-Ask is to reframe the offline policy learning problem. Instead of trying to model complex user dynamics, it leverages the “observed future” of each expert conversation. This means that for every turn in an expert’s dialogue, the framework looks ahead to see what information the expert ultimately gathered and when they decided to stop. This allows it to infer a dense, turn-by-turn reward signal that is grounded in the expert’s actual strategy.
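To make the "observed future" idea concrete, here is a minimal sketch (not the paper's code) of how turn-level training examples might be derived from an offline log. It assumes each logged turn is annotated with the information items it surfaced, which is a simplification for illustration:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str     # "expert" or "user"
    text: str
    info_items: set  # information items surfaced in this turn (hypothetical annotation)

def build_training_examples(dialogue):
    """For each expert turn, pair the conversation history with the 'observed
    future': the information the expert still went on to collect, and whether
    the expert chose to continue or stop at that point."""
    examples = []
    for i, turn in enumerate(dialogue):
        if turn.speaker != "expert":
            continue
        # Everything the real conversation surfaced from this turn onward.
        future_info = set()
        for later in dialogue[i:]:
            future_info |= later.info_items
        # The expert implicitly 'stopped' if no expert turn follows this one.
        expert_stops = all(t.speaker != "expert" for t in dialogue[i + 1:])
        examples.append({
            "history": [t.text for t in dialogue[:i]],
            "target_info": future_info,  # grounds the turn's micro-reward
            "expert_action": "stop" if expert_stops else "continue",  # grounds the macro-reward
        })
    return examples
```

Each example is a self-contained supervised instance: the history is the input, and the expert's observed future supplies the reward targets, with no simulator in the loop.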

This process breaks down the difficult, long-term conversational problem into a series of manageable, supervised learning tasks. The policy is trained to output a structured tuple: an action (what to ask) and a state assessment (when to stop). To ensure these inferred rewards are accurate, the framework includes an Automated Grader Calibration pipeline that systematically cleans up noise from the LLM-based reward model with minimal human oversight.
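The paper's exact output schema isn't reproduced in this article, but a simple JSON encoding of that (action, assessment) tuple could be parsed with a hypothetical helper like this:

```python
import json
from dataclasses import dataclass

@dataclass
class PolicyOutput:
    assessment: str  # "continue" or "stop" -- the when-to-stop decision
    question: str    # the next question to ask; empty when stopping

def parse_policy_output(raw: str) -> PolicyOutput:
    """Parse the model's structured tuple from JSON, e.g.
    {"assessment": "continue", "question": "How long has the cough lasted?"}."""
    obj = json.loads(raw)
    assessment = obj.get("assessment", "")
    if assessment not in ("continue", "stop"):
        raise ValueError(f"unexpected assessment: {assessment!r}")
    return PolicyOutput(assessment=assessment, question=obj.get("question", ""))
```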

The reward system is hierarchical, consisting of two main parts:

  • Micro-Reward (Question Utility): This measures how effectively a generated question targets the specific information the expert deemed critical to collect next. It provides a nuanced score (e.g., 1.0 for precise, 0.5 for relevant but not precise, 0.0 for irrelevant), helping the model learn precision.

  • Macro-Reward (Assessment Accuracy): This evaluates whether the agent’s decision to continue or stop the conversation matches the expert’s implicit decision. This binary reward (1 if correct, 0 otherwise) is crucial for dialogue efficiency.

These rewards are integrated multiplicatively, meaning the model only gets credit for asking a good question if its decision to continue the conversation was also correct. This ensures that the strategic decision of when to stop is prioritized.
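In code, that gating might look like the following sketch, where expert_action and question_utility come from the inferred reward signal described above:

```python
def combined_reward(assessment: str, expert_action: str, question_utility: float) -> float:
    """Multiplicative combination of the two reward levels.
    macro: 1.0 if the continue/stop decision matches the expert's, else 0.0.
    micro: graded question utility in {1.0, 0.5, 0.0} from the LLM grader.
    A well-targeted question earns credit only when continuing was correct."""
    macro = 1.0 if assessment == expert_action else 0.0
    if assessment == "stop":
        return macro                 # nothing to grade when the agent stops
    return macro * question_utility  # micro-reward is gated by the macro-reward
```

For example, a perfectly precise question (utility 1.0) asked at a point where the expert would already have stopped still scores 0.0, which is exactly the prioritization of the stopping decision described above.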

Real-World Impact and Validation

The effectiveness of Learn-to-Ask was first demonstrated in offline experiments using RealMedConv, a real-world medical dialogue dataset. Models trained with this framework showed dramatic improvements. For example, a 7B model more than tripled its ability to ask perfectly targeted questions and achieved over 92% accuracy in correctly terminating conversations.

More importantly, the framework successfully bridged the “reality gap” by deploying a Learn-to-Ask-trained model into a live, large-scale online AI service called “Medication AI Assistant.” This service proactively engages with users to gather symptom descriptions and recommend over-the-counter medications. In this production environment, the model not only functioned robustly but achieved task-success rates that exceeded those of human experts. It reached a 93% information completeness rate and an 88% good-question rate, leading to a significant increase in the dialogue-to-purchase conversion rate compared to human-based services.

The Automated Prompt Calibration (Auto-Prompt) pipeline proved invaluable in this production setting. While it offered marginal gains on smaller academic datasets, its ability to automatically refine prompts for information extraction and reward grading became essential for maintaining and continuously improving the system in a dynamic, complex environment with evolving user behaviors and business needs.
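The pipeline's internals aren't detailed in this article, but the core idea of calibrating an LLM grader with minimal human oversight can be sketched as a selection loop over candidate prompts, scored against a small human-labeled audit set (all names below are hypothetical stand-ins):

```python
def calibrate_grader(candidate_prompts, audit_set, grade_with_prompt):
    """Pick the grader prompt that agrees most with human judgments.
    `audit_set` holds (example, human_label) pairs, and
    `grade_with_prompt(prompt, example)` invokes the LLM grader; both are
    assumptions for this sketch, not the paper's actual interface."""
    def agreement(prompt):
        hits = sum(grade_with_prompt(prompt, ex) == label for ex, label in audit_set)
        return hits / len(audit_set)
    return max(candidate_prompts, key=agreement)
```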


Looking Ahead

Learn-to-Ask provides a practical and economically viable blueprint for transforming passive LLMs into proactive, goal-oriented applications. The research also opens doors for future advancements, such as moving beyond expert imitation to creating superhuman AI agents. This could involve shaping rewards to enforce specific safety protocols, exploring new lines of inquiry not present in expert data, or developing hybrid human-AI systems where the AI continuously learns from human expert feedback. You can find more details about this research in the full paper.

Ananya Rao
https://blogs.edgentiq.com
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
