
Training Proactive Language Models from Real-World Conversations

TLDR: Learn-to-Ask is a novel, simulator-free framework that teaches Large Language Models (LLMs) to be proactive and goal-oriented by learning directly from offline expert dialogue logs. It infers turn-by-turn rewards from observed expert actions, enabling LLMs to learn both what questions to ask and when to stop a conversation. Successfully deployed in a medical AI service, the framework achieved performance superior to human experts, demonstrating its ability to translate offline data into significant real-world impact.

Large Language Models (LLMs) have become incredibly powerful, but they typically act as passive responders, waiting for a prompt before generating text. Imagine an AI in a critical field like healthcare or finance that could proactively guide a conversation, gather necessary information, and know when to stop. This capability, known as proactivity, is a major challenge for current LLMs.

Existing methods to make LLMs proactive often fall short. Some focus on optimizing single-turn qualities like clarity, which is too narrow to build a coherent, long-term conversational strategy. Others rely on complex user simulators for training, but these simulators are notoriously difficult and costly to build for real-world, open-ended domains, leading to a significant “reality gap” where models trained in simulation fail in real interactions.

A new framework, Learn-to-Ask, bridges this gap by training proactive dialogue agents directly from offline expert data, bypassing user simulators entirely. The approach turns passive LLMs into goal-oriented conversational partners.

How Learn-to-Ask Works

The core idea behind Learn-to-Ask is to reframe the offline policy learning problem. Instead of trying to model complex user dynamics, it leverages the “observed future” of each expert conversation. This means that for every turn in an expert’s dialogue, the framework looks ahead to see what information the expert ultimately gathered and when they decided to stop. This allows it to infer a dense, turn-by-turn reward signal that is grounded in the expert’s actual strategy.
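To make the "observed future" idea concrete, here is a minimal sketch (not the paper's code) of how turn-level training examples might be derived from an offline log. It assumes each logged turn is annotated with the information items it surfaced, which is a simplification for illustration:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str     # "expert" or "user"
    text: str
    info_items: set  # information items surfaced in this turn (hypothetical annotation)

def build_training_examples(dialogue):
    """For each expert turn, pair the conversation history with the 'observed
    future': the information the expert still went on to collect, and whether
    the expert chose to continue or stop at that point."""
    examples = []
    for i, turn in enumerate(dialogue):
        if turn.speaker != "expert":
            continue
        # Everything the real conversation surfaced from this turn onward.
        future_info = set()
        for later in dialogue[i:]:
            future_info |= later.info_items
        # The expert implicitly 'stopped' if no expert turn follows this one.
        expert_stops = all(t.speaker != "expert" for t in dialogue[i + 1:])
        examples.append({
            "history": [t.text for t in dialogue[:i]],
            "target_info": future_info,  # grounds the turn's micro-reward
            "expert_action": "stop" if expert_stops else "continue",  # grounds the macro-reward
        })
    return examples
```

Each example is a self-contained supervised instance: the history is the input, and the expert's observed future supplies the reward targets, with no simulator in the loop.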

This process breaks down the difficult, long-term conversational problem into a series of manageable, supervised learning tasks. The policy is trained to output a structured tuple: an action (what to ask) and a state assessment (when to stop). To ensure these inferred rewards are accurate, the framework includes an Automated Grader Calibration pipeline that systematically cleans up noise from the LLM-based reward model with minimal human oversight.
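The paper's exact output schema isn't reproduced in this article, but a simple JSON encoding of that (action, assessment) tuple could be parsed with a hypothetical helper like this:

```python
import json
from dataclasses import dataclass

@dataclass
class PolicyOutput:
    assessment: str  # "continue" or "stop" -- the when-to-stop decision
    question: str    # the next question to ask; empty when stopping

def parse_policy_output(raw: str) -> PolicyOutput:
    """Parse the model's structured tuple from JSON, e.g.
    {"assessment": "continue", "question": "How long has the cough lasted?"}."""
    obj = json.loads(raw)
    assessment = obj.get("assessment", "")
    if assessment not in ("continue", "stop"):
        raise ValueError(f"unexpected assessment: {assessment!r}")
    return PolicyOutput(assessment=assessment, question=obj.get("question", ""))
```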

The reward system is hierarchical, consisting of two main parts:

  • Micro-Reward (Question Utility): This measures how effectively a generated question targets the specific information the expert deemed critical to collect next. It provides a nuanced score (e.g., 1.0 for precise, 0.5 for relevant but not precise, 0.0 for irrelevant), helping the model learn precision.

  • Macro-Reward (Assessment Accuracy): This evaluates whether the agent’s decision to continue or stop the conversation matches the expert’s implicit decision. This binary reward (1 if correct, 0 otherwise) is crucial for dialogue efficiency.

These rewards are integrated multiplicatively, meaning the model only gets credit for asking a good question if its decision to continue the conversation was also correct. This ensures that the strategic decision of when to stop is prioritized.
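In code, that gating might look like the following sketch, where expert_action and question_utility come from the inferred reward signal described above:

```python
def combined_reward(assessment: str, expert_action: str, question_utility: float) -> float:
    """Multiplicative combination of the two reward levels.
    macro: 1.0 if the continue/stop decision matches the expert's, else 0.0.
    micro: graded question utility in {1.0, 0.5, 0.0} from the LLM grader.
    A well-targeted question earns credit only when continuing was correct."""
    macro = 1.0 if assessment == expert_action else 0.0
    if assessment == "stop":
        return macro                 # nothing to grade when the agent stops
    return macro * question_utility  # micro-reward is gated by the macro-reward
```

For example, a perfectly precise question (utility 1.0) asked at a point where the expert would already have stopped still scores 0.0, which is exactly the prioritization of the stopping decision described above.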

Real-World Impact and Validation

The effectiveness of Learn-to-Ask was first demonstrated in offline experiments using RealMedConv, a real-world medical dialogue dataset. Models trained with this framework showed dramatic improvements. For example, a 7B model more than tripled its ability to ask perfectly targeted questions and achieved over 92% accuracy in correctly terminating conversations.

More importantly, the framework successfully bridged the “reality gap” by deploying a Learn-to-Ask-trained model into a live, large-scale online AI service called “Medication AI Assistant.” This service proactively engages with users to gather symptom descriptions and recommend over-the-counter medications. In this production environment, the model not only functioned robustly but achieved task-success rates that exceeded those of human experts. It reached a 93% information completeness rate and an 88% good-question rate, leading to a significant increase in the dialogue-to-purchase conversion rate compared to human-based services.

The Automated Prompt Calibration (Auto-Prompt) pipeline proved invaluable in this production setting. While it offered marginal gains on smaller academic datasets, its ability to automatically refine prompts for information extraction and reward grading became essential for maintaining and continuously improving the system in a dynamic, complex environment with evolving user behaviors and business needs.
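The pipeline's internals aren't detailed in this article, but the core idea of calibrating an LLM grader with minimal human oversight can be sketched as a selection loop over candidate prompts, scored against a small human-labeled audit set (all names below are hypothetical stand-ins):

```python
def calibrate_grader(candidate_prompts, audit_set, grade_with_prompt):
    """Pick the grader prompt that agrees most with human judgments.
    `audit_set` holds (example, human_label) pairs, and
    `grade_with_prompt(prompt, example)` invokes the LLM grader; both are
    assumptions for this sketch, not the paper's actual interface."""
    def agreement(prompt):
        hits = sum(grade_with_prompt(prompt, ex) == label for ex, label in audit_set)
        return hits / len(audit_set)
    return max(candidate_prompts, key=agreement)
```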


Looking Ahead

Learn-to-Ask provides a practical and economically viable blueprint for transforming passive LLMs into proactive, goal-oriented applications. The research also opens doors for future advancements, such as moving beyond expert imitation to creating superhuman AI agents. This could involve shaping rewards to enforce specific safety protocols, exploring new lines of inquiry not present in expert data, or developing hybrid human-AI systems where the AI continuously learns from human expert feedback. You can find more details about this research in the full paper.

Ananya Rao
https://blogs.edgentiq.com
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
