
TURA: A New Framework for Conversational AI Search

TLDR: TURA is a novel three-stage framework that enhances AI search by combining traditional Retrieval-Augmented Generation (RAG) with agentic tool-use. It allows search engines to access both static web content and dynamic, real-time information (like ticket availability or inventory) by decomposing queries, planning tasks using a Directed Acyclic Graph (DAG) for parallel execution, and efficiently executing tool calls with a distilled agent. This approach addresses the limitations of RAG systems that struggle with time-sensitive and structured queries, leading to more accurate and real-time answers for millions of users.

The landscape of search engines is undergoing a significant transformation, moving from traditional keyword-based results to more conversational and intelligent AI search experiences. This shift is largely driven by the advent of Large Language Models (LLMs) and a technique called Retrieval-Augmented Generation (RAG).

However, current RAG-based AI search systems, while powerful for static web content, face considerable limitations when it comes to handling real-time, dynamic information. Imagine trying to find out live train ticket availability or current product inventory – these systems often struggle because the necessary information isn’t on a static webpage; it needs to be actively queried from databases or APIs.

Introducing TURA: A Unified Approach to AI Search

To close this gap, researchers Zhejun Zhao, Yuehu Dong, Alley Liu, Lixue Zheng, Pingsheng Liu, Dongdong Shen, Long Xia, Jiashu Zhao, and Dawei Yin from Baidu Inc. and the University of Science and Technology of China have introduced TURA (Tool-Augmented Unified Retrieval Agent for AI Search). TURA is a three-stage framework that equips LLMs to use external tools, moving beyond passive document retrieval to active, real-time data acquisition. The goal is to let AI search deliver robust, real-time answers while meeting the low-latency demands of large-scale industrial systems.

TURA’s architecture is built on three core components:

1. Intent-Aware Retrieval

This first stage acts as a smart filter. When you ask a complex question, TURA first uses an LLM to break it into smaller, distinct sub-questions. For example, a request about a “Beijing trip in June, needing a hotel, attractions, and things to do” might be decomposed into “search Beijing weather,” “top attractions,” and “hotel booking.” TURA then identifies the most relevant tools, or information sources (exposed as Model Context Protocol (MCP) servers), that can answer each sub-question. Because users phrase requests very differently from the terse technical descriptions of those tools, TURA augments its index with a large, diverse set of synthetic queries for each tool, so it can still find the right tool when a user’s phrasing is unusual.
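The retrieval step can be pictured with a minimal Python sketch. Everything in it is an assumption for illustration: the server names (`weather_server`, `attractions_server`, `hotel_server`), their descriptions, and their synthetic queries are hypothetical, and a simple bag-of-words cosine similarity stands in for whatever dense retriever a production system would use. What it shows is the structure of the idea: each synthetic query becomes its own index entry pointing back to a server, and a server is scored by its best-matching entry.

```python
import math
import re
from collections import Counter

# Hypothetical tool catalog: each (assumed) MCP server has a terse technical
# description plus synthetic user-style queries that widen its index.
SERVERS = {
    "weather_server": {
        "description": "forecast API returning temperature and precipitation by city",
        "synthetic_queries": [
            "what is the weather like in beijing",
            "will it rain next week",
            "how hot is it in june",
        ],
    },
    "attractions_server": {
        "description": "POI service listing sights ranked by popularity",
        "synthetic_queries": [
            "top attractions to visit",
            "best places to see in the city",
            "famous sights worth visiting",
        ],
    },
    "hotel_server": {
        "description": "lodging inventory with room availability and prices",
        "synthetic_queries": [
            "book a hotel room",
            "cheap hotels near the center",
            "is there a room available tonight",
        ],
    },
}


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words; a stand-in for a dense embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(servers: dict) -> list[tuple[str, Counter]]:
    """One entry per description AND per synthetic query, each pointing to its server."""
    index = []
    for name, info in servers.items():
        for text in [info["description"], *info["synthetic_queries"]]:
            index.append((name, tokenize(text)))
    return index


def retrieve(sub_query: str, index: list[tuple[str, Counter]], k: int = 1) -> list[str]:
    """Score each server by its best-matching index entry and return the top-k names."""
    q = tokenize(sub_query)
    best: dict[str, float] = {}
    for name, vec in index:
        best[name] = max(best.get(name, 0.0), cosine(q, vec))
    return sorted(best, key=best.get, reverse=True)[:k]


index = build_index(SERVERS)
# Sub-queries an LLM might produce for the Beijing-trip example.
for sq in ["search beijing weather", "top attractions", "hotel booking"]:
    print(sq, "->", retrieve(sq, index))
```

In this toy catalog, none of the three sub-queries shares a single word with any server description, so a description-only index would score every server at zero; the synthetic queries are what make each match possible. That is the gap between user phrasing and technical tool descriptions that the article describes.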

2. DAG-based Task Planner

Once the relevant tools are identified, TURA’s planner steps in. For simple queries, it creates a straightforward execution plan. But for complex, multi-faceted questions, it constructs a Directed Acyclic Graph (DAG). Think of a DAG as a flowchart where tasks are nodes and arrows show dependencies. This lets TURA determine which tasks can run in parallel (like checking weather and finding attractions simultaneously) and which need the output of another task before they can begin (like planning a route after knowing hotel and attraction locations). Because independent tasks no longer wait on each other, this planning substantially reduces the time it takes to answer complex queries.
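A minimal sketch of the planning idea, using Python’s standard-library `graphlib.TopologicalSorter` to group tasks into stages whose members can run concurrently. The task names, the dependency graph, and the `call_tool` stub are hypothetical; in TURA the plan comes from an LLM planner working over the retrieved tools, whereas here it is written by hand, and this stage-by-stage executor is one simple scheduling choice rather than the paper’s.

```python
import asyncio
from graphlib import TopologicalSorter

# Hypothetical plan for the Beijing-trip query: each task maps to the set of
# tasks whose output it needs (its predecessors in the DAG).
DEPS: dict[str, set[str]] = {
    "weather": set(),
    "attractions": set(),
    "hotel": set(),
    "route": {"attractions", "hotel"},  # needs both locations first
}


def plan_stages(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into stages; every task in a stage has all inputs ready."""
    ts = TopologicalSorter(deps)
    ts.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    stages = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        stages.append(ready)
        ts.done(*ready)
    return stages


async def call_tool(name: str, inputs: dict[str, str]) -> str:
    """Stub for a real tool call; sleeps to simulate network latency."""
    await asyncio.sleep(0.1)
    return f"{name}({', '.join(sorted(inputs))})"


async def execute(deps: dict[str, set[str]]) -> dict[str, str]:
    """Run each stage's tasks concurrently, feeding in predecessor outputs."""
    results: dict[str, str] = {}
    for stage in plan_stages(deps):
        outputs = await asyncio.gather(
            *(call_tool(t, {d: results[d] for d in deps[t]}) for t in stage)
        )
        results.update(zip(stage, outputs))
    return results


if __name__ == "__main__":
    print(plan_stages(DEPS))
    print(asyncio.run(execute(DEPS)))
```

Executing stage by stage is the simplest correct schedule, but it adds a barrier: a task waits for the whole previous stage, not only its own predecessors. A scheduler that launches each task as soon as its own dependencies finish can save further latency when tool calls vary in duration.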

3. Distilled Agent Executor

The final stage executes the plan. Instead of relying on a massive, slow LLM for every small decision, TURA uses a lightweight, efficient “Distilled Agent Executor.” This smaller agent is trained through “agent distillation,” learning from the high-quality decisions of a much larger, more powerful “teacher” model. Crucially, its training data includes the teacher’s reasoning, but during live use it generates the action directly, skipping the verbose “thought” step. This “train-with-thought, infer-without-thought” paradigm lets TURA achieve near-teacher accuracy at a fraction of the computational cost and latency, making it suitable for real-time production environments.
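The “train-with-thought, infer-without-thought” idea can be sketched as a data-formatting convention. This is a minimal illustration under stated assumptions, not the paper’s recipe: the `<think>`/`<action>` tags, the JSON tool-call format, the `TeacherStep` record, and the `train_tickets` tool are all hypothetical, and no model is called. Training targets keep the teacher’s reasoning ahead of the action; at serving time the prompt is pre-filled with the action tag so the student emits only the tool call.

```python
import json
from dataclasses import dataclass


@dataclass
class TeacherStep:
    """One step from a hypothetical teacher trajectory."""
    context: str  # user query plus plan and tool results so far
    thought: str  # teacher's natural-language reasoning
    action: dict  # tool call the teacher chose


def build_sft_example(step: TeacherStep) -> dict:
    """Training target keeps the reasoning before the action ("train with thought")."""
    target = (
        f"<think>{step.thought}</think>"
        f"<action>{json.dumps(step.action, ensure_ascii=False)}</action>"
    )
    return {"prompt": step.context, "target": target}


def build_inference_prompt(context: str) -> str:
    """Serving prompt is pre-filled with the action tag, so the student
    starts directly at the tool call ("infer without thought")."""
    return context + "<action>"


def parse_action(completion: str) -> dict:
    """Decode the JSON tool call that precedes the closing action tag."""
    return json.loads(completion.split("</action>", 1)[0])


step = TeacherStep(
    context="User: Are there train tickets from Beijing to Shanghai tomorrow morning?\n",
    thought="Ticket availability is dynamic, so call the ticketing tool, not web search.",
    action={"tool": "train_tickets", "args": {"from": "Beijing", "to": "Shanghai"}},
)
example = build_sft_example(step)
prompt = build_inference_prompt(step.context)
# Stand-in for the student's generated text; a real model would produce this.
fake_completion = '{"tool": "train_tickets", "args": {"from": "Beijing"}}</action>'
print(parse_action(fake_completion))
```

Pre-filling the action tag is only one way to skip the reasoning at inference time. Whether a student reliably produces a correct action without first writing out its thought depends on how it was trained, and the paper’s actual training setup may differ from this sketch.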


Real-World Impact and Performance

Since May 2025, TURA has been fully deployed and serves tens of millions of users. Extensive evaluations, including live A/B testing, show that TURA outperforms traditional LLM + RAG systems. It significantly improves answer accuracy and faithfulness, meaning the answers are not only correct but also reliably grounded in the retrieved information. TURA has also produced a notable increase in Session Success Rate, an indicator of user satisfaction, and a substantial reduction in latency for complex queries.

This work represents a significant step forward in AI search, demonstrating a shift from passive information retrieval to active, tool-augmented systems that can integrate diverse, real-time data sources. For more in-depth technical details, see the full research paper.

Karthik Mehta (https://blogs.edgentiq.com)
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
