Unifying Natural Language Queries Across Databases and APIs with a Declarative Approach

TLDR: A new research paper introduces ‘siwarex,’ a declarative system that significantly improves how Large Language Models (LLMs) handle natural language queries over diverse data sources, including both databases and APIs. By treating APIs as User Defined Functions within SQL, the system unifies data access and leverages SQL’s optimization capabilities. Experiments on new benchmarks show this declarative method outperforms imperative and agent-based approaches in accuracy and robustness for heterogeneous data environments.

In industrial settings, users routinely want to ask a question in natural language and get an answer that draws on several structured data sources, such as spreadsheets, databases, and APIs. Large Language Models (LLMs) have made strides in translating natural language into executable code for a single database or API, but they often fall short in heterogeneous data environments, where one answer requires combining information from different types of sources.

A recent research paper, titled “Declarative Techniques for NL Queries over Heterogeneous Data,” by Elham Khabiri, Jeffrey O. Kephart, Fenno F. Heath III, Srideepika Jayaraman, Fateh A. Tipu, Yingjie Li, Dhruv Shah, Achille Fokoue, and Anu Bhamidipaty, addresses this challenge. The authors introduce a declarative approach designed to handle data heterogeneity significantly better than existing LLM-based agentic or imperative code generation systems.

The Challenge of Diverse Data Sources

Current LLM-based applications often struggle because they conflate a user’s intent with the complex planning required to execute queries across different data types. Imagine asking, “Which Xylem pumps at Bedford have experienced anomalous temperatures today?” This requires both a database call to find pumps by manufacturer and location, and an API call to check temperature anomalies. Existing agent-based architectures, like ReAct, can orchestrate such tasks but tend to be brittle, expensive, and difficult to scale in real-world production settings.

Introducing a Declarative Solution: siwarex

The researchers propose a more practical architecture called siwarex, which cleanly separates the user’s intent from the execution planning. This system leverages SQL as a declarative language to express user intent and uses User Defined Functions (UDFs) to invoke APIs directly from within SQL queries. By doing so, APIs are treated on the same footing as database tables, allowing the system to utilize decades of research in SQL query optimization for efficient orchestration and aggregation across both databases and APIs.
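
To make the UDF idea concrete, here is a minimal sketch using Python's built-in sqlite3 module and the pump question from the previous section. It is not the siwarex implementation: the table layout, the function name has_temperature_anomaly, and the mock API responses are all assumptions made for illustration, and a real UDF would issue an HTTP request instead of reading a local dictionary.

```python
import sqlite3

# Hypothetical stand-in for a remote anomaly-detection API. A real UDF would
# call the service over HTTP; a dict keeps this sketch self-contained.
MOCK_API_RESPONSES = {"P1": True, "P2": False, "P3": True}


def has_temperature_anomaly(pump_id):
    """Return 1 if the mock API reports an anomaly for this pump, else 0."""
    return int(MOCK_API_RESPONSES.get(pump_id, False))


def anomalous_pumps(manufacturer, site):
    conn = sqlite3.connect(":memory:")
    # Illustrative table; the schema is an assumption, not taken from the paper.
    conn.execute("CREATE TABLE pumps (pump_id TEXT, manufacturer TEXT, site TEXT)")
    conn.executemany(
        "INSERT INTO pumps VALUES (?, ?, ?)",
        [
            ("P1", "Xylem", "Bedford"),
            ("P2", "Xylem", "Bedford"),
            ("P3", "Acme", "Bedford"),
        ],
    )
    # Register the Python function as a scalar SQL function taking one argument.
    conn.create_function("has_temperature_anomaly", 1, has_temperature_anomaly)
    rows = conn.execute(
        "SELECT pump_id FROM pumps "
        "WHERE manufacturer = ? AND site = ? "
        "AND has_temperature_anomaly(pump_id) = 1 "
        "ORDER BY pump_id",
        (manufacturer, site),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(anomalous_pumps("Xylem", "Bedford"))
```

The single SQL statement states what is wanted, and the database engine decides when the function is called for each row, which is the separation of intent from execution that the paragraph above describes.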

The siwarex framework relies on two key schemas:

  • Abstract Schema: Provides a global view of data source properties and relationships, agnostic to whether the source is a database table or an API.
  • API Mapping Schema: Contains details needed to invoke an API, such as URL, method (POST, GET), and input/output parameters.
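
One way to picture the two schemas is as plain data structures. The sketch below is an assumption about their shape, not the paper's actual format: the class names, field names, the example URL, and the temperature_anomalies relation are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AbstractTable:
    """Source-agnostic description of one relation in the global view."""
    name: str
    columns: dict                      # column name -> SQL type
    foreign_keys: dict = field(default_factory=dict)  # column -> "table.column"


@dataclass
class ApiMapping:
    """Invocation details for a relation that is backed by an API."""
    table: str
    url: str
    method: str
    input_params: list
    output_params: list


# Hypothetical abstract schema: the second relation looks like a table to the
# Text-to-SQL model, even though it is served by an API.
ABSTRACT_SCHEMA = [
    AbstractTable("pumps", {"pump_id": "TEXT", "manufacturer": "TEXT", "site": "TEXT"}),
    AbstractTable(
        "temperature_anomalies",
        {"pump_id": "TEXT", "is_anomalous": "INTEGER"},
        {"pump_id": "pumps.pump_id"},
    ),
]

# Hypothetical API mapping schema; the URL is a placeholder.
API_MAPPINGS = {
    "temperature_anomalies": ApiMapping(
        table="temperature_anomalies",
        url="https://example.invalid/anomalies",
        method="GET",
        input_params=["pump_id"],
        output_params=["is_anomalous"],
    )
}


def schema_prompt(tables):
    """Render the abstract schema as CREATE TABLE text for a Text-to-SQL prompt."""
    lines = []
    for table in tables:
        cols = ", ".join(f"{name} {sql_type}" for name, sql_type in table.columns.items())
        lines.append(f"CREATE TABLE {table.name} ({cols});")
    return "\n".join(lines)


def is_api_backed(table_name):
    """True if the relation must be served by an API call rather than a table."""
    return table_name in API_MAPPINGS


print(schema_prompt(ABSTRACT_SCHEMA))
```

Only the abstract schema would reach the language model in this sketch; the API mapping stays on the system side until a query needs to be executed.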

These schemas allow a standard Text-to-SQL module to generate SQL queries, even for virtual tables representing APIs. A rule-based Query Rewriter then transforms these queries into executable SQL by replacing virtual tables with their corresponding UDFs, ensuring proper argument passing.
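
The rewriting step can be illustrated with a deliberately narrow toy. The sketch below handles exactly one query shape (an aliased JOIN on a virtual table keyed by a single column) using regular expressions; the paper's Query Rewriter is rule-based, but its actual rules are not described here, so every name and pattern in this example is an assumption.

```python
import re


def rewrite(sql, table="temperature_anomalies", key="pump_id",
            out_col="is_anomalous", udf="has_temperature_anomaly"):
    """Replace a JOIN on a virtual (API-backed) table with a UDF call.

    Toy rule: only matches `JOIN <table> AS <alias> ON <alias>.<key> = <x>.<key>`.
    """
    join_re = re.compile(
        rf"\s+JOIN\s+{table}\s+AS\s+(\w+)\s+ON\s+\1\.{key}\s*=\s*(\w+\.{key})",
        re.IGNORECASE,
    )
    match = join_re.search(sql)
    if match is None:
        return sql  # no virtual table referenced, nothing to rewrite
    alias, argument = match.group(1), match.group(2)
    # Drop the JOIN clause, then bind the UDF argument to the other side's key.
    without_join = sql[:match.start()] + sql[match.end():]
    column_re = rf"\b{re.escape(alias)}\.{out_col}\b"
    return re.sub(column_re, f"{udf}({argument})", without_join)


generated = (
    "SELECT p.pump_id FROM pumps AS p "
    "JOIN temperature_anomalies AS a ON a.pump_id = p.pump_id "
    "WHERE p.manufacturer = 'Xylem' AND a.is_anomalous = 1"
)
print(rewrite(generated))
```

A production rewriter would more plausibly operate on a parsed query tree than on raw text, since regular expressions break easily on nested queries, quoting, and alternative join syntax.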

Benchmarking the Approaches

To rigorously test their declarative approach against imperative code generation and agent-based systems, the authors created two new benchmarks, extending the popular Spider dataset:

  • Benchmark I: Replaces a fraction of real Spider database tables with equivalent API calls, requiring systems to combine database and API interactions.
  • Benchmark II: Introduces 16 scalar APIs for lexical, numeric, or geospatial operations, transforming existing Spider questions to require interleaving database operations with compositions of these APIs.
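
To give a feel for what interleaving database operations with scalar APIs can look like, here is a small sketch with one geospatial and one lexical function registered as SQL functions in sqlite3. The functions, the airports table, and its rows are invented for illustration and are not taken from the benchmark's 16 APIs.

```python
import math
import sqlite3


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, assuming a 6371 km Earth radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def initials(name):
    """Lexical helper: upper-cased first letter of each word."""
    return "".join(word[0].upper() for word in name.split())


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.create_function("haversine_km", 4, haversine_km)
    conn.create_function("initials", 1, initials)
    conn.execute("CREATE TABLE airports (name TEXT, lat REAL, lon REAL)")
    conn.executemany(
        "INSERT INTO airports VALUES (?, ?, ?)",
        [("Near Field", 0.0, 0.1), ("Far Away Strip", 0.0, 10.0)],
    )
    return conn


def airports_within(conn, lat, lon, radius_km):
    # The geospatial function filters rows; the lexical one shapes the output.
    rows = conn.execute(
        "SELECT initials(name) FROM airports "
        "WHERE haversine_km(lat, lon, ?, ?) <= ? ORDER BY name",
        (lat, lon, radius_km),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
print(airports_within(conn, 0.0, 0.0, 50.0))
```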

Key Findings

Experiments on these new benchmarks demonstrated that the declarative approach significantly outperforms both imperative and agent-based methods, especially when dealing with a mixture of database and API calls. The agentic method, for instance, struggled with sequencing multiple API calls, routing questions to the correct tools, and often hallucinated or improperly bound inputs. The imperative approach, while strong with pure database queries, became less accurate as the mix of APIs and databases became more even, due to the increased complexity of generated Python code.
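
For contrast, the following sketch shows the kind of glue code an imperative approach has to produce for the pump question. It is a hand-written illustration, not output from any system evaluated in the paper; the table, the mock API, and all function names are assumptions.

```python
import sqlite3

# Hypothetical stand-in for the anomaly API.
MOCK_ANOMALIES = {"P1": True, "P2": False}


def fetch_anomaly(pump_id):
    """Mock API call; a real version would issue an HTTP request."""
    return MOCK_ANOMALIES.get(pump_id, False)


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pumps (pump_id TEXT, manufacturer TEXT, site TEXT)")
    conn.executemany(
        "INSERT INTO pumps VALUES (?, ?, ?)",
        [("P1", "Xylem", "Bedford"), ("P2", "Xylem", "Bedford")],
    )
    return conn


def imperative_answer(conn, manufacturer, site):
    # Step 1: the generated code must decide that the database comes first.
    rows = conn.execute(
        "SELECT pump_id FROM pumps WHERE manufacturer = ? AND site = ? ORDER BY pump_id",
        (manufacturer, site),
    ).fetchall()
    # Step 2: it must then loop, call the API per row, and bind results by hand.
    anomalous = []
    for (pump_id,) in rows:
        if fetch_anomaly(pump_id):
            anomalous.append(pump_id)
    return anomalous


print(imperative_answer(make_db(), "Xylem", "Bedford"))
```

Every ordering decision, loop, and argument binding here is something the model has to generate correctly, and the amount of such code grows with each additional API in the question.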

The declarative approach, by presenting a unified relational view to the LLM, allows it to leverage its Text-to-SQL capabilities more effectively, with the Query Rewriter handling the complexities of API invocation. This separation of concerns leads to more robust and accurate results.

Looking Ahead

This research marks a significant step towards making natural language queries over heterogeneous data sources practical in industrial settings. The authors have released their augmented benchmarks to the research community, encouraging further advancements in this crucial area. Future work aims to address limitations such as evaluating execution performance on larger datasets and incorporating APIs that produce vector or table outputs, further extending the applicability of their declarative framework.

Nikhil Patel
