Unifying Natural Language Queries Across Databases and APIs with a Declarative Approach

TLDR: A new research paper introduces ‘siwarex,’ a declarative system that significantly improves how Large Language Models (LLMs) handle natural language queries over diverse data sources, including both databases and APIs. By treating APIs as User Defined Functions within SQL, the system unifies data access and leverages SQL’s optimization capabilities. Experiments on new benchmarks show this declarative method outperforms imperative and agent-based approaches in accuracy and robustness for heterogeneous data environments.

In today’s industrial landscape, asking questions in natural language and getting answers that pull information from various structured data sources—like spreadsheets, databases, and APIs—is a common need. While Large Language Models (LLMs) have made strides in translating natural language into executable code for databases or APIs, they often fall short when faced with the complex reality of heterogeneous data environments. This means systems struggle to combine information from different types of sources effectively.

A recent research paper, titled “Declarative Techniques for NL Queries over Heterogeneous Data,” by Elham Khabiri, Jeffrey O. Kephart, Fenno F. Heath III, Srideepika Jayaraman, Fateh A. Tipu, Yingjie Li, Dhruv Shah, Achille Fokoue, and Anu Bhamidipaty, addresses this critical challenge. The authors introduce a novel declarative approach designed to handle data heterogeneity significantly better than existing LLM-based agentic or imperative code generation systems. You can read the full paper here: Declarative Techniques for NL Queries over Heterogeneous Data.

The Challenge of Diverse Data Sources

Current LLM-based applications often struggle because they conflate a user’s intent with the complex planning required to execute queries across different data types. Imagine asking, “Which Xylem pumps at Bedford have experienced anomalous temperatures today?” This requires both a database call to find pumps by manufacturer and location, and an API call to check temperature anomalies. Existing agent-based architectures, like ReAct, can orchestrate such tasks but tend to be brittle, expensive, and difficult to scale in real-world production settings.

Introducing a Declarative Solution: siwarex

The researchers propose a more practical architecture called siwarex, which cleanly separates the user’s intent from the execution planning. This system leverages SQL as a declarative language to express user intent and uses User Defined Functions (UDFs) to invoke APIs directly from within SQL queries. By doing so, APIs are treated on the same footing as database tables, allowing the system to utilize decades of research in SQL query optimization for efficient orchestration and aggregation across both databases and APIs.
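As a rough illustration of treating an API as a UDF, the pump question from earlier can be sketched with Python's built-in sqlite3 module. This is not the siwarex implementation: the table layout, the pump IDs, the function name has_temp_anomaly, and the dictionary standing in for the anomaly API are all assumptions made for the example.

```python
import sqlite3

# Hypothetical stand-in for a remote anomaly-detection API. A real UDF
# would issue an HTTP request here instead of reading a dict.
FAKE_ANOMALY_API = {"P-101": True, "P-102": False, "P-103": True}


def has_temp_anomaly(pump_id: str) -> int:
    """Scalar UDF: 1 if the mock API reports an anomaly for the pump, else 0."""
    return int(FAKE_ANOMALY_API.get(pump_id, False))


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it like a built-in.
conn.create_function("has_temp_anomaly", 1, has_temp_anomaly)

conn.execute("CREATE TABLE pumps (id TEXT, manufacturer TEXT, site TEXT)")
conn.executemany(
    "INSERT INTO pumps VALUES (?, ?, ?)",
    [
        ("P-101", "Xylem", "Bedford"),
        ("P-102", "Xylem", "Bedford"),
        ("P-103", "Grundfos", "Bedford"),
    ],
)

# One declarative query covers both the database filter and the API check.
query = """
    SELECT id FROM pumps
    WHERE manufacturer = 'Xylem' AND site = 'Bedford'
      AND has_temp_anomaly(id) = 1
    ORDER BY id
"""
anomalous = [row[0] for row in conn.execute(query)]
print(anomalous)
```

The point of the sketch is that the query states only what is wanted; when and how often the API-backed function is called is left to the SQL engine.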

The siwarex framework relies on two key schemas:

Abstract Schema: Provides a global view of data source properties and relationships, agnostic to whether the source is a database table or an API.
API Mapping Schema: Contains details needed to invoke an API, such as URL, method (POST, GET), and input/output parameters.
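One way to picture the two schemas is as plain data records. The sketch below is only a guess at their shape: the class names, field names, the example URL, and the to_ddl helper are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class AbstractTable:
    """Source-agnostic view: looks the same for a real table or an API."""
    name: str
    columns: dict[str, str]  # column name -> SQL type


@dataclass
class ApiMapping:
    """Invocation details for a virtual table that is backed by an API."""
    table: str          # name of the virtual table in the abstract schema
    url: str            # hypothetical endpoint
    method: str         # "GET" or "POST"
    inputs: list[str]   # columns passed to the API as arguments
    outputs: list[str]  # columns returned by the API


def to_ddl(table: AbstractTable) -> str:
    """Render a table as DDL, e.g. for inclusion in a Text-to-SQL prompt."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in table.columns.items())
    return f"CREATE TABLE {table.name} ({cols});"


pumps = AbstractTable("pumps", {"id": "TEXT", "manufacturer": "TEXT", "site": "TEXT"})
anomalies = AbstractTable(
    "temperature_anomalies", {"pump_id": "TEXT", "is_anomalous": "INTEGER"}
)
anomalies_api = ApiMapping(
    table="temperature_anomalies",
    url="https://example.com/anomalies",  # placeholder, not a real service
    method="GET",
    inputs=["pump_id"],
    outputs=["is_anomalous"],
)

# The language model would see only the abstract view of both sources.
for table in (pumps, anomalies):
    print(to_ddl(table))
```

Under this reading, the Text-to-SQL step never needs to know that temperature_anomalies is not a physical table; only the later rewriting step consults the API mapping.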

These schemas allow a standard Text-to-SQL module to generate SQL queries, even for virtual tables representing APIs. A rule-based Query Rewriter then transforms these queries into executable SQL by replacing virtual tables with their corresponding UDFs, ensuring proper argument passing.
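The rewriting step can be sketched as a simple textual rule. The version below is a deliberately simplified stand-in, not the paper's Query Rewriter: it assumes a hypothetical virtual table temperature_anomalies, a mock UDF has_temp_anomaly, and a rule that swaps the virtual table for a derived table that calls the UDF over the pump IDs. It also assumes the generated query gives the virtual table an alias.

```python
import re
import sqlite3

# Hypothetical rule table: virtual table name -> UDF-backed derived table.
REWRITE_RULES = {
    "temperature_anomalies": (
        "(SELECT id AS pump_id, has_temp_anomaly(id) AS is_anomalous FROM pumps)"
    ),
}


def rewrite(sql: str) -> str:
    """Replace each virtual table that follows FROM or JOIN with its derived table."""
    for table, derived in REWRITE_RULES.items():
        pattern = re.compile(rf"\b(FROM|JOIN)\s+{re.escape(table)}\b", re.IGNORECASE)
        sql = pattern.sub(lambda m: f"{m.group(1)} {derived}", sql)
    return sql


# What a Text-to-SQL module might emit against the abstract schema.
generated = (
    "SELECT p.id FROM pumps p "
    "JOIN temperature_anomalies t ON t.pump_id = p.id "
    "WHERE p.manufacturer = 'Xylem' AND t.is_anomalous = 1 "
    "ORDER BY p.id"
)
executable = rewrite(generated)

# Execute the rewritten query against SQLite with a mock API behind the UDF.
mock_api = {"P-101": True, "P-102": False}
conn = sqlite3.connect(":memory:")
conn.create_function(
    "has_temp_anomaly", 1, lambda pump_id: int(mock_api.get(pump_id, False))
)
conn.execute("CREATE TABLE pumps (id TEXT, manufacturer TEXT)")
conn.executemany(
    "INSERT INTO pumps VALUES (?, ?)", [("P-101", "Xylem"), ("P-102", "Xylem")]
)
rows = [row[0] for row in conn.execute(executable)]
print(executable)
print(rows)
```

A production rewriter would more plausibly operate on a parsed query tree than on regular expressions, and would need to handle argument binding for APIs whose inputs come from other tables; the regex rule here only shows the general idea of substituting a UDF call for a virtual table.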

Benchmarking the Approaches

To rigorously test their declarative approach against imperative code generation and agent-based systems, the authors created two new benchmarks, extending the popular Spider dataset:

Benchmark I: Replaces a fraction of real Spider database tables with equivalent API calls, requiring systems to combine database and API interactions.
Benchmark II: Introduces 16 scalar APIs for lexical, numeric, or geospatial operations, transforming existing Spider questions to require interleaving database operations with compositions of these APIs.

Key Findings

Experiments on these new benchmarks demonstrated that the declarative approach significantly outperforms both imperative and agent-based methods, especially when dealing with a mixture of database and API calls. The agentic method, for instance, struggled with sequencing multiple API calls, routing questions to the correct tools, and often hallucinated or improperly bound inputs. The imperative approach, while strong with pure database queries, became less accurate as the mix of APIs and databases became more even, due to the increased complexity of generated Python code.

The declarative approach, by presenting a unified relational view to the LLM, allows it to leverage its Text-to-SQL capabilities more effectively, with the Query Rewriter handling the complexities of API invocation. This separation of concerns leads to more robust and accurate results.

Looking Ahead

This research marks a significant step towards making natural language queries over heterogeneous data sources practical in industrial settings. The authors have released their augmented benchmarks to the research community, encouraging further advancements in this crucial area. Future work aims to address limitations such as evaluating execution performance on larger datasets and incorporating APIs that produce vector or table outputs, further extending the applicability of their declarative framework.
