TLDR: SQLord is an enterprise-level framework that converts natural language into SQL queries (NL2SQL). It tackles real-world challenges like complex business logic and data scarcity by using reverse data generation to create training data, workflow decomposition for complex queries, and a flexible GPT-Judge evaluation system. Developed by Alibaba Group, it has shown significant performance improvements in both offline and online tests, making it effective for large-scale business applications.
In the world of data-driven business, the ability to transform natural language questions into precise SQL queries (NL2SQL) is incredibly valuable. Imagine asking a question like, “What have been the best-selling products for my competitors and me over the past half year?” and instantly getting the data you need. While this seems straightforward, existing NL2SQL systems often struggle with the complexities of real-world business logic, the scarcity of domain-specific training data, and the challenges of accurately evaluating the generated SQL.
To address these issues, researchers from Alibaba Group – Song Cheng, Qiannan Cheng, Linbo Jin, Lei Yi, and Guannan Zhang – have introduced an enterprise-level NL2SQL framework called SQLord. It aims to make data interaction more robust and accessible for large-scale business applications.
Overcoming Real-World Challenges
The paper highlights several key difficulties faced by traditional NL2SQL frameworks. Firstly, business queries often involve complex logic that requires deep domain knowledge, such as defining “best-selling” or identifying competitors. Secondly, there’s a significant lack of labeled SQL query data for training advanced models, as most SQL queries in daily development are not annotated. Lastly, evaluating NL2SQL performance is tricky because SQL queries can achieve the same result with different syntax, making direct string comparisons unreliable.
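To make the evaluation problem concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, data, and both queries are hypothetical examples invented for illustration, not taken from the paper. It shows two differently written queries that fail a string comparison but return identical rows.

```python
import sqlite3

# Hypothetical toy schema and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (product TEXT, units INTEGER);
    INSERT INTO sales VALUES ('lamp', 5), ('desk', 12), ('chair', 9);
""")

# Two syntactically different queries with the same intent:
# products that sold more than 8 units.
sql_a = "SELECT product FROM sales WHERE units > 8"
sql_b = "SELECT s.product FROM sales AS s WHERE NOT (s.units <= 8)"


def same_result(conn, left, right):
    """Compare two queries by their result rows, ignoring row order."""
    rows_left = sorted(conn.execute(left).fetchall())
    rows_right = sorted(conn.execute(right).fetchall())
    return rows_left == rows_right


string_match = sql_a == sql_b                      # naive text comparison
execution_match = same_result(conn, sql_a, sql_b)  # comparison by result
print(string_match, execution_match)  # prints: False True
```

Sorting the rows before comparing is a simplification: it would be wrong for queries where the order of results matters, such as those with ORDER BY.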
SQLord’s Three Pillars of Innovation
SQLord tackles these challenges with a three-pronged approach:
1. Reverse Data Generation: To combat data scarcity, SQLord trains a “Reverse Generation Model” (RevLLM) on existing SQL statements and their developer comments. RevLLM then generates natural language descriptions (queries) for a vast collection of raw SQL statements. This process effectively creates a large dataset of pairs of natural language queries and SQL statements that can be used as training data.
2. Automated Workflow Generation: Complex business questions often need to be broken down into multiple sub-tasks, so SQLord employs a task decomposition strategy. Given a user query, it first retrieves relevant information from a domain knowledge base and the database schema. Then, an LLM dynamically generates a sequence of sub-tasks, taking their dependencies into account. Each sub-task is converted into an SQL query by the SQL generation model (SQLLM) and executed. The results of these sub-tasks are then aggregated to answer the original complex query. This approach, which resembles Retrieval-Augmented Generation (RAG), allows even intricate queries involving multi-table joins or nested SQL to be handled systematically.
3. Comprehensive GPT-Judge Evaluation Framework: Evaluating NL2SQL models accurately is crucial. SQLord introduces GPT-Judge, a flexible evaluation system with three modes tailored for different scenarios:
- Execution Evaluation (EXE): If a database is available, it compares the execution results of the generated SQL with the ground truth SQL.
- Query-SQL Evaluation (QSE): When ground truth SQL is unavailable, an LLM (like GPT-4) assesses whether the generated SQL aligns with the intent of the natural language query.
- SQL-SQL Evaluation (SSE): If both generated and ground truth SQL are available, an LLM compares their structural and semantic consistency, accounting for syntactic variations.
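As a rough illustration of how the three modes relate to what is available at evaluation time, the following sketch picks a mode from two inputs. The names JudgeMode and choose_mode are hypothetical, and the priority order (EXE before SSE when both would apply) is an assumption made for this example, since the summary above does not say how a mode is chosen.

```python
from enum import Enum
from typing import Optional


class JudgeMode(Enum):
    EXE = "execution evaluation"
    QSE = "query-SQL evaluation"
    SSE = "SQL-SQL evaluation"


def choose_mode(ground_truth_sql: Optional[str], database_available: bool) -> JudgeMode:
    """Pick an evaluation mode from what is available (assumed priority order)."""
    if ground_truth_sql is None:
        # No reference SQL: an LLM judges the generated SQL against the question.
        return JudgeMode.QSE
    if database_available:
        # Reference SQL and a database: compare execution results.
        return JudgeMode.EXE
    # Reference SQL but no database: an LLM compares the two SQL statements.
    return JudgeMode.SSE


print(choose_mode("SELECT 1", database_available=False).name)  # prints: SSE
```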
This multi-faceted evaluation ensures robust assessment even in real-world settings where complete information might not be present.
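The workflow generation step (item 2 above) can be sketched as running sub-tasks in dependency order, with later sub-tasks using the results of earlier ones. This is only an illustration under stated assumptions: the schema, the data, and the two-step plan are hypothetical, and the hard-coded plan and SQL builders stand in for the LLM planner and SQLLM, which are not reproduced here.

```python
import sqlite3
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical toy database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE competitors (shop TEXT);
    CREATE TABLE sales (shop TEXT, product TEXT, units INTEGER);
    INSERT INTO competitors VALUES ('shop_b');
    INSERT INTO sales VALUES
        ('shop_a', 'lamp', 5), ('shop_b', 'desk', 12),
        ('shop_b', 'chair', 9), ('shop_c', 'rug', 20);
""")


# Each builder returns (sql, params). In a real system a model would write
# the SQL; these hand-written functions are stand-ins.
def find_competitors(results):
    return "SELECT shop FROM competitors", ()


def top_product(results):
    # Uses the output of the earlier sub-task as query parameters.
    shops = [row[0] for row in results["find_competitors"]]
    placeholders = ", ".join("?" for _ in shops)
    sql = (
        "SELECT product, units FROM sales "
        f"WHERE shop IN ({placeholders}) ORDER BY units DESC LIMIT 1"
    )
    return sql, tuple(shops)


# Hypothetical plan: sub-task name -> (names it depends on, SQL builder).
plan = {
    "top_product": ({"find_competitors"}, top_product),
    "find_competitors": (set(), find_competitors),
}


def run_plan(conn, plan):
    """Execute sub-tasks so that every dependency runs before its dependents."""
    graph = {name: deps for name, (deps, _) in plan.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        sql, params = plan[name][1](results)
        results[name] = conn.execute(sql, params).fetchall()
    return results


results = run_plan(conn, plan)
print(results["top_product"])  # prints: [('desk', 12)]
```

The dependency graph matters because the second query cannot be written until the first has returned the list of competitor shops.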
Impressive Performance in Real-World Scenarios
The researchers conducted extensive offline and online evaluations. Offline tests on both the open-source Spider dataset and a proprietary Real-World Dataset (from Alibaba’s B2B e-commerce platform) showed SQLord (using Qwen as its base LLM) significantly outperforming state-of-the-art baselines, including GPT-4-based frameworks. For instance, on the Real-World Dataset, SQLord achieved an 86.5% execution accuracy, a substantial improvement over other methods.
Ablation studies further confirmed that each component of SQLord—reverse data generation and workflow generation—contributes significantly to its superior performance.
Crucially, SQLord was successfully applied in two real-world enterprise scenarios: a Customs Import-Export Assistant and Intelligent Product Selection. Online evaluations demonstrated a remarkable increase in execution accuracy, highlighting SQLord’s robustness and adaptability in handling complex, real-time queries on the world’s largest B2B e-commerce platform.
The Future of Data Interaction
SQLord represents a significant leap forward in enterprise NL2SQL solutions. By effectively addressing the challenges of data scarcity, complex business logic, and robust evaluation, it provides a scalable and adaptable foundation for advancing intelligent data systems. Its integration with large language models also paves the way for continuous improvement and innovation in how businesses interact with their data. For more details, you can refer to the full research paper: SQLord: A Robust Enterprise Text-to-SQL Solution via Reverse Data Generation and Workflow Decomposition.


