TL;DR: E3-Rewrite is a new framework that uses large language models (LLMs) to rewrite SQL queries automatically. Unlike traditional rule-based methods, it learns to generate rewrites that are syntactically correct and executable, semantically equivalent to the original query, and more efficient. It does this by feeding execution-plan insights to the model, training it with reinforcement learning under a staged curriculum, and retrieving demonstrations from a library of successful past rewrites. Across several SQL benchmarks, it reduces query execution time by up to 25.6% and produces up to 24.4% more successful rewrites than state-of-the-art methods.
Efficient query processing is central to database performance. SQL query rewriting transforms a query into a more efficient form that returns exactly the same results. Traditionally, this has been done with predefined rules, but rule-based methods often struggle with complex queries and unfamiliar patterns, and no fixed rule set can capture every effective rewriting strategy.
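To make “more efficient form, same results” concrete, here is a small illustrative sketch (not taken from the paper) using Python’s built-in `sqlite3` module. The tables `customers` and `orders`, their rows, and the helper `same_results` are hypothetical. It rewrites a correlated `COUNT(*) > 0` subquery as an `EXISTS` check, which can stop at the first matching order, and compares the two results as multisets. A naive `JOIN` rewrite is included to show how a plausible-looking rewrite can silently change the results by duplicating rows.

```python
import sqlite3
from collections import Counter


def build_demo_db() -> sqlite3.Connection:
    """Tiny in-memory database with hypothetical customers and orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cy'), (4, 'Ada');
        INSERT INTO orders VALUES (1, 1), (2, 1), (3, 3), (4, 4);
    """)
    return conn


# Original: counts every matching order per customer just to test for "> 0".
ORIGINAL = """
    SELECT c.name FROM customers c
    WHERE (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) > 0
"""

# Equivalent rewrite: EXISTS can stop at the first matching order.
REWRITTEN = """
    SELECT c.name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
"""

# Tempting but wrong: the join emits one row per order, duplicating names.
NAIVE_JOIN = """
    SELECT c.name FROM customers c JOIN orders o ON o.customer_id = c.id
"""


def same_results(conn: sqlite3.Connection, q1: str, q2: str) -> bool:
    """Compare result multisets: row order is unspecified without ORDER BY,
    but duplicate rows still count."""
    return Counter(conn.execute(q1).fetchall()) == Counter(conn.execute(q2).fetchall())


conn = build_demo_db()
print(same_results(conn, ORIGINAL, REWRITTEN))   # True
print(same_results(conn, ORIGINAL, NAIVE_JOIN))  # False
```

Agreement on one test database can only expose non-equivalence; it cannot prove two queries equivalent on all possible data, which is part of why guaranteeing equivalence is hard for automated rewriters.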
The Challenge of SQL Rewriting
Rule-based systems have clear limitations: fixed rules adapt poorly to new query patterns, dependencies between rules make the search space fragile, and many powerful rewriting strategies (such as those involving Common Table Expressions or specific evaluation orders) fall outside their scope. Large language models show promise for generating rewrites, but applying them directly often yields queries that are suboptimal, return different results from the original, or fail to execute, because the models lack an understanding of how databases actually run queries.
Introducing E3-Rewrite: A New Approach
To address these challenges, researchers from Soochow University, Hong Kong University of Science and Technology, Zhejiang Normal University, ByteDance Inc., Alibaba Group, Southeast University, and University of Electronic Science and Technology of China have proposed E3-Rewrite, an LLM-based framework designed to produce SQL rewrites that are Executable, Equivalent, and Efficient. The full research paper is titled “E3-Rewrite: Learning to Rewrite SQL for Executability, Equivalence, and Efficiency.”
E3-Rewrite moves beyond fixed rules by training an LLM to generate optimized SQL rewrites directly. It targets two core problems: LLMs lack execution awareness and semantic grounding, and training becomes unstable when optimizing for multiple, sometimes conflicting, objectives such as correctness and performance.
How E3-Rewrite Works
The framework integrates three core components:
- Execution-Guided Context Construction: E3-Rewrite does not look at the SQL text alone. It first analyzes the query’s execution plan, that is, how the database intends to run the query. The plan exposes inefficiencies such as full table scans or unindexed joins, and these ‘execution hints’ are passed to the LLM to steer it toward the actual performance bottlenecks.
- Reinforcement Learning Framework: Instead of relying on predefined rules, E3-Rewrite trains the LLM with reinforcement learning (RL). The model generates multiple candidate rewrites, and each is evaluated on three criteria: executability (does it run?), equivalence (does it return the same result?), and efficiency (is it faster?). A reward function combines these signals, and the model learns from this feedback to produce better rewrites. To stabilize training, a two-stage curriculum is used: the model first learns to generate correct (executable and equivalent) queries, and efficiency optimization is then introduced gradually.
- Hybrid Demonstration Retrieval: To help the LLM generalize to new queries, E3-Rewrite maintains a pool of past successful rewrites. For each incoming query, it retrieves similar examples from this pool based on both structural patterns (how the query is built) and semantic meaning. When a new rewrite significantly improves performance, it is added to the pool, allowing the system to keep learning and improving over time.
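The summary does not say which database engine or hint format E3-Rewrite uses, so the following is only a minimal sketch of execution-guided context, using SQLite’s `EXPLAIN QUERY PLAN` (reachable through Python’s standard `sqlite3` module) as a stand-in. The helpers `plan_hints` and `build_context` and the hint wording are hypothetical. The code scans the plan for full table scans, temporary automatic indexes, and correlated subqueries, then prepends those findings to the query as context for a model.

```python
import sqlite3


def plan_hints(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Turn SQLite plan rows into short bottleneck hints (hypothetical wording)."""
    hints = []
    # Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail).
    for _id, _parent, _unused, detail in conn.execute("EXPLAIN QUERY PLAN " + sql):
        if detail.startswith("SCAN") and "INDEX" not in detail and "CONSTANT ROW" not in detail:
            hints.append(f"full table scan: {detail}")
        elif "AUTOMATIC" in detail:
            hints.append(f"no persistent index, SQLite builds a temporary one: {detail}")
        elif detail.startswith("CORRELATED"):
            hints.append(f"correlated subquery, may be re-evaluated per outer row: {detail}")
    return hints


def build_context(conn: sqlite3.Connection, sql: str) -> str:
    """Assemble a plain-text prompt context: plan hints first, then the query."""
    hints = plan_hints(conn, sql) or ["no obvious bottleneck detected"]
    hint_block = "\n".join(f"- {h}" for h in hints)
    return f"Execution hints:\n{hint_block}\n\nQuery to rewrite:\n{sql.strip()}"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    """)
    print(build_context(conn, """
        SELECT c.name FROM customers c
        WHERE (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) > 0
    """))
```

Plan text differs between engines and versions (older SQLite releases print `SCAN TABLE t`, newer ones `SCAN t`), so a production version would parse a structured plan format rather than match strings.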
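The paper’s exact reward terms, weights, and schedule are not given in this summary, so the following Python sketch only illustrates the shape of a combined executability/equivalence/efficiency reward with a two-stage curriculum. The penalty values, the clipped log2-speedup efficiency term, and the `switch_at` point of the schedule are all hypothetical. The last function normalizes rewards within a group of sampled candidates (a GRPO-style group-relative advantage), which is one common way to learn from several candidates per query; whether E3-Rewrite uses exactly this is an assumption.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class RewriteOutcome:
    executable: bool          # did the candidate run without error?
    equivalent: bool          # did it return the same result as the original?
    original_latency: float   # seconds
    rewritten_latency: float  # seconds


def efficiency_weight(progress: float, switch_at: float = 0.5) -> float:
    """Hypothetical schedule: 0 in stage 1, then a linear ramp up to 1.
    `progress` is the fraction of training completed, in [0, 1]."""
    if progress < switch_at:
        return 0.0
    return min(1.0, (progress - switch_at) / (1.0 - switch_at))


def reward(o: RewriteOutcome, progress: float) -> float:
    """Correctness gates the reward; efficiency is blended in during stage 2."""
    if not o.executable:
        return -1.0  # hypothetical penalty: candidate fails to run
    if not o.equivalent:
        return -0.5  # hypothetical penalty: runs but returns different results
    speedup = o.original_latency / max(o.rewritten_latency, 1e-9)
    # log2 speedup clipped to [-1, 1]: 2x faster -> +1, unchanged -> 0, 2x slower -> -1
    efficiency = max(-1.0, min(1.0, math.log2(speedup)))
    return 1.0 + efficiency_weight(progress) * efficiency


def group_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: standardize rewards across one query's candidates."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]


if __name__ == "__main__":
    candidates = [
        RewriteOutcome(False, False, 1.0, 1.0),
        RewriteOutcome(True, False, 1.0, 0.4),
        RewriteOutcome(True, True, 1.0, 1.0),
        RewriteOutcome(True, True, 1.0, 0.5),
    ]
    rs = [reward(c, progress=0.8) for c in candidates]
    print(rs, group_advantages(rs))
```

Under this hypothetical schedule, a 2x speedup adds nothing during stage 1 and up to +1 once the efficiency weight reaches 1, so a correct-but-slow rewrite never scores below a failed one.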
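The summary does not say how E3-Rewrite measures structural or semantic similarity, so this hypothetical sketch of a demonstration pool swaps in simple stand-ins: Jaccard overlap of SQL keywords for structure, and cosine similarity over identifier counts in place of an embedding model. `DemoPool`, `alpha` (the blend between the two scores), and `min_speedup` (the bar a rewrite must clear to join the pool) are illustrative names and values, not the paper’s.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical, deliberately small keyword list used as a structural fingerprint.
SQL_KEYWORDS = {
    "select", "from", "where", "join", "group", "order", "by", "having", "exists",
    "in", "not", "union", "with", "distinct", "limit", "count", "sum", "avg",
    "min", "max", "like", "between",
}


def tokens(sql: str) -> list[str]:
    return re.findall(r"[a-z_][a-z0-9_]*", sql.lower())


def structural_sim(a: str, b: str) -> float:
    """Jaccard overlap of the SQL keywords each query uses."""
    ka = {t for t in tokens(a) if t in SQL_KEYWORDS}
    kb = {t for t in tokens(b) if t in SQL_KEYWORDS}
    if not ka and not kb:
        return 1.0
    return len(ka & kb) / len(ka | kb)


def semantic_sim(a: str, b: str) -> float:
    """Stand-in for an embedding model: cosine over identifier counts."""
    ca = Counter(t for t in tokens(a) if t not in SQL_KEYWORDS)
    cb = Counter(t for t in tokens(b) if t not in SQL_KEYWORDS)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


@dataclass
class DemoPool:
    alpha: float = 0.5        # weight on structural vs. semantic similarity
    min_speedup: float = 1.2  # a rewrite must be at least this much faster to be kept
    demos: list[tuple[str, str]] = field(default_factory=list)  # (original, rewrite)

    def score(self, query: str, original: str) -> float:
        return self.alpha * structural_sim(query, original) + (1 - self.alpha) * semantic_sim(query, original)

    def retrieve(self, query: str, k: int = 2) -> list[tuple[str, str]]:
        """Return the k stored demonstrations whose originals best match the query."""
        return sorted(self.demos, key=lambda d: self.score(query, d[0]), reverse=True)[:k]

    def maybe_add(self, original: str, rewrite: str, speedup: float) -> bool:
        """Grow the pool only with rewrites that delivered a real speedup."""
        if speedup >= self.min_speedup:
            self.demos.append((original, rewrite))
            return True
        return False
```

Blending two scores lets a query match demonstrations that share its shape (for example, a correlated aggregate) even when table names differ, and vice versa; a real system would likely compare parsed query structures rather than keyword sets.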
Impressive Results
Experiments on widely used SQL benchmarks (TPC-H, IMDB, and DSB) show that E3-Rewrite reduces query execution time by up to 25.6% compared with state-of-the-art methods. It also produces up to 24.4% more successful rewrites, extending coverage to complex queries that previous systems struggled with. The system remained robust across data scales, keeping latency low even as dataset size grew.
The ablation study, which tested the system with individual components removed, highlighted the critical role of each part: reinforcement learning ensures correctness and efficiency, execution plan hints provide crucial structural awareness, and demonstration retrieval enables better generalization and pattern reuse.
The Future of SQL Optimization
E3-Rewrite is a notable step forward in SQL query rewriting. By combining execution-guided context, reinforcement learning with fine-grained rewards, and dynamic demonstration retrieval, it offers an end-to-end approach to generating optimized SQL queries without relying on rigid rule sets. Integrating plan-based context with RL is a promising direction for building more robust and adaptable SQL rewriting systems for modern database environments.


