TLDR: Thinkquel is a new AI model that converts natural language into dbt (Data Build Tool) code for data transformations. It uses a unique synthetic data pipeline (TS-SQL) to generate high-quality training data and a novel reinforcement learning objective (TS-GRPO) that optimizes planning and code generation separately. This approach leads to significantly improved accuracy and stability, making it easier for users to query complex databases without specialized SQL expertise.
In the rapidly evolving landscape of artificial intelligence, the ability to translate natural language into executable code is a significant frontier. A new research paper introduces “Thinkquel,” a novel model designed to tackle the complex challenge of converting natural language requests into production-ready data transformations, specifically using dbt (Data Build Tool).
The paper, titled “Thinkquel: A Model Dedicated to Text-to-dbt Using Synthetic Data and a Span-Aware Objective,” highlights the inherent difficulties in this task. Unlike more forgiving programming languages, SQL and dbt demand extreme precision in schema linking, adherence to specific SQL dialects, and accurate query-level semantics. Even minor errors can lead to complete execution failure. Furthermore, creating high-quality, execution-validated training data is both expensive and scarce, making it difficult for large language models (LLMs) to learn effectively.
Thinkquel addresses these challenges through two primary innovations: a scalable, diverse synthetic data generation pipeline called TS-SQL, and a specialized training objective known as Token–Sequence GRPO (TS–GRPO).
The TS-SQL Data Pipeline: A Foundation of Quality Data
To overcome the scarcity of training data, the researchers developed TS-SQL, a rigorous pipeline that programmatically generates, refines, executes, and curates pairs of natural-language requests and dbt models. The pipeline starts by creating millions of dbt model configurations, systematically varying structural parameters and SQL features. These models are then executed against target databases, and any that raise errors or time out are filtered out.
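A rough sketch of that generate-execute-filter loop might look like the following; all names here, including the injected `run_sql` executor and the specific structural knobs, are hypothetical illustrations rather than the paper's actual implementation:

```python
import random

# Hypothetical structural knobs; the paper's real parameter space covers
# more axes (CTE depth, join count, SQL feature mix, etc.).
CTE_DEPTHS = [1, 2, 3, 4]
JOIN_COUNTS = [0, 1, 2, 3]
SQL_FEATURES = ["window_fn", "group_by", "having", "case_when", "subquery"]

def generate_configs(n):
    """Programmatically enumerate dbt model configurations."""
    for _ in range(n):
        yield {
            "cte_depth": random.choice(CTE_DEPTHS),
            "joins": random.choice(JOIN_COUNTS),
            "features": random.sample(SQL_FEATURES, k=random.randint(1, 3)),
        }

def execute_and_filter(models, run_sql, timeout_s=30):
    """Keep only models that execute cleanly within the timeout."""
    valid = []
    for model in models:
        try:
            rows = run_sql(model["compiled_sql"], timeout=timeout_s)
        except Exception:
            continue  # execution error: drop the model
        if rows is not None:  # assume run_sql returns None on timeout
            valid.append({**model, "rows": rows})
    return valid
```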
A crucial step is semantic refinement. Freshly generated models often carry generic placeholders such as ‘CTE1’ or ‘col1’; an advanced LLM (Qwen3-Coder-480B) rewrites these into meaningful identifiers that reflect each model’s logic. After refinement, models are re-executed to confirm they remain valid. Natural-language questions are then generated for each validated dbt model, varying in description style and syntax requirements.
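In code, this refine-then-revalidate step could be sketched as follows, with `llm_rename` and `run_sql` as hypothetical helpers standing in for the LLM call and the warehouse executor:

```python
def refine_and_revalidate(model, llm_rename, run_sql):
    """Replace placeholder names (CTE1, col1, ...) with meaningful
    identifiers, then re-execute to confirm the refined model is still valid.
    """
    refined_sql = llm_rename(
        model["compiled_sql"],
        instruction="Rename generic CTEs/columns to meaningful identifiers; "
                    "do not change the query's semantics.",
    )
    try:
        rows = run_sql(refined_sql, timeout=30)
    except Exception:
        return None  # refinement broke the model; discard it
    return {**model, "compiled_sql": refined_sql, "rows": rows}
```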
Quality control is paramount. Anthropic’s claude-sonnet-4-20250514 is employed to evaluate each question-model pair for clarity, semantic alignment, and technical correctness. Only pairs scoring 9/10 or higher pass the final filtering, ensuring a high standard of training data. This curated dataset is then partitioned, with examples yielding non-empty results feeding into reinforcement learning, and others supporting supervised fine-tuning.
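A sketch of this judging-and-partitioning step, with `llm_judge` as a hypothetical wrapper around the scoring model:

```python
def curate(pairs, llm_judge, threshold=9):
    """Score each (question, dbt model) pair on clarity, semantic alignment,
    and technical correctness; keep only high scorers; then partition by
    whether execution produced rows.
    """
    rl_split, sft_split = [], []
    for pair in pairs:
        score = llm_judge(pair["question"], pair["compiled_sql"])  # 1..10
        if score < threshold:
            continue
        # Non-empty results give a usable execution-match signal for RL.
        (rl_split if pair["rows"] else sft_split).append(pair)
    return rl_split, sft_split
```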
Thinkquel’s Training Methodology: Planning for Precision
Thinkquel’s training incorporates a unique “plan-before-SQL” approach. Instead of generating verbose, free-form thoughts, the model is trained to first produce a concise, structured plan. This plan explicitly lists source tables and columns in a YAML-like format before generating the final dbt code. This structured planning significantly improves schema grounding, reduces hallucinations, and makes the planning process verifiable, allowing for objective rewards based on plan quality.
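As an illustration, a plan-then-code output might look like the snippet below. The exact plan schema, the toy dbt model, and the adherence check are our own assumptions, not taken from the paper; the point is that a structured plan makes adherence mechanically checkable:

```python
import re

# Illustrative output format: a YAML-like plan, then the dbt model.
PLAN = """
tables: [orders, customers]
columns: [orders.amount, customers.region]
"""

DBT_MODEL = """
select c.region, sum(o.amount) as total_amount
from {{ ref('orders') }} o
join {{ ref('customers') }} c on o.customer_id = c.customer_id
group by c.region
"""

def plan_tables(plan: str) -> set[str]:
    """Parse the planned source tables out of the YAML-like plan."""
    match = re.search(r"tables:\s*\[([^\]]*)\]", plan)
    return {t.strip() for t in match.group(1).split(",")} if match else set()

def referenced_tables(sql: str) -> set[str]:
    """Tables pulled in via dbt's ref() macro."""
    return set(re.findall(r"ref\('([^']+)'\)", sql))

# Because the plan is structured, plan adherence is objectively verifiable,
# which is what enables reward signals based on plan quality.
assert referenced_tables(DBT_MODEL) <= plan_tables(PLAN)
```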
Training proceeds in two stages of supervised fine-tuning (SFT). The first stage establishes baseline text-to-dbt capability; the second refines it with plan-augmented instances mixed with general instruction-following data, helping the model retain broad conversational ability.
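Illustratively, that curriculum might be expressed as a configuration like the one below; the mixture names mirror the description above, but the proportions are invented for the sketch:

```python
# Hypothetical two-stage SFT curriculum configuration.
SFT_STAGES = [
    {
        "stage": 1,
        "goal": "base text-to-dbt capability",
        "mixture": {"ts_sql_pairs": 1.0},
    },
    {
        "stage": 2,
        "goal": "plan-augmented generation + retained general ability",
        "mixture": {"plan_augmented_ts_sql": 0.7, "general_instructions": 0.3},
    },
]
```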
Reinforcement Learning (RL) further enhances Thinkquel. The RL signal is a composite of multiple rewards, designed to align the model’s behavior with execution-grounded correctness and encourage good planning. These rewards include checks for correct formatting, accurate schema linking (tables and columns), adherence of the generated dbt to the plan, successful execution, and exact result matching against gold standards. These rewards are strategically split: SQL-span rewards focus on execution and result matching, while Plan-span rewards focus on format and schema linking.
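A hedged sketch of how such a span-split composite reward could be computed; the verifier bundle `checks`, the `run_sql` executor, and the equal weighting of terms are all assumptions for illustration (the paper's exact reward weighting and the span assignment of the plan-adherence term are not spelled out here):

```python
def composite_rewards(plan, dbt_sql, checks, run_sql, gold_rows):
    """Compute span-split rewards. `checks` bundles hypothetical verifiers
    (format, schema linking, plan adherence); `run_sql` executes the
    compiled model against the warehouse.
    """
    # Plan-span rewards: local, structural signals.
    plan_reward = checks.format(plan) + checks.schema_links(plan)

    # SQL-span rewards: sequence-level, execution-grounded signals.
    try:
        rows = run_sql(dbt_sql, timeout=30)
        executed = 1.0
    except Exception:
        rows, executed = None, 0.0
    sql_reward = (
        checks.plan_adherence(dbt_sql, plan)  # placement in this span is our guess
        + executed                            # runs without error
        + float(rows == gold_rows)            # exact-result match vs gold
    )
    return plan_reward, sql_reward
```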
Token–Sequence GRPO (TS–GRPO): Bridging the Granularity Gap
The core of Thinkquel’s advanced training lies in TS–GRPO, a novel span-aware reinforcement learning objective. Traditional methods suffer from a “granularity mismatch”: the most informative feedback (execution success) arrives at the sequence level, while policy updates are applied token by token, which leads to instability.
TS–GRPO addresses this by treating the model’s output as two distinct spans: a reasoning span (the plan) and an answer span (the dbt code). It computes separate advantages for rewards associated with each span. Crucially, it applies a sequence-level, length-normalized importance ratio only to the dbt code span, ensuring that the unit of credit assignment matches the sequence-level nature of SQL-related rewards. For the reasoning span, it retains token-level importance ratios, which are more suitable for local, structural signals like schema linking. This dual approach, along with support for asymmetric clipping (tighter for SQL, looser for plans), significantly reduces variance and prevents credit leakage between the plan and the code, leading to more stable and efficient learning.
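A minimal PyTorch-style sketch of this span-split objective, under our reading of the description above: `adv_plan` and `adv_sql` would be group-relative advantages computed separately from plan-span and SQL-span rewards, and the tensor names and clipping widths are illustrative assumptions, not the paper's values:

```python
import torch

def ts_grpo_loss(logp_new, logp_old, plan_mask, sql_mask, adv_plan, adv_sql,
                 eps_plan=0.4, eps_sql=0.2):
    """Span-aware surrogate loss. Shapes: log-probs and 0/1 float masks are
    (batch, seq_len); per-span group-relative advantages are (batch,).
    eps values encode the asymmetric clipping (looser plan, tighter SQL).
    """
    log_ratio = logp_new - logp_old

    # Plan span: token-level importance ratios suit local, structural rewards.
    r_tok = torch.exp(log_ratio)
    plan_obj = torch.minimum(
        r_tok * adv_plan[:, None],
        torch.clamp(r_tok, 1 - eps_plan, 1 + eps_plan) * adv_plan[:, None],
    )
    plan_loss = -(plan_obj * plan_mask).sum() / plan_mask.sum().clamp(min=1)

    # SQL span: one length-normalized sequence-level ratio per sample, so the
    # unit of credit assignment matches the sequence-level execution reward.
    sql_len = sql_mask.sum(dim=1).clamp(min=1)
    r_seq = torch.exp((log_ratio * sql_mask).sum(dim=1) / sql_len)
    sql_obj = torch.minimum(
        r_seq * adv_sql,
        torch.clamp(r_seq, 1 - eps_sql, 1 + eps_sql) * adv_sql,
    )
    sql_loss = -sql_obj.mean()

    return plan_loss + sql_loss
```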
Impressive Performance
Thinkquel demonstrates impressive results across various benchmarks. On the Spider dataset, TS–GRPO showed faster and steadier convergence of execution-match rewards compared to existing methods like GSPO and GRPO. On the 500-example TS–SQL test set, Thinkquel (32B) achieved 93.2% execution success and 61.8% exact-result match, a substantial improvement over the base model. Even on the out-of-domain BIRD-dbt dataset, Thinkquel maintained strong performance, reaching 73.5% match at 92.9% execution, proving its robustness and portability.
The researchers note that while the two-stage SFT curriculum provides the initial significant boost in capability, TS–GRPO plays a vital role in tightening execution-aligned optimization and closing the remaining performance gap. The explicit planning mechanism also provides measurable benefits in schema grounding and reduces error propagation.
Also Read:
- Generating Tailored Data for Text-to-SQL Systems: Introducing SING-SQL
- DATAMIND: A New Recipe for High-Performing Open-Source Data Analysis AI Agents
Looking Ahead
While Thinkquel represents a significant leap forward, the authors acknowledge areas for future improvement. Residual failures often stem from schema reference errors, indicating a need for even more robust schema linking. Future work will focus on wider dataset coverage, more realistic question styles, integrating reinforcement learning with tool use (e.g., schema inspection), and extending evaluation across multiple data warehouses to enhance cross-warehouse portability. You can read the full research paper here.