Enhancing Multilingual Text-to-SQL with Semantic Contrastive Rewards

TLDR: This paper introduces a new framework that combines Group Relative Policy Optimization (GRPO) with a multilingual contrastive reward signal to improve Text-to-SQL systems across different languages. By focusing on semantic alignment in addition to execution correctness, the method enables a smaller Llama-3 3B model to achieve higher execution accuracy than a larger Llama-3 8B zero-shot model, and significantly boosts semantic accuracy, especially in non-English languages, using only 3,000 training examples.

A new research paper introduces a framework designed to improve the accuracy and semantic fidelity of Text-to-SQL systems across multiple languages. Titled “Bridging the Semantic Gap: Contrastive Rewards for Multilingual Text-to-SQL,” the work addresses a central weakness of current Text-to-SQL methods: the semantic alignment between a user’s natural language query and the generated SQL query, particularly beyond English. Existing systems lose about 6 percentage points of performance on average when handling non-English languages. This research aims to close that gap by ensuring that the generated SQL not only executes correctly but also reflects the user’s original intent.

The core of the proposed solution combines Group Relative Policy Optimization (GRPO) with a novel multilingual contrastive reward signal. GRPO is a reinforcement learning algorithm for fine-tuning language models in a stable and memory-efficient way. Rather than training a separate value model to judge each output, it samples a group of candidate outputs for the same prompt and scores each one relative to the rest of the group. The researchers adapted GRPO, which was originally developed for mathematical reasoning tasks, to the Text-to-SQL domain.
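The group-relative idea can be sketched in a few lines. This is a minimal illustration of the advantage normalization commonly associated with GRPO, not the paper's training code: the function name, the reward values, and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled output relative to its own sampling group.

    Assumption: rewards are normalized by the group mean and the
    population standard deviation; real implementations may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four SQL candidates sampled for one question.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Candidates that beat the group average receive a positive advantage and are reinforced, while below-average candidates are discouraged.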

The main novelty is the contrastive reward signal. Instead of relying solely on whether a generated SQL query returns the correct result (a binary “correct” or “incorrect” signal), this new reward provides continuous feedback on how closely the generated SQL’s meaning aligns with the user’s natural language query. This semantic reward is computed using a specially trained multilingual contrastive encoder, built upon XLM-RoBERTa-base. This encoder creates embeddings (numerical representations) for both the input question and the gold-standard SQL, then calculates their cosine similarity. A higher similarity score indicates better semantic alignment, guiding the model to understand the user’s true intent more accurately.
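The scoring step itself is plain cosine similarity between two embedding vectors. The sketch below shows only that step, under stated assumptions: the three-dimensional vectors are made up for illustration, whereas in the described system they would come from the XLM-RoBERTa-based encoder, and the function and variable names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for degenerate zero vectors
    return dot / (norm_u * norm_v)

# Toy embeddings (assumed values, not real encoder output).
question_vec = [0.2, 0.9, 0.1]
aligned_sql_vec = [0.25, 0.85, 0.05]
misaligned_sql_vec = [0.9, 0.1, 0.4]

aligned_score = cosine_similarity(question_vec, aligned_sql_vec)
misaligned_score = cosine_similarity(question_vec, misaligned_sql_vec)
```

Because the score varies continuously, two queries that are both marked wrong by execution can still be ranked by how far they are from the intended meaning.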

The framework integrates several feedback signals during training: an execution reward (binary, for correct results), a syntax reward (for executable queries), a schema-matching reward (for correct table and column usage), and the crucial semantic reward. By combining these, the model learns to produce SQL that is not only syntactically valid and executable but also deeply faithful to the user’s meaning across different languages.
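One way the four signals might be combined is a weighted sum, sketched below with Python's standard `sqlite3` module. Everything specific here is an assumption for illustration: the weights, the Jaccard-based schema score, the helper names, and the toy `singer` table are not taken from the paper, and the semantic score is passed in as a precomputed number.

```python
import re
import sqlite3

def run_query(conn, sql):
    """Return result rows in a canonical order, or None if execution fails."""
    try:
        return sorted(conn.execute(sql).fetchall(), key=repr)
    except sqlite3.Error:
        return None

def schema_overlap(pred_sql, gold_sql, schema_names):
    """Jaccard overlap of the schema identifiers each query mentions."""
    def used(sql):
        return set(re.findall(r"[a-z_]+", sql.lower())) & schema_names
    pred, gold = used(pred_sql), used(gold_sql)
    if not pred and not gold:
        return 1.0
    return len(pred & gold) / len(pred | gold)

def combined_reward(conn, pred_sql, gold_sql, schema_names, semantic_score,
                    weights=(1.0, 0.5, 0.5, 1.0)):
    """Weighted sum of execution, syntax, schema, and semantic rewards."""
    pred_rows = run_query(conn, pred_sql)
    gold_rows = run_query(conn, gold_sql)
    syntax = 1.0 if pred_rows is not None else 0.0  # query is executable
    execution = 1.0 if syntax and pred_rows == gold_rows else 0.0
    schema = schema_overlap(pred_sql, gold_sql, schema_names)
    w_exec, w_syn, w_schema, w_sem = weights  # assumed weights
    return (w_exec * execution + w_syn * syntax
            + w_schema * schema + w_sem * semantic_score)

# Hypothetical database and queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (id INTEGER, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'An', 30), (2, 'Binh', 41);
""")
schema_names = {"singer", "id", "name", "age"}
gold = "SELECT name FROM singer WHERE age > 35"

good = combined_reward(conn, gold, gold, schema_names, semantic_score=0.9)
bad = combined_reward(conn, "SELECT nme FROM singer", gold, schema_names,
                      semantic_score=0.2)
```

In this sketch the query with a misspelled column still earns partial credit for touching the right table, which gives the policy a gradient to follow even when execution fails outright.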

Experiments were conducted on the MultiSpider dataset, which includes parallel queries in seven languages: Vietnamese, Spanish, Japanese, German, English, Chinese, and French. The researchers fine-tuned a Llama-3 3B model using their approach (L3B-GRPO-C). With the contrastive reward, the fine-tuned model achieved an average execution accuracy of 88.86% and a semantic accuracy of 59.14%, an improvement of +27.43 percentage points in execution accuracy and +39.71 percentage points in semantic accuracy over the zero-shot Llama-3 3B baseline.

The fine-tuned Llama-3 3B model (L3B-GRPO-C) also outperformed a much larger zero-shot Llama-3 8B model in average execution accuracy (88.86% vs. 81.43%). While the 8B model still held a lead in semantic accuracy, the 3B model significantly narrowed the gap. This suggests that targeted fine-tuning with semantic awareness can let smaller, more resource-efficient models match or exceed much larger models used zero-shot. The method achieved these gains with only 3,000 reinforcement learning training examples, highlighting its sample efficiency.

A qualitative example from Vietnamese queries illustrated the power of the contrastive reward. In one instance, a model without the contrastive reward generated SQL that executed successfully but had subtle semantic inaccuracies (e.g., using `>=` instead of `>` and not counting distinct movies). The model trained with the contrastive reward, however, correctly captured these nuances, producing SQL that precisely matched the user’s intent, even when both queries yielded the same results on a specific test database state. This highlights how execution accuracy alone can sometimes be misleading, and the semantic reward provides a deeper level of correctness.
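The effect is easy to reproduce on a toy database. The sketch below does not use the paper's actual example: the `rating` table, both queries, and the data are invented to show how `>=` versus `>` and a missing `DISTINCT` can agree on one database state and diverge on another.

```python
import sqlite3

# Hypothetical intent: "How many distinct movies are rated above 3 stars?"
LOOSE = "SELECT COUNT(movie_id) FROM rating WHERE stars >= 3"
STRICT = "SELECT COUNT(DISTINCT movie_id) FROM rating WHERE stars > 3"

def count_on(rows, sql):
    """Run one counting query against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rating (movie_id INTEGER, stars INTEGER)")
    conn.executemany("INSERT INTO rating VALUES (?, ?)", rows)
    result = conn.execute(sql).fetchone()[0]
    conn.close()
    return result

# State A: no boundary rating (exactly 3 stars) and no repeated movie.
state_a = [(1, 4), (2, 5), (3, 2)]
# State B: adds a second rating for movie 1 and a movie rated exactly 3.
state_b = state_a + [(1, 5), (4, 3)]

same_on_a = count_on(state_a, LOOSE) == count_on(state_a, STRICT)
same_on_b = count_on(state_b, LOOSE) == count_on(state_b, STRICT)
```

On state A both queries return the same count, so an execution check alone would mark the loose query as correct. On state B they disagree, which is the kind of hidden error a semantic signal is meant to penalize.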

Ablation studies further confirmed the importance of both the contrastive reward and the choice of the XLM-RoBERTa encoder: removing the contrastive reward, or substituting a less capable encoder, significantly reduced the semantic accuracy gains. This research paves the way for more accessible, resource-efficient multilingual Text-to-SQL systems that let users worldwide query databases in their native languages with greater precision.
