
Rethinking Ambiguity: A Cooperative Approach to Natural Language Data Queries

TLDR: This research paper challenges the traditional view of ambiguity in natural language queries for tabular data analysis. Instead of seeing ambiguity as a flaw, the authors propose reframing it as a feature of cooperative interaction between users and systems. They introduce a framework distinguishing ‘cooperative queries’ (resolvable) from ‘uncooperative queries’ (irresolvable), detailing how systems can infer meaning through conventional and selective grounding. The paper argues for evaluating systems based on their interpretation capabilities and execution accuracy separately, highlighting issues with existing benchmarks that often contain ‘data-privileged queries’. It concludes by outlining implications for designing more cooperative systems and evaluation practices that embrace shared responsibility in query specification.

Natural language interfaces that allow us to ask questions about tabular data, like spreadsheets or databases, often struggle with ambiguity. When we ask a question in everyday language, it might not always be perfectly clear what data we mean or what kind of analysis we want. Traditionally, this ambiguity has been seen as a problem that needs to be fixed by making systems smarter at guessing our intent.

However, a new research paper titled “Are We Asking the Right Questions? On Ambiguity in Natural Language Queries for Tabular Data Analysis” by Daniel Gomm, Cornelius Wolff, and Madelon Hulsebos proposes a different way to look at this challenge. Instead of viewing ambiguity as a flaw, the authors suggest it can be a valuable part of a cooperative interaction between a user and a system. They argue that the responsibility for specifying a query should be shared, with users providing what they know and systems inferring what’s left unsaid.

Understanding Cooperative Queries

The paper introduces a framework that categorizes queries into two main types: cooperative and uncooperative. A cooperative query is one that provides enough information, either directly or through reasonable inference, for the system to figure out at least one valid way to interpret and act on the query. An uncooperative query, on the other hand, is too vague or underspecified, making it impossible for the system to identify a valid interpretation.

To understand a query, a system needs to perform what the authors call “actionable query interpretation.” This involves figuring out both the analytical procedure (like calculating a mean or median) and the exact data to apply it to (such as a specific time period or location). This process relies on a division of labor:

  • User-provided grounding: Information explicitly stated in the query (e.g., “Apple Inc.”) or contextually implied (e.g., “past 20 years” given the current year).
  • System-inferred grounding: Where the user expects the system to fill in the gaps. This can happen in two ways:
    • Conventional grounding: Resolving ambiguity using common sense or universal conventions (e.g., “highest mountain” implies “in the world”).
    • Selective grounding: When users intentionally leave choices open, allowing the system to pick from a set of reasonable options (e.g., “relationship between…” might let the system choose between Pearson or Spearman correlation).

The paper emphasizes that underspecification isn’t always a problem; it can be an intentional way for users to delegate tasks to the system. Queries that require no selective grounding and map to a single interpretation through user-provided and conventional grounding are called unambiguous queries.

Rethinking Evaluation and System Design

This new perspective has significant implications for how we evaluate natural language systems for tabular data. The authors argue that current evaluation methods often mix different query types, making it hard to tell if a system is failing because it can’t understand the query (interpretation) or because it can’t execute the analysis correctly (accuracy).

They suggest that unambiguous queries are best for testing a system’s execution accuracy, while cooperative queries that require selective grounding are ideal for evaluating a system’s ability to make reasonable, human-aligned choices when faced with controlled ambiguity. Uncooperative queries can be used to test how robust a system is when it encounters unanswerable questions.

The paper also highlights the issue of “data-privileged queries” in existing benchmarks. These are queries that use knowledge (like specific column headers or internal IDs) that an average user in an open-domain setting wouldn’t have. Such queries provide an unrealistic advantage and don’t accurately reflect real-world interactions.
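The contrast can be made concrete with a toy heuristic. The column names and the verbatim-header check below are hypothetical, chosen only to illustrate the idea; they do not reproduce the paper’s own annotation procedure.

```python
# Hypothetical table schema with internal-style column headers.
table_columns = ["cust_id", "ord_dt", "rev_usd"]

def is_data_privileged(query: str, columns: list[str]) -> bool:
    # Crude illustrative heuristic (an assumption, not the paper's method):
    # flag queries that quote raw column headers verbatim, since an
    # open-domain user would not know those internal names.
    return any(col in query for col in columns)

queries = [
    "What is the total rev_usd per cust_id?",   # leans on internal headers
    "What is the total revenue per customer?",  # open-domain phrasing
]
for q in queries:
    label = "data-privileged" if is_data_privileged(q, table_columns) else "open-domain"
    print(f"{q} -> {label}")
```

In the open-domain phrasing, mapping “revenue” to `rev_usd` and “customer” to `cust_id` becomes part of the grounding work the system is expected to do.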

Their analysis of 15 popular datasets revealed that many contain a high number of data-privileged queries and a surprisingly low number of truly unambiguous queries. This means current benchmarks often don’t effectively isolate and test a system’s interpretation and execution capabilities.

Moving Forward

To elevate natural language systems for tabular data analysis, the researchers propose several directions:

  • More effective evaluation: Augmenting existing datasets with annotations that describe query specification levels, allowing for more targeted evaluations.
  • Novel datasets: Creating new datasets that explore multiple ways to interpret underspecified queries, testing a system’s ability to recognize when selective grounding is needed and to make appropriate choices or ask for clarification.
  • Cooperative system design: Building systems that actively engage in a productive division of labor with users, proactively interpreting queries and disclosing their grounding choices.
  • Cooperative dialogue: Moving beyond single-shot queries to iterative conversations where systems can resolve ambiguities by asking clarifying questions.

Ultimately, the paper advocates for a fundamental shift in how we approach natural language interfaces for tabular data, moving from a mindset of fixing ambiguity to one of embracing cooperation in resolving queries. This will lead to more informed design and evaluation, paving the way for truly intelligent and user-friendly data analysis tools.

Ananya Rao (https://blogs.edgentiq.com)
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
