LyS System: Zero-Shot Code Generation for Answering Questions from Tables

TLDR: The LyS system takes a zero-shot approach to Tabular Question Answering (Tabular QA): a Large Language Model generates Python code that extracts information from tables. Its modular pipeline includes a column selector, a code generator, and an iterative error-handling module that refines code based on execution failures. In SemEval 2025 Task 8 the system ranked 33rd out of 53 participants, showing that zero-shot code generation is viable for Tabular QA despite difficulties with highly complex data types.

In the evolving landscape of artificial intelligence, the ability of machines to understand and answer questions about structured data, known as Tabular Question Answering (Tabular QA), is becoming increasingly important. The field has clear real-world applications, from analyzing financial reports and business intelligence to exploring scientific datasets. Unlike traditional question answering, which deals with unstructured text, Tabular QA requires systems to navigate tables, understand column relationships, and handle various data types to extract precise information.

Historically, Tabular QA systems often relied on complex supervised methods, involving structured prediction or sequence-to-sequence models that required extensive training on large annotated datasets. However, with the emergence of powerful instruction-based Large Language Models (LLMs), a new paradigm has taken hold: zero-shot generation. This approach allows models to generate answers without prior task-specific fine-tuning, significantly reducing the need for vast amounts of labeled data.

A recent paper, “LyS at SemEval 2025 Task 8: Zero-Shot Code Generation for Tabular QA,” explores this zero-shot approach by leveraging LLMs to dynamically generate functional code. This code is designed to extract relevant information from tabular data based on a user’s input question. The team behind LyS developed a modular pipeline to enhance accuracy and reliability, consisting of three main stages.

The LyS System: A Modular Approach

The LyS system is built around a core idea: using an LLM to generate executable code. To support this, it incorporates additional components that refine the process and improve robustness:

Column Selector: This initial module uses an instruction-based LLM to identify the most relevant columns in a table for a given question. Instead of relying on predefined rules, it intelligently determines which parts of the table are essential for answering the query.
Answer Generator: Once the relevant columns are identified, this component instructs another LLM to generate Python code. Python was chosen due to its widespread use in data analysis and strong support for tabular data processing through libraries like Pandas. This generated code is then executed to retrieve the answer from the tabular source.
Code Fixer: A crucial part of the pipeline, this module captures any execution errors that might occur due to incorrect syntax or data mismatches. If an error is detected, the error message and context are fed back into the LLM, prompting it to regenerate a corrected version of the code. This iterative refinement process significantly enhances the system’s reliability.
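The generate, execute, and repair loop described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the prompt wording, and the `fake_llm` stub standing in for a real model are all assumptions. To stay dependency-free, the table is a list of dicts rather than a Pandas DataFrame.

```python
import traceback
from typing import Callable


def run_generated_code(code: str, table: list) -> object:
    """Execute generated code that must define `answer(table)` and return its result."""
    namespace: dict = {}
    # NOTE: exec on model output is unsafe outside a sandbox; used here only for illustration.
    exec(code, namespace)
    return namespace["answer"](table)


def answer_with_retries(question: str, table: list,
                        generate: Callable[[str], str],
                        max_attempts: int = 3) -> object:
    """Ask the model for code, run it, and feed any error back for a corrected version."""
    base_prompt = (
        f"Question: {question}\n"
        f"Columns: {list(table[0].keys())}\n"
        "Write a Python function answer(table)."
    )
    prompt = base_prompt
    for _ in range(max_attempts):
        code = generate(prompt)
        try:
            return run_generated_code(code, table)
        except Exception as exc:
            # Keep only the final error line, e.g. "KeyError: 'Age'".
            error_text = "".join(traceback.format_exception_only(type(exc), exc))
            prompt = (
                f"{base_prompt}\n\nPrevious code:\n{code}\n"
                f"Error:\n{error_text}\nFix the code."
            )
    return None  # every attempt failed


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: wrong column name first, fixed after an error."""
    if "Error:" in prompt:
        return "def answer(table):\n    return max(row['age'] for row in table)\n"
    return "def answer(table):\n    return max(row['Age'] for row in table)\n"


table = [{"name": "Ana", "age": 31}, {"name": "Luis", "age": 45}]
result = answer_with_retries("What is the maximum age?", table, fake_llm)
print(result)
```

In this toy run the first attempt fails with a `KeyError` on the misspelled column, the error text is appended to the prompt, and the second attempt succeeds. A real deployment would run the generated code in an isolated process.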

The system also includes a preprocessing step that standardizes column names and infers common data schemas, which helps prevent errors in the code generation phase.
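As an illustration of what standardizing column names might involve, here is a small sketch. The specific rules (lowercasing, replacing non-word characters with underscores, numbering duplicates) are assumptions chosen for the example; the article does not describe the exact rules LyS applies.

```python
import re


def standardize_column_names(names):
    """Return cleaned column names that are safe to reference from generated code."""
    seen = {}
    result = []
    for name in names:
        # Lowercase, collapse runs of non-word characters into "_", trim stray "_".
        clean = re.sub(r"\W+", "_", name.strip().lower()).strip("_") or "column"
        count = seen.get(clean, 0)
        seen[clean] = count + 1
        # Append a numeric suffix to repeated names so each column stays addressable.
        result.append(clean if count == 0 else f"{clean}_{count}")
    return result


columns = standardize_column_names(["First Name", " Age (years) ", "first name", "%"])
print(columns)
```

Predictable names reduce the chance that generated code refers to a column that does not exist, which is one of the failure modes the Code Fixer would otherwise have to repair.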

Performance and Insights

The LyS team participated in the SemEval 2025 Task 8, a competition that provided a diverse dataset of real-world tabular data. Their zero-shot approach meant no explicit training or fine-tuning was conducted; instead, they validated different open-source LLMs on a development dataset to select the best performer. Models like Qwen-2.5-Coder (7B and 32B versions), Mistral-7B, and Codestral-22B were tested, with Qwen-2.5-Coder 32B showing superior performance.

During the development phase, the LyS system consistently outperformed the baseline, demonstrating the viability of zero-shot code generation for Tabular QA. Adding the Column Selector module led to a clear improvement in accuracy, highlighting the importance of pre-selecting relevant attributes. The Code Fixer module, especially when combined with an enhanced column selection, further boosted performance, particularly for Subtask 1, which involved larger databases. This showed that incorporating error feedback helps the LLM generate better code.

In the final test phase of the competition, the best-performing configuration of LyS achieved a respectable rank of 33 out of 53 participants. While there was a noticeable drop in accuracy compared to the development phase results, this was attributed to the increased complexity of data types in the test tables, such as lists not enclosed by brackets or dictionaries with variable keys. This indicates that while the system is robust, handling highly complex and inconsistently formatted data types remains a challenge.
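To make the data-type problem concrete, the sketch below parses a cell that may hold a list either as a Python-style literal or as bare comma-separated text. The cell formats and the function name `parse_list_cell` are assumptions for illustration; the article does not show the actual test data or how LyS handles it.

```python
import ast


def parse_list_cell(cell: str) -> list:
    """Parse a list-like cell, with or without enclosing brackets."""
    text = cell.strip()
    if text.startswith("[") and text.endswith("]"):
        try:
            value = ast.literal_eval(text)
            if isinstance(value, list):
                return value
        except (ValueError, SyntaxError):
            # Not a valid literal (e.g. unquoted items): drop the brackets and split.
            text = text[1:-1]
    # Fallback: treat the text as comma-separated items and skip empty ones.
    return [item.strip() for item in text.split(",") if item.strip()]


print(parse_list_cell("['a', 'b']"))
print(parse_list_cell("a, b , c"))
print(parse_list_cell("[a, b]"))
```

A fallback like this is heuristic: it cannot tell a comma inside an item from a separator, which is part of why inconsistently formatted cells are hard to handle in generated code.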

Looking Ahead

The LyS system demonstrates that zero-shot code generation is a valid and promising approach for Tabular QA, capable of adapting to different dataset schemas without extensive training. Future work aims to refine prompt templates, improve schema adaptation, optimize execution efficiency, and potentially incorporate a voting system with multiple LLMs. Better detection and handling of complex data types is also a critical area for improvement, as it would make the system more generalizable to the large amount of less structured data available online. For more technical details, refer to the full research paper.
