
Language Models Struggle with Efficiency in Deductive Reasoning, Study Finds

TLDR: A new research paper introduces a framework using logic programming to evaluate the efficiency of language models (LMs) in deductive reasoning, beyond just correctness. By injecting irrelevant information into math word problems, the study found that LMs exhibit significant accuracy declines and generate proofs with frequent detours through unnecessary inferences. This inefficiency is particularly pronounced when irrelevant information semantically overlaps with the query, highlighting a need for LMs to better discern and ignore distractions.

Large language models (LMs) have shown impressive capabilities in deductive reasoning, solving a wide array of complex tasks. However, a new study suggests that while these models might often get to the correct answer, they are far from efficient in their reasoning process, especially when faced with irrelevant information.

Human-like reasoning often involves sifting through vast amounts of data and skillfully ignoring distractions to arrive at a conclusion. This efficiency, a crucial aspect of intelligence, has largely been overlooked in standard evaluations of LMs, which primarily focus on correctness.

The Efficiency Gap in AI Reasoning

Researchers propose a novel framework to assess how efficiently LMs reason, drawing insights from logic programming. Imagine a proof as a path in a complex network of information. The most efficient proof is simply the shortest path to the goal. The new method aligns the natural language proofs generated by LMs with these shortest, most direct proofs found through logic programming. This allows for a precise quantification of efficiency, specifically by measuring how well a model avoids unnecessary inferences.
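To make the shortest-path analogy concrete, the sketch below treats facts as nodes and single inference steps as edges, then recovers the most direct proof with a breadth-first search. This is an illustration of the idea rather than the paper's actual machinery; the inference_graph, its fact names, and the bfs_shortest_proof helper are all invented for this example.

    from collections import deque

    # Toy inference graph: each fact maps to the facts derivable from it
    # in one step. The "ryan_likes_hiking" branch is an irrelevant detour.
    inference_graph = {
        "ryan_has_3_cats": ["cats_need_food", "ryan_likes_hiking"],
        "cats_need_food": ["ryan_buys_food"],
        "ryan_buys_food": ["answer"],
        "ryan_likes_hiking": ["ryan_owns_boots"],
        "ryan_owns_boots": [],
        "answer": [],
    }

    def bfs_shortest_proof(graph, start, goal):
        """Return the shortest chain of inference steps from start to goal."""
        queue = deque([[start]])
        visited = {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for successor in graph.get(path[-1], []):
                if successor not in visited:
                    visited.add(successor)
                    queue.append(path + [successor])
        return None

    print(bfs_shortest_proof(inference_graph, "ryan_has_3_cats", "answer"))
    # ['ryan_has_3_cats', 'cats_need_food', 'ryan_buys_food', 'answer']

An LM's proof can then be aligned against this shortest path: any step the model takes outside it counts as an unnecessary inference.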

To test this, a unique dataset was created using grade school math word problems. These problems were deliberately injected with varying amounts of irrelevant information, or ‘axioms,’ which also differed in how much they semantically overlapped with the problem’s actual goal. For instance, a problem asking about ‘Ryan’s cats’ might include irrelevant information also mentioning ‘Ryan’ or ‘cats,’ making it harder for the LM to distinguish relevant from irrelevant data.
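A hypothetical sketch of that construction, with every sentence invented for illustration, might look like this:

    # Relevant axioms plus injected distractors; the overlapping pool
    # reuses the query's entities ("Ryan", "cats"), the disjoint pool doesn't.
    relevant_axioms = [
        "Ryan has 3 cats.",
        "Each cat eats 2 cans of food per day.",
    ]
    overlapping_distractors = [
        "Ryan also owns 4 skateboards.",
        "Ryan's neighbor once lost a cat.",
    ]
    disjoint_distractors = [
        "Maria bought 7 pencils.",
    ]
    query = "How many cans of food do Ryan's cats eat per day?"

    def build_problem(distractors):
        """Mix distractor axioms into the problem statement."""
        return " ".join(relevant_axioms + list(distractors)) + " " + query

    print(build_problem(overlapping_distractors))

Varying how many distractors are injected, and from which pool, yields problems with controlled amounts and kinds of irrelevant information.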

Key Findings: Distractions Degrade Performance

The empirical results were striking. Current LMs experienced noticeable declines in accuracy when irrelevant axioms were introduced. This performance drop occurred even with minimal, domain-consistent distractions and worsened as more irrelevant information was added. The proofs generated by these models frequently took ‘detours’ through inferences that were not necessary to solve the problem.

One significant observation was that LMs were particularly inefficient when the irrelevant axioms had semantic overlap with the query. This suggests that while LMs might use lexical cues (like matching names or entities) as a heuristic to guide their search, this strategy can backfire when those cues lead to irrelevant paths.

Furthermore, the study found that LMs often generated more tokens than required to solve problems correctly. This indicates that inefficiency isn’t just about making irrelevant logical steps, but also about verbosity in expressing those steps in natural language. The efficiency scores, which measure the ratio of the shortest proof length to the LM’s proof length, were consistently far from 100%, confirming that models frequently produced irrelevant theorems even when they reached the correct final answer.
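Under that reading of the score, a minimal worked example (the formalization below is assumed from the article's description, not taken from the paper):

    # Efficiency as the ratio of shortest proof length to the LM's proof
    # length, where 100% means no wasted inferences.
    def efficiency_score(shortest_proof_len, lm_proof_len):
        return shortest_proof_len / lm_proof_len

    # If the minimal proof needs 4 inference steps but the model's proof
    # takes 7, efficiency is about 57% -- well short of the 100% ideal.
    print(f"{efficiency_score(4, 7):.0%}")  # 57%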

The research also compared performance on ‘non-ground’ queries (e.g., “How many drones does Yanick have?”) versus ‘ground’ queries (e.g., “Show that Yanick has 5 drones.”). LMs generally performed better on non-ground queries, which are more common in typical math word problems, suggesting a potential influence from their training data.
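In logic-programming terms, a query is non-ground when it contains a free variable and ground when it is fully instantiated. The Prolog-style strings below are purely illustrative:

    # Illustrative query forms (plain strings, not executed):
    non_ground_query = "has(yanick, drones, X)"  # free variable X: "how many?"
    ground_query = "has(yanick, drones, 5)"      # fully instantiated: "show that ... 5"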


A Call for More Efficient AI Reasoning

This work, titled “Are Language Models Efficient Reasoners? A Perspective from Logic Programming”, by Andreas Opedal, Yanick Zengaffinen, Haruki Shirakami, Clemente Pasti, Mrinmaya Sachan, Abulhair Saparov, Ryan Cotterell, and Bernhard Schölkopf, highlights a critical area for improvement in language models. It underscores the need for models that not only provide correct answers but also do so efficiently, by effectively identifying and ignoring irrelevant information. The framework presented offers a valuable tool for future research aimed at developing more human-like and resource-efficient AI reasoning systems.

Karthik Mehta (https://blogs.edgentiq.com)
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
