TL;DR: QuestA is a novel strategy that enhances the multi-step reasoning capabilities of large language models (LLMs), particularly on challenging problems. By augmenting training data with partial solutions, QuestA provides more informative learning signals during reinforcement learning (RL) training. This simple yet effective method improves performance on math reasoning tasks, achieving state-of-the-art results for 1.5B-parameter models and demonstrating significant gains in sample efficiency without causing entropy collapse.
Large Language Models (LLMs) have made incredible strides in various complex tasks, from writing creative text to solving intricate problems. A key method behind their advanced reasoning abilities is Reinforcement Learning (RL). However, recent observations have highlighted a challenge: standard RL often struggles to significantly improve multi-step reasoning, especially when faced with very difficult problems.
A new research paper titled “QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation” introduces an innovative and straightforward approach to tackle this limitation. Authored by Jiazheng Li, Hong Lu, Kaiyue Wen, Zaiwen Yang, Jiaxuan Gao, Hongzhou Lin, Yi Wu, and Jingzhao Zhang, the paper proposes a method called QuestA, which stands for Question Augmentation.
What is QuestA and How Does It Work?
QuestA’s core idea is surprisingly simple yet highly effective: it introduces ‘partial solutions’ into the training process of LLMs. Imagine a student struggling with a complex math problem. Instead of just telling them the final answer, you give them a hint—the first few steps of the solution. This makes the problem less daunting and provides a clearer path forward. QuestA applies this same principle to LLMs.
Unlike other methods that might tweak the RL algorithm itself or change how rewards are given, QuestA operates purely at the input level. When training an LLM, especially on problems where the model initially fails completely, QuestA takes the original question and prepends a segment of its correct solution. For instance, it might add the first 50% of the solution sketch as a hint to the prompt. This ‘scaffolding’ helps the model explore the problem space more effectively and find correct solutions, even when it would otherwise get stuck due to a lack of positive feedback.
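The input-level augmentation can be sketched in a few lines. This is a minimal illustration, not the paper's exact prompt template: the wording of the hint, the step-based split, and the 50% default ratio are assumptions made here for clarity.

```python
def augment_question(question: str, solution_steps: list[str], hint_ratio: float = 0.5) -> str:
    """Prepend the first `hint_ratio` fraction of a reference solution
    to the question as scaffolding (QuestA-style question augmentation).
    The prompt wording here is illustrative, not the paper's template."""
    n_hint = int(len(solution_steps) * hint_ratio)
    hint = "\n".join(solution_steps[:n_hint])
    if not hint:
        return question  # nothing to prepend for very short solutions
    return (
        f"{question}\n\n"
        f"Here is a partial solution to get you started:\n{hint}\n\n"
        f"Continue from this point and complete the solution."
    )

# Toy example: only the first of two steps is revealed as the hint.
prompt = augment_question(
    "Find the sum of all positive divisors of 28.",
    ["The divisors of 28 are 1, 2, 4, 7, 14, 28.",
     "Sum them: 1 + 2 + 4 + 7 + 14 + 28 = 56."],
)
```

The key design point is that nothing about the RL loop changes: the augmented prompt simply replaces the original one, so the method composes with any existing training pipeline.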
The researchers specifically focused on challenging math reasoning tasks. They used a dataset of 26,000 difficult problems from OpenR1-Math-220K. By injecting these partial solutions, QuestA provides a denser and more informative learning signal, allowing the RL process to make progress where it previously stalled. This approach also helps prevent ‘entropy collapse,’ a phenomenon where the model’s output becomes too narrow, limiting its ability to explore diverse solutions. QuestA, in contrast, encourages more varied and exploratory behavior.
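Augmentation is most useful on exactly the problems where the base model earns no reward at all. A minimal sketch of selecting such a hard subset follows; the pass-rate criterion and threshold here are illustrative assumptions, not the paper's exact filtering procedure.

```python
def select_hard_problems(problems: list[str],
                         pass_rates: list[float],
                         max_pass_rate: float = 0.0) -> list[str]:
    """Keep only problems the base model rarely or never solves,
    as measured by its empirical pass rate over sampled rollouts.
    These are the problems where a partial-solution hint adds the
    most signal (threshold is an illustrative assumption)."""
    return [p for p, r in zip(problems, pass_rates) if r <= max_pass_rate]

# Toy example: pass rates from, say, 8 rollouts per problem.
hard = select_hard_problems(
    ["problem A", "problem B", "problem C"],
    [0.0, 0.625, 0.0],
)
```

With `max_pass_rate=0.0`, only problems the model never solves survive, which matches the intuition that hints rescue training signal precisely where standard RL receives none.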
Why is This Important?
The paper provides theoretical backing for QuestA's effectiveness, showing that it significantly improves sample efficiency: because the hints guide the search more directly, the model needs far fewer rollouts to find a correct solution. This matters for training large models, where it can save substantial compute and wall-clock time.
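The sample-efficiency intuition admits a back-of-the-envelope calculation (the probabilities below are illustrative assumptions, not figures from the paper): if each rollout independently succeeds with probability p, the expected number of rollouts until the first correct solution is 1/p.

```python
def expected_attempts(success_prob: float) -> float:
    """Mean number of independent rollouts until the first correct
    solution (mean of a geometric distribution: 1 / p)."""
    if not 0 < success_prob <= 1:
        raise ValueError("success_prob must be in (0, 1]")
    return 1.0 / success_prob

# Illustrative numbers only: if a partial-solution hint lifts the
# per-rollout success rate from 1% to 20%, the expected number of
# attempts drops from about 100 to 5.
without_hint = expected_attempts(0.01)
with_hint = expected_attempts(0.20)
```

Since positive-reward samples are what drive RL updates, this multiplicative reduction in wasted rollouts translates directly into the denser learning signal the paper describes.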
Impressive Results
QuestA was applied to strong open-source 1.5B-parameter models, DeepScaleR and Nemotron, and the results were remarkable. The method achieved new state-of-the-art performance on several challenging math benchmarks:
- AIME24: 67.1% accuracy (a 5.3-point improvement)
- AIME25: 59.5% accuracy (a 10.0-point improvement)
- HMMT25: 35.5% accuracy (a 4.0-point improvement)
What’s particularly impressive is that QuestA-enhanced 1.5B-parameter models not only outperformed other models of similar size but also matched or even exceeded the performance of much larger models, such as DeepSeek-R1-Distill-32B, on several benchmarks. This demonstrates QuestA’s ability to unlock deeper reasoning capabilities in smaller models through targeted training.
Even though QuestA training used exclusively mathematical problems, the resulting models showed modest improvements in other domains such as general knowledge, logic, and coding tasks, suggesting potential for broader application. An ablation study using a different dataset, OpenMathReasoning, yielded similar positive results, further supporting the method's generalizability.
Looking Ahead
QuestA offers a practical and broadly applicable pathway for expanding the reasoning capacity of LLMs through RL. By focusing on data augmentation rather than complex algorithmic changes, it provides a flexible tool for improving model performance on difficult tasks. The researchers believe this method could be extended to other challenging domains like competitive coding and software engineering, paving the way for even more capable AI systems. You can read the full research paper for more details at https://arxiv.org/pdf/2507.13266.


