Unlocking More Reliable AI Reasoning: A Solvability-Based Approach to Multiple-Choice Questions

TLDR: This research paper introduces ‘solvability’ as a new metric to assess if a large language model (LLM) can genuinely solve a multiple-choice question, rather than just guessing. By integrating this solvability into outcome-supervised reward models (MCQ-ORM) and reinforcement learning (MCQ-DrGRPO), the authors demonstrate significant improvements in the process-correctness of the AI’s reasoning steps and, in RL, also enhance final answer accuracy. This approach helps reduce ‘hallucinations’ and makes AI reasoning more reliable by focusing learning on genuinely solvable problems.

Large language models (LLMs) have shown remarkable abilities in complex reasoning tasks, often by generating a ‘chain of thought’ (CoT) – a series of intermediate steps leading to a final answer. However, the quality of this reasoning isn’t just about getting the right answer; it’s also about whether the steps taken are logically sound and correct. Sometimes, an LLM might arrive at the correct answer through a flawed or ‘spurious’ reasoning process, leading to what are known as false positives. This issue is particularly noticeable in multiple-choice question answering (MCQA), where models can sometimes guess correctly without truly understanding the problem.

A recent research paper, titled ‘BOOSTING PROCESS-CORRECT COT REASONING BY MODELING SOLVABILITY OF MULTIPLE-CHOICE QA,’ by Raphael Schumann and Stefan Riezler, delves into this challenge. The authors propose a novel approach: explicitly modeling the ‘solvability’ of a question for a given LLM. They argue that when a question is effectively unsolvable for a model, it’s more prone to generating these misleading, process-incorrect CoTs. By understanding and quantifying solvability, they aim to make AI reasoning more reliable and reduce ‘hallucinations’ – instances where the model generates plausible but incorrect information.

Understanding Solvability

At its core, solvability refers to the probability that a model’s true performance on a question exceeds random guessing. For multiple-choice questions, the random-guessing baseline is 1 divided by the number of options, for example 25% for a four-option question. The researchers estimate this ‘true performance’ by sampling multiple CoTs for each question and counting how many lead to the correct answer. The more correct answers observed, the higher the estimated solvability. Both the number of answer choices and the number of CoTs sampled affect the estimate: more choices lower the chance baseline, and more samples reduce uncertainty, so either gives a clearer picture of whether a question is genuinely solvable for the model.
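
One way to make this estimate concrete is a small Beta-binomial sketch. It assumes a uniform prior over the model's true accuracy on a question, which is an assumption of this example and not necessarily the paper's exact estimator. The function name `solvability` and its parameters are also illustrative.

```python
from math import comb


def solvability(num_correct: int, num_samples: int, num_choices: int) -> float:
    """Posterior probability that true accuracy exceeds chance (1 / num_choices).

    Assumes a uniform Beta(1, 1) prior; this is an illustrative choice,
    not a claim about the paper's estimator.
    """
    if num_choices < 2:
        raise ValueError("a multiple-choice question needs at least 2 options")
    if not 0 <= num_correct <= num_samples:
        raise ValueError("num_correct must lie between 0 and num_samples")

    chance = 1.0 / num_choices
    n = num_samples + 1
    # The posterior is Beta(num_correct + 1, num_samples - num_correct + 1).
    # For integer parameters, its mass above `chance` equals
    # P(Binomial(num_samples + 1, chance) <= num_correct).
    return sum(
        comb(n, j) * chance**j * (1.0 - chance) ** (n - j)
        for j in range(num_correct + 1)
    )


# Eight sampled CoTs on a four-option question, varying the number of correct answers.
for correct in (0, 2, 4, 8):
    print(correct, round(solvability(correct, 8, 4), 3))
```

Under these assumptions the estimate rises with the number of correct samples, and a question answered at roughly the chance rate stays well short of certainty in either direction.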

The paper empirically demonstrates a strong link between a question’s solvability and the model’s ability to generate a ‘process-correct’ CoT – one where the thought process itself is judged to be valid. If a question is deemed unsolvable, the model is highly unlikely to produce a correct thought process, even if it occasionally stumbles upon the right answer.

Improving Reasoning at Test-Time

One way to enhance reasoning is to select the best CoT from multiple generated options. Traditionally, outcome-supervised reward models (ORMs) are trained to predict if a generated answer is correct. Schumann and Riezler introduce a modification called MCQ-ORM, which incorporates the estimated solvability into the ORM’s objective. This means that CoTs generated for questions that are likely unsolvable (and thus prone to false positives) are given less weight. This adjustment helps the model prioritize and select CoTs that are not only answer-correct but also more likely to be process-correct. Experiments on various math reasoning datasets showed that MCQ-ORM consistently outperformed standard ORMs and other baselines in selecting process-correct CoTs.
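
The down-weighting described above can be illustrated with a solvability-weighted binary cross-entropy. This is a minimal sketch of one plausible reading of "given less weight", not the paper's exact MCQ-ORM objective. The function name `weighted_orm_loss` and its arguments are hypothetical.

```python
from math import log


def weighted_orm_loss(scores, labels, solvabilities, eps=1e-12):
    """Mean binary cross-entropy with each CoT's loss scaled by its question's solvability.

    scores:        ORM-predicted probabilities that each CoT's answer is correct
    labels:        1 if the CoT's final answer was correct, else 0
    solvabilities: estimated solvability of the question each CoT belongs to
    (The weighting scheme is an assumption made for illustration.)
    """
    if not (len(scores) == len(labels) == len(solvabilities)):
        raise ValueError("all inputs must have the same length")
    if not scores:
        raise ValueError("at least one example is required")

    total = 0.0
    for p, y, s in zip(scores, labels, solvabilities):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        bce = -(y * log(p) + (1 - y) * log(1.0 - p))
        total += s * bce  # CoTs from likely-unsolvable questions contribute less
    return total / len(scores)


# The same answer-correct CoT contributes less when its question looks unsolvable.
print(weighted_orm_loss([0.5], [1], [0.9]))
print(weighted_orm_loss([0.5], [1], [0.1]))
```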

Reinforcement Learning with Solvability-Adjusted Advantage

The researchers also applied their solvability concept to reinforcement learning (RL), specifically by adjusting the ‘advantage’ calculation in algorithms like DrGRPO. Advantage in RL determines how much a particular action (generating a CoT) is favored. They found that traditional advantage calculations could sometimes over-emphasize correct answers from questions where the model was largely guessing, leading to noisy learning signals. To counter this, they proposed MCQ-DrGRPO, which multiplies the advantage by the question’s solvability. This effectively down-weights CoTs from unsolvable questions, focusing the learning on instances with higher ‘learning potential’ – a balance between novelty (how much the model currently struggles) and solvability (how likely it is to genuinely learn).
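
The adjustment can be sketched in a few lines. The example assumes that the DrGRPO advantage of a sampled CoT is its reward minus the mean reward of the group sampled for the same question, with no division by the standard deviation. The function name and the way solvability is passed in are illustrative, not taken from the paper.

```python
def solvability_adjusted_advantages(rewards, solvability):
    """Mean-centered group advantages scaled by the question's solvability.

    rewards:     rewards of the CoTs sampled for one question (e.g. 1 = correct answer)
    solvability: estimated solvability of that question, in [0, 1]
    (Assumes a DrGRPO-style advantage: reward minus the group mean.)
    """
    if not rewards:
        raise ValueError("at least one reward is required")
    if not 0.0 <= solvability <= 1.0:
        raise ValueError("solvability must lie in [0, 1]")

    mean_reward = sum(rewards) / len(rewards)
    return [solvability * (r - mean_reward) for r in rewards]


# One lucky correct answer out of four samples on a question judged half-solvable.
print(solvability_adjusted_advantages([1, 0, 0, 0], 0.5))
```

With binary rewards, the unscaled advantage of a correct CoT is largest when few samples in the group are correct, which corresponds to the novelty side of the trade-off. Multiplying by solvability shrinks that signal when the correct answer is more likely a lucky guess.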

The results from RL experiments were even more compelling. MCQ-DrGRPO consistently achieved higher rewards during training and led to significant improvements in both process accuracy and answer accuracy across math and novel multimodal reasoning datasets (like geo-guessing and year-guessing from images). The analysis further revealed that MCQ-DrGRPO leads to a ‘sharper’ output distribution, meaning the model generates correct CoTs more consistently, rather than relying on diverse but potentially noisy outputs. For more technical details, refer to the full research paper.

In conclusion, this research highlights ‘solvability’ as a crucial factor for developing more reliable and less hallucinatory LLM reasoning. By explicitly modeling whether a question is genuinely within a model’s grasp, and integrating this understanding into both reward models and reinforcement learning, we can significantly boost the process-correctness of AI’s thought processes, leading to more trustworthy and accurate AI systems.

Meera Iyer
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist in a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She's particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
