TLDR: A new method called EVoSS improves Large Language Models’ (LLMs) ability to solve Math Word Problems (MWPs) by having the LLM generate equations, an external solver find a precise answer, and then the LLM estimate the answer for verification. This approach achieves state-of-the-art results on numeric and algebraic MWPs, successfully tackles trigonometric problems, and introduces new datasets for advanced testing.
Large Language Models (LLMs) have shown remarkable capabilities across many tasks, from generating text to answering complex questions. However, one area where they often face significant hurdles is solving Math Word Problems (MWPs). These problems demand a blend of reasoning and mathematical skills that LLMs frequently struggle with, leading to incorrect solutions despite their advanced nature.
Addressing this challenge, a new research paper introduces a novel method called Equation Verification of Symbolic Solvers (EVoSS). This approach aims to significantly improve how LLMs tackle MWPs by combining their natural language understanding with the precision of external mathematical tools and a clever verification step.
The core idea behind EVoSS is multi-faceted. Initially, an LLM is prompted to break down a math word problem into simpler statements and then generate algebraic equations from this decomposition. These equations are then passed to an external symbolic solver, a dedicated mathematical tool, which calculates a precise answer. This step ensures that the mathematical computations are performed accurately, avoiding common arithmetic errors LLMs might make.
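The solver step can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the LLM has already emitted its equations as plain strings (e.g. "x + y = 10"), and the function and variable names are invented for this example. SymPy stands in for the external symbolic solver.

```python
from sympy import symbols, Eq, solve, sympify

def solve_equations(equation_strs, variable_names):
    """Parse LLM-generated equation strings and solve them exactly
    with a symbolic solver, avoiding LLM arithmetic errors."""
    variables = symbols(variable_names)
    equations = []
    for eq_str in equation_strs:
        lhs, rhs = eq_str.split("=")
        equations.append(Eq(sympify(lhs), sympify(rhs)))
    return solve(equations, variables, dict=True)

# Example word problem: "The sum of two numbers is 10 and their
# difference is 4. What are the numbers?"
solutions = solve_equations(["x + y = 10", "x - y = 4"], "x y")
print(solutions)  # [{x: 7, y: 3}]
```

Because the solver works symbolically, the same pipeline handles algebraic systems exactly rather than relying on the LLM to carry out the arithmetic itself.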
What truly sets EVoSS apart is its unique verification process, inspired by a long-standing recommendation from math teachers: always check your answer with an estimate. After obtaining a precise solution from the symbolic solver, the LLM is prompted a second time, but now its goal is to estimate the answer rather than solve the problem exactly. This estimate is then compared against the precisely calculated answer. If the two values agree within an acceptable margin, the solution is deemed correct. If they diverge significantly, this signals a likely error in the initial equation generation, triggering an iterative rectification process in which new equations are generated and solved until the estimate confirms the result.
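The verify-and-retry loop described above might look like the sketch below. The two LLM calls are stubbed out as callables, and the 10% relative margin and three-attempt cap are assumed values for illustration, not settings from the paper.

```python
import math

def solve_with_verification(generate_and_solve, estimate,
                            max_attempts=3, rel_tol=0.1):
    """Regenerate equations until the symbolic solver's exact answer
    agrees with the LLM's rough estimate within a relative margin."""
    answer = None
    for attempt in range(max_attempts):
        answer = generate_and_solve(attempt)  # LLM -> equations -> solver
        rough = estimate()                    # second LLM pass: estimate only
        if math.isclose(answer, rough, rel_tol=rel_tol):
            return answer, True               # estimate confirms the answer
    return answer, False                      # still unverified after retries

# Toy stand-ins for the LLM calls: the first attempt mis-translates the
# problem (yielding 420), the retry gets the equations right (42).
attempts = [420.0, 42.0]
result, verified = solve_with_verification(
    generate_and_solve=lambda i: attempts[i],
    estimate=lambda: 40.0,  # the LLM's rough guess: "about 40"
)
print(result, verified)  # 42.0 True
```

The key design point is that the estimate never needs to be exact: it only needs to be close enough to catch equation-generation errors, which typically produce answers off by a large factor.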
This innovative method has yielded impressive results, achieving new state-of-the-art performance on widely used datasets for numeric and algebraic MWPs. On average, EVoSS improved upon previous best results by nearly two percent. Furthermore, it successfully tackled trigonometric MWPs, a type of problem not previously explored by similar methods, demonstrating its versatility beyond basic arithmetic and algebra.
The researchers also contributed to the field by introducing two new datasets. One, named SVAMPClean, is a refined version of an existing dataset (SVAMP) in which identified errors and ambiguities were corrected. The other, Trig300, is an entirely new dataset comprising 300 trigonometry-based questions. This dataset is designed to push the boundaries of LLM reasoning abilities, requiring more complex mathematical operations than simple addition, subtraction, multiplication, and division. For more details on this research, you can refer to the full paper here.
Also Read:
- CogAtom: Building Advanced Math Problems to Elevate AI Reasoning
- How Different Languages Enhance AI’s Mathematical Abilities
The EVoSS approach leverages the strengths of LLMs for understanding and decomposition, while offloading precise calculation to a reliable symbolic solver. The estimation verification acts as a crucial safeguard, catching errors and guiding the model towards accurate solutions. This blend of natural language processing, symbolic computation, and common-sense verification represents a significant step forward in enabling AI to solve complex mathematical challenges more reliably.