THOR: Bridging LLM Reasoning and Precise Computation for Math Problems

TLDR: THOR is a new framework that enhances Large Language Models’ (LLMs) mathematical reasoning by integrating external tools. It addresses key challenges in tool-integrated reasoning (TIR) through three main components: TIRGen, a data pipeline for creating high-quality tool-use data; a hierarchical reinforcement learning strategy that optimizes both overall problem-solving and specific code generation steps; and a self-correction mechanism during inference that uses immediate tool feedback to fix errors. THOR achieves state-of-the-art performance on various mathematical and code benchmarks, demonstrating strong generalization and efficiency.

Large Language Models (LLMs) have advanced rapidly in many areas, including mathematical reasoning. However, they often struggle with tasks requiring high precision, such as complex numerical calculations or formal symbolic manipulations. Integrating external tools such as code interpreters helps bridge the gap between an LLM’s reasoning capabilities and the need for exact computation.

Despite recent progress in combining LLMs with tools, researchers have faced three main hurdles: creating high-quality datasets for tool-integrated reasoning, optimizing models at a fine-grained level (individual steps rather than whole solutions), and improving how models use tools during inference. A new framework called THOR (Tool-Integrated Hierarchical Optimization via RL) has been proposed to tackle these challenges.

Building Better Tool-Integrated Data with TIRGen

One of THOR’s core innovations is TIRGen, a multi-agent pipeline designed to construct high-quality datasets of tool-integrated reasoning paths. Think of it as a collaborative effort between two AI agents: an ‘Actor’ that generates natural language reasoning steps, and a ‘Critic’ that identifies which of these steps can be solved using code. The Critic then converts these parts into executable Python code, runs it, and uses the precise results to refine the reasoning path. This iterative process creates a dataset that is well-aligned with how the model actually thinks and uses tools, making it highly effective for training.
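The Actor/Critic loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the TIRGen implementation: `actor`, `critic`, `run_code`, and `tirgen` are hypothetical names, the Actor returns canned steps instead of calling an LLM, and the Critic uses a regular expression where the real pipeline would use a model to decide which steps are computable.

```python
import re

# Matches "<arithmetic expression> = <number>" inside a reasoning step.
CALC = re.compile(r"(\d[\d\s+\-*/().]*?)\s*=\s*(-?\d+(?:\.\d+)?)")


def actor(problem):
    """Hypothetical Actor: stands in for an LLM and returns canned
    natural-language steps, one of which contains an arithmetic slip."""
    return [
        "The room is 17 m by 23 m.",
        "Its area is 17 * 23 = 381.",
    ]


def run_code(code):
    """Hypothetical tool: run a snippet and return its `result` variable.
    Builtins are disabled because this toy only handles arithmetic."""
    scope = {"__builtins__": {}}
    exec(code, scope)
    return scope["result"]


def critic(step):
    """Hypothetical Critic: find a computable part of the step, turn it into
    Python code, execute it, and rewrite the step with the exact result."""
    match = CALC.search(step)
    if match is None:
        return step, None  # nothing to offload to the tool
    code = f"result = {match.group(1)}"
    value = run_code(code)
    fixed = step[:match.start(2)] + str(value) + step[match.end(2):]
    return fixed, code


def tirgen(problem):
    """Build one tool-integrated reasoning path: text plus optional code."""
    path = []
    for step in actor(problem):
        text, code = critic(step)
        path.append({"text": text, "code": code})
    return path


for entry in tirgen("What is the area of the room?"):
    print(entry)
```

The point of the sketch is the data shape: each step keeps the Actor's wording, while any calculation is backed by code that was actually executed, so the stored result comes from the tool rather than from the model's own arithmetic.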

Hierarchical Learning for Precision and Problem Solving

THOR introduces a sophisticated reinforcement learning (RL) strategy for fine-grained optimization. The key insight here is that the success of an intermediate tool call is a strong indicator of whether the final answer will be correct. Based on this, THOR optimizes on two levels:

  • Trajectory-level Optimization: This focuses on the overall problem-solving ability, rewarding the model for generating correct final answers to mathematical problems.
  • Step-level Optimization: This is a more granular approach, specifically targeting and correcting errors in code generation steps. If a tool call fails, the model learns to improve its code generation for similar situations, directly enhancing its precision.
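To make the two levels concrete, here is a minimal sketch of how the two reward signals could be computed for a single solution attempt. The class and function names (`Step`, `Trajectory`, `tool_succeeds`, `trajectory_reward`, `step_rewards`) and the reward values of 1, 0, and -1 are assumptions made for illustration; the article does not specify THOR's actual reward shaping or loss.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Step:
    reasoning: str
    code: Optional[str] = None  # None means the step made no tool call


@dataclass
class Trajectory:
    steps: List[Step]
    final_answer: str


def tool_succeeds(code: str) -> bool:
    """Run the snippet and report whether it executed without an exception."""
    try:
        exec(code, {})
        return True
    except Exception:
        return False


def trajectory_reward(traj: Trajectory, gold: str) -> float:
    """Trajectory level: reward the whole attempt by final-answer correctness."""
    return 1.0 if traj.final_answer.strip() == gold.strip() else 0.0


def step_rewards(traj: Trajectory) -> List[Tuple[int, float]]:
    """Step level: one signal per tool-calling step, keyed by step index.
    Failed executions get a negative signal (assumed value: -1.0)."""
    return [
        (i, 1.0 if tool_succeeds(s.code) else -1.0)
        for i, s in enumerate(traj.steps)
        if s.code is not None
    ]


traj = Trajectory(
    steps=[
        Step("Restate the problem."),
        Step("Compute the power.", code="r = 2 ** 10"),
        Step("Normalize the result.", code="r = 1 / 0"),
    ],
    final_answer="1024",
)
print(trajectory_reward(traj, "1024"))
print(step_rewards(traj))
```

Keeping the two signals separate is what lets a training loop credit a correct final answer while still penalizing the specific code step that failed along the way.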

Self-Correction for Robust Inference

During the inference phase (when the model is solving new problems), THOR incorporates a self-correction mechanism. If a tool call fails, the model doesn’t just give up. Instead, it uses the immediate feedback from the failed execution to dynamically revise its reasoning path. It can backtrack to the problematic step and regenerate a new reasoning suffix and action, exploring alternative solutions until a successful path is found. This significantly boosts the model’s robustness and overall performance, ensuring it can recover from errors on the fly.
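A rough sketch of that retry loop follows. It assumes a hypothetical `propose(prefix, feedback)` callable standing in for the model, a hypothetical `execute` helper standing in for the code interpreter, and an arbitrary cap of three attempts; none of these names or limits come from the THOR code.

```python
def execute(code):
    """Hypothetical tool call: returns (ok, feedback). On success the feedback
    is the repr of `result`; on failure it is the error message."""
    scope = {}
    try:
        exec(code, scope)
        return True, repr(scope.get("result"))
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"


def solve_step(propose, prefix, max_attempts=3):
    """Keep the reasoning prefix fixed and regenerate the failing suffix,
    passing the tool's error message back to the generator each time."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        code = propose(prefix, feedback)
        ok, feedback = execute(code)
        if ok:
            return code, feedback, attempt
    return None, feedback, max_attempts  # every attempt failed


def toy_propose(prefix, feedback):
    """Stand-in for the model: the first draft uses undefined names,
    and the revision written after seeing the error is self-contained."""
    if feedback is None:
        return "result = total / count"
    return "result = sum([3, 5, 10]) / 3"


code, output, attempts = solve_step(toy_propose, "Average of 3, 5 and 10:")
print(attempts, code, output)
```

In this toy run the first draft fails with a `NameError`, the error text is handed back to the generator, and the second draft executes, so the loop returns after two attempts.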

Impressive Performance and Generalization

THOR has been evaluated on a wide range of challenging mathematical benchmarks, including MATH500, AIME, AMC, Minerva Math, and Olympiad Bench. It achieves state-of-the-art performance among models of comparable size and generalizes across both reasoning and non-reasoning models. THOR also shows consistent improvements on code generation benchmarks such as HumanEval and MBPP, which supports its versatility across reasoning domains. The framework is also reported to reduce inference overhead.

The researchers behind THOR are making their code publicly available, which you can find at https://github.com/JingMog/THOR. This work represents a significant step forward in enabling LLMs to tackle complex mathematical problems with both advanced reasoning and precise computation.

Ananya Rao
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
