
AI Models Advance Mathematical Proofs and Discoveries

TLDR: A new research paper introduces a protocol that uses GPT-5 models as collaborative provers and verifiers for mathematical theorems. The system, which includes a crucial human verification step, solved five of the six 2025 International Mathematical Olympiad problems, closed twenty-two open conjectures on cyclic numbers (sixteen of them previously unsettled), and discovered new theorems in machine learning, graph theory, and sphere packing. The methodology counters LLM hallucinations by having a human certify that the formal theorem statement matches its natural-language description.

A recent research paper titled “Mathematics with large language models as provers and verifiers” by Hieu Le Duc and Leo Liberti explores a novel approach to mathematical theorem proving using advanced artificial intelligence. The paper details a protocol that leverages large language models (LLMs) to tackle complex mathematical problems, including those from the International Mathematical Olympiad (IMO) and open conjectures in number theory.

The core of their methodology involves a sophisticated interaction between different instances of the GPT-5 model, acting as collaborative ‘provers’ and ‘verifiers’. This system, designed with minimal human intervention, aims to overcome the inherent challenges of LLMs, such as the tendency to ‘hallucinate’ or produce incorrect information. The protocol, termed Test-Time Verify-Revise (TTVR), mimics the human process of problem-solving: proposing a solution, checking it, correcting it, and repeating until a valid proof is found.

In this TTVR loop, a ‘prover’ LLM proposes a proof, which is then evaluated by a ‘verifier’ LLM. If the proof is rejected, the verifier explains the issues, and the prover uses that feedback to revise its attempt. The loop repeats for a fixed number of rounds. Once the verifier accepts a proof, it is formalized in a proof assistant language, specifically Lean. A crucial final step has a human verify that the formal statement of the theorem (premises and conclusion) matches its natural-language description. This human check, though minimal, guarantees the correctness of the formal proof even if the LLM hallucinated while generating it.
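The loop described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code: the `prover` and `verifier` callables here are toy stand-ins for what, in the actual protocol, would be GPT-5 API calls with role-specific prompts, and the function names and round budget are hypothetical.

```python
# Minimal sketch of the Test-Time Verify-Revise (TTVR) loop described above.
# The prover/verifier are stub functions standing in for GPT-5 calls; the
# real protocol would issue API requests with role-specific prompts.

def ttvr(prover, verifier, problem, max_rounds=5):
    """Iterate propose -> verify -> revise until a proof is accepted
    or the round budget is exhausted."""
    feedback = None
    for _ in range(max_rounds):
        attempt = prover(problem, feedback)              # propose (or revise) a proof
        accepted, feedback = verifier(problem, attempt)  # check it, get feedback
        if accepted:
            return attempt                               # hand off to Lean formalization
    return None                                          # budget exhausted, no valid proof

# Toy stand-ins: the "prover" needs two tries, the "verifier" rejects the first.
def toy_prover(problem, feedback):
    return "proof-v2" if feedback else "proof-v1"

def toy_verifier(problem, attempt):
    if attempt == "proof-v1":
        return False, "gap in the induction step"        # rejection with feedback
    return True, None

print(ttvr(toy_prover, toy_verifier, "sum of first n odd numbers"))  # proof-v2
```

On this toy run, the first attempt is rejected with feedback and the revised attempt is accepted, mirroring the propose-check-correct cycle the paper describes.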

The researchers highlight that this protocol ensures the validity of the formal proof, even if the natural language proof or the initial formalization contained errors due to LLM hallucinations. The human’s role is to bridge the semantic gap between natural language and formal logic, ensuring that what the LLM *thinks* it proved matches the actual mathematical statement.
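To make that semantic check concrete, here is a hypothetical toy example (not a theorem from the paper) of the kind of Lean 4 statement, written against Mathlib, that a human would compare against its natural-language reading before trusting the machine-checked proof:

```lean
import Mathlib

-- The human checks that this formal statement really says
-- "the sum of two even integers is even" -- no stronger premises,
-- no weaker conclusion -- before accepting the proof below it.
theorem even_add_even (m n : ℤ) (hm : Even m) (hn : Even n) :
    Even (m + n) := by
  obtain ⟨a, ha⟩ := hm   -- m = a + a
  obtain ⟨b, hb⟩ := hn   -- n = b + b
  exact ⟨a + b, by rw [ha, hb]; ring⟩
```

Once the human is satisfied that the statement line matches the intended theorem, Lean's kernel checking the proof body is what delivers the correctness guarantee, regardless of what the LLM did along the way.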

Impressive Achievements

The system demonstrated remarkable capabilities across various mathematical domains. In the 2025 International Mathematical Olympiad, the protocol successfully solved five out of six problems, matching the performance of specialized LLMs developed by major AI labs. This is particularly noteworthy as the GPT-5 model used was trained on data only up to 2024, meaning it had no prior exposure to the 2025 IMO problems.

Beyond competitive problems, the system also made significant strides in number theory. It closed twenty-two out of sixty-six open conjectures on cyclic numbers from a recent paper, with sixteen of these being previously unsettled. The paper provides an example of a complex proof for the “Third Firoozbakht conjecture on cyclics” that the system managed to derive.

Furthermore, the research introduced an advanced “research mode” where the system, given a broad mathematical goal, could discover and prove new theorems. This mode involves additional LLM agents like a literature reviewer, context preparer, and predictor. In this mode, the system explored areas such as improving gradient descent in machine learning, the 1-2-3 conjecture in graph theory, and sphere packing in Euclidean spaces. It successfully settled several conjectures in these fields, proving some and refuting others, showcasing its ability to contribute to active research areas.

The paper also delves into the practical aspects of implementing such a system, discussing the use of OpenAI’s API and the careful crafting of prompts to define the behavior of the prover and verifier agents. It acknowledges challenges such as the probabilistic nature of LLM outputs, which affects reproducibility, and the time-consuming nature of human verification for converting natural language proofs into formal Lean code. The authors suggest that more intensive LLM training on formal code could alleviate this bottleneck.

This work represents a significant step forward in the application of large language models to formal mathematics, demonstrating their potential not only as problem-solvers but also as tools for discovering and verifying new mathematical knowledge. For more details, see the full paper.

Meera Iyer
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She is particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]