TLDR: Goedel-Prover-V2 is a new series of open-source AI models that achieve state-of-the-art performance in automated theorem proving. It uses scaffolded data synthesis, verifier-guided self-correction, and model averaging to efficiently generate and refine formal mathematical proofs. Despite being significantly smaller, Goedel-Prover-V2 models outperform much larger predecessors on benchmarks like MiniF2F and PutnamBench, demonstrating a breakthrough in computational efficiency and accuracy for formal reasoning.
A new series of open-source language models, Goedel-Prover-V2, has been introduced, marking a significant advancement in automated theorem proving. These models are designed to construct step-by-step, machine-verifiable proofs in formal languages like Lean, a task that demands rigorous logical flow and has historically been a major challenge for AI systems.
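To make "machine-verifiable proof" concrete, here is a minimal Lean 4 illustration. It is a toy statement chosen for this article, not a problem from MiniF2F or PutnamBench, and the theorem name is made up. The Lean checker accepts the file only if the proof term really establishes the stated claim.

```lean
-- Toy example (illustrative only): addition on natural numbers is commutative.
-- `Nat.add_comm` is a lemma from Lean 4's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```

A prover model is asked to produce the part after `:=` for much harder statements, and a proof counts only if Lean accepts it.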
Key Innovations Driving Performance
Goedel-Prover-V2 stands out due to three core innovations that enhance its ability to tackle complex mathematical theorems:
First, Scaffolded Data Synthesis involves generating synthetic tasks of increasing difficulty. This method trains the model to master progressively more complex theorems by providing it with a structured learning path, starting from simpler problems and gradually moving to harder ones. This approach helps the model build foundational skills before attempting advanced proofs.
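The idea of a structured learning path can be sketched as a difficulty-ordered curriculum. This is only an illustration under stated assumptions: the `Task` type, the numeric `difficulty` score, and the `build_curriculum` helper are hypothetical and are not taken from the Goedel-Prover-V2 code, which also has to synthesize the tasks themselves.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    statement: str     # formal statement to prove (placeholder text here)
    difficulty: float  # hypothetical difficulty score; higher means harder


def build_curriculum(tasks: List[Task], n_stages: int) -> List[List[Task]]:
    """Sort tasks from easy to hard and split them into training stages."""
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    ordered = sorted(tasks, key=lambda t: t.difficulty)
    if not ordered:
        return []
    stage_size = -(-len(ordered) // n_stages)  # ceiling division
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]


tasks = [
    Task("hard", 0.9),
    Task("easy", 0.1),
    Task("medium", 0.5),
    Task("very easy", 0.05),
    Task("very hard", 0.95),
]
stages = build_curriculum(tasks, n_stages=2)
for index, stage in enumerate(stages, start=1):
    print(index, [t.statement for t in stage])
```

Earlier stages hold the simpler problems, so training can move through the stages in order.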
Second, Verifier-Guided Self-Correction allows the model to iteratively refine its proofs. The Lean compiler, which checks the correctness of formal proofs, returns immediate feedback that the model uses to identify errors and revise its attempts. This mimics how human mathematicians refine their work, leading to more accurate and robust proofs.
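The revise-on-feedback loop can be sketched as follows. This is a minimal sketch, not the authors' implementation: `generate` stands in for the prover model and `check` for a call to the Lean compiler, and both signatures, the `CheckResult` type, and the toy stubs are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CheckResult:
    ok: bool          # True if the checker accepted the proof
    errors: str = ""  # checker messages passed back to the generator


# Hypothetical signatures:
#   generate(statement, previous_proof, feedback) -> new proof text
#   check(statement, proof) -> CheckResult
Generator = Callable[[str, Optional[str], Optional[str]], str]
Checker = Callable[[str, str], CheckResult]


def prove_with_self_correction(
    statement: str, generate: Generator, check: Checker, max_rounds: int = 3
) -> Tuple[Optional[str], int]:
    """Generate a proof, check it, and retry with the error messages."""
    proof: Optional[str] = None
    feedback: Optional[str] = None
    for round_number in range(1, max_rounds + 1):
        proof = generate(statement, proof, feedback)
        result = check(statement, proof)
        if result.ok:
            return proof, round_number
        feedback = result.errors  # the next attempt sees what went wrong
    return None, max_rounds


# Toy stand-ins so the sketch runs without a model or a Lean installation.
def toy_generate(statement, previous_proof, feedback):
    if feedback is None:
        return "by sorry"  # first attempt leaves the goal open
    return "by exact Nat.add_comm a b"


def toy_check(statement, proof):
    if "sorry" in proof:
        return CheckResult(ok=False, errors="declaration uses 'sorry'")
    return CheckResult(ok=True)


proof, rounds = prove_with_self_correction(
    "theorem t (a b : Nat) : a + b = b + a", toy_generate, toy_check
)
print(proof, rounds)
```

In a real system the checker's error text would be included in the next prompt, and the number of revision rounds would count toward the overall sampling budget.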
Third, Model Averaging is employed to maintain diversity in the model’s outputs. In the later stages of training, models can sometimes become too specialized, reducing their ability to explore different valid proof paths. By merging multiple model checkpoints, Goedel-Prover-V2 mitigates this issue, ensuring a broader range of problem-solving strategies.
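Merging checkpoints amounts to averaging their parameters name by name. The sketch below shows that arithmetic on plain Python lists. It assumes a simple weighted average; the article does not state which checkpoints are merged or with what weights, and `average_checkpoints` is a hypothetical helper rather than part of any released code.

```python
from typing import Dict, List, Optional

Checkpoint = Dict[str, List[float]]  # parameter name -> flattened values


def average_checkpoints(
    checkpoints: List[Checkpoint], weights: Optional[List[float]] = None
) -> Checkpoint:
    """Return the weighted average of checkpoints with identical layouts."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if weights is None:
        weights = [1.0] * len(checkpoints)  # uniform average by default
    if len(weights) != len(checkpoints):
        raise ValueError("one weight per checkpoint is required")
    total = sum(weights)
    merged: Checkpoint = {}
    for name, reference in checkpoints[0].items():
        merged[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(weights, checkpoints)) / total
            for i in range(len(reference))
        ]
    return merged


early = {"layer.weight": [0.0, 2.0]}
late = {"layer.weight": [2.0, 4.0]}
print(average_checkpoints([early, late]))
```

With real models the same operation is applied to each tensor in the checkpoints' state dictionaries.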
Unprecedented Performance and Efficiency
The performance of Goedel-Prover-V2 is particularly impressive given its relatively small size. The Goedel-Prover-V2-8B model, with only 8 billion parameters, achieves 84.6% pass@32 on the MiniF2F benchmark. Under the same metric, this surpasses DeepSeek-Prover-V2-671B, a model with roughly 80 times as many parameters (671 billion versus 8 billion). This demonstrates a remarkable leap in computational efficiency.
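For readers unfamiliar with the metric, pass@k for a prover is commonly read as the share of problems for which at least one of k sampled proofs is accepted by the verifier. The sketch below computes it that way on made-up outcomes; the `pass_at_k` helper and the data are illustrative assumptions and do not reproduce the paper's evaluation code.

```python
from typing import List


def pass_at_k(outcomes: List[List[bool]], k: int) -> float:
    """Fraction of problems with at least one verified proof in the first k attempts.

    outcomes[i][j] is True if attempt j on problem i was accepted by the checker.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not outcomes:
        raise ValueError("need at least one problem")
    solved = sum(1 for attempts in outcomes if any(attempts[:k]))
    return solved / len(outcomes)


# Made-up outcomes for four problems with two attempts each.
outcomes = [
    [False, True],
    [False, False],
    [True, False],
    [False, False],
]
print(pass_at_k(outcomes, k=1))
print(pass_at_k(outcomes, k=2))
```

A larger k gives the model more chances per problem, so scores are only comparable at the same k.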
The flagship model, Goedel-Prover-V2-32B, further pushes the boundaries, achieving 88.1% on MiniF2F at pass@32 in standard mode and an even higher 90.4% in self-correction mode. This significantly outperforms previous state-of-the-art models, including the 72B Kimina-Prover and the 671B DeepSeek-Prover-V2, while using substantially fewer parameters.
On the more challenging PutnamBench, Goedel-Prover-V2-32B solves 86 problems at pass@184 with self-correction, securing the top spot among open-source models. This is nearly double the 47 problems solved by DeepSeek-Prover-V2-671B, highlighting Goedel-Prover-V2’s superior capability on complex, college-level mathematics problems.
The consistent gains from verifier-guided self-correction, adding approximately 2 percentage points in accuracy on MiniF2F and solving 14 more problems on PutnamBench, underscore the effectiveness of integrating Lean compiler feedback into the proof revision process.
Open-Source and Future Impact
At the time of its release (July–August 2025), Goedel-Prover-V2 achieves the strongest overall performance among all open-source theorem provers. Its models, code, and data are openly available, fostering community collaboration and accelerating progress in AI systems capable of reliably solving and verifying complex mathematical problems. This initiative aims to bridge the long-standing divide between intuitive human reasoning and formal proof verification.
For more technical details, refer to the original Goedel-Prover-V2 research paper.


