TLDR: KELPS is a novel neuro-symbolic framework that translates informal mathematics into machine-verifiable theorems in multiple formal languages (Lean, Coq, Isabelle). It introduces Knowledge Equations (KEs) as an intermediate representation, enabling high accuracy and multi-language support. KELPS uses an iterative data generation process, combining expert annotation with synthetic data created from a mathematical ontology, resulting in a dataset of over 60,000 problems. The framework achieves 88.9% syntactic accuracy on MiniF2F, outperforming existing state-of-the-art models.
In the world of mathematics, translating informal ideas into precise, machine-verifiable theorems has long been a complex challenge. While advanced AI models, known as large language models (LLMs), have shown promise in this area, they often struggle due to a shortage of high-quality, diverse training data that connects natural language with formal mathematical languages.
A new neuro-symbolic framework called KELPS, which stands for Knowledge-Equation based Logical Processing System, has been introduced to tackle these very issues. Developed by Jiyao Zhang, Chengli Zhong, Hui Xu, Qige Li, and Yi Zhou, KELPS offers an innovative approach to autoformalization, the process of automatically converting informal mathematical statements into formal, verifiable code.
Understanding KELPS: A Multi-Stage Approach
KELPS operates as an iterative system, designed to translate, synthesize, and refine informal mathematical data into multiple formal languages, including Lean, Coq, and Isabelle. At its core, KELPS introduces a novel intermediate language called Knowledge Equations (KEs). This language is theoretically rooted in assertional logic and acts as a crucial bridge between natural language and the complex syntax of formal systems.
The framework functions in three main stages:
- Semantic Parsing: First, natural language mathematical problems are translated into Knowledge Equations. This step involves understanding the meaning of the informal statement and representing it in the structured KE format.
- Syntactic Alignment: Once in the KE format, these equations are then converted into the desired target formal languages (Lean, Coq, Isabelle) using carefully defined rules. These rules are designed to preserve both the structure and meaning of the original statement. Compilers for these formal languages then check the correctness of the generated code.
- Semantic Verification: Even if the formal statements compile successfully, KELPS goes a step further by performing semantic checks to ensure the meaning is truly preserved. This stage leverages an LLM-as-a-Judge framework to assess the alignment between the formal statement and the original natural language, assigning a score based on consistency.
Building a Robust Dataset
A significant challenge in autoformalization is the lack of large, high-quality datasets. KELPS addresses this by generating a massive parallel corpus of over 60,000 problems. This was achieved through a combination of strategies:
- Data Synthesis: KELPS includes a module that efficiently generates diverse formalization data. This is done by combining mathematical concepts and operators from a predefined ontology, which includes 40 core concepts and 180 operators covering K-12 and undergraduate-level mathematics. This allows for precise control over the complexity and variety of generated problems.
- Expert Iteration: The framework also employs an iterative process where an initial model, fine-tuned on a smaller set of manually annotated examples, automatically processes unannotated data. The outputs are then syntactically and semantically checked, with validated results continuously expanding the training set. This process allowed the model to process over 50,000 validated samples through multiple refinement cycles.
Impressive Performance
The KELPS framework has demonstrated superior performance compared to existing models. On the MiniF2F benchmark, a widely used dataset for autoformalization, KELPS achieved an 88.9% syntactic accuracy (pass@1). This outperforms state-of-the-art models such as Deepseek-V3 (81%) and Herald (81.3%). The framework also shows robust performance across multiple datasets and maintains consistent accuracy when translating into different formal languages, unlike some other models that show significant drops in performance across languages.
Also Read:
- Balancing Logic and Scale: New Grounding Methods for Neural-Symbolic AI
- Leanabell-Prover-V2: Advancing Formal Theorem Proving with AI Self-Correction
The Future of Formal Mathematics
The introduction of Knowledge Equations as an intermediate representation is a key innovation, simplifying the translation process and improving accuracy. The large-scale, multi-language dataset created by KELPS also provides a valuable resource for future research in the field. The authors believe that Knowledge Equations have the potential to become a universal, machine-learnable language for mathematics, offering new avenues for both education and advanced research in theorem proving.
For more technical details, you can refer to the full research paper available at arXiv:2507.08665.


