TLDR: A new research paper introduces Partial Label Learning (PLL) as a framework for learning-guided Automated Theorem Proving (ATP), specifically addressing how to effectively use multiple alternative proofs. It demonstrates that PLL methods, particularly Libra-loss and 0.5-merit-loss, significantly improve the performance of theorem provers like plCoP by learning jointly from all available proofs, outperforming traditional single-proof or MCTS-imitation strategies.
Automated Theorem Proving (ATP) is a field of artificial intelligence focused on developing computer programs that can prove mathematical theorems or logical statements. This task is incredibly challenging because the space of possible derivations, or steps to a proof, can grow exponentially, making it difficult for systems to find a valid proof efficiently.
Traditionally, ATP systems rely on heuristics to navigate this vast search space. More recently, Machine Learning (ML) has emerged as a powerful tool to enhance these systems, particularly in guiding the proof search. One common challenge in training these ML models is dealing with the fact that a single theorem can often have multiple valid proofs. How to best leverage these alternative proofs during the learning process has been an open question.
A recent research paper, “Partial Label Learning for Automated Theorem Proving,” introduces a novel approach by formulating learning-guided ATP as a Partial Label Learning (PLL) problem. PLL is a subfield of machine learning that deals with situations where each training example is associated with a set of possible labels, only one of which is the true label, but the exact true label is unknown. In the context of theorem proving, this means that for a given theorem, there might be several known proofs, and the learning system needs to figure out how to best use all of them, even if it doesn’t know which one is the ‘best’ or ‘true’ proof.
The authors, Zsolt Zombori and Balázs Indruck, highlight that while ATP might seem like an unusual fit for PLL due to proofs being sequential objects, the infinite nature of possible derivations, and the concept of multiple ‘true’ proofs, these challenges can be effectively addressed. The paper demonstrates that methods from the PLL literature can significantly improve the performance of learning-assisted theorem provers.
The research utilizes the plCoP theorem prover to conduct experiments. plCoP employs a learning method called Expert Iteration, where an ‘Expert’ system (guided Monte Carlo Tree Search) searches for proofs, and an ‘Apprentice’ system (a neural network policy model) learns to imitate the Expert’s successful strategies. The core idea is to abstract away from the specific search statistics and instead view the discovered proofs and failed attempts as a partial PLL dataset.
The paper explores several prominent PLL loss functions for training the policy model, including Negative Log Likelihood (NLL) loss, Uniform loss, β-Meritocratic loss, and Libra-loss. Each of these methods handles the uncertainty of multiple possible proofs differently. For instance, NLL-loss tends to focus on a single, most probable proof, while Uniform-loss aims for an even distribution of probability among all allowed proofs. Libra-loss and β-Meritocratic loss attempt to strike a balance between these extremes.
Experiments on standard ATP benchmarks like M2K, MPTP2078, and RA-2 datasets show compelling results. The PLL methods consistently yield better performance compared to the baseline, which imitates Monte Carlo Tree Search statistics or relies on training with only a single, often the shortest, proof. Specifically, Libra-loss and 0.5-merit-loss emerged as the best performers, demonstrating an improvement of 14-28% over the baseline. In a more extensive comparison, Libra-loss even outperformed previous state-of-the-art guided MCTS systems on the leanCoP prover by 7%.
Also Read:
- A Call for Transparency in AI Formal Reasoning Benchmarks
- Unstructured Reasoning Outperforms Structured Approaches in LLMs for Complex Problem Solving
The findings suggest that learning jointly from all available proofs, rather than selecting a single one or simply imitating search statistics, is a more effective strategy for guiding theorem provers. This work builds a crucial bridge between Partial Label Learning and Automated Theorem Proving, offering new theoretical frameworks and practical tools for developing more powerful and efficient theorem proving systems. For more details, you can read the full paper here.


