TLDR: The paper “Combining Textual and Structural Information for Premise Selection in Lean” introduces a graph-augmented approach to improve premise selection in Lean theorem proving. It combines dense text embeddings with graph neural networks (GNNs) over a heterogeneous dependency graph that captures state–premise and premise–premise relations. This method significantly outperforms the ReProver language-based baseline by over 25% across standard retrieval metrics on the LeanDojo Benchmark, demonstrating the power of incorporating relational information for more effective premise selection.
In the world of automated theorem proving, a critical challenge known as premise selection involves identifying the most relevant theorems and definitions from vast libraries to help construct new proofs. This task is a significant hurdle for scaling up theorem proving in large formal systems like Lean.
Traditional language-based methods, such as those employing large language models (LLMs) like ReProver, often treat premises in isolation. They map proof states and premises into a shared embedding space, then retrieve relevant premises based on similarity. However, these methods frequently overlook the intricate web of dependencies and relationships that connect premises within formal libraries. These structural connections, which include references and dependencies among axioms, theorems, lemmas, and definitions, offer valuable prior knowledge that can guide the proof process more effectively.
A new research paper, titled “Combining Textual and Structural Information for Premise Selection in Lean,” introduces an innovative graph-augmented approach to address this limitation. Authored by Job Petrovčič, David Eliecer Narvaez Denis, and Ljupčo Todorovski, the study proposes integrating structural information into language-based premise retrieval for Lean 4. You can read the full paper here: Combining Textual and Structural Information for Premise Selection in Lean.
The Graph-Augmented Approach
The core of this new methodology lies in constructing a heterogeneous dependency graph from Lean source files. This graph captures two main types of relationships: state–premise relations (how proof states relate to premises) and premise–premise relations (how different premises depend on each other). By merging proof states and premises into a single set of nodes and treating references as edges, the Lean library is transformed into a rich, text-attributed directed graph.
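As a concrete (and deliberately simplified) sketch, the library can be represented as a directed, text-attributed graph with two edge types. Everything below — the `DependencyGraph` class, the node names, the relation labels — is illustrative of the idea, not the paper's actual schema:

```python
from collections import defaultdict

class DependencyGraph:
    """Directed, text-attributed graph over proof states and premises."""
    def __init__(self):
        self.text = {}                  # node id -> source text of the node
        self.edges = defaultdict(list)  # relation name -> list of (src, dst)

    def add_node(self, node_id, text):
        self.text[node_id] = text

    def add_edge(self, src, dst, relation):
        # relation is "state_premise" or "premise_premise"
        self.edges[relation].append((src, dst))

g = DependencyGraph()
g.add_node("Nat.add_comm", "theorem Nat.add_comm : ∀ a b : Nat, a + b = b + a")
g.add_node("Nat.succ_add", "theorem Nat.succ_add : ∀ a b, succ a + b = succ (a + b)")
g.add_node("state_0", "⊢ a + b = b + a")
# A premise whose proof cites another premise (premise–premise edge),
# and a proof state that uses a premise (state–premise edge):
g.add_edge("Nat.add_comm", "Nat.succ_add", "premise_premise")
g.add_edge("state_0", "Nat.add_comm", "state_premise")
```

Merging states and premises into one node set like this is what turns the whole library into a single graph that a GNN can propagate over.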
The process begins by generating initial text embeddings for premises and proof states using ReProver’s ByT5 dual-encoder. These initial embeddings are then refined using a Relational Graph Convolutional Network (RGCN). The RGCN propagates information across the dependency graph, allowing each premise node’s embedding to be updated iteratively by considering its connections to other nodes. This propagation helps create “graph-aware” representations of both premises and proof states.
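The RGCN update can be sketched in a few lines of NumPy. This is the standard relation-typed message-passing step — a self-loop transform plus a mean-normalized, relation-specific aggregation — not the authors' exact architecture; the function name, weight layout, and ReLU nonlinearity are all assumptions for illustration:

```python
import numpy as np

def rgcn_layer(h, edges_by_rel, rel_weights, w_self):
    """One RGCN layer.

    h            : (num_nodes, d) node embeddings (e.g. initial ByT5 embeddings)
    edges_by_rel : relation name -> list of (src, dst) directed edges
    rel_weights  : relation name -> (d, d) relation-specific weight matrix
    w_self       : (d, d) self-loop weight matrix
    """
    out = h @ w_self                            # self-loop term
    for rel, edges in edges_by_rel.items():
        agg = np.zeros_like(h)
        deg = np.zeros(h.shape[0])
        for src, dst in edges:
            agg[dst] += h[src] @ rel_weights[rel]  # message along the edge
            deg[dst] += 1
        deg[deg == 0] = 1                       # avoid division by zero
        out += agg / deg[:, None]               # mean-normalize per relation
    return np.maximum(out, 0)                   # ReLU

h = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
refined = rgcn_layer(h, {"dep": [(0, 1)]}, {"dep": np.eye(2)}, np.eye(2))
```

Stacking several such layers lets each premise's embedding absorb information from multi-hop neighborhoods, which is what makes the representations "graph-aware".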
During retrieval, the proof state is treated as a temporary query node, and a similar message-passing mechanism is used to produce refined embeddings for the states by aggregating information from referenced premises. This ensures that the model considers the structural context when determining premise relevance.
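A toy version of this retrieval step, assuming a single averaging-based aggregation from the premises a state references and cosine-similarity ranking — both simplifications of the paper's message passing, with names chosen for illustration:

```python
import numpy as np

def retrieve(state_emb, premise_embs, referenced, top_k=3):
    """Attach the state as a temporary query node: refine its embedding by
    averaging it with the embeddings of premises it references (one
    aggregation step), then rank all premises by cosine similarity."""
    if referenced:
        neigh = premise_embs[referenced].mean(axis=0)
        state_emb = (state_emb + neigh) / 2
    p = premise_embs / np.linalg.norm(premise_embs, axis=1, keepdims=True)
    s = state_emb / np.linalg.norm(state_emb)
    scores = p @ s                       # cosine similarity to every premise
    return np.argsort(-scores)[:top_k]   # indices of the top-k premises

premises = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ranked = retrieve(np.array([1.0, 0.0]), premises, referenced=[1])
```

Because the state's embedding is pulled toward the premises it already references, premises that sit near that neighborhood in the graph score higher than they would under text similarity alone.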
Enhanced Data Extraction and Training
To support this graph-augmented approach, the researchers extended LeanDojo’s dataset extraction framework. This allowed them to query the proof assistant for additional information needed to build the comprehensive dependency graph, including premise-premise and premise-state edges. The dataset now includes not only textual representations but also the full directed graph structure.
The model is trained using the InfoNCE contrastive loss, which teaches it to score the relevant (positive) premises for each proof state above irrelevant (negative) ones. To prevent trivial memorization and encourage the learning of general structural patterns, the GNN is trained on the full premise graph but with the state–premise edges derived from the proofs themselves excluded, so the labels being predicted cannot leak through the graph.
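For reference, a minimal NumPy version of the InfoNCE objective for a single proof state — one positive premise scored against a set of negatives. The dot-product scoring and the `temperature` default are generic dual-encoder conventions, not the paper's exact settings:

```python
import numpy as np

def info_nce(state, positive, negatives, temperature=0.07):
    """InfoNCE for one proof state: softmax cross-entropy where the
    positive premise must outscore all negative premises."""
    candidates = [positive] + list(negatives)
    scores = np.array([state @ p for p in candidates]) / temperature
    scores -= scores.max()                        # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return -np.log(probs[0])                      # positive sits at index 0

# Indistinguishable positive and negative -> maximal confusion, loss = log(2):
uniform_loss = info_nce(np.array([1.0, 0.0]), np.array([1.0, 0.0]),
                        [np.array([1.0, 0.0])])
# Positive clearly closer to the state than the negative -> smaller loss:
easy_loss = info_nce(np.array([1.0, 0.0]), np.array([1.0, 0.0]),
                     [np.array([0.0, 1.0])])
```

Minimizing this loss pushes each state's embedding toward its true premises and away from the sampled negatives.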
Significant Performance Improvements
The experiments were conducted using the LeanDojo Mathlib benchmark dataset with the newly augmented graph information. The results demonstrated a substantial improvement over the ReProver language-based baseline. The GNN-augmented retriever outperformed the baseline by over 25% across standard retrieval metrics such as Recall@1, Recall@10, and Mean Reciprocal Rank (MRR).
Specifically, the GNN-augmented retriever achieved a Recall@1 of 17.98% (a 33.98% improvement over the baseline), a Recall@10 of 50.04% (a 26.36% improvement), and an MRR of 0.4095 (a 24.73% improvement). Further optimization using an exponential moving average (EMA) of the model weights pushed these numbers even higher, showcasing the power of incorporating relational information for more effective premise selection.
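These metrics are straightforward to compute. A small sketch of Recall@k and MRR as used in retrieval evaluation (the function names are mine, and the example data is made up):

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant premises that appear in the top-k results."""
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def mrr(ranked_lists, relevant_sets):
    """Mean reciprocal rank of the first relevant premise per query."""
    total = 0.0
    for ranked, rel in zip(ranked_lists, relevant_sets):
        for i, premise in enumerate(ranked, start=1):
            if premise in rel:
                total += 1.0 / i
                break
    return total / len(ranked_lists)

r1 = recall_at_k(["add_comm", "add_assoc", "mul_comm"], {"add_assoc"}, 1)
r2 = recall_at_k(["add_comm", "add_assoc", "mul_comm"], {"add_assoc"}, 2)
score = mrr([["add_comm", "add_assoc"], ["mul_comm", "mul_assoc"]],
            [{"add_assoc"}, {"mul_comm"}])
```

Recall@1 rewards putting a true premise at the very top, while MRR credits how high the first correct premise lands on average.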
Also Read:
- Bridging the Gap: A New Benchmark for Research-Level Mathematical Proofs
- ProofSketch: Smarter, More Efficient Reasoning for Large Language Models
Future Directions
The success of this graph-augmented method highlights the significant benefits of considering the structural context in formal libraries. Future work in this area could explore more advanced GNN architectures, such as graph attention networks, to further enhance the integration of structural information. Additionally, analyzing the impact of the number of GNN layers and evaluating the model on more realistic data splits (like LeanDojo’s “novel” split or splits based on premise creation time) could provide further insights and improvements.