TLDR: The paper “Combining Textual and Structural Information for Premise Selection in Lean” introduces a graph-augmented approach to improve premise selection in Lean theorem proving. It combines dense text embeddings with graph neural networks (GNNs) over a heterogeneous dependency graph that captures state–premise and premise–premise relations. This method significantly outperforms the ReProver language-based baseline by over 25% across standard retrieval metrics on the LeanDojo Benchmark, demonstrating the power of incorporating relational information for more effective premise selection.
In the world of automated theorem proving, a critical challenge known as premise selection involves identifying the most relevant theorems and definitions from vast libraries to help construct new proofs. This task is a significant hurdle for scaling up theorem proving in large formal systems like Lean.
Traditional language-based methods, such as those employing large language models (LLMs) like ReProver, often treat premises in isolation. They map proof states and premises into a shared embedding space, then retrieve relevant premises based on similarity. However, these methods frequently overlook the intricate web of dependencies and relationships that connect premises within formal libraries. These structural connections, which include references and dependencies among axioms, theorems, lemmas, and definitions, offer valuable prior knowledge that can guide the proof process more effectively.
A new research paper, titled “Combining Textual and Structural Information for Premise Selection in Lean,” introduces an innovative graph-augmented approach to address this limitation. Authored by Job Petrovčič, David Eliecer Narvaez Denis, and Ljupčo Todorovski, the study proposes integrating structural information into language-based premise retrieval for Lean 4. You can read the full paper here: Combining Textual and Structural Information for Premise Selection in Lean.
The Graph-Augmented Approach
The core of this new methodology lies in constructing a heterogeneous dependency graph from Lean source files. This graph captures two main types of relationships: state–premise relations (how proof states relate to premises) and premise–premise relations (how different premises depend on each other). By merging proof states and premises into a single set of nodes and treating references as edges, the Lean library is transformed into a rich, text-attributed directed graph.
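As a concrete (and deliberately simplified) sketch, the library can be represented as a directed, text-attributed graph with two edge types. Everything below — the `DependencyGraph` class, the node names, the relation labels — is illustrative of the idea, not the paper's actual schema:

```python
from collections import defaultdict

class DependencyGraph:
    """Directed, text-attributed graph over proof states and premises."""
    def __init__(self):
        self.text = {}                  # node id -> source text of the node
        self.edges = defaultdict(list)  # relation name -> list of (src, dst)

    def add_node(self, node_id, text):
        self.text[node_id] = text

    def add_edge(self, src, dst, relation):
        # relation is "state_premise" or "premise_premise"
        self.edges[relation].append((src, dst))

g = DependencyGraph()
g.add_node("Nat.add_comm", "theorem Nat.add_comm : ∀ a b : Nat, a + b = b + a")
g.add_node("Nat.succ_add", "theorem Nat.succ_add : ∀ a b, succ a + b = succ (a + b)")
g.add_node("state_0", "⊢ a + b = b + a")
# A premise whose proof cites another premise (premise–premise edge),
# and a proof state that uses a premise (state–premise edge):
g.add_edge("Nat.add_comm", "Nat.succ_add", "premise_premise")
g.add_edge("state_0", "Nat.add_comm", "state_premise")
```

Merging states and premises into one node set like this is what turns the whole library into a single graph that a GNN can propagate over.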
The process begins by generating initial text embeddings for premises and proof states using ReProver’s ByT5 dual-encoder. These initial embeddings are then refined using a Relational Graph Convolutional Network (RGCN). The RGCN propagates information across the dependency graph, allowing each premise node’s embedding to be updated iteratively by considering its connections to other nodes. This propagation helps create “graph-aware” representations of both premises and proof states.
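The RGCN update can be sketched in a few lines of NumPy. This is the standard relation-typed message-passing step — a self-loop transform plus a mean-normalized, relation-specific aggregation — not the authors' exact architecture; the function name, weight layout, and ReLU nonlinearity are all assumptions for illustration:

```python
import numpy as np

def rgcn_layer(h, edges_by_rel, rel_weights, w_self):
    """One RGCN layer.

    h            : (num_nodes, d) node embeddings (e.g. initial ByT5 embeddings)
    edges_by_rel : relation name -> list of (src, dst) directed edges
    rel_weights  : relation name -> (d, d) relation-specific weight matrix
    w_self       : (d, d) self-loop weight matrix
    """
    out = h @ w_self                            # self-loop term
    for rel, edges in edges_by_rel.items():
        agg = np.zeros_like(h)
        deg = np.zeros(h.shape[0])
        for src, dst in edges:
            agg[dst] += h[src] @ rel_weights[rel]  # message along the edge
            deg[dst] += 1
        deg[deg == 0] = 1                       # avoid division by zero
        out += agg / deg[:, None]               # mean-normalize per relation
    return np.maximum(out, 0)                   # ReLU

h = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
refined = rgcn_layer(h, {"dep": [(0, 1)]}, {"dep": np.eye(2)}, np.eye(2))
```

Stacking several such layers lets each premise's embedding absorb information from multi-hop neighborhoods, which is what makes the representations "graph-aware".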
During retrieval, the proof state is treated as a temporary query node, and a similar message-passing mechanism is used to produce refined embeddings for the states by aggregating information from referenced premises. This ensures that the model considers the structural context when determining premise relevance.
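A toy version of this retrieval step, assuming a single averaging-based aggregation from the premises a state references and cosine-similarity ranking — both simplifications of the paper's message passing, with names chosen for illustration:

```python
import numpy as np

def retrieve(state_emb, premise_embs, referenced, top_k=3):
    """Attach the state as a temporary query node: refine its embedding by
    averaging it with the embeddings of premises it references (one
    aggregation step), then rank all premises by cosine similarity."""
    if referenced:
        neigh = premise_embs[referenced].mean(axis=0)
        state_emb = (state_emb + neigh) / 2
    p = premise_embs / np.linalg.norm(premise_embs, axis=1, keepdims=True)
    s = state_emb / np.linalg.norm(state_emb)
    scores = p @ s                       # cosine similarity to every premise
    return np.argsort(-scores)[:top_k]   # indices of the top-k premises

premises = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ranked = retrieve(np.array([1.0, 0.0]), premises, referenced=[1])
```

Because the state's embedding is pulled toward the premises it already references, premises that sit near that neighborhood in the graph score higher than they would under text similarity alone.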
Enhanced Data Extraction and Training
To support this graph-augmented approach, the researchers extended LeanDojo’s dataset extraction framework. This allowed them to query the proof assistant for additional information needed to build the comprehensive dependency graph, including premise-premise and premise-state edges. The dataset now includes not only textual representations but also the full directed graph structure.
The model is trained using the InfoNCE contrastive loss, which teaches it to score the relevant (positive) premises for each proof state above irrelevant (negative) ones. To prevent trivial memorization and encourage the learning of general structural patterns, the GNN is trained on the full premise graph but with the state–premise edges derived from the proofs themselves excluded, so the labels being predicted cannot leak through the graph.
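For reference, a minimal NumPy version of the InfoNCE objective for a single proof state — one positive premise scored against a set of negatives. The dot-product scoring and the `temperature` default are generic dual-encoder conventions, not the paper's exact settings:

```python
import numpy as np

def info_nce(state, positive, negatives, temperature=0.07):
    """InfoNCE for one proof state: softmax cross-entropy where the
    positive premise must outscore all negative premises."""
    candidates = [positive] + list(negatives)
    scores = np.array([state @ p for p in candidates]) / temperature
    scores -= scores.max()                        # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return -np.log(probs[0])                      # positive sits at index 0

# Indistinguishable positive and negative -> maximal confusion, loss = log(2):
uniform_loss = info_nce(np.array([1.0, 0.0]), np.array([1.0, 0.0]),
                        [np.array([1.0, 0.0])])
# Positive clearly closer to the state than the negative -> smaller loss:
easy_loss = info_nce(np.array([1.0, 0.0]), np.array([1.0, 0.0]),
                     [np.array([0.0, 1.0])])
```

Minimizing this loss pushes each state's embedding toward its true premises and away from the sampled negatives.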
Significant Performance Improvements
The experiments were conducted using the LeanDojo Mathlib benchmark dataset with the newly augmented graph information. The results demonstrated a substantial improvement over the ReProver language-based baseline. The GNN-augmented retriever outperformed the baseline by over 25% across standard retrieval metrics such as Recall@1, Recall@10, and Mean Reciprocal Rank (MRR).
Specifically, the GNN-augmented retriever achieved a Recall@1 of 17.98% (a 33.98% improvement over the baseline), a Recall@10 of 50.04% (a 26.36% improvement), and an MRR of 0.4095 (a 24.73% improvement). Further optimization using an exponential moving average (EMA) of the model weights pushed these numbers even higher, showcasing the power of incorporating relational information for more effective premise selection.
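These metrics are straightforward to compute. A small sketch of Recall@k and MRR as used in retrieval evaluation (the function names are mine, and the example data is made up):

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant premises that appear in the top-k results."""
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def mrr(ranked_lists, relevant_sets):
    """Mean reciprocal rank of the first relevant premise per query."""
    total = 0.0
    for ranked, rel in zip(ranked_lists, relevant_sets):
        for i, premise in enumerate(ranked, start=1):
            if premise in rel:
                total += 1.0 / i
                break
    return total / len(ranked_lists)

r1 = recall_at_k(["add_comm", "add_assoc", "mul_comm"], {"add_assoc"}, 1)
r2 = recall_at_k(["add_comm", "add_assoc", "mul_comm"], {"add_assoc"}, 2)
score = mrr([["add_comm", "add_assoc"], ["mul_comm", "mul_assoc"]],
            [{"add_assoc"}, {"mul_comm"}])
```

Recall@1 rewards putting a true premise at the very top, while MRR credits how high the first correct premise lands on average.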
Also Read:
- Bridging the Gap: A New Benchmark for Research-Level Mathematical Proofs
- ProofSketch: Smarter, More Efficient Reasoning for Large Language Models
Future Directions
The success of this graph-augmented method highlights the significant benefits of considering the structural context in formal libraries. Future work in this area could explore more advanced GNN architectures, such as graph attention networks, to further enhance the integration of structural information. Additionally, analyzing the impact of the number of GNN layers and evaluating the model on more realistic data splits (like LeanDojo’s “novel” split or splits based on premise creation time) could provide further insights and improvements.