TLDR: EGG-SR is a new framework that improves symbolic regression by using “equality graphs” (e-graphs) to recognize mathematically equivalent expressions. This prevents methods based on Monte Carlo Tree Search (MCTS), Deep Reinforcement Learning (DRL), and Large Language Models (LLMs) from wasting time exploring redundant solutions. By treating equivalent expressions as one, EGG-SR shrinks the search space and reduces gradient variance, which helps these methods learn faster and more stably and discover more accurate scientific equations.
EGG-SR is a new framework for symbolic regression, a core task in AI-driven scientific discovery. Symbolic regression aims to recover physical laws from experimental data by identifying closed-form mathematical expressions. The task is notoriously hard because the space of candidate equations is immense and grows exponentially.
The core innovation of EGG-SR lies in its ability to recognize and exploit symbolic equivalence. Many syntactically distinct expressions represent exactly the same function. For example, for positive x1 and x2, log(x1^2 * x2^3), log(x1^2) + log(x2^3), and 2 log(x1) + 3 log(x2) all describe the same underlying relationship. Traditional algorithms typically treat these variants as unrelated candidates, which leads to redundant exploration of the search space and slows down learning.
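The equivalence of the three logarithmic forms can be checked numerically. The short Python sketch below is only an illustration and not part of EGG-SR; the helper names `f1`, `f2`, `f3`, and `agree_on_samples` are made up for this example, and the check is restricted to positive inputs, where all three forms are defined.

```python
import math
import random

# Three syntactically different forms of the same function (valid for x1, x2 > 0).
def f1(x1, x2):
    return math.log(x1**2 * x2**3)

def f2(x1, x2):
    return math.log(x1**2) + math.log(x2**3)

def f3(x1, x2):
    return 2 * math.log(x1) + 3 * math.log(x2)

def agree_on_samples(funcs, n_samples=1000, seed=0, tol=1e-9):
    """Return True if all functions give the same value on random positive inputs."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        x1, x2 = rng.uniform(0.1, 10.0), rng.uniform(0.1, 10.0)
        values = [f(x1, x2) for f in funcs]
        if any(not math.isclose(v, values[0], rel_tol=tol, abs_tol=tol) for v in values):
            return False
    return True

all_equal = agree_on_samples([f1, f2, f3])
```

A numerical check like this can only suggest equivalence on the sampled inputs. An e-graph instead establishes it symbolically by applying rewrite rules.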
EGG-SR addresses this by integrating a powerful data structure known as equality graphs, or e-graphs, into various symbolic regression algorithms. E-graphs provide a compact and efficient way to represent sets of equivalent expressions by storing shared sub-expressions only once. This avoids the computational burden and memory overhead of explicitly enumerating and storing every possible equivalent variant.
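To make the idea of sharing sub-expressions concrete, here is a deliberately minimal e-graph sketch in Python. It is a toy under stated assumptions, not the EGG-SR implementation: the `EGraph` class and its methods are hypothetical names, merges are done by hand rather than by rewrite rules, and the congruence "rebuild" step that a real e-graph library performs after merging is omitted.

```python
class EGraph:
    """Toy e-graph: union-find over e-class ids plus a hashcons table for sharing."""

    def __init__(self):
        self.parent = []    # union-find parent pointer for each e-class id
        self.hashcons = {}  # e-node (op, child ids...) -> e-class id
        self.classes = {}   # root e-class id -> set of e-nodes in that class

    def find(self, a):
        # Follow parent pointers to the class representative, halving the path.
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]
            a = self.parent[a]
        return a

    def add(self, op, *children):
        # An e-node is an operator applied to e-class ids, not to concrete subtrees.
        node = (op,) + tuple(self.find(c) for c in children)
        if node in self.hashcons:  # already stored: reuse instead of duplicating
            return self.find(self.hashcons[node])
        cid = len(self.parent)
        self.parent.append(cid)
        self.hashcons[node] = cid
        self.classes[cid] = {node}
        return cid

    def union(self, a, b):
        # Declare two e-classes equivalent (no congruence rebuild in this toy).
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
            self.classes[a] |= self.classes.pop(b)
        return a

g = EGraph()
x1, x2, two, three = g.add("x1"), g.add("x2"), g.add("2"), g.add("3")
sq, cu = g.add("pow", x1, two), g.add("pow", x2, three)
a = g.add("log", g.add("mul", sq, cu))                 # log(x1^2 * x2^3)
b = g.add("add", g.add("log", sq), g.add("log", cu))   # log(x1^2) + log(x2^3)
c = g.add("add", g.add("mul", two, g.add("log", x1)),
          g.add("mul", three, g.add("log", x2)))       # 2 log(x1) + 3 log(x2)
root = g.union(g.union(a, b), c)  # merged by hand; rewrite rules would do this
```

In this toy run the three expressions end up in one e-class, and shared pieces such as x1^2 are stored once, so the graph holds fewer nodes than three separate expression trees would.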
The framework seamlessly integrates its EGG module into three diverse symbolic regression paradigms:
EGG-MCTS (Monte Carlo Tree Search)
For search tree-based methods, EGG-MCTS intelligently prunes redundant exploration. When the algorithm evaluates a particular path in its search tree, the EGG module identifies all other symbolically equivalent paths. The knowledge and rewards gained from the initial exploration are then simultaneously propagated to all these equivalent paths. This effectively reduces the ‘branching factor’ of the search, meaning fewer unique nodes need to be explored, leading to faster and more efficient learning.
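The reward-sharing step can be sketched as follows. This is a simplified illustration, not the paper's algorithm: `equivalent_paths` and `backpropagate` are hypothetical names, a path is modeled as a tuple of terms in a sum, and a permutation of those terms stands in for the e-graph lookup. Updating each tree node on any equivalent path exactly once is also an assumption of this sketch.

```python
from collections import defaultdict
from itertools import permutations

def equivalent_paths(path):
    """Toy stand-in for the e-graph query: a path lists the terms of a sum,
    so every reordering of the terms is an equivalent path."""
    return set(permutations(path))

def backpropagate(stats, path, reward):
    """Credit the reward to every node lying on any path equivalent to `path`."""
    nodes = set()
    for eq_path in equivalent_paths(path):
        # A node is identified by the prefix of choices leading to it.
        for depth in range(len(eq_path) + 1):
            nodes.add(eq_path[:depth])
    for node in nodes:  # each node is updated once, even if paths overlap
        stats[node][0] += 1       # visit count
        stats[node][1] += reward  # accumulated reward

# stats maps a node (prefix tuple) to [visits, total_reward]
stats = defaultdict(lambda: [0, 0.0])
backpropagate(stats, ("x1", "sin(x2)"), reward=0.8)
```

After this single rollout, the node for sin(x2) + x1 already carries the same statistics as the node for x1 + sin(x2), so the search does not need to spend a separate rollout to learn about it.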
EGG-DRL (Deep Reinforcement Learning)
In reward-driven learning, EGG-DRL aggregates rewards across equivalent expressions. When a deep reinforcement learning model samples an expression, EGG-SR generates its equivalent forms. The policy gradient estimator, which guides the model’s learning, then considers the probabilities of all these equivalent sequences together. This aggregation significantly reduces the variance of the gradient estimator, resulting in more stable and efficient training of the neural network.
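One way to read this aggregation is that the log-probability of the sampled sequence is replaced by the log of the summed probabilities of its whole equivalence class. The sketch below works this out for a toy softmax policy over four sequences; the policy, the rewards, and all function names are assumptions made for illustration and are not taken from the paper. Because the toy is tiny, the mean and variance of both estimators can be computed exactly instead of by sampling.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def standard_grad(probs, i, reward):
    """REINFORCE term: reward * d log p_i / d logits."""
    return [reward * ((1.0 if j == i else 0.0) - p) for j, p in enumerate(probs)]

def egg_grad(probs, i, reward, class_of):
    """Aggregated term: reward * d log(sum of p_j over i's class) / d logits."""
    members = {j for j in range(len(probs)) if class_of[j] == class_of[i]}
    class_prob = sum(probs[j] for j in members)
    return [reward * ((probs[j] / class_prob if j in members else 0.0) - probs[j])
            for j in range(len(probs))]

def mean_and_variance(probs, grads):
    """Exact mean vector and total variance when sequence i is drawn with probs[i]."""
    n = len(probs)
    mean = [sum(p * g[j] for p, g in zip(probs, grads)) for j in range(n)]
    var = sum(p * sum((g[j] - mean[j]) ** 2 for j in range(n))
              for p, g in zip(probs, grads))
    return mean, var

probs = softmax([0.5, 0.0, -0.5, 0.2])
class_of = [0, 0, 0, 1]         # sequences 0-2 encode the same expression
rewards = [1.0, 1.0, 1.0, 0.2]  # equivalent sequences share one reward
std_grads = [standard_grad(probs, i, rewards[i]) for i in range(4)]
egg_grads = [egg_grad(probs, i, rewards[i], class_of) for i in range(4)]
mean_std, var_std = mean_and_variance(probs, std_grads)
mean_egg, var_egg = mean_and_variance(probs, egg_grads)
```

In this toy setting the two estimators have the same expectation, while the aggregated one has lower total variance. The reason is that the aggregated term equals the standard term averaged over the members of the equivalence class.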
EGG-LLM (Large Language Models)
For approaches leveraging large language models, EGG-LLM enriches the feedback mechanism. LLMs often generate Python functions as candidate expressions. EGG-SR parses these into symbolic expressions, constructs e-graphs, and extracts a richer set of equivalent expressions. This comprehensive set of variants is then summarized and fed back into the LLM’s prompt for the next round of generation. This enhanced feedback provides the LLM with a deeper understanding of functional equivalence, guiding it to produce higher-quality and more accurate predictions.
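The parse-and-expand loop can be sketched with Python's standard `ast` module. This is an illustrative stand-in only: the candidate function, the helper names, and the prompt wording are invented for the example, and equivalent forms are generated by swapping the operands of + and * instead of by extracting them from an e-graph. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast

# Hypothetical LLM output: a candidate equation written as a Python function.
CANDIDATE = '''
def equation(x1, x2):
    return x1 * x2 + x1
'''

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/", ast.Pow: "**"}

def extract_expression(source):
    """Return the AST of the first returned expression in the source."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Return) and node.value is not None:
            return node.value
    raise ValueError("no return expression found")

def variants(node):
    """Equivalent forms via commutativity of + and * (toy stand-in for an e-graph)."""
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        op = OPS[type(node.op)]
        lefts, rights = variants(node.left), variants(node.right)
        forms = {f"({l} {op} {r})" for l in lefts for r in rights}
        if isinstance(node.op, (ast.Add, ast.Mult)):  # only these two commute
            forms |= {f"({r} {op} {l})" for l in lefts for r in rights}
        return forms
    return {ast.unparse(node)}  # leaves and unsupported nodes are kept as-is

def feedback_prompt(source, score):
    """Summarize the equivalent forms as text for the next generation round."""
    forms = sorted(variants(extract_expression(source)))
    lines = [f"Previous candidate (score {score}) and its equivalent forms:"]
    lines += [f"  {form}" for form in forms]
    lines.append("These forms are the same function; propose a different equation.")
    return "\n".join(lines)

forms = variants(extract_expression(CANDIDATE))
prompt = feedback_prompt(CANDIDATE, score=0.42)
```

The resulting `prompt` string would be appended to the LLM's context for the next round, so the model sees a whole family of equivalent forms at once.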
EGG-SR also comes with theoretical guarantees. The paper shows that EGG-MCTS achieves a tighter regret bound than standard MCTS, indicating faster convergence to optimal solutions. Similarly, EGG-DRL is proven to yield an unbiased gradient estimator with strictly lower variance, which makes policy updates more reliable and efficient.
Empirical evaluations across challenging benchmarks, including complex trigonometric functions and equations from the Feynman dataset, consistently show EGG-SR’s superiority. Algorithms enhanced with EGG discover equations with lower normalized mean squared error than existing state-of-the-art methods. Furthermore, case studies highlight the practical efficiency of the EGG module, demonstrating significantly lower memory consumption for storing equivalent expressions and introducing negligible computational overhead when integrated into DRL frameworks.
In essence, EGG-SR provides a unified and scalable solution for incorporating symbolic equivalence into various symbolic regression algorithms. By intelligently managing the vast search space through e-graphs, it accelerates learning and improves the accuracy of discovering governing equations from experimental data. The code implementation for EGG-SR is openly available for researchers and practitioners.


