TLDR: A new learning-augmented algorithm, NeuralSum-of-Squares, uses a Transformer model to predict compact monomial bases for Sum of Squares (SOS) polynomial certification. This approach drastically reduces the size of Semidefinite Programs (SDPs), achieving over 100x speedups compared to state-of-the-art solvers and solving previously intractable problems. It combines efficient training data generation, a Transformer for basis prediction, and a systematic fallback mechanism to ensure correctness and robustness.
Certifying whether a polynomial is always non-negative is a fundamental problem in mathematics and computer science, with wide-ranging applications in fields like optimization, control systems, and robotics. However, this task is notoriously difficult, classified as NP-hard. A common approach involves checking if a polynomial can be written as a “Sum of Squares” (SOS) of other polynomials. While this provides a sufficient condition for non-negativity, the computational cost of verifying the SOS property can be immense, often requiring the solution of large Semidefinite Programs (SDPs).
The primary bottleneck in this process is selecting an appropriate set of basic polynomial terms, known as a monomial basis. The size of this basis directly impacts the complexity of the SDP. Traditional methods for choosing this basis can lead to very large SDPs, making the problem intractable for many real-world scenarios.
Introducing NeuralSum-of-Squares
A groundbreaking new research paper, titled “NEURALSUM-OF-SQUARES: CERTIFYING THE NON-NEGATIVITY OF POLYNOMIALS WITH TRANSFORMERS,” introduces the first learning-augmented algorithm to tackle this challenge. Developed by Nico Pelleriti, Christoph Spiegel, Shiwei Liu, David Mart´ınez-Rubio, Max Zimmer, and Sebastian Pokutta, this approach leverages the power of Transformer models, a type of neural network, to significantly speed up the SOS certification process.
The core idea is to use a Transformer model to predict an “almost-minimal” monomial basis for a given polynomial. By identifying a much smaller, yet sufficient, set of monomials, the size of the resulting SDP is drastically reduced, leading to substantial computational savings.
How It Works: A Three-Part System
The NeuralSum-of-Squares methodology is built upon three key components:
- Efficient Training Data Generation: To train the Transformer, the researchers created an extensive dataset of over 100 million SOS polynomials. This was achieved through a clever “reverse sampling” process, where polynomials with known compact SOS decompositions and minimal bases were constructed. This ensures the model learns from high-quality, relevant examples.
- Transformer Architecture Design and Training: A specialized Transformer model was designed and trained. It takes a polynomial, represented as a sequence of tokens (coefficients and exponents), and predicts a compact monomial basis. The model learns to identify the essential building blocks needed for an SOS decomposition.
- Systematic Fallback Mechanism: Machine learning predictions are not always perfect. If the Transformer’s initial predicted basis is incomplete (missing crucial monomials), the system employs a robust repair strategy. This involves iteratively expanding the basis by adding more monomials, ranked by a learned scoring mechanism, until a valid SOS decomposition is found. This ensures the algorithm maintains correctness guarantees, meaning it will always find an SOS certificate if one exists, or correctly identify if none does.
Impressive Performance Gains
The researchers rigorously validated their approach on over 200 benchmark datasets, demonstrating remarkable improvements over existing state-of-the-art solvers. The NeuralSum-of-Squares method achieved speedups of over 100 times, and in some cases, over 2000 times, compared to traditional methods. Crucially, it enabled the solution of complex instances that competing approaches failed to solve due to memory limitations or excessive computation time.
For example, in problems with 6 variables and degree 40, or 100 variables and degree 10, all baseline solvers either ran out of memory or timed out. In contrast, NeuralSum-of-Squares successfully found solutions in under 60 seconds. The method consistently produced the smallest bases and fastest solve times, highlighting its efficiency.
The study also showed that the Transformer model is robust to “distribution shifts,” meaning it can generalize well to polynomials with characteristics different from those it was trained on. This is largely attributed to its “monomial-embedding” technique, which captures structural patterns that extend beyond the specific training data.
Also Read:
- Ax-Prover: Advancing Automated Theorem Proving Across Science with LLM Agents
- Adaptive Control: Learning Uncertainty Metrics for Smarter, More Robust Systems
Theoretical Soundness and Future Directions
Beyond its practical efficiency, the algorithm comes with theoretical guarantees. The analysis shows that even if the machine learning predictions are completely incorrect, the worst-case computational cost of NeuralSum-of-Squares is at most a constant factor higher than the standard approach. This provides a strong foundation for its reliability.
While the results are highly promising, the authors acknowledge a limitation: the reliance on synthetic datasets for evaluation due to the absence of widely available real-world SOS benchmarks. Access to such benchmarks would further strengthen empirical validation.
This work represents a significant leap forward in SOS programming, bridging the gap between practical efficiency and mathematical rigor. By transforming the computational bottleneck of basis selection, NeuralSum-of-Squares opens new avenues for tackling complex polynomial non-negativity problems across various scientific and engineering disciplines. For more details, you can read the full research paper here.


