
Accelerating DeepONet Training with a Hybrid Optimization Strategy

TLDR: This research introduces a hybrid least squares/gradient descent (LSGD) method to significantly speed up the training of Deep Operator Networks (DeepONets). Because the DeepONet output is linear in the last-layer parameters of the branch network, those parameters can be optimized with an efficient least squares solve, while the hidden layers are updated via gradient descent. The key innovation is decomposing the large least squares system into smaller, manageable subproblems. The proposed LS+Adam algorithm consistently outperforms traditional Adam-only training in speed and accuracy across various PDE problems, making DeepONets more practical for scientific computing.

Deep Operator Networks, or DeepONets, represent a significant advancement in the field of scientific computing, particularly for tackling complex problems governed by partial differential equations (PDEs). Unlike traditional deep neural networks that learn specific solutions, DeepONets are designed to learn entire solution operators, meaning they can map a wide range of input functions (like initial conditions or source terms) to corresponding output solution functions. This capability makes them powerful tools for creating fast surrogate models that can quickly generate solutions for various scenarios without needing to re-solve the PDE from scratch every time.

However, the sophisticated architecture of DeepONets, which couples two neural networks (a ‘branch’ network for input functions and a ‘trunk’ network for output coordinates), comes with a challenge: training them is time-consuming. The coupled structure drives up the computational cost and makes it hard to apply existing techniques that accelerate the training of simpler deep neural networks.
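
To make that coupling concrete, here is a minimal PyTorch sketch of a DeepONet. The layer sizes, activations, and the bias-free last branch layer are illustrative assumptions, not the exact architecture studied in the paper.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Minimal DeepONet: the branch net encodes an input function sampled at m sensor
    points, the trunk net encodes an output coordinate, and their p-dimensional
    outputs are combined by a dot product."""
    def __init__(self, m_sensors=100, coord_dim=1, width=64, p=64):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Linear(m_sensors, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, p, bias=False),   # last branch layer: output is linear in this weight
        )
        self.trunk = nn.Sequential(
            nn.Linear(coord_dim, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, p), nn.Tanh(),
        )

    def forward(self, f_samples, x_coords):
        # f_samples: (N, m_sensors) input functions; x_coords: (M, coord_dim) query points
        b = self.branch(f_samples)   # (N, p) branch coefficients
        t = self.trunk(x_coords)     # (M, p) trunk basis values
        return b @ t.T               # (N, M) grid of predicted solution values u_i(x_j)
```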

A recent research paper, “Hybrid Least Squares/Gradient Descent Methods for DeepONets”, introduces an innovative approach to significantly speed up DeepONet training. The authors, Jun Choi, Chang-Ock Lee, and Minam Moon, propose a hybrid method that combines the strengths of least squares (LS) optimization with gradient descent (GD) techniques.

The Hybrid Approach: LS and GD Combined

The core idea behind this hybrid method is to recognize that the output of a DeepONet can be viewed as linear with respect to the parameters of the branch network’s final layer. This linearity allows these specific parameters to be optimized very efficiently using a least squares solve. Meanwhile, the remaining parameters in the hidden layers of both the branch and trunk networks, which are non-linear, are updated using standard gradient descent methods.
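
One way to see the linearity is to strip the last layer off the branch network. In the hypothetical sketch below, which reuses the model defined above, phi holds the frozen hidden-layer branch features and t the trunk outputs, so every prediction phi_i @ W.T @ t_j depends linearly on the last-layer weight matrix W.

```python
import torch

def last_layer_features(model, f_samples, x_coords):
    """Hidden-layer branch features and trunk outputs, with all layers frozen.
    With W the weight of model.branch[-1], every prediction is
    u_ij = phi_i @ W.T @ t_j, i.e. linear in W."""
    with torch.no_grad():
        phi = model.branch[:-1](f_samples)   # (N, h) frozen branch features
        t = model.trunk(x_coords)            # (M, p) trunk outputs
    return phi, t

# Naively, stacking every (branch input, trunk point) pair and vectorising W gives a
# design matrix with N*M rows and h*p columns, far too large to assemble and solve directly.
```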

A major hurdle in applying a direct least squares approach is the sheer size of the resulting linear problem. If one were to build an LS system for all possible combinations of branch and trunk inputs, it would become prohibitively large and practically impossible to solve directly. To circumvent this, the researchers developed a clever factorization technique. Their method decomposes the massive LS system into two much smaller, more manageable subproblems—one for the branch network and one for the trunk network—which can then be solved separately and efficiently.
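
Under the simplifying assumptions above (bias-free last layer, plain mean-squared loss), that decomposition can be sketched as two small least-squares solves, one in the branch features and one in the trunk outputs, in place of one enormous stacked system. This is our reading of the idea, not necessarily the paper's exact factorization.

```python
import torch

def ls_update_last_layer(phi, t, U):
    """Factorised least-squares update for X = W.T, minimising ||phi @ X @ t.T - U||_F^2.
    phi: (N, h) branch features, t: (M, p) trunk outputs, U: (N, M) target values."""
    # Small problem 1: fit the branch features, A = argmin ||phi @ A - U||_F   -> (h, M)
    A = torch.linalg.lstsq(phi, U).solution
    # Small problem 2: fit the trunk outputs,  B = argmin ||t @ B - A.T||_F    -> (p, h)
    B = torch.linalg.lstsq(t, A.T).solution
    return B.T   # X = W.T, shape (h, p); the updated last-layer weight is W = X.T
```

Both solves involve only h or p unknown columns, so their cost is governed by the small feature dimensions rather than by the N*M product of branch and trunk samples.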

This method is also versatile, extending to a broader range of L2 loss functions, including those with regularization terms for the last layer parameters. This is particularly important for unsupervised learning scenarios that incorporate physics-informed loss, where physical laws are embedded directly into the training process.
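
As an illustration of how a regularization term on the last-layer parameters can be folded into the solve, the sketch below adds a Frobenius-norm (ridge) penalty. The resulting stationarity condition involves only the two small Gram matrices and can be solved in their eigenbases; this is one possible realization under our assumptions, not necessarily the formulation used in the paper.

```python
import torch

def ls_update_regularised(phi, t, U, lam=1e-6):
    """Ridge-regularised variant: min ||phi @ X @ t.T - U||_F^2 + lam * ||X||_F^2.
    Setting the gradient to zero gives (phi.T @ phi) X (t.T @ t) + lam * X = phi.T @ U @ t,
    which is solved elementwise in the eigenbases of the two small Gram matrices."""
    d1, P = torch.linalg.eigh(phi.T @ phi)   # eigenvalues (h,), eigenvectors (h, h)
    d2, Q = torch.linalg.eigh(t.T @ t)       # eigenvalues (p,), eigenvectors (p, p)
    rhs = P.T @ (phi.T @ U @ t) @ Q          # right-hand side in the eigenbases, (h, p)
    Y = rhs / (d1[:, None] * d2[None, :] + lam)
    return P @ Y @ Q.T                       # X = W.T
```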

LS+Adam: A Practical Algorithm

The paper introduces a practical algorithm called LS+Adam. This approach starts with an initial phase of training all parameters using the Adam optimizer, a popular gradient descent variant. This helps to ensure a good starting point and prevents the model from getting stuck in poor local minima early on. After this initial phase, the training switches to a hybrid stage where LS steps are used to optimize the last layer parameters, interspersed with Adam epochs for the hidden layer parameters. The LS step is applied periodically, for instance, after every few Adam epochs, balancing computational cost and convergence.
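
A schematic of that schedule, reusing the hypothetical helpers above; the warm-up length, learning rate, and number of Adam epochs per LS step are placeholder values, not the paper's settings.

```python
import torch

def train_ls_adam(model, f_train, x_train, U_train,
                  warmup_epochs=1000, work_units=10000, adam_per_unit=5, lr=1e-3):
    """Schematic LS+Adam schedule: a pure-Adam warm-up, then repeated cycles of a few
    Adam epochs on the hidden layers followed by one LS solve for the last branch layer."""
    last_w = model.branch[-1].weight
    mse = torch.nn.MSELoss()

    def adam_epoch(opt):
        opt.zero_grad()
        loss = mse(model(f_train, x_train), U_train)
        loss.backward()
        opt.step()

    # Phase 1: Adam on all parameters, to avoid poor local minima early on.
    opt_all = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(warmup_epochs):
        adam_epoch(opt_all)

    # Phase 2: hybrid stage, alternating Adam on the hidden layers with a periodic LS step.
    hidden = [q for q in model.parameters() if q is not last_w]
    opt_hidden = torch.optim.Adam(hidden, lr=lr)
    for _ in range(work_units):
        for _ in range(adam_per_unit):
            adam_epoch(opt_hidden)
        with torch.no_grad():
            phi, t = last_layer_features(model, f_train, x_train)
            X = ls_update_last_layer(phi, t, U_train)
            last_w.copy_(X.T)   # write the LS solution back into the last branch layer
```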

Demonstrated Performance Gains

The effectiveness of the LS+Adam method was rigorously tested across various PDE problems, including advection, diffusion-reaction, and 2D Poisson equations, covering both supervised and unsupervised learning settings. The experimental results consistently showed that LS+Adam significantly accelerates training and improves solution accuracy compared to traditional Adam-only training. For example, in many cases, LS+Adam achieved better performance in 10,000 “Work Units” (a cycle of Adam epochs followed by an LS step) than Adam-only achieved in 100,000 Work Units, demonstrating a substantial speed-up.

The errors in solutions generated by LS+Adam were often much smaller, and the method proved robust even when dealing with complex input functions like 2D images or variable boundary conditions. This research offers a powerful new tool for the efficient training of DeepONets, paving the way for their broader application as fast and accurate surrogates for solving complex scientific and engineering problems.

