
Accelerating DeepONet Training with a Hybrid Optimization Strategy

TLDR: This research introduces a hybrid least squares/gradient descent (LSGD) method to significantly speed up the training of Deep Operator Networks (DeepONets). Because the DeepONet output is linear in the last-layer parameters of the branch network, those parameters can be optimized with an efficient least squares solve, while the hidden layers are updated via gradient descent. The key innovation is decomposing the large least squares system into smaller, manageable subproblems. The proposed LS+Adam algorithm consistently outperforms traditional Adam-only training in speed and accuracy across various PDE problems, making DeepONets more practical for scientific computing.

Deep Operator Networks, or DeepONets, represent a significant advancement in the field of scientific computing, particularly for tackling complex problems governed by partial differential equations (PDEs). Unlike traditional deep neural networks that learn specific solutions, DeepONets are designed to learn entire solution operators, meaning they can map a wide range of input functions (like initial conditions or source terms) to corresponding output solution functions. This capability makes them powerful tools for creating fast surrogate models that can quickly generate solutions for various scenarios without needing to re-solve the PDE from scratch every time.

However, the sophisticated architecture of DeepONets, which couples two neural networks (a ‘branch’ network for input functions and a ‘trunk’ network for output coordinates), comes with a challenge: training them is time-consuming. The coupled structure drives up the computational cost and makes it hard to apply existing techniques that accelerate the training of simpler deep neural networks.
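
To make that coupling concrete, here is a minimal PyTorch sketch of a DeepONet. The layer sizes, activations, and the bias-free last branch layer are illustrative assumptions, not the exact architecture studied in the paper.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Minimal DeepONet: the branch net encodes an input function sampled at m sensor
    points, the trunk net encodes an output coordinate, and their p-dimensional
    outputs are combined by a dot product."""
    def __init__(self, m_sensors=100, coord_dim=1, width=64, p=64):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Linear(m_sensors, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, p, bias=False),   # last branch layer: output is linear in this weight
        )
        self.trunk = nn.Sequential(
            nn.Linear(coord_dim, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, p), nn.Tanh(),
        )

    def forward(self, f_samples, x_coords):
        # f_samples: (N, m_sensors) input functions; x_coords: (M, coord_dim) query points
        b = self.branch(f_samples)   # (N, p) branch coefficients
        t = self.trunk(x_coords)     # (M, p) trunk basis values
        return b @ t.T               # (N, M) grid of predicted solution values u_i(x_j)
```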

A recent research paper, “Hybrid Least Squares/Gradient Descent Methods for DeepONets”, introduces an innovative approach to significantly speed up DeepONet training. The authors, Jun Choi, Chang-Ock Lee, and Minam Moon, propose a hybrid method that combines the strengths of least squares (LS) optimization with gradient descent (GD) techniques.

The Hybrid Approach: LS and GD Combined

The core idea behind this hybrid method is to recognize that the output of a DeepONet can be viewed as linear with respect to the parameters of the branch network’s final layer. This linearity allows these specific parameters to be optimized very efficiently using a least squares solve. Meanwhile, the remaining parameters in the hidden layers of both the branch and trunk networks, which are non-linear, are updated using standard gradient descent methods.
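
One way to see the linearity is to strip the last layer off the branch network. In the hypothetical sketch below, which reuses the model defined above, phi holds the frozen hidden-layer branch features and t the trunk outputs, so every prediction phi_i @ W.T @ t_j depends linearly on the last-layer weight matrix W.

```python
import torch

def last_layer_features(model, f_samples, x_coords):
    """Hidden-layer branch features and trunk outputs, with all layers frozen.
    With W the weight of model.branch[-1], every prediction is
    u_ij = phi_i @ W.T @ t_j, i.e. linear in W."""
    with torch.no_grad():
        phi = model.branch[:-1](f_samples)   # (N, h) frozen branch features
        t = model.trunk(x_coords)            # (M, p) trunk outputs
    return phi, t

# Naively, stacking every (branch input, trunk point) pair and vectorising W gives a
# design matrix with N*M rows and h*p columns, far too large to assemble and solve directly.
```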

A major hurdle in applying a direct least squares approach is the sheer size of the resulting linear problem. If one were to build an LS system for all possible combinations of branch and trunk inputs, it would become prohibitively large and practically impossible to solve directly. To circumvent this, the researchers developed a clever factorization technique. Their method decomposes the massive LS system into two much smaller, more manageable subproblems—one for the branch network and one for the trunk network—which can then be solved separately and efficiently.
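
Under the simplifying assumptions above (bias-free last layer, plain mean-squared loss), that decomposition can be sketched as two small least-squares solves, one in the branch features and one in the trunk outputs, in place of one enormous stacked system. This is our reading of the idea, not necessarily the paper's exact factorization.

```python
import torch

def ls_update_last_layer(phi, t, U):
    """Factorised least-squares update for X = W.T, minimising ||phi @ X @ t.T - U||_F^2.
    phi: (N, h) branch features, t: (M, p) trunk outputs, U: (N, M) target values."""
    # Small problem 1: fit the branch features, A = argmin ||phi @ A - U||_F   -> (h, M)
    A = torch.linalg.lstsq(phi, U).solution
    # Small problem 2: fit the trunk outputs,  B = argmin ||t @ B - A.T||_F    -> (p, h)
    B = torch.linalg.lstsq(t, A.T).solution
    return B.T   # X = W.T, shape (h, p); the updated last-layer weight is W = X.T
```

Both solves involve only h or p unknown columns, so their cost is governed by the small feature dimensions rather than by the N*M product of branch and trunk samples.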

This method is also versatile, extending to a broader range of L2 loss functions, including those with regularization terms for the last layer parameters. This is particularly important for unsupervised learning scenarios that incorporate physics-informed loss, where physical laws are embedded directly into the training process.
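
As an illustration of how a regularization term on the last-layer parameters can be folded into the solve, the sketch below adds a Frobenius-norm (ridge) penalty. The resulting stationarity condition involves only the two small Gram matrices and can be solved in their eigenbases; this is one possible realization under our assumptions, not necessarily the formulation used in the paper.

```python
import torch

def ls_update_regularised(phi, t, U, lam=1e-6):
    """Ridge-regularised variant: min ||phi @ X @ t.T - U||_F^2 + lam * ||X||_F^2.
    Setting the gradient to zero gives (phi.T @ phi) X (t.T @ t) + lam * X = phi.T @ U @ t,
    which is solved elementwise in the eigenbases of the two small Gram matrices."""
    d1, P = torch.linalg.eigh(phi.T @ phi)   # eigenvalues (h,), eigenvectors (h, h)
    d2, Q = torch.linalg.eigh(t.T @ t)       # eigenvalues (p,), eigenvectors (p, p)
    rhs = P.T @ (phi.T @ U @ t) @ Q          # right-hand side in the eigenbases, (h, p)
    Y = rhs / (d1[:, None] * d2[None, :] + lam)
    return P @ Y @ Q.T                       # X = W.T
```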

LS+Adam: A Practical Algorithm

The paper introduces a practical algorithm called LS+Adam. This approach starts with an initial phase of training all parameters using the Adam optimizer, a popular gradient descent variant. This helps to ensure a good starting point and prevents the model from getting stuck in poor local minima early on. After this initial phase, the training switches to a hybrid stage where LS steps are used to optimize the last layer parameters, interspersed with Adam epochs for the hidden layer parameters. The LS step is applied periodically, for instance, after every few Adam epochs, balancing computational cost and convergence.
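
A schematic of that schedule, reusing the hypothetical helpers above; the warm-up length, learning rate, and number of Adam epochs per LS step are placeholder values, not the paper's settings.

```python
import torch

def train_ls_adam(model, f_train, x_train, U_train,
                  warmup_epochs=1000, work_units=10000, adam_per_unit=5, lr=1e-3):
    """Schematic LS+Adam schedule: a pure-Adam warm-up, then repeated cycles of a few
    Adam epochs on the hidden layers followed by one LS solve for the last branch layer."""
    last_w = model.branch[-1].weight
    mse = torch.nn.MSELoss()

    def adam_epoch(opt):
        opt.zero_grad()
        loss = mse(model(f_train, x_train), U_train)
        loss.backward()
        opt.step()

    # Phase 1: Adam on all parameters, to avoid poor local minima early on.
    opt_all = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(warmup_epochs):
        adam_epoch(opt_all)

    # Phase 2: hybrid stage, alternating Adam on the hidden layers with a periodic LS step.
    hidden = [q for q in model.parameters() if q is not last_w]
    opt_hidden = torch.optim.Adam(hidden, lr=lr)
    for _ in range(work_units):
        for _ in range(adam_per_unit):
            adam_epoch(opt_hidden)
        with torch.no_grad():
            phi, t = last_layer_features(model, f_train, x_train)
            X = ls_update_last_layer(phi, t, U_train)
            last_w.copy_(X.T)   # write the LS solution back into the last branch layer
```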

Demonstrated Performance Gains

The effectiveness of the LS+Adam method was rigorously tested across various PDE problems, including advection, diffusion-reaction, and 2D Poisson equations, covering both supervised and unsupervised learning settings. The experimental results consistently showed that LS+Adam significantly accelerates training and improves solution accuracy compared to traditional Adam-only training. For example, in many cases, LS+Adam achieved better performance in 10,000 “Work Units” (a cycle of Adam epochs followed by an LS step) than Adam-only achieved in 100,000 Work Units, demonstrating a substantial speed-up.

The errors in solutions generated by LS+Adam were often much smaller, and the method proved robust even when dealing with complex input functions like 2D images or variable boundary conditions. This research offers a powerful new tool for the efficient training of DeepONets, paving the way for their broader application as fast and accurate surrogates for solving complex scientific and engineering problems.

