TL;DR: ZO-SAH is a new zeroth-order optimization method that speeds up convergence by estimating the objective's curvature (its Hessian) within randomly selected two-dimensional subspaces. It keeps the cost low by fitting quadratic polynomials to function samples and reusing evaluations across steps, and it outperforms existing methods on a range of machine learning tasks, including deep neural network training, while requiring fewer function queries.
In machine learning, optimizing a model usually relies on knowing how its objective function changes, typically through its gradient. Yet in many real-world situations this gradient information is impossible or impractical to obtain: think of complex simulations in physics or chemistry, or black-box AI models accessed through third-party services whose internal workings you cannot inspect. This is where zeroth-order optimization comes in, offering a way to improve models without direct access to gradients.
Understanding Zeroth-Order Optimization
Traditional optimization methods, like those used for training deep learning models, depend on first-order derivatives (gradients) to guide the search for optimal parameters. When these gradients are unavailable, zeroth-order (ZO) methods step in. They approximate derivatives by observing how the function output changes when small perturbations are made to the input. While many existing ZO methods focus on these first-order approximations, incorporating “second-order” information, which describes the curvature of the function (like the Hessian matrix), can significantly speed up the optimization process.
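To make this concrete, here is a minimal sketch of the kind of two-point, random-direction gradient estimate many ZO methods build from function evaluations alone. It is illustrative rather than the estimator from the paper, and the step size `mu`, the number of probe directions, and the toy objective are assumed values:

```python
import numpy as np

def zo_gradient_estimate(f, x, mu=1e-3, num_dirs=10):
    """Two-point zeroth-order gradient estimate from random probe directions.

    Only evaluations of f are needed: each random direction u yields a
    finite-difference slope, and the slopes are averaged into a gradient estimate.
    """
    grad = np.zeros_like(x)
    for _ in range(num_dirs):
        u = np.random.randn(x.shape[0])                      # random probe direction
        slope = (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu)  # directional derivative estimate
        grad += slope * u
    return grad / num_dirs

# Minimize a simple quadratic using function evaluations only.
f = lambda x: 0.5 * np.sum(x ** 2)
x = np.ones(5)
for _ in range(200):
    x = x - 0.1 * zo_gradient_estimate(f, x)
print(f(x))  # close to 0
```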
The Challenge of Second-Order Information
The problem with using second-order information in ZO optimization is its cost. Accurately estimating the Hessian matrix typically requires a very large number of function evaluations, especially in high-dimensional problems, which can make the process prohibitively expensive and slow. Earlier attempts, such as the zeroth-order Hessian-aware (ZOHA) algorithm, aimed to reduce this cost but still needed a substantial number of evaluations. Another method, HiZOO, simplified the problem by approximating only the diagonal elements of the Hessian, which is more efficient but struggles on anisotropic functions, where parameters are strongly interdependent.
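To see why diagonal-only approximations fall short, here is an illustrative central-difference sketch (not code from either paper) of estimating just the Hessian diagonal; every cross-term between two different parameters is ignored by construction:

```python
import numpy as np

def zo_hessian_diagonal(f, x, mu=1e-3):
    """Central-difference estimate of the Hessian diagonal only.

    H_ii ~ (f(x + mu*e_i) - 2*f(x) + f(x - mu*e_i)) / mu**2, costing 2d + 1
    function evaluations. Off-diagonal entries H_ij, which encode how
    parameters interact, are never estimated at all.
    """
    d = x.shape[0]
    fx = f(x)
    diag = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = mu                 # perturb a single coordinate at a time
        diag[i] = (f(x + e) - 2.0 * fx + f(x - e)) / mu ** 2
    return diag
```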
Introducing ZO-SAH: A Novel Approach
Researchers from POSTECH have introduced a new zeroth-order optimization algorithm called the Subspace-based Approximate Hessian (ZO-SAH) method. This innovative approach tackles the high cost of Hessian estimation by focusing on randomly selected, low-dimensional subspaces. Specifically, ZO-SAH restricts its operations to two-dimensional subspaces, dramatically reducing the computational and function evaluation complexity. This allows the algorithm to leverage valuable curvature information without the prohibitive costs associated with full Hessian estimation.
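As a rough sketch of the subspace idea (the details below are assumptions for illustration, not the authors' implementation), one can draw a random orthonormal basis for a two-dimensional slice of parameter space and restrict the objective to it, so every subsequent query lives in just two coordinates:

```python
import numpy as np

def random_2d_subspace(d, rng=None):
    """Orthonormal basis Q (shape d x 2) of a random 2-D subspace of R^d."""
    rng = np.random.default_rng() if rng is None else rng
    Q, _ = np.linalg.qr(rng.standard_normal((d, 2)))  # two orthonormal columns
    return Q

def restrict(f, x, Q):
    """Restriction of f to the 2-D slice through x spanned by the columns of Q."""
    return lambda a: f(x + Q @ a)  # a is a 2-vector of in-subspace coordinates
```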
How ZO-SAH Works
At the heart of ZO-SAH is its strategy for estimating the Hessian within these two-dimensional subspaces. It does this by fitting a quadratic polynomial to the objective function and then extracting its second-order coefficients. This method provides a more stable estimate of the Hessian compared to traditional coordinate-wise finite differences. A clever “periodic subspace-switching strategy” is also employed, which reuses function evaluations from previous optimization steps. This caching mechanism further enhances efficiency by minimizing the number of new function queries required. By focusing on these small subspaces, ZO-SAH can capture important parameter interdependencies that diagonal-only methods miss, leading to more effective search directions.
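A minimal sketch of what such a fit could look like follows, with helper names and sampling choices that are illustrative rather than the paper's exact scheme: fit a quadratic q(a) = c + b·a + ½ aᵀHa to function samples expressed in the two subspace coordinates, then read off the 2x2 matrix H as the in-subspace Hessian estimate. In ZO-SAH, cached evaluations from earlier steps would reduce how many of these samples require new function queries.

```python
import numpy as np

def fit_subspace_hessian(g, sample_pts):
    """Least-squares fit of q(a) = c + b·a + 0.5*aᵀHa to samples of a
    two-variable function g; returns the symmetric 2x2 curvature matrix H.
    Needs at least six samples, one per polynomial coefficient.
    """
    rows, vals = [], []
    for a1, a2 in sample_pts:
        # Design row for the coefficients [c, b1, b2, H11, H12, H22].
        rows.append([1.0, a1, a2, 0.5 * a1 ** 2, a1 * a2, 0.5 * a2 ** 2])
        vals.append(g(np.array([a1, a2])))
    coeffs, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(vals), rcond=None)
    c, b1, b2, h11, h12, h22 = coeffs
    return np.array([[h11, h12], [h12, h22]])

# Quick check on an exactly quadratic slice g(a) = 0.5*aᵀMa:
M = np.array([[3.0, 1.0], [1.0, 2.0]])
g = lambda a: 0.5 * a @ M @ a
pts = np.random.default_rng(0).standard_normal((8, 2))
print(np.allclose(fit_subspace_hessian(g, pts), M))  # True: curvature recovered
```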
Experimental Validation
The effectiveness of ZO-SAH was rigorously tested on eight benchmark datasets, including tasks like logistic regression and training deep neural networks (ResNet8 and ResNet20 on CIFAR10 and CIFAR100). The results were compelling: ZO-SAH consistently achieved significantly faster convergence than existing zeroth-order methods. For instance, in deep neural network training, ZO-SAH reached the same loss levels as DeepZero (a leading ZO method for deep learning) with approximately 50% fewer function queries on CIFAR10 and 29% fewer on CIFAR100. Furthermore, ZO-SAH also achieved higher test accuracy, demonstrating that its approach not only speeds up training but also maintains or improves generalization performance.
This research marks a significant step forward in making second-order information more accessible and practical for zeroth-order optimization problems. For more technical details, you can refer to the full research paper: Subspace-based Approximate Hessian Method for Zeroth-Order Optimization.


