TL;DR: ZO-SAH is a new zeroth-order optimization method that speeds up convergence by estimating the objective's curvature (its Hessian) within randomly selected two-dimensional subspaces. It keeps the cost low by fitting quadratic polynomials to function samples and reusing evaluations across steps, and it outperforms existing methods on a range of machine learning tasks, including deep neural network training, while requiring fewer function queries.
In machine learning, optimizing a model usually relies on knowing how its objective function changes, typically through its gradient. Yet in many real-world situations this gradient information is impossible or impractical to obtain: think of complex simulations in physics or chemistry, or black-box AI models accessed through third-party services whose internal workings you cannot inspect. This is where zeroth-order optimization comes in, offering a way to improve models without direct access to gradients.
Understanding Zeroth-Order Optimization
Traditional optimization methods, like those used for training deep learning models, depend on first-order derivatives (gradients) to guide the search for optimal parameters. When these gradients are unavailable, zeroth-order (ZO) methods step in. They approximate derivatives by observing how the function output changes when small perturbations are made to the input. While many existing ZO methods focus on these first-order approximations, incorporating “second-order” information, which describes the curvature of the function (like the Hessian matrix), can significantly speed up the optimization process.
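To make this concrete, here is a minimal sketch of the kind of two-point, random-direction gradient estimate many ZO methods build from function evaluations alone. It is illustrative rather than the estimator from the paper, and the step size `mu`, the number of probe directions, and the toy objective are assumed values:

```python
import numpy as np

def zo_gradient_estimate(f, x, mu=1e-3, num_dirs=10):
    """Two-point zeroth-order gradient estimate from random probe directions.

    Only evaluations of f are needed: each random direction u yields a
    finite-difference slope, and the slopes are averaged into a gradient estimate.
    """
    grad = np.zeros_like(x)
    for _ in range(num_dirs):
        u = np.random.randn(x.shape[0])                      # random probe direction
        slope = (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu)  # directional derivative estimate
        grad += slope * u
    return grad / num_dirs

# Minimize a simple quadratic using function evaluations only.
f = lambda x: 0.5 * np.sum(x ** 2)
x = np.ones(5)
for _ in range(200):
    x = x - 0.1 * zo_gradient_estimate(f, x)
print(f(x))  # close to 0
```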
The Challenge of Second-Order Information
The problem with using second-order information in ZO optimization is its cost. Accurately estimating the Hessian matrix typically requires a very large number of function evaluations, especially in high-dimensional problems, which can make the process prohibitively expensive and slow. Earlier attempts, such as the zeroth-order Hessian-aware (ZOHA) algorithm, aimed to reduce this cost but still needed a substantial number of evaluations. Another method, HiZOO, simplified the problem by approximating only the diagonal elements of the Hessian, which is more efficient but struggles on anisotropic functions, where parameters are strongly interdependent.
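To see why diagonal-only approximations fall short, here is an illustrative central-difference sketch (not code from either paper) of estimating just the Hessian diagonal; every cross-term between two different parameters is ignored by construction:

```python
import numpy as np

def zo_hessian_diagonal(f, x, mu=1e-3):
    """Central-difference estimate of the Hessian diagonal only.

    H_ii ~ (f(x + mu*e_i) - 2*f(x) + f(x - mu*e_i)) / mu**2, costing 2d + 1
    function evaluations. Off-diagonal entries H_ij, which encode how
    parameters interact, are never estimated at all.
    """
    d = x.shape[0]
    fx = f(x)
    diag = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = mu                 # perturb a single coordinate at a time
        diag[i] = (f(x + e) - 2.0 * fx + f(x - e)) / mu ** 2
    return diag
```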
Introducing ZO-SAH: A Novel Approach
Researchers from POSTECH have introduced a new zeroth-order optimization algorithm called the Subspace-based Approximate Hessian (ZO-SAH) method. This innovative approach tackles the high cost of Hessian estimation by focusing on randomly selected, low-dimensional subspaces. Specifically, ZO-SAH restricts its operations to two-dimensional subspaces, dramatically reducing the computational and function evaluation complexity. This allows the algorithm to leverage valuable curvature information without the prohibitive costs associated with full Hessian estimation.
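As a rough sketch of the subspace idea (the details below are assumptions for illustration, not the authors' implementation), one can draw a random orthonormal basis for a two-dimensional slice of parameter space and restrict the objective to it, so every subsequent query lives in just two coordinates:

```python
import numpy as np

def random_2d_subspace(d, rng=None):
    """Orthonormal basis Q (shape d x 2) of a random 2-D subspace of R^d."""
    rng = np.random.default_rng() if rng is None else rng
    Q, _ = np.linalg.qr(rng.standard_normal((d, 2)))  # two orthonormal columns
    return Q

def restrict(f, x, Q):
    """Restriction of f to the 2-D slice through x spanned by the columns of Q."""
    return lambda a: f(x + Q @ a)  # a is a 2-vector of in-subspace coordinates
```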
How ZO-SAH Works
At the heart of ZO-SAH is its strategy for estimating the Hessian within these two-dimensional subspaces. It does this by fitting a quadratic polynomial to the objective function and then extracting its second-order coefficients. This method provides a more stable estimate of the Hessian compared to traditional coordinate-wise finite differences. A clever “periodic subspace-switching strategy” is also employed, which reuses function evaluations from previous optimization steps. This caching mechanism further enhances efficiency by minimizing the number of new function queries required. By focusing on these small subspaces, ZO-SAH can capture important parameter interdependencies that diagonal-only methods miss, leading to more effective search directions.
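A minimal sketch of what such a fit could look like follows, with helper names and sampling choices that are illustrative rather than the paper's exact scheme: fit a quadratic q(a) = c + b·a + ½ aᵀHa to function samples expressed in the two subspace coordinates, then read off the 2x2 matrix H as the in-subspace Hessian estimate. In ZO-SAH, cached evaluations from earlier steps would reduce how many of these samples require new function queries.

```python
import numpy as np

def fit_subspace_hessian(g, sample_pts):
    """Least-squares fit of q(a) = c + b·a + 0.5*aᵀHa to samples of a
    two-variable function g; returns the symmetric 2x2 curvature matrix H.
    Needs at least six samples, one per polynomial coefficient.
    """
    rows, vals = [], []
    for a1, a2 in sample_pts:
        # Design row for the coefficients [c, b1, b2, H11, H12, H22].
        rows.append([1.0, a1, a2, 0.5 * a1 ** 2, a1 * a2, 0.5 * a2 ** 2])
        vals.append(g(np.array([a1, a2])))
    coeffs, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(vals), rcond=None)
    c, b1, b2, h11, h12, h22 = coeffs
    return np.array([[h11, h12], [h12, h22]])

# Quick check on an exactly quadratic slice g(a) = 0.5*aᵀMa:
M = np.array([[3.0, 1.0], [1.0, 2.0]])
g = lambda a: 0.5 * a @ M @ a
pts = np.random.default_rng(0).standard_normal((8, 2))
print(np.allclose(fit_subspace_hessian(g, pts), M))  # True: curvature recovered
```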
Experimental Validation
The effectiveness of ZO-SAH was rigorously tested on eight benchmark datasets, including tasks like logistic regression and training deep neural networks (ResNet8 and ResNet20 on CIFAR10 and CIFAR100). The results were compelling: ZO-SAH consistently achieved significantly faster convergence than existing zeroth-order methods. For instance, in deep neural network training, ZO-SAH reached the same loss levels as DeepZero (a leading ZO method for deep learning) with approximately 50% fewer function queries on CIFAR10 and 29% fewer on CIFAR100. Furthermore, ZO-SAH also achieved higher test accuracy, demonstrating that its approach not only speeds up training but also maintains or improves generalization performance.
This research marks a significant step forward in making second-order information more accessible and practical for zeroth-order optimization problems. For more technical details, you can refer to the full research paper: Subspace-based Approximate Hessian Method for Zeroth-Order Optimization.


