TLDR: A new library, confopt, and a benchmark suite, DARTS-Bench-Suite, are introduced to improve the development and evaluation of gradient-based one-shot Neural Architecture Search (NAS) methods. The research highlights critical flaws in current NAS assessment, showing that method rankings are inconsistent across different search spaces and are heavily influenced by hyperparameter choices, emphasizing the need for more comprehensive and unbiased evaluation protocols.
Neural Architecture Search, or NAS, is a field dedicated to automating the design of neural network architectures. Over the past decade, it has significantly matured, moving from computationally intensive methods like reinforcement learning and evolutionary search to more efficient gradient-based approaches. A pivotal development in this area was Differentiable Architecture Search (DARTS), which dramatically reduced the time needed to explore architectural possibilities.
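To make the gradient-based idea concrete, here is a minimal sketch of the core DARTS mechanism: each edge of the network computes a softmax-weighted mixture of candidate operations, so the architecture weights can be optimized with ordinary gradient descent alongside the network weights. The candidate operations and class names below are illustrative, not taken from any particular implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """DARTS-style mixed operation: a softmax-weighted sum of candidate ops."""

    def __init__(self, channels: int):
        super().__init__()
        # Illustrative candidate operations; real search spaces use more.
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
        ])
        # One architecture parameter (alpha) per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.alpha, dim=0)
        # Continuous relaxation: blend all candidates instead of picking one.
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```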
However, gradient-based one-shot NAS faces significant hurdles. One major issue is the heavy reliance on the original DARTS benchmark for evaluating new methods. This has led to a situation where reported improvements often fall within the noise margin of the evaluation itself, making it difficult to assess genuine progress. Furthermore, the implementations of these methods are scattered across many different repositories, making fair comparisons and further development a complex task.
To address these challenges, the researchers introduce Configurable Optimizer, or confopt, an extensible library designed to streamline the development and evaluation of gradient-based one-shot NAS methods. Confopt offers a simple interface for integrating new search spaces, and it decomposes NAS optimizers into their core components, which makes the methods easier to understand, recombine, and extend. The library's code is publicly available.
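The snippet below is only a hypothetical illustration of what a component-based one-shot NAS interface of this kind could look like; the class and method names are assumptions for the sake of exposition, not confopt's actual API.

```python
# Hypothetical sketch of a component-based one-shot NAS interface.
# None of these names are confopt's real API; they only illustrate the idea of
# decomposing an optimizer into exchangeable parts (search space, sampler, ...).

class SearchSpace:                       # assumed base class
    def build_supernet(self):            # build the weight-sharing model
        ...

    def discretize(self, alphas):        # derive a discrete architecture
        ...

class OneShotOptimizer:                  # assumed composition of components
    def __init__(self, space, sampler, perturbation=None, regularizer=None):
        self.space = space
        self.sampler = sampler           # e.g. softmax vs. sampled relaxation
        self.perturbation = perturbation
        self.regularizer = regularizer

    def search(self, train_data, valid_data, epochs):
        """Alternate weight and architecture updates, then discretize."""
        ...
```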
A key contribution of this work is the creation of DARTS-Bench-Suite, a collection of new benchmarks derived from the original DARTS search space. This suite aims to provide a more comprehensive evaluation environment. It consists of nine distinct benchmarks, each an instantiation of the DARTS search space with varying configurations, including different candidate operations, network depth, and width. While these benchmarks retain large and expressive search spaces, their associated “supernets” (large networks encompassing all possible architectures) are significantly more efficient to train.
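As a rough illustration of how a family of benchmarks can be spanned by varying a few axes of a DARTS-like search space, the sketch below enumerates nine configurations from hypothetical operation sets and depth/width settings. The concrete values are placeholders, not the settings used in DARTS-Bench-Suite.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class BenchConfig:
    """One instantiation of a DARTS-like search space (values are placeholders)."""
    op_set: str         # which candidate operations the edges may choose from
    num_cells: int      # network depth (number of stacked cells)
    init_channels: int  # network width

# Hypothetical axes; the actual DARTS-Bench-Suite settings may differ.
op_sets = ["regular", "reduced", "conv_only"]
depth_width = [(8, 16), (14, 24), (20, 36)]

benchmarks = [
    BenchConfig(ops, cells, channels)
    for ops, (cells, channels) in product(op_sets, depth_width)
]
assert len(benchmarks) == 9
```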
The paper also proposes a novel evaluation protocol to address existing flaws in how NAS methods are assessed. Traditionally, a supernet is trained on a dataset, and then the derived discrete architecture is re-trained from scratch on the same training data. This setup makes it hard to determine if performance is due to the architecture’s inherent quality or simply its prior exposure to the data. The new protocol tackles this by splitting the training dataset into two halves: one for supernet training and the other for training the final discrete models. This ensures the final architecture is evaluated on unseen data from the same distribution.
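A minimal sketch of this kind of split, assuming a standard PyTorch dataset such as CIFAR-10: half of the training data is used for supernet search and the other half for retraining the discretized architecture, so the final model never sees the search data.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Full training set (CIFAR-10 used here purely as an example).
train_set = datasets.CIFAR10(root="data", train=True, download=True,
                             transform=transforms.ToTensor())

# Split the training data into two disjoint halves with a fixed seed.
half = len(train_set) // 2
search_half, retrain_half = random_split(
    train_set, [half, len(train_set) - half],
    generator=torch.Generator().manual_seed(0),
)

# search_half  -> train the supernet and derive a discrete architecture
# retrain_half -> train the derived architecture from scratch
# The held-out test set stays untouched for the final evaluation.
```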
Moreover, to minimize confounding factors like hyperparameter selection, each discovered architecture is trained using nine different hyperparameter configurations. This approach provides a more unbiased assessment of an architecture’s intrinsic quality, independent of specific tuning. The research also makes the target network match the size of the supernet, which helps in evaluating the NAS method itself without the complication of how well a smaller “proxy” supernet predicts the performance of a much larger final model.
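To illustrate how this removes tuning bias, the sketch below evaluates a single discovered architecture under nine hyperparameter configurations. The grid values, the `train_and_evaluate` helper, and the choice to average the scores are assumptions for illustration, not the paper's exact protocol.

```python
from itertools import product
from statistics import mean

# Hypothetical 3 x 3 grid of retraining hyperparameters (nine configurations).
learning_rates = [0.1, 0.05, 0.025]
weight_decays = [1e-4, 3e-4, 1e-3]

def train_and_evaluate(architecture, lr, wd):
    """Placeholder: retrain `architecture` from scratch and return test accuracy."""
    raise NotImplementedError

def assess(architecture):
    # Aggregate over all nine settings so no single tuning choice dominates.
    accuracies = [
        train_and_evaluate(architecture, lr, wd)
        for lr, wd in product(learning_rates, weight_decays)
    ]
    return mean(accuracies)
```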
The experiments conducted using confopt and DARTS-Bench-Suite revealed important insights. They evaluated seven NAS optimizers across the nine new benchmarks and found that their rankings differed substantially across these settings. This highlights a critical need for more comprehensive evaluation beyond the single, original DARTS search space. Furthermore, the choice of hyperparameters used to train the final model significantly impacted the rankings of the NAS methods, suggesting that careful consideration of these settings is crucial for fair comparisons.
In conclusion, this work introduces confopt as a valuable tool for the NAS community, enabling easier implementation and comparison of gradient-based one-shot methods. By demonstrating the inconsistencies in method performance across varied search spaces and the impact of hyperparameters, the researchers call for the community to design more robust search spaces and develop novel methods that achieve truly significant improvements, moving beyond the limitations of current evaluation practices.


