TLDR: A new library, confopt, and a benchmark suite, DARTS-Bench-Suite, are introduced to improve the development and evaluation of gradient-based one-shot Neural Architecture Search (NAS) methods. The research highlights critical flaws in current NAS assessment, showing that method rankings are inconsistent across different search spaces and are heavily influenced by hyperparameter choices, emphasizing the need for more comprehensive and unbiased evaluation protocols.
Neural Architecture Search, or NAS, is a field dedicated to automating the design of neural network architectures. Over the past decade, it has significantly matured, moving from computationally intensive methods like reinforcement learning and evolutionary search to more efficient gradient-based approaches. A pivotal development in this area was Differentiable Architecture Search (DARTS), which dramatically reduced the time needed to explore architectural possibilities.
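To make the gradient-based idea concrete, here is a minimal sketch of the core DARTS mechanism: each edge of the network computes a softmax-weighted mixture of candidate operations, so the architecture weights can be optimized with ordinary gradient descent alongside the network weights. The candidate operations and class names below are illustrative, not taken from any particular implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """DARTS-style mixed operation: a softmax-weighted sum of candidate ops."""

    def __init__(self, channels: int):
        super().__init__()
        # Illustrative candidate operations; real search spaces use more.
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
        ])
        # One architecture parameter (alpha) per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.alpha, dim=0)
        # Continuous relaxation: blend all candidates instead of picking one.
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```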
However, gradient-based one-shot NAS faces significant hurdles. One major issue is the heavy reliance on the original DARTS benchmark for evaluating new methods. This has led to a situation where reported improvements often fall within the noise margin of the evaluation itself, making it difficult to assess genuine progress. Furthermore, the implementations of these methods are scattered across many different repositories, making fair comparisons and further development a complex task.
To address these challenges, the researchers introduce Configurable Optimizer, or confopt, an extensible library designed to streamline the development and evaluation of gradient-based one-shot NAS methods. Confopt offers a simple interface for integrating new search spaces, and it decomposes NAS optimizers into their core components, which makes the methods easier to understand, recombine, and extend. The library's code is publicly available.
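The snippet below is only a hypothetical illustration of what a component-based one-shot NAS interface of this kind could look like; the class and method names are assumptions for the sake of exposition, not confopt's actual API.

```python
# Hypothetical sketch of a component-based one-shot NAS interface.
# None of these names are confopt's real API; they only illustrate the idea of
# decomposing an optimizer into exchangeable parts (search space, sampler, ...).

class SearchSpace:                       # assumed base class
    def build_supernet(self):            # build the weight-sharing model
        ...

    def discretize(self, alphas):        # derive a discrete architecture
        ...

class OneShotOptimizer:                  # assumed composition of components
    def __init__(self, space, sampler, perturbation=None, regularizer=None):
        self.space = space
        self.sampler = sampler           # e.g. softmax vs. sampled relaxation
        self.perturbation = perturbation
        self.regularizer = regularizer

    def search(self, train_data, valid_data, epochs):
        """Alternate weight and architecture updates, then discretize."""
        ...
```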
A key contribution of this work is the creation of DARTS-Bench-Suite, a collection of new benchmarks derived from the original DARTS search space. This suite aims to provide a more comprehensive evaluation environment. It consists of nine distinct benchmarks, each an instantiation of the DARTS search space with varying configurations, including different candidate operations, network depth, and width. While these benchmarks retain large and expressive search spaces, their associated “supernets” (large networks encompassing all possible architectures) are significantly more efficient to train.
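As a rough illustration of how a family of benchmarks can be spanned by varying a few axes of a DARTS-like search space, the sketch below enumerates nine configurations from hypothetical operation sets and depth/width settings. The concrete values are placeholders, not the settings used in DARTS-Bench-Suite.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class BenchConfig:
    """One instantiation of a DARTS-like search space (values are placeholders)."""
    op_set: str         # which candidate operations the edges may choose from
    num_cells: int      # network depth (number of stacked cells)
    init_channels: int  # network width

# Hypothetical axes; the actual DARTS-Bench-Suite settings may differ.
op_sets = ["regular", "reduced", "conv_only"]
depth_width = [(8, 16), (14, 24), (20, 36)]

benchmarks = [
    BenchConfig(ops, cells, channels)
    for ops, (cells, channels) in product(op_sets, depth_width)
]
assert len(benchmarks) == 9
```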
The paper also proposes a novel evaluation protocol to address existing flaws in how NAS methods are assessed. Traditionally, a supernet is trained on a dataset, and then the derived discrete architecture is re-trained from scratch on the same training data. This setup makes it hard to determine if performance is due to the architecture’s inherent quality or simply its prior exposure to the data. The new protocol tackles this by splitting the training dataset into two halves: one for supernet training and the other for training the final discrete models. This ensures the final architecture is evaluated on unseen data from the same distribution.
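A minimal sketch of this kind of split, assuming a standard PyTorch dataset such as CIFAR-10: half of the training data is used for supernet search and the other half for retraining the discretized architecture, so the final model never sees the search data.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Full training set (CIFAR-10 used here purely as an example).
train_set = datasets.CIFAR10(root="data", train=True, download=True,
                             transform=transforms.ToTensor())

# Split the training data into two disjoint halves with a fixed seed.
half = len(train_set) // 2
search_half, retrain_half = random_split(
    train_set, [half, len(train_set) - half],
    generator=torch.Generator().manual_seed(0),
)

# search_half  -> train the supernet and derive a discrete architecture
# retrain_half -> train the derived architecture from scratch
# The held-out test set stays untouched for the final evaluation.
```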
Moreover, to minimize confounding factors like hyperparameter selection, each discovered architecture is trained using nine different hyperparameter configurations. This approach provides a more unbiased assessment of an architecture’s intrinsic quality, independent of specific tuning. The research also makes the target network match the size of the supernet, which helps in evaluating the NAS method itself without the complication of how well a smaller “proxy” supernet predicts the performance of a much larger final model.
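To illustrate how this removes tuning bias, the sketch below evaluates a single discovered architecture under nine hyperparameter configurations. The grid values, the `train_and_evaluate` helper, and the choice to average the scores are assumptions for illustration, not the paper's exact protocol.

```python
from itertools import product
from statistics import mean

# Hypothetical 3 x 3 grid of retraining hyperparameters (nine configurations).
learning_rates = [0.1, 0.05, 0.025]
weight_decays = [1e-4, 3e-4, 1e-3]

def train_and_evaluate(architecture, lr, wd):
    """Placeholder: retrain `architecture` from scratch and return test accuracy."""
    raise NotImplementedError

def assess(architecture):
    # Aggregate over all nine settings so no single tuning choice dominates.
    accuracies = [
        train_and_evaluate(architecture, lr, wd)
        for lr, wd in product(learning_rates, weight_decays)
    ]
    return mean(accuracies)
```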
The experiments conducted using confopt and DARTS-Bench-Suite revealed important insights. They evaluated seven NAS optimizers across the nine new benchmarks and found that their rankings differed substantially across these settings. This highlights a critical need for more comprehensive evaluation beyond the single, original DARTS search space. Furthermore, the choice of hyperparameters used to train the final model significantly impacted the rankings of the NAS methods, suggesting that careful consideration of these settings is crucial for fair comparisons.
In conclusion, this work introduces confopt as a valuable tool for the NAS community, enabling easier implementation and comparison of gradient-based one-shot methods. By demonstrating the inconsistencies in method performance across varied search spaces and the impact of hyperparameters, the researchers call for the community to design more robust search spaces and develop novel methods that achieve truly significant improvements, moving beyond the limitations of current evaluation practices.


