TLDR: This research introduces CASH+, an extension of AutoML for optimizing complex machine learning pipelines that include fine-tuning and ensembling, not just hyperparameter optimization. It proposes PS-PFN, a new method that uses prior-data fitted networks (PFNs) and in-context learning to efficiently select and adapt these diverse pipelines by modeling their varied performance and cost characteristics. Experimental results show PS-PFN outperforms existing methods, offering a more flexible and effective approach to modern AutoML challenges.
Automated Machine Learning, or AutoML, takes much of the manual trial and error out of building machine learning models. Traditionally, a key challenge in AutoML has been Combined Algorithm Selection and Hyperparameter Optimization (CASH), which involves picking the best machine learning algorithm and tuning its hyperparameters for a specific task. However, with the rise of advanced pre-trained models, modern machine learning workflows are becoming much more complex. They often involve not just hyperparameter tuning, but also fine-tuning pre-trained models, combining multiple models (ensembling), and other specialized adaptation techniques.
The core problem remains: how do we find the best-performing model for a given task? But the increasing variety and complexity of these ML pipelines demand new approaches to AutoML. This is where a new framework called CASH+ comes in. It extends the traditional CASH framework to handle the selection and adaptation of these modern, diverse ML pipelines.
The researchers propose a new method called PS-PFN, which stands for Posterior Sampling using Prior-Data Fitted Networks. This method is designed to efficiently explore and optimize these complex ML pipelines. It treats the task as a “Max K-armed Bandit” problem, which is like having several slot machines (each representing an ML pipeline) and a limited number of pulls. Unlike the classic bandit setting, where the goal is the largest total payout, the goal here is the single highest payout seen on any pull, because in AutoML only the best model found matters. This is harder when payouts differ a lot between machines and shift over time.
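To make the difference between the two objectives concrete, here is a minimal Python sketch. It is a toy illustration, not code from the paper: the two arms, their (mean, spread) values, and the helper names `pull`, `run_round_robin`, `mean_reward`, and `max_reward` are all assumptions made up for this example.

```python
import random

def pull(arm, rng):
    """Draw one reward (think: validation accuracy) from a toy pipeline.

    `arm` is a hypothetical (mean, spread) pair; rewards are clipped to [0, 1].
    """
    mean, spread = arm
    return min(1.0, max(0.0, rng.gauss(mean, spread)))

def run_round_robin(arms, budget, seed=0):
    """Pull the arms in turn until the budget of pulls is used up."""
    rng = random.Random(seed)
    history = [[] for _ in arms]
    for step in range(budget):
        i = step % len(arms)
        history[i].append(pull(arms[i], rng))
    return history

def mean_reward(rewards):
    """What a classic bandit cares about: average (or total) payout."""
    return sum(rewards) / len(rewards)

def max_reward(rewards):
    """What a Max K-armed Bandit cares about: the single best payout."""
    return max(rewards)

# Arm 0 is steady; arm 1 is worse on average but has a much wider spread.
arms = [(0.80, 0.01), (0.70, 0.15)]
history = run_round_robin(arms, budget=200)
for i, rewards in enumerate(history):
    print(f"arm {i}: mean={mean_reward(rewards):.3f}, max={max_reward(rewards):.3f}")
```

In a run like this, the steady arm typically wins on average reward while the noisy arm typically produces the single best result. A strategy that chases the best average can therefore spend its budget on the wrong pipeline.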
PS-PFN uses a clever technique called “prior-data fitted networks” (PFNs). Think of PFNs as smart, pre-trained neural networks that can quickly learn from a small amount of data to estimate the potential performance of each ML pipeline. This “in-context learning” allows PS-PFN to make quick and informed decisions about which pipeline to try next, even when the performance patterns are unusual or change over time. Unlike older methods that assume all pipelines behave similarly, PS-PFN is flexible enough to handle pipelines with very different performance characteristics and even varying costs (like how long it takes to run a particular optimization step).
The paper highlights three main challenges that PS-PFN addresses: heterogeneous reward distributions (different pipelines perform very differently), changes in reward over time (a pipeline might improve or its performance pattern might shift), and varying costs associated with running different pipelines. By using PFNs, PS-PFN can model these complex behaviors effectively.
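The general shape of a posterior-sampling loop with per-pipeline costs can be sketched in a few lines of Python. This is only an illustration under loud assumptions: a simple Gaussian fit stands in for the PFN (the real method uses a pre-trained network here), and the way cost is handled (cheaper arms can afford more future pulls within the remaining budget) is a choice made for this sketch, not necessarily the paper's mechanism. All function names are hypothetical.

```python
import random
import statistics

def sample_future_max(observed, n_future, rng):
    """Stand-in for a PFN: sample a plausible best reward for this arm.

    Assumption: rewards are modeled with a Gaussian fitted to the
    observations so far. Arms with fewer than two observations get an
    infinite score so that they are explored first.
    """
    if len(observed) < 2:
        return float("inf")
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)
    future = [rng.gauss(mu, sigma) for _ in range(n_future)]
    return max(observed + future)

def select_arm(history, costs, remaining, rng):
    """Pick the arm whose sampled future maximum is highest."""
    scores = []
    for observed, cost in zip(history, costs):
        n_future = int(remaining // cost)  # pulls this arm can still afford
        if n_future == 0:
            scores.append(float("-inf"))   # too expensive for what is left
        else:
            scores.append(sample_future_max(observed, n_future, rng))
    return max(range(len(scores)), key=scores.__getitem__)

def run(arms, costs, budget, seed=0):
    """Spend the budget one pull at a time; return rewards per arm."""
    rng = random.Random(seed)
    history = [[] for _ in arms]
    remaining = budget
    while remaining >= min(costs):
        i = select_arm(history, costs, remaining, rng)
        history[i].append(arms[i](rng))
        remaining -= costs[i]
    return history, remaining

# Two toy pipelines: a cheap, steady one and a costlier, noisier one.
arms = [lambda rng: rng.gauss(0.80, 0.01), lambda rng: rng.gauss(0.70, 0.15)]
costs = [1.0, 2.0]
history, remaining = run(arms, costs, budget=100.0)
print([len(h) for h in history], max(max(h) for h in history))
```

Because each decision is driven by a random sample rather than a fixed estimate, arms with uncertain or wide reward distributions still get tried from time to time, which is the core idea behind posterior sampling.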
Experimental results on both new and existing benchmark tasks show that PS-PFN performs better than other common bandit and AutoML strategies. This indicates a significant step forward in making AutoML systems more adaptable and efficient for the increasingly diverse world of machine learning. The code and data for this research are openly available for others to use and build upon. You can find more details in the full research paper: In-Context Decision Making for Optimizing Complex AutoML Pipelines.
While PS-PFN offers superior performance, the authors acknowledge some limitations. It can be computationally more expensive than simpler methods, especially for very fast optimization steps. Also, its theoretical analysis is complex due to the use of synthetic data generation and machine learning models to approximate distributions. However, for typical AutoML tasks where individual optimization steps are already time-consuming, the benefits of PS-PFN often outweigh these costs.


