TLDR: FACTORS is a new framework that combines Design of Experiments with Shapley decomposition to systematically identify optimal machine learning training configurations. It estimates main and two-factor interaction effects, integrating them into a risk- and cost-adjusted objective function. The framework offers interpretable insights through effect maps, enabling reliable selection of configurations under budget constraints and improving performance stability across diverse datasets.
Tuning training factors such as learning rate, batch size, and optimizer is a routine but often costly part of machine learning. These choices matter because model performance, stability, and reproducibility are highly sensitive to how the factors are combined. Trial-and-error tuning consumes significant time and compute, while regulatory frameworks increasingly demand reliability and accountability from AI systems.
To address these challenges, researchers Dongseok Kim, Wonjun Jeong, and Gisung Oh of Gachon University have introduced a framework called FACTORS. It aims to turn model tuning from a speculative exercise into a systematic, interpretable, and risk-aware decision process.
What is FACTORS?
FACTORS is designed to provide a clear understanding of how different training factors influence model performance. It achieves this by combining two powerful techniques: Design of Experiments (DOE) and Shapley decomposition. The core idea is to reliably estimate the individual impact of each factor (main effects) and how pairs of factors interact with each other (two-factor interactions). These insights are then integrated into a special objective function that considers not only expected performance gains but also the associated risks and costs, allowing for the selection of optimal configurations within a predefined budget.
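One way to write down the idea in this section is the sketch below. The notation is an assumption made for illustration and is not taken from the paper: x is a configuration of d factors, \mu is the grand mean, \alpha_j are main effects, \beta_{jk} are two-factor interactions, \hat{\sigma}(x) is an uncertainty estimate, c(x) is the cost of running x, B is the budget, and \lambda and \rho are hypothetical trade-off weights.

```latex
% Two-factor approximation of expected performance
\hat{y}(x) \approx \mu + \sum_{j=1}^{d} \alpha_j(x_j)
           + \sum_{1 \le j < k \le d} \beta_{jk}(x_j, x_k)

% Risk- and cost-adjusted objective, maximized within the budget
J(x) = \hat{y}(x) - \lambda \, \hat{\sigma}(x) - \rho \, c(x),
\qquad \text{subject to } c(x) \le B
```

The paper's exact objective may weight or combine these terms differently. The sketch only shows how performance, risk, and cost can sit in a single score.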
How Does It Work?
The framework employs two complementary methods for estimating these effects:
- Cell-mean (CM) Path: This is a straightforward approach that directly calculates effects based on the average performance observed in different experimental settings. It’s computationally light and good for initial broad exploration.
- SHAP-fit (SF) Path: This method uses Shapley values, a concept from game theory, to fairly attribute the contribution of each factor to the model’s output. These contributions are then used to reconstruct the main and interaction effects through a least-squares fitting process. The SF path is more robust to imbalances in experimental data.
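The two paths can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `cell_mean_effects` and `shapley_values`, the made-up score table, and the choice of a fixed baseline configuration are all assumptions. The first function performs the classical cell-mean decomposition on a balanced two-factor grid. The second computes exact Shapley attributions by enumerating factor orderings, which is only practical for a handful of factors. The least-squares step that the SF path uses to rebuild effects from those attributions is not reproduced here.

```python
from itertools import permutations
from math import factorial

import numpy as np


def cell_mean_effects(cell_means):
    """Split a 2-D table of cell means into grand mean, main effects, and interaction."""
    y = np.asarray(cell_means, dtype=float)
    mu = y.mean()                                  # grand mean
    row = y.mean(axis=1) - mu                      # main effect of the row factor
    col = y.mean(axis=0) - mu                      # main effect of the column factor
    inter = y - mu - row[:, None] - col[None, :]   # part that additivity cannot explain
    return mu, row, col, inter


def shapley_values(f, x, baseline):
    """Exact Shapley attribution of f(x) - f(baseline) over all factor orderings."""
    d = len(x)
    phi = [0.0] * d
    for order in permutations(range(d)):
        current = list(baseline)
        prev = f(tuple(current))
        for j in order:
            current[j] = x[j]          # switch factor j from its baseline level to x[j]
            value = f(tuple(current))
            phi[j] += value - prev     # marginal contribution of j in this ordering
            prev = value
    return [p / factorial(d) for p in phi]


# Made-up scores: rows are 3 learning-rate levels, columns are 2 batch-size levels.
scores = np.array([[0.70, 0.74], [0.80, 0.90], [0.60, 0.58]])
mu, lr_effect, bs_effect, interaction = cell_mean_effects(scores)

# Attribute the gain of configuration (1, 1) over the baseline configuration (0, 0).
phi = shapley_values(lambda cfg: scores[cfg], x=(1, 1), baseline=(0, 0))
```

On a balanced grid the cell-mean effects add back up to the observed table exactly, and the Shapley values always sum to the gap between the target and baseline scores. With missing or unevenly replicated cells the simple averages above become biased, which is the situation the article says the SF path handles better.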
Both paths are designed to work together, even when the experimental data might be uneven or biased. FACTORS also incorporates techniques for standardizing estimates, correcting for potential biases, and quantifying uncertainty, ensuring that comparisons across different factors and experimental designs are fair and reliable.
Once the effects are estimated, FACTORS defines an objective function that balances performance, uncertainty (risk), and cost. A lightweight search algorithm, based on coordinate ascent, then efficiently explores the configuration space to identify the best settings that maximize this objective function while adhering to operational constraints.
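The selection step can be sketched as follows. This is a minimal illustration, not the paper's algorithm. It assumes the objective has the form prediction minus `lam` times a standard error minus `rho` times cost, that cost is a sum of per-level costs, and that predictions and standard errors are already available as tables indexed by configuration. The names `make_objective` and `coordinate_ascent`, and all numbers, are hypothetical.

```python
import numpy as np


def make_objective(predicted, std_error, level_cost, lam, rho, budget):
    """Build J(config) = prediction - lam * std_error - rho * cost (-inf if over budget)."""
    def objective(config):
        cost = sum(level_cost[j][lvl] for j, lvl in enumerate(config))
        if cost > budget:
            return float("-inf")       # infeasible: exceeds the budget
        return float(predicted[config] - lam * std_error[config] - rho * cost)
    return objective


def coordinate_ascent(objective, n_levels, start, max_sweeps=20):
    """Change one factor at a time, keeping any change that raises the objective."""
    config = list(start)
    best = objective(tuple(config))
    for _ in range(max_sweeps):
        improved = False
        for j, levels in enumerate(n_levels):
            for lvl in range(levels):
                if lvl == config[j]:
                    continue
                trial = config.copy()
                trial[j] = lvl
                value = objective(tuple(trial))
                if value > best:
                    best, config, improved = value, trial, True
        if not improved:               # a full sweep without gain: local optimum reached
            break
    return tuple(config), best


# Made-up inputs: 3 learning-rate levels x 2 batch-size levels.
predicted = np.array([[0.70, 0.74], [0.80, 0.90], [0.60, 0.58]])
std_error = np.array([[0.01, 0.01], [0.01, 0.06], [0.01, 0.01]])
level_cost = [np.array([1.0, 1.0, 1.0]), np.array([1.0, 2.0])]

objective = make_objective(predicted, std_error, level_cost, lam=1.0, rho=0.01, budget=3.0)
best_cfg, best_val = coordinate_ascent(objective, n_levels=(3, 2), start=(0, 0))
```

Coordinate ascent is a local search, so with strong interactions it can stop at a configuration that no single-factor change improves but that is not globally best. Restarting from several starting points is a common safeguard.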
Key Contributions and Benefits
The researchers highlight several significant contributions:
- Actionable Insights: Instead of just explaining what a model does, FACTORS provides concrete prescriptions for adjusting design variables to improve performance.
- Risk-Aware Tuning: By explicitly accounting for uncertainty and cost, the framework helps reduce decision-making risks, especially under budget constraints.
- Interpretability: It summarizes main effects and interactions in easy-to-understand visual maps, guiding users on which factors to prioritize for adjustment and identifying safe pathways for improvement.
- Theoretical Guarantees: The framework is backed by rigorous theoretical analysis, including error decompositions and sample complexity analysis, ensuring its reliability.
Empirical Validation and Real-World Impact
FACTORS was rigorously tested using synthetic datasets and public benchmarks, including UCI Concrete, UCI Car Evaluation, and Fashion-MNIST. The results consistently demonstrated that incorporating two-factor interactions significantly improves the accuracy of identifying optimal configurations and preserves the ranking of factor importance, leading to more stable performance gains.
For instance, on the UCI Concrete dataset, the learning rate emerged as the most critical factor, followed by the optimizer (Adam consistently outperformed SGD), batch size, and the number of epochs. L2 regularization, surprisingly, had a negligible impact. Similar patterns were observed across the other datasets, with learning rate and batch size often being the primary determinants.
An ablation study further confirmed the importance of including two-factor approximations for accurate rank preservation and optimal configuration identification. It also showed that the SHAP-fit path is robust even with partial or biased experimental designs.
Interpreting the Performance Maps
The visual maps generated by FACTORS are a powerful tool for interpretability. Main-effect plots show the average impact of each factor level, revealing monotonic trends, plateaus, or reversals. Interaction plots, on the other hand, highlight how pairs of factors deviate from purely additive effects, indicating when factors should be tuned together rather than independently.
These visualizations help practitioners quickly identify which factors have the steepest slopes (highest priority for tuning), which pairs exhibit strong interactions (requiring careful joint adjustment), and which factors have flat effects (can be fixed for simplicity). This approach condenses thousands of potential configurations into a handful of actionable insights, making the tuning process more efficient and transparent.
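The same reading of the maps can be done numerically. The sketch below is a hypothetical helper, not part of FACTORS. It ranks factors by the spread of their main effects, ranks pairs by their largest absolute interaction, and flags factors whose spread falls under an arbitrary threshold as candidates to fix. All effect values are made up for illustration and are not results from the paper.

```python
def rank_by_effect_range(main_effects):
    """Sort factors by the spread (max - min) of their main effects, largest first."""
    spans = {name: max(vals) - min(vals) for name, vals in main_effects.items()}
    return sorted(spans.items(), key=lambda item: item[1], reverse=True)


def rank_interactions(interactions):
    """Sort factor pairs by their largest absolute interaction effect, largest first."""
    strength = {
        pair: max(abs(v) for row in table for v in row)
        for pair, table in interactions.items()
    }
    return sorted(strength.items(), key=lambda item: item[1], reverse=True)


# Made-up effect estimates, one value per factor level.
main_effects = {
    "learning_rate": [0.0, 0.13, -0.13],
    "batch_size": [-0.02, 0.02],
    "l2": [0.001, -0.001],
}
interactions = {
    ("learning_rate", "batch_size"): [[0.0, 0.0], [-0.03, 0.03], [0.03, -0.03]],
    ("batch_size", "l2"): [[0.002, -0.002], [-0.002, 0.002]],
}

ranking = rank_by_effect_range(main_effects)
pair_ranking = rank_interactions(interactions)

FLAT_THRESHOLD = 0.01  # arbitrary cutoff, chosen only for this example
flat_factors = [name for name, span in ranking if span < FLAT_THRESHOLD]
```

In practice the threshold should be set relative to the uncertainty of the effect estimates, since a spread smaller than its own standard error is not evidence of a real slope.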
Limitations and Future Directions
The authors acknowledge several limitations. FACTORS models only main effects and two-factor interactions, so higher-order interactions (involving three or more factors) are left in the residual. The researchers suggest methods to detect these cases and selectively expand checks when necessary. Discretizing continuous factors can also introduce distortions, which can be mitigated through techniques like smoothing or monotonic constraints.
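One simple way to check whether higher-order interactions matter is to measure how much variation a two-factor approximation leaves unexplained. The sketch below is an illustration under stated assumptions, not the detection method from the paper. It assumes a complete, balanced grid of scores with one array axis per factor, and the names `two_factor_approximation` and `residual_share` are hypothetical.

```python
from itertools import combinations

import numpy as np


def two_factor_approximation(y):
    """Rebuild a full-factorial score grid from grand mean, main, and pairwise effects."""
    y = np.asarray(y, dtype=float)
    d = y.ndim
    mu = y.mean()
    approx = np.full_like(y, mu)
    mains = []
    for j in range(d):
        others = tuple(k for k in range(d) if k != j)
        main = y.mean(axis=others, keepdims=True) - mu   # main effect of factor j
        mains.append(main)
        approx = approx + main
    for j, k in combinations(range(d), 2):
        others = tuple(i for i in range(d) if i not in (j, k))
        pair = y.mean(axis=others, keepdims=True) - mu - mains[j] - mains[k]
        approx = approx + pair                            # two-factor interaction (j, k)
    return approx


def residual_share(y):
    """Fraction of total variation that main and two-factor effects leave unexplained."""
    y = np.asarray(y, dtype=float)
    resid = y - two_factor_approximation(y)
    total = ((y - y.mean()) ** 2).sum()
    return float((resid ** 2).sum() / total) if total > 0 else 0.0


# Synthetic 2 x 2 x 2 grids with coded levels -1 and +1.
levels = np.array([-1.0, 1.0])
a, b, c = np.meshgrid(levels, levels, levels, indexing="ij")

pairwise_only = residual_share(a + b + a * b)      # fully captured by two-factor terms
three_way_only = residual_share(a * b * c)         # invisible to two-factor terms
```

A share near zero suggests the two-factor model is adequate. A large share is a signal to examine specific three-factor combinations before trusting the selected configuration.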
Despite these limitations, FACTORS offers a robust and interpretable foundation for optimizing machine learning training factors. It provides a clear path to reproducible performance gains under budget constraints, with natural extensions toward more complex modeling and adaptive experimental designs.


