Balancing Accuracy and Robustness: A Deep Dive into Demand Forecasting Evaluation Functions

TLDR: This research paper compares two evaluation functions for demand forecasting models: FMAE and HEF. FMAE focuses on minimizing mean absolute error and is computationally efficient, while HEF, a hierarchical multi-metric function, prioritizes explanatory power, global accuracy, and robustness against large errors. In the experiments, HEF consistently outperforms FMAE on global metrics, which makes HEF suitable for long-term strategic planning. FMAE is the more efficient choice for short-term operational applications because it targets average error and runs faster.

Demand forecasting is a crucial element for businesses to plan effectively, manage resources, and adapt to market changes. However, predicting future demand, especially for multiple products over time, is complex due to fluctuating data, inherent uncertainties, and sudden market shifts. Traditional methods often rely on single evaluation metrics, which can sometimes lead to biased results and limit how well a model performs in real-world situations.

A recent research paper, Hierarchical Evaluation Function (HEF): A Multi-Metric Approach for Optimizing Demand Forecasting Models, by Adolfo González and Víctor Parada, delves into this challenge by comparing two specialized evaluation functions: FMAE (Focused Mean Absolute Error) and HEF (Hierarchical Evaluation Function). The study aims to find better ways to optimize demand forecasting models, ensuring they are more accurate and robust.

Understanding the Evaluation Functions

The paper highlights that relying on a single metric like Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE) can be limiting. MAE summarizes the average size of the errors and is less affected by extreme values, while RMSE squares each error before averaging and therefore penalizes large errors much more heavily. Neither, however, provides a complete picture of a model’s performance, especially when comparing across different types of data or business goals.
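To make the difference concrete, here is a small sketch with made-up demand values (they are illustrative and do not come from the paper). Both forecasts have the same MAE, but the one with a single large miss has twice the RMSE.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: the average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: squaring weights large errors more heavily."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Made-up weekly demand for one product (illustrative values only).
actual = [100, 100, 100, 100]
even_errors = [90, 110, 90, 110]    # four errors of 10 units each
one_big_miss = [100, 100, 100, 60]  # a single error of 40 units

print(mae(actual, even_errors), rmse(actual, even_errors))    # 10.0 10.0
print(mae(actual, one_big_miss), rmse(actual, one_big_miss))  # 10.0 20.0
```

A model tuned only on MAE would treat these two forecasts as equally good, which is the blind spot a multi-metric function tries to close.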

To address this, the researchers propose and evaluate two functions:

FMAE (Focused Mean Absolute Error): This function is straightforward, focusing on minimizing the average absolute difference between predicted and actual values. It’s computationally simple and effective when the primary goal is to control the average error.
HEF (Hierarchical Evaluation Function): This is a more sophisticated approach. HEF combines three key metrics: R² (Coefficient of Determination), MAE, and RMSE. R² measures how well the model explains the variability in the data, while MAE and RMSE focus on error magnitude. HEF also includes a system of progressive penalties for large errors or illogical predictions (like negative demand forecasts), making it more robust. It even adapts its tolerance thresholds based on the variability of the data, meaning it’s stricter for stable demand patterns and more lenient for highly volatile ones.
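
The two functions can be sketched roughly as follows. This is not the authors' implementation: the article does not give the exact formulas, so the weights, the base tolerance, the penalty shapes, and the helper names (`fmae_score`, `hef_score`, `adaptive_tolerance`) are all assumptions chosen only to illustrate the ideas described above. Lower scores are better in both cases.

```python
import math
import statistics

def fmae_score(actual, predicted):
    """FMAE-style objective: plain mean absolute error."""
    return statistics.fmean([abs(a - p) for a, p in zip(actual, predicted)])

def r_squared(actual, predicted):
    """Coefficient of determination; 0.0 if the actual series is constant."""
    mean_actual = statistics.fmean(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total if ss_total > 0 else 0.0

def adaptive_tolerance(actual, base_tolerance=0.10):
    """Allowed absolute error; wider when the series is more volatile.
    The 10% base and the (1 + CV) scaling are assumptions."""
    scale = abs(statistics.fmean(actual)) or 1.0
    variation = statistics.pstdev(actual) / scale  # coefficient of variation
    return base_tolerance * (1.0 + variation) * scale

def hef_score(actual, predicted, weights=(0.5, 0.25, 0.25),
              penalty_rate=1.0, negative_penalty=1.0):
    """HEF-style objective (assumed form): weighted R2, MAE, and RMSE terms
    plus penalties for large errors and negative forecasts."""
    w_r2, w_mae, w_rmse = weights
    scale = abs(statistics.fmean(actual)) or 1.0
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    mae = statistics.fmean(errors)
    rmse = math.sqrt(statistics.fmean([e * e for e in errors]))
    base = (w_r2 * (1.0 - r_squared(actual, predicted))
            + w_mae * mae / scale + w_rmse * rmse / scale)
    # Progressive penalty: grows quadratically once an error exceeds tolerance.
    tolerance = adaptive_tolerance(actual)
    excess = statistics.fmean(
        [(max(0.0, e - tolerance) / scale) ** 2 for e in errors])
    # Illogical predictions: share of forecasts below zero.
    negative_share = sum(p < 0 for p in predicted) / len(predicted)
    return base + penalty_rate * excess + negative_penalty * negative_share

# Made-up demand values for illustration.
actual = [20, 22, 19, 21, 20, 23]
good = [21, 21, 20, 20, 21, 22]
big_miss = [21, 21, 20, 20, 21, 5]
negative = [21, 21, 20, 20, 21, -1]
volatile = [5, 40, 10, 35, 2, 33]  # same mean as `actual`, far more variable

print(hef_score(actual, good) < hef_score(actual, big_miss))      # True
print(hef_score(actual, big_miss) < hef_score(actual, negative))  # True
print(adaptive_tolerance(actual) < adaptive_tolerance(volatile))  # True
```

The last comparison shows the adaptive idea in isolation: with the same average demand, the stable series gets a tighter tolerance than the volatile one.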

The Experiment and Key Findings

The researchers conducted extensive experiments using various demand forecasting models, from traditional statistical methods like ARIMA to modern machine learning techniques such as XGBoost and deep neural networks like LSTM. They tested these models across different datasets (Walmart, M3, M4, M5) and with various data splits for training and testing (91:9, 80:20, and 70:30). To optimize the models, they used three different hyperparameter optimizers: Grid Search, Particle Swarm Optimization (PSO), and Optuna (based on Bayesian optimization).
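
The general setup, an optimizer that tunes a model by minimizing an evaluation function on a held-out period, can be sketched as below. This is a toy stand-in, not the paper's pipeline: simple exponential smoothing replaces ARIMA, XGBoost, and LSTM, a plain grid search replaces PSO and Optuna, the series is synthetic, and all function names are hypothetical.

```python
import math

def chronological_split(series, train_percent):
    """Split a time series in order, e.g. 80 -> first 80% train, rest test."""
    cut = len(series) * train_percent // 100
    return series[:cut], series[cut:]

def ses_forecasts(train, test, alpha):
    """One-step-ahead simple exponential smoothing forecasts for `test`."""
    level = train[0]
    for value in train[1:]:
        level = alpha * value + (1 - alpha) * level
    forecasts = []
    for value in test:
        forecasts.append(level)  # forecast is made before `value` is seen
        level = alpha * value + (1 - alpha) * level
    return forecasts

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def grid_search(series, train_percent, objective, alphas):
    """Return (best_alpha, best_score) for the given evaluation function."""
    train, test = chronological_split(series, train_percent)
    scored = [(objective(test, ses_forecasts(train, test, a)), a)
              for a in alphas]
    best_score, best_alpha = min(scored)
    return best_alpha, best_score

# Synthetic demand series with a slow wave and a weekly pattern.
series = [100 + 10 * math.sin(t / 3) + (t % 7) for t in range(100)]
alphas = [0.1, 0.3, 0.5, 0.7, 0.9]

for train_percent in (91, 80, 70):
    best_alpha, best_score = grid_search(series, train_percent, mae, alphas)
    print(train_percent, best_alpha, round(best_score, 3))
```

Swapping `mae` for a different evaluation function changes which hyperparameters win without touching the optimizer, which is the comparison the study runs at scale.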

The results showed a clear and consistent pattern, regardless of the data split or the optimizer used:

HEF’s Strengths: HEF consistently outperformed FMAE in global metrics. These include R² (indicating better explanatory power), Global Relative Accuracy (measuring overall cumulative accuracy), and RMSE and RMSSE (Root Mean Squared Scaled Error), both of which penalize large errors heavily. In other words, models optimized with HEF were better at explaining the overall trends in demand and were more robust against significant forecasting errors.
FMAE’s Strengths: FMAE maintained advantages in local metrics like MAE and MASE (Mean Absolute Scaled Error), which focus on average absolute errors. It also generally resulted in shorter execution times, making it more computationally efficient.
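
For readers unfamiliar with the scaled metrics, the sketch below uses the standard non-seasonal definitions of MASE and RMSSE, in which the test error is divided by the in-sample error of a naive "repeat the last value" forecast. Whether the paper uses exactly these variants is an assumption, and the numbers are made up.

```python
import math

def mase(train, actual, predicted):
    """Mean Absolute Scaled Error, scaled by the naive one-step training error."""
    naive = [abs(train[t] - train[t - 1]) for t in range(1, len(train))]
    scale = sum(naive) / len(naive)
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return (sum(errors) / len(errors)) / scale

def rmsse(train, actual, predicted):
    """Root Mean Squared Scaled Error, the squared-error counterpart of MASE."""
    naive = [(train[t] - train[t - 1]) ** 2 for t in range(1, len(train))]
    scale = sum(naive) / len(naive)
    errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt((sum(errors) / len(errors)) / scale)

# Made-up values: two accurate forecasts and one larger miss.
train = [10, 12, 11, 13, 12]
actual = [13, 12, 14]
predicted = [12, 12, 17]

print(round(mase(train, actual, predicted), 2))   # 0.89
print(round(rmsse(train, actual, predicted), 2))  # 1.15
```

The same forecast beats the naive benchmark on MASE (below 1) yet loses to it on RMSSE (above 1) because of the single miss of 3 units, which shows how the two families of metrics can rank models differently.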

A crucial finding was that the improvements observed with HEF were attributable to the design of the evaluation function itself, not to the specific optimization method employed: the same pattern appeared under Grid Search, PSO, and Optuna. Statistical tests confirmed these differences were highly significant, making chance an unlikely explanation.
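
The article does not name the statistical tests used. As one possible way to check whether a paired difference between two evaluation functions could be due to chance, the sketch below runs a paired sign-flip permutation test. The choice of test, the function name, and the RMSE values are all assumptions for illustration, not results from the paper.

```python
import random
import statistics

def paired_permutation_p_value(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided p-value for the mean paired difference.
    Under the null hypothesis the sign of each difference is arbitrary,
    so signs are flipped at random and the observed mean is compared
    with the resulting distribution."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(statistics.fmean(diffs))
    hits = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)

# Hypothetical RMSE per dataset/model pair under each evaluation function.
rmse_fmae = [12.1, 9.8, 15.2, 11.0, 13.4, 10.5, 14.1, 12.9]
rmse_hef = [11.2, 9.1, 14.0, 10.4, 12.5, 10.1, 13.0, 12.0]

p_value = paired_permutation_p_value(rmse_fmae, rmse_hef)
print(p_value < 0.05)  # True: a difference this consistent is unlikely by chance
```

A small p-value says the gap would rarely arise if the two functions were interchangeable; it does not by itself say how large or how useful the gap is.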

Choosing the Right Tool for the Job

The study concludes that there’s a clear trade-off between the two evaluation functions. HEF is the more robust choice for strategic business planning and long-term forecasting, where understanding the overall explanatory power of the model and minimizing the impact of large, potentially costly errors is paramount. Its ability to adapt to data volatility and penalize undesirable predictions makes it ideal for complex, uncertain environments.

On the other hand, FMAE is more efficient for short-term operational applications or in situations where computational simplicity and strict control over average errors are the main priorities. It’s a practical option for environments with limited resources or where quick, consistent average error reduction is key.

Ultimately, the research provides a flexible framework for optimizing predictive models in dynamic settings, emphasizing that the choice of evaluation function should align directly with the specific objectives and context of the demand forecasting task.
