TLDR: This paper introduces a unified framework for interpreting time series forecasts using LIME and SHAP, two model-agnostic explanation techniques. It converts univariate time series into a leakage-free supervised learning problem and applies these methods to an ARIMA model and a gradient-boosted tree, using the Air Passengers dataset as a case study. The research demonstrates that the twelve-month lag and seasonal encodings are the primary drivers of forecast variance, providing a robust methodology for achieving both accuracy and interpretability in time series forecasting.
Time-series forecasting is a crucial tool across many industries, from predicting airline passenger demand to managing energy consumption and monitoring public health. These forecasts help businesses make informed decisions, but there’s often a trade-off: highly accurate models can be complex and difficult to understand, while simpler, more transparent models might not be as precise.
A recent research paper, “Interpreting Time Series Forecasts with LIME and SHAP: A Case Study on the Air Passengers Dataset”, addresses this challenge by proposing a unified framework to interpret time-series forecasts. The paper focuses on two popular model-agnostic explanation techniques: Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP). These methods help shed light on why a model makes a particular prediction, even if the model itself is a complex ‘black box’.
Bridging the Gap: Accuracy and Interpretability
The core problem in time-series forecasting is that while advanced machine learning models like gradient-boosted decision trees can capture intricate patterns, their inner workings are often opaque. Domain experts, however, need to understand the reasoning behind predictions to trust them, troubleshoot issues, and make critical adjustments. Time-series data also presents unique interpretability challenges because it is sequential: features must be built carefully to avoid ‘data leakage’, where information from the future inadvertently ends up in the inputs used to predict an earlier point.
The researchers tackled this by transforming a single time series into a supervised learning problem. This involves creating features from past observations, such as lagged values (e.g., passenger counts from the previous month or from the same month a year earlier), rolling statistics (like a 12-month rolling mean), and seasonal encodings (using sine and cosine transforms of the month). Because each feature is built only from observations that precede the target month, or from the calendar itself, a prediction for any given time point relies solely on information available at that point.
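The feature construction described above can be sketched in a few lines of pandas. This is a minimal illustration, not the paper's code: the function name `make_supervised`, the column names, and the synthetic series are all assumptions made for the example, and the 80/20 chronological split is just one reasonable choice.

```python
import numpy as np
import pandas as pd

def make_supervised(y: pd.Series) -> pd.DataFrame:
    """Turn a monthly series into a table of past-only features plus the target."""
    month = y.index.month.to_numpy()
    frame = pd.DataFrame(
        {
            "y": y,                      # target for month t
            "lag_1": y.shift(1),         # value at t-1
            "lag_12": y.shift(12),       # value at t-12 (same month, previous year)
            # shift(1) before rolling so the window ends at t-1 and excludes y[t]
            "roll_mean_12": y.shift(1).rolling(12).mean(),
            "month_sin": np.sin(2 * np.pi * month / 12),
            "month_cos": np.cos(2 * np.pi * month / 12),
        },
        index=y.index,
    )
    return frame.dropna()                # drop warm-up rows with incomplete history

# Synthetic stand-in for a monthly series (NOT the Air Passengers data).
idx = pd.date_range("1949-01-01", periods=48, freq="MS")
t = np.arange(48)
y = pd.Series(100 + 2 * t + 20 * np.sin(2 * np.pi * t / 12), index=idx)

data = make_supervised(y)

# Chronological split: every training row precedes every test row.
split = int(len(data) * 0.8)
train, test = data.iloc[:split], data.iloc[split:]
```

The detail that matters most is the `shift(1)` before `rolling(12)`: without it, the rolling mean for month t would include the very value the model is asked to predict.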
The Air Passengers Case Study
The study used the well-known Air Passengers dataset, which records monthly international airline passenger totals from 1949 to 1960. This dataset is ideal because it exhibits clear trends and strong yearly seasonality. The paper compared two forecasting models: a traditional statistical model called Seasonal ARIMA (SARIMA) and a machine learning model, XGBoost, which is a type of gradient-boosted tree.
How LIME and SHAP Provide Insights
LIME provides ‘local’ explanations, meaning it explains a single prediction by creating a simpler, interpretable model around that specific data point. Imagine trying to understand why a forecast for July 1959 was made; LIME would highlight which features were most influential for that particular month’s prediction.
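To make the idea of a local surrogate concrete, here is a simplified LIME-style sketch written with NumPy only. It is not the `lime` library and not the paper's implementation: the function `lime_style_weights`, the kernel width, the perturbation scales, and the linear `toy_model` standing in for a trained forecaster are all assumptions chosen for illustration.

```python
import numpy as np

def lime_style_weights(predict, x, scale, n_samples=1000, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate around the instance x."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_samples, x.size))        # perturbations in units of `scale`
    preds = predict(x + z * scale)                  # query the black box near x
    dist = np.linalg.norm(z, axis=1) / np.sqrt(x.size)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)  # nearer samples count more
    design = np.column_stack([np.ones(n_samples), z])   # intercept + perturbations
    sw = np.sqrt(weights)
    # Weighted least squares via row scaling by sqrt(weight).
    coef, *_ = np.linalg.lstsq(design * sw[:, None], preds * sw, rcond=None)
    return coef[1:]  # change in prediction per one `scale` step of each feature

def toy_model(X):
    """Hypothetical stand-in for a trained forecaster (linear, for easy checking)."""
    return 0.2 * X[:, 0] + 0.7 * X[:, 1]

feature_names = ["lag_1", "lag_12"]
x = np.array([410.0, 400.0])        # the single instance being explained
scale = np.array([10.0, 20.0])      # assumed typical spread of each feature

local_weights = lime_style_weights(toy_model, x, scale)
for name, w in zip(feature_names, local_weights):
    print(f"{name}: {w:+.2f}")
```

Because the toy model is linear, the surrogate recovers its slopes exactly (scaled by each feature's spread). With a real nonlinear forecaster, the weights would instead describe the model's behavior only in the neighborhood of the chosen month.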
SHAP, by contrast, is grounded in cooperative game theory: it assigns each feature a Shapley value that measures its contribution to an individual prediction relative to a baseline. Averaging the magnitude of these values across many instances yields a ‘global’ view of which features matter most. The paper used permutation SHAP to estimate the values, which approximates them by averaging each feature’s marginal contribution over sampled feature orderings.
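The game-theoretic idea can be illustrated with a small exact computation. The sketch below enumerates every feature ordering, which is only feasible for a handful of features; permutation SHAP, as used in the paper, samples orderings rather than enumerating them. The function `shapley_values`, the single reference `baseline`, and the `toy_model` are assumptions for the example and do not come from the paper or from the `shap` library.

```python
from itertools import permutations
import numpy as np

def shapley_values(predict, x, baseline):
    """Exact Shapley values for x against one baseline, by enumerating orderings."""
    n = x.size
    phi = np.zeros(n)
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = baseline.copy()                 # start with every feature "absent"
        prev = predict(current[None, :])[0]
        for j in order:
            current[j] = x[j]                     # switch feature j to its real value
            new = predict(current[None, :])[0]
            phi[j] += new - prev                  # marginal contribution in this ordering
            prev = new
    return phi / len(orderings)

def toy_model(X):
    """Hypothetical forecaster with an interaction between lag_12 and month_sin."""
    return 0.2 * X[:, 0] + 0.7 * X[:, 1] + 0.1 * X[:, 1] * X[:, 2]

feature_names = ["lag_1", "lag_12", "month_sin"]
x = np.array([410.0, 400.0, 0.5])          # instance to explain
baseline = np.array([300.0, 280.0, 0.0])   # assumed reference point

phi = shapley_values(toy_model, x, baseline)
gap = toy_model(x[None, :])[0] - toy_model(baseline[None, :])[0]
for name, value in zip(feature_names, phi):
    print(f"{name}: {value:+.2f}")
print(f"sum of attributions: {phi.sum():.2f}, prediction gap: {gap:.2f}")
```

The attributions always sum to the gap between the prediction for `x` and the prediction for the baseline, which is the additivity property that gives SHAP its name. Taking the mean absolute value of such attributions over many rows is one common way to turn them into a global feature ranking.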
Key Findings: What Drives Forecasts?
The analysis revealed that the twelve-month lag (the passenger count from the same month one year earlier) was by far the dominant factor in explaining forecast variance, which is consistent with the strong yearly seasonality in airline passenger traffic. Other important factors included the one-month lag (short-term persistence) and the seasonal encodings. Rolling statistics contributed only modestly to the predictions.
In terms of accuracy, the XGBoost model performed slightly better than the SARIMA baseline, though the difference was not statistically significant. The takeaway is that a machine learning model can match a classical baseline, perhaps with a slight edge, while LIME and SHAP make its reasoning inspectable, so interpretability does not have to come at the cost of accuracy.
Implications for Practitioners
This research offers a robust methodology for applying LIME and SHAP to time series data, ensuring that the explanations respect the temporal order of observations. It provides practical guidelines for practitioners looking to understand and trust their time-series forecasting models. By understanding which features drive predictions, businesses can gain deeper insights into the underlying dynamics of their data, leading to better decision-making and more reliable forecasts.


