TLDR: A new research paper introduces the concept of “inner-instance distribution shift” in time series data, where statistical properties change within individual data instances. To address this, two novel point-level normalization methods, Learning Distribution (LD) and Learning Conditional Distribution (LCD), are proposed. These methods adapt normalization parameters at each time step, significantly improving the accuracy and stability of various time series forecasting models by effectively mitigating both inner-instance and inter-instance distribution shifts.
Time series forecasting, a crucial task in fields ranging from finance to meteorology, often grapples with a fundamental challenge: non-stationarity. This phenomenon describes how the statistical properties of data change over time, leading to what are known as distribution shifts. These shifts can significantly impair the accuracy and reliability of forecasting models.
Existing normalization techniques mitigate these distribution shifts by separating instance-specific characteristics from global features. A recent research paper, “Inner-Instance Normalization for Time Series Forecasting,” identifies a previously overlooked issue: distribution shifts occurring within individual data instances. This “inner-instance shift” means that statistical properties can vary across time points even within a single sequence. Traditional instance-level normalization methods, which apply one set of statistics to the whole sequence, struggle to address this.
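To make the idea concrete, here is a small synthetic illustration (not taken from the paper). It builds one window with a drifting mean, applies instance-level z-score normalization, and compares the two halves of the normalized window. The window length, trend, and noise level are arbitrary assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# One synthetic instance of length 96: a linear trend plus noise (assumed data).
T = 96
x = np.linspace(0.0, 10.0, T) + rng.normal(0.0, 0.5, size=T)

# Instance-level z-score: a single mean and standard deviation for the whole window.
z = (x - x.mean()) / x.std()

# Taken as a whole, the window is standardized.
whole_mean, whole_std = z.mean(), z.std()

# The two halves, however, still have clearly different means.
first_half_mean = z[: T // 2].mean()
second_half_mean = z[T // 2 :].mean()

print(f"whole window: mean={whole_mean:.3f}, std={whole_std:.3f}")
print(f"first half mean={first_half_mean:.3f}, second half mean={second_half_mean:.3f}")
```

The whole window has zero mean and unit variance after normalization, yet the first half sits well below zero and the second half well above it. That residual within-window drift is the kind of shift the paper calls inner-instance.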
To confront this challenge, authors Zipo Jibao, Yingyi Fu, Xinyang Chen, and Guoting Chen introduce two innovative point-level normalization methods: Learning Distribution (LD) and Learning Conditional Distribution (LCD). These approaches are designed to adapt normalization parameters at a much finer granularity, specifically at each individual time point within a data instance.
Learning Distribution (LD)
The LD method independently learns the internal distributions of the input and target sequences. It first applies standard z-score normalization to each instance, which handles shifts between different data instances. The key innovation is a pair of learnable parameter matrices, A and B, which fit the internal distribution at each time step. This lets LD adjust to the statistical characteristics at every point in time rather than applying one uniform adjustment across the whole sequence. After the backbone forecasting model generates its output, LD denormalizes it by adding back the learned expectation for each time step and restoring the instance-specific statistics that were removed at the start. The process is designed to remove both inter-instance (between different sequences) and inner-instance (within a single sequence) distribution shifts from the model’s input and output.
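The normalize and denormalize steps described above can be sketched roughly as follows. This is a minimal forward-pass sketch, not the authors' implementation: the summary does not give the exact form of A and B, so the code assumes they are per-time-step, per-channel offsets with the same shape as the input window and the forecast horizon. The function names `ld_normalize` and `ld_denormalize` are hypothetical, and the values used for A are random stand-ins for parameters that would be trained jointly with the backbone.

```python
import numpy as np

def ld_normalize(x, A, eps=1e-5):
    """Normalize one input window.

    x: (L, C) input window.
    A: (L, C) assumed learned per-time-step expectation of the z-scored input.
    """
    mu = x.mean(axis=0, keepdims=True)           # instance mean per channel
    sigma = x.std(axis=0, keepdims=True) + eps   # instance std per channel
    z = (x - mu) / sigma                         # removes inter-instance shift
    return z - A, (mu, sigma)                    # removes the per-step expectation

def ld_denormalize(y_hat, B, stats):
    """Undo the normalization on a backbone output.

    y_hat: (H, C) backbone output.
    B: (H, C) assumed learned per-time-step expectation of the z-scored target.
    """
    mu, sigma = stats
    return (y_hat + B) * sigma + mu              # add back per-step, then instance stats

# Round-trip check with an identity "backbone" and B = A (illustrative only).
rng = np.random.default_rng(0)
L, C = 8, 2
x = rng.normal(5.0, 2.0, size=(L, C))
A = rng.normal(0.0, 0.1, size=(L, C))            # stand-in for trained values
z, stats = ld_normalize(x, A)
x_back = ld_denormalize(z, A, stats)
print(np.allclose(x_back, x))
```

In practice A and B would be trainable tensors in a deep learning framework, and the forecast horizon would usually differ from the input length. The round trip here only checks that the two steps are inverses of each other.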
Experimental results demonstrated that LD significantly enhanced the performance of various backbone models, including Informer, N-BEATS, and SCINet, across diverse datasets. It consistently outperformed RevIN, a prominent instance-level normalization model, particularly in datasets exhibiting high non-stationarity, such as the Exchange dataset. This underscores LD’s capability to adapt to the dynamic changes within data instances.
Learning Conditional Distribution (LCD)
The LCD method takes a different approach by learning the conditional distribution of the target sequence given the input sequence. Its core principle involves centering the sequence and then scaling the centered values with distinct coefficients for each time point. This is achieved by predicting a future mean (μy) to center the target and a scaling coefficient matrix (S) to perform the scaling. These predictions are implemented using neural networks, which can be either linear (LCD-linear) or based on attention scores (LCD-as).
By predicting these fine-grained scaling coefficients and means for each time point, LCD effectively mitigates inner-instance shifts within the prediction horizon. This allows the final predictions to conform to different distributions at various time steps, even if the raw outputs from the backbone model initially exhibit a uniform distribution. LCD also addresses inter-space shifts, which occur between the input and target sequences.
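A rough sketch of the LCD-linear variant, under stated assumptions, looks like this. The summary says only that a future mean μy and a scaling matrix S are predicted from the input by a linear network. The weight shapes, the softplus used to keep S positive, and the names `lcd_linear_stats` and `lcd_denormalize` are all illustrative choices, not details from the paper. The weights are random and untrained, and the backbone output is a random stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
L, H, C = 96, 24, 3  # input length, horizon, channels (illustrative sizes)

# Hypothetical, untrained linear heads mapping the input window to per-step statistics.
W_mu = rng.normal(0.0, 0.01, size=(H, L))
W_s = rng.normal(0.0, 0.01, size=(H, L))

def softplus(v):
    # Numerically stable log(1 + exp(v)); keeps the scaling coefficients positive.
    return np.logaddexp(0.0, v)

def lcd_linear_stats(x):
    """x: (L, C) input window -> mu_y: (H, C), S: (H, C)."""
    mu_y = W_mu @ x            # predicted mean for each future time step
    S = softplus(W_s @ x)      # predicted scale for each step (positivity is an assumption)
    return mu_y, S

def lcd_denormalize(backbone_out, mu_y, S):
    """Scale each future step by its own coefficient, then re-center it."""
    return backbone_out * S + mu_y

x = rng.normal(size=(L, C))
mu_y, S = lcd_linear_stats(x)
backbone_out = rng.normal(size=(H, C))  # stand-in for a backbone forecast
y_hat = lcd_denormalize(backbone_out, mu_y, S)
print(y_hat.shape)
```

Because every future step gets its own mean and scale, the final forecast can follow a different distribution at each step even when the backbone output is roughly uniform across the horizon.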
Evaluations of LCD-linear and LCD-as with models such as DLinear, PatchTST, and iTransformer showed consistent performance improvements across all tested datasets. Notably, LCD-linear reduced error by approximately 10% relative to state-of-the-art methods such as SAN, and it is easier to integrate because it trains in a single stage. The point-level approach of LD and LCD also outperformed traditional instance-level and slice-level normalization techniques, supporting a more granular treatment of distribution shifts.
Conclusion
The concept of “inner-instance distribution shift” and the point-level normalization methods LD and LCD move normalization for time series forecasting from the level of the instance to the level of the individual time point. Both are lightweight, plug-and-play frameworks that can be integrated into existing or future forecasting models. By addressing distribution shifts at each time point, they improved prediction accuracy on complex, real-world time series in the reported experiments, supporting more robust and reliable forecasting.


