TLDR: PENGUIN is a novel Transformer-based model that significantly improves long-term time series forecasting by explicitly modeling periodic patterns and incorporating a periodic-nested group attention mechanism. It outperforms existing MLP-based and Transformer-based models across diverse benchmarks, demonstrating enhanced accuracy and computational efficiency, even with missing or incorrect periodic information.
Long-term time series forecasting (LTSF) is a critical task with widespread applications across various fields, including finance, traffic management, and healthcare. Accurate predictions of future values are essential for informed decision-making. While Transformer-based models have achieved remarkable success in many sequence-based tasks, their effectiveness in LTSF has been a subject of ongoing debate, with some simpler linear models even outperforming them.
Introducing PENGUIN: A Novel Approach to Time Series Forecasting
A new research paper, titled PENGUIN: Enhancing Transformer with Periodic-Nested Group Attention for Long-term Time Series Forecasting, revisits the core of Transformer models – the self-attention mechanism – and proposes a simple yet highly effective enhancement. Developed by Tian Sun, Yuqi Chen, and Weiwei Sun, PENGUIN (Periodic-Nested Group Attention) highlights the importance of explicitly modeling periodic patterns and incorporating a relative attention bias for more effective time series modeling.
The key innovation in PENGUIN lies in its ability to directly capture periodic structures. Time series data often exhibit recurring patterns, such as daily or weekly cycles, which traditional attention mechanisms struggle to capture over long periods. PENGUIN addresses this by introducing a periodic-nested relative attention bias. Furthermore, to handle multiple coexisting periodicities, the model employs a grouped attention mechanism. Each group is specifically designed to target a particular periodicity, utilizing a multi-query attention mechanism for improved efficiency.
How PENGUIN Works
PENGUIN’s architecture begins by transforming time series data into channel-independent ‘patch’ representations, which helps in capturing both local and long-term information. It also applies Reversible Instance Normalization (RevIN) to handle shifts in data distribution, which makes the model more robust. The core of PENGUIN is the attention mechanism within the Transformer encoder, which uses two types of attention biases:
- Non-Periodic Bias: For cases where periodic information is absent or less prominent, PENGUIN can still define an effective linear bias based on the relative positional distance between data points. This helps the model focus on local context while retaining access to long-range information.
- Periodic Bias: By adopting a periodic-nested attention bias, PENGUIN captures the cyclical attributes of time series data. The model can leverage known periodic information (e.g., daily, weekly cycles) and adjust its attention based on these cycles, even after the data has been transformed into patches.
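To make the two biases concrete, here is a minimal sketch in plain Python. It is not the paper's exact formulation: the linear bias follows the familiar ALiBi-style penalty on relative distance, and the periodic variant is one plausible form that penalizes the cyclic distance within a period. The function names, the `slope` parameter, and the patch `stride` are assumptions introduced for illustration only.

```python
def linear_bias(n, slope):
    """ALiBi-style bias (assumed form): penalty grows linearly with |i - j|."""
    return [[-slope * abs(i - j) for j in range(n)] for i in range(n)]


def periodic_bias(n, period, slope):
    """Illustrative periodic bias: penalty depends on the cyclic distance
    between positions i and j within one period, so positions a whole
    number of periods apart receive no penalty."""
    bias = []
    for i in range(n):
        row = []
        for j in range(n):
            d = abs(i - j) % period
            row.append(-slope * min(d, period - d))
        bias.append(row)
    return bias


def period_in_patches(period_steps, stride):
    """Convert a period given in raw time steps into patch units
    (hypothetical helper; assumes patches advance by `stride` steps)."""
    return max(1, period_steps // stride)


# Hourly data with a daily cycle (24 steps), patches advancing by 8 steps.
p = period_in_patches(24, 8)      # 3 patches per day
lin = linear_bias(7, 0.5)         # 7 patches, non-periodic bias
per = periodic_bias(7, p, 0.5)    # 7 patches, periodic bias
```

In this sketch, `lin[0][3]` is -1.5 because the patches are three positions apart, while `per[0][3]` is zero because they sit exactly one period apart. Adding such a matrix to the attention scores before the softmax lets a head favor patches at the same phase of the cycle.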
To further boost efficiency, PENGUIN replaces standard multi-head attention with Grouped Query Attention (GQA), allowing keys and values to be shared across queries within an attention group. This design makes the model computationally lighter without sacrificing performance.
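The sharing pattern can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, learned projections are omitted, and the optional per-group `biases` argument reflects the article's description that each group targets one periodicity.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, k, v, bias=None):
    """Scaled dot-product attention for one head; q, k, v are lists of vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        if bias is not None:
            # Additive attention bias, applied before the softmax.
            scores = [s + bias[i][j] for j, s in enumerate(scores)]
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(len(v)))
                    for c in range(len(v[0]))])
    return out


def grouped_query_attention(q_heads, k_groups, v_groups, biases=None):
    """q_heads holds one query matrix per head; k_groups and v_groups hold
    one matrix per group. Every head in a group reuses that group's keys,
    values, and (optionally) bias."""
    heads_per_group = len(q_heads) // len(k_groups)
    outputs = []
    for h, q in enumerate(q_heads):
        g = h // heads_per_group
        bias = biases[g] if biases is not None else None
        outputs.append(attend(q, k_groups[g], v_groups[g], bias))
    return outputs


# Four query heads share two key/value groups over three tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
q_heads = [tokens, tokens, tokens, tokens]
k_groups = [tokens, tokens]
v_groups = [[[1.0, 2.0]] * 3, [[3.0, 4.0]] * 3]
out = grouped_query_attention(q_heads, k_groups, v_groups)
```

Heads 0 and 1 read from the first key/value group and heads 2 and 3 from the second, so only two sets of keys and values are needed instead of four. That reduction is where the savings in parameters and computation come from.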
Impressive Performance and Robustness
Extensive experiments across nine diverse benchmark datasets demonstrate that PENGUIN consistently outperforms both MLP-based and other Transformer-based models. It achieves significant overall improvements, surpassing state-of-the-art MLP models like CycleNet and leading Transformer models like CATS in terms of Mean Squared Error (MSE).
PENGUIN also shows superior performance compared to existing decomposition approaches such as Autoformer and FEDformer, underscoring the benefit of explicitly modeling periodic information. Its robustness was tested by varying input lengths and by intentionally introducing missing or incorrect periodic information. Even in challenging scenarios, PENGUIN maintained strong performance, highlighting its ability to capture temporal dependencies effectively.
The research also indicates PENGUIN’s extendability, showing significant improvements when integrated into both decoder-only and encoder-decoder Transformer architectures. Furthermore, PENGUIN is highly efficient, requiring fewer parameters and Multiply-Accumulate Operations (MACs) compared to other leading models, making it a computationally attractive solution for LTSF.
In conclusion, PENGUIN represents a significant step forward in long-term time series forecasting. By intelligently combining a periodic linear bias with a grouped query attention structure, it enables Transformer models to capture diverse periodic patterns while maintaining temporal causality, setting a new benchmark for accuracy and efficiency in the field.


