Spectral Filtering Boosts Transformer Performance in Long Time-Series Forecasting

TLDR: A new approach called “Filter then Attend” significantly improves Transformer-based models for long time-series forecasting. By adding learnable spectral filters at the beginning of these models, researchers achieved 5-10% relative performance improvement, reduced model size, and enabled better capture of high-frequency data patterns, effectively counteracting the low-frequency bias of traditional Transformers.

Long time-series forecasting (LTSF) is a crucial task across various fields, from predicting energy usage and traffic patterns to analyzing financial markets. Deep learning models, particularly those based on the Transformer architecture, have shown great promise in this area. However, these models often face challenges such as a bias towards low-frequency data components and high computational and memory demands.

Researchers Elisha Dayag, Nhat Thanh Van Tran, and Jack Xin from the University of California, Irvine, have introduced a novel approach called “Filter then Attend” to address these limitations. Their paper, “Filter Then Attend: Improving Attention-Based Time Series Forecasting with Spectral Filtering,” demonstrates that integrating learnable frequency filters at the initial stage of Transformer-based models can significantly enhance their performance.

The core idea is simple yet powerful: process the time series data through a learnable filter before it enters the Transformer’s attention mechanism. This pre-filtering step adds only about 1,000 parameters, and the authors report that it yields a 5-10% relative improvement in forecasting performance across multiple Transformer-based models. Moreover, the method allows the models’ embedding dimensions to be reduced, leading to architectures that are both smaller and more effective than their non-filtering counterparts.

How it Works: The FilterFormer Architecture

The proposed models, dubbed FilterFormers (FilterFormer, iFilterFormer, and FilterLeddam, based on PatchTST, iTransformer, and Leddam respectively), begin by converting the input time series into a series of overlapping or non-overlapping patches. Each patch is then embedded into a latent space representation.
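To make the patching step concrete, here is a minimal sketch in plain Python. The function name `make_patches` and the parameters `patch_len` and `stride` are illustrative assumptions, not names taken from the paper, and the embedding step that follows patching is left out.

```python
def make_patches(series, patch_len, stride):
    """Split a 1-D sequence into patches of length patch_len, stepping by stride.

    stride < patch_len gives overlapping patches; stride == patch_len gives
    non-overlapping ones. Trailing values that do not fill a patch are dropped.
    """
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    last_start = len(series) - patch_len
    return [list(series[s:s + patch_len]) for s in range(0, last_start + 1, stride)]


# Hypothetical toy series of 8 time steps.
series = list(range(8))
overlapping = make_patches(series, patch_len=4, stride=2)
non_overlapping = make_patches(series, patch_len=4, stride=4)
```

With a stride of 2, neighbouring patches share half of their values; with a stride of 4, each value appears in exactly one patch.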

The crucial innovation lies in the “Spectral Block.” Here, the embedded signal is transformed to the frequency domain with the Discrete Fourier Transform (DFT) and multiplied pointwise by a learnable frequency filter, which is randomly initialized and trained along with the rest of the model. The filtered spectrum is then converted back to the time domain using the Inverse Discrete Fourier Transform (IDFT). Reweighting frequencies in this way helps the model capture different frequency components and temporal dependencies in the data.
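The filtering operation itself (transform, multiply pointwise, transform back) can be sketched in a few lines. The version below is a simplified illustration rather than the authors' implementation. It works on a single real-valued sequence, uses a naive DFT from the standard library, and fixes the filter weights by hand, whereas in the model they would be learned. In practice an FFT routine such as `numpy.fft.rfft` would replace the naive loops. The names `dft`, `idft`, and `spectral_filter` are assumptions made for this example.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectral_filter(x, weights):
    """Multiply the spectrum of x pointwise by weights, then return to the time domain."""
    if len(weights) != len(x):
        raise ValueError("need one filter weight per frequency bin")
    filtered = [w * s for w, s in zip(weights, dft(x))]
    # The weights used below are symmetric around n/2, so the result is real
    # up to rounding error and the imaginary part can be dropped.
    return [v.real for v in idft(filtered)]


n = 16
# Toy signal: a slow sinusoid (bin 1) plus a faster one (bin 5).
signal = [math.sin(2 * math.pi * t / n) + 0.5 * math.sin(2 * math.pi * 5 * t / n)
          for t in range(n)]

# Hand-set filter that removes bin 5 and its mirror bin n - 5.
weights = [1.0] * n
weights[5] = weights[n - 5] = 0.0
filtered = spectral_filter(signal, weights)
```

Zeroing a pair of bins removes the faster sinusoid and leaves the slow one intact. A learned filter would instead scale every bin by a trained weight.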

Following the spectral block, the filtered signal proceeds to a standard “Attention Block,” which utilizes multi-head attention. This combination allows the model to leverage both the frequency-domain insights from the filter and the powerful sequence modeling capabilities of attention mechanisms.
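For readers who want to see what the attention step computes, the sketch below implements scaled dot-product self-attention on a handful of tokens. It is deliberately reduced: a single head, with no learned query, key, or value projections, whereas the models in the paper use standard multi-head attention. The token values and the name `self_attention` are hypothetical. In a FilterFormer, the inputs would be the embedded patches after they have passed through the spectral filter.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections."""
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in tokens:
        # Similarity of this token to every token, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        attn = softmax(scores)
        # Each output is a weighted average of all tokens.
        outputs.append([sum(w * v[j] for w, v in zip(attn, tokens)) for j in range(d)])
    return outputs


# Hypothetical filtered tokens: three tokens with embedding dimension 2.
filtered_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(filtered_tokens)
```

Because every output is a weighted average of the inputs, attention on its own tends to smooth the sequence, which is one intuition for why a filter placed in front of it can help preserve faster-varying patterns.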

Experimental Validation and Key Findings

The researchers evaluated their FilterFormers on nine diverse datasets: the four ETT (Electricity Transformer Temperature) subsets, Exchange-rate, Weather, Traffic, Electricity, and Solar-Energy. They compared their models against several baselines, including iTransformer, PatchTST, Leddam, FilterNet, and DLinear.

The results consistently showed that FilterFormers outperformed most baseline models. The improvements were particularly noticeable on larger and more complex datasets like Electricity (ECL) and Solar-Energy. An interesting observation was made with the Traffic dataset, where FilterFormer initially performed worse than its base model. This was attributed to a significant discrepancy in Fourier spectra between training and test sets for certain channels. Upon removing an outlier channel, FilterFormer’s performance dramatically improved, becoming competitive with and often surpassing other baselines.
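One way to picture a train/test spectral discrepancy is to compare the normalized amplitude spectra of two equal-length windows from the same channel. The sketch below is a hypothetical diagnostic written for this article, not the procedure the authors used. The function names and the choice of an L1 distance are assumptions.

```python
import cmath
import math


def amplitude_spectrum(x):
    """Amplitudes of DFT bins 0..n//2 for a real sequence, normalized to sum to 1."""
    n = len(x)
    amps = []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        amps.append(abs(coeff))
    total = sum(amps)
    return [a / total for a in amps] if total > 0 else amps


def spectrum_gap(train_window, test_window):
    """L1 distance between normalized spectra: near 0 for the same shape, up to 2."""
    if len(train_window) != len(test_window):
        raise ValueError("windows must have equal length so the bins line up")
    a = amplitude_spectrum(train_window)
    b = amplitude_spectrum(test_window)
    return sum(abs(p - q) for p, q in zip(a, b))


n = 32
# Hypothetical windows: the first test window keeps the training frequency
# (only the scale changes), the second moves to a different frequency.
train = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
test_same = [3.0 * math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
test_shifted = [math.sin(2 * math.pi * 9 * t / n) for t in range(n)]

gap_same = spectrum_gap(train, test_same)
gap_shifted = spectrum_gap(train, test_shifted)
```

A channel whose gap is much larger than the others would be a candidate outlier. A filter fitted to the training spectrum of such a channel would emphasize frequencies that carry little energy at test time.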

A significant advantage highlighted by the study is the efficiency gain. While adding a filter block does introduce some parameters, the overall FilterFormer models can be lighter and faster. This is because the filtering allows for the use of smaller embedding dimensions while still achieving superior or comparable results, effectively making filtering a “free lunch”: a minor addition with substantial potential benefits.

Analysis of the learned frequency filters revealed that FilterFormers and iFilterFormers tend to emphasize middle- and high-frequency components. This is crucial because Transformer models have been shown theoretically to act as low-pass filters, often struggling with high-frequency patterns. The learnable filters effectively counteract this “oversmoothing” effect, enabling the models to better utilize the full spectrum for forecasting.
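A simple way to summarize which frequencies a filter emphasizes is to split its bins into bands and compare the squared magnitude in each. The sketch below assumes the filter weights are ordered from the lowest frequency bin to the highest. The weights shown are made up for illustration and are not values from the paper.

```python
def band_energy_shares(weights, n_bands=3):
    """Share of squared filter magnitude falling in each of n_bands equal-width bands."""
    n = len(weights)
    if n < n_bands:
        raise ValueError("need at least one bin per band")
    energies = [0.0] * n_bands
    for k, w in enumerate(weights):
        band = min(k * n_bands // n, n_bands - 1)
        energies[band] += abs(w) ** 2  # abs() also handles complex weights
    total = sum(energies)
    if total == 0:
        raise ValueError("filter has no energy")
    return [e / total for e in energies]


# Hypothetical filter: small weights at low frequencies, larger ones higher up.
example_weights = [0.1, 0.1, 1.0, 1.0, 2.0, 2.0]
low, mid, high = band_energy_shares(example_weights)
```

For these made-up weights, most of the energy sits in the top band, which is the kind of profile the article describes for the learned filters.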

Conclusion

The research concludes that incorporating a simple, learnable frequency filter is a highly effective strategy for improving Transformer-based models in long time-series forecasting. This approach leads to enhanced forecasting accuracy, more efficient models, and a better ability to capture complex frequency patterns in data. The authors recommend adding a learned filter as a basic configuration for future Transformer-based forecasting models, suggesting further advancements could be made with more sophisticated and interpretable filter designs.

Meera Iyer
