Spectral Filtering Boosts Transformer Performance in Long Time-Series Forecasting

TLDR: A new approach called “Filter then Attend” significantly improves Transformer-based models for long time-series forecasting. By adding learnable spectral filters at the beginning of these models, researchers achieved 5-10% relative performance improvement, reduced model size, and enabled better capture of high-frequency data patterns, effectively counteracting the low-frequency bias of traditional Transformers.

Long time-series forecasting (LTSF) is a crucial task across various fields, from predicting energy usage and traffic patterns to analyzing financial markets. Deep learning models, particularly those based on the Transformer architecture, have shown great promise in this area. However, these models often face challenges such as a bias towards low-frequency data components and high computational and memory demands.

Researchers Elisha Dayag, Nhat Thanh Van Tran, and Jack Xin from the University of California, Irvine, have introduced a novel approach called “Filter then Attend” to address these limitations. Their paper, “Filter then Attend: Improving Attention-Based Time Series Forecasting with Spectral Filtering,” demonstrates that integrating learnable frequency filters at the initial stage of Transformer-based models can significantly enhance their performance.

The core idea is simple yet powerful: process the time series data through a learnable filter before it enters the Transformer’s attention mechanism. This pre-filtering step, which adds only about 1000 additional parameters, has been shown to yield a 5-10% relative improvement in forecasting performance across multiple Transformer-based models. Moreover, this method allows for a reduction in the models’ embedding dimensions, leading to architectures that are both smaller and more effective than their non-filtering counterparts.

How it Works: The FilterFormer Architecture

The proposed models, collectively dubbed FilterFormers (FilterFormer, iFilterFormer, and FilterLeddam, based on PatchTST, iTransformer, and Leddam respectively), begin by converting the input time series into a series of overlapping or non-overlapping patches. Each patch is then embedded into a latent space representation.
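The patching step can be illustrated with a minimal sketch. The function name `make_patches`, its parameters, and the choice to drop trailing values that do not fill a patch are assumptions made for this example, not details taken from the paper.

```python
def make_patches(series, patch_len, stride):
    """Split a 1-D sequence into patches of length patch_len.

    stride < patch_len gives overlapping patches; stride == patch_len gives
    non-overlapping ones. Trailing values that do not fill a whole patch are
    dropped (a simplification chosen for this sketch).
    """
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    last_start = len(series) - patch_len
    return [series[i:i + patch_len] for i in range(0, last_start + 1, stride)]


series = list(range(10))
overlapping = make_patches(series, patch_len=4, stride=2)
non_overlapping = make_patches(series, patch_len=4, stride=4)
```

With a stride of 2, neighboring patches share two values; with a stride of 4 they share none. In a full model, each patch would then be mapped to an embedding vector by a learned layer, which is omitted here.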

The crucial innovation lies in the “Spectral Block.” Here, the embedded signal is first transformed to the frequency domain with the Discrete Fourier Transform (DFT) and then passed through a learnable frequency filter. This filter, randomly initialized and updated during training, modifies the spectral content of the signal by pointwise multiplication in the frequency domain. After filtering, the signal is converted back to the time domain using the Inverse Discrete Fourier Transform (IDFT). This process helps the model better capture different frequency components and temporal dependencies in the data.
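The filtering idea described above (transform, multiply pointwise, transform back) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it uses a naive O(n²) DFT from the standard library where a real model would use an FFT routine, it works on a single 1-D signal rather than a batch of embeddings, and the names `dft`, `idft`, `spectral_filter`, and `random_weights` are hypothetical. The training loop that would update the weights is not shown.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse DFT matching dft() above."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectral_filter(x, weights):
    """Filter x by pointwise multiplication in the frequency domain."""
    spectrum = dft(x)
    filtered = [w * c for w, c in zip(weights, spectrum)]
    # For a real input and symmetric weights (weights[k] == weights[n - k]),
    # the imaginary part is zero up to rounding, so only the real part is kept.
    return [v.real for v in idft(filtered)]


n = 8
# A slow component (1 cycle per window) plus a fast one (3 cycles per window).
signal = [math.cos(2 * math.pi * t / n) + 0.5 * math.cos(2 * math.pi * 3 * t / n)
          for t in range(n)]

# Hand-set weights that keep only the fast component (bins 3 and n - 3).
high_only = [1.0 if k in (3, n - 3) else 0.0 for k in range(n)]
filtered = spectral_filter(signal, high_only)

# Random symmetric initialization, as a stand-in for weights before training.
random.seed(0)
half = [random.gauss(0.0, 1.0) for _ in range(n // 2 + 1)]
random_weights = [half[min(k, n - k)] for k in range(n)]
```

Here `filtered` recovers the fast component alone, which shows how one weight per frequency bin is enough to emphasize or suppress parts of the spectrum. A learned filter would arrive at its weights through gradient descent rather than by hand.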

Following the spectral block, the filtered signal proceeds to a standard “Attention Block,” which utilizes multi-head attention. This combination allows the model to leverage both the frequency-domain insights from the filter and the powerful sequence modeling capabilities of attention mechanisms.
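For readers unfamiliar with the attention step, the following is a minimal sketch of single-head scaled dot-product self-attention. It is deliberately simplified: the learned query, key, and value projections and the multiple heads of a real attention block are omitted, and the function names are chosen for this example only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections.

    tokens: list of equal-length vectors (for example, filtered patch embeddings).
    Returns the attended tokens and the attention weight matrix.
    """
    d = len(tokens[0])
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in tokens]
              for qi in tokens]
    weights = [softmax(row) for row in scores]
    output = [[sum(w * tok[c] for w, tok in zip(row, tokens)) for c in range(d)]
              for row in weights]
    return output, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = self_attention(tokens)
```

Each output token is a weighted average of all input tokens, with weights that are non-negative and sum to one per row.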

Experimental Validation and Key Findings

The researchers evaluated their FilterFormers on nine diverse datasets, including ETT (Electricity Transformer Temperature), Exchange-rate, Weather, Traffic, Electricity (ECL), and Solar-Energy. They compared their models against several baselines, including iTransformer, PatchTST, Leddam, FilterNet, and DLinear.

The results consistently showed that FilterFormers outperformed most baseline models. The improvements were particularly noticeable on larger and more complex datasets like ECL and Solar-Energy. An interesting observation was made with the Traffic dataset, where FilterFormer initially performed worse than its base model. This was attributed to a significant discrepancy in Fourier spectra between training and test sets for certain channels. Upon removing an outlier channel, FilterFormer’s performance dramatically improved, becoming competitive with and often surpassing other baselines.
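One way such a train/test mismatch could be checked is to compare normalized amplitude spectra channel by channel. The sketch below is an assumption about how this might be done, not the procedure used in the paper: the L1 distance, the function names, and the two synthetic channels are all hypothetical.

```python
import cmath
import math


def amplitude_spectrum(x):
    """Normalized DFT amplitudes for the non-negative frequencies (naive O(n^2))."""
    n = len(x)
    amps = [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]
    total = sum(amps) or 1.0  # avoid division by zero for an all-zero window
    return [a / total for a in amps]


def spectral_gap(train, test):
    """L1 distance between the normalized amplitude spectra of two windows."""
    if len(train) != len(test):
        raise ValueError("windows must have the same length")
    a, b = amplitude_spectrum(train), amplitude_spectrum(test)
    return sum(abs(p - q) for p, q in zip(a, b))


n = 32
# Synthetic channels: "stable" keeps its frequency (only the phase changes),
# while "shifted" moves from 2 to 10 cycles per window between train and test.
channels = {
    "stable": ([math.sin(2 * math.pi * 2 * t / n) for t in range(n)],
               [math.sin(2 * math.pi * 2 * t / n + 1.0) for t in range(n)]),
    "shifted": ([math.sin(2 * math.pi * 2 * t / n) for t in range(n)],
                [math.sin(2 * math.pi * 10 * t / n) for t in range(n)]),
}
gaps = {name: spectral_gap(train, test) for name, (train, test) in channels.items()}
outlier = max(gaps, key=gaps.get)
```

The gap ranges from 0 (identical spectra) to 2 (no overlap), so a channel whose gap is far above the rest is a candidate outlier. A frequency filter fitted to such a channel's training spectrum would emphasize bands that carry little energy at test time.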

A significant advantage highlighted by the study is the efficiency gain. While adding a filter block does introduce some parameters, the overall FilterFormer models can be lighter and faster. This is because the filtering allows for the use of smaller embedding dimensions while still achieving superior or comparable results, effectively making filtering a “free lunch” – a minor addition with substantial potential benefits.

Analysis of the learned frequency filters revealed that FilterFormers and iFilterFormers tend to bias towards middle and high-frequency components. This is crucial because Transformer models are theoretically known to act as low-pass filters, often struggling with high-frequency patterns. The learnable filters effectively counteract this “oversmoothing” effect, enabling the models to better utilize the full spectrum for forecasting.
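The low-pass intuition can be seen in a toy example. Attention weights are non-negative and sum to one per row, so each output is an average of inputs. The sketch below replaces learned attention with a fixed average over a point and its two circular neighbors, which is an assumption made for illustration and not the theoretical result itself.

```python
def average_neighbors(x):
    """Average each point with its two circular neighbors (weights of 1/3 each)."""
    n = len(x)
    return [(x[(t - 1) % n] + x[t] + x[(t + 1) % n]) / 3.0 for t in range(n)]


def peak(x):
    """Largest absolute value in the sequence."""
    return max(abs(v) for v in x)


constant = [1.0] * 8                          # lowest frequency: no variation
alternating = [(-1.0) ** t for t in range(8)]  # highest frequency: sign flips each step

low_kept = peak(average_neighbors(constant))      # amplitude after averaging
high_kept = peak(average_neighbors(alternating))  # amplitude after averaging
```

The constant signal passes through unchanged, while the alternating signal shrinks to one third of its amplitude, and each further averaging pass shrinks it again. A filter that boosts middle and high frequencies before attention gives those components more headroom against this kind of smoothing.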

Conclusion

The research concludes that incorporating a simple, learnable frequency filter is a highly effective strategy for improving Transformer-based models in long time-series forecasting. This approach leads to enhanced forecasting accuracy, more efficient models, and a better ability to capture complex frequency patterns in data. The authors recommend adding a learned filter as a basic configuration for future Transformer-based forecasting models, suggesting further advancements could be made with more sophisticated and interpretable filter designs.
