
Xihe: A New Era for Zero-Shot Time Series Forecasting with Hierarchical Attention

TLDR: Xihe is a new family of time series foundation models from Ant Group, built around a novel Hierarchical Interleaved Block Attention (HIBA) mechanism. HIBA captures multi-scale temporal dependencies by combining local (intra-block) and global (inter-block) attention. Ranging from 9.5M to 1.5B parameters, the Xihe models achieve state-of-the-art zero-shot forecasting performance on the GIFT-Eval benchmark, delivering better accuracy and efficiency than existing approaches, particularly when adapting to diverse time series data without dataset-specific training.

Time series forecasting, the art of predicting future values from historical data, is a cornerstone of decision-making across countless industries. From stock prices and energy consumption to weather patterns and sales figures, its applications are vast. A significant challenge arises, however, when models must perform well on new, unseen datasets without any dataset-specific training, a task known as zero-shot transfer.

Traditional time series foundation models (TSFMs) have often borrowed their architectural designs from successful language models. While these models have shown promise, they frequently struggle to effectively capture the complex, multi-scale temporal dependencies that are inherent to time series data. This limitation becomes particularly evident when attempting zero-shot transfers across datasets with vastly different underlying patterns and sampling rates.

Addressing these critical challenges, researchers from Ant Group have introduced a new family of scalable zero-shot time series learners called Xihe. The core innovation behind Xihe is a novel mechanism named Hierarchical Interleaved Block Attention (HIBA). This architecture is specifically designed to overcome the limitations of previous models by employing a hierarchical approach to attention, allowing it to capture dependencies at multiple scales.

HIBA works by dividing a time series into blocks of varying granularities. It then uses two types of attention: intra-block attention, which focuses on local information exchange within each block, and inter-block attention, which operates across these blocks to capture global temporal patterns and their dynamic evolution. This interleaved and hierarchical design enables Xihe to understand both short-term fluctuations and long-term trends within the data, making it highly adaptable to diverse time series characteristics.
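
To make the interleaving concrete, below is a minimal PyTorch sketch of one such layer. The block size, the mean-pooled block summaries, and the use of standard multi-head attention are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class InterleavedBlockAttention(nn.Module):
    """Sketch of one interleaved layer (assumed design, not Xihe's code):
    local attention inside fixed-size blocks, then global attention
    across mean-pooled block summaries."""

    def __init__(self, d_model=64, n_heads=4, block_size=16):
        super().__init__()
        self.block_size = block_size
        self.local_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        nb = t // self.block_size              # assumes seq_len % block_size == 0

        # Intra-block attention: local information exchange within each block.
        blocks = x.reshape(b * nb, self.block_size, d)
        local, _ = self.local_attn(blocks, blocks, blocks)
        x = local.reshape(b, t, d)

        # Inter-block attention: summarize each block by mean pooling,
        # attend across summaries to capture global temporal patterns,
        # then broadcast the result back to every time step.
        summaries = x.reshape(b, nb, self.block_size, d).mean(dim=2)
        global_out, _ = self.global_attn(summaries, summaries, summaries)
        return x + global_out.repeat_interleave(self.block_size, dim=1)

layer = InterleavedBlockAttention()
y = layer(torch.randn(2, 64, 64))              # (batch=2, seq_len=64, d_model=64)

Stacking several such layers with increasing block sizes (for example 8, 16, 32) would yield the kind of multi-granularity hierarchy described above.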

The Xihe family of models ranges from an ultra-efficient configuration with just 9.5 million parameters (Xihe-tiny) to a high-capacity variant with 1.5 billion parameters (Xihe-max). This scalability ensures that the model can be deployed in various environments, from resource-constrained settings to those requiring maximum predictive power.
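
For a rough sense of how such a family might be specified, here is a hypothetical configuration sketch. Only the model names and total parameter counts come from the article; the width and depth values are placeholders.

# Only the names and parameter counts (9.5M, 1.5B) come from the article;
# the d_model / n_layers splits below are hypothetical placeholders.
XIHE_CONFIGS = {
    "Xihe-tiny": {"total_params": "9.5M", "d_model": 256,  "n_layers": 4},
    "Xihe-max":  {"total_params": "1.5B", "d_model": 2048, "n_layers": 32},
}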

To train these models, the team assembled a massive pre-training corpus of 325 billion time points. The dataset combines publicly available data with synthetically generated time series and, crucially, employs a data-quality-aware mixing strategy: datasets with higher predictability (judged by periodicity, trend strength, and noise level) are sampled more frequently during training, leading to more robust learning.
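
The article does not detail the weighting function, so the following sketch shows one plausible form; the scoring formula and temperature are chosen purely for illustration.

import numpy as np

# Assumed scoring and weighting functions, not the authors' recipe: the
# article only states that more predictable datasets are sampled more often.
def predictability_score(periodicity, trend_strength, noise_level):
    """Combine simple quality signals into a single score in (0, 1]."""
    return (periodicity + trend_strength) / 2 * (1.0 - noise_level)

def mixing_weights(datasets, temperature=0.5):
    """Turn per-dataset scores into sampling probabilities; a lower
    temperature sharpens the preference for high-quality datasets."""
    scores = np.array([predictability_score(**d) for d in datasets])
    logits = np.log(np.clip(scores, 1e-6, None)) / temperature
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

datasets = [
    {"periodicity": 0.9, "trend_strength": 0.7, "noise_level": 0.1},  # clean, seasonal
    {"periodicity": 0.2, "trend_strength": 0.3, "noise_level": 0.6},  # noisy, irregular
]
print(mixing_weights(datasets))  # the first dataset is sampled far more often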

Evaluated on the comprehensive GIFT-Eval benchmark, which includes 23 datasets spanning seven domains and ten sampling frequencies, Xihe has demonstrated exceptional performance. The compact Xihe-tiny model, despite its small size, outperforms many contemporary TSFMs, showcasing remarkable parameter efficiency. More impressively, the largest model, Xihe-max, has set new state-of-the-art records for zero-shot performance, significantly surpassing previous best results. This consistent excellence across the entire parameter spectrum underscores the generalization capabilities and architectural superiority of HIBA.

Ablation studies further confirmed the effectiveness of the HIBA design. Replacing HIBA with standard attention mechanisms or removing its hierarchical structure led to noticeable performance drops, highlighting the importance of its multi-scale modeling capabilities. The use of multiple prediction heads for different forecasting horizons also proved beneficial, encouraging the model to learn complex temporal dependencies that generalize across various forecast lengths.
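
A minimal sketch of the multi-horizon-head idea, assuming a shared backbone representation and one linear head per forecast length; the horizon values shown are placeholders, not the paper's settings.

import torch
import torch.nn as nn

class MultiHorizonHeads(nn.Module):
    """Assumed layout: one linear prediction head per forecast horizon,
    all reading the same backbone representation of the input series."""

    def __init__(self, d_model=64, horizons=(24, 96, 336)):
        super().__init__()
        self.heads = nn.ModuleDict(
            {str(h): nn.Linear(d_model, h) for h in horizons}
        )

    def forward(self, hidden, horizon):        # hidden: (batch, d_model)
        # Training all heads jointly encourages representations that
        # generalize across forecast lengths.
        return self.heads[str(horizon)](hidden)

model = MultiHorizonHeads()
hidden = torch.randn(2, 64)                    # pooled encoder state
print(model(hidden, 96).shape)                 # torch.Size([2, 96])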

In essence, Xihe represents a significant step forward in time series forecasting, offering strong transferability across diverse time series data. Its HIBA structure captures intricate local and global temporal patterns, yielding impressive zero-shot forecasting accuracy and efficiency. For more technical details, you can refer to the original research paper.
