
Xihe: A New Era for Zero-Shot Time Series Forecasting with Hierarchical Attention

TLDR: Xihe is a new family of time series foundation models from Ant Group, built around a novel Hierarchical Interleaved Block Attention (HIBA) mechanism. HIBA captures multi-scale temporal dependencies by combining local (intra-block) and global (inter-block) attention. Ranging from 9.5M to 1.5B parameters, the Xihe models achieve state-of-the-art zero-shot forecasting performance on the GIFT-Eval benchmark, delivering better accuracy and efficiency than existing approaches, particularly when adapting to diverse time series data without dataset-specific training.

Time series forecasting, the art of predicting future values from historical data, is a cornerstone of decision-making across countless industries. From stock prices and energy consumption to weather patterns and sales figures, its applications are vast. A significant challenge arises, however, when models must perform well on new, unseen datasets without any dataset-specific training, a task known as zero-shot transfer.

Traditional time series foundation models (TSFMs) have often borrowed their architectural designs from successful language models. While these models have shown promise, they frequently struggle to effectively capture the complex, multi-scale temporal dependencies that are inherent to time series data. This limitation becomes particularly evident when attempting zero-shot transfers across datasets with vastly different underlying patterns and sampling rates.

Addressing these critical challenges, researchers from Ant Group have introduced a new family of scalable zero-shot time series learners called Xihe. The core innovation behind Xihe is a novel mechanism named Hierarchical Interleaved Block Attention (HIBA). This architecture is specifically designed to overcome the limitations of previous models by employing a hierarchical approach to attention, allowing it to capture dependencies at multiple scales.

HIBA works by dividing a time series into blocks of varying granularities. It then uses two types of attention: intra-block attention, which focuses on local information exchange within each block, and inter-block attention, which operates across these blocks to capture global temporal patterns and their dynamic evolution. This interleaved and hierarchical design enables Xihe to understand both short-term fluctuations and long-term trends within the data, making it highly adaptable to diverse time series characteristics.
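
To make the interleaving concrete, below is a minimal PyTorch sketch of one such layer. The block size, the mean-pooled block summaries, and the use of standard multi-head attention are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class InterleavedBlockAttention(nn.Module):
    """Sketch of one interleaved layer (assumed design, not Xihe's code):
    local attention inside fixed-size blocks, then global attention
    across mean-pooled block summaries."""

    def __init__(self, d_model=64, n_heads=4, block_size=16):
        super().__init__()
        self.block_size = block_size
        self.local_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        nb = t // self.block_size              # assumes seq_len % block_size == 0

        # Intra-block attention: local information exchange within each block.
        blocks = x.reshape(b * nb, self.block_size, d)
        local, _ = self.local_attn(blocks, blocks, blocks)
        x = local.reshape(b, t, d)

        # Inter-block attention: summarize each block by mean pooling,
        # attend across summaries to capture global temporal patterns,
        # then broadcast the result back to every time step.
        summaries = x.reshape(b, nb, self.block_size, d).mean(dim=2)
        global_out, _ = self.global_attn(summaries, summaries, summaries)
        return x + global_out.repeat_interleave(self.block_size, dim=1)

layer = InterleavedBlockAttention()
y = layer(torch.randn(2, 64, 64))              # (batch=2, seq_len=64, d_model=64)

Stacking several such layers with increasing block sizes (for example 8, 16, 32) would yield the kind of multi-granularity hierarchy described above.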

The Xihe family of models ranges from an ultra-efficient configuration with just 9.5 million parameters (Xihe-tiny) to a high-capacity variant with 1.5 billion parameters (Xihe-max). This scalability ensures that the model can be deployed in various environments, from resource-constrained settings to those requiring maximum predictive power.
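
For a rough sense of how such a family might be specified, here is a hypothetical configuration sketch. Only the model names and total parameter counts come from the article; the width and depth values are placeholders.

# Only the names and parameter counts (9.5M, 1.5B) come from the article;
# the d_model / n_layers splits below are hypothetical placeholders.
XIHE_CONFIGS = {
    "Xihe-tiny": {"total_params": "9.5M", "d_model": 256,  "n_layers": 4},
    "Xihe-max":  {"total_params": "1.5B", "d_model": 2048, "n_layers": 32},
}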

To train these models, the team assembled a massive pre-training corpus of 325 billion time points. The dataset combines publicly available data with synthetically generated time series and, crucially, employs a data-quality-aware mixing strategy: datasets with higher predictability (judged by periodicity, trend strength, and noise level) are sampled more frequently during training, leading to more robust learning.
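
The article does not detail the weighting function, so the following sketch shows one plausible form; the scoring formula and temperature are chosen purely for illustration.

import numpy as np

# Assumed scoring and weighting functions, not the authors' recipe: the
# article only states that more predictable datasets are sampled more often.
def predictability_score(periodicity, trend_strength, noise_level):
    """Combine simple quality signals into a single score in (0, 1]."""
    return (periodicity + trend_strength) / 2 * (1.0 - noise_level)

def mixing_weights(datasets, temperature=0.5):
    """Turn per-dataset scores into sampling probabilities; a lower
    temperature sharpens the preference for high-quality datasets."""
    scores = np.array([predictability_score(**d) for d in datasets])
    logits = np.log(np.clip(scores, 1e-6, None)) / temperature
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

datasets = [
    {"periodicity": 0.9, "trend_strength": 0.7, "noise_level": 0.1},  # clean, seasonal
    {"periodicity": 0.2, "trend_strength": 0.3, "noise_level": 0.6},  # noisy, irregular
]
print(mixing_weights(datasets))  # the first dataset is sampled far more often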

Evaluated on the comprehensive GIFT-Eval benchmark, which includes 23 datasets spanning seven domains and ten sampling frequencies, Xihe has demonstrated exceptional performance. The compact Xihe-tiny model, despite its small size, outperforms many contemporary TSFMs, showcasing remarkable parameter efficiency. More impressively, the largest model, Xihe-max, has set new state-of-the-art records for zero-shot performance, significantly surpassing previous best results. This consistent excellence across the entire parameter spectrum underscores the generalization capabilities and architectural superiority of HIBA.

Ablation studies further confirmed the effectiveness of the HIBA design. Replacing HIBA with standard attention mechanisms or removing its hierarchical structure led to noticeable performance drops, highlighting the importance of its multi-scale modeling capabilities. The use of multiple prediction heads for different forecasting horizons also proved beneficial, encouraging the model to learn complex temporal dependencies that generalize across various forecast lengths.
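
A minimal sketch of the multi-horizon-head idea, assuming a shared backbone representation and one linear head per forecast length; the horizon values shown are placeholders, not the paper's settings.

import torch
import torch.nn as nn

class MultiHorizonHeads(nn.Module):
    """Assumed layout: one linear prediction head per forecast horizon,
    all reading the same backbone representation of the input series."""

    def __init__(self, d_model=64, horizons=(24, 96, 336)):
        super().__init__()
        self.heads = nn.ModuleDict(
            {str(h): nn.Linear(d_model, h) for h in horizons}
        )

    def forward(self, hidden, horizon):        # hidden: (batch, d_model)
        # Training all heads jointly encourages representations that
        # generalize across forecast lengths.
        return self.heads[str(horizon)](hidden)

model = MultiHorizonHeads()
hidden = torch.randn(2, 64)                    # pooled encoder state
print(model(hidden, 96).shape)                 # torch.Size([2, 96])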

In essence, Xihe represents a significant step forward in time series forecasting, offering strong transferability across diverse time series data. Its HIBA structure captures intricate local and global temporal patterns, yielding impressive zero-shot forecasting accuracy and efficiency. For more technical details, you can refer to the original research paper.
