TLDR: HTMformer is a new Transformer-based model for time series forecasting built around Hybrid Temporal and Multivariate Embeddings (HTME). Unlike previous models that overemphasize temporal dependencies, HTMformer extracts both temporal and multivariate features to build richer data representations. This approach improves forecasting accuracy on a range of real-world datasets, for both long-term and short-term horizons, while cutting training time, GPU memory usage, and parameter count.
Time series forecasting, a critical task in fields ranging from finance to climate science, has seen significant advances with the rise of Transformer-based models. Yet existing models often focus too heavily on temporal dependencies, incurring higher computational costs without proportional gains in accuracy. A new research paper introduces HTMformer, an approach designed to overcome these limitations by rethinking how data is represented before it even reaches the Transformer.
The core innovation behind HTMformer is the Hybrid Temporal and Multivariate Embeddings (HTME) extractor. This component is engineered to capture a richer, more meaningful representation of time series data. Instead of just looking at how data changes over time, HTME also extracts multivariate features, which are crucial for understanding how different variables in a time series influence each other. This dual approach ensures that the embedding layer, which is vital for a Transformer’s performance, provides a comprehensive view of the data.
HTMformer itself is a lightweight forecasting model that integrates the HTME extractor with a standard Transformer encoder architecture. It employs an ‘inverted input’ design, allowing the Transformer’s attention mechanism to directly model relationships between different data channels (variables) rather than just temporal sequences. This clever design significantly reduces computational complexity, making the model more efficient.
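To make the inverted-input idea concrete, here is a minimal PyTorch sketch (the class name and hyperparameters are illustrative assumptions, not the authors' code): each variable's full history is embedded as a single token, so self-attention operates across channels rather than time steps.

```python
# Minimal sketch of the 'inverted input' design (illustrative, not the paper's code).
# Tokens fed to attention are variables (channels), not time steps, so
# self-attention directly models cross-variable relationships.
import torch
import torch.nn as nn

class InvertedEncoder(nn.Module):
    def __init__(self, seq_len: int, d_model: int, n_heads: int = 4):
        super().__init__()
        # Each channel's full history is embedded into one token.
        self.embed = nn.Linear(seq_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq_len, n_vars] -> [batch, n_vars, seq_len]
        tokens = self.embed(x.transpose(1, 2))   # [batch, n_vars, d_model]
        return self.encoder(tokens)              # attention runs across variables

x = torch.randn(8, 96, 7)          # 8 series, 96 time steps, 7 variables
out = InvertedEncoder(96, 64)(x)   # [8, 7, 64]
```

Because the token dimension is the number of variables rather than the sequence length, attention cost scales with the channel count, which is what makes the design lightweight for long input windows.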
How HTME Works
The HTME extractor consists of two main parts:
- Temporal Feature Extractor: This module segments the time series into smaller ‘patches’ and uses convolutional operations to identify short-term patterns. It then flattens these outputs and applies a linear projection to capture longer-term temporal correlations. By treating each data channel independently during this stage, it minimizes interference and enhances the capture of temporal features.
- Multivariate Feature Extractor: This module also uses patching but focuses on modeling correlations among different input variables. It flattens patch features, applies a linear layer to learn inter-variable relationships, and then processes these through a GRU (Gated Recurrent Unit) network. The GRU adaptively emphasizes or suppresses historical information, and a final convolution layer expands the features to match the temporal module’s output.
A learnable fusion weight adaptively balances the contributions of these two modules, letting HTME adjust to datasets with varying characteristics; the sketch below shows how the pieces fit together.
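The following is a compact sketch of how an HTME-style extractor could be assembled in PyTorch. The patch size, hidden widths, the strided-convolution patching, and the single scalar fusion weight are all assumptions made for illustration; the paper's exact layer configuration may differ.

```python
# Compact sketch of an HTME-style extractor (illustrative assumptions,
# not the paper's exact configuration).
import torch
import torch.nn as nn

class HTME(nn.Module):
    def __init__(self, seq_len=96, n_vars=7, patch=16, d_model=64):
        super().__init__()
        n_patches = seq_len // patch
        # Temporal branch (channel-independent): patching via a strided conv
        # for short-term patterns, then flatten + linear projection for
        # longer-term temporal correlations.
        self.t_conv = nn.Conv1d(1, 8, kernel_size=patch, stride=patch)
        self.t_proj = nn.Linear(8 * n_patches, d_model)
        # Multivariate branch: linear mixing across variables, a GRU that
        # weighs historical patch information, and a 1x1 conv that expands
        # features to match the temporal branch's output width.
        self.m_mix = nn.Linear(n_vars, n_vars)
        self.m_gru = nn.GRU(patch, d_model // 2, batch_first=True)
        self.m_expand = nn.Conv1d(d_model // 2, d_model, kernel_size=1)
        # Learnable fusion weight balancing the two branches (a single
        # scalar here for simplicity).
        self.alpha = nn.Parameter(torch.tensor(0.5))
        self.patch, self.n_patches = patch, n_patches

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, L, C = x.shape
        # Temporal features: treat each channel as an independent series.
        t = x.transpose(1, 2).reshape(B * C, 1, L)           # [B*C, 1, L]
        t = self.t_conv(t).flatten(1)                        # [B*C, 8*n_patches]
        t = self.t_proj(t).view(B, C, -1)                    # [B, C, d_model]
        # Multivariate features: mix variables, then summarize patch
        # sequences per channel with the GRU's final hidden state.
        m = self.m_mix(x)                                    # [B, L, C]
        m = m.transpose(1, 2).reshape(B * C, self.n_patches, self.patch)
        _, h = self.m_gru(m)                                 # [1, B*C, d_model//2]
        m = h.squeeze(0).view(B, C, -1)                      # [B, C, d_model//2]
        m = self.m_expand(m.transpose(1, 2)).transpose(1, 2) # [B, C, d_model]
        # Adaptive fusion of the two views.
        return self.alpha * t + (1 - self.alpha) * m

emb = HTME()(torch.randn(8, 96, 7))   # [8, 7, 64] -- one token per variable
```

The output is one embedding per variable, which is exactly the token layout the inverted Transformer encoder sketched earlier consumes.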
Performance and Efficiency
Extensive experiments on eight real-world datasets show that HTMformer consistently outperforms existing state-of-the-art models in both accuracy and efficiency. For instance, integrating HTME into various Transformer variants yielded average MSE reductions of 27% to 46% on datasets such as Electricity, Weather, and Traffic. The model performs well in both long-term and short-term forecasting, particularly on high-dimensional datasets such as Traffic and Solar-Energy.
Beyond accuracy, HTMformer excels in efficiency. Compared with leading models such as MultiPatchFormer, it trains roughly one-third faster, requires only 20% to 45% of the GPU memory, and uses about half as many parameters. These gains translate into faster training and inference, making HTMformer more practical for real-time applications and for deployment on a wider range of devices.
The research highlights that while temporal features are often primary, multivariate correlations are indispensable for accurate time series forecasting. HTMformer’s hybrid strategy effectively combines these two dimensions, leading to richer, more informative embeddings without incurring excessive computational overhead. For more technical details, you can refer to the full research paper: HTMformer: Hybrid Temporal and Multivariate Transformer for Time Series Forecasting.
This work underscores the importance of a well-designed embedding layer in Transformer-based forecasters and opens new avenues for future research, particularly in jointly modeling complex spatiotemporal dependencies.


