TLDR: DCATS is a novel framework that utilizes LLM-powered agents to enhance time series forecasting by intelligently refining training data, rather than solely optimizing model architectures. It achieved an average 6% error reduction on a large traffic volume dataset by leveraging rich metadata to select relevant auxiliary time series, demonstrating the significant potential of data-centric AI in automating and improving time series analysis.
In the rapidly evolving world of artificial intelligence, Large Language Models (LLMs) are proving to be more than just powerful text generators. They are now being harnessed to act as intelligent agents, capable of planning and executing complex tasks, particularly in the realm of Automated Machine Learning (AutoML).
Traditional AutoML systems often focus on finding the best model architecture or fine-tuning hyperparameters. However, recent insights in time series forecasting suggest a different path to improved performance: enhancing the quality of the data itself. This idea forms the core of a new approach called Data-Centric AI, which prioritizes refining data over endlessly tweaking models.
Researchers at Visa Research have introduced a novel framework called DCATS, which stands for Data-Centric Agent for Time Series. This innovative system leverages the reasoning capabilities of LLM-agents to intelligently clean and enrich time series data, ultimately optimizing forecasting performance. Instead of focusing on building more complex models, DCATS aims to make the existing models perform better by feeding them higher-quality, more relevant data.
How DCATS Works
The DCATS framework operates through an iterative process involving four key components: a time series dataset, a metadata database, an LLM-agent, and a forecasting module. When a user wants to forecast a specific time series, the LLM-agent springs into action.
First, the agent consults a rich metadata database that contains background information about all available time series, such as location, historical volume, city, county, population, and even freeway details. Based on this information, the LLM-agent generates several “proposals.” Each proposal suggests a specific subset of time series (neighbors) to be included in the training data, along with a clear explanation of why these particular neighbors were chosen. For example, to forecast traffic for a highway entrance in San Mateo, DCATS might suggest incorporating data from nearby Burlingame or from other geographically distant locations that show similar traffic patterns.
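To make the proposal step concrete, here is a minimal sketch of what a metadata-driven prompt and a proposal record might look like. The paper does not publish the exact prompt format or field names, so everything below (the `Proposal` fields, the candidate dictionary keys) is an illustrative assumption:

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    """One candidate training subset: a set of neighbor series plus the
    agent's rationale. Field names are illustrative, not from the paper."""
    target_id: str
    neighbor_ids: list
    rationale: str

def build_agent_prompt(target_meta: dict, candidates: list) -> str:
    """Assemble a metadata summary for the LLM-agent to reason over.
    A sketch only; DCATS's actual prompt format is not public."""
    lines = [f"Forecast target: sensor {target_meta['id']} "
             f"({target_meta['city']}, {target_meta['county']} County)"]
    lines.append("Candidate neighbors:")
    for c in candidates:
        lines.append(f"- {c['id']}: city={c['city']}, "
                     f"daily_volume={c['volume']}, freeway={c['freeway']}")
    lines.append("Propose subsets of neighbors likely to improve the "
                 "forecast, each with a short justification.")
    return "\n".join(lines)
```

The LLM's replies would then be parsed back into `Proposal` objects for the evaluation stage.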
These proposals are then evaluated by the forecasting module, which trains models using the suggested data subsets and measures their performance on a validation set. The LLM-agent then reviews these results and refines its proposals for the next round, typically building upon the most successful strategies from the previous iteration. This cycle repeats until no further improvement is observed, converging on a data subset well suited to the forecasting task.
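The propose-evaluate-refine cycle can be sketched as a simple loop. Here `propose` stands in for the LLM-agent and `evaluate` for the forecasting module (train on a subset, return validation error); both are assumptions standing in for components the paper describes at a higher level:

```python
def dcats_loop(target, candidates, propose, evaluate, max_rounds=5):
    """Iterative data-selection loop (a sketch of the cycle described in
    the article, not the authors' implementation)."""
    best_subset, best_err, history = None, float("inf"), []
    for _ in range(max_rounds):
        # The agent suggests subsets, conditioned on past results.
        proposals = propose(target, candidates, history)
        improved = False
        for subset in proposals:
            err = evaluate(target, subset)  # train + score on validation set
            history.append((subset, err))   # feedback for the next round
            if err < best_err:
                best_subset, best_err, improved = subset, err, True
        if not improved:                    # stop when no proposal helps
            break
    return best_subset, best_err
```

In a real system, `evaluate` would be the expensive step, so the agent's job is to keep the number of proposals per round small and well-motivated.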
Impressive Results
To validate its effectiveness, DCATS was tested on the LargeST dataset, a comprehensive collection of traffic time series from 8,600 sensors across California. The framework was applied to four different time series forecasting models: Linear, MLP, SparseTSF, and UltraSTF. The results were compelling: DCATS consistently improved performance across all tested models and metrics, achieving an average 6% reduction in forecasting error. This significant improvement highlights that the data selection process employed by DCATS enhances forecast quality regardless of the underlying model used.
The research also revealed that the LLM-agent intelligently balances different criteria for selecting neighbors, such as road network similarity, temporal pattern similarity, and geodetic distance. The explanations generated by the agent for its choices demonstrated its ability to reason over diverse metadata, adapting its strategy based on the specific characteristics of each forecasting query. This level of automated, intelligent data curation would be incredibly labor-intensive for a human to perform manually for a large number of queries.
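One of those criteria, geodetic distance, has a standard implementation: the haversine great-circle formula. Below is a sketch of that criterion plus a hypothetical scoring function that blends it with the other two; the weights and the combination scheme are illustrative assumptions, not taken from the paper:

```python
import math

def geodetic_distance_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km: one plausible realization
    of the 'geodetic distance' criterion (a sketch, not DCATS code)."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def neighbor_score(dist_km, pattern_corr, same_freeway, w=(0.4, 0.4, 0.2)):
    """Blend distance, temporal-pattern similarity, and road-network
    membership into one score. Weights are hypothetical."""
    proximity = 1.0 / (1.0 + dist_km)  # closer sensors score higher
    return w[0] * proximity + w[1] * pattern_corr + w[2] * float(same_freeway)
```

The point of DCATS, of course, is that the agent adapts such trade-offs per query from the metadata rather than using one fixed weighting.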
A New Frontier in Time Series Forecasting
The introduction of DCATS marks a significant step forward in automating time series forecasting. By shifting the focus from complex model architectures to intelligent data refinement, this framework offers a powerful new paradigm for achieving more accurate predictions. The ability of LLM-agents to reason over rich metadata and iteratively optimize data selection opens up exciting possibilities for future applications across various domains beyond traffic forecasting.
For more in-depth information, you can read the full research paper here: Empowering Time Series Forecasting with LLM-Agents.


