TLDR: DCATS is a novel framework that utilizes LLM-powered agents to enhance time series forecasting by intelligently refining training data, rather than solely optimizing model architectures. It achieved an average 6% error reduction on a large traffic volume dataset by leveraging rich metadata to select relevant auxiliary time series, demonstrating the significant potential of data-centric AI in automating and improving time series analysis.
In the rapidly evolving world of artificial intelligence, Large Language Models (LLMs) are proving to be more than just powerful text generators. They are now being harnessed to act as intelligent agents, capable of planning and executing complex tasks, particularly in the realm of Automated Machine Learning (AutoML).
Traditional AutoML systems often focus on finding the best model architecture or fine-tuning hyperparameters. However, recent insights in time series forecasting suggest a different path to improved performance: enhancing the quality of the data itself. This idea forms the core of a new approach called Data-Centric AI, which prioritizes refining data over endlessly tweaking models.
Researchers at Visa Research have introduced a novel framework called DCATS, which stands for Data-Centric Agent for Time Series. This innovative system leverages the reasoning capabilities of LLM-agents to intelligently clean and enrich time series data, ultimately optimizing forecasting performance. Instead of focusing on building more complex models, DCATS aims to make the existing models perform better by feeding them higher-quality, more relevant data.
How DCATS Works
The DCATS framework operates through an iterative process involving four key components: a time series dataset, a metadata database, an LLM-agent, and a forecasting module. When a user wants to forecast a specific time series, the LLM-agent springs into action.
First, the agent consults a rich metadata database that contains background information about all available time series, such as location, historical volume, city, county, population, and even freeway details. Based on this information, the LLM-agent generates several “proposals.” Each proposal suggests a specific subset of time series (neighbors) to be included in the training data, along with a clear explanation of why these particular neighbors were chosen. For example, to forecast traffic for a highway entrance in San Mateo, DCATS might suggest incorporating data from nearby Burlingame or from other geographically distant locations that show similar traffic patterns.
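To make the proposal step concrete, here is a minimal sketch of what a metadata-driven prompt and a proposal record might look like. The paper does not publish the exact prompt format or field names, so everything below (the `Proposal` fields, the candidate dictionary keys) is an illustrative assumption:

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    """One candidate training subset: a set of neighbor series plus the
    agent's rationale. Field names are illustrative, not from the paper."""
    target_id: str
    neighbor_ids: list
    rationale: str

def build_agent_prompt(target_meta: dict, candidates: list) -> str:
    """Assemble a metadata summary for the LLM-agent to reason over.
    A sketch only; DCATS's actual prompt format is not public."""
    lines = [f"Forecast target: sensor {target_meta['id']} "
             f"({target_meta['city']}, {target_meta['county']} County)"]
    lines.append("Candidate neighbors:")
    for c in candidates:
        lines.append(f"- {c['id']}: city={c['city']}, "
                     f"daily_volume={c['volume']}, freeway={c['freeway']}")
    lines.append("Propose subsets of neighbors likely to improve the "
                 "forecast, each with a short justification.")
    return "\n".join(lines)
```

The LLM's replies would then be parsed back into `Proposal` objects for the evaluation stage.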
These proposals are then evaluated by the forecasting module, which trains models using the suggested data subsets and measures their performance on a validation set. The LLM-agent then reviews these results and refines its proposals for the next round, typically building upon the most successful strategies from the previous iteration. This cycle repeats until no further improvement is observed, converging on a data subset well suited to the forecasting task.
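The propose-evaluate-refine cycle can be sketched as a simple loop. Here `propose` stands in for the LLM-agent and `evaluate` for the forecasting module (train on a subset, return validation error); both are assumptions standing in for components the paper describes at a higher level:

```python
def dcats_loop(target, candidates, propose, evaluate, max_rounds=5):
    """Iterative data-selection loop (a sketch of the cycle described in
    the article, not the authors' implementation)."""
    best_subset, best_err, history = None, float("inf"), []
    for _ in range(max_rounds):
        # The agent suggests subsets, conditioned on past results.
        proposals = propose(target, candidates, history)
        improved = False
        for subset in proposals:
            err = evaluate(target, subset)  # train + score on validation set
            history.append((subset, err))   # feedback for the next round
            if err < best_err:
                best_subset, best_err, improved = subset, err, True
        if not improved:                    # stop when no proposal helps
            break
    return best_subset, best_err
```

In a real system, `evaluate` would be the expensive step, so the agent's job is to keep the number of proposals per round small and well-motivated.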
Impressive Results
To validate its effectiveness, DCATS was tested on the LargeST dataset, a comprehensive collection of traffic time series from 8,600 sensors across California. The framework was applied to four different time series forecasting models: Linear, MLP, SparseTSF, and UltraSTF. The results were compelling: DCATS consistently improved performance across all tested models and metrics, achieving an average 6% reduction in forecasting error. This significant improvement highlights that the data selection process employed by DCATS enhances forecast quality regardless of the underlying model used.
The research also revealed that the LLM-agent intelligently balances different criteria for selecting neighbors, such as road network similarity, temporal pattern similarity, and geodetic distance. The explanations generated by the agent for its choices demonstrated its ability to reason over diverse metadata, adapting its strategy based on the specific characteristics of each forecasting query. This level of automated, intelligent data curation would be incredibly labor-intensive for a human to perform manually for a large number of queries.
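One of those criteria, geodetic distance, has a standard implementation: the haversine great-circle formula. Below is a sketch of that criterion plus a hypothetical scoring function that blends it with the other two; the weights and the combination scheme are illustrative assumptions, not taken from the paper:

```python
import math

def geodetic_distance_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km: one plausible realization
    of the 'geodetic distance' criterion (a sketch, not DCATS code)."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def neighbor_score(dist_km, pattern_corr, same_freeway, w=(0.4, 0.4, 0.2)):
    """Blend distance, temporal-pattern similarity, and road-network
    membership into one score. Weights are hypothetical."""
    proximity = 1.0 / (1.0 + dist_km)  # closer sensors score higher
    return w[0] * proximity + w[1] * pattern_corr + w[2] * float(same_freeway)
```

The point of DCATS, of course, is that the agent adapts such trade-offs per query from the metadata rather than using one fixed weighting.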
A New Frontier in Time Series Forecasting
The introduction of DCATS marks a significant step forward in automating time series forecasting. By shifting the focus from complex model architectures to intelligent data refinement, this framework offers a powerful new paradigm for achieving more accurate predictions. The ability of LLM-agents to reason over rich metadata and iteratively optimize data selection opens up exciting possibilities for future applications across various domains beyond traffic forecasting.
For more in-depth information, you can read the full research paper here: Empowering Time Series Forecasting with LLM-Agents.


