TLDR: TS-Agent is an AI agent that improves time series reasoning by combining large language models (LLMs), used for high-level reasoning, with specialized analytical tools for precise data analysis. It relies on an iterative process, an evidence log, and two self-correction mechanisms (a critic and a quality gate) to make its reasoning interpretable and verifiable and to prevent hallucinations or reliance on memorized knowledge. Experiments show that it matches or exceeds state-of-the-art LLMs on understanding tasks and performs significantly better than existing LLM approaches on complex reasoning tasks.
Large language models (LLMs) have shown impressive capabilities in a wide range of reasoning and problem-solving tasks. However, when it comes to understanding and making decisions based on time series data, such as financial market prices, patient health records, or climate measurements, these models often struggle. Two failure modes are especially damaging in these critical domains: generating incorrect information (hallucination) and relying on memorized knowledge rather than actual analysis of the data (knowledge leakage).
Addressing this gap, researchers from JPMorgan AI Research have introduced TS-Agent, a novel AI agent designed specifically for time series reasoning. Unlike approaches that try to convert time series data into text or images for LLMs to process, TS-Agent takes a different, more specialized route. It uses LLMs for their strong reasoning and synthesis abilities, while delegating the precise extraction of statistical and structural information to dedicated time series analytical tools.
How TS-Agent Works
The core idea behind TS-Agent is to interact directly with raw numerical time series data through a set of atomic operators. Instead of trying to make the LLM “perceive” the data, TS-Agent uses the LLM as a coordinator that guides a step-by-step reasoning process. Each tool invocation and its resulting output are recorded in an “evidence log.” This log serves as a transparent memory, allowing the agent to refine its reasoning iteratively.
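To make the loop concrete, here is a minimal Python sketch of a think-act-observe cycle backed by an evidence log. It is not TS-Agent's implementation: the operator names (`mean`, `slope`, `max_index`), the `EvidenceLog` class, and the scripted coordinator that stands in for the LLM are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, List, Optional

# Hypothetical atomic operators: each maps the raw numeric series to a value.
OPERATORS = {
    "mean": lambda xs: mean(xs),
    "slope": lambda xs: (xs[-1] - xs[0]) / (len(xs) - 1),  # average change per step
    "max_index": lambda xs: max(range(len(xs)), key=xs.__getitem__),
}

@dataclass
class EvidenceEntry:
    step: int
    tool: str
    observation: object

@dataclass
class EvidenceLog:
    entries: List[EvidenceEntry] = field(default_factory=list)

    def record(self, tool: str, observation: object) -> None:
        self.entries.append(EvidenceEntry(len(self.entries) + 1, tool, observation))

    def as_context(self) -> str:
        # Text form that could be fed back to the LLM on its next turn.
        return "\n".join(f"[{e.step}] {e.tool} -> {e.observation}" for e in self.entries)

def run_agent(series: List[float],
              choose_tool: Callable[[EvidenceLog], Optional[str]],
              max_steps: int = 5) -> EvidenceLog:
    log = EvidenceLog()
    for _ in range(max_steps):
        tool = choose_tool(log)                # think: pick next action from evidence so far
        if tool is None:                       # coordinator judges the evidence sufficient
            break
        observation = OPERATORS[tool](series)  # act: run the tool on the raw numbers
        log.record(tool, observation)          # observe: append the result to the log
    return log

def scripted_coordinator(plan: List[str]) -> Callable[[EvidenceLog], Optional[str]]:
    # Hypothetical stand-in for the LLM: follows a fixed plan, then stops.
    def choose(log: EvidenceLog) -> Optional[str]:
        step = len(log.entries)
        return plan[step] if step < len(plan) else None
    return choose

series = [1.0, 2.0, 4.0, 3.0, 5.0]
log = run_agent(series, scripted_coordinator(["slope", "max_index"]))
print(log.as_context())
# [1] slope -> 1.0
# [2] max_index -> 4
```

Keeping the log as a plain, append-only list of (step, tool, observation) records is what makes the trace auditable: every conclusion can point back to a numbered entry computed from the raw data.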
A crucial part of TS-Agent’s design is its self-critic and a final quality gate. These components ensure that the agent’s reasoning is sound and its conclusions are firmly grounded in the evidence gathered. The self-critic reviews each step, checking if the chosen tool is appropriate, if the observations are plausible, and if enough evidence has been collected. The quality gate, at the end, verifies that the final answer complies with the question’s requirements and is fully supported by the evidence log, effectively preventing unsupported outputs or hallucinations.
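The two checks can be sketched in Python as simple rule-based functions. This is only an illustration under assumed rules: in TS-Agent the critic and quality gate are part of the LLM-driven pipeline, whereas here `ALLOWED_TOOLS`, `Critique`, `critic`, and `quality_gate` are hypothetical names with hand-written checks (known tool, finite observation, required evidence present; answer is an allowed option and cites existing log steps).

```python
import math
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical set of tools the critic considers appropriate.
ALLOWED_TOOLS = {"slope", "mean", "max_index"}

@dataclass
class Critique:
    step_ok: bool          # tool is appropriate and observation is plausible
    enough_evidence: bool  # all required kinds of evidence have been gathered
    notes: List[str] = field(default_factory=list)

def critic(tool: str, observation: object,
           prior_evidence: List[Tuple[str, object]],
           required: Set[str]) -> Critique:
    notes = []
    step_ok = True
    if tool not in ALLOWED_TOOLS:
        step_ok = False
        notes.append(f"inappropriate tool: {tool}")
    if isinstance(observation, float) and not math.isfinite(observation):
        step_ok = False
        notes.append(f"implausible observation: {observation}")
    used = {t for t, _ in prior_evidence}
    if step_ok:
        used.add(tool)  # only accepted steps count toward sufficiency
    missing = required - used
    if missing:
        notes.append(f"still missing evidence from: {sorted(missing)}")
    return Critique(step_ok, not missing, notes)

def quality_gate(answer: str, options: List[str], cited_steps: List[int],
                 evidence: List[Tuple[str, object]]) -> bool:
    # 1) Compliance: the answer must be one of the options the question allows.
    if answer not in options:
        return False
    # 2) Grounding: it must cite at least one step that exists in the evidence log.
    return bool(cited_steps) and all(1 <= s <= len(evidence) for s in cited_steps)

evidence = [("slope", 1.0)]
options = ["upward", "downward", "flat"]
print(critic("slope", 1.0, [], {"slope"}))
print(quality_gate("upward", options, [1], evidence))  # True
print(quality_gate("upward", options, [3], evidence))  # False: step 3 does not exist
print(quality_gate("rising", options, [1], evidence))  # False: not an allowed option
```

Separating the per-step critic from the final gate mirrors the division described above: the critic decides whether to keep gathering evidence, while the gate only decides whether the finished answer may be released.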
Key Advantages of TS-Agent
This innovative design offers several significant benefits:
- It eliminates the need for complex multi-modal training, as LLMs don’t directly process raw time series data.
- It preserves the native, quantitative form of time series, ensuring no information is lost and allowing for precise statistical computations.
- It guides problem-solving in a human-like, auditable manner through an iterative “think-act-observe” loop.
- It enhances reliability through self-refinement and evidence grounding, significantly reducing the risk of hallucination and knowledge leakage.
Understanding vs. Reasoning in Time Series
The paper makes a clear distinction between “time series understanding” and “time series reasoning.” Understanding tasks involve extracting direct properties or characteristics, such as identifying a trend or detecting an anomaly. Reasoning tasks, on the other hand, require logical inference, combining multiple aspects of the series dynamics to draw higher-level conclusions. For example, predicting the effect of a temperature spike on electricity consumption requires understanding causal relationships and daily patterns. While LLMs have shown some success in understanding tasks, true reasoning remains a major hurdle.
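The distinction can be illustrated with a toy Python example on made-up data (not from the paper). `detect_anomalies` is an understanding-style operation on a single series, using a z-score threshold of 2 chosen here as an assumption; `spike_effects` is a reasoning-style step that combines two series by locating a temperature spike and comparing electricity load with the same step one "day" earlier, where a day is a hypothetical 4-step period. It only quantifies co-occurrence against a seasonal baseline and is not causal inference.

```python
from statistics import mean, pstdev
from typing import Dict, List

def detect_anomalies(xs: List[float], z: float = 2.0) -> List[int]:
    """Understanding: a direct property of one series (points far from its mean)."""
    mu, sd = mean(xs), pstdev(xs)
    if sd == 0:
        return []
    return [i for i, x in enumerate(xs) if abs(x - mu) / sd > z]

def spike_effects(temp: List[float], load: List[float], period: int) -> Dict[int, float]:
    """Reasoning: find spikes in one series, then measure the other series at those
    points against its own value one period earlier (a seasonal baseline)."""
    effects = {}
    for i in detect_anomalies(temp):
        if i >= period:
            effects[i] = load[i] - load[i - period]
    return effects

PERIOD = 4                         # toy "day" of 4 steps
temp = [20, 22, 24, 22] * 3        # repeating daily temperature pattern
temp[9] = 35                       # injected heat spike
load = [100, 120, 140, 120] * 3    # repeating daily load pattern
load[9] = 150                      # load rises at the same step

print(detect_anomalies(temp))              # [9]
print(spike_effects(temp, load, PERIOD))   # {9: 30}
```

The first function needs only one series; the second has to combine two series and a notion of daily pattern, which is the kind of multi-step inference the paper classifies as reasoning.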
TS-Agent is equipped with a comprehensive library of analytical tools, categorized into data processing, detection & classification, numerical operations, and relations & comparison. There’s even a mechanism for synthesizing custom operators from natural language descriptions, allowing the agent to adapt to novel tasks.
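A tool library organized by those four categories might look like the Python sketch below. The category keys, tool names, and `register` decorator are hypothetical. In TS-Agent, custom operators are synthesized from natural language descriptions; here the "synthesized" operator `mean_crossings` is simply hand-written to show where such an operator would plug in.

```python
from statistics import mean
from typing import Callable, Dict, Optional

# Hypothetical category keys mirroring the four groups described above.
CATEGORIES = ("data_processing", "detection_classification",
              "numerical_operations", "relations_comparison")
REGISTRY: Dict[str, Dict[str, Callable]] = {c: {} for c in CATEGORIES}

def register(category: str, name: Optional[str] = None):
    """Decorator that files a tool under one category of the registry."""
    if category not in REGISTRY:
        raise ValueError(f"unknown category: {category}")
    def wrap(fn: Callable) -> Callable:
        REGISTRY[category][name or fn.__name__] = fn
        return fn
    return wrap

@register("data_processing")
def moving_average(xs, k=3):
    return [mean(xs[i:i + k]) for i in range(len(xs) - k + 1)]

@register("detection_classification")
def trend_label(xs):
    change = xs[-1] - xs[0]
    return "up" if change > 0 else "down" if change < 0 else "flat"

@register("numerical_operations")
def total_change(xs):
    return xs[-1] - xs[0]

@register("relations_comparison")
def compare_means(xs, ys):
    a, b = mean(xs), mean(ys)
    return "first" if a > b else "second" if b > a else "equal"

# Stand-in for an operator synthesized from a description such as
# "count how many times the series crosses its mean".
def mean_crossings(xs):
    mu = mean(xs)
    above = [x > mu for x in xs]
    return sum(a != b for a, b in zip(above, above[1:]))

register("detection_classification", "mean_crossings")(mean_crossings)

series = [1, 3, 2, 4, 3, 5]
for category, tools in REGISTRY.items():
    print(category, sorted(tools))
print(REGISTRY["detection_classification"]["mean_crossings"](series))  # 3
```

A flat name-to-function registry keeps newly synthesized operators indistinguishable from built-in ones, so the coordinator can call either through the same interface.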
Performance and Impact
Evaluations on established benchmarks demonstrate TS-Agent’s effectiveness. On time series understanding tasks, it performs comparably to, and sometimes better than, state-of-the-art LLMs, despite using a relatively lightweight LLM backbone (gpt-4o-mini). More importantly, on complex time series reasoning tasks, where existing LLMs often struggle because of knowledge leakage, TS-Agent delivers significant improvements. For instance, on tasks that require reasoning across two time series, TS-Agent outperformed all baselines with 57.3% accuracy, whereas the baseline LLMs often fell back on memorized knowledge instead of analyzing the data.
This success highlights that by separating the reasoning capabilities of LLMs from the precise analytical needs of time series data, and by enforcing evidence grounding, TS-Agent offers a robust and interpretable solution. It paves the way for more reliable and auditable AI systems in domains where time series data is critical for decision-making. You can read the full research paper here.


