TLDR: Augur is a novel AI framework for time series forecasting that uses large language models (LLMs) to identify and leverage directed causal associations among covariates. It employs a two-stage teacher-student architecture: a powerful teacher LLM infers a causal graph, and a lightweight student agent refines this graph and uses the high-confidence causal links, encoded as textual prompts, to perform accurate and interpretable predictions. This approach significantly improves forecasting accuracy and zero-shot generalization compared to existing methods, while also providing transparent reasoning about variable interactions.
Time series forecasting, which involves predicting future values based on historical data, is a crucial task across many fields, from finance to weather prediction. Recently, large language models (LLMs) have shown great promise in this area, especially with their ability to integrate various types of data, including text.
However, current LLM-based methods for time series forecasting share several limitations. The LLM is typically relegated to a supporting role rather than serving as the main reasoning engine; prompts tend to carry only basic statistical summaries, which limits the model's ability to capture complex relationships among variables; and the resulting pipelines offer little transparency, making it difficult to understand why a particular prediction was made.
Introducing Augur: A Causal Approach to Time Series Forecasting
A new framework called Augur aims to overcome these limitations by fully leveraging the causal reasoning capabilities of LLMs. Augur is designed to discover and exploit directed cause-and-effect relationships among the different variables (covariates) in time series data. This not only improves prediction accuracy but also yields clear, traceable explanations of how variables interact and influence forecasts.
Augur operates using a two-stage teacher-student architecture. Imagine a powerful, experienced teacher (a large LLM) and a more focused, efficient student (a lightweight LLM agent). The teacher’s role is to infer a directed causal graph from the time series data. It does this by combining a heuristic search, which narrows down the possibilities, with pairwise causality testing to identify potential cause-and-effect links. This process helps to filter out misleading connections and establish a robust understanding of how variables influence each other.
Once the teacher has established this causal graph, the student agent takes over. The student refines this graph, focusing on high-confidence causal associations. These validated relationships are then encoded as rich textual prompts, rather than just simple data summaries. The student then uses these prompts to perform the actual forecasting. This design allows Augur to achieve competitive predictive accuracy while offering transparent and understandable reasoning about variable interactions.
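To make this step concrete, here is a minimal sketch of turning high-confidence causal edges into a textual prompt for the student. The edge names, confidence threshold, and prompt wording are illustrative assumptions, not the paper's exact format:

```python
def edges_to_prompt(edges, threshold=0.8):
    """Keep edges whose confidence meets `threshold` and render them as text."""
    kept = [(src, dst, conf) for src, dst, conf in edges if conf >= threshold]
    lines = [f"- {src} causally influences {dst} (confidence {conf:.2f})."
             for src, dst, conf in kept]
    return "Known causal relationships among covariates:\n" + "\n".join(lines)

# Hypothetical edges from the teacher's refined causal graph.
causal_edges = [
    ("wind_speed", "pm25", 0.91),
    ("temperature", "power_load", 0.87),
    ("humidity", "pm25", 0.55),  # below threshold, dropped from the prompt
]
prompt = edges_to_prompt(causal_edges)
```

The resulting string would be prepended (or appended) to the numerical forecasting prompt, so the student conditions on validated causal structure rather than raw statistics alone.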
How Augur Works: The Teacher and Student in Detail
The process begins with the **Causal Explanation Generation via Teacher Model**. A powerful pre-trained LLM, like GPT-5, acts as the teacher. It first reduces the vast number of possible causal links by identifying the most correlated variable pairs using Spearman’s rank correlation. For each promising pair, the teacher translates numerical patterns into causal hypotheses (e.g., A causes B, B causes A, or they share a common confounder). These hypotheses are then aggregated into an initial causal graph.
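As a rough illustration of the pruning step, one can rank covariate pairs by the absolute value of Spearman's rank correlation and keep only the top pairs for pairwise causal testing. The helper names, toy data, and cutoff below are assumptions for illustration, not the paper's implementation (the tie-free Spearman formula is used for brevity):

```python
from itertools import combinations

def ranks(xs):
    """1-based rank positions; assumes no ties for simplicity."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = pos + 1
    return r

def spearman(xs, ys):
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def top_correlated_pairs(series, k=1):
    """series: dict of variable name -> values; return the k strongest pairs."""
    scored = sorted(
        ((abs(spearman(series[a], series[b])), a, b)
         for a, b in combinations(series, 2)),
        reverse=True,
    )
    return [(a, b) for _, a, b in scored[:k]]

data = {
    "temperature": [20, 22, 25, 27, 30, 33],
    "power_load":  [55, 60, 68, 72, 80, 88],  # rises monotonically with temperature
    "noise":       [5, 1, 4, 2, 6, 3],        # unrelated series
}
pairs = top_correlated_pairs(data, k=1)  # → [("temperature", "power_load")]
```

Only the surviving pairs are handed to the (far more expensive) LLM causality test, which is what keeps the search tractable as the number of covariates grows.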
This initial graph is then refined through an iterative process. The teacher identifies and resolves structural inconsistencies, such as cycles (where A causes B, B causes C, and C causes A). It does this by evaluating the plausibility of each link within the cycle and removing the weakest or least plausible one. This ensures the final graph is a valid Directed Acyclic Graph (DAG), representing clear, one-way causal flows.
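The cycle-resolution loop can be sketched in a few lines: while the candidate graph contains a cycle, drop the lowest-plausibility edge on that cycle. The numeric scores below stand in for the teacher LLM's plausibility judgments and are purely illustrative:

```python
def find_cycle(edges):
    """Return the list of edges forming some cycle, or None (simple DFS)."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    def dfs(node, path):
        path.append(node)
        for nxt in graph.get(node, []):
            if nxt in path:                    # back edge closes a cycle
                i = path.index(nxt)
                cyc = path[i:] + [nxt]
                return list(zip(cyc, cyc[1:]))
            found = dfs(nxt, path)
            if found:
                return found
        path.pop()
        return None

    for start in list(graph):
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None

def refine_to_dag(scored_edges):
    """scored_edges: dict {(src, dst): plausibility}; returns an acyclic subset."""
    edges = dict(scored_edges)
    while True:
        cycle = find_cycle(list(edges))
        if cycle is None:
            return edges
        weakest = min(cycle, key=lambda e: edges[e])  # least plausible link
        del edges[weakest]

dag = refine_to_dag({
    ("A", "B"): 0.9,
    ("B", "C"): 0.8,
    ("C", "A"): 0.3,  # weakest link in the A -> B -> C -> A cycle, removed
})
```

In Augur itself, the plausibility of each link is judged by the teacher LLM rather than given as fixed numbers; the loop structure, though, captures the idea of iteratively breaking cycles until a valid DAG remains.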
Finally, the teacher synthesizes a coherent narrative summary based on this validated causal graph and any modifications made during refinement. This summary explains the causal structure in human-readable language. This information, along with the corresponding time series, forms a dataset used to train the student model.
The second stage is the **Distillation and Training of Student Agent**. Here, the corpus generated by the teacher is carefully curated, keeping only the highest-quality causal explanations. Curation assesses ‘causal stability’ (how consistent the causal structure remains across multiple samplings) and ‘informational efficiency’ (how concise and logically grounded the explanation is). A smaller, more efficient LLM, such as Qwen3-8b, serves as the student. It is then fine-tuned on these curated explanations, learning to map time series inputs to their causal explanations and to perform the downstream prediction tasks.
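A toy version of this curation filter might score stability as the fraction of edges shared across repeated teacher samplings and gate on explanation length. The scoring functions, thresholds, and sample data below are assumptions for illustration, not the paper's exact criteria:

```python
def causal_stability(graphs):
    """Fraction of edges common to all sampled graphs (each graph = a set of edges)."""
    common = set.intersection(*graphs)
    union = set.union(*graphs)
    return len(common) / len(union) if union else 1.0

def keep_example(sampled_graphs, explanation, min_stability=0.6, max_tokens=200):
    """Keep a teacher example only if its structure is stable and its text concise."""
    stable = causal_stability(sampled_graphs) >= min_stability
    concise = len(explanation.split()) <= max_tokens
    return stable and concise

# Three samplings of the teacher's graph for the same series (hypothetical edges).
samples = [
    {("temp", "load"), ("wind", "pm25")},
    {("temp", "load"), ("wind", "pm25")},
    {("temp", "load"), ("wind", "pm25"), ("rain", "pm25")},  # one spurious extra edge
]
ok = keep_example(samples, "Temperature drives power load; wind disperses PM2.5.")
```

Only examples passing both gates would enter the fine-tuning set, which matches the paper's finding that a few high-quality causal discoveries beat sheer dataset volume.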
Benefits and Performance
Augur has been extensively tested on real-world datasets from diverse domains including air quality, power consumption, traffic, and finance. The results show that Augur consistently outperforms 25 advanced baseline models in predictive performance, measured by metrics like F1-Score and AUROC. Crucially, Augur also demonstrates robust zero-shot generalization, meaning it performs well on new, unseen datasets without additional training.
The quality of the causal summaries generated by Augur is also superior, as confirmed by both automated metrics and human evaluations. These summaries are not just accurate but also insightful and easy to understand, providing valuable interpretability that is often missing in other models.
An ablation study confirmed that each core component of Augur—the initial pruning of variables, the LLM-based causal judgment, and the iterative graph refinement—is essential for its efficiency and accuracy. The research also found that focusing on a few high-quality causal discoveries yields better performance than simply adding more minor details or expanding the dataset volume without quality control.
Conclusion and Future Outlook
Augur represents a significant step forward in time series forecasting by integrating the powerful causal reasoning abilities of large language models. By extracting explicit causal associations and using them to guide predictions, Augur enhances both forecasting accuracy and interpretability. While the approach relies on certain assumptions, such as the absence of unobserved confounders, its modular design allows for future extensions to incorporate other forms of time-series analysis and statistical properties.
This framework offers a pragmatic yet powerful method to harness the reasoning power of state-of-the-art LLMs, making time series analysis more efficient, economical, and controllable for real-world applications. For more technical details, you can refer to the full research paper: Augur: Modeling Covariate Causal Associations in Time Series via Large Language Models.