TLDR: Augur is a novel AI framework for time series forecasting that uses large language models (LLMs) to identify and leverage directed causal associations among covariates. It employs a two-stage teacher-student architecture: a powerful teacher LLM infers a causal graph, and a lightweight student agent refines this graph and uses the high-confidence causal links, encoded as textual prompts, to perform accurate and interpretable predictions. This approach significantly improves forecasting accuracy and zero-shot generalization compared to existing methods, while also providing transparent reasoning about variable interactions.
Time series forecasting, which involves predicting future values based on historical data, is a crucial task across many fields, from finance to weather prediction. Recently, large language models (LLMs) have shown great promise in this area, especially with their ability to integrate various types of data, including text.
However, current LLM-based methods for time series forecasting share several limitations. The LLM is typically relegated to a supporting role rather than serving as the main reasoning engine; prompts tend to carry only basic statistical summaries, which limits the model's ability to capture complex relationships among variables; and the resulting pipelines offer little transparency, making it difficult to understand why a particular prediction was made.
Introducing Augur: A Causal Approach to Time Series Forecasting
A new framework called Augur aims to overcome these limitations by fully leveraging the causal reasoning capabilities of LLMs. Augur is designed to discover and exploit directed cause-and-effect relationships among the different variables (covariates) in time series data. This not only improves prediction accuracy but also yields clear, traceable explanations of how variables interact and influence forecasts.
Augur operates using a two-stage teacher-student architecture. Imagine a powerful, experienced teacher (a large LLM) and a more focused, efficient student (a lightweight LLM agent). The teacher’s role is to infer a directed causal graph from the time series data. It does this by combining a heuristic search, which narrows down the possibilities, with pairwise causality testing to identify potential cause-and-effect links. This process helps to filter out misleading connections and establish a robust understanding of how variables influence each other.
Once the teacher has established this causal graph, the student agent takes over. The student refines this graph, focusing on high-confidence causal associations. These validated relationships are then encoded as rich textual prompts, rather than just simple data summaries. The student then uses these prompts to perform the actual forecasting. This design allows Augur to achieve competitive predictive accuracy while offering transparent and understandable reasoning about variable interactions.
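To make this step concrete, here is a minimal sketch of turning high-confidence causal edges into a textual prompt for the student. The edge names, confidence threshold, and prompt wording are illustrative assumptions, not the paper's exact format:

```python
def edges_to_prompt(edges, threshold=0.8):
    """Keep edges whose confidence meets `threshold` and render them as text."""
    kept = [(src, dst, conf) for src, dst, conf in edges if conf >= threshold]
    lines = [f"- {src} causally influences {dst} (confidence {conf:.2f})."
             for src, dst, conf in kept]
    return "Known causal relationships among covariates:\n" + "\n".join(lines)

# Hypothetical edges from the teacher's refined causal graph.
causal_edges = [
    ("wind_speed", "pm25", 0.91),
    ("temperature", "power_load", 0.87),
    ("humidity", "pm25", 0.55),  # below threshold, dropped from the prompt
]
prompt = edges_to_prompt(causal_edges)
```

The resulting string would be prepended (or appended) to the numerical forecasting prompt, so the student conditions on validated causal structure rather than raw statistics alone.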
How Augur Works: The Teacher and Student in Detail
The process begins with the **Causal Explanation Generation via Teacher Model**. A powerful pre-trained LLM, like GPT-5, acts as the teacher. It first reduces the vast number of possible causal links by identifying the most correlated variable pairs using Spearman’s rank correlation. For each promising pair, the teacher translates numerical patterns into causal hypotheses (e.g., A causes B, B causes A, or they share a common confounder). These hypotheses are then aggregated into an initial causal graph.
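As a rough illustration of the pruning step, one can rank covariate pairs by the absolute value of Spearman's rank correlation and keep only the top pairs for pairwise causal testing. The helper names, toy data, and cutoff below are assumptions for illustration, not the paper's implementation (the tie-free Spearman formula is used for brevity):

```python
from itertools import combinations

def ranks(xs):
    """1-based rank positions; assumes no ties for simplicity."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = pos + 1
    return r

def spearman(xs, ys):
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def top_correlated_pairs(series, k=1):
    """series: dict of variable name -> values; return the k strongest pairs."""
    scored = sorted(
        ((abs(spearman(series[a], series[b])), a, b)
         for a, b in combinations(series, 2)),
        reverse=True,
    )
    return [(a, b) for _, a, b in scored[:k]]

data = {
    "temperature": [20, 22, 25, 27, 30, 33],
    "power_load":  [55, 60, 68, 72, 80, 88],  # rises monotonically with temperature
    "noise":       [5, 1, 4, 2, 6, 3],        # unrelated series
}
pairs = top_correlated_pairs(data, k=1)  # → [("temperature", "power_load")]
```

Only the surviving pairs are handed to the (far more expensive) LLM causality test, which is what keeps the search tractable as the number of covariates grows.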
This initial graph is then refined through an iterative process. The teacher identifies and resolves structural inconsistencies, such as cycles (where A causes B, B causes C, and C causes A). It does this by evaluating the plausibility of each link within the cycle and removing the weakest or least plausible one. This ensures the final graph is a valid Directed Acyclic Graph (DAG), representing clear, one-way causal flows.
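The cycle-resolution loop can be sketched in a few lines: while the candidate graph contains a cycle, drop the lowest-plausibility edge on that cycle. The numeric scores below stand in for the teacher LLM's plausibility judgments and are purely illustrative:

```python
def find_cycle(edges):
    """Return the list of edges forming some cycle, or None (simple DFS)."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    def dfs(node, path):
        path.append(node)
        for nxt in graph.get(node, []):
            if nxt in path:                    # back edge closes a cycle
                i = path.index(nxt)
                cyc = path[i:] + [nxt]
                return list(zip(cyc, cyc[1:]))
            found = dfs(nxt, path)
            if found:
                return found
        path.pop()
        return None

    for start in list(graph):
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None

def refine_to_dag(scored_edges):
    """scored_edges: dict {(src, dst): plausibility}; returns an acyclic subset."""
    edges = dict(scored_edges)
    while True:
        cycle = find_cycle(list(edges))
        if cycle is None:
            return edges
        weakest = min(cycle, key=lambda e: edges[e])  # least plausible link
        del edges[weakest]

dag = refine_to_dag({
    ("A", "B"): 0.9,
    ("B", "C"): 0.8,
    ("C", "A"): 0.3,  # weakest link in the A -> B -> C -> A cycle, removed
})
```

In Augur itself, the plausibility of each link is judged by the teacher LLM rather than given as fixed numbers; the loop structure, though, captures the idea of iteratively breaking cycles until a valid DAG remains.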
Finally, the teacher synthesizes a coherent narrative summary based on this validated causal graph and any modifications made during refinement. This summary explains the causal structure in human-readable language. This information, along with the corresponding time series, forms a dataset used to train the student model.
The second stage is the **Distillation and Training of Student Agent**. Here, the corpus generated by the teacher is carefully curated, keeping only the highest-quality causal explanations. Curation assesses ‘causal stability’ (how consistent the causal structure remains across multiple samplings) and ‘informational efficiency’ (how concise and logically grounded the explanation is). A smaller, more efficient LLM, such as Qwen3-8b, serves as the student. It is then fine-tuned on these curated explanations, learning to map time series inputs to their causal explanations and to perform the downstream prediction tasks.
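A toy version of this curation filter might score stability as the fraction of edges shared across repeated teacher samplings and gate on explanation length. The scoring functions, thresholds, and sample data below are assumptions for illustration, not the paper's exact criteria:

```python
def causal_stability(graphs):
    """Fraction of edges common to all sampled graphs (each graph = a set of edges)."""
    common = set.intersection(*graphs)
    union = set.union(*graphs)
    return len(common) / len(union) if union else 1.0

def keep_example(sampled_graphs, explanation, min_stability=0.6, max_tokens=200):
    """Keep a teacher example only if its structure is stable and its text concise."""
    stable = causal_stability(sampled_graphs) >= min_stability
    concise = len(explanation.split()) <= max_tokens
    return stable and concise

# Three samplings of the teacher's graph for the same series (hypothetical edges).
samples = [
    {("temp", "load"), ("wind", "pm25")},
    {("temp", "load"), ("wind", "pm25")},
    {("temp", "load"), ("wind", "pm25"), ("rain", "pm25")},  # one spurious extra edge
]
ok = keep_example(samples, "Temperature drives power load; wind disperses PM2.5.")
```

Only examples passing both gates would enter the fine-tuning set, which matches the paper's finding that a few high-quality causal discoveries beat sheer dataset volume.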
Benefits and Performance
Augur has been extensively tested on real-world datasets from diverse domains including air quality, power consumption, traffic, and finance. The results show that Augur consistently outperforms 25 advanced baseline models in predictive performance, measured by metrics like F1-Score and AUROC. Crucially, Augur also demonstrates robust zero-shot generalization, meaning it performs well on new, unseen datasets without additional training.
The quality of the causal summaries generated by Augur is also superior, as confirmed by both automated metrics and human evaluations. These summaries are not just accurate but also insightful and easy to understand, providing valuable interpretability that is often missing in other models.
An ablation study confirmed that each core component of Augur—the initial pruning of variables, the LLM-based causal judgment, and the iterative graph refinement—is essential for its efficiency and accuracy. The research also found that focusing on a few high-quality causal discoveries yields better performance than simply adding more minor details or expanding the dataset volume without quality control.
Conclusion and Future Outlook
Augur represents a significant step forward in time series forecasting by integrating the powerful causal reasoning abilities of large language models. By extracting explicit causal associations and using them to guide predictions, Augur enhances both forecasting accuracy and interpretability. While the approach relies on certain assumptions, such as the absence of unobserved confounders, its modular design allows for future extensions to incorporate other forms of time-series analysis and statistical properties.
This framework offers a pragmatic yet powerful method to harness the reasoning power of state-of-the-art LLMs, making time series analysis more efficient, economical, and controllable for real-world applications. For more technical details, you can refer to the full research paper: Augur: Modeling Covariate Causal Associations in Time Series via Large Language Models.