TLDR: This study explores using Large Language Models (LLMs) to forecast traffic incident impact, leveraging their ability to learn from few examples and process unstructured text. A novel example selection method for in-context learning is proposed. Experiments show that top-performing LLMs achieve accuracy comparable to state-of-the-art machine learning models, despite requiring significantly less training data, demonstrating their practical viability for traffic management.
Traffic incidents, such as accidents, are a major contributor to non-recurring congestion, leading to significant costs in terms of wasted time, productivity, and fuel. While many incidents have minor effects, some cause substantial disruptions. Accurately forecasting the future impact of a traffic incident is crucial for travelers to avoid congested routes and for traffic managers to respond effectively. However, the unpredictable nature of these incidents makes precise prediction challenging.
Historically, machine learning (ML) models have been used to predict various aspects of traffic incident impact, including duration, spatial span, and induced delay. Despite their utility, these models have two main drawbacks: they typically require large, labeled datasets for training, and they struggle to utilize valuable information often available in unstructured text formats, such as messages from first responders.
In recent years, large language models (LLMs) have emerged as a powerful tool for solving prediction problems in intelligent transportation systems (ITS). LLMs offer several advantages for traffic incident impact prediction. They possess in-context learning (ICL) capabilities, meaning they can perform new tasks based on a few examples provided in the prompt, eliminating the need for extensive retraining with large datasets. Furthermore, LLMs excel at processing unstructured text and extracting relevant information, and they can leverage their pre-learned knowledge about traffic dynamics.
Researchers have proposed a novel, fully LLM-based solution for predicting the impact of traffic incidents. This solution combines traditional traffic features with incident features extracted by an LLM from unstructured text logs. A critical component of this approach is an effective method for selecting examples to optimize the LLM’s in-context learning.
The proposed system works by first extracting relevant features. A small-scale LLM (like GPT-4o mini) processes free-text incident logs to identify key details such as the time of the incident, the number of vehicles involved, and the number of lanes blocked. Simultaneously, traffic speed data is analyzed to derive features like pre-incident relative speed and the overall speed decrease ratio at the prediction time. These features are then combined into a natural language prompt for the main LLM.
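The traffic-side features described above can be sketched in a few lines. The exact formulas are assumptions for illustration (the paper's definitions may differ): pre-incident relative speed is taken as average speed before the incident over free-flow speed, and the speed decrease ratio as the relative drop from pre-incident speed to the speed at prediction time.

```python
# Illustrative sketch (formulas are assumptions, not the paper's exact definitions):
# derive the two traffic features from sensor speed readings around the incident.

def traffic_features(speeds_before, speeds_after, free_flow_speed):
    """speeds_before/after: speed readings (e.g. mph) around the incident report time."""
    pre_incident = sum(speeds_before) / len(speeds_before)
    current = sum(speeds_after) / len(speeds_after)
    return {
        # how close traffic was to free flow before the incident
        "pre_incident_relative_speed": pre_incident / free_flow_speed,
        # how much speed has dropped since the incident was reported
        "speed_decrease_ratio": (pre_incident - current) / pre_incident,
    }

feats = traffic_features([62.0, 60.0, 61.0], [38.0, 35.0], free_flow_speed=65.0)
```

These numeric features are then merged with the incident details the small LLM pulls from the free-text logs before being rendered into the prompt.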
Alongside the user prompt, a system prompt is provided to guide the LLM. This system prompt defines the incident impact classes (mild, moderate, severe) and includes general knowledge about how incidents affect traffic flow. Crucially, it also contains several examples of correct predictions to enable the LLM’s in-context learning. The LLM is instructed to output its prediction as a single word: mild, moderate, or severe.
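A minimal sketch of how such prompts might be assembled is shown below. The wording, field names, and example format are illustrative assumptions, not the paper's actual prompt text; the structure (class definitions, general knowledge, worked examples, single-word answer instruction) follows the description above.

```python
# Hedged sketch of prompt assembly; all wording here is an illustrative
# assumption, not the actual prompt used in the study.

CLASSES = ("mild", "moderate", "severe")

def build_system_prompt(examples):
    """examples: list of (feature dict, correct label) pairs for in-context learning."""
    lines = [
        "You classify traffic incident impact as one of: " + ", ".join(CLASSES) + ".",
        "Incidents that block more lanes or sharply reduce speed tend to be more severe.",
        "Examples of correct predictions:",
    ]
    for feats, label in examples:
        lines.append(f"- {feats} -> {label}")
    lines.append("Answer with a single word: mild, moderate, or severe.")
    return "\n".join(lines)

def build_user_prompt(feats):
    return (f"Incident at {feats['time']} with {feats['vehicles']} vehicles, "
            f"{feats['lanes_blocked']} lanes blocked; "
            f"speed decrease ratio {feats['speed_decrease_ratio']:.2f}. "
            "Predict the impact.")

sys_prompt = build_system_prompt([({"lanes_blocked": 0}, "mild")])
usr_prompt = build_user_prompt({"time": "17:10", "vehicles": 2,
                                "lanes_blocked": 1, "speed_decrease_ratio": 0.40})
```

Constraining the output to a single word keeps the prediction trivially parseable, which matters when the classifier runs inside an automated traffic-management pipeline.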
A significant innovation in this study is the method for selecting examples for in-context learning. Randomly selecting examples often leads to poor accuracy on classification tasks. To overcome this, the researchers devised a strategy that first identifies and excludes outlier incidents. Then, for each impact class (mild, moderate, severe), a subset of “near-boundary” incidents (those closest to a neighboring class’s centroid in the feature space) is identified. A small number of examples is then randomly selected from these near-boundary incidents for each class, ensuring the LLM learns from challenging “edge cases.”
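The selection strategy can be sketched as follows. The distance metric, the boundary fraction, and the per-class counts are assumptions chosen for illustration; the paper's exact parameters may differ, and outlier removal is assumed to have already happened.

```python
# Hedged sketch of near-boundary example selection (metric and counts are
# assumptions): per class, rank incidents by distance to the closest *other*
# class centroid, keep the nearest fraction, and sample examples from it.
import random
from math import dist

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def near_boundary_examples(data, per_class=2, boundary_frac=0.3, seed=0):
    """data: {label: [feature vectors]}, with outliers already removed."""
    cents = {c: centroid(pts) for c, pts in data.items()}
    rng = random.Random(seed)
    selected = {}
    for c, pts in data.items():
        others = [cents[o] for o in cents if o != c]
        # smaller distance to another class's centroid = closer to the boundary
        ranked = sorted(pts, key=lambda p: min(dist(p, o) for o in others))
        k = max(per_class, int(len(ranked) * boundary_frac))
        selected[c] = rng.sample(ranked[:k], min(per_class, len(ranked)))
    return selected
```

For example, with three well-separated toy classes, the "mild" incident sitting closest to the "moderate" centroid is the one picked, which is exactly the kind of edge case the method aims to surface.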
The effectiveness of this LLM-based solution was evaluated on a real traffic incident dataset derived from the PEMS-BAY traffic dataset, comprising 2777 incidents from the San Francisco Bay Area. Predictions were made for two horizons: 15 minutes and 30 minutes after an incident was first reported. The study compared the performance of three advanced LLMs—Claude 3.7 Sonnet, Gemini 2.0 Flash, and GPT 4.1—against two state-of-the-art machine learning models, Random Forest and XGBoost.
The results were highly encouraging. For both prediction horizons, the best-performing LLM achieved accuracy comparable to that of the most accurate machine learning model. This is particularly noteworthy because the LLMs relied on only 24 examples for in-context learning, whereas the machine learning models were trained on over 2000 labeled samples. GPT 4.1 showed the best accuracy for 15-minute-ahead predictions, while Claude 3.7 Sonnet excelled at the 30-minute horizon. Among the ML models, Random Forest performed best.
Further analysis confirmed the importance of specific features, with the pre-incident relative speed, overall speed decrease ratio, time of incident, number of vehicles involved, and number of lanes blocked being the most impactful. The study also validated the proposed example selection method, demonstrating that it consistently and substantially improved LLM performance compared to using randomly selected examples.
In conclusion, this research demonstrates the practical viability of using large language models to directly forecast the impact of traffic incidents. The findings suggest that LLMs, with their ability to learn from limited examples and process unstructured data, offer a powerful alternative to conventional machine learning methods for intelligent transportation systems. For more details, you can refer to the original research paper: Application and Evaluation of Large Language Models for Forecasting the Impact of Traffic Incidents.


