TLDR: This study explores using Large Language Models (LLMs) to forecast traffic incident impact, leveraging their ability to learn from few examples and process unstructured text. A novel example selection method for in-context learning is proposed. Experiments show that top-performing LLMs achieve accuracy comparable to state-of-the-art machine learning models, despite requiring significantly less training data, demonstrating their practical viability for traffic management.
Traffic incidents, such as accidents, are a major contributor to non-recurring congestion, leading to significant costs in terms of wasted time, productivity, and fuel. While many incidents have minor effects, some cause substantial disruptions. Accurately forecasting the future impact of a traffic incident is crucial for travelers to avoid congested routes and for traffic managers to respond effectively. However, the unpredictable nature of these incidents makes precise prediction challenging.
Historically, machine learning (ML) models have been used to predict various aspects of traffic incident impact, including duration, spatial span, and induced delay. Despite their utility, these models have two main drawbacks: they typically require large, labeled datasets for training, and they struggle to utilize valuable information often available in unstructured text formats, such as messages from first responders.
In recent years, large language models (LLMs) have emerged as a powerful tool for solving prediction problems in intelligent transportation systems (ITS). LLMs offer several advantages for traffic incident impact prediction. They possess in-context learning (ICL) capabilities, meaning they can perform new tasks based on a few examples provided in the prompt, eliminating the need for extensive retraining with large datasets. Furthermore, LLMs excel at processing unstructured text and extracting relevant information, and they can leverage their pre-learned knowledge about traffic dynamics.
Researchers have proposed a novel, fully LLM-based solution for predicting the impact of traffic incidents. This solution combines traditional traffic features with incident features extracted by an LLM from unstructured text logs. A critical component of this approach is an effective method for selecting examples to optimize the LLM’s in-context learning.
The proposed system works by first extracting relevant features. A small-scale LLM (like GPT-4o mini) processes free-text incident logs to identify key details such as the time of the incident, the number of vehicles involved, and the number of lanes blocked. Simultaneously, traffic speed data is analyzed to derive features like pre-incident relative speed and the overall speed decrease ratio at the prediction time. These features are then combined into a natural language prompt for the main LLM.
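The traffic-side features described above can be sketched in a few lines. The exact formulas are assumptions for illustration (the paper's definitions may differ): pre-incident relative speed is taken as average speed before the incident over free-flow speed, and the speed decrease ratio as the relative drop from pre-incident speed to the speed at prediction time.

```python
# Illustrative sketch (formulas are assumptions, not the paper's exact definitions):
# derive the two traffic features from sensor speed readings around the incident.

def traffic_features(speeds_before, speeds_after, free_flow_speed):
    """speeds_before/after: speed readings (e.g. mph) around the incident report time."""
    pre_incident = sum(speeds_before) / len(speeds_before)
    current = sum(speeds_after) / len(speeds_after)
    return {
        # how close traffic was to free flow before the incident
        "pre_incident_relative_speed": pre_incident / free_flow_speed,
        # how much speed has dropped since the incident was reported
        "speed_decrease_ratio": (pre_incident - current) / pre_incident,
    }

feats = traffic_features([62.0, 60.0, 61.0], [38.0, 35.0], free_flow_speed=65.0)
```

These numeric features are then merged with the incident details the small LLM pulls from the free-text logs before being rendered into the prompt.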
Alongside the user prompt, a system prompt is provided to guide the LLM. This system prompt defines the incident impact classes (mild, moderate, severe) and includes general knowledge about how incidents affect traffic flow. Crucially, it also contains several examples of correct predictions to enable the LLM’s in-context learning. The LLM is instructed to output its prediction as a single word: mild, moderate, or severe.
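A minimal sketch of how such prompts might be assembled is shown below. The wording, field names, and example format are illustrative assumptions, not the paper's actual prompt text; the structure (class definitions, general knowledge, worked examples, single-word answer instruction) follows the description above.

```python
# Hedged sketch of prompt assembly; all wording here is an illustrative
# assumption, not the actual prompt used in the study.

CLASSES = ("mild", "moderate", "severe")

def build_system_prompt(examples):
    """examples: list of (feature dict, correct label) pairs for in-context learning."""
    lines = [
        "You classify traffic incident impact as one of: " + ", ".join(CLASSES) + ".",
        "Incidents that block more lanes or sharply reduce speed tend to be more severe.",
        "Examples of correct predictions:",
    ]
    for feats, label in examples:
        lines.append(f"- {feats} -> {label}")
    lines.append("Answer with a single word: mild, moderate, or severe.")
    return "\n".join(lines)

def build_user_prompt(feats):
    return (f"Incident at {feats['time']} with {feats['vehicles']} vehicles, "
            f"{feats['lanes_blocked']} lanes blocked; "
            f"speed decrease ratio {feats['speed_decrease_ratio']:.2f}. "
            "Predict the impact.")

sys_prompt = build_system_prompt([({"lanes_blocked": 0}, "mild")])
usr_prompt = build_user_prompt({"time": "17:10", "vehicles": 2,
                                "lanes_blocked": 1, "speed_decrease_ratio": 0.40})
```

Constraining the output to a single word keeps the prediction trivially parseable, which matters when the classifier runs inside an automated traffic-management pipeline.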
A significant innovation in this study is the method for selecting examples for in-context learning. Randomly selecting examples often leads to poor accuracy on classification tasks. To overcome this, the researchers devised a strategy that first identifies and excludes outlier incidents. Then, for each impact class (mild, moderate, severe), a subset of “near-boundary” incidents (those closest to a neighboring class’s centroid in the feature space) is identified. A small number of examples is then randomly selected from these near-boundary incidents for each class, ensuring the LLM learns from challenging “edge cases.”
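The selection strategy can be sketched as follows. The distance metric, the boundary fraction, and the per-class counts are assumptions chosen for illustration; the paper's exact parameters may differ, and outlier removal is assumed to have already happened.

```python
# Hedged sketch of near-boundary example selection (metric and counts are
# assumptions): per class, rank incidents by distance to the closest *other*
# class centroid, keep the nearest fraction, and sample examples from it.
import random
from math import dist

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def near_boundary_examples(data, per_class=2, boundary_frac=0.3, seed=0):
    """data: {label: [feature vectors]}, with outliers already removed."""
    cents = {c: centroid(pts) for c, pts in data.items()}
    rng = random.Random(seed)
    selected = {}
    for c, pts in data.items():
        others = [cents[o] for o in cents if o != c]
        # smaller distance to another class's centroid = closer to the boundary
        ranked = sorted(pts, key=lambda p: min(dist(p, o) for o in others))
        k = max(per_class, int(len(ranked) * boundary_frac))
        selected[c] = rng.sample(ranked[:k], min(per_class, len(ranked)))
    return selected
```

For example, with three well-separated toy classes, the "mild" incident sitting closest to the "moderate" centroid is the one picked, which is exactly the kind of edge case the method aims to surface.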
The effectiveness of this LLM-based solution was evaluated on a real traffic incident dataset derived from the PEMS-BAY traffic dataset, comprising 2777 incidents from the San Francisco Bay Area. Predictions were made for two horizons: 15 minutes and 30 minutes after an incident was first reported. The study compared the performance of three advanced LLMs—Claude 3.7 Sonnet, Gemini 2.0 Flash, and GPT 4.1—against two state-of-the-art machine learning models, Random Forest and XGBoost.
The results were highly encouraging. For both prediction horizons, the best-performing LLM achieved accuracy comparable to that of the most accurate machine learning model. This is particularly noteworthy because the LLMs relied on only 24 examples for in-context learning, whereas the machine learning models were trained on over 2000 labeled samples. GPT 4.1 showed the best accuracy for 15-minute-ahead predictions, while Claude 3.7 Sonnet excelled at the 30-minute horizon. Among the ML models, Random Forest performed best.
Further analysis confirmed the importance of specific features, with the pre-incident relative speed, overall speed decrease ratio, time of incident, number of vehicles involved, and number of lanes blocked being the most impactful. The study also validated the proposed example selection method, demonstrating that it consistently and substantially improved LLM performance compared to using randomly selected examples.
In conclusion, this research demonstrates the practical viability of using large language models to directly forecast the impact of traffic incidents. The findings suggest that LLMs, with their ability to learn from limited examples and process unstructured data, offer a powerful alternative to conventional machine learning methods for intelligent transportation systems. For more details, you can refer to the original research paper: Application and Evaluation of Large Language Models for Forecasting the Impact of Traffic Incidents.


