TL;DR: A new AI framework predicts flight delays, specifically post-terminal delays, by combining textual aviation information (flight details, weather reports, and aerodrome notices) with real-time aircraft trajectory data. It uses Large Language Models adapted to understand both language and trajectory patterns, achieving sub-minute prediction errors and supporting second-by-second updates for air traffic controllers. This approach provides a comprehensive, real-time understanding of airspace conditions, making it highly practical for air traffic management.
Flight delays are a significant challenge in air traffic management, leading to increased operating costs for airlines, disruptions for passengers, and ripple effects across the entire air travel network. Addressing this, a new research paper introduces a novel approach to predict flight delays, particularly those occurring after an aircraft enters the terminal area, a critical phase for air traffic controllers (ATCs).
The paper, titled "Flight Delay Prediction via Cross-Modality Adaptation of Large Language Models and Aircraft Trajectory Representation," presents a lightweight, multimodal framework that leverages the power of Large Language Models (LLMs) combined with detailed aircraft trajectory data. Authored by Thaweerath Phisannupawong, Joshua Julian Damanik, and Han-Lim Choi, this research aims to provide ATCs with accurate, real-time delay predictions, enhancing operational efficiency and safety.
A Multimodal Approach to Understanding Airspace
Traditional flight delay prediction models often rely on structured tabular data, historical delays, or airport network information. This new framework takes a more comprehensive approach by integrating diverse data types, or “modalities,” that ATCs themselves monitor. These include:
- General Flight Information: Details like airline name, flight identifier, departure and destination airports, scheduled arrival times, aircraft type, and wake turbulence category.
- Weather Reports: Real-time Meteorological Aerodrome Reports (METAR) and Terminal Aerodrome Forecasts (TAF), providing crucial weather conditions and predictions.
- Aerodrome Notices (NOTAMs): Special operational constraints or disruptions, such as runway closures or equipment failures, which can directly cause delays.
- Aircraft Trajectory Data: This is a key innovation. The model processes three types of trajectories: the “focusing trajectory” (the aircraft being monitored), “active trajectories” (other aircraft in the terminal area), and “prior trajectories” (completed flights that might influence current patterns). This data provides a real-time understanding of airspace conditions and congestion.
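To make the textual modalities concrete, here is a minimal sketch of how the flight information, weather reports, and NOTAMs might be assembled into a single text context for the LLM. The field names, formatting, and sample METAR/TAF/NOTAM strings are illustrative assumptions, not the authors' exact schema:

```python
# Hypothetical sketch: assembling the textual modalities into one context string.
# Field names and report contents are illustrative, not the paper's exact format.
def build_text_context(flight_info: dict, metar: str, taf: str, notams: list) -> str:
    """Concatenate general flight info, weather reports, and NOTAMs."""
    lines = [
        f"Flight: {flight_info['airline']} {flight_info['flight_id']}",
        f"Route: {flight_info['origin']} -> {flight_info['destination']}",
        f"Scheduled arrival: {flight_info['sta']}",
        f"Aircraft: {flight_info['aircraft_type']} (wake: {flight_info['wake_category']})",
        f"METAR: {metar}",
        f"TAF: {taf}",
        "NOTAMs: " + ("; ".join(notams) if notams else "none"),
    ]
    return "\n".join(lines)

context = build_text_context(
    {"airline": "KAL", "flight_id": "KE123", "origin": "RJTT", "destination": "RKSI",
     "sta": "2024-05-01T09:40Z", "aircraft_type": "B77W", "wake_category": "Heavy"},
    metar="RKSI 010900Z 27010KT 9999 FEW030 18/09 Q1015",
    taf="RKSI 010800Z 0109/0212 28012KT 9999 SCT035",
    notams=["RWY 15L/33R CLSD"],
)
print(context.splitlines()[0])  # Flight: KAL KE123
```

The trajectory modalities, by contrast, arrive as numeric time series and take the separate encoding path described below.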
The core of the methodology lies in “cross-modality adaptation.” Large Language Models are inherently designed to understand text. However, aircraft trajectories are complex time-series data. The framework uses a specialized trajectory representation model (ATSCC) to convert this time-series data into a format that LLMs can interpret. A lightweight adaptation network then bridges the gap, projecting these trajectory representations into the LLM’s language-compatible embedding space. This allows the LLM to process and reason over both textual and trajectory information simultaneously.
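The adaptation step can be pictured as a small projection network that maps trajectory-encoder outputs into the LLM's embedding space. The sketch below uses NumPy and made-up dimensions (a 256-d ATSCC embedding, a 2048-d LLM embedding space, a two-layer MLP with a tanh nonlinearity standing in for whatever activation the authors use); none of these specifics come from the paper:

```python
import numpy as np

# Minimal sketch of cross-modality adaptation: project trajectory tokens into
# the LLM's embedding space. All dimensions below are illustrative assumptions.
rng = np.random.default_rng(0)

TRAJ_DIM, HIDDEN_DIM, LLM_DIM = 256, 512, 2048
W1 = rng.standard_normal((TRAJ_DIM, HIDDEN_DIM)) * 0.02
W2 = rng.standard_normal((HIDDEN_DIM, LLM_DIM)) * 0.02

def adapt(traj_embeddings: np.ndarray) -> np.ndarray:
    """Two-layer MLP projector: trajectory embeddings -> LLM token embeddings."""
    h = np.tanh(traj_embeddings @ W1)   # nonlinearity (stand-in choice)
    return h @ W2                       # language-compatible embeddings

# One encoded trajectory represented as 10 tokens of size 256:
traj_tokens = rng.standard_normal((10, TRAJ_DIM))
llm_tokens = adapt(traj_tokens)
print(llm_tokens.shape)  # (10, 2048)
```

Once projected, these trajectory tokens can be interleaved with the text tokens, letting the frozen LLM attend over both modalities in a single sequence.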
How the Model Works
Instead of training a massive model from scratch, the framework smartly utilizes pre-trained components. A frozen LLM backbone (such as LLaMA-3.2-1B-Instruct or Pythia-1B) provides robust linguistic understanding. A pre-trained trajectory encoder (ATSCC) handles the complex trajectory data. Only a small cross-modality adaptation network and a regression head (for predicting delay duration) are trained. This makes the training process efficient and memory-friendly, allowing the model to run on consumer-grade GPUs.
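A back-of-envelope calculation shows why this setup is lightweight: only the adapter and regression head carry trainable parameters, while the ~1B-parameter backbone and the trajectory encoder stay frozen. The component sizes below are illustrative assumptions, not the paper's reported counts:

```python
# Illustrative parameter budget: frozen backbone vs. trainable adapter + head.
# All sizes are assumptions for the sake of the estimate.
def mlp_params(dims):
    """Parameter count of an MLP with the given layer widths (weights + biases)."""
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))

frozen = {
    "llm_backbone (~1B, e.g. LLaMA-3.2-1B-Instruct)": 1_000_000_000,
    "trajectory_encoder (ATSCC, illustrative size)": 10_000_000,
}
trainable = {
    "adaptation_network": mlp_params([256, 512, 2048]),
    "regression_head": mlp_params([2048, 512, 1]),
}
total_trainable = sum(trainable.values())
total = sum(frozen.values()) + total_trainable
print(f"trainable params: {total_trainable:,}")   # trainable params: 2,231,809
print(f"fraction trainable: {total_trainable / total:.4%}")
```

Under these assumptions, well under 1% of the parameters receive gradients, which is what makes training feasible on consumer-grade GPUs.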
The model focuses on predicting the "post-terminal duration": the time an aircraft spends in the airspace between entering the terminal area and its actual arrival. Given this predicted duration, together with the scheduled arrival time and the actual terminal-entry time, the total delay follows directly. This formulation aligns with the information available to ATCs, making the predictions highly relevant to their operational needs.
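The arithmetic behind this formulation is simple; a short worked example makes it explicit. The timestamps and the 18.5-minute predicted duration below are made-up illustrative values:

```python
from datetime import datetime, timedelta

# Delay from the post-terminal formulation:
#   estimated arrival = terminal entry time + predicted post-terminal duration
#   delay             = estimated arrival - scheduled arrival (negative = early)
def delay_minutes(entry_time: datetime, predicted_post_terminal_min: float,
                  scheduled_arrival: datetime) -> float:
    estimated_arrival = entry_time + timedelta(minutes=predicted_post_terminal_min)
    return (estimated_arrival - scheduled_arrival).total_seconds() / 60.0

entry = datetime(2024, 5, 1, 9, 30)   # aircraft enters the terminal area
sta = datetime(2024, 5, 1, 9, 40)     # scheduled arrival
print(delay_minutes(entry, 18.5, sta))  # 8.5
```

Here a predicted 18.5-minute post-terminal duration against a 10-minute scheduled gap yields an 8.5-minute delay.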
Promising Results and Real-time Capabilities
Experimental results demonstrate the framework’s effectiveness. When evaluated across various LLM backbones, the model consistently achieved sub-minute prediction errors. This level of accuracy is highly practical for air traffic management, where delays are typically recorded at the minute level. The research highlights that the LLM’s inherent linguistic understanding, combined with the rich contextual information from trajectory data, is crucial for these accurate predictions.
A significant advantage of this framework is its ability to provide second-by-second delay updates. Unlike previous models constrained by the infrequent updates of some data sources (like weather reports every 30 minutes), the integration of frequently transmitted surveillance broadcast signals allows the context to be updated in real-time. This continuous monitoring capability is invaluable for ATCs, enabling them to refine predictions as an aircraft progresses towards the airport.
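The update cycle this enables can be sketched as a loop in which each newly received surveillance point extends the focusing trajectory and triggers a fresh prediction. `predict_post_terminal` below is a hypothetical stand-in for the full LLM pipeline, and the coordinates are fabricated:

```python
# Sketch of the second-by-second update cycle: each surveillance broadcast
# point (lat, lon, alt) extends the trajectory and refreshes the estimate.
# predict_post_terminal is a toy stand-in for the actual multimodal model.
def predict_post_terminal(trajectory: list) -> float:
    """Toy model: shrink a nominal 20-min estimate as the aircraft progresses."""
    return max(0.0, 20.0 - 0.5 * len(trajectory))

trajectory = []
updates = []
for point in [(37.46, 126.44, 3000.0),
              (37.47, 126.45, 2900.0),
              (37.48, 126.46, 2800.0)]:
    trajectory.append(point)                            # new broadcast arrives
    updates.append(predict_post_terminal(trajectory))   # refreshed estimate
print(updates)  # [19.5, 19.0, 18.5]
```

In the real system the estimate would come from re-running the trajectory encoder and LLM over the updated context, but the control flow is the same: every incoming point yields a revised delay prediction.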
Impact and Future Directions
This research offers a practical and scalable solution for flight delay prediction. By focusing on a single airport and integrating a wide range of multimodal data, it overcomes limitations of prior works that often relied on historical delay data or complex airport network information. The framework’s robustness, even when certain contextual information is occasionally unavailable, further underscores its operational suitability.
Looking ahead, the researchers suggest extending this framework to predict delays in other flight phases and exploring its application to broader air traffic management tasks, such as trajectory prediction and conflict detection. The integration of even higher-quality operational data and additional modalities like images or audio could further enhance the capabilities of multimodal models in aviation.