TLDR: A new study introduces a Large Language Model (LLM) framework to generate individual travel diaries for transportation models. It creates realistic individual profiles from open-source census and land-use data, then uses an LLM (Llama 3) to generate daily travel plans. Validated against real survey data and traditional models, the LLM approach shows comparable or superior realism, especially in understanding trip purpose, and offers greater consistency and aggregate representativeness without needing proprietary survey data for training.
Understanding and predicting how people travel is crucial for effective transportation planning. From managing traffic congestion to promoting environmental health, every aspect of urban mobility relies on accurate insights into individual travel behaviors. However, traditional methods, especially Agent-Based Models (ABMs), face significant hurdles. They often require vast amounts of expensive, proprietary survey data for calibration and rely on rigid mathematical frameworks that struggle to capture the complex, nuanced reasons behind human travel decisions.
A new study introduces a groundbreaking approach that leverages Large Language Models (LLMs) to generate individual travel diaries for agent-based transportation models. This innovative method aims to overcome the limitations of traditional models by reducing data dependency and enhancing the behavioral realism of travel simulations. The research, titled “Generating Individual Travel Diaries Using Large Language Models Informed by Census and Land-Use Data,” was conducted by Sepehr Golrokh Amin, Devin Rhoads, Fatemeh Fakhrmoosavi, Nicholas E. Lownes, and John N. Ivan. You can read the full paper here: Research Paper.
A Novel Two-Stage Framework for Travel Diary Generation
The core of this new methodology is a two-stage framework designed to create realistic and interpretable daily travel diaries. Unlike approaches that create an “average” agent, this framework focuses on synthesizing specific, demographically consistent individuals and then generating their unique travel patterns.
The first stage, called Stochastic Persona Synthesis, involves creating detailed individual profiles. For each geographic area (specifically, a Census Block Group), the system probabilistically assigns key attributes like employment status, household vehicle count, age bracket, and household size. This is done by drawing from statistical distributions found in publicly available American Community Survey (ACS) and Smart Location Database (SLD) data. This ensures that each synthesized individual is plausible and reflects the diversity within that specific area.
In the second stage, Direct Diary Generation, the complete synthetic persona, along with the land-use characteristics of their home environment, is fed into an LLM. The LLM, specifically the Llama 3 model running locally via the Ollama framework to ensure data privacy, is instructed to “act as” this persona and generate a full day’s travel diary. The output is structured in a clear CSV format, detailing start and end times, trip purpose, travel mode, and distance. The LLM’s generation process is carefully controlled using parameters like ‘temperature’ and ‘top_p’ to balance realism and diversity, with adjustments made based on factors like employment status to reflect different behavioral patterns.
Moving Beyond Traditional Models
Traditional transportation models, such as the four-step models, often rely on aggregated data and treat each modeling step in isolation, struggling to capture individual-level decision-making. While Agent-Based Models improved this by modeling travel as sequences of linked activities, they still demand extensive data and computational resources. Machine learning has offered some improvements, but most research has been narrowly focused and still operates primarily on structured numerical data, failing to interpret the rich, unstructured context behind human choices.
This LLM-based framework addresses these limitations by grounding agent behavior in open-source census and land-use data in a “zero-shot” setting. This means the LLM generates diaries without any prior training on proprietary household travel survey data, making it highly adaptable and reducing the cost and privacy concerns associated with traditional data collection.
Validating Realism: A Comprehensive Approach
To ensure the realism of the LLM-generated diaries, a robust validation strategy was employed. This involved comparing the LLM’s output against two sources: the real-world 2016-2017 Connecticut Statewide Transportation Study (CSTS) dataset and a benchmark of classical travel demand models (Negative Binomial for trip generation, Multinomial Logit for mode/purpose choice) that were explicitly calibrated on the CSTS data.
The primary validation method was a “one-to-cohort” analysis. Each synthetic diary was compared against a cohort of its real-world peers, matched across six key demographic variables (age bracket, employment status, household vehicle count, income level, geographic identifier, and household size). A “Realism Score” (a composite of Trip Count, Purpose Distribution, Activity Interval, and Mode Distribution scores, using Jensen-Shannon Divergence for distributional similarity) was calculated for each diary. An aggregate-level validation also compared the overall distributions of synthetic diaries to the entire HTS population.
Key Findings: LLMs Show Promise
The results of the validation experiments were insightful. In the one-to-cohort analysis, the LLM-based approach achieved a mean realism score of 0.485, slightly outperforming the classical benchmark’s 0.455. Crucially, the LLM’s scores showed significantly lower variability (standard deviation of 0.065 vs. 0.097), indicating greater consistency and reliability in generating high-quality diaries with fewer unrealistic outputs.
A detailed breakdown revealed a trade-off: classical models excelled in replicating numerical aspects like trip count and activity duration, a direct result of their explicit calibration on the HTS dataset. However, the LLM demonstrated a massive advantage in the semantic task of assigning trip purpose, highlighting its strength in capturing the “why” behind travel decisions. For mode choice, the LLM proved more accurate in complex or less common travel scenarios.
In the aggregate-level validation, the LLM achieved a substantially higher overall score (0.612 vs. 0.435) than the classical benchmark. This suggests that the LLM not only models individual-level behavior more effectively but also generates a synthetic population whose aggregate patterns are more statistically representative of the real world, even in a zero-shot setting without direct training on the validation data.
Also Read:
- AI Models Mimic Chilean Public Opinion in New Survey Research
- Building Representative Digital Societies with Language Models
The Future of Travel Modeling: Hybrid Approaches
This study demonstrates that LLMs are not a replacement for traditional methodologies but rather a powerful complementary tool with unique strengths. By combining the statistical accuracy of classical approaches with the semantic insight LLMs provide into the “why” of travel behavior, future hybrid models could become more realistic, interpretable, and responsive to human decision-making factors. This framework holds significant potential for scenario testing, generating synthetic data in regions with limited survey responses, and forming a foundation for activity modeling where classical models are currently infeasible due to data sparsity.


