TLDR: The paper introduces a modular, multimodal generative AI framework that creates realistic, labeled synthetic urban building energy data. It addresses issues of data inaccessibility, cost, and privacy by using publicly available residential information and images. The framework integrates web scraping, LLaVA for image processing, GPT for GeoJSON and inspection note generation, EnergyPlus simulations, and a weighted heuristic labeling system. Experiments validate the AI components, showing improved focus in image processing and balanced labeling, paving the way for more accessible and reproducible energy research.
In the realm of urban planning and energy management, accurate computational models are crucial for understanding and optimizing energy consumption. However, these models often hit a roadblock: the sheer volume of data they require is frequently inaccessible, expensive to collect, or raises significant privacy concerns. This challenge has spurred researchers to explore innovative ways to generate synthetic data that can mimic real-world information without the associated hurdles.
A recent research paper titled “A Modular and Multimodal Generative AI Framework for Urban Building Energy Data: Generating Synthetic Homes” by Jackson Eshbaugh, Chetan Tiwari, and Jorge Silveyra proposes one such approach: a modular and multimodal generative artificial intelligence (AI) framework designed to produce realistic, labeled synthetic data for urban building energy modeling. The framework draws on publicly accessible residential information and images, which reduces dependence on costly or restricted data sources and makes the research easier to access and reproduce.
The Core Idea: Building Virtual Homes with AI
The essence of this framework lies in its ability to construct synthetic homes that closely mirror their real-world counterparts, complete with detailed inspection notes. It achieves this by combining various data sources and AI models in a structured, five-component pipeline:
- Web Scraper: This initial component collects foundational data from public county datasets, including attributes like the year built, total floor area, number of rooms, and even street view photographs and floor plans.
- Image Processor: Using LLaVA, an advanced AI model, the collected images and floor plans are translated into detailed textual descriptions. This step is vital for inferring important aspects of a home, such as its geometry from floor plans or window quality from photographs. The researchers specifically chose LLaVA over other models like GPT due to its superior ability to focus on relevant parts of an image, ensuring more accurate descriptions.
- GeoJSON and Inspection Note Generator: The textual descriptions from LLaVA, combined with the scraped county data, are fed into OpenAI’s GPT-4.1-mini. This powerful language model then generates a GeoJSON file for the building, which includes its geometry and estimated energy performance parameters (like HVAC efficiency and insulation R-values). Crucially, it also writes a short, energy-focused inspection note, detailing observations about insulation, HVAC systems, and visible upgrades.
- EnergyPlus Simulation: The generated GeoJSON data is converted into an IDF file, which is then used to run a simulation in EnergyPlus, a widely recognized building energy modeling tool. This step provides quantitative results on the synthetic home’s energy performance.
- Labeling System: The final component assesses the energy efficiency of the synthetic homes. It combines heuristic rules based on the EnergyPlus simulation results with natural language inference from GPT, which parses the inspection notes. This system assigns numerical scores for HVAC and insulation efficiency, providing a comprehensive label for each synthetic home.
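To make the generator's output more concrete, here is a minimal Python sketch of what a GeoJSON feature for one synthetic home could look like. The property names (year_built, floor_area_m2, hvac_cop, wall_r_value) and the helper make_building_feature are hypothetical, chosen for illustration; the paper's actual schema is not described here, so treat this as one plausible shape rather than the authors' format.

```python
import json


def make_building_feature(footprint, year_built, floor_area_m2, hvac_cop, wall_r_value):
    """Build a GeoJSON Feature for one synthetic home (illustrative schema only)."""
    ring = [list(point) for point in footprint]
    if ring[0] != ring[-1]:
        # GeoJSON polygon rings must be closed: first and last positions match.
        ring.append(list(ring[0]))
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {
            # Attributes of the kind scraped from county records.
            "year_built": year_built,
            "floor_area_m2": floor_area_m2,
            # Energy parameters of the kind the language model would estimate
            # (hypothetical names and units).
            "hvac_cop": hvac_cop,
            "wall_r_value": wall_r_value,
        },
    }


# A made-up rectangular footprint in [longitude, latitude] order.
footprint = [
    [-75.3800, 40.6000],
    [-75.3798, 40.6000],
    [-75.3798, 40.6001],
    [-75.3800, 40.6001],
]
feature = make_building_feature(footprint, 1987, 160.0, 3.2, 13.0)
geojson_text = json.dumps({"type": "FeatureCollection", "features": [feature]}, indent=2)
```

Keeping geometry and energy parameters in one feature means a single file can carry everything a downstream converter needs to build a simulation input.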
Ensuring Reliability: Rigorous Validation
A key strength of this research is the extensive validation of its AI components, directly addressing common concerns like AI hallucinations and unreliability. The authors conducted two main types of experiments:
- Occlusion Testing for Image Processing: To ensure the image processor (LLaVA) was focusing on the correct parts of an image, occlusion tests were performed. By masking out different sections of home images (e.g., roofs), the researchers observed how the model’s description changed. LLaVA demonstrated a significantly better focus on relevant areas compared to GPT, confirming its accuracy in translating visual information.
- Ablation Testing for Labeling: The labeling system underwent ablation tests to balance the influence of textual inspection notes and numerical simulation data. Initially, a purely GPT-based labeler showed a strong bias towards text. Through iterative improvements, including the introduction of a heuristic labeler and a weighted sum (80% for simulation results, 20% for text), the researchers achieved a balanced system that accurately assigns efficiency ratings based on both modalities.
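The weighted sum used by the labeler can be sketched in a few lines of Python. Only the 80/20 split comes from the article; the 0-to-1 score scale, the function names, and the energy-use thresholds in heuristic_hvac_score are assumptions made for illustration and are not taken from the paper.

```python
SIMULATION_WEIGHT = 0.8  # weight on the simulation-based heuristic score
TEXT_WEIGHT = 0.2        # weight on the score inferred from the inspection note


def heuristic_hvac_score(hvac_kwh_per_m2, low=20.0, high=120.0):
    """Map simulated HVAC energy use to a 0..1 score (lower use scores higher).

    The low/high thresholds are hypothetical placeholders.
    """
    clipped = min(max(hvac_kwh_per_m2, low), high)
    return 1.0 - (clipped - low) / (high - low)


def combine_scores(simulation_score, text_score):
    """Weighted sum of the two modalities, both assumed to lie in 0..1."""
    return SIMULATION_WEIGHT * simulation_score + TEXT_WEIGHT * text_score


# Example: a mid-range simulation result paired with a glowing inspection note.
simulation_score = heuristic_hvac_score(70.0)
text_score = 1.0  # stand-in for a score a language model might assign to the note
label = combine_scores(simulation_score, text_score)
```

Because the text score carries only a fifth of the weight, an enthusiastic note can nudge the label but cannot override a poor simulation result, which is the kind of balance the ablation tests were checking for.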
Paving the Way for Future Research
This modular and multimodal framework is a meaningful step forward for urban energy research. By generating realistic, labeled synthetic data cost-effectively, it sidesteps the high cost, limited availability, and privacy concerns associated with real-world data collection. Because the framework is modular, its components can be adapted or extended for other modeling and simulation tasks.
The authors envision future applications, such as using this pipeline to train machine learning algorithms that can recommend energy efficiency retrofits based on multimodal data. This work not only advances the urban energy field but also provides a valuable, adaptable tool for experts and researchers across diverse domains, making data-driven insights more accessible than ever before.


