TL;DR: A new deep learning method estimates building heights from single, very high-resolution SAR images acquired by the COSMO-SkyMed satellites. Tested across eight diverse cities worldwide, the object-based approach performs best in European cities, where it outperforms recent state-of-the-art methods on cities it was not trained on. Accuracy drops in dense cities dominated by high-rise buildings, particularly in Asia, but the study shows the framework's potential for cross-city and cross-continental transfer learning in urban mapping.
Accurate knowledge of building heights is vital for a wide range of urban applications, from disaster management to city planning. However, precisely estimating these heights, especially in complex urban environments, has long been a challenging task for remote sensing technologies. Factors like obstructions from neighboring structures, varied building materials, and intricate city layouts often complicate the process.
Traditionally, methods for determining building heights have fallen into three main categories: those that combine different types of data, like radar and optical imagery; those that rely on a sequence of data from a single source over time; and those that use just a single image from one data source. While multimodal and multi-temporal approaches offer rich information, they often come with increased complexity in data processing and higher computational demands.
A new research paper introduces a deep learning approach that simplifies this process. Titled "An Object-Based Deep Learning Approach for Building Height Estimation from Single SAR Images", the study focuses on the third category, aiming to balance data simplicity against detailed structural analysis. The method uses single, very high-resolution (VHR) Synthetic Aperture Radar (SAR) images from the COSMO-SkyMed satellite constellation to estimate building heights automatically.
The core of this new method is an object-based regression approach. It first detects building bounding boxes and then estimates height. What makes it stand out from previous work is its streamlined input: instead of using multiple features like bounding box center coordinates, it relies primarily on the dimensions (length and width) of the building’s bounding box. This simplification helps reduce model complexity and the risk of overfitting. Furthermore, the method operates directly in the ground range domain, which simplifies the processing chain by avoiding the need to reproject data into the SAR image plane.
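To make the "streamlined input" concrete, here is a minimal sketch of how a bounding box's length and width could be derived from a building footprint polygon. It assumes footprint vertices are already in a metric ground-range grid and uses an axis-aligned box; the function name `bbox_dimensions` and the convention that "length" is the longer side are hypothetical choices for illustration, not the paper's code.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bbox_dimensions(footprint: List[Point]) -> Tuple[float, float]:
    """Return (length, width) of the axis-aligned bounding box of a footprint.

    Assumption: coordinates are metres in a ground-range image grid.
    Hypothetical convention: length is the longer side, width the shorter one.
    """
    if len(footprint) < 3:
        raise ValueError("a footprint polygon needs at least three vertices")
    xs = [x for x, _ in footprint]
    ys = [y for _, y in footprint]
    dx = max(xs) - min(xs)  # extent along the first image axis
    dy = max(ys) - min(ys)  # extent along the second image axis
    return max(dx, dy), min(dx, dy)


# A 40 m x 15 m rectangular footprint yields just two scalar features,
# with no centre coordinates involved.
print(bbox_dimensions([(0, 0), (40, 0), (40, 15), (0, 15)]))  # (40, 15)
```

Two size features per building, rather than a longer vector that also includes position, is what keeps the regression input small.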
To rigorously test this approach, the researchers trained and evaluated their model on a unique and diverse dataset. This dataset includes eight geographically distinct cities across Europe (Milan, Rome, Munich), North and South America (Los Angeles, New York, Buenos Aires), and Asia (Shanghai, Shenzhen). This wide range of urban environments, from the low-to-mid-rise structures of European cities to the high-rise dominance of Asian megacities, allowed for a comprehensive assessment of the model’s ability to generalize.
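One way to probe generalization across such a dataset is to hold out whole cities, or whole continents, during training. The sketch below builds those splits for the eight cities; it is a hypothetical evaluation protocol for illustration, and the exact train/test splits used in the paper may differ.

```python
from typing import Dict, Iterator, List, Tuple

# City -> region grouping as described in the article.
CITIES: Dict[str, str] = {
    "Milan": "Europe", "Rome": "Europe", "Munich": "Europe",
    "Los Angeles": "Americas", "New York": "Americas", "Buenos Aires": "Americas",
    "Shanghai": "Asia", "Shenzhen": "Asia",
}


def leave_one_city_out(cities: Dict[str, str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training cities, held-out city) for a cross-city test."""
    for held_out in cities:
        yield [c for c in cities if c != held_out], held_out


def leave_one_continent_out(
    cities: Dict[str, str],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (region, training cities, test cities) for a cross-continent test."""
    for region in sorted(set(cities.values())):
        train = [c for c, r in cities.items() if r != region]
        test = [c for c, r in cities.items() if r == region]
        yield region, train, test


for region, train, test in leave_one_continent_out(CITIES):
    print(region, "held out:", test)
```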
The deep learning framework is built on ResNet-101, a 101-layer residual convolutional network widely used as a feature-extraction backbone. The model takes a SAR image and the corresponding building footprint as input and extracts image features. It then combines these with attributes derived from the footprint's bounding box to predict the building's projected footprint, from which the height is calculated using the satellite's incidence angle.
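The last step, converting a projected footprint into a height, is plain SAR geometry. A minimal sketch, assuming flat terrain and a far-field viewing geometry: in ground range, a point at height H appears shifted toward the sensor by H / tan(θ), where θ is the incidence angle, so H = d · tan(θ). Treating the network's output as this ground-range shift `d` is an assumption made here for illustration; the paper's exact parameterization of the "projected footprint" may differ, and `height_from_ground_range_shift` is a hypothetical helper.

```python
import math


def height_from_ground_range_shift(shift_m: float, incidence_deg: float) -> float:
    """Estimate building height from the ground-range layover shift of its roof.

    Assumes flat terrain and straight-line (far-field) viewing geometry, so a
    point at height H is displaced toward the sensor by H / tan(theta).
    """
    if not 0.0 < incidence_deg < 90.0:
        raise ValueError("incidence angle must lie strictly between 0 and 90 degrees")
    if shift_m < 0:
        raise ValueError("shift must be non-negative")
    return shift_m * math.tan(math.radians(incidence_deg))


# At a 45 degree incidence angle, shift and height coincide.
print(round(height_from_ground_range_shift(30.0, 45.0), 2))  # 30.0
```

Because the incidence angle varies between acquisitions, it is passed per image rather than fixed in the model, which is one reason predicting a geometric quantity and converting it afterwards is convenient.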
The results of the study are promising. The model performed best in European cities, where it achieved a Mean Absolute Error (MAE) of roughly one building story (2.20 meters in Munich). This clearly outperforms recent state-of-the-art methods in comparable out-of-distribution scenarios, that is, when the model is tested on cities that were not part of its training data.
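MAE is simply the average absolute difference between predicted and reference heights. The sketch below computes it and expresses it in stories; the nominal 3 m story height is an assumption for illustration, and the height values are toy numbers, not data from the study.

```python
from typing import Sequence


def mean_absolute_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Average of |predicted - reference| over all buildings, in metres."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def mae_in_stories(mae_m: float, story_height_m: float = 3.0) -> float:
    """Express an MAE in building stories, assuming a nominal story height."""
    return mae_m / story_height_m


# Toy predictions vs. reference heights (metres), not results from the paper.
mae = mean_absolute_error([12.0, 20.5, 31.0], [10.0, 22.0, 30.0])
print(mae)  # 1.5
```

Under the 3 m assumption, the reported 2.20 m MAE for Munich corresponds to a little under one story.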
However, the study also revealed difficulties when generalizing to cities with very high building densities and diverse architecture, such as Buenos Aires, New York, Shanghai, and Shenzhen. In the two Asian cities in particular, the prevalence of very tall buildings and intricate urban typologies increased variability and raised error rates. This suggests that, although the framework holds significant promise for cross-city and cross-continental transfer learning, further research is needed to close the performance gap across highly diverse urban landscapes.
In conclusion, this research marks a significant step forward in automated building height estimation from single SAR images. By reducing computational costs and simplifying data acquisition, it offers a scalable model with reasonable accuracy, especially for urban environments similar to those in Europe. Future work will focus on enhancing the model’s generalization capabilities for more diverse urban settings and exploring combinations with other methodologies.


