MoSAiC: Enhancing Land Cover Classification in Remote Sensing with Hybrid Contrastive Learning

TLDR: MoSAiC is a new framework that improves multi-label land cover classification from multi-modal satellite imagery (optical and SAR). It combines self-supervised and supervised contrastive learning to better distinguish similar land cover types, especially when labeled data is scarce. Experiments on the BigEarthNet V2.0 and SEN12MS datasets show MoSAiC outperforms existing methods in accuracy and feature representation, making it more robust for Earth System Observation applications.

MoSAiC is a new framework for Earth System Observation (ESO), a field in which vast amounts of data are collected from satellites. Detailed in the research paper MoSAiC: Multi-Modal Multi-Label Supervision-Aware Contrastive Learning for Remote Sensing, it aims to improve land cover classification from multi-modal satellite imagery, especially in challenging scenarios with limited labeled data.

Earth System Observation involves gathering data from multiple satellite modalities, such as optical imagery (for example, from Sentinel-2) and Synthetic Aperture Radar (SAR) imagery (for example, from Sentinel-1). These different views of the same geographical region carry complementary information, which makes them well suited to multi-modal machine learning. However, ESO data presents particular challenges: subtle visual similarities between different land cover types (e.g., distinguishing different forest types), significant clutter from overlapping features, and often ill-defined boundaries for objects of interest. Traditional methods often struggle with these complexities, particularly in multi-label settings where a single area can have multiple land cover types.

Contrastive learning (CL) has gained prominence for its ability to learn powerful data representations without needing extensive labeled datasets. It works by teaching models to identify similarities and differences among data samples. While effective, many existing CL frameworks are limited to single-modality data or lack the mechanisms to handle the multi-label and semantic precision requirements of ESO.

Introducing MoSAiC: A Unified Framework

MoSAiC, which stands for Multi-Modal Multi-Label Supervision-Aware Contrastive Learning, is designed to overcome these limitations. It’s a unified framework that combines two powerful learning paradigms: self-supervised contrastive learning and supervised contrastive learning. The core idea is to jointly optimize these approaches to achieve finer semantic disentanglement and more robust representation learning across spectrally similar and spatially complex classes.

The framework operates by extracting features from both Sentinel-1 (SAR) and Sentinel-2 (optical) images using separate encoder networks. It then employs a multi-faceted contrastive learning strategy:

  • Intra-modal Contrastive Learning: This part focuses on learning robust representations within each modality. It uses techniques similar to SimCLR, where different augmented views of the same image from a single modality are encouraged to be similar, while views from different images are pushed apart.
  • Inter-modal Contrastive Learning: This is crucial for multi-modal data. MoSAiC treats co-registered image patches from Sentinel-1 and Sentinel-2 (i.e., patches covering the exact same geographical location) as positive pairs. This helps align the representations of different modalities in a shared semantic space, effectively performing implicit data fusion.
  • Multi-label Supervised Contrastive Loss: This component incorporates label information directly into the contrastive learning process. By using a Multi-Label Supervised Contrastive (MulSupCon) loss, MoSAiC encourages representations with similar labels to be close in the latent space. The loss is applied both to fused representations of Sentinel-1 and Sentinel-2 and to augmented views within each modality, so that the learned features are not only robust but also semantically meaningful with respect to land cover labels.
  • Multi-label Binary Cross-Entropy Loss: A standard classification loss is also included, applied to the fused feature representations, to predict the multi-label probability distribution for land cover types.
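The intra-modal and inter-modal terms above share the same contrastive pattern: matched pairs are pulled together and every other sample in the batch is pushed away. The following is a minimal sketch of that pattern, assuming an InfoNCE-style loss over cosine similarities. The function names, the temperature value, and the toy embeddings are illustrative assumptions, not the authors' implementation, and SimCLR's full NT-Xent loss also draws negatives from within the same view, which this simplified version omits.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(anchors, positives, temperature=0.1):
    """One-directional InfoNCE: anchors[i] should match positives[i].

    Every other row of `positives` serves as a negative for anchors[i].
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)

def symmetric_info_nce(z_a, z_b, temperature=0.1):
    """Average the loss over both matching directions (a to b, and b to a)."""
    return 0.5 * (info_nce(z_a, z_b, temperature) + info_nce(z_b, z_a, temperature))

# Toy 2-D embeddings (hypothetical): row i of s1 and s2_aligned cover the same location.
s1 = [[1.0, 0.0], [0.0, 1.0]]
s2_aligned = [[0.9, 0.1], [0.1, 0.9]]
s2_shuffled = [[0.1, 0.9], [0.9, 0.1]]

aligned_loss = symmetric_info_nce(s1, s2_aligned)
shuffled_loss = symmetric_info_nce(s1, s2_shuffled)
```

For the intra-modal case, `z_a` and `z_b` would hold embeddings of two augmented views of the same images from one sensor. For the inter-modal case, they would hold Sentinel-1 and Sentinel-2 embeddings of co-registered patches. In the toy data, the aligned pairing yields a much lower loss than the shuffled one.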

MoSAiC offers two main hybrid training strategies, MoSAiC-1 and MoSAiC-2, which combine these loss terms in different ways. A key advantage of these hybrid approaches is the joint optimization of all components—feature encoders, projection heads, and supervised classification parts—allowing for a synergistic interaction between self-supervised and supervised learning.
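The article does not spell out how MoSAiC-1 and MoSAiC-2 weight their loss terms, so the sketch below only illustrates the general idea of a hybrid objective: a label-aware contrastive term plus a multi-label binary cross-entropy term, combined as a weighted sum. The Jaccard-overlap weighting of positives, the function names, and the default weights are all assumptions made for illustration. They are not the exact MulSupCon formulation or either MoSAiC variant.

```python
import math

def _unit(v):
    """Return v scaled to unit length (cosine-similarity geometry)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def jaccard(a, b):
    """Overlap between two multi-hot label vectors, in [0, 1]."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0

def multilabel_supcon(embeddings, labels, temperature=0.1):
    """Contrastive loss whose positives are weighted by label overlap (assumed scheme)."""
    z = [_unit(v) for v in embeddings]
    n = len(z)
    if n < 2:
        return 0.0
    total, counted = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        logits = {j: sum(x * y for x, y in zip(z[i], z[j])) / temperature for j in others}
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits.values()))
        weights = {j: jaccard(labels[i], labels[j]) for j in others}
        weight_sum = sum(weights.values())
        if weight_sum == 0:
            continue  # anchor i shares no label with any other sample
        total += sum(w * (log_denom - logits[j]) for j, w in weights.items()) / weight_sum
        counted += 1
    return total / counted if counted else 0.0

def multilabel_bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over every (sample, class) pair."""
    terms = []
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            terms.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(terms) / len(terms)

def hybrid_loss(contrastive_terms, bce_term, contrastive_weight=1.0, bce_weight=1.0):
    """Weighted sum of contrastive terms and the classification loss (weights are hypothetical)."""
    return contrastive_weight * sum(contrastive_terms) + bce_weight * bce_term

# Toy batch (hypothetical): samples 0 and 1 share a label set, as do samples 2 and 3.
labels = [[1, 0, 0], [1, 0, 0], [0, 1, 1], [0, 1, 1]]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

clustered_loss = multilabel_supcon(clustered, labels)
mixed_loss = multilabel_supcon(mixed, labels)
bce = multilabel_bce([[0.9, 0.1, 0.2]], [[1, 0, 0]])
total = hybrid_loss([clustered_loss], bce)
```

Because every term feeds a single scalar, one backward pass in a real training loop would update the encoders, projection heads, and classifier together, which is the joint optimization the paragraph above describes.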

Experimental Validation and Impact

The researchers evaluated MoSAiC on two benchmark ESO datasets: BigEarthNet V2.0 and SEN12MS. These datasets contain co-registered Sentinel-1 and Sentinel-2 imagery with multi-label annotations. To simulate real-world data scarcity, experiments were conducted using only 10% of the available training data.
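A low-label setup like this can be reproduced in spirit by drawing a fixed, seeded subset of training indices. The sketch below assumes uniform random sampling. The article does not say how the 10% subset was chosen, so the function name, the seed, and the sampling scheme are illustrative assumptions.

```python
import random

def subsample_indices(num_samples, fraction=0.10, seed=0):
    """Pick a reproducible random subset of training indices."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    k = max(1, round(num_samples * fraction))
    return sorted(rng.sample(range(num_samples), k))

# Hypothetical training set of 1,000 patches, reduced to 10%.
train_subset = subsample_indices(1000)
```

Uniform sampling can leave rare land cover classes with very few examples, so a per-class count of the subset is worth checking before training.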

According to the reported results, MoSAiC consistently outperformed both fully supervised models (like various ResNet architectures and ConvNeXt-V2) and existing self-supervised contrastive learning baselines (like SimCLR and IaI). MoSAiC-1, in particular, achieved the highest performance across key metrics such as Macro Average Precision, Macro F1-score, Micro Average Precision, and Micro F1-score, demonstrating superior accuracy and robustness, especially in low-data conditions.
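Macro and micro averages can diverge sharply on imbalanced multi-label data, which is why both are reported. The following sketch computes the two F1 variants from binary predictions. The function names and toy labels are illustrative, and Average Precision is omitted because it is rank-based and requires predicted scores rather than thresholded labels.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_micro_f1(y_true, y_pred):
    """Return (macro F1, micro F1) for multi-hot label matrices."""
    num_classes = len(y_true[0])
    counts = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
        counts.append((tp, fp, fn))
    # Macro: average the per-class F1 scores, so every class counts equally.
    macro = sum(f1_from_counts(*c) for c in counts) / num_classes
    # Micro: pool the counts across classes first, so frequent classes dominate.
    micro = f1_from_counts(*(sum(col) for col in zip(*counts)))
    return macro, micro

# Toy example (hypothetical): class 0 is common and always right, class 1 is rare and missed.
y_true = [[1, 0], [1, 0], [1, 0], [0, 1]]
y_pred = [[1, 0], [1, 0], [1, 0], [0, 0]]
macro_f1, micro_f1 = macro_micro_f1(y_true, y_pred)
```

In this toy case the macro score is 0.5 while the micro score is 6/7, because the missed rare class counts for half of the macro average but only one of four label decisions in the pooled counts.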

Further analysis using t-SNE projections showed that MoSAiC-1 produced much better-clustered feature representations compared to baselines. This means it could more effectively differentiate between spectrally similar classes, such as Broad-leaved Forest and Coniferous Forest, which are often challenging to distinguish. Per-class analysis using Hamming loss and Brier score also confirmed MoSAiC’s ability to generate more discriminative and semantically aligned features, particularly for classes that are visually very similar.
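The two per-class diagnostics mentioned above measure different things: Hamming loss counts wrong label decisions, while the Brier score penalizes poorly calibrated probabilities. A minimal sketch of both follows, assuming predictions are thresholded at 0.5 for the Hamming loss. The threshold, function names, and toy values are assumptions for illustration.

```python
def per_class_hamming(y_true, y_pred):
    """Fraction of samples whose binary decision is wrong, computed per class."""
    n = len(y_true)
    num_classes = len(y_true[0])
    return [
        sum(1 for t, p in zip(y_true, y_pred) if t[c] != p[c]) / n
        for c in range(num_classes)
    ]

def per_class_brier(y_true, y_prob):
    """Mean squared error between predicted probability and true label, per class."""
    n = len(y_true)
    num_classes = len(y_true[0])
    return [
        sum((p[c] - t[c]) ** 2 for t, p in zip(y_true, y_prob)) / n
        for c in range(num_classes)
    ]

# Toy example (hypothetical): two samples, two classes.
y_true = [[1, 0], [0, 1]]
y_prob = [[0.8, 0.4], [0.3, 0.6]]
y_pred = [[1 if p >= 0.5 else 0 for p in row] for row in y_prob]  # assumed 0.5 threshold

hamming = per_class_hamming(y_true, y_pred)
brier = per_class_brier(y_true, y_prob)
```

Here every thresholded decision is correct, so the Hamming loss is zero for both classes, yet the Brier score is higher for the second class because its probabilities sit closer to the decision boundary.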

While MoSAiC represents a significant step forward, the authors acknowledge areas for future improvement, such as exploring more advanced fusion techniques (e.g., cross-attention) and deeper architectures. Nevertheless, MoSAiC provides a strong foundation for combining supervised and self-supervised contrastive learning in remote sensing, paving the way for more scalable and label-efficient Earth observation applications crucial for monitoring environmental changes.

Karthik Mehta
https://blogs.edgentiq.com
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
