TLDR: A new self-supervised machine learning framework, SIT-FUSE, is presented for detecting and mapping harmful algal bloom (HAB) severity and speciation using multi-sensor satellite data. By fusing data from instruments like VIIRS, MODIS, Sentinel-3, PACE, and TROPOMI SIF, SIT-FUSE generates HAB products without requiring extensive labeled datasets. The framework employs self-supervised representation learning and hierarchical deep clustering, validated against in-situ data from the Gulf of Mexico and Southern California, showing strong agreement and potential for scalable, efficient HAB monitoring.
Harmful algal blooms, often called HABs, are outbreaks of algae in oceans and lakes that can pose serious threats to human health, marine life, and local economies. These blooms can appear and spread rapidly, making timely and accurate monitoring crucial for public safety and environmental protection. Traditionally, satellite-based HAB monitoring has been complex and resource-intensive, because each type of instrument often requires its own algorithm and an extensive labeled dataset. Such products are both time-consuming and costly to develop and maintain.
A new approach, detailed in the research paper "Fusing Multi- and Hyperspectral Satellite Data for Harmful Algal Bloom Monitoring with Self-Supervised and Hierarchical Deep Learning" by Nicholas LaHaye, Kelly M. Luis, and Michelle M. Gierach, introduces a framework called SIT-FUSE. It uses self-supervised learning and hierarchical deep learning to detect, track, and map HABs automatically, with greater efficiency and flexibility than instrument-specific methods.
Overcoming Traditional Challenges with SIT-FUSE
SIT-FUSE stands out because it can generate HAB severity and speciation products without needing vast, pre-labeled datasets for each satellite instrument. This is a major advantage in environments where labeled data is scarce. The system achieves this by fusing reflectance data from various operational instruments like VIIRS, MODIS, Sentinel-3, and NASA’s PACE, along with TROPOMI’s solar-induced fluorescence (SIF) measurements. This multi-sensor data integration allows for more comprehensive and frequent observations of bloom events.
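To make the idea of multi-sensor fusion concrete, here is a minimal sketch of one way to combine per-pixel measurements from several instruments once they have been resampled to a shared grid. It is an illustration, not the authors' implementation: the function name `fuse_on_common_grid`, the `(row, col)` cell indexing, and the sample values are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, col) on a shared grid; hypothetical indexing scheme


def fuse_on_common_grid(
    sources: Dict[str, Dict[Cell, List[float]]]
) -> Dict[Cell, List[float]]:
    """Concatenate per-cell feature vectors from every instrument.

    Only cells observed by all instruments are kept, so each fused
    vector has the same length and the same feature order.
    """
    if not sources:
        return {}
    names = sorted(sources)  # fixed instrument order keeps features aligned
    common = set.intersection(*(set(sources[name]) for name in names))
    return {
        cell: [value for name in names for value in sources[name][cell]]
        for cell in sorted(common)
    }


# Made-up sample values: two reflectance bands and one SIF value per cell.
ocean_color = {(0, 0): [0.012, 0.034], (0, 1): [0.015, 0.030]}
sif = {(0, 0): [0.8], (1, 1): [0.5]}

fused = fuse_on_common_grid({"ocean_color": ocean_color, "sif": sif})
print(fused)
```

Here only cell `(0, 0)` is seen by both inputs, so it is the only one that survives. Keeping just the overlapping cells is the simplest possible choice; a real pipeline would also need reprojection, time matching, and handling of missing data.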
The core of SIT-FUSE involves several advanced techniques:
- Self-Supervised Representation Learning: Instead of relying on human-labeled data, the system learns meaningful patterns directly from large amounts of unlabeled satellite imagery. This makes it highly adaptable to different sensors and resolutions.
- Hierarchical Deep Clustering: This method segments phytoplankton concentrations and species into understandable classes. It uses a tree-like structure, starting with broad categories and then refining them into more specific subclasses, allowing for detailed analysis at different levels.
- Context Assignment: After segmentation, the system assigns real-world meaning to these clusters by validating them against in-situ (on-site) data collected from areas like the Gulf of Mexico and Southern California. This step translates abstract clusters into identifiable HAB types and severity levels.
- Data Stream Combination: SIT-FUSE combines outputs from individual instruments and fused datasets (e.g., ocean color instruments plus TROPOMI SIF) to maximize coverage and provide a more complete picture of HABs daily and monthly.
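The context-assignment step, applied to a two-level cluster tree, can be sketched as follows. This is a simplified illustration under stated assumptions, not the method from the paper: it assumes clusters get their meaning by majority vote over co-located in-situ samples, and that a fine cluster with no samples falls back to its parent's label. The names `assign_context`, `label_pixel`, and `parent_of`, and all cluster IDs and labels, are hypothetical.

```python
from collections import Counter, defaultdict


def assign_context(matchups, parent_of):
    """Map cluster IDs to in-situ labels by majority vote (an assumed rule).

    matchups:  list of (fine_cluster_id, in_situ_label) pairs
    parent_of: dict mapping each fine cluster ID to its coarse parent ID
    Returns (fine_map, coarse_map).
    """
    fine_votes = defaultdict(Counter)
    coarse_votes = defaultdict(Counter)
    for cluster, label in matchups:
        fine_votes[cluster][label] += 1
        coarse_votes[parent_of[cluster]][label] += 1
    fine_map = {c: votes.most_common(1)[0][0] for c, votes in fine_votes.items()}
    coarse_map = {c: votes.most_common(1)[0][0] for c, votes in coarse_votes.items()}
    return fine_map, coarse_map


def label_pixel(cluster, parent_of, fine_map, coarse_map):
    """Use the fine cluster's label if known, else fall back to its parent's."""
    if cluster in fine_map:
        return fine_map[cluster]
    return coarse_map.get(parent_of[cluster], "unknown")


# Hypothetical tree: coarse cluster 1 has children 10 and 11; 2 has 20 and 21.
parent_of = {10: 1, 11: 1, 20: 2, 21: 2}
# Hypothetical matchups between clustered pixels and in-situ severity labels.
matchups = [(10, "high"), (10, "high"), (10, "low"), (20, "low"), (20, "low")]

fine_map, coarse_map = assign_context(matchups, parent_of)
print(label_pixel(10, parent_of, fine_map, coarse_map))  # labeled from its own samples
print(label_pixel(11, parent_of, fine_map, coarse_map))  # no samples, inherits from parent 1
```

The fallback shows why a hierarchy helps when labels are scarce: a subclass that was never sampled in the field can still inherit a coarser meaning from its parent.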
Demonstrated Success and Future Potential
The framework was tested in the Gulf of Mexico and Southern California, using data from 2018-2019 and more recent PACE data from 2024-2025. Results showed strong agreement between SIT-FUSE’s predictions and in-situ measurements for total phytoplankton, as well as specific harmful species like Karenia brevis, Alexandrium spp., and Pseudo-nitzschia spp. Qualitative comparisons with existing operational tools like the California Harmful Algae Risk Mapping (C-HARM) system and chlorophyll-a products also indicated significant alignment.
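One simple way to quantify agreement of this kind is to bin the in-situ cell counts into severity classes and count how often the predicted class matches. The sketch below does that. The bin edges, class names, and sample values are hypothetical placeholders and are not taken from the paper, which may use different classes and metrics.

```python
from bisect import bisect_right

# Hypothetical bin edges in cells per litre; NOT taken from the paper.
EDGES = [1_000, 10_000, 100_000]
NAMES = ["background", "low", "medium", "high"]


def severity_class(cells_per_litre):
    """Return the severity class whose bin contains the in-situ cell count."""
    return NAMES[bisect_right(EDGES, cells_per_litre)]


def agreement(predicted, in_situ_counts):
    """Fraction of matchups where the predicted class equals the in-situ class."""
    if len(predicted) != len(in_situ_counts):
        raise ValueError("predicted and in_situ_counts must have the same length")
    if not predicted:
        raise ValueError("at least one matchup is required")
    observed = [severity_class(count) for count in in_situ_counts]
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(predicted)


# Made-up matchups: predicted class per pixel and the co-located cell count.
predicted = ["background", "low", "high", "medium"]
counts = [500, 5_000, 250_000, 2_000]

score = agreement(predicted, counts)
print(score)  # 3 of the 4 matchups agree
```

A plain match rate treats every miss equally. Because severity classes are ordered, a metric that penalizes a two-class miss more than a one-class miss may be a better fit in practice.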
The authors highlight that SIT-FUSE’s flexibility means it’s not tied to a particular remote sensing instrument. It can identify and track geophysical objects across different datasets and can be applied to various environmental challenges beyond HABs, such as wildfire monitoring. Its ability to work effectively in data-scarce environments is particularly valuable.
Looking ahead, the research team plans to expand SIT-FUSE’s capabilities by incorporating per-pixel uncertainty measurements, extending its application to the entire U.S. coastline and inland water bodies, and integrating data from additional instruments like EMIT and GOES satellites. This ongoing work aims to create a denser, more dynamic, and highly specific monitoring system for HABs, ultimately enhancing our ability to protect public health and ecosystems from these harmful events.


