TLDR: SuryaBench is a new, high-resolution, machine learning-ready dataset derived from NASA’s Solar Dynamics Observatory (SDO). It provides preprocessed solar imagery (AIA and HMI) spanning a solar cycle, along with six benchmark application datasets for tasks like solar flare prediction, active region segmentation, and solar wind forecasting. The dataset aims to standardize data, enhance reproducibility, and accelerate AI model development for heliophysics and space weather.
The Sun, our closest star, constantly emits energy and particles that create what we know as space weather. This space weather can have significant impacts on Earth’s critical infrastructure, affecting everything from communication systems and navigation to power grids. Understanding and predicting these solar phenomena is therefore crucial for safeguarding our technological society. However, leveraging the vast amounts of solar observational data for machine learning applications has historically presented challenges due to the need for specialized preprocessing and homogenization.
Understanding the Sun’s Influence: Why SuryaBench Matters
A new benchmark dataset, SuryaBench, has been introduced to bridge the gap between solar physics, machine learning, and operational space weather forecasting. This high-resolution, machine learning-ready dataset is derived from NASA’s Solar Dynamics Observatory (SDO), a mission that continuously captures extensive, high-quality solar data. SuryaBench aims to standardize data collection, enhance reproducibility, and accelerate the development of AI-driven models for critical space weather prediction tasks.
The Core of SuryaBench: High-Resolution Solar Data
SuryaBench preserves the full 4096×4096 native spatial resolution of SDO observations at a consistent 12-minute temporal cadence. This makes it the largest curated and homogenized dataset of its kind to date, enabling high-fidelity analysis for data-driven heliophysics research. The dataset includes processed imagery from SDO’s Atmospheric Imaging Assembly (AIA) and Helioseismic and Magnetic Imager (HMI), spanning more than a full solar cycle, from May 2010 to July 2024.
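To get a feel for the scale these numbers imply, the short sketch below works out the pixels per frame, the frames per day at a 12-minute cadence, and the nominal number of timestamps over the stated date range. The exact start and end days (1 May 2010 and 1 August 2024) are assumptions made for illustration, since the article gives only months, and the real dataset will contain fewer timestamps once quality checks and data gaps are accounted for.

```python
from datetime import datetime, timedelta

CADENCE = timedelta(minutes=12)  # temporal cadence stated in the article
SIDE = 4096                      # native SDO image side length in pixels


def pixels_per_frame(side=SIDE):
    """Number of pixels in one square full-resolution frame."""
    return side * side


def frames_per_day(cadence=CADENCE):
    """Number of cadence steps that fit in 24 hours."""
    return timedelta(days=1) // cadence


def nominal_timestamps(start, end, cadence=CADENCE):
    """Count cadence steps in the half-open interval [start, end)."""
    return (end - start) // cadence


if __name__ == "__main__":
    # Assumed boundary dates; the article only says "May 2010 to July 2024".
    start = datetime(2010, 5, 1)
    end = datetime(2024, 8, 1)
    print("pixels per frame:", pixels_per_frame())
    print("frames per day:", frames_per_day())
    print("nominal timestamps:", nominal_timestamps(start, end))
```

This is an upper bound per channel, not a statement about the actual file count in SuryaBench.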
To ensure the data is suitable for machine learning tasks, it has undergone extensive preprocessing. This includes corrections for spacecraft roll angles, orbital adjustments, exposure normalization, and compensation for instrument degradation. For instance, AIA data, initially at Level 1, is promoted to Level 1.5, which involves aligning the solar north-south axis, scaling images to a common pixel size, centering the Sun, and normalizing exposure times. HMI data, though already Level 1.5, is re-projected to align spatially with AIA images. Both datasets are also corrected for the SDO’s elliptical orbit to maintain a fixed solar disk size. Furthermore, a rigorous temporal alignment process ensures that AIA and HMI data correspond accurately at a 12-minute cadence, with quality checks to exclude problematic data.
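The temporal alignment step can be pictured as matching each instrument's observation times to a shared 12-minute grid and dropping grid points where either instrument has no sufficiently close frame. The sketch below is only an illustration of that idea, not the SuryaBench pipeline: the function names, the nearest-neighbor matching rule, and the one-minute tolerance are all assumptions.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest(sorted_times, target):
    """Return the entry of sorted_times closest to target, or None if empty."""
    if not sorted_times:
        return None
    i = bisect_left(sorted_times, target)
    # Only the neighbors on either side of the insertion point can be closest.
    candidates = sorted_times[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda t: abs(t - target))


def align(aia_times, hmi_times, start, end,
          cadence=timedelta(minutes=12),
          tolerance=timedelta(minutes=1)):  # hypothetical tolerance
    """Pair AIA and HMI observation times on a common grid over [start, end).

    A grid point is kept only if both instruments have a frame within
    `tolerance` of it; other grid points are skipped.
    """
    aia = sorted(aia_times)
    hmi = sorted(hmi_times)
    pairs = []
    t = start
    while t < end:
        a = nearest(aia, t)
        h = nearest(hmi, t)
        if (a is not None and h is not None
                and abs(a - t) <= tolerance and abs(h - t) <= tolerance):
            pairs.append((t, a, h))
        t += cadence
    return pairs
```

Frames already rejected by quality checks would simply be left out of the input lists, so the grid points they would have served are skipped in the same way.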
Applications for Space Weather and Solar Physics
Beyond the core SDO dataset, SuryaBench provides auxiliary application benchmark datasets for six key tasks in heliophysics, each with rigorous evaluation protocols and baseline implementations of state-of-the-art machine learning architectures:
- Active Region Segmentation: This task involves identifying active regions (ARs) that contain polarity inversion lines (PILs), which are critical sites for energy storage and release and are strongly associated with solar eruptive activity. The dataset provides 2D binary masks of these regions.
- Active Region Emergence Forecasting: This focuses on predicting the continuum intensity of emerging active regions. It involves tracking areas of magnetic flux, Doppler velocity, and continuum intensity, then creating acoustic power maps and downsampling data into timelines for prediction.
- Coronal Field Extrapolation: This task models the 3D structure of the coronal magnetic field, which is essential for understanding how solar magnetism shapes the extended solar atmosphere. It uses data from ADAPT-WSA simulations, encoded in spherical harmonic coefficients.
- Solar Flare Prediction: Aims to forecast the occurrence of strong solar flares (M- or X-class) within a 24-hour window. Labels are based on both maximum flare intensity and cumulative flare intensity, addressing the challenge of class imbalance due to the scarcity of strong events.
- Solar Wind Speed Estimation: This involves predicting the solar wind speed at the L1 point in the Sun-Earth system. Accurate solar wind forecasting is vital for mitigating adverse effects on satellites and power grids.
- Solar EUV Spectra Prediction: Focuses on forecasting Extreme Ultraviolet (EUV) irradiance across a spectrum of 1343 channels. EUV irradiance plays a key role in shaping Earth’s ionospheric and thermospheric conditions, influencing satellite drag and communication systems.
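As an illustration of the flare prediction labels mentioned in the list above, the following sketch derives two binary labels for a 24-hour window: one from the strongest flare and one from the summed peak flux of all flares in the window. The GOES class-to-flux mapping is the standard one (M1.0 corresponds to 1e-5 W/m²), but the function names, the cumulative threshold, and the exact window convention are assumptions; the SuryaBench label definitions may differ.

```python
from datetime import datetime, timedelta

# Standard GOES soft X-ray class bases, in W/m^2.
GOES_BASE = {"A": 1e-8, "B": 1e-7, "C": 1e-6, "M": 1e-5, "X": 1e-4}


def peak_flux(goes_class):
    """Convert a GOES class string such as 'M1.5' to peak flux in W/m^2."""
    return GOES_BASE[goes_class[0].upper()] * float(goes_class[1:])


def label_window(flares, t0,
                 horizon=timedelta(hours=24),
                 max_threshold=1e-5,          # M1.0 or stronger
                 cumulative_threshold=1e-5):  # hypothetical threshold
    """Label the window (t0, t0 + horizon] from a list of (time, class) flares.

    Returns two binary labels: one from the maximum peak flux in the
    window and one from the sum of peak fluxes in the window.
    """
    fluxes = [peak_flux(c) for t, c in flares if t0 < t <= t0 + horizon]
    max_flux = max(fluxes, default=0.0)
    cumulative_flux = sum(fluxes)
    return {
        "max_label": int(max_flux >= max_threshold),
        "cumulative_label": int(cumulative_flux >= cumulative_threshold),
    }
```

A window containing only several mid-sized C-class flares can be positive under the cumulative label while negative under the maximum label, which is one way a second label can add positive examples to a heavily imbalanced task.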
The dataset also acknowledges important considerations for solar data, such as solar rotation, the 11-year solar cycle, the Earth’s ecliptic angle, and the limitation of observations to only the visible side of the Sun.
Looking Ahead
SuryaBench represents a significant step forward by providing a unified, standardized, and high-fidelity data resource for diverse heliophysics tasks. By integrating data from AIA and HMI with consistent preprocessing, it offers a clear view of solar activity and improves data quality and reproducibility. The curated benchmarks and baseline model results set clear performance baselines for the machine learning community and encourage the development of more advanced models. The dataset is publicly available on Hugging Face as a data collection called Suryabench, and the code for its preparation and baseline model training can be found on GitHub. For more detailed information, you can refer to the full research paper: SuryaBench: Benchmark Dataset for Advancing Machine Learning in Heliophysics and Space Weather Prediction.


