TLDR: S2ML (Spatio-Spectral Mutual Learning) is a new framework for depth completion that addresses incomplete depth images from sensors. It uniquely integrates information from both the spatial and frequency domains, treating amplitude and phase spectra differently to leverage their distinct properties. By combining these insights with a spatial fusion module that captures local and global features, S2ML significantly improves the accuracy and robustness of depth map reconstruction, outperforming previous state-of-the-art methods on major benchmarks and under various challenging conditions.
Depth sensing is crucial for many 3D applications, from autonomous driving to robot navigation. However, the raw depth images captured by depth sensors often have missing or invalid depth values. These gaps can be caused by factors like reflective surfaces or challenging lighting conditions, severely limiting the use of these images in downstream tasks. Existing methods attempt to fill these gaps, a task known as depth completion, but they often overlook the unique physical characteristics of raw depth images, especially how missing data affects their frequency patterns.
A new research paper introduces a novel approach called Spatio-Spectral Mutual Learning (S2ML) to tackle this problem. The S2ML framework aims to combine the strengths of both spatial (pixel-based) and frequency (pattern-based) domains for more accurate depth completion. The core idea is to understand that invalid depth areas change how frequencies are distributed in an image. For instance, sharp edges from missing data introduce high-frequency components, while the overall smoothness of a scene is lost, affecting low-frequency components.
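To make this concrete, here is a small illustrative experiment (not from the paper; all names and values are made up) showing how a region of invalid, zero-valued depth injects high-frequency energy into an otherwise smooth depth map's spectrum:

```python
import numpy as np

# A smooth synthetic "depth map": a gentle planar gradient.
h, w = 64, 64
yy, xx = np.mgrid[0:h, 0:w]
depth = 2.0 + 0.01 * xx + 0.005 * yy

# Simulate sensor dropout: a rectangular hole of invalid (zero) depth.
corrupted = depth.copy()
corrupted[20:40, 24:48] = 0.0

def high_freq_energy(img, cutoff=8):
    """Fraction of spectral energy outside a central low-frequency square."""
    spec = np.fft.fftshift(np.fft.fft2(img))
    amp2 = np.abs(spec) ** 2
    cy, cx = h // 2, w // 2
    low = amp2[cy - cutoff:cy + cutoff, cx - cutoff:cx + cutoff].sum()
    return 1.0 - low / amp2.sum()

# The sharp boundary of the hole shifts energy toward high frequencies.
print(high_freq_energy(depth), high_freq_energy(corrupted))
```

The corrupted map's high-frequency fraction is noticeably larger, which is exactly the degradation pattern S2ML sets out to exploit.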
The S2ML method recognizes that the amplitude and phase spectra, which are components of an image in the frequency domain, have distinct properties and degradation patterns. The amplitude spectrum relates to the energy distribution across different spatial frequencies, while the phase spectrum preserves structural and semantic information. The researchers devised a specialized spectral fusion module that handles these two components differently. For the amplitude spectrum, it rescales low-frequency parts to restore the overall shape and filters high-frequency components to reduce artifacts. For the phase spectrum, it integrates semantic information from the RGB image’s phase spectrum using a pixel-to-pixel fusion method, guided by an attention mechanism.
Beyond the frequency domain, the framework also includes a spatial fusion module. This module takes the fused spectral features (transformed back into the spatial domain) and combines them with the original depth spatial features. It uses a Swin-Convolution module, which is effective at capturing both local details (like object edges) and global contextual relationships across the depth map. This dual-domain approach allows for a comprehensive refinement of the depth map, addressing both fine-grained details and broader scene structures.
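As a rough stand-in for this dual local/global behavior (not the actual Swin-Convolution module, which uses windowed attention and learned convolutions), one can blend a local neighborhood average with a global scene statistic:

```python
import numpy as np

def local_global_fuse(feat, beta=0.2):
    """Illustrative fusion of a local branch (3x3 box filter, capturing
    neighborhood detail) with a global branch (scene-level mean)."""
    padded = np.pad(feat, 1, mode="edge")
    # 3x3 box filter built from shifted sums (local branch).
    local = sum(
        padded[i:i + feat.shape[0], j:j + feat.shape[1]]
        for i in range(3) for j in range(3)
    ) / 9.0
    global_ctx = feat.mean()  # global branch: one scalar of scene context
    return (1 - beta) * local + beta * global_ctx
```

In S2ML the local branch preserves fine detail such as object edges while the global branch enforces consistency across the whole depth map; the sketch only conveys that division of labor.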
The S2ML framework employs a gradual mutual representation and refinement process, where frequency and spatial domain fusions are conducted recursively. This iterative process progressively refines the depth features, leading to enhanced depth completion accuracy. The final high-dimensional features, which now contain integrated information from both depth and RGB modalities, are then used to predict the complete depth map.
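The alternation described above can be sketched as a simple loop, with the spectral and spatial steps treated as black boxes (the function and its signature are hypothetical; the paper's ablation found two fusion pairs to be the best trade-off):

```python
def refine(depth, guide, spectral_step, spatial_step, n_pairs=2):
    """Alternate frequency-domain and spatial-domain fusion for a fixed
    number of rounds, progressively refining the depth features."""
    feat = depth
    for _ in range(n_pairs):
        feat = spectral_step(feat, guide)  # frequency-domain fusion
        feat = spatial_step(feat, guide)   # spatial-domain fusion
    return feat
```

Each round lets errors corrected in one domain inform the next pass in the other, which is the "mutual learning" aspect of the framework.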
Extensive experiments were conducted on two widely-used datasets, NYU-Depth V2 and SUN RGB-D, which include diverse indoor scenes and data from various sensors. The results show that S2ML significantly outperforms existing state-of-the-art methods. For example, it surpassed the CFormer method by 0.828 dB and 0.834 dB on the NYU-Depth V2 and SUN RGB-D datasets, respectively. The method also demonstrated strong robustness against common RGB image degradations like occlusions, noise, and poor lighting conditions, and proved effective in outdoor environments as well.
The researchers also performed ablation studies to confirm the effectiveness of each component, particularly highlighting the importance of their distinct amplitude and phase fusion strategies. The choice of how many spatio-spectral fusion pairs to use was also analyzed, with two pairs found to offer the best balance between performance and complexity. This research establishes a new, strong baseline for depth completion, offering a robust and accurate solution for generating complete depth maps from incomplete sensor data. You can read the full paper for more details: S2ML: Spatio-Spectral Mutual Learning for Depth Completion.