
Distortion-Aware Video Inpainting for Immersive Omnidirectional Content

TLDR: DAOVI is a novel deep learning model for omnidirectional video inpainting that effectively removes unwanted objects while preserving spatial and temporal consistency. It addresses the unique geometric distortion of 360-degree videos through two key modules: Geodesic Flow-Consistent Image Propagation (GFCIP), which evaluates optical flow validity using geodesic distance, and Omnidirectional Depth-Assisted Feature Propagation (ODAFP), which propagates features using distortion-guided modulation, specialized convolutions, and depth maps. Experimental results demonstrate that DAOVI outperforms existing state-of-the-art methods in both quantitative and qualitative evaluations.

Omnidirectional videos, which capture a complete 360-degree view of surroundings, are becoming increasingly popular in applications like virtual reality (VR), augmented reality (AR), and remote sensing. While these videos offer an immersive experience, their wide field of view often leads to the capture of unwanted objects or regions. The process of removing these undesired elements and seamlessly filling the gaps is known as video inpainting.

However, a significant challenge arises because most existing video inpainting methods are designed for conventional, narrow field-of-view videos. They struggle to handle the unique geometric distortions inherent in omnidirectional videos, particularly those projected using the equirectangular projection (ERP) format. Applying these standard methods to 360-degree content often results in noticeable artifacts and visually unconvincing reconstructions, as they fail to account for the varying distortion across the spherical view.

To address this critical limitation, researchers Ryosuke Seshimo and Mariko Isogawa from Keio University, Japan, have introduced a novel deep learning model called Distortion-Aware Omnidirectional Video Inpainting (DAOVI). This innovative framework is specifically engineered to tackle the geometric distortion in omnidirectional videos, enabling the natural removal of objects while preserving both spatial and temporal consistency.

How DAOVI Works: Two Core Modules

DAOVI’s effectiveness stems from two primary modules, each designed to handle distortion in different aspects of the video inpainting process:

1. Geodesic Flow-Consistent Image Propagation (GFCIP): This module operates in the image space, focusing on propagating pixel values from adjacent frames. Traditional flow-based methods often use Euclidean distance to evaluate the reliability of optical flow (motion information). However, in omnidirectional videos, Euclidean distance in ERP pixel coordinates does not accurately represent true distances, especially near the poles where distortion is most severe. GFCIP overcomes this by evaluating flow validity using geodesic distance on a unit sphere. This ensures that only truly reliable motion vectors are used for initial pixel propagation, leading to more accurate and distortion-aware inpainting.
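To make the geodesic-distance idea concrete, here is a minimal NumPy sketch (not the authors' code; the ERP-to-sphere convention and function names are assumptions) that maps ERP pixel coordinates onto the unit sphere and measures great-circle distance:

```python
import numpy as np

def erp_to_sphere(u, v, width, height):
    """Map an ERP pixel coordinate to a unit-sphere direction vector.
    Longitude spans [-pi, pi) across the width; latitude runs from
    +pi/2 (top row) to -pi/2 (bottom row)."""
    lon = (u / width) * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v / height) * np.pi
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

def geodesic_distance(p, q, width, height):
    """Great-circle distance (radians) between two ERP pixel locations."""
    a = erp_to_sphere(*p, width, height)
    b = erp_to_sphere(*q, width, height)
    # Clip guards against floating-point drift outside arccos's domain.
    return np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))

W, H = 512, 256
# Two pixels half the image width apart, near the pole vs. on the equator:
d_pole = geodesic_distance((0, 1), (256, 1), W, H)
d_equator = geodesic_distance((0, 128), (256, 128), W, H)
# d_pole is tiny while d_equator is large, even though the Euclidean
# pixel distance in ERP coordinates is identical in both cases.
```

This illustrates exactly why Euclidean distance misjudges flow validity near the poles: pixels that look far apart in the ERP image can be nearly coincident on the sphere.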

2. Omnidirectional Depth-Assisted Feature Propagation (ODAFP): Working in the feature space (a more abstract representation of video content), ODAFP propagates information from adjacent frames using deformable convolutional networks (DCN). To specifically address ERP distortion, this module incorporates several key innovations. It utilizes convolutions and padding schemes, such as circular padding, that are tailored for 360-degree images, maintaining continuity across the video’s edges and poles. Furthermore, ODAFP employs a distortion map, which quantifies the amount of ERP distortion at each pixel, to weight the DCN offsets and modulation masks. This allows the propagation to adapt dynamically to the spatially varying distortion. Crucially, ODAFP also integrates depth maps as an additional input. This depth guidance provides a more stable and reliable source of information compared to relying solely on optical flow, which can be prone to errors in masked or highly dynamic regions.
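Two of these ingredients can be sketched in a few lines. The following is a hypothetical illustration, not the paper's implementation: a cos-latitude map is one common way to quantify ERP distortion per pixel, and wrap-around padding preserves continuity across the left/right seam of the panorama.

```python
import numpy as np

def erp_distortion_map(height, width):
    """Per-pixel ERP distortion weight, cos(latitude): ~1 at the equator,
    approaching 0 at the poles where ERP stretches content the most."""
    v = np.arange(height) + 0.5                  # pixel-center rows
    lat = np.pi / 2.0 - (v / height) * np.pi
    return np.repeat(np.cos(lat)[:, None], width, axis=1)

def circular_pad_lr(feat, pad):
    """Wrap-around (circular) padding along the horizontal axis, so a
    convolution sees the 360-degree continuity across the ERP seam."""
    return np.concatenate([feat[:, -pad:], feat, feat[:, :pad]], axis=1)

# A map like this could modulate DCN offsets/masks by local distortion:
dmap = erp_distortion_map(256, 512)
feat = np.random.rand(256, 512)
padded = circular_pad_lr(feat, pad=1)            # shape (256, 514)
```

The key design point is that both operations are cheap, deterministic functions of pixel position alone, so they can guide learned propagation without adding trainable parameters.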


Performance and Impact

The DAOVI model was rigorously evaluated on the ODV360 omnidirectional video dataset and compared against several state-of-the-art video inpainting methods, including FuseFormer, STTN, and ProPainter. The experimental results demonstrated that DAOVI consistently outperformed these existing methods across all quantitative metrics, including PSNR, SSIM, and specialized omnidirectional metrics like WS-PSNR and WS-SSIM, as well as perceptual quality (VFID).
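WS-PSNR differs from plain PSNR by weighting each pixel's squared error by cos(latitude), compensating for the oversampling of polar regions in ERP. A minimal sketch follows (the function name is assumed, and this implements the standard WS-PSNR weighting rather than code from the paper):

```python
import numpy as np

def ws_psnr(ref, dist, max_val=255.0):
    """Weighted-to-spherically-uniform PSNR for ERP images: squared error
    is weighted by cos(latitude), so oversampled pole rows count less."""
    h, w = ref.shape[:2]
    v = np.arange(h) + 0.5
    row_w = np.sin(np.pi * v / h)                # equals cos(latitude)
    wmap = np.broadcast_to(row_w[:, None], (h, w))
    err = (ref.astype(np.float64) - dist.astype(np.float64)) ** 2
    if err.ndim == 3:
        err = err.mean(axis=2)                   # average over channels
    wmse = (wmap * err).sum() / wmap.sum()
    return 10.0 * np.log10(max_val ** 2 / wmse)
```

Because the weights vanish toward the poles, an inpainting error near a pole, which occupies little actual spherical area, is penalized far less than the same error at the equator.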

Qualitative comparisons further highlighted DAOVI’s superiority, producing visually plausible results with improved structural consistency and significantly fewer artifacts compared to methods not designed for omnidirectional content. This indicates that by explicitly accounting for geometric distortion, DAOVI can generate much more realistic and seamless video completions.

In conclusion, DAOVI represents a significant advancement in the field of video inpainting for immersive media. By explicitly addressing the unique challenges posed by omnidirectional video distortion, it offers a robust and effective solution for content creators and researchers alike. For more technical details, refer to the full research paper.

Karthik Mehta
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
