
Advancing Video Compression with EHVC: A New Approach to Neural Video Codecs

TLDR: EHVC is a novel neural video codec that significantly improves compression efficiency and stability by introducing an efficient hierarchical reference and quality structure. It addresses limitations in existing neural video codecs through a hierarchical multi-reference scheme, a lookahead strategy, and a layer-wise quantization scale with random quality training. Experimental results show EHVC outperforms state-of-the-art traditional and neural video codecs in rate-distortion performance and robustness.

A new research paper introduces EHVC, an Efficient Hierarchical Neural Video Codec, marking a significant advancement in video compression technology. The work addresses critical challenges in existing neural video codecs (NVCs) by focusing on the fundamental design of how video frames are referenced and how their quality is managed.

Traditional video codecs have benefited from decades of optimization in their hierarchical structures, which dictate how frames relate to each other in terms of prediction and quality. However, neural video codecs, despite their powerful end-to-end learning capabilities, have often relied on implicit hierarchical structures. These implicit structures can be inefficient, especially when dealing with long video sequences, leading to issues like error propagation and inconsistent quality over time.

The team behind EHVC, including Junqi Liao, Yaojun Wu, Chaoyi Lin, Zhipin Deng, Li Li, Dong Liu, and Xiaoyan Sun, proposes three key innovations to overcome these limitations:

Hierarchical Multi-Reference Scheme

One of the core problems identified is the “reference-quality mismatch” in NVCs. In traditional codecs, high-quality frames are strategically used as references for subsequent frames because they provide more reliable information. EHVC adopts this principle by designing an explicit hierarchical reference structure that aligns with the quality structure. It defines “key frames” as those with high reconstruction quality. Other frames can then reference not only their immediate preceding frame but also a previous high-quality key frame. This dual-reference approach creates a more stable and robust reference structure. For instance, if an adjacent frame is of low quality or corrupted, the key frame reference can still provide crucial, high-quality information, preventing widespread quality degradation. Experiments demonstrate that while other NVCs suffer prolonged quality drops after a corrupted frame, EHVC quickly recovers, showcasing its enhanced stability.
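The dual-reference idea can be sketched as a simple reference-selection rule. This is an illustrative simplification, not the paper's implementation: `key_interval` is a hypothetical parameter for the spacing of high-quality key frames, and frame 0 is assumed to be the intra frame.

```python
def reference_frames(t, key_interval=4):
    """Return reference indices for frame t under a dual-reference scheme:
    the immediately preceding frame plus the most recent key frame.
    Key frames are assumed to sit at multiples of key_interval."""
    if t == 0:
        return []  # intra frame: no references
    prev = t - 1
    # Latest key frame at or before t-1.
    key = (t - 1) // key_interval * key_interval
    return [prev] if prev == key else [prev, key]
```

For example, with `key_interval=4`, frame 6 references both frame 5 (its neighbor) and frame 4 (the nearest key frame), so a corrupted frame 5 does not cut frame 6 off from high-quality information.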

Lookahead Strategy for Enhanced Quality

To further refine the quality structure, EHVC introduces a novel lookahead strategy. This allows the encoder to incorporate context from a future frame – specifically, the very next frame – into its decision-making process. By having this forward-looking information, the network can learn a more efficient and predictive quality structure. This is achieved while maintaining a low-delay setting, meaning the codec doesn’t need to wait for many future frames, making it practical for real-world applications.
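A one-frame lookahead can be illustrated with a toy encoder-side context. Frames here are plain lists of numbers and the "context" is a trivial average; a real codec would use tensors and a learned context network, so every name below is an assumption for illustration only.

```python
def encode_with_lookahead(frames, t):
    """Sketch of a one-frame lookahead: the encoder's context for frame t
    blends the current frame with the next frame when one exists, so the
    codec only ever waits for a single future frame (low-delay)."""
    cur = frames[t]
    # Fall back to the current frame at the sequence end (no lookahead).
    nxt = frames[t + 1] if t + 1 < len(frames) else cur
    # Toy "context": element-wise average of current and next frame.
    return [(a + b) / 2 for a, b in zip(cur, nxt)]
```

The key property is the single-frame delay: the encoder never needs more than the very next frame, which is what keeps the scheme practical for low-latency applications.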


Layer-wise Quantization Scale with Random Quality Training

Drawing inspiration from traditional video codecs, which use both hierarchical Lagrange multipliers and quantization parameters to control quality, EHVC introduces learnable layer-wise quantization scales. These scales are applied differently to frames based on their position within the hierarchical structure. To ensure the quality structure remains stable and adaptable during inference, a random quality training strategy is also implemented. This involves randomly scaling the quantization scale of the first key frame during training, which helps the model become more resilient to fluctuations in quality and generalize better to diverse video content.
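The combination of layer-wise scales and random quality training can be sketched as follows. The two-layer assignment, the scale values, and the jitter range are all illustrative assumptions; in EHVC the scales are learnable parameters trained end-to-end.

```python
import random

def layer_of(t, key_interval=4):
    """Map a frame index to a hierarchy layer: 0 for key frames, 1 for
    all other frames (a two-layer simplification of the hierarchy)."""
    return 0 if t % key_interval == 0 else 1

def quant_scale(t, scales, training=False, rng=random):
    """Pick the layer-wise quantization scale for frame t. During training,
    the first key frame's scale is randomly perturbed, echoing the random
    quality training strategy; the [0.5, 2.0] range is illustrative."""
    s = scales[layer_of(t)]
    if training and t == 0:
        s *= rng.uniform(0.5, 2.0)  # random quality perturbation
    return s
```

At inference the perturbation is disabled, so key frames consistently receive the smaller scale (higher quality) while intermediate frames are quantized more coarsely.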

The performance of EHVC is impressive. In extensive experiments across various video datasets, EHVC consistently outperforms both state-of-the-art traditional video codecs, such as VTM-23.4 (the reference software for H.266/VVC), and leading neural video codecs like DCVC-FM. For example, under a common intra-period 32 configuration, EHVC achieves an average of 17.14% bitrate saving over VTM-23.4. More notably, it saves 10.98% more bitrate than DCVC-FM when compared against VTM-23.4. In a more challenging scenario with intra-period -1 (where only the very first frame is intra-coded), EHVC still delivers an average of 17.04% bitrate saving over VTM-23.4 and an even greater 12.88% more bitrate saving than DCVC-FM.

Beyond raw compression numbers, EHVC also demonstrates superior stability in its quality and reference structures. It exhibits significantly less quality degradation within context generation refresh cycles compared to previous NVCs. While EHVC does have a slightly higher computational complexity than some prior neural codecs, the substantial improvements in compression efficiency and robustness make this a worthwhile trade-off.

This research highlights the importance of explicit hierarchical design in neural video coding, proving that by carefully integrating principles from traditional codecs with the power of deep learning, it’s possible to achieve unprecedented levels of video compression performance and stability.

Karthik Mehta