
Advancing Video Compression with EHVC: A New Approach to Neural Video Codecs

TLDR: EHVC is a novel neural video codec that significantly improves compression efficiency and stability by introducing an efficient hierarchical reference and quality structure. It addresses limitations in existing neural video codecs through a hierarchical multi-reference scheme, a lookahead strategy, and a layer-wise quantization scale with random quality training. Experimental results show EHVC outperforms state-of-the-art traditional and neural video codecs in rate-distortion performance and robustness.

A new research paper introduces EHVC, an Efficient Hierarchical Neural Video Codec, marking a significant advancement in video compression technology. The work addresses critical challenges in existing neural video codecs (NVCs) by focusing on the fundamental design of how video frames are referenced and how their quality is managed.

Traditional video codecs have benefited from decades of optimization in their hierarchical structures, which dictate how frames relate to each other in terms of prediction and quality. However, neural video codecs, despite their powerful end-to-end learning capabilities, have often relied on implicit hierarchical structures. These implicit structures can be inefficient, especially when dealing with long video sequences, leading to issues like error propagation and inconsistent quality over time.

The team behind EHVC, including Junqi Liao, Yaojun Wu, Chaoyi Lin, Zhipin Deng, Li Li, Dong Liu, and Xiaoyan Sun, proposes three key innovations to overcome these limitations:

Hierarchical Multi-Reference Scheme

One of the core problems identified is the “reference-quality mismatch” in NVCs. In traditional codecs, high-quality frames are strategically used as references for subsequent frames because they provide more reliable information. EHVC adopts this principle by designing an explicit hierarchical reference structure that aligns with the quality structure. It defines “key frames” as those with high reconstruction quality. Other frames can then reference not only their immediate preceding frame but also a previous high-quality key frame. This dual-reference approach creates a more stable and robust reference structure. For instance, if an adjacent frame is of low quality or corrupted, the key frame reference can still provide crucial, high-quality information, preventing widespread quality degradation. Experiments demonstrate that while other NVCs suffer prolonged quality drops after a corrupted frame, EHVC quickly recovers, showcasing its enhanced stability.
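The dual-reference idea can be sketched as a simple reference-selection rule. This is an illustrative simplification, not the paper's implementation: `key_interval` is a hypothetical parameter for the spacing of high-quality key frames, and frame 0 is assumed to be the intra frame.

```python
def reference_frames(t, key_interval=4):
    """Return reference indices for frame t under a dual-reference scheme:
    the immediately preceding frame plus the most recent key frame.
    Key frames are assumed to sit at multiples of key_interval."""
    if t == 0:
        return []  # intra frame: no references
    prev = t - 1
    # Latest key frame at or before t-1.
    key = (t - 1) // key_interval * key_interval
    return [prev] if prev == key else [prev, key]
```

For example, with `key_interval=4`, frame 6 references both frame 5 (its neighbor) and frame 4 (the nearest key frame), so a corrupted frame 5 does not cut frame 6 off from high-quality information.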

Lookahead Strategy for Enhanced Quality

To further refine the quality structure, EHVC introduces a novel lookahead strategy. This allows the encoder to incorporate context from a future frame – specifically, the very next frame – into its decision-making process. By having this forward-looking information, the network can learn a more efficient and predictive quality structure. This is achieved while maintaining a low-delay setting, meaning the codec doesn’t need to wait for many future frames, making it practical for real-world applications.
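A one-frame lookahead can be illustrated with a toy encoder-side context. Frames here are plain lists of numbers and the "context" is a trivial average; a real codec would use tensors and a learned context network, so every name below is an assumption for illustration only.

```python
def encode_with_lookahead(frames, t):
    """Sketch of a one-frame lookahead: the encoder's context for frame t
    blends the current frame with the next frame when one exists, so the
    codec only ever waits for a single future frame (low-delay)."""
    cur = frames[t]
    # Fall back to the current frame at the sequence end (no lookahead).
    nxt = frames[t + 1] if t + 1 < len(frames) else cur
    # Toy "context": element-wise average of current and next frame.
    return [(a + b) / 2 for a, b in zip(cur, nxt)]
```

The key property is the single-frame delay: the encoder never needs more than the very next frame, which is what keeps the scheme practical for low-latency applications.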


Layer-wise Quantization Scale with Random Quality Training

Drawing inspiration from traditional video codecs, which use both hierarchical Lagrange multipliers and quantization parameters to control quality, EHVC introduces learnable layer-wise quantization scales. These scales are applied differently to frames based on their position within the hierarchical structure. To ensure the quality structure remains stable and adaptable during inference, a random quality training strategy is also implemented. This involves randomly scaling the quantization scale of the first key frame during training, which helps the model become more resilient to fluctuations in quality and generalize better to diverse video content.
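The combination of layer-wise scales and random quality training can be sketched as follows. The two-layer assignment, the scale values, and the jitter range are all illustrative assumptions; in EHVC the scales are learnable parameters trained end-to-end.

```python
import random

def layer_of(t, key_interval=4):
    """Map a frame index to a hierarchy layer: 0 for key frames, 1 for
    all other frames (a two-layer simplification of the hierarchy)."""
    return 0 if t % key_interval == 0 else 1

def quant_scale(t, scales, training=False, rng=random):
    """Pick the layer-wise quantization scale for frame t. During training,
    the first key frame's scale is randomly perturbed, echoing the random
    quality training strategy; the [0.5, 2.0] range is illustrative."""
    s = scales[layer_of(t)]
    if training and t == 0:
        s *= rng.uniform(0.5, 2.0)  # random quality perturbation
    return s
```

At inference the perturbation is disabled, so key frames consistently receive the smaller scale (higher quality) while intermediate frames are quantized more coarsely.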

The performance of EHVC is impressive. In extensive experiments across various video datasets, EHVC consistently outperforms both state-of-the-art traditional video codecs, such as VTM-23.4 (the reference software for H.266/VVC), and leading neural video codecs like DCVC-FM. For example, under a common intra-period 32 configuration, EHVC achieves an average of 17.14% bitrate saving over VTM-23.4. More notably, it saves 10.98% more bitrate than DCVC-FM when compared against VTM-23.4. In a more challenging scenario with intra-period -1 (where only the very first frame is intra-coded), EHVC still delivers an average of 17.04% bitrate saving over VTM-23.4 and an even greater 12.88% more bitrate saving than DCVC-FM.

Beyond raw compression numbers, EHVC also demonstrates superior stability in its quality and reference structures. It exhibits significantly less quality degradation within context generation refresh cycles compared to previous NVCs. While EHVC does have a slightly higher computational complexity than some prior neural codecs, the substantial improvements in compression efficiency and robustness make this a worthwhile trade-off.

This research highlights the importance of explicit hierarchical design in neural video coding, proving that by carefully integrating principles from traditional codecs with the power of deep learning, it’s possible to achieve unprecedented levels of video compression performance and stability.

Karthik Mehta