TLDR: LeMiCa is a training-free framework that accelerates diffusion-based video generation while preserving visual quality. Where earlier caching methods minimize local, step-to-step errors, LeMiCa uses a global, outcome-aware error formulation and a Lexicographic Minimax Path Optimization strategy on a directed acyclic graph. This explicitly bounds the worst-case error, which improves the consistency of global content and style across generated frames. The authors report up to a 2.9x speedup with better quality than prior caching techniques.
AI video generation has improved rapidly, especially with the rise of diffusion models. These models can produce high-quality visuals, but they come with a significant drawback: they are computationally expensive, needing large amounts of memory, processing power, and time to generate even short videos. This makes them hard to use where speed is crucial, such as in interactive tools.
To tackle this challenge, researchers have explored various methods to speed up these models. Some approaches involve redesigning the model’s architecture or retraining it on vast datasets, but these can be costly and complex. A more appealing alternative is using ‘caching mechanisms,’ which essentially involve reusing parts of the model’s work from previous steps to avoid redundant calculations. This method doesn’t require retraining the model, making it a more efficient solution.
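To make the caching idea concrete, here is a minimal sketch on a toy scalar "denoiser". It is not LeMiCa's code or any real sampler: the names run_with_cache, toy_model, and compute_steps are hypothetical, and the update rule x - 0.1 * output is only a stand-in for a real denoising step. The expensive model is called on the steps listed in compute_steps, and its last output is reused everywhere else.

```python
from typing import Callable, Set, Tuple


def toy_model(x: float, t: int) -> float:
    """Stand-in for an expensive denoiser call (hypothetical)."""
    return x


def run_with_cache(
    model: Callable[[float, int], float],
    x0: float,
    num_steps: int,
    compute_steps: Set[int],
) -> Tuple[float, int]:
    """Toy denoising loop that calls `model` only on steps in `compute_steps`.

    On every other step the most recent model output is reused.
    Returns the final state and the number of model calls made.
    """
    x = x0
    cached = None
    calls = 0
    for t in range(num_steps):
        # Always compute when nothing is cached yet, otherwise only on request.
        if cached is None or t in compute_steps:
            cached = model(x, t)
            calls += 1
        x = x - 0.1 * cached  # toy update rule, an assumption of this sketch
    return x, calls


full_x, full_calls = run_with_cache(toy_model, 1.0, 10, set(range(10)))
fast_x, fast_calls = run_with_cache(toy_model, 1.0, 10, {0, 5})
print(full_calls, fast_calls)   # number of model calls in each run
print(abs(full_x - fast_x))     # drift in the final output caused by reuse
```

The cached run makes far fewer model calls, but its final output drifts from the fully computed one. Deciding which steps to compute so that this drift stays small is the scheduling problem the rest of the article is about.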
However, existing caching strategies aren’t perfect. They often focus on minimizing small, local errors between consecutive steps in the video generation process. While this sounds logical, it overlooks how these small errors can accumulate over time, leading to noticeable degradation in the overall video quality and consistency. Imagine building a long chain: if each link has a tiny flaw, the entire chain might eventually break. This ‘local greedy’ approach can result in videos that deviate from the original quality or lose fine details.
Introducing LeMiCa: A Smarter Way to Cache
A new framework called LeMiCa, short for Lexicographic Minimax Path Caching, offers a fresh perspective on this problem. Developed by researchers from the Data Science & Artificial Intelligence Research Institute and Unicom Data Intelligence, LeMiCa is a training-free acceleration framework designed specifically for diffusion-based video generation. Instead of focusing on local errors, LeMiCa takes a ‘global outcome-aware’ approach.
LeMiCa rethinks cache scheduling as a global path planning problem. It constructs a ‘directed acyclic graph’ (DAG) in which each possible caching decision is an edge, weighted by its estimated impact on the final video quality. The graph is built offline from a set of prompts and their full video generation trajectories, so the weights reflect the long-term effect of caching at each point rather than just the step-to-step difference. In particular, the weights capture the observation that errors introduced early in generation can have a much larger impact than errors introduced later.
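The sketch below shows one way such a graph could be built for a toy scalar model. It rests on several assumptions that go beyond the description above: an edge (i, j) is taken to mean "compute at step i, reuse that output for steps i+1 to j-1, then compute again at j"; the edge weight is the absolute difference in the final output relative to a run with no caching; and rollout, build_graph, toy_model, and max_skip are hypothetical names. A real pipeline would average a perceptual or latent-space error over many prompts.

```python
from typing import Callable, Dict, Optional, Tuple

Edge = Tuple[int, int]


def toy_model(x: float, t: int) -> float:
    """Stand-in for an expensive denoiser call (hypothetical)."""
    return x


def rollout(
    model: Callable[[float, int], float],
    x0: float,
    num_steps: int,
    reuse_span: Optional[Edge] = None,
) -> float:
    """Run the toy loop to the end.

    If reuse_span=(i, j), the output computed at step i is reused for
    steps i+1 .. j-1; every other step calls the model as usual.
    """
    x, cached = x0, None
    for t in range(num_steps):
        reuse = reuse_span is not None and reuse_span[0] < t < reuse_span[1]
        if not reuse:
            cached = model(x, t)
        x = x - 0.1 * cached  # toy update rule, an assumption of this sketch
    return x


def build_graph(
    model: Callable[[float, int], float],
    x0: float,
    num_steps: int,
    max_skip: int,
) -> Dict[Edge, float]:
    """Weight each edge (i, j) by its effect on the FINAL output.

    Nodes are 0 .. num_steps, with num_steps acting as the end node.
    At most `max_skip` consecutive steps may reuse a cached output.
    """
    reference = rollout(model, x0, num_steps)  # no caching at all
    edges: Dict[Edge, float] = {}
    for i in range(num_steps):
        last_j = min(i + max_skip + 1, num_steps)
        for j in range(i + 1, last_j + 1):
            out = rollout(model, x0, num_steps, (i, j))
            edges[(i, j)] = abs(out - reference)
    return edges


graph = build_graph(toy_model, 1.0, num_steps=10, max_skip=2)
print(len(graph))
print(graph[(0, 1)], graph[(0, 2)], graph[(0, 3)])
```

Edges between adjacent steps involve no reuse and get weight zero, while longer reuse spans from the same starting step get larger weights. The key point is that each weight is measured on the final output, not on the step where the reuse happens.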
The core of LeMiCa’s innovation lies in its ‘Lexicographic Minimax Path Optimization’ strategy. Instead of simply minimizing the total error (which can still allow a few very large errors), this strategy explicitly bounds the worst-case error along the chosen path. Among all candidate paths, it picks the one whose largest edge error is smallest. If several paths tie on that maximum, it compares their second-largest errors, and so on. This keeps global content and style consistent across the generated video and prevents the significant degradation that unstable local caching decisions can cause.
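The selection rule can be illustrated with a small dynamic program. This is a sketch under stated assumptions, not the authors' implementation: the function name lexi_minimax_path is hypothetical, the edge weights are made up, and fixing the number of edges on the path is used here as a stand-in for a compute budget. Each path is scored by its edge errors sorted from largest to smallest, and Python's tuple comparison then gives the lexicographic order directly.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]


def lexi_minimax_path(
    edges: Dict[Edge, float],
    source: int,
    target: int,
    num_edges: int,
) -> Optional[List[int]]:
    """Return the path source -> target with exactly `num_edges` edges
    whose descending-sorted error vector is lexicographically smallest,
    or None if no such path exists."""
    # best[(node, k)] = (errors sorted descending, path) over k-edge paths
    best: Dict[Tuple[int, int], Tuple[Tuple[float, ...], List[int]]] = {
        (source, 0): ((), [source])
    }
    for k in range(num_edges):
        for (u, v), w in sorted(edges.items()):
            state = best.get((u, k))
            if state is None:
                continue  # u is not reachable with exactly k edges
            errs, path = state
            new_errs = tuple(sorted(errs + (w,), reverse=True))
            current = best.get((v, k + 1))
            # Tuple comparison: largest error first, then the next largest, ...
            if current is None or new_errs < current[0]:
                best[(v, k + 1)] = (new_errs, path + [v])
    result = best.get((target, num_edges))
    return None if result is None else result[1]


# Made-up weights: 0->1->3 has the smaller total error (0.6 vs 0.7),
# but 0->2->3 has the smaller worst-case error (0.4 vs 0.5).
toy_edges = {(0, 1): 0.1, (1, 3): 0.5, (0, 2): 0.3, (2, 3): 0.4}
print(lexi_minimax_path(toy_edges, source=0, target=3, num_edges=2))
```

In this example a sum-minimizing planner would choose 0, 1, 3, accepting one large error of 0.5, while the lexicographic minimax rule chooses 0, 2, 3 because its worst edge is smaller. Keeping only the best error vector per (node, edge count) state is safe because adding the same edge to two paths does not change which of their sorted error vectors comes first.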
Impressive Performance and Versatility
Extensive experiments on popular text-to-video models, including Open-Sora, Latte, and CogVideoX, demonstrate LeMiCa’s performance. It improves both inference speed and generation quality. For instance, LeMiCa achieved a 2.9 times speedup on the Latte model and an LPIPS score of 0.05 on Open-Sora (LPIPS is a perceptual distance, so lower is better), outperforming previous caching techniques such as TeaCache.
The framework offers two variants: LeMiCa-slow, which prioritizes visual fidelity, and LeMiCa-fast, which maximizes inference speed. Both variants are reported to consistently outperform existing methods, indicating that LeMiCa can balance quality and speed effectively. Importantly, these gains come with minimal perceived quality degradation, making LeMiCa a robust and adaptable option for accelerating video generation.
LeMiCa also shows strong generalization capabilities, performing well even on out-of-distribution datasets and across different denoising trajectories. Its offline graph construction process is highly efficient, incurring negligible overhead while yielding substantial acceleration during actual video generation.
This innovative approach provides a strong foundation for future research in efficient and reliable video synthesis, and its principles could potentially extend to other generative modeling domains like 3D, multi-view, or multi-modal generation. The code for LeMiCa is publicly available for researchers and developers to explore and build upon. You can find the full research paper here.


