TLDR: Microsoft Research Asia has introduced NUWA-XL, a multimodal generative AI model that can produce a 16-minute video from just 11 descriptive sentences. It uses a ‘diffusion over diffusion’ architecture to keep generation efficient and the resulting video continuous, a notable advance for AI-assisted animation and video production.
Microsoft Research Asia has announced NUWA-XL, a multimodal generative AI model. It can generate long-form video, up to 16 minutes in length, from just 11 descriptive sentences. The result is a substantial step forward for AI-powered content creation, particularly for applications such as animation production.
NUWA-XL is built on a ‘diffusion over diffusion’ architecture that works from coarse to fine. A global diffusion model first generates key frames spanning the entire length of the video. A local diffusion model then fills in the content between adjacent key frames. Splitting the work this way speeds up generation while preserving the continuity and integrity of the finished video.
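The coarse-to-fine idea described above can be illustrated with a toy sketch. This is not NUWA-XL's code: the function names (`global_keyframes`, `local_fill`, `generate`) are hypothetical, frames are plain numbers rather than images, and simple averaging stands in for both diffusion models. It only shows the control flow of placing key frames across the whole timeline first and then repeatedly filling the gaps between neighbouring frames.

```python
from typing import Dict, List

Frame = float  # toy stand-in for an image; a real model would work on tensors


def global_keyframes(prompts: List[str], total_frames: int) -> Dict[int, Frame]:
    """Hypothetical stand-in for the global model.

    Places one key frame per prompt, spread evenly from the first frame
    to the last. The frame "content" is just the prompt's index.
    """
    n = len(prompts)
    if n < 2 or total_frames < n:
        raise ValueError("need at least 2 prompts and total_frames >= len(prompts)")
    # Integer arithmetic keeps the key-frame positions distinct and ordered.
    return {i * (total_frames - 1) // (n - 1): float(i) for i in range(n)}


def local_fill(left: Frame, right: Frame) -> Frame:
    """Hypothetical stand-in for the local model: make the frame between two known frames."""
    return (left + right) / 2.0


def generate(prompts: List[str], total_frames: int) -> List[Frame]:
    """Coarse pass (key frames), then fine passes (fill every remaining gap)."""
    frames = global_keyframes(prompts, total_frames)
    positions = sorted(frames)
    pending = list(zip(positions, positions[1:]))  # gaps between adjacent key frames
    while pending:
        lo, hi = pending.pop()
        if hi - lo <= 1:
            continue  # no missing frame between these two
        mid = (lo + hi) // 2
        frames[mid] = local_fill(frames[lo], frames[hi])
        # Each half is now a smaller gap bounded by two known frames.
        pending.append((lo, mid))
        pending.append((mid, hi))
    return [frames[i] for i in range(total_frames)]


if __name__ == "__main__":
    print(generate(["scene one", "scene two", "scene three"], 9))
```

In this sketch each gap depends only on the two frames that bound it, so separate gaps could be filled independently of one another. That independence is one plausible reason a two-level design can be faster than generating every frame strictly in sequence.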
NUWA-XL follows Microsoft Research Asia’s earlier work in multimodal AI. The original NUWA model, introduced in 2021, demonstrated the ability to generate images and video from natural language descriptions. The subsequent NUWA-Infinity raised the resolution of generated images and videos. NUWA-XL builds on both, extending the length and coherence of video that can be generated from text.
Industry observers expect NUWA-XL to have its greatest impact on animation and other video production, where it could streamline and speed up content creation.


