
Genmo AI Unveils Mochi 1: A New Era for Open-Source AI Video Generation with Advanced Fidelity and Control

TLDR: Genmo AI has launched Mochi 1, an open-source, 10-billion-parameter diffusion model built on its novel Asymmetric Diffusion Transformer (AsymmDiT) architecture. This update sets a new standard for AI-powered video creation, offering unparalleled prompt adherence, high-fidelity motion at 30 frames per second for up to 5.4 seconds, and realistic physics simulation. Mochi 1 is released under the Apache 2.0 license, fostering community collaboration, and Genmo AI plans future enhancements including Mochi 1 HD for 720p resolution and advanced motion controls.

Genmo AI, a leading frontier AI laboratory, has announced the release of Mochi 1, a groundbreaking open-source video generation model designed to significantly enhance user creativity and control in AI-powered video production. This update introduces a 10-billion-parameter diffusion model, making it the largest video generative model ever openly released, and is built upon Genmo’s innovative Asymmetric Diffusion Transformer (AsymmDiT) architecture. The company emphasizes its commitment to open-source innovation, releasing Mochi 1 under the Apache 2.0 license to encourage broad adoption, research, and community collaboration.

Mochi 1 distinguishes itself with several key advancements. It boasts ‘superior prompt adherence,’ demonstrating exceptional alignment with textual prompts. This gives users precise control over generated video content, including characters, settings, and actions, ensuring outputs accurately reflect detailed instructions. In benchmark evaluations scored with Gemini-1.5-Pro-002, the model showed strong performance in translating complex textual descriptions into coherent visual narratives. Furthermore, the model delivers ‘unmatched motion quality,’ generating smooth and realistic videos at 30 frames per second for durations up to 5.4 seconds. It achieves high temporal coherence and realistic motion dynamics, simulating complex physics such as fluid dynamics, fur, and hair, and producing consistent, fluid human actions that are beginning to ‘cross the uncanny valley.’

Underpinning Mochi 1’s capabilities is its sophisticated architecture. The AsymmDiT efficiently processes user prompts alongside compressed video tokens. It employs a single T5-XXL language model for encoding prompts, jointly reasoning over a context window of 44,520 video tokens with full 3D attention. To localize each token, learnable rotary positional embeddings (RoPE) are extended to three dimensions, allowing the network to learn, end-to-end, the mixing frequencies for the spatial and temporal axes. Efficiency is further enhanced through features like SwiGLU feedforward layers, query-key normalization for stability, and sandwich normalization to manage internal activations. Accompanying Mochi 1 is an open-sourced video VAE (Variational Autoencoder) that causally compresses videos to a 96× smaller size, applying 8×8 spatial and 6× temporal compression into a 12-channel latent space.
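The two headline numbers above can be sanity-checked and illustrated with a short sketch. The compression factor follows directly from the figures Genmo quotes (8×8 spatial × 6 temporal × 3 RGB channels, into 12 latent channels), and the 3D extension of rotary embeddings can be sketched as splitting each token's feature dimension into three chunks and rotating each chunk by its time, height, or width coordinate. This is a minimal NumPy illustration of the general technique, not Genmo's implementation; the function names, chunking scheme, and frequency values are assumptions.

```python
import numpy as np

# Sanity check on the quoted VAE compression factor:
# (8 * 8 spatial) * (6 temporal) * 3 RGB channels / 12 latent channels = 96
assert (8 * 8 * 6 * 3) // 12 == 96

def rope_1d(x, pos, freqs):
    """Standard RoPE along one axis: rotate feature pairs by angle pos * freqs."""
    pos = np.asarray(pos, dtype=float)
    half = x.shape[-1] // 2
    angles = pos[..., None] * freqs            # one angle per frequency
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    # 2D rotation applied to each (x1, x2) feature pair
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def rope_3d(x, t, h, w, freqs_t, freqs_h, freqs_w):
    """3D RoPE sketch: split the feature dim into three chunks,
    rotating each by the token's position along one axis."""
    d3 = x.shape[-1] // 3
    return np.concatenate([
        rope_1d(x[..., :d3],       t, freqs_t),   # time axis
        rope_1d(x[..., d3:2 * d3], h, freqs_h),   # height axis
        rope_1d(x[..., 2 * d3:],   w, freqs_w),   # width axis
    ], axis=-1)
```

Because each chunk is only rotated, the embedding preserves token norms while encoding the token's (t, h, w) location; in Mochi 1 the per-axis frequencies are learned end-to-end rather than fixed as in this sketch.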

Genmo AI is not stopping here, with plans already in motion for future developments. An upcoming ‘Mochi 1 HD’ upgrade is set to support 720p video generation, promising enhanced fidelity and even smoother motion, specifically addressing edge cases like warping in complex scenes. Beyond resolution improvements, the company is actively working on image-to-video capabilities and improving the overall controllability and steerability of the models, aiming to provide users with even more precise command over their creative outputs. Users can experience Mochi 1 firsthand through a hosted playground available on Genmo AI’s website, offering a user-friendly environment to generate videos from their own prompts.

This release marks a significant step forward in democratizing access to state-of-the-art AI video generation, empowering creators, developers, and researchers with powerful, flexible, and openly available tools.

Ananya Rao
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
