NVIDIA Unveils Expansive Open-Source Speech AI Dataset and Advanced Models for European Languages

TLDR: NVIDIA has launched ‘Granary,’ one of the largest open-source speech AI datasets for European languages, alongside two AI models, Canary-1b-v2 and Parakeet-tdt-0.6b-v3. The release targets speech recognition and translation across 25 European languages and is meant to help developers build more accurate and scalable AI applications.

NVIDIA Corporation has announced ‘Granary,’ a massive open-source speech AI dataset for European languages, along with two accompanying AI models. The announcement was made on Friday, August 15, 2025.

The ‘Granary’ dataset is touted as one of the largest speech corpora available for European languages, encompassing approximately 1 million hours of multilingual audio. This includes around 650,000 hours specifically for speech recognition and over 350,000 hours for speech translation. The dataset covers 25 European languages, including nearly all of the European Union’s 24 official languages, along with Russian and Ukrainian.

Accompanying ‘Granary’ are two powerful AI models: NVIDIA Canary-1b-v2 and NVIDIA Parakeet-tdt-0.6b-v3. Canary-1b-v2 is specifically optimized for transcribing European languages, leveraging the extensive ‘Granary’ dataset. Parakeet-tdt-0.6b-v3, on the other hand, is engineered for real-time transcription, supporting all languages included in ‘Granary.’

The release is aimed at developers worldwide. As stated by NVIDIA in a press release, ‘These tools will help developers scale AI applications globally, providing fast and accurate speech capabilities for real-world use cases like multilingual chatbots, voice-based customer service agents, and near-instant translation tools.’ The initiative is intended to make it easier for developers to build production-scale speech recognition and translation applications.

The ‘Granary’ dataset was developed collaboratively, with the NVIDIA speech AI team working alongside researchers from Carnegie Mellon University and Fondazione Bruno Kessler. They used a processing pipeline powered by the NVIDIA NeMo Speech Data Processor toolkit to transform unlabelled audio into structured, high-quality data. As a result, ‘Granary’ provides clean, ready-to-use data, giving developers a head start in building models for transcription and translation tasks across Europe’s many languages.
