New Benchmarks in ASR for Impaired Speech: Insights from the Interspeech 2025 Challenge

TLDR: The Interspeech 2025 Speech Accessibility Project (SAP) Challenge aimed to improve Automatic Speech Recognition (ASR) for individuals with speech disabilities. Utilizing over 400 hours of diverse impaired speech data, the challenge evaluated systems based on Word Error Rate (WER) and Semantic Score (SemScore). The top team achieved a WER of 8.11% and a SemScore of 88.44%, significantly outperforming the Whisper-large-v2 baseline. The results highlight the effectiveness of fine-tuning existing ASR models on large, specialized datasets and advanced techniques like audio segmentation and error correction in making ASR more accessible.

Automatic Speech Recognition (ASR) systems have seen incredible progress over the last decade, largely due to advancements in deep neural networks and the availability of vast amounts of training data. These systems are now integrated into many daily applications. However, a significant challenge remains: ASR performance for individuals with speech disabilities still lags behind, primarily because there isn’t enough public training data that accurately represents diverse speech impairments.

To address this critical gap, the Interspeech 2025 Speech Accessibility Project (SAP) Challenge was launched. This pioneering initiative utilized over 400 hours of unique SAP data, collected and transcribed from more than 500 individuals with various speech disabilities. The challenge was hosted on EvalAI, employing a remote evaluation pipeline to ensure data privacy and secure testing.

The SAP Challenge evaluated submissions based on two primary metrics: Word Error Rate (WER) and Semantic Score (SemScore). WER is a traditional measure of ASR accuracy, quantifying how many words are incorrectly recognized compared to the original transcript. SemScore, on the other hand, focuses on the semantic fidelity of the transcription, assessing how well the system preserves the intended meaning and context of the speech. This dual evaluation approach provides a more comprehensive understanding of an ASR system’s real-world utility for impaired speech.
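As a rough illustration of the WER half of this evaluation, the sketch below computes word error rate as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The function name and the simple lowercase-and-split normalization are assumptions made for this example; the challenge's own scoring pipeline and text normalization are not described here and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.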

The dataset used for the challenge, known as SAP-240430, comprises approximately 415 hours of impaired speech from 524 participants. These individuals were diagnosed with one of five etiologies: Parkinson’s Disease (PD), Down Syndrome (DS), amyotrophic lateral sclerosis (ALS), cerebral palsy (CP), or stroke. A significant portion of the dataset, particularly in the training and test sets, consists of speech from individuals with Parkinson’s Disease. The data was carefully partitioned into training, development, and test sets, ensuring no speaker overlap between them. The test set was further divided into public (Test1) and private (Test2) subsets, with final rankings based on the private leaderboard results.
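The no-speaker-overlap constraint can be sketched as follows: whole speakers, not individual utterances, are assigned to each partition. This is a minimal sketch, not the organizers' procedure; the function name, the `(speaker_id, utterance_id)` input format, and the 10% development and test fractions are all hypothetical choices for illustration.

```python
import random
from collections import defaultdict


def split_by_speaker(utterances, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Assign whole speakers to train/dev/test so no speaker spans two sets.

    `utterances` is a list of (speaker_id, utterance_id) pairs.
    The fractions are hypothetical, not the challenge's actual proportions.
    """
    by_speaker = defaultdict(list)
    for speaker_id, utterance_id in utterances:
        by_speaker[speaker_id].append(utterance_id)

    # Shuffle speakers (not utterances) with a fixed seed for reproducibility.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    n_test = max(1, round(len(speakers) * test_fraction))
    n_dev = max(1, round(len(speakers) * dev_fraction))
    assignment = {
        "test": speakers[:n_test],
        "dev": speakers[n_test:n_test + n_dev],
        "train": speakers[n_test + n_dev:],
    }
    return {
        name: {spk: by_speaker[spk] for spk in spks}
        for name, spks in assignment.items()
    }
```

Splitting at the speaker level is what makes test results reflect generalization to unseen speakers, since a model cannot benefit from having heard the same voice during training.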

For the challenge, a baseline system was established using the open-source Whisper-large-v2 model. This model, part of OpenAI’s Whisper family, is a transformer-based encoder-decoder architecture pre-trained on 680,000 hours of audio data. The baseline achieved a WER of 14.97% on Test1 and 17.82% on Test2, with SemScores of 82.26% and 75.85%, respectively, and served as the benchmark for participants.

The SAP Challenge attracted 22 teams who submitted valid results. Of these, 12 surpassed the Whisper-large-v2 baseline in terms of WER, and 17 achieved a higher SemScore. The top-performing team achieved a WER of 8.11% together with a SemScore of 88.44%. These results are substantial improvements over the baseline and demonstrate the effectiveness of fine-tuning existing speech foundation models on specialized impaired speech data.
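To put the size of that gap in perspective, the short calculation below computes the absolute and relative WER reduction of the top system against the baseline. The article does not state which test split the 8.11% figure comes from, so the comparison is shown against both baseline numbers; since final rankings used the private Test2 set, the Test2 comparison is probably the relevant one, but that is an assumption.

```python
def relative_reduction(baseline: float, system: float) -> float:
    """Relative reduction in percent: how much of the baseline error was removed."""
    return (baseline - system) / baseline * 100.0


# Baseline WERs reported for Whisper-large-v2, in percent.
BASELINE_WER = {"Test1": 14.97, "Test2": 17.82}
# Top team's WER; the split it was measured on is assumed, not stated.
TOP_WER = 8.11

for split, baseline in BASELINE_WER.items():
    absolute = baseline - TOP_WER
    relative = relative_reduction(baseline, TOP_WER)
    print(f"{split}: {absolute:.2f} points absolute, {relative:.1f}% relative")
```

Under the Test2 assumption, the top system removes a little over half of the baseline's word errors.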

Further analysis revealed a strong negative correlation between WER and SemScore, indicating that as word errors decrease, semantic understanding generally improves. Interestingly, some systems showed a preference for transcribing disfluencies (like hesitations or self-corrections), while others tended to omit them. Etiology-specific analysis showed that while Parkinson’s Disease speech dominated the dataset, the relative improvement for ALS speakers was larger, possibly due to less variability in their speech patterns within the Test2 split.

The success of the top teams can be attributed to several advanced strategies. Many teams built upon existing large foundation ASR models like NVIDIA’s Parakeet and OpenAI’s Whisper, fine-tuning them with the SAP data. Other key techniques included audio segmentation (dividing long audio files into shorter clips), model merging (combining multiple checkpoints for robustness), hallucination reduction (addressing instances where the ASR system generates non-existent words), curriculum learning, post-ASR error correction using large language models, and personalization strategies that adapt to individual speaker characteristics.
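Two of these techniques are simple enough to sketch in a few lines. The first function computes overlapping window boundaries for cutting a long recording into shorter clips; the second merges checkpoints by uniformly averaging their parameters. Both are minimal sketches under stated assumptions, not the participating teams' implementations: the 30-second window and 2-second overlap are hypothetical defaults, the checkpoints are represented as plain dictionaries of float lists rather than real model weights, and uniform averaging is only one of several possible merging schemes.

```python
def segment_bounds(duration_s, window_s=30.0, overlap_s=2.0):
    """Return (start, end) times in seconds covering a recording with overlapping windows.

    window_s and overlap_s are hypothetical defaults chosen for illustration.
    """
    if window_s <= overlap_s:
        raise ValueError("window_s must be larger than overlap_s")
    if duration_s <= 0:
        return []
    step = window_s - overlap_s
    bounds = []
    start = 0.0
    while True:
        end = min(start + window_s, duration_s)
        bounds.append((start, end))
        if end >= duration_s:
            break
        start += step
    return bounds


def average_checkpoints(checkpoints):
    """Uniformly average parameters across checkpoints with identical keys and shapes.

    Each checkpoint is assumed to be a dict mapping a parameter name to a
    flat list of floats (a stand-in for real tensors).
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    merged = {}
    for name in checkpoints[0]:
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        merged[name] = [sum(values) / len(checkpoints) for values in columns]
    return merged


# A 70-second recording yields three overlapping clips.
print(segment_bounds(70.0))
# Averaging two toy checkpoints element by element.
print(average_checkpoints([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]))
```

The overlap between neighboring clips gives a downstream step room to stitch transcripts together without losing words cut at a boundary; how that stitching is done is outside this sketch.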

In conclusion, the Interspeech 2025 Speech Accessibility Project Challenge has made significant strides in advancing ASR for individuals with speech disorders. The challenge underscored the critical role of large-scale, speaker-independent datasets of impaired speech in improving ASR performance and generalization to unseen speakers. The achievements of the participating teams set new benchmarks and highlight promising avenues for future research, particularly in exploring within- and across-group similarities and differences among populations defined by etiology or impairment severity. This effort aims to foster the development of more inclusive and effective speech recognition technologies accessible to a broader audience. The full research paper is titled “The Interspeech 2025 Speech Accessibility Project Challenge.”
