New Benchmarks in ASR for Impaired Speech: Insights from the Interspeech 2025 Challenge

TLDR: The Interspeech 2025 Speech Accessibility Project (SAP) Challenge aimed to improve Automatic Speech Recognition (ASR) for individuals with speech disabilities. Utilizing over 400 hours of diverse impaired speech data, the challenge evaluated systems based on Word Error Rate (WER) and Semantic Score (SemScore). The top team achieved a WER of 8.11% and a SemScore of 88.44%, significantly outperforming the Whisper-large-v2 baseline. The results highlight the effectiveness of fine-tuning existing ASR models on large, specialized datasets and advanced techniques like audio segmentation and error correction in making ASR more accessible.

Automatic Speech Recognition (ASR) systems have seen incredible progress over the last decade, largely due to advancements in deep neural networks and the availability of vast amounts of training data. These systems are now integrated into many daily applications. However, a significant challenge remains: ASR performance for individuals with speech disabilities still lags behind, primarily because there isn’t enough public training data that accurately represents diverse speech impairments.

To address this critical gap, the Interspeech 2025 Speech Accessibility Project (SAP) Challenge was launched. This pioneering initiative utilized over 400 hours of unique SAP data, collected and transcribed from more than 500 individuals with various speech disabilities. The challenge was hosted on EvalAI, employing a remote evaluation pipeline to ensure data privacy and secure testing.

The SAP Challenge evaluated submissions on two primary metrics: Word Error Rate (WER) and Semantic Score (SemScore). WER is the traditional measure of ASR accuracy: the number of word substitutions, deletions, and insertions needed to turn the system output into the reference transcript, divided by the number of words in the reference. SemScore, on the other hand, focuses on the semantic fidelity of the transcription, assessing how well the system preserves the intended meaning and context of the speech. This dual evaluation approach provides a more comprehensive understanding of an ASR system’s real-world utility for impaired speech.
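To make the WER definition concrete, here is a minimal sketch of a word-level edit-distance computation. The function name `word_error_rate` and the normalization (lowercasing and whitespace splitting) are assumptions for illustration; the challenge's own scoring and text-normalization rules are not described in this article and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For instance, comparing the reference "the cat sat on the mat" with the output "the cat sat on mat" yields one deletion over six reference words. Because insertions count as errors, WER can exceed 100% when the output is much longer than the reference.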

The dataset used for the challenge, known as SAP-240430, comprises approximately 415 hours of impaired speech from 524 participants. These individuals were diagnosed with one of five etiologies: Parkinson’s Disease (PD), Down Syndrome (DS), amyotrophic lateral sclerosis (ALS), cerebral palsy (CP), or stroke. A significant portion of the dataset, particularly in the training and test sets, consists of speech from individuals with Parkinson’s Disease. The data was carefully partitioned into training, development, and test sets, ensuring no speaker overlap between them. The test set was further divided into public (Test1) and private (Test2) subsets, with final rankings based on the private leaderboard results.
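The no-speaker-overlap property can be illustrated with a short sketch that assigns whole speakers, rather than individual utterances, to each split. The function `split_by_speaker`, the split fractions, and the input format are hypothetical; the article does not describe how the SAP organizers actually partitioned the data.

```python
import random
from collections import defaultdict


def split_by_speaker(utterances, dev_frac=0.1, test_frac=0.1, seed=0):
    """Partition (speaker_id, utterance_id) pairs so no speaker spans two splits.

    The fractions are illustrative assumptions, not the SAP proportions.
    """
    by_speaker = defaultdict(list)
    for speaker_id, utt_id in utterances:
        by_speaker[speaker_id].append(utt_id)

    # Shuffle speakers (not utterances) so each speaker lands in one split only.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    n_test = int(len(speakers) * test_frac)
    n_dev = int(len(speakers) * dev_frac)
    groups = {
        "test": speakers[:n_test],
        "dev": speakers[n_test:n_test + n_dev],
        "train": speakers[n_test + n_dev:],
    }
    return {
        name: {spk: by_speaker[spk] for spk in spks}
        for name, spks in groups.items()
    }
```

Splitting at the speaker level means development and test scores measure generalization to unseen speakers rather than memorization of a known voice.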

For the challenge, a baseline system was established using the open-source Whisper-large-v2 model. This model, part of OpenAI’s Whisper family, is a transformer-based encoder-decoder architecture pre-trained on an extensive 680,000 hours of audio data. The Whisper-large-v2 model achieved a WER of 14.97% on Test1 and 17.82% on Test2, and SemScores of 82.26% and 75.85% respectively, serving as the benchmark for participants.

The SAP Challenge attracted 22 teams that submitted valid results. Of these, 12 surpassed the Whisper-large-v2 baseline on WER, and 17 achieved a higher SemScore. The top-performing team achieved a WER of 8.11% and a SemScore of 88.44%. These results represent substantial improvements over the baseline, demonstrating the effectiveness of fine-tuning existing speech foundation models on specialized impaired speech data.
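As a rough worked calculation, the gap to the baseline can be expressed as a relative WER reduction. This sketch assumes the top team's 8.11% is a Test2 (private leaderboard) figure, since final rankings were based on that split; the article does not state the split explicitly, so the comparison should be read with that caveat.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Fraction of the baseline's word errors removed by the system."""
    return (baseline_wer - system_wer) / baseline_wer


baseline_test2_wer = 17.82  # Whisper-large-v2 on Test2, as reported above
top_team_wer = 8.11         # top team's WER; ASSUMED to be a Test2 result

reduction = relative_wer_reduction(baseline_test2_wer, top_team_wer)
print(f"Relative WER reduction: {reduction:.1%}")  # prints 54.5%
```

Under that assumption, the top system removes a little over half of the baseline's word errors.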

Further analysis revealed a strong negative correlation between WER and SemScore, indicating that as word errors decrease, semantic understanding generally improves. Interestingly, some systems showed a preference for transcribing disfluencies (like hesitations or self-corrections), while others tended to omit them. Etiology-specific analysis showed that while Parkinson’s Disease speech dominated the dataset, the relative improvement for ALS speakers was larger, possibly due to less variability in their speech patterns within the Test2 split.

The success of the top teams can be attributed to several advanced strategies. Many teams built upon existing large foundation ASR models like NVIDIA’s Parakeet and OpenAI’s Whisper, fine-tuning them with the SAP data. Other key techniques included audio segmentation (dividing long audio files into shorter clips), model merging (combining multiple checkpoints for robustness), hallucination reduction (addressing instances where the ASR system generates text that is not present in the audio), curriculum learning, post-ASR error correction using large language models, and personalization strategies that adapt to individual speaker characteristics.
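Two of the techniques listed above, audio segmentation and model merging, can be sketched in a few lines. Both functions are illustrative assumptions rather than any team's actual method: `segment_audio` uses fixed-length overlapping windows (the 30-second default is chosen because Whisper processes audio in 30-second windows), and `average_checkpoints` uses uniform parameter averaging, which is only one of several possible merging schemes. Parameters are represented as plain lists of floats to keep the example self-contained.

```python
def segment_audio(samples, sample_rate, clip_seconds=30.0, overlap_seconds=1.0):
    """Split a 1-D sequence of audio samples into fixed-length, overlapping clips."""
    clip_len = int(clip_seconds * sample_rate)
    hop = clip_len - int(overlap_seconds * sample_rate)
    if clip_len <= 0 or hop <= 0:
        raise ValueError("clip length must be positive and longer than the overlap")

    clips = []
    start = 0
    while start < len(samples):
        clips.append(samples[start:start + clip_len])
        if start + clip_len >= len(samples):
            break  # the clip just added already reaches the end of the audio
        start += hop
    return clips


def average_checkpoints(checkpoints):
    """Uniformly average parameters across checkpoints.

    Each checkpoint is a dict mapping a parameter name to a list of floats;
    all checkpoints are assumed to share the same names and shapes.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    merged = {}
    for name in checkpoints[0]:
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        merged[name] = [sum(values) / len(checkpoints) for values in columns]
    return merged
```

In practice, segmentation boundaries are often placed at detected silences rather than fixed offsets, and real checkpoints hold tensors rather than lists; the sketch only conveys the shape of each idea.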

In conclusion, the Interspeech 2025 Speech Accessibility Project Challenge has made significant strides in advancing ASR for individuals with speech disorders. The challenge underscored the critical role of large-scale, speaker-independent datasets of impaired speech in improving ASR performance and generalization for unseen speakers. The achievements of the participating teams set new benchmarks and highlight promising avenues for future research, particularly in exploring within- and across-group similarities and differences among various etiology-based or impairment severity-based populations. This effort aims to foster the development of more inclusive and effective speech recognition technologies accessible to a broader audience. The full research paper is titled “The Interspeech 2025 Speech Accessibility Project Challenge.”

Nikhil Patel
Nikhil Patel is a tech analyst and AI news reporter who brings a practitioner's perspective to every article. With prior experience working at an AI startup, he decodes the business mechanics behind product innovations, funding trends, and partnerships in the GenAI space. Nikhil's insights are sharp, forward-looking, and trusted by insiders and newcomers alike. You can reach him at: [email protected]
