ChiReSSD: A Generative AI Approach to Reconstruct Disordered Speech in Children

TLDR: The research introduces ChiReSSD, a novel speech reconstruction framework designed for children with speech sound disorders (SSD). Unlike previous models trained on healthy adult speech, ChiReSSD preserves a child’s unique voice identity while correcting mispronunciations, specifically adapting to their pitch and prosody. Evaluated on the STAR dataset, it significantly improves lexical accuracy and speaker identity. The framework also shows a strong correlation (ρ=0.63) between its automatic consonant accuracy metric and human expert annotations, suggesting a potential to reduce manual transcription burden in clinical evaluations. Furthermore, ChiReSSD effectively generalizes to reconstruct adult dysarthric speech, demonstrating its broad applicability across diverse clinical populations.

Speech disorders present significant challenges to daily communication, affecting individuals across all age groups. These conditions can lead to deviations in acoustic, phonetic, and prosodic dimensions of speech, reducing intelligibility and impacting quality of life. While speech reconstruction, the process of generating more intelligible utterances while preserving a speaker’s unique identity, offers a promising solution, existing methods have largely focused on healthy adult speech, leaving children with speech sound disorders (SSD) underserved.

Addressing this critical gap, a new research paper introduces ChiReSSD, a groundbreaking speech reconstruction framework specifically designed for children with SSD. This innovative approach aims to preserve a child’s distinct speaker identity while effectively suppressing mispronunciations. Unlike its predecessors, ChiReSSD adapts to the unique voices of children, with a particular emphasis on their characteristic pitch and prosody.

How ChiReSSD Works

ChiReSSD builds upon the StyleTTS2 framework, a sophisticated text-to-speech (TTS) system. Its core innovation lies in its ability to disentangle acoustic and prosodic style embeddings. This allows the system to selectively reduce the influence of pathological acoustic patterns—which often encode mispronunciations—while meticulously retaining the child’s natural prosody and speaker identity. The framework fine-tunes specific modules, including acoustic and prosodic style encoders and a pitch extractor, to better capture the higher pitch ranges and unique prosodic patterns of child speech.

During the inference process, ChiReSSD extracts style embeddings from a brief reference sample of an unseen child with SSD. This enables one-shot style transfer, meaning a recording of just four seconds is sufficient to generate reconstructed speech for any target text, even if the reference doesn’t contain the exact target utterance. The system then encodes the target lexical content at the phoneme level, guiding the generation of clear, identity-preserving speech.
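The article does not say how the influence of the acoustic style is reduced, so the following is only an illustrative sketch of the idea of treating the two style embeddings separately. It assumes a hypothetical `neutral_acoustic` vector (for example, one taken from typical speech) and a hypothetical blending weight `alpha`; neither name comes from the paper or from StyleTTS2.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StylePair:
    """Hypothetical container for the two disentangled style embeddings."""
    acoustic: List[float]  # assumed to carry articulation-related patterns
    prosodic: List[float]  # assumed to carry pitch and rhythm patterns


def attenuate_acoustic(child: StylePair,
                       neutral_acoustic: List[float],
                       alpha: float) -> StylePair:
    """Blend the child's acoustic style toward a neutral one.

    alpha = 1.0 keeps the child's acoustic style unchanged,
    alpha = 0.0 replaces it with the neutral style.
    The prosodic style is copied through untouched.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(child.acoustic) != len(neutral_acoustic):
        raise ValueError("acoustic embeddings must have the same length")
    blended = [alpha * c + (1.0 - alpha) * n
               for c, n in zip(child.acoustic, neutral_acoustic)]
    return StylePair(acoustic=blended, prosodic=list(child.prosodic))
```

Linear interpolation is chosen here only because it is the simplest way to show one embedding being weakened while the other is preserved; the actual framework may use a different mechanism.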

Key Findings and Clinical Impact

The evaluation of ChiReSSD on the STAR dataset demonstrated substantial improvements across several key areas. In terms of speaker identity preservation, ChiReSSD achieved a similarity score of 0.62, surpassing other baseline methods and meeting acceptable thresholds for speaker verification tasks. It also showed the lowest fundamental frequency (F0) difference, indicating a close alignment with original pitch distributions and maintaining child-matched prosody within perceptually acceptable bounds.
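The article does not specify how the similarity score or the F0 difference is computed. A common convention, assumed here, is cosine similarity between speaker embeddings and the absolute difference between mean F0 values over voiced frames. The function names and the convention that unvoiced frames are stored as 0 are assumptions made for this sketch.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_f0_difference(f0_a: Sequence[float], f0_b: Sequence[float]) -> float:
    """Absolute difference in mean F0 (Hz), ignoring unvoiced frames.

    Assumes unvoiced frames are marked with a value of 0.
    """
    voiced_a = [f for f in f0_a if f > 0]
    voiced_b = [f for f in f0_b if f > 0]
    if not voiced_a or not voiced_b:
        raise ValueError("each contour needs at least one voiced frame")
    mean_a = sum(voiced_a) / len(voiced_a)
    mean_b = sum(voiced_b) / len(voiced_b)
    return abs(mean_a - mean_b)
```

In practice the embeddings would come from a speaker verification model and the F0 contours from a pitch tracker, both applied to the original and the reconstructed recording.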

For lexical and phonetic accuracy, ChiReSSD significantly reduced error rates. It achieved a 48% relative reduction in character error rate (CER) and a 42% relative reduction in word error rate (WER) compared to original disordered speech samples. This indicates a marked reduction in phonetic distortions, making the reconstructed speech much more intelligible.
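For readers unfamiliar with these metrics, the sketch below shows the standard definitions: CER and WER are the Levenshtein edit distance between a reference and a recognized transcript, divided by the reference length in characters or words, and a relative reduction compares the rate before and after reconstruction. The helper names are chosen for this example and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance normalized by the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def cer(ref: str, hyp: str) -> float:
    """Character error rate."""
    return error_rate(list(ref), list(hyp))


def wer(ref: str, hyp: str) -> float:
    """Word error rate, splitting on whitespace."""
    return error_rate(ref.split(), hyp.split())


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate dropped relative to its starting value."""
    if before <= 0:
        raise ValueError("'before' must be positive")
    return (before - after) / before
```

As a purely illustrative calculation with made-up numbers, a CER that falls from 0.50 to 0.26 corresponds to a relative reduction of 0.48, that is, 48%. The absolute error rates reported in the paper are not given in this article.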

Crucially, ChiReSSD shows strong potential for automating clinical evaluations. The researchers automatically predicted the phonetic content of original and reconstructed speech pairs, and the proportion of corrected consonants was comparable to the Percentage of Correct Consonants (PCC), a standard clinical speech assessment metric. A Pearson correlation of ρ=0.63 was observed between these automatic predictions and human expert annotations, which suggests the metric could reduce the manual transcription burden currently faced by speech-language therapists. Further technical details are available in the full research paper.
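To make the two quantities in this comparison concrete, here is a minimal sketch of a PCC-style score and a Pearson correlation. It assumes the target and produced consonant sequences are already aligned one to one, which is a simplification; the article does not describe the alignment or phone recognition steps the researchers used.

```python
import math
from typing import Sequence


def pcc(target: Sequence[str], produced: Sequence[str]) -> float:
    """Percentage of target consonants that were produced correctly.

    Assumes both sequences are aligned position by position.
    """
    if len(target) != len(produced):
        raise ValueError("sequences must be aligned to the same length")
    if not target:
        raise ValueError("target must contain at least one consonant")
    correct = sum(1 for t, p in zip(target, produced) if t == p)
    return 100.0 * correct / len(target)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("samples must not be constant")
    return cov / math.sqrt(var_x * var_y)
```

In an evaluation of this kind, one list would hold the automatic score for each speaker or utterance and the other the score derived from expert transcriptions, and `pearson` would then summarize how closely the two agree.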

Generalization to Adult Dysarthric Speech

Beyond childhood SSDs, ChiReSSD also demonstrated impressive generalization capabilities. Experiments on the TORGO dataset, which features adult speakers with dysarthria, showed that the approach substantially improved lexical accuracy across all severity levels of dysarthria. Character error rates were consistently reduced to below 0.03, even for severe cases, and speaker identity preservation remained robust with high similarity values. This cross-disorder adaptability underscores the versatility of ChiReSSD and its potential to provide personalized, intelligible, and identity-preserving speech reconstruction for a wide range of clinical populations.

Conclusion

ChiReSSD represents a significant advancement in generative speech reconstruction, offering a tailored solution for children with speech sound disorders and demonstrating effective generalization to adult dysarthric speech. By preserving speaker identity while correcting mispronunciations and showing a strong correlation with clinical assessment metrics, this framework paves the way for more efficient automated clinical evaluations and improved communication aids for diverse individuals facing speech challenges.

Ananya Rao
