ChiReSSD: A Generative AI Approach to Reconstruct Disordered Speech in Children

TLDR: The research introduces ChiReSSD, a speech reconstruction framework designed for children with speech sound disorders (SSD). Unlike previous models trained on healthy adult speech, ChiReSSD preserves a child’s voice identity while correcting mispronunciations, adapting specifically to children’s pitch and prosody. Evaluated on the STAR dataset, it improves both lexical accuracy and speaker identity preservation. Its automatic consonant accuracy metric correlates with human expert annotations (ρ=0.63), which suggests it could reduce the manual transcription burden in clinical evaluations. ChiReSSD also generalizes to adult dysarthric speech, indicating that the approach applies beyond the pediatric population it was designed for.

Speech disorders present significant challenges to daily communication, affecting individuals across all age groups. These conditions can lead to deviations in acoustic, phonetic, and prosodic dimensions of speech, reducing intelligibility and impacting quality of life. While speech reconstruction, the process of generating more intelligible utterances while preserving a speaker’s unique identity, offers a promising solution, existing methods have largely focused on healthy adult speech, leaving children with speech sound disorders (SSD) underserved.

To address this gap, a new research paper introduces ChiReSSD, a speech reconstruction framework designed specifically for children with SSD. The approach aims to preserve a child’s speaker identity while suppressing mispronunciations. Unlike earlier systems, ChiReSSD adapts to children’s voices, with particular emphasis on their characteristic pitch and prosody.

How ChiReSSD Works

ChiReSSD builds upon StyleTTS2, a text-to-speech (TTS) framework. Its core innovation is the disentanglement of acoustic and prosodic style embeddings. This lets the system selectively reduce the influence of pathological acoustic patterns, which often encode mispronunciations, while retaining the child’s natural prosody and speaker identity. The framework fine-tunes specific modules, including the acoustic and prosodic style encoders and a pitch extractor, to better capture the higher pitch ranges and distinctive prosodic patterns of child speech.
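To make the idea of treating the two style embeddings separately more concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes that each style is a plain vector and that "reducing the influence" of a reference style can be modeled as linear interpolation toward a neutral vector. The function names, the weights, and the notion of a "neutral" style are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def blend(reference: List[float], neutral: List[float], weight: float) -> List[float]:
    """Linearly interpolate two vectors.

    weight=1.0 keeps the reference vector unchanged;
    weight=0.0 replaces it entirely with the neutral vector.
    """
    if len(reference) != len(neutral):
        raise ValueError("vectors must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return [weight * r + (1.0 - weight) * n for r, n in zip(reference, neutral)]


def build_style(
    ref_acoustic: List[float],
    ref_prosodic: List[float],
    neutral_acoustic: List[float],
    neutral_prosodic: List[float],
    acoustic_weight: float = 0.2,  # hypothetical: keep little of the disordered acoustics
    prosodic_weight: float = 1.0,  # hypothetical: keep all of the child's prosody
) -> Dict[str, List[float]]:
    """Combine reference and neutral styles with separate weights per style type."""
    return {
        "acoustic": blend(ref_acoustic, neutral_acoustic, acoustic_weight),
        "prosodic": blend(ref_prosodic, neutral_prosodic, prosodic_weight),
    }


if __name__ == "__main__":
    style = build_style(
        ref_acoustic=[1.0, 0.0],
        ref_prosodic=[0.5, 0.5],
        neutral_acoustic=[0.0, 1.0],
        neutral_prosodic=[0.0, 0.0],
        acoustic_weight=0.25,
        prosodic_weight=1.0,
    )
    print(style)
```

Because the two embeddings are kept apart, the acoustic weight can be lowered without touching the prosodic one. That separation is the property the article attributes to the disentangled design.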

During the inference process, ChiReSSD extracts style embeddings from a brief reference sample of an unseen child with SSD. This enables one-shot style transfer, meaning a recording of just four seconds is sufficient to generate reconstructed speech for any target text, even if the reference doesn’t contain the exact target utterance. The system then encodes the target lexical content at the phoneme level, guiding the generation of clear, identity-preserving speech.

Key Findings and Clinical Impact

The evaluation of ChiReSSD on the STAR dataset demonstrated substantial improvements across several key areas. In terms of speaker identity preservation, ChiReSSD achieved a similarity score of 0.62, surpassing other baseline methods and meeting acceptable thresholds for speaker verification tasks. It also showed the lowest fundamental frequency (F0) difference, indicating a close alignment with original pitch distributions and maintaining child-matched prosody within perceptually acceptable bounds.
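The article does not say how the similarity score is computed. A common choice for speaker verification is the cosine similarity between speaker embeddings of the original and reconstructed utterances, and the sketch below assumes that definition. The embedding vectors are made-up toy values, not outputs of any real speaker encoder.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy embeddings standing in for the original and reconstructed speech.
    original = [0.9, 0.1, 0.3]
    reconstructed = [0.7, 0.4, 0.2]
    print(round(cosine_similarity(original, reconstructed), 3))
```

Under this assumed definition, a score of 1.0 means the two embeddings point in the same direction and a score near 0 means they are unrelated.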

For lexical and phonetic accuracy, ChiReSSD significantly reduced error rates. It achieved a 48% relative reduction in character error rate (CER) and a 42% relative reduction in word error rate (WER) compared to original disordered speech samples. This indicates a marked reduction in phonetic distortions, making the reconstructed speech much more intelligible.
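For readers unfamiliar with these metrics, the sketch below shows the standard way WER and CER are defined (edit distance divided by reference length) and how a relative reduction is derived from two error rates. It uses only the Python standard library. The example sentences and the error rates in the demo are invented for illustration and are not taken from the paper, and the exact text normalization used by the authors is not described in the article.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens: Sequence, hyp_tokens: Sequence) -> float:
    """Edit distance normalized by the number of reference tokens."""
    if len(ref_tokens) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: tokens are individual characters."""
    return error_rate(list(reference), list(hypothesis))


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate dropped relative to its starting value."""
    if before <= 0.0:
        raise ValueError("baseline error rate must be positive")
    return (before - after) / before


if __name__ == "__main__":
    print(wer("the cat sat", "the tat sat"))
    print(cer("cat", "tat"))
    # Invented numbers: a drop from 0.50 to 0.26 is a 48% relative reduction.
    print(relative_reduction(0.50, 0.26))
```

Note that a relative reduction describes the change in proportion to the baseline. It does not say what the absolute error rate was before or after reconstruction.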

ChiReSSD also shows potential for automating parts of clinical evaluation. The researchers automatically predicted the phonetic content of original and reconstructed speech pairs and found that the proportion of corrected consonants was comparable to the Percentage of Consonants Correct (PCC), a standard clinical speech assessment metric. A Pearson correlation of ρ=0.63 was observed between these automatic predictions and human expert annotations, which points to an opportunity to reduce the manual transcription burden currently faced by speech-language therapists. Further technical detail is available in the full research paper.
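As a rough illustration of the two quantities involved, the sketch below computes a PCC-style score from pre-aligned phoneme sequences and a Pearson correlation between two lists of scores. It rests on several assumptions: the target and produced sequences are already aligned position by position, the vowel set is a toy placeholder rather than a real phoneme inventory, and the scores in the demo are invented. The paper's actual phoneme recognizer and alignment procedure are not described in the article.

```python
import math
from typing import Sequence, Set

# Toy vowel inventory (an assumption); every other symbol counts as a consonant.
TOY_VOWELS: Set[str] = {"a", "e", "i", "o", "u"}


def pcc(target: Sequence[str], produced: Sequence[str],
        vowels: Set[str] = TOY_VOWELS) -> float:
    """Percentage of target consonants that were produced correctly.

    Assumes both sequences are already aligned one-to-one.
    """
    if len(target) != len(produced):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(t, p) for t, p in zip(target, produced) if t not in vowels]
    if not pairs:
        raise ValueError("target contains no consonants")
    correct = sum(1 for t, p in pairs if t == p)
    return 100.0 * correct / len(pairs)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Target "kat" produced as "tat": one of two consonants is correct.
    print(pcc(["k", "a", "t"], ["t", "a", "t"]))
    # Invented per-child scores: automatic metric vs. expert annotation.
    automatic = [55.0, 70.0, 62.0, 90.0]
    expert = [50.0, 75.0, 58.0, 85.0]
    print(round(pearson(automatic, expert), 3))
```

In practice the alignment step is the difficult part, because disordered speech can contain insertions and deletions as well as substitutions. This sketch sidesteps that by assuming aligned input.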

Generalization to Adult Dysarthric Speech

Beyond childhood SSDs, ChiReSSD also demonstrated impressive generalization capabilities. Experiments on the TORGO dataset, which features adult speakers with dysarthria, showed that the approach substantially improved lexical accuracy across all severity levels of dysarthria. Character error rates were consistently reduced to below 0.03, even for severe cases, and speaker identity preservation remained robust with high similarity values. This cross-disorder adaptability underscores the versatility of ChiReSSD and its potential to provide personalized, intelligible, and identity-preserving speech reconstruction for a wide range of clinical populations.

Conclusion

ChiReSSD represents a significant advancement in generative speech reconstruction, offering a tailored solution for children with speech sound disorders and demonstrating effective generalization to adult dysarthric speech. By preserving speaker identity while correcting mispronunciations and showing a strong correlation with clinical assessment metrics, this framework paves the way for more efficient automated clinical evaluations and improved communication aids for diverse individuals facing speech challenges.
