EGGCodec: Advancing F0 Extraction Through Robust EGG Signal Reconstruction

TLDR: EGGCodec is a new neural framework designed for precise electroglottography (EGG) signal reconstruction and fundamental frequency (F0) extraction. It improves upon existing models by simplifying its architecture (removing the GAN discriminator) and introducing specialized loss functions for both frequency and time domains. By focusing on reconstructing EGG signals, which more accurately reflect vocal fold vibrations, EGGCodec achieves superior F0 extraction accuracy and robustness compared to current state-of-the-art methods, significantly reducing errors in F0 estimation and voicing decisions.

Understanding the nuances of human speech is a complex field, and a crucial element within it is the fundamental frequency, or F0. F0 reflects the rate at which our vocal folds vibrate, carrying vital information about prosody and speaker characteristics. Accurate F0 extraction is essential for a wide range of applications, from speech recognition and synthesis to speaker identification and even music research.

Traditionally, F0 has been extracted from microphone-captured speech signals. However, this presents significant challenges due to the intricate vibration mechanisms of vocal folds and the variability of recording conditions. These factors can make precise F0 extraction a formidable task. A more reliable alternative comes in the form of electroglottography (EGG) signals. EGG signals offer higher accuracy and stability because they more directly reflect the periodic nature of vocal fold vibrations, making them ideally suited for F0 extraction.

EGGCodec is a neural codec framework designed for this setting, covering both EGG signal reconstruction and F0 extraction. It builds upon Encodec, a model known for its ability to compress and reconstruct speech signals effectively. Applying Encodec directly to EGG reconstruction is difficult, however, because of its structural complexity, the limitations of its loss function, and the training instability introduced by its Generative Adversarial Network (GAN) discriminator.

EGGCodec addresses these challenges with several key innovations. One significant change is the removal of the conventional GAN discriminator, which streamlines the training process without compromising efficiency. Instead of extracting F0 directly from features, EGGCodec leverages reconstructed EGG signals, which have a closer correspondence to F0. To ensure high fidelity between the reconstructed and target EGG signals, EGGCodec employs a multi-scale frequency-domain loss function that captures the subtle relationships across different frequencies. This is complemented by a time-domain correlation loss, which improves the model’s ability to generalize and maintain accuracy over time.
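
The two loss terms described above can be sketched in NumPy. This is a minimal illustration, not the formulation used by EGGCodec: the article does not give the exact equations, so the FFT sizes, the L1 distance on magnitude spectrograms, and the "1 minus Pearson correlation" form are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    """Magnitude spectrogram (frames x bins) using a Hann window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def multi_scale_spectral_loss(pred, target, fft_sizes=(64, 128, 256, 512)):
    """Assumed form: mean L1 distance between magnitude spectrograms,
    averaged over several analysis window sizes."""
    losses = []
    for n_fft in fft_sizes:
        hop = n_fft // 4
        diff = stft_mag(pred, n_fft, hop) - stft_mag(target, n_fft, hop)
        losses.append(np.mean(np.abs(diff)))
    return float(np.mean(losses))

def correlation_loss(pred, target, eps=1e-8):
    """Assumed form: 1 - Pearson correlation of the two waveforms.
    Close to 0 when they move together, close to 2 when they are inverted."""
    p = pred - pred.mean()
    t = target - target.mean()
    corr = np.sum(p * t) / (np.sqrt(np.sum(p ** 2) * np.sum(t ** 2)) + eps)
    return float(1.0 - corr)

# Toy signals: a 120 Hz tone as the "target EGG" and a noisy copy as the output.
sr = 16000
t = np.arange(sr // 4) / sr
target = np.sin(2 * np.pi * 120 * t)
noisy = target + 0.1 * np.random.default_rng(0).standard_normal(len(t))
total = multi_scale_spectral_loss(noisy, target) + correlation_loss(noisy, target)
```

Short windows resolve timing and long windows resolve frequency, which is why spectral losses are usually computed at several scales. The correlation term ignores overall amplitude and rewards waveforms whose shape tracks the target over time.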

The process within EGGCodec involves encoding speech signals into compact representations, quantizing them, and then reconstructing waveforms from the quantized representations. Crucially, EGGCodec shifts its reconstruction target from speech signals to EGG signals: the model takes speech as input and learns to generate outputs that closely match the corresponding EGG signal. Because it reconstructs the vocal fold vibration signal rather than the speech itself, it captures the fine details of F0 more accurately. This approach not only enhances F0 extraction accuracy but also improves EGGCodec’s ability to characterize the dynamics of vocal fold opening and closing.
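
The quantization step can be made concrete with a small sketch. Encodec quantizes its latent frames with residual vector quantization, where each codebook encodes what the previous one left over. Whether EGGCodec keeps this quantizer unchanged is not stated in the article, so the block below is an assumption-laden toy: the codebooks are random rather than learned, and `residual_vq` is a hypothetical name.

```python
import numpy as np

def residual_vq(latents, codebooks):
    """Quantize each latent vector with a stack of codebooks.

    latents:   array of shape (n_frames, dim)
    codebooks: list of arrays, each of shape (codebook_size, dim)
    Returns the quantized latents and the chosen code index per stage.
    """
    residual = latents.astype(float)  # astype returns a copy, input is untouched
    quantized = np.zeros_like(residual)
    indices = []
    for cb in codebooks:
        # Squared distance from every residual vector to every code.
        dist = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dist.argmin(axis=1)
        chosen = cb[idx]
        quantized += chosen   # accumulate the approximation
        residual -= chosen    # the next stage encodes what is left
        indices.append(idx)
    return quantized, np.stack(indices, axis=1)

# Toy usage with random (untrained) codebooks.
rng = np.random.default_rng(0)
latents = rng.standard_normal((10, 4))
codebooks = [rng.standard_normal((16, 4)) for _ in range(3)]
quantized, codes = residual_vq(latents, codebooks)
```

Only the integer `codes` need to be stored or transmitted. A decoder holding the same codebooks can rebuild `quantized` by summing the selected entries and then synthesize the output waveform, which in EGGCodec's case is the EGG signal.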

For F0 extraction, EGGCodec differentiates the reconstructed EGG signal to obtain a differential EGG (dEGG) signal, whose peaks correspond to vocal fold closure instants. A peak detection algorithm locates these peaks and uses them as period markers: the time between consecutive peaks is one vibration period, and its reciprocal is F0. An important preprocessing step for EGG signals, especially from datasets like PTDB-TUG, is a 50 Hz high-pass filter. This filter removes low-frequency components that originate from throat muscle artifacts rather than vocal fold vibrations, which would otherwise interfere with model training and degrade F0 estimation.
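
The chain of high-pass filtering, differentiation, peak picking, and period-to-F0 conversion can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the first-order RC filter stands in for whatever 50 Hz high-pass design the authors used, the peak picker is a hand-written threshold rule, and the parameters `rel_height`, `min_f0`, and `max_f0` are hypothetical. It also assumes an EGG polarity in which closure produces a positive dEGG peak.

```python
import numpy as np

def highpass(x, sr, cutoff_hz=50.0):
    """First-order RC high-pass filter (a simple stand-in for a 50 Hz filter)."""
    rc = 1.0 / (2.0 * np.pi * cutoff_hz)
    alpha = rc / (rc + 1.0 / sr)
    y = np.zeros(len(x))
    for n in range(1, len(x)):
        y[n] = alpha * (y[n - 1] + x[n] - x[n - 1])
    return y

def f0_from_egg(egg, sr, min_f0=50.0, max_f0=500.0, rel_height=0.3):
    """Estimate F0 from the spacing of dEGG peaks (taken as closure instants)."""
    degg = np.diff(egg)                   # differential EGG
    threshold = rel_height * degg.max()   # ignore small ripples
    min_gap = int(sr / max_f0)            # shortest plausible period, in samples
    peaks = []
    for n in range(1, len(degg) - 1):
        is_local_max = degg[n] > degg[n - 1] and degg[n] >= degg[n + 1]
        if not (is_local_max and degg[n] >= threshold):
            continue
        if peaks and n - peaks[-1] < min_gap:
            # Two candidates within one period: keep the stronger one.
            if degg[n] > degg[peaks[-1]]:
                peaks[-1] = n
        else:
            peaks.append(n)
    peaks = np.array(peaks, dtype=int)
    periods = np.diff(peaks) / sr         # seconds between closure instants
    f0 = 1.0 / periods
    keep = (f0 >= min_f0) & (f0 <= max_f0)
    return peaks[1:][keep] / sr, f0[keep]

# Synthetic EGG-like wave at 120 Hz with one sharp rise per cycle.
sr = 16000
t = np.arange(sr // 2) / sr
egg = np.exp(3.0 * np.sin(2 * np.pi * 120 * t))
filtered = highpass(egg, sr)
times, f0 = f0_from_egg(filtered, sr)
```

On this synthetic input the estimates should cluster near 120 Hz, with small deviations caused by rounding peak positions to whole samples. Real recordings would also need a voicing decision, since the peak spacing is meaningless in unvoiced or silent regions.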

Evaluations reported for EGGCodec show better performance than state-of-the-art F0 extraction schemes. For instance, it reduces the mean absolute error (MAE) from 14.14 Hz to 13.69 Hz, a relative reduction of about 3.2%, and reduces the voicing decision error (VDE) by 38.2%. The model was trained on the PTDB-TUG corpus, which includes synchronized speech and EGG recordings, and evaluated on the CSTR-FDA dataset, a gold-standard pitch determination corpus. The results show that EGGCodec’s reconstructed EGG signals are highly consistent with the original signals, especially in regions where the vocal folds are vibrating. Noise augmentation during training also proved crucial, yielding clean reconstructed EGG signals and enabling accurate vocal fold cycle detection.
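
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them from frame-level F0 tracks. The article does not define them precisely, so two conventions here are assumptions: a value of 0 marks an unvoiced frame, and MAE is averaged only over frames that both tracks label as voiced. The function name `f0_metrics` is hypothetical.

```python
import numpy as np

def f0_metrics(f0_ref, f0_est):
    """Return (MAE in Hz, VDE as a fraction) for two frame-aligned F0 tracks."""
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_est = np.asarray(f0_est, dtype=float)
    ref_voiced = f0_ref > 0
    est_voiced = f0_est > 0
    # VDE: share of frames where the voiced/unvoiced decision disagrees.
    vde = float(np.mean(ref_voiced != est_voiced))
    # MAE: absolute F0 error over frames both tracks consider voiced.
    both = ref_voiced & est_voiced
    if both.any():
        mae = float(np.mean(np.abs(f0_ref[both] - f0_est[both])))
    else:
        mae = float("nan")
    return mae, vde

ref = [0, 100, 110, 120, 0]   # reference track, e.g. derived from recorded EGG
est = [0, 102, 0, 117, 90]    # estimated track
mae, vde = f0_metrics(ref, est)
print(mae, vde)  # 2.5 0.4
```

In the example, two of the five frames have a wrong voicing decision (VDE of 0.4), and the two frames voiced in both tracks are off by 2 Hz and 3 Hz (MAE of 2.5 Hz).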

Ablation studies, which systematically evaluate each component’s contribution, further validate EGGCodec’s design. The optimal configuration, integrating various loss functions and noise augmentation, achieved the best balance between accuracy and robustness. Even without the GAN discriminator, EGGCodec maintained strong performance, confirming the effectiveness of its simplified training process. This innovative framework not only enhances the accuracy of EGG reconstruction but also significantly contributes to the stability and reliability of F0 extraction, paving the way for more precise speech analysis. For more in-depth technical details, you can refer to the original research paper.