New AI Model QAMO Enhances Deepfake Speech Detection by Understanding Speech Quality

TLDR: QAMO (Quality-Aware Multi-Centroid One-Class Learning) is a new framework for detecting speech deepfakes. Unlike conventional one-class methods that represent genuine speech with a single centroid, QAMO employs multiple ‘quality-aware’ centroids, each representing a distinct speech quality level (e.g., high or low quality). This lets the system model the natural variation within real speech and separate it more effectively from deepfakes, including unseen ones. QAMO also uses an ensemble scoring strategy that needs no quality labels during inference, and the researchers report better performance and robustness across various datasets.

In an era where artificial intelligence can generate highly realistic speech, distinguishing genuine human speech from sophisticated deepfake audio has become a critical challenge. Traditional detectors often struggle with new, unseen attacks because they are trained as binary classifiers on known real and fake speech. Such models can become too specialized to the attacks seen in training and less effective against novel deepfake techniques.

A promising alternative is one-class learning, which focuses solely on understanding the characteristics of real, or “bona fide,” speech. Instead of learning what fake speech looks like, it builds a compact model of genuine speech, flagging anything that deviates significantly as potentially fake. While effective, conventional one-class learning often simplifies the diverse nature of human speech by representing it with a single central point, or centroid. This single-centroid approach can overlook important nuances, such as variations in speech quality.
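The single-centroid idea can be sketched in a few lines of Python. This is an illustration only, not the researchers' code: the two-dimensional embeddings, the plain mean used as the centroid, cosine similarity as the measure of closeness, and the 0.8 threshold are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_vector(vectors):
    """Element-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Toy embeddings of genuine speech (made-up values for illustration).
bona_fide_embeddings = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]

# One centroid stands in for all genuine speech.
centroid = mean_vector(bona_fide_embeddings)

def is_bona_fide(embedding, threshold=0.8):
    """Accept a sample only if it lies close enough to the single centroid."""
    return cosine(embedding, centroid) >= threshold
```

Because every genuine sample is compared against the same point, genuine speech that differs from the average, for example a noisy recording, risks being pushed toward the rejection side.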

Researchers from Nanyang Technological University, National University of Singapore, and The Hong Kong Polytechnic University have introduced a novel framework called QAMO: Quality-Aware Multi-Centroid One-Class Learning for speech deepfake detection. QAMO addresses the limitations of single-centroid models by introducing multiple centroids, each specifically designed to represent different levels of speech quality. This allows the system to better capture the natural variability within genuine speech, acknowledging that real speech can exist across a spectrum of qualities, from high-fidelity recordings to lower-quality audio.

The core idea behind QAMO is to assign a discrete quality level (e.g., high or low quality) to each genuine speech sample during training, based on its Mean Opinion Score (MOS). These MOS values, which reflect perceived speech quality, are obtained using existing speech quality assessment models. Each centroid in QAMO is then optimized to represent a distinct quality subspace. This explicit encoding of quality information helps the model preserve intra-class variability – the natural differences within genuine speech – while still maintaining a clear distinction from deepfake audio.
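A rough sketch of the quality-labelling step might look like the following. The article does not give the MOS cut-offs, so the 3.5 boundary is a hypothetical value, and the function names are invented for the example. The centroids here are simple per-group means, standing in for the centroids that QAMO optimizes during training.

```python
from collections import defaultdict

def quality_level(mos, boundaries=(3.5,)):
    """Map a MOS value to a discrete level (0 = lowest quality).

    The default boundary of 3.5 is an assumed value, not one from the paper.
    """
    return sum(1 for b in boundaries if mos >= b)

def centroids_by_quality(samples, boundaries=(3.5,)):
    """Group (embedding, mos) pairs by quality level and average each group."""
    groups = defaultdict(list)
    for embedding, mos in samples:
        groups[quality_level(mos, boundaries)].append(embedding)
    return {
        level: [sum(col) / len(vectors) for col in zip(*vectors)]
        for level, vectors in groups.items()
    }

# Toy genuine-speech samples: (embedding, predicted MOS), all made up.
samples = [([1.0, 0.0], 4.5), ([0.8, 0.2], 4.0), ([0.0, 1.0], 2.0)]
centroids = centroids_by_quality(samples)
```

With a single boundary the samples split into two groups, matching the high and low quality example in the text; passing more boundaries would yield more levels.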

A significant advantage of QAMO is its multi-centroid ensemble scoring strategy during inference. Unlike methods that need to know the quality of an incoming speech sample, QAMO can operate without explicit quality labels. It computes a final detection score by averaging the similarities across all its quality-aware centroids. This ensemble approach has been shown to stabilize decision boundaries and improve the robustness of detection, making it more practical for real-world deployment, where obtaining a quality label for every incoming audio sample might be computationally expensive.
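The averaging step described above can be illustrated as follows. This is a minimal sketch under stated assumptions: cosine similarity is used as the similarity measure, and the centroid values are invented for the example rather than taken from a trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ensemble_score(embedding, centroids):
    """Average the similarity to every quality-aware centroid.

    No quality label for the incoming sample is needed.
    """
    similarities = [cosine(embedding, c) for c in centroids]
    return sum(similarities) / len(similarities)

# Hypothetical centroids for the high- and low-quality subspaces.
quality_centroids = [[1.0, 0.0], [0.8, 0.6]]

genuine_like = ensemble_score([1.0, 0.0], quality_centroids)
spoof_like = ensemble_score([-1.0, 0.0], quality_centroids)
```

A higher score means the sample sits closer to the genuine-speech centroids on average; a threshold on this score would then give the accept or reject decision.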

Extensive experiments demonstrated QAMO’s effectiveness. When integrated with advanced speech processing backbones like XLSR-Conformer-TCM, QAMO achieved an Equal Error Rate (EER) of 5.09% on the challenging In-the-Wild dataset, outperforming previous one-class and quality-aware systems. This indicates strong generalization to unseen deepfake attacks and diverse acoustic conditions. The research highlights that explicitly modeling speech quality within a multi-centroid one-class learning framework significantly enhances the robustness and performance of speech deepfake detection systems. More details about this approach are available in the full research paper.
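For readers unfamiliar with the metric, the Equal Error Rate is the operating point at which the share of fake samples accepted equals the share of genuine samples rejected. The sketch below shows one simple way to estimate it from two lists of scores. The scores are invented for illustration, and evaluation toolkits typically interpolate between thresholds rather than picking the closest one as done here.

```python
def equal_error_rate(bona_fide_scores, spoof_scores):
    """Estimate the EER by sweeping every observed score as a threshold.

    Higher scores are assumed to mean "more likely genuine".
    """
    thresholds = sorted(set(bona_fide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: genuine samples scoring below the threshold.
        frr = sum(s < t for s in bona_fide_scores) / len(bona_fide_scores)
        # False acceptance: fake samples scoring at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Made-up detection scores, not results from the paper.
toy_eer = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6])
```

In this toy case one of four genuine samples and one of four fake samples fall on the wrong side of the best threshold, so the estimate is 25%. Lower is better, and a detector that separates the two groups perfectly scores 0%.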
