SingMOS-Pro: A New Benchmark Dataset for Assessing Singing Voice Quality

TLDR: SingMOS-Pro is a new, comprehensive dataset designed for automatic singing quality assessment. It expands on previous work by offering detailed annotations for lyrics, melody, and overall singing quality across 7,981 clips generated by various models. The dataset aims to address the challenges of evaluating singing quality, which traditionally relies on costly human subjective assessments or limited objective metrics. It provides a robust benchmark for developing and testing new models, highlighting the need for specialized approaches beyond speech quality assessment and suggesting future directions for integrating melodic and lyrical information.

Evaluating the quality of generated singing voices has long been a complex challenge in the rapidly advancing field of singing voice generation. While human listening tests are considered the ‘gold standard,’ they are often expensive and time-consuming. Existing objective metrics, on the other hand, frequently fail to capture the nuanced aspects of perceived singing quality. This gap has highlighted a critical need for more efficient, reliable, and universal methods for assessing singing quality.

Addressing this challenge, a new research paper introduces SingMOS-Pro, a dataset designed to support automatic singing quality assessment (SQA). Its predecessor, SingMOS, offered only overall quality ratings. SingMOS-Pro adds annotations for lyrics and melody alongside overall quality, providing a broader and more diverse evaluation framework.

SingMOS-Pro comprises 7,981 singing clips. These clips were generated by 41 different models across 12 datasets, covering a wide spectrum of singing voice generation technologies, from earlier systems to the latest advancements. To support reliability and consistency, each clip in the dataset has received at least five ratings from professional annotators.
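As a rough illustration of how per-clip ratings become a Mean Opinion Score, the sketch below averages the scores for each clip and keeps only clips that meet the five-rating minimum mentioned above. The `(clip_id, score)` input layout, the example scores, and the function name `clip_mos` are assumptions made for illustration, not the dataset's actual schema.

```python
from collections import defaultdict
from statistics import mean

# The article states that each clip has at least five ratings.
MIN_RATINGS = 5

def clip_mos(ratings):
    """Average the scores per clip.

    ratings: iterable of (clip_id, score) pairs (hypothetical layout).
    Returns {clip_id: mean score} for clips with enough ratings.
    """
    by_clip = defaultdict(list)
    for clip_id, score in ratings:
        by_clip[clip_id].append(score)
    return {
        clip_id: mean(scores)
        for clip_id, scores in by_clip.items()
        if len(scores) >= MIN_RATINGS
    }

# Made-up scores: clip_a has five ratings, clip_b only two.
example = [("clip_a", s) for s in (4, 5, 4, 3, 4)]
example += [("clip_b", s) for s in (2, 3)]
mos = clip_mos(example)
```

Here `clip_b` is dropped because it has fewer than five ratings. The same averaging could be applied separately to the overall, lyrics, and melody scores.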

The researchers behind SingMOS-Pro have also explored effective strategies for utilizing Mean Opinion Score (MOS) data annotated under varying standards. They benchmarked several widely used evaluation methods from related tasks on SingMOS-Pro, establishing robust baselines and practical references for future research in this domain. The dataset itself is publicly accessible, providing a valuable tool for the community. You can find more details about this work in the research paper.

The dataset is the first multilingual and multi-task-focused MOS dataset for singing quality assessment. It includes samples from singing voice synthesis (SVS), singing voice conversion (SVC), singing voice resynthesis (SVR), and ground-truth recordings. The clips are annotated along three dimensions: overall quality, lyrics clarity, and melody naturalness. This fine-grained annotation allows for a more comprehensive understanding of singing performance.

The annotation process involved 78 experienced annotators who conducted evaluations online in quiet environments. To maintain quality, each batch of evaluations included ‘trap clips’ (noise or silence) and ‘golden clips’ (carefully selected high-quality samples). If an annotator’s ratings on these control clips fell outside acceptable parameters, their entire batch of annotations was re-evaluated.
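The control-clip check described above can be sketched roughly as follows. The article does not give the acceptable score ranges, so the thresholds `trap_max` and `golden_min`, the data layout, and both function names are hypothetical.

```python
def batch_passes(batch, trap_ids, golden_ids, trap_max=2.0, golden_min=4.0):
    """Check one annotator's batch against the control clips.

    batch: {clip_id: score} for a single annotator (hypothetical layout).
    trap_max / golden_min: assumed thresholds, not taken from the paper.
    """
    for clip_id, score in batch.items():
        # Trap clips (noise or silence) should receive low scores.
        if clip_id in trap_ids and score > trap_max:
            return False
        # Golden clips (high-quality samples) should receive high scores.
        if clip_id in golden_ids and score < golden_min:
            return False
    return True

def batches_to_recheck(batches, trap_ids, golden_ids):
    """Return the annotators whose batches should be re-evaluated."""
    return [
        annotator
        for annotator, batch in batches.items()
        if not batch_passes(batch, trap_ids, golden_ids)
    ]

# Made-up example: ann_2 rates a trap clip too highly.
batches = {
    "ann_1": {"trap_1": 1, "gold_1": 5, "clip_x": 3},
    "ann_2": {"trap_1": 4, "gold_1": 5, "clip_x": 3},
}
flagged = batches_to_recheck(batches, {"trap_1"}, {"gold_1"})
```

Flagging the whole batch rather than individual ratings mirrors the article's description that a failed control check triggers re-evaluation of the entire batch.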

Experiments conducted using SingMOS-Pro revealed interesting insights into model performance. Speech MOS models, such as UTMOS and DNSMOS, performed poorly on singing tasks, underscoring the significant domain gap between speech and singing. While the original SingMOS model showed strong performance on in-domain data, it struggled with out-of-domain samples, indicating a need for broader data coverage to prevent overfitting. Models like SHEET-ssqa, which integrate additional speech MOS data, demonstrated an ability to mitigate this overfitting, suggesting that combining speech and singing data could be a promising direction.
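To make the in-domain versus out-of-domain comparison concrete, the sketch below computes a Spearman rank correlation between predicted and human MOS. The article does not name the metrics used in the benchmark; rank correlation is assumed here because it is a common choice in MOS prediction work. All function names and example numbers are illustrative.

```python
from statistics import mean

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up scores: a predictor that orders clips exactly like the listeners.
human = [2.1, 3.4, 3.9, 4.6]
predicted = [1.8, 2.9, 3.1, 4.0]
rho = spearman(human, predicted)
```

Computing the same statistic separately on in-domain and out-of-domain test clips would expose the kind of gap the article describes for the original SingMOS model.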

The research also explored the integration of pitch information, using methods like MIDI pitch and pitch histograms. While these approaches showed marginal improvements over a plain self-supervised learning baseline, the findings highlight the ongoing need for more effective ways to incorporate melodic cues into singing quality assessment models. Future work will also focus on leveraging the detailed melody and lyric scores provided by SingMOS-Pro to further enhance automatic SQA.
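For readers unfamiliar with pitch histograms, here is a minimal sketch of one way to build such a feature from a fundamental-frequency (F0) track. The conversion from Hz to MIDI note number is the standard formula; the note range, the rounding to whole semitones, and the function names are assumptions, and the paper's actual feature extraction may differ.

```python
import math

def hz_to_midi(f0_hz):
    """Standard conversion: A4 = 440 Hz = MIDI note 69."""
    return 69 + 12 * math.log2(f0_hz / 440.0)

def pitch_histogram(f0_track, low=36, high=84):
    """Normalized histogram over MIDI notes low..high.

    f0_track: per-frame F0 values in Hz, with 0 marking unvoiced frames
    (an assumed convention). low/high: assumed note range.
    """
    counts = [0] * (high - low + 1)
    for f0 in f0_track:
        if f0 <= 0:
            continue  # skip unvoiced frames
        note = round(hz_to_midi(f0))
        if low <= note <= high:
            counts[note - low] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Made-up track: two frames at A4, one unvoiced, one near C4, one at A5.
hist = pitch_histogram([440.0, 440.0, 0.0, 261.63, 880.0])
```

A vector like `hist` could then be concatenated with self-supervised audio features before the score predictor, which is one plausible reading of "integrating pitch information".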

In conclusion, SingMOS-Pro represents a significant step forward in automatic singing quality assessment. By offering a reliable, multilingual, multi-task, and fine-grained dataset, it provides resources and benchmarks that should help accelerate the development of more effective and robust SQA models, and in turn support higher-quality singing voice generation.
