TLDR: SonicMaster is the first unified, text-guided AI model for music restoration and mastering. It uses a generative flow-matching approach trained on a new dataset of paired degraded and clean music with text prompts. It can fix issues such as reverb, distortion, and tonal imbalance in a single pass, either automatically or via natural language commands, and it improves both objective audio quality metrics and listener preference ratings.
In the world of music production, achieving pristine audio quality can be a significant challenge, especially for recordings made outside professional studios. Many music tracks suffer from common issues such as excessive reverberation, distortion, clipping, imbalanced tones, or a narrowed stereo image. Traditionally, correcting these problems has involved using multiple specialized tools and extensive manual adjustments, a process that is both labor-intensive and requires expert skill.
Enter SonicMaster, a groundbreaking development introduced by Jan Melechovsky, Ambuj Mehrish, and Dorien Herremans from the Singapore University of Technology and Design. This innovative system is the first unified generative model designed for comprehensive music restoration and mastering. It addresses a wide array of audio imperfections within a single framework, offering a streamlined solution for creators.
What makes SonicMaster truly unique is its text-based control. Users can provide natural language instructions, such as “reduce the hollow room sound” or “increase the brightness,” to guide the model in applying targeted enhancements. For those who prefer a hands-off approach, SonicMaster also features an automatic mode for general restoration, leveraging learned perceptual heuristics to produce a balanced master.
How SonicMaster Works
At its core, SonicMaster operates as a single flow-based generative framework. Unlike conventional methods that treat audio artifacts in isolation, this model simultaneously performs several crucial tasks: dereverberation (reducing excess room reverberation), equalization (balancing frequencies), declipping (reconstructing saturated peaks), dynamic-range expansion (restoring volume contrast), and stereo enhancement (widening the soundstage). This integrated approach replaces a chain of separate, potentially error-prone modules with a single pass.
To train this sophisticated model, the researchers constructed the SonicMaster dataset, a large collection of paired degraded and high-quality music tracks. This dataset was created by simulating common audio degradations using nineteen distinct degradation functions, categorized into five main groups: equalization, dynamics, reverb, amplitude, and stereo. Each degraded sample is accompanied by a natural-language prompt describing the specific artifact or the required fix, enabling the model to learn from diverse scenarios.
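As a rough illustration of how such paired training examples might be built, the sketch below applies two simple degradations (hard clipping from the amplitude group and stereo narrowing from the stereo group) to a clean signal and attaches a prompt. The function names, parameter values, and prompt wording are hypothetical and are not taken from the SonicMaster dataset code.

```python
import math

def hard_clip(samples, threshold=0.5):
    """Clamp every sample to [-threshold, threshold], simulating clipping."""
    return [max(-threshold, min(threshold, s)) for s in samples]

def narrow_stereo(left, right, width=0.0):
    """Scale the side (L-R) signal; width=0 collapses to mono, width=1 is unchanged."""
    out_l, out_r = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r)
        side = 0.5 * (l - r) * width
        out_l.append(mid + side)
        out_r.append(mid - side)
    return out_l, out_r

def make_pair(left, right):
    """Return one hypothetical training example: degraded audio, clean audio, prompt."""
    deg_l, deg_r = narrow_stereo(hard_clip(left), hard_clip(right), width=0.2)
    return {
        "degraded": (deg_l, deg_r),
        "clean": (left, right),
        "prompt": "Remove the clipping and widen the stereo image.",
    }

# A one-second 440 Hz tone at 8 kHz with slightly different channel levels
# stands in for a clean track (an assumption made only for this demo).
sr = 8000
clean_l = [0.9 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
clean_r = [0.6 * math.sin(2 * math.pi * 440 * n / sr + 0.3) for n in range(sr)]
example = make_pair(clean_l, clean_r)
```

Each dictionary holds the three items the article describes: the degraded audio, the clean target, and a text instruction naming the fix.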
The model leverages a flow-matching generative training paradigm. In simple terms, it learns an audio transformation that maps degraded inputs directly to their cleaned, mastered versions, guided by the text prompts. The audio is first encoded into a compact latent representation using a VAE codec, and text instructions are embedded using a FLAN-T5 encoder, allowing the restoration to occur efficiently in this learned space.
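The flow-matching idea can be sketched in a few lines. The example below assumes the common linear-path (rectified-flow) formulation, in which a point on the straight line between a degraded latent and its clean counterpart is paired with a constant target velocity. The article does not spell out SonicMaster's exact formulation, so the path, the loss, the toy vectors, and all function names here are illustrative assumptions. In a real system, a trained network conditioned on the text embedding would replace the oracle velocity used at the end.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from x0 (degraded latent) to x1 (clean latent)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight path; it is constant in t."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(predict, x0, x1, t):
    """Mean squared error between predicted and target velocity at time t."""
    xt = interpolate(x0, x1, t)
    v_pred = predict(xt, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)

def euler_sample(predict, x0, steps=10):
    """Integrate dx/dt = predict(x, t) from t=0 to t=1 with fixed Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = predict(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy 4-dimensional "latents" standing in for VAE-encoded audio (assumption).
random.seed(0)
degraded = [random.gauss(0.0, 1.0) for _ in range(4)]
clean = [random.gauss(0.0, 1.0) for _ in range(4)]

# Oracle that returns the true velocity; a trained model would take its place.
oracle = lambda x, t: target_velocity(degraded, clean)

loss = flow_matching_loss(oracle, degraded, clean, t=random.random())
restored = euler_sample(oracle, degraded)
```

With the oracle velocity the loss is zero and Euler integration lands on the clean latent. Training a real model amounts to driving the same loss down for a network that sees only the interpolated latent, the time step, and the conditioning.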
Performance and Impact
Extensive evaluations demonstrate SonicMaster’s effectiveness. Objective audio quality metrics show significant improvements across all artifact categories when compared to original degraded inputs and other baseline methods. For instance, it notably outperforms baselines like Text2FX for equalization and WPE/HPSS for dereverberation.
Beyond technical metrics, subjective listening tests confirm that human listeners consistently prefer SonicMaster’s enhanced outputs over the original degraded audio. Listeners rated the model highly for text relevance, quality improvement, and consistency, and noted especially large perceptual improvements in declipping, volume increase, and dereverberation. This highlights the practical effectiveness and user appeal of the unified approach.
SonicMaster represents a significant leap forward in music restoration and mastering, offering an accessible and powerful tool for both amateur and professional creators. By unifying complex audio enhancement tasks under a single, text-guided generative model, it promises to make high-quality audio production more attainable for everyone. For more technical details, refer to the full research paper.


