TLDR: A new research paper introduces a contrastive learning method for unsupervised abnormal sound detection in machines. By augmenting high-frequency spectrum information, the model learns to focus on low-frequency normal operational patterns, outperforming existing methods on DCASE 2020 and DCASE 2022 datasets and demonstrating strong generalization capabilities.
Detecting abnormal sounds in machines is crucial for early fault diagnosis and maintaining industrial equipment. Traditional methods often struggle with challenges like data imbalance, the complexity of sound signals, and the generalization capability of models across different machine types and operating conditions.
A new research paper, “Contrastive Learning with Spectrum Information Augmentation in Abnormal Sound Detection,” introduces an innovative approach to tackle these issues. The authors, Xinxin Meng, Jiangtao Guo, Yunxiang Zhang, and Shun Huang, propose a data augmentation method specifically designed for high-frequency information within a contrastive learning framework. This method helps models focus on the low-frequency information, which typically represents the normal operational mode of a machine, while anomalous sounds and noise often manifest in higher frequencies.
The core idea stems from the observation that anomalies and noise in machine audio tend to appear predominantly in the high-frequency ranges of sound spectrograms. By making the high-frequency content differ between the augmented views used for contrastive learning, the method encourages the model to learn the stable, low-frequency patterns associated with normal machine operation. This is particularly effective in unsupervised anomalous sound detection, where labeled abnormal data is scarce.
The Proposed Method
The researchers developed a contrastive learning framework that generates, from a single input recording, two views that differ significantly in their high-frequency information. This is achieved through a series of transformations:
- Pre-Normalization: Standardizes the data to stabilize calculations.
- Mixup for High-Frequency Information: A “Log-mixup-exp” technique is applied to the audio features. It mixes in a small proportion of a randomly selected past input, creating contrast in the background sound so that the model learns representations of the foreground acoustic event that are invariant to it.
- Random Resize Crop (RRC): Approximates pitch changes and time extensions to help the model learn robust representations.
- Post-Normalization: Corrects any statistical drift introduced by the preceding augmentations, so that the final outputs again follow a standard normal distribution.
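The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `cutoff_bin` that separates low from high frequencies, the `max_alpha` mixing ratio, the crop scale range, the per-sample post-normalization, and the nearest-neighbour resize are all hypothetical choices made for the example. The input is assumed to be a log-mel spectrogram of shape (frequency bins, time frames).

```python
import numpy as np


def pre_normalize(x, mean, std):
    # Standardize with dataset-level statistics (assumed to be precomputed).
    return (x - mean) / std


def log_mixup_exp(x, z, alpha, cutoff_bin):
    # Mix a past sample z into x in the linear domain, then return to log.
    # Assumption: only bins at or above cutoff_bin (high frequencies) are mixed.
    out = x.copy()
    out[cutoff_bin:] = np.log(
        (1.0 - alpha) * np.exp(x[cutoff_bin:]) + alpha * np.exp(z[cutoff_bin:])
    )
    return out


def random_resize_crop(x, rng, scale=(0.6, 1.0)):
    # Crop a random frequency/time window and stretch it back to the
    # original shape (nearest-neighbour resize, chosen only for brevity).
    n_freq, n_time = x.shape
    h = max(1, int(round(n_freq * rng.uniform(*scale))))
    w = max(1, int(round(n_time * rng.uniform(*scale))))
    f0 = rng.integers(0, n_freq - h + 1)
    t0 = rng.integers(0, n_time - w + 1)
    crop = x[f0:f0 + h, t0:t0 + w]
    rows = np.arange(n_freq) * h // n_freq
    cols = np.arange(n_time) * w // n_time
    return crop[np.ix_(rows, cols)]


def post_normalize(x, eps=1e-8):
    # Re-standardize to undo any drift caused by the augmentations.
    return (x - x.mean()) / (x.std() + eps)


def make_view(x, z, rng, mean=0.0, std=1.0, cutoff_bin=32, max_alpha=0.4):
    # One augmented view: pre-norm -> high-frequency mixup -> RRC -> post-norm.
    alpha = rng.uniform(0.0, max_alpha)
    v = pre_normalize(x, mean, std)
    z = pre_normalize(z, mean, std)
    v = log_mixup_exp(v, z, alpha, cutoff_bin)
    v = random_resize_crop(v, rng)
    return post_normalize(v)
```

Calling `make_view` twice on the same `x` with different past samples `z` yields two views whose high-frequency content differs. Note that in this sketch the crop in the last augmentation step also shifts frequency content, so only `log_mixup_exp` on its own leaves the low-frequency bins untouched.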
These augmented samples are then used to construct positive and negative pairs for contrastive learning. Positive pairs consist of samples from different domains but the same machine class, while negative pairs combine an anchor sample with data from other classes. The goal is to minimize the distance between positive pairs and maximize the distance between negative pairs, effectively teaching the model to recognize the normal operational patterns.
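The article does not name the exact loss, so the following is only a sketch of one common way to realize this objective: a supervised-contrastive (InfoNCE-style) loss in which samples sharing a machine-class label are positives and all others are negatives. The function name `class_contrastive_loss` and the `temperature` value are assumptions for illustration, and the additional requirement that positives come from different domains is omitted for brevity.

```python
import numpy as np


def class_contrastive_loss(embeddings, labels, temperature=0.1):
    # Hypothetical supervised-contrastive loss; not taken from the paper.
    emb = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # Cosine similarity between every pair of L2-normalized embeddings.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T / temperature

    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # Positives: same machine class, excluding the anchor itself.
    pos = (labels[:, None] == labels[None, :]) & not_self

    # Log-softmax over all other samples (self-similarity is masked out).
    sim = np.where(not_self, sim, -np.inf)
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(
        np.exp(sim - sim_max).sum(axis=1, keepdims=True)
    )

    # Average log-probability of the positives, for anchors that have any.
    has_pos = pos.any(axis=1)
    pos_sum = np.where(pos, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_sum[has_pos] / pos.sum(axis=1)[has_pos]
    return float(per_anchor.mean())
```

The loss is small when same-class embeddings lie close together and far from other classes, and grows when the labels cut across the clusters. A batch in which no anchor has a positive yields an undefined mean, so a real training loop would need to guard against that case.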
Performance and Generalizability
The method was evaluated on two widely used datasets: DCASE 2020 Task 2 and DCASE 2022 Task 2. On the DCASE 2020 Task 2 evaluation dataset, it achieved an Area Under the Curve (AUC) of 93.83% and a partial AUC (pAUC) of 87.6%. The top-ranked system in the challenge had an AUC of 90.47% and a pAUC of 83.61%, so the proposed method improves on it by 3.36 and 3.99 percentage points, respectively.
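For readers unfamiliar with the two metrics, the sketch below computes them from anomaly scores. AUC is the fraction of (normal, anomalous) pairs in which the anomalous clip receives the higher score; pAUC restricts the normal clips to the highest-scoring fraction `p`, which emphasizes behaviour at low false-positive rates. The article does not state `p`; the default of 0.1 is the value used in the DCASE Task 2 descriptions. The function names are hypothetical, and official scoring scripts may compute pAUC differently (for example, as a standardized partial AUC), so the numbers here are not guaranteed to match theirs.

```python
import numpy as np


def auc(normal_scores, anomaly_scores):
    # Fraction of (normal, anomalous) pairs ranked correctly.
    neg = np.asarray(normal_scores, dtype=float)
    pos = np.asarray(anomaly_scores, dtype=float)
    return float((pos[None, :] > neg[:, None]).mean())


def partial_auc(normal_scores, anomaly_scores, p=0.1):
    # Same pairwise count, but only against the top-p fraction of
    # normal clips, i.e. the ones most likely to cause false alarms.
    neg = np.sort(np.asarray(normal_scores, dtype=float))[::-1]
    k = int(np.floor(p * len(neg)))
    if k < 1:
        raise ValueError("p * number of normal clips must be at least 1")
    return auc(neg[:k], anomaly_scores)
```

With ten normal scores from 0.1 to 1.0 and anomaly scores of 0.95 and 1.5, 19 of the 20 pairs are ordered correctly, while against the single hardest normal clip (score 1.0) only one of the two anomalies wins.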
Furthermore, the method demonstrated strong generalization capabilities on the DCASE 2022 Task 2 dataset, which focuses on domain generalization tasks where the test data belongs to unseen domains. The approach showed substantial improvements in both source and target domains, highlighting its ability to generalize across diverse acoustic environments and machine operating parameters.
This research marks a significant step forward in unsupervised abnormal sound detection for machines, offering a robust and generalizable solution for industrial monitoring. For more in-depth technical details, refer to the full paper.


