TLDR: Researchers have introduced DRONEAUDIOSET, a new 23.5-hour audio dataset designed to improve drone-based search and rescue by enabling better detection of human sounds amidst loud drone noise. The dataset features diverse recordings across various drone types, microphone setups, and environments, allowing for the development and testing of advanced noise suppression and sound classification technologies. Initial evaluations show that while current methods struggle in extreme noise, the dataset provides crucial insights for designing more effective drone audio systems.
Unmanned Aerial Vehicles, commonly known as drones, have become indispensable tools for search and rescue (SAR) missions, especially in challenging environments like collapsed buildings or disaster zones. Traditionally, these missions rely heavily on visual data, but this approach often fails in conditions with poor visibility, such as smoke, fog, or cluttered spaces. This is where audio perception comes in, offering a complementary way to detect human presence through sounds like speech, screams, cries, or even non-verbal cues like banging and footsteps.
However, mounting microphones on a drone introduces a major problem: the drone’s own intense noise, known as ego-noise, together with wind noise, can be loud enough to completely mask the faint sounds that indicate human presence. Existing audio datasets for drones are often limited in diversity or purely synthetic, so they don’t capture the complex acoustic interactions that occur in real recordings.
Introducing DRONEAUDIOSET
To address these critical limitations, a team of researchers from the National University of Singapore has introduced DRONEAUDIOSET, a comprehensive new audio dataset specifically designed for drone-based search and rescue. This dataset is a major step towards enabling the design and deployment of effective drone-audition systems.
DRONEAUDIOSET is an extensive collection, featuring 23.5 hours of carefully annotated recordings. It covers a wide spectrum of signal-to-noise ratios (SNRs), ranging from extremely low (-57.2 dB) to moderately low (-2.5 dB), reflecting the challenging conditions faced in real-world scenarios. The dataset incorporates various drone types, different throttle settings, multiple microphone configurations, and diverse indoor environments, providing a rich resource for researchers.
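To give a sense of scale for these figures, the short sketch below converts an SNR in decibels into a noise-to-signal power ratio. It assumes the usual power-based definition, SNR = 10·log10(P_signal / P_noise); the function name is hypothetical and is not taken from the paper.

```python
def noise_to_signal_power_ratio(snr_db: float) -> float:
    """Noise power divided by signal power, assuming SNR_dB = 10*log10(Ps/Pn)."""
    return 10.0 ** (-snr_db / 10.0)


# The two ends of the SNR range quoted for the dataset.
for snr_db in (-57.2, -2.5):
    ratio = noise_to_signal_power_ratio(snr_db)
    print(f"SNR {snr_db:6.1f} dB: noise power is about {ratio:,.1f}x the signal power")
```

Under that definition, the lowest SNR in the dataset corresponds to noise power several hundred thousand times the signal power, while the highest corresponds to noise a little under twice as powerful as the signal.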
How the Data Was Collected
The researchers employed a systematically controlled experimental setup. The drone was securely mounted on a fixed aluminum frame, mimicking a hovering drone in a static position. This setup allowed for consistent and repeatable conditions while capturing a wide range of audio samples. The data collection varied several key parameters:
- Drone Types: Two quadcopters of different sizes, a larger DJI F450 (Dlarge) and a smaller DJI F330 (Dsmall), were used to capture varied ego-noise profiles.
- Throttle Settings: Recordings were made at both ‘low’ and ‘high’ throttle speeds to simulate different operational modes.
- Microphone Configurations: A total of 17 microphones were deployed. This included two 8-channel circular arrays (Mup and Mdown, placed above and below the drone, respectively) and a central standalone microphone (Mcenter). These were positioned at 25 cm and 50 cm distances from the drone.
- Sound Sources: Three categories of sounds relevant to search and rescue were used: human vocal sounds (speech, screams, cries), human non-vocal sounds (door knocks, clapping, footsteps), and ambient non-human sounds (fire crackling, water dripping). These sounds were played through a speaker at different loudness levels (60 dB and 90 dB).
- Environments: Data was collected in three different indoor rooms – a small conference room and two large multi-purpose halls – to introduce diversity in reverberation and multi-path effects.
The dataset includes recordings where drone noise and source sounds were captured simultaneously, as well as separate recordings of drone-only noise and source-only sounds. This allows for detailed analysis and computation of signal-to-noise ratios.
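As a rough illustration of why the separate recordings matter, the sketch below estimates SNR from a source-only clip and a drone-only clip using mean power. It is a minimal example on synthetic stand-in signals; the function name, sample rate, and amplitudes are assumptions, and the paper may compute SNR differently.

```python
import numpy as np


def snr_db(source_only: np.ndarray, noise_only: np.ndarray) -> float:
    """Estimate SNR in dB from the mean power of two separately recorded clips."""
    signal_power = np.mean(np.square(source_only.astype(np.float64)))
    noise_power = np.mean(np.square(noise_only.astype(np.float64)))
    return float(10.0 * np.log10(signal_power / noise_power))


# Synthetic stand-ins (not dataset audio): a quiet tone and loud broadband noise.
rng = np.random.default_rng(0)
sample_rate = 16_000  # assumed sample rate, for illustration only
t = np.arange(sample_rate) / sample_rate
source = 0.01 * np.sin(2 * np.pi * 440 * t)
drone_noise = rng.normal(scale=1.0, size=sample_rate)

print(f"Estimated SNR: {snr_db(source, drone_noise):.1f} dB")
```

With real data, the two inputs would be a source-only recording and a drone-only recording captured by the same microphone in the same room.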
Key Findings and Challenges
The research paper also benchmarks state-of-the-art noise suppression and audio classification models using DRONEAUDIOSET. The evaluations revealed significant insights:
- Noise Suppression: Neural network-based noise suppression methods generally outperformed traditional techniques, especially in extremely noisy conditions (below -20 dB SNR). However, all methods struggled when the SNR dropped below -30 dB, highlighting the need for more advanced solutions. Human vocal sounds showed the most improvement after noise suppression, while non-vocal human sounds and non-human ambient sounds remained challenging.
- Sound Classification: After noise suppression, human vocal sounds were classified with much higher accuracy compared to non-vocal human sounds and non-human ambient sounds. Many non-vocal and non-human sounds were often misclassified as silence, indicating that improving noise suppression for these sound types is crucial for better detection.
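The article does not name the specific traditional methods that were benchmarked. As one generic example of a classical approach, the sketch below implements basic spectral subtraction with NumPy: it estimates an average noise magnitude spectrum from a noise-only clip and subtracts it from each frame of the noisy signal. The frame size, hop, and spectral floor are assumed values, and this is not the paper's implementation.

```python
import numpy as np


def spectral_subtraction(noisy, noise_only, frame=512, hop=256, floor=0.05):
    """Suppress stationary noise by subtracting an average noise magnitude spectrum."""
    # Periodic Hann window: at 50% overlap the shifted windows sum to 1.
    window = np.hanning(frame + 1)[:-1]

    def windowed_frames(x):
        count = 1 + (len(x) - frame) // hop
        idx = np.arange(frame)[None, :] + hop * np.arange(count)[:, None]
        return x[idx] * window

    # Average magnitude spectrum of the noise-only recording.
    noise_mag = np.abs(np.fft.rfft(windowed_frames(noise_only), axis=1)).mean(axis=0)

    spec = np.fft.rfft(windowed_frames(noisy), axis=1)
    mag = np.abs(spec)
    # Subtract the noise estimate, keeping a small floor to limit musical noise.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    # Reuse the noisy phase, as is common in magnitude-domain methods.
    frames = np.fft.irfft(clean_mag * np.exp(1j * np.angle(spec)), n=frame, axis=1)

    # Overlap-add; the first and last half-frames stay tapered by the window.
    out = np.zeros(len(noisy))
    for i, f in enumerate(frames):
        out[i * hop : i * hop + frame] += f
    return out
```

Methods of this kind assume the noise spectrum is roughly stationary and that the target is not buried far below it, which is consistent with the finding that performance degrades sharply at very low SNRs.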
Designing Better Drone Audio Systems
Based on the empirical analysis, the researchers derived several actionable recommendations for designing more effective drone-audition systems:
- Microphone Placement: Microphones placed above the drone generally performed better than those below, as they were less exposed to direct wind noise from the propellers. Increasing the distance between the microphone and the drone also improved performance.
- Microphone Arrays: While multi-channel microphone arrays offer advantages like beamforming (which can help focus on sounds from a specific direction), they also demand higher processing power. System designers must balance these trade-offs with mission requirements.
- Drone Throttle Adjustments: Operating the drone at lower throttle levels significantly improved acoustic performance. This suggests that drones could incorporate adaptive throttle reduction strategies during critical listening periods.
- Drone Size: Smaller drones generated less ego-noise and thus achieved better acoustic performance. However, larger drones can carry more advanced recording equipment, presenting a trade-off between payload capacity and acoustic clarity.
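To make the beamforming idea from the microphone-array recommendation concrete, here is a minimal delay-and-sum sketch for an 8-microphone circular array. It assumes a far-field source in the plane of the array, and the array radius, sample rate, and function names are illustrative assumptions rather than details of the DRONEAUDIOSET hardware.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def circular_array_positions(num_mics=8, radius=0.05):
    """(x, y) coordinates of mics evenly spaced on a circle (radius is assumed)."""
    angles = 2 * np.pi * np.arange(num_mics) / num_mics
    return radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)


def delay_and_sum(signals, positions, azimuth_rad, sample_rate):
    """Steer toward a far-field source at `azimuth_rad` and average the channels.

    signals has shape (num_mics, num_samples).
    """
    direction = np.array([np.cos(azimuth_rad), np.sin(azimuth_rad)])
    # A mic further along `direction` hears the wavefront earlier by (p . d) / c.
    advance = positions @ direction / SPEED_OF_SOUND
    num_samples = signals.shape[1]
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / sample_rate)
    spectra = np.fft.rfft(signals, axis=1)
    # Delay each channel by its advance so all channels line up with the array centre.
    aligned = spectra * np.exp(-2j * np.pi * freqs[None, :] * advance[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=num_samples)
```

The per-channel FFTs and phase shifts show where the extra processing cost comes from. In the idealized case where noise is uncorrelated across channels, averaging eight aligned channels reduces noise power by about a factor of eight, but drone ego-noise is typically correlated between microphones, so real gains are likely smaller.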
Looking Ahead
DRONEAUDIOSET opens up new research opportunities for developing next-generation noise suppression algorithms and audio classification models that can operate effectively in extreme low-SNR drone environments. The dataset’s controlled variations can help train models to adapt to different noise profiles and could also be useful for sound localization and speech recovery in mobile robotics.
The positive societal impact of this work is significant, promising improved search and rescue capabilities in disaster scenarios where visual systems are inadequate. However, the researchers also acknowledge potential negative applications, such as unauthorized surveillance, and recommend safeguards like access controls for sensitive data and clear ethical usage guidelines.
While the dataset represents a significant step forward, future work will aim to expand it to include outdoor recordings, capture the micro-dynamics of real hovering drones, and detect other emergency-relevant auditory cues like fire or structural collapse. For more detail, see the full research paper, "DRONEAUDIOSET: An Audio Dataset for Drone-based Search and Rescue."


