PS 46-94 - OpenSoundscape: Machine learning for scalable acoustic surveys

Wednesday, August 14, 2019
Exhibit Hall, Kentucky International Convention Center
Tessa Rhinehart, Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, Barry E. Moore II, Center for Research Computing, University of Pittsburgh, Pittsburgh, PA, and Justin A. Kitzes, Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA
Background/Question/Methods

Large-scale biodiversity surveys are necessary to predict and mitigate biodiversity loss as habitat conversion and climate change alter landscapes. However, systematic, targeted surveys at the scale needed to make these predictions are difficult for human observers to perform.

Autonomous acoustic recorders are a promising tool for generating ecological occurrence data at the necessary scales. Many taxa of conservation concern can be surveyed acoustically, including birds, frogs, bats, and insects. To realize the potential of large-scale acoustic surveys, however, recordings must be rapidly assessed by software that detects vocalizations and classifies them to species. While some acoustic classification software exists, none is both open source and scalable to large volumes of data.

We set out to create the first open-source, scalable acoustic classification software for ecology and conservation: software that, running in a cluster, cloud, or supercomputing environment, can identify the presence of several hundred species in up to 100,000 hours of recordings each month.

Results/Conclusions

We created OpenSoundscape, a sound classification platform written in Python. OpenSoundscape trains machine learning models that detect the presence or absence of a single species; a suite of such models can be combined for complete soundscape classification.
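As a minimal sketch of this “suite of models” design (an illustration, not OpenSoundscape’s actual interface), each species gets its own fitted binary classifier, and a soundscape is classified by applying all of them:

    # Illustrative sketch only; this is not OpenSoundscape's published API.
    # One binary presence/absence model per species, applied jointly.
    def classify_soundscape(models, features):
        # models:   dict mapping species name -> fitted binary classifier
        #           (e.g., scikit-learn estimators with a .predict method)
        # features: 2-D array-like, one feature row per recording
        # returns:  dict mapping species name -> 0/1 presence predictions
        return {species: model.predict(features)
                for species, model in models.items()}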

The classification algorithm uses ideal “templates” of species’ sounds and labeled “training files”: soundscape recordings in which an annotator has marked the focal species as present or absent. Each template’s spectrogram is cross-correlated against each training file’s spectrogram, and the resulting cross-correlation scores are used as features to train a random forest model. The trained model can then predict species presence in new soundscape files (a schematic sketch of this pipeline is given below).

In our benchmarking tests, OpenSoundscape generated spectrograms at a rate of 23.4 hours of audio per core-hour. During model training, 8.6 hours of training audio were processed per core-hour, and trained models predict species presence in new soundscapes at approximately the same rate. At roughly 8.6 audio-hours per core-hour, classifying 100,000 hours of recordings therefore requires on the order of 11,600 core-hours, equivalent to about 16 cores running continuously for a month. Users can share trained models via OpenSoundscape’s “import/export” feature. Our next goals are to optimize OpenSoundscape for speed, release trained models for 600 of the most common bird species in the USA and Canada, and triangulate sounds to identify individual vocalizing organisms.
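The sketch below illustrates the described pipeline using generic tools (scipy and scikit-learn) rather than OpenSoundscape’s actual API; all function names and parameters here are hypothetical, and templates are assumed to be shorter than the training files:

    import numpy as np
    from scipy import signal
    from sklearn.ensemble import RandomForestClassifier

    def spectrogram(audio, sample_rate, nperseg=512):
        # Magnitude spectrogram of a 1-D audio array.
        _, _, spec = signal.spectrogram(audio, fs=sample_rate, nperseg=nperseg)
        return spec

    def template_features(file_spec, template_specs):
        # Peak 2-D cross-correlation of each template against one file's
        # spectrogram, scaled by the spectrogram magnitudes; one feature
        # per template.
        feats = []
        for tmpl in template_specs:
            corr = signal.correlate(file_spec, tmpl, mode="valid")
            feats.append(corr.max() /
                         (np.linalg.norm(file_spec) * np.linalg.norm(tmpl)))
        return feats

    def train_single_species_model(train_audio, labels, template_audio, sr=22050):
        # train_audio / template_audio: lists of 1-D numpy arrays at rate sr;
        # labels: 0/1 presence of the focal species in each training file.
        template_specs = [spectrogram(a, sr) for a in template_audio]
        X = [template_features(spectrogram(a, sr), template_specs)
             for a in train_audio]
        model = RandomForestClassifier(n_estimators=100)
        model.fit(X, labels)
        return model  # model.predict(new_X) estimates presence in new files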

OpenSoundscape is available at https://github.com/kitzeslab/opensoundscape.