2020 ESA Annual Meeting (August 3 - 6)

PS 48 Abstract - Automatic detection of periodic vocalizations in audio recordings

Samuel Lapp, Biological Sciences, University of Pittsburgh, Pittsburgh, PA and Justin A. Kitzes, Energy and Resources Group, University of California, Berkeley, CA
Background/Question/Methods

Automated recognition of vocal and non-vocal sounds in audio recordings could greatly expand the temporal and spatial scales of monitoring biodiversity and phenology. With the development of scalable and open-source recording hardware, collecting large-scale audio datasets is increasingly feasible. Harnessing the power of these datasets requires automated species recognition technology, because the datasets are on the order of thousands to tens of thousands of hours of audio. Progress has been made in the automated recognition of bats and birds; however, far less effort has been devoted to other taxonomic groups. Some groups, such as frogs and insects, tend to produce periodic or quasi-periodic sounds consisting of simple, short units repeated several times. Existing identification methods for birds and bats have difficulty identifying vocalizations characterized by such simple, repeated units.

The goal of this research was to develop an automated method specifically for identifying repeating calls, such as those produced by many frogs and insects. We hypothesized that these calls could be characterized by a combination of their frequency bandwidth and interval of repetition. We examined several models intended to specifically leverage these two features to identify unknown species in audio recordings.

Results/Conclusions

The model we developed for detecting repeating vocalizations involves band-pass filtering an audio file to a specific frequency range and constructing a one-dimensional signal of the sound's amplitude. We then compute the power spectrum of this signal and score the file by the highest value of the spectrum within a specific range of repeat intervals. Unlike template matching, our method can handle variation in vocalizations and background noise, and unlike neural network approaches, it does not require large amounts of training data. To demonstrate the capabilities of the model, we tested its ability to detect the presence of a rodent with a slow repeating call and an anuran with a fast repeating call.
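The pipeline above (band-pass filter, amplitude envelope, power spectrum, peak score in an expected repeat-rate range) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name, filter order, frequency band, and synthetic test signal are all assumptions.

```python
# Sketch of the described scoring pipeline; parameters are hypothetical.
import numpy as np
from scipy.signal import butter, sosfilt

def pulse_rate_score(audio, sr, band=(2000, 4000), rate_range=(2.0, 10.0)):
    """Score a clip by the strongest periodicity of its amplitude envelope.

    1. Band-pass the audio to the species' frequency range.
    2. Build a one-dimensional amplitude signal (rectified envelope).
    3. Take the power spectrum of that envelope.
    4. Return the peak power within the expected repeat-rate range (Hz),
       along with the repeat rate at which that peak occurs.
    """
    sos = butter(4, band, btype="bandpass", fs=sr, output="sos")
    filtered = sosfilt(sos, audio)
    envelope = np.abs(filtered)             # one-dimensional amplitude signal
    envelope -= envelope.mean()             # remove DC so the peak reflects pulsing
    power = np.abs(np.fft.rfft(envelope)) ** 2
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / sr)
    in_range = (freqs >= rate_range[0]) & (freqs <= rate_range[1])
    peak_rate = freqs[in_range][np.argmax(power[in_range])]
    return power[in_range].max(), peak_rate

# Synthetic example: a 3 kHz tone pulsed 5 times per second, plus noise.
rng = np.random.default_rng(0)
sr = 22050
t = np.linspace(0, 4, 4 * sr, endpoint=False)
pulses = (np.sin(2 * np.pi * 5 * t) > 0.9).astype(float)  # short bursts at 5 Hz
audio = pulses * np.sin(2 * np.pi * 3000 * t) + 0.1 * rng.standard_normal(len(t))
score, rate = pulse_rate_score(audio, sr)
print(f"detected repeat rate: {rate:.2f} Hz")
```

Because the score is computed from the envelope's spectrum rather than from a fixed template, moderate variation in the individual pulses or in background noise shifts the peak only slightly, which matches the robustness argument made above.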

In preliminary results, our model successfully identified the presence of Amazon Bamboo Rats (Dactylomys dactylinus) in unlabeled field recordings. Human listeners confirmed that in a sample of top-scoring results, 71% contained vocalizations of Amazon Bamboo Rats. The model was also able to detect the Boreal Chorus Frog (Pseudacris maculata) in unlabeled field recordings from the American Prairie Reserve. Human listeners confirmed that 70% of the top-scoring calls from a sample of the field data contained vocalizations of the Boreal Chorus Frog.