2020 ESA Annual Meeting (August 3 - 6)

COS 109 Abstract - Free, open-source machine learning classifiers for acoustic recognition of 500 common North American bird species

Tessa Rhinehart1, Samuel Lapp1, Barry E. Moore II2 and Justin A. Kitzes3, (1)Biological Sciences, University of Pittsburgh, Pittsburgh, PA, (2)Center for Research Computing, University of Pittsburgh, Pittsburgh, PA, (3)Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA
Background/Question/Methods

Autonomous acoustic recorders are an increasingly popular method for surveying sound-producing animals. Each minute of recording can require 2-3 minutes of human listening time to analyze, so large-scale studies require automated analysis methods. However, most existing automated methods are closed-source, available for only a small number of species, and limited by the scarcity of labeled training data. We therefore developed a method for creating machine learning models for large numbers of species despite limited training data. We demonstrated this pipeline by creating free, open-source machine learning classifiers for the vocalizations of 508 common North American bird species.

We tested several approaches, including creating a single classifier for all species and creating individual classifiers for each species. Each model was a convolutional neural network trained on simulated labeled acoustic data. Simulated data were created by overlaying and modifying recordings obtained from xeno-canto, an open database of bird sounds. Each model predicts whether a species is present or absent in a given acoustic recording.
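The overlay step described above can be illustrated with a minimal sketch. This is not the authors' actual pipeline; the function name, the signal-to-noise parameter, and the random time shift are illustrative assumptions about one common way to mix a focal vocalization into a background recording.

```python
import numpy as np

def simulate_training_clip(vocalization, background, rng, snr_db=10.0):
    """Overlay a focal vocalization onto a background clip at a target SNR.

    Both inputs are 1-D float arrays of equal length (mono audio at the
    same sample rate); snr_db controls how loud the vocalization is
    relative to the background.
    """
    sig_power = np.mean(vocalization ** 2)
    bg_power = np.mean(background ** 2)
    # Scale the vocalization to reach the requested signal-to-noise ratio.
    gain = np.sqrt(bg_power / sig_power * 10 ** (snr_db / 10))
    # Random circular shift so the call lands at a random time offset.
    shift = rng.integers(len(vocalization))
    mixed = background + gain * np.roll(vocalization, shift)
    # Normalize to avoid clipping when the mix is written back to audio.
    return mixed / np.max(np.abs(mixed))
```

Repeating this with varied backgrounds, offsets, and SNRs yields a labeled training set far larger than the original pool of source recordings.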

Results/Conclusions

The single- and multi-species models were assessed using two metrics, precision and recall. Precision is the percentage of the model’s “species present” predictions that were correct. Recall is the percentage of all true vocalizations that the model correctly identified. Increasing a classifier’s precision usually lowers its recall, and vice versa.
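The two metrics follow directly from counts of true positives, false positives, and false negatives. The small sketch below uses hypothetical counts, not values from this study:

```python
def precision_recall(tp, fp, fn):
    """Precision: fraction of 'present' predictions that are correct.
    Recall: fraction of true vocalizations the model identified.
    tp = true positives, fp = false positives, fn = false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical example: 30 correct detections, 10 false alarms,
# 60 missed vocalizations:
# precision = 30 / 40 = 0.75; recall = 30 / 90 ≈ 0.33
```

Raising the detection threshold trades false alarms (higher precision) for missed calls (lower recall), which is the trade-off described above.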

We assessed the performance of the single-species models (n=508) using a validation set of simulated recordings. Precision of these models was 76.2 ± 10.8% (mean ± SD) and recall was 34.4 ± 16.4% (mean ± SD). We assessed the performance of the multi-species model (n=1) on three non-simulated datasets: sparse field recordings from Ithaca, NY; high-quality single-species recordings; and dense field recordings from Rector, PA. We demonstrated the trade-off between precision and recall by comparing model performance in two scenarios. First, to survey common species, recall was set to 5% to reduce misidentifications; in this scenario, precisions for the three datasets were 36.2%, 89.6%, and 81.9%, respectively. Second, to survey rare species, precision was set to 5% to reduce the possibility of missing a rare vocalization; in this scenario, recalls were 14.5%, 55.6%, and 54.3%, respectively.
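Fixing an operating point like the ones above amounts to choosing a score threshold on validation data. The sketch below is an illustrative implementation of that idea, not the evaluation code used in the study; the function name and inputs are assumptions.

```python
import numpy as np

def precision_at_recall(scores, labels, target_recall):
    """Report precision at the strictest score cutoff whose recall meets
    target_recall. scores: model confidence per clip; labels: 1 if the
    species is truly present in that clip, else 0."""
    order = np.argsort(scores)[::-1]           # highest scores first
    labels_sorted = np.asarray(labels)[order]
    tp_cum = np.cumsum(labels_sorted)          # true positives at each cutoff
    n_pos = tp_cum[-1]                         # total true vocalizations
    recall = tp_cum / n_pos                    # nondecreasing in cutoff depth
    k = np.searchsorted(recall, target_recall) # first cutoff meeting target
    return tp_cum[k] / (k + 1)                 # precision at that cutoff
```

Sweeping `target_recall` from low to high traces out the full precision-recall trade-off for a dataset.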

We packaged the trained models in a portable container that can be run on any computer. Although demonstrated on birds, the data simulation and model training pipeline applies to a wide variety of sound-producing taxa, enabling large-scale automated surveys of these animals.