2020 ESA Annual Meeting (August 3 - 6)

COS 125 Abstract - Automatic detection for passive acoustic monitoring of the African elephant

Jonathan Michael Gomes Selman1, Nikita N. Demir2, Andreas Paepcke2 and Peter Wrege3, (1)Stanford University, (2)Computer Science, Stanford University, Stanford, CA, (3)Cornell Lab of Ornithology, Cornell University, Ithaca, NY
Background/Question/Methods

Poaching, illegal logging, and other human activities present great threats to wildlife, demanding increased conservation and monitoring efforts. For species that inhabit vast territories or remote areas, monitoring presents a great challenge. One promising solution is Passive Acoustic Monitoring, which involves placing autonomous recording devices within a habitat to passively record the vocalizations of species. We focus on the African elephant, whose population decreased by 60% over the last century. The Cornell Lab of Ornithology has collected extensive acoustic data with the goal of better understanding population distributions, behaviors, and threats to survival. The data comprise continuous 24-hour logs, resulting in a massive amount of audio in which elephant calls are intermixed with many other events. Volunteers have identified elephant rumbles in a small subset of the data, but manually sifting through the entire trove is infeasible. Can we effectively automate elephant call detection?

We employ state-of-the-art deep learning and signal-processing techniques to automate the identification of elephant calls. In addressing this problem, we also explore new methods for tackling the heavy imbalance between the number of audio examples containing elephant calls and those containing unrelated sounds, a challenge prevalent in passive acoustic monitoring.

Results/Conclusions

Our proposed convolutional bidirectional long short-term memory (LSTM) recurrent neural network (RNN) model improves on the state of the art for detecting elephant calls. Specifically, our model cuts the amount of data that human annotators need to inspect by 98%, achieving a recall of 96% at a precision of 21%. Although a 21% precision means roughly 80% of flagged detections are false positives, the model is still extremely useful to conservationists, significantly reducing expensive and time-intensive human labelling.
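The model family named above, a convolutional layer feeding a bidirectional LSTM that emits per-frame detections over a spectrogram window, can be sketched as below. All layer sizes, kernel widths, and the spectrogram dimensions are illustrative assumptions, not the authors' published configuration.

```python
# Sketch of a convolutional bidirectional LSTM for per-frame elephant-call
# detection on spectrogram windows. Hyperparameters here are assumptions.
import torch
import torch.nn as nn

class ConvBiLSTMDetector(nn.Module):
    def __init__(self, n_freq=77, hidden=128):
        super().__init__()
        # Convolution over the spectrogram extracts local time-frequency
        # features before the recurrent layer models longer-range context.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),  # pool the frequency axis only
        )
        self.lstm = nn.LSTM(
            input_size=16 * (n_freq // 2),
            hidden_size=hidden,
            bidirectional=True,
            batch_first=True,
        )
        # One logit per time frame: is an elephant rumble present?
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, spec):                  # spec: (batch, freq, time)
        x = self.conv(spec.unsqueeze(1))      # -> (batch, 16, freq//2, time)
        x = x.flatten(1, 2).transpose(1, 2)   # -> (batch, time, features)
        out, _ = self.lstm(x)                 # -> (batch, time, 2*hidden)
        return self.head(out).squeeze(-1)     # per-frame logits (batch, time)

model = ConvBiLSTMDetector()
logits = model(torch.randn(4, 77, 256))      # 4 windows, 256 frames each
print(tuple(logits.shape))                   # (4, 256)
```

Frame-level logits can then be thresholded and merged into candidate call segments for human review.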

To further improve our results and reduce the human supervision needed to eliminate false positives, we focus primarily on the large-scale data imbalance. We employ a digital signal-preprocessing step to suppress background noise, simplifying the learning task for our model. We then explore two methods for addressing the imbalance itself. The first is an adapted loss function that dynamically up-weights training examples that challenge the classifier. The second is a novel two-stage approach that judiciously surfaces challenging negative samples for learning while maintaining a balanced training regime, so as not to bias classification results. Overall, our approach significantly improves the automation of elephant call detection.
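One standard way to "dynamically weigh training examples that challenge the classifier" is a focal-loss-style modulation; the sketch below illustrates the idea in NumPy. The focal form and the gamma value are assumptions for illustration, not confirmed details of the authors' adapted loss.

```python
# Minimal NumPy sketch of focal-loss-style weighting: the loss of easy,
# confidently classified examples is shrunk, so rare positives and hard
# negatives dominate the gradient. Gamma=2.0 is an assumed setting.
import numpy as np

def focal_loss(p, y, gamma=2.0, eps=1e-7):
    """Binary focal loss. p: predicted P(elephant call), y: 0/1 label."""
    p = np.clip(p, eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)          # probability of the true class
    # (1 - p_t)**gamma down-weights well-classified examples.
    return -((1 - p_t) ** gamma) * np.log(p_t)

# An easy negative (p=0.05, y=0) contributes far less than a hard
# negative the model confidently gets wrong (p=0.9, y=0).
easy = focal_loss(np.array([0.05]), np.array([0]))[0]
hard = focal_loss(np.array([0.9]), np.array([0]))[0]
print(easy, hard)  # hard is several orders of magnitude larger
```

Under this weighting, the abundant background-noise negatives that the classifier already handles contribute almost nothing, which is the same goal the two-stage hard-negative selection pursues by explicit sampling.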