2021 ESA Annual Meeting (August 2 - 6)

Developing and evaluating a robust image recognition system based on deep learning for monitoring wildlife ecology in Japanese forests

On Demand
Yusuke Uehara, Kanazawa Institute of Technology
Background/Question/Methods

Wildlife camera images are useful for monitoring wildlife ecology without frequent habitat visits. However, ecologists spend a great deal of time recognizing (detecting, classifying and enumerating) wildlife in the huge number of images these cameras produce. Images taken in forests are particularly difficult to recognize because the appearance of both the forest and the wildlife changes greatly with season, time of day, and weather, even at the same location. Deep learning-based image recognition is among the most powerful technologies for reducing the time this recognition takes. Although several studies have proposed systems based on this technology, we focus on developing a robust image recognition system specifically for forest wildlife, whose appearance changes significantly with season, time of day, and weather. Our system has three functions: presence/absence recognition, species classification, and number enumeration. Presence/absence recognition and species classification are implemented with ResNet-50, a type of CNN (Convolutional Neural Network). Number enumeration is implemented by detecting the region of each individual animal and counting those regions. Region detection is implemented with Mask R-CNN, a CNN-based instance segmentation technique. We conducted an experiment to evaluate the effectiveness of our system using wildlife camera images taken in different seasons, at different times of day, and in different weather.
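The three functions described above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the abstract does not say how the functions are chained, so the order used here (presence check first, then species and count) is an assumption. The names `Recognition`, `count_instances`, and `recognize`, along with the 0.5 score threshold, are hypothetical. The three model arguments stand in for the trained ResNet-50 classifiers and the Mask R-CNN detector and are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class Recognition:
    """Result for one camera image (hypothetical container)."""
    present: bool
    species: Optional[str]
    count: int


def count_instances(scores: Sequence[float], threshold: float = 0.5) -> int:
    """Count detected regions whose confidence is at least `threshold`.

    `scores` stands in for the per-instance confidences that an instance
    segmentation model returns; the threshold value is an assumption.
    """
    return sum(1 for s in scores if s >= threshold)


def recognize(
    image: Any,
    presence_model: Callable[[Any], bool],
    species_model: Callable[[Any], str],
    detector: Callable[[Any], Sequence[float]],
    threshold: float = 0.5,
) -> Recognition:
    """Run presence/absence, species classification, and enumeration."""
    # Assumed ordering: skip the later stages when no animal is present.
    if not presence_model(image):
        return Recognition(present=False, species=None, count=0)
    species = species_model(image)
    count = count_instances(detector(image), threshold)
    return Recognition(present=True, species=species, count=count)


# Usage with stub models in place of trained networks.
result = recognize(
    "frame_0001.jpg",                  # placeholder image handle
    lambda img: True,                  # stub presence classifier
    lambda img: "species_a",           # stub species classifier
    lambda img: [0.92, 0.81, 0.30],    # stub detector confidences
)
print(result)
```

With the stub detector above, two of the three regions pass the assumed threshold, so the image is counted as containing two individuals.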

Results/Conclusions

The system was evaluated using approximately 15,000 wildlife camera images taken in the forest at the foot of Mt. Hakusan in Japan. The images were taken every day, from day to night and in a variety of weather conditions, so that robustness to significantly changing appearance could be assessed. Ten major species of large and medium-sized wildlife inhabiting the forest were selected for recognition. In the experiment, the system was trained on about 12,000 images and evaluated on about 3,000 images not used in training. The accuracy of presence/absence recognition was 96%, which suggests that the system can remove about 96% of the work wasted on checking images that contain no wildlife. The accuracy of species classification was 87%. Analysis of the 13% of images that were misclassified found that most of them are difficult to classify even for humans. For number enumeration, the exact-match accuracy was 79%, rising to 99% if an error of plus or minus 1 is accepted. This evaluation confirmed the effectiveness of our system for recognizing wildlife camera images taken in forests whose appearance changes significantly.
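The two enumeration scores reported above (exact match, and match within plus or minus 1) can be computed as in the following minimal sketch. The function name `count_accuracy` and the sample counts are hypothetical and are not taken from the study's data.

```python
from typing import Sequence


def count_accuracy(
    true_counts: Sequence[int],
    predicted_counts: Sequence[int],
    tolerance: int = 0,
) -> float:
    """Fraction of images whose predicted count is within `tolerance`
    of the true count. tolerance=0 gives exact-match accuracy."""
    if len(true_counts) != len(predicted_counts):
        raise ValueError("count lists must have the same length")
    if not true_counts:
        raise ValueError("at least one image is required")
    hits = sum(
        1
        for t, p in zip(true_counts, predicted_counts)
        if abs(t - p) <= tolerance
    )
    return hits / len(true_counts)


# Made-up counts for five images, for illustration only.
true_counts = [1, 2, 0, 3, 1]
predicted_counts = [1, 3, 0, 1, 1]

exact = count_accuracy(true_counts, predicted_counts)                   # 0.6
within_one = count_accuracy(true_counts, predicted_counts, tolerance=1)  # 0.8
print(exact, within_one)
```

In this toy example the second image is off by one and the fourth by two, so only the second is recovered when a tolerance of 1 is allowed.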