2020 ESA Annual Meeting (August 3–6)

LB 5 Abstract - The price of admission: Considerations for machine learning methods in camera trapping

Will Rogers1, Scott Creel2, Elias Rosenblatt3 and Matthew S. Becker3, (1)Department of Ecology, Montana State University, Bozeman, MT, (2)Wildlife, Fish and Environmental Studies, Sveriges Lantbruksuniversitet, Umeå, Sweden, (3)Zambian Carnivore Programme, Mfuwe, Zambia
Background/Question/Methods

Because camera trap surveys generate enormous volumes of data (and because motion-sensitive cameras yield many images that are not useful), manual image classification creates a large time burden. Classification can be expedited through citizen science, but such resources are not universally available. Convolutional neural networks (CNNs) have been used to replace manual classification with machine learning, yet despite their demonstrated success in classifying camera trap images, broad adoption of CNNs for ecological monitoring has been slow. New camera trapping studies can exploit the transfer learning ability of published CNNs to obtain far better performance than would be possible if a CNN were trained only on new data. To evaluate the performance of transfer learning, we retrained two CNNs, originally developed with images from North America (NA) and Serengeti National Park (SNP), Tanzania, using camera trap images (mainly ungulates and carnivores) from Kafue National Park (KNP) and South Luangwa National Park (SLNP), Zambia. We retrained the CNNs with 2.5% to 25% of the available images per species and evaluated model performance with 25% of the images per species (pooled across KNP and SLNP). Models were then compared using the accuracy and confidence associated with species classifications.
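The retraining workflow described above follows the standard transfer-learning recipe: keep a pretrained network's feature extractor and refit its classification layer on the new species list. The sketch below illustrates that recipe in PyTorch; it is a minimal illustration, not the authors' actual pipeline, and it substitutes an ImageNet-pretrained ResNet-18 for the published NA/SNP models. NUM_SPECIES, the learning rate, and the function names are all assumptions for the example.

```python
# Minimal transfer-learning sketch (PyTorch). Hypothetical stand-in for
# retraining a published camera trap CNN on a new species list.
import torch
import torch.nn as nn
from torchvision import models

NUM_SPECIES = 30  # assumed count of KNP/SLNP ungulate and carnivore classes

# Start from a network pretrained elsewhere (ImageNet weights here stand
# in for the published NA/SNP camera trap models).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the convolutional backbone so its learned features transfer as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with one sized to the new species list.
model.fc = nn.Linear(model.fc.in_features, NUM_SPECIES)

# Retrain only the new head on a labeled subset (e.g., 2.5-25% of the
# available images per species, as in the study design).
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def retrain(loader, epochs=5):
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```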

Results/Conclusions

CNNs can learn to recognize new species, but the retrained SNP model outperformed the retrained NA model, highlighting the effect of species overlap and complexity on transfer learning performance. Both models performed better with larger training datasets, and accuracy was poor for both models (<85% correct) for species with fewer than 500 training images. When classifications were restricted to those made with high confidence, the best NA and SNP models were 95% accurate for 9.6% and 36% of the testing data, respectively. The application of machine learning methods here demonstrates two important considerations for monitoring programs. First, the utility of these models depends on the structure and prior training of the CNN to be used and on the amount of data one is willing or able to use for training (and thus sacrifice from analysis). Second, even for modest camera trapping efforts like the one presented here, retraining existing CNNs provides a tremendous tool for reducing the labor of manual classification with little (or no) loss of accuracy. We urge the ecological community to consider retrained CNNs much more broadly: their efficacy will only increase while the opportunity cost of not using these methods grows.
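The 95%-accurate subsets reported above come from confidence thresholding: an automated label is accepted only when the model's top-class probability clears a cutoff, and everything else is routed back to manual review. The sketch below shows the idea, assuming PyTorch tensors and softmax outputs; the cutoff value and function name are illustrative, not taken from the study.

```python
# Sketch of confidence thresholding for camera trap classifications:
# accept a label only when the top softmax probability exceeds a cutoff,
# then report accuracy on the accepted subset and the fraction accepted.
import torch
import torch.nn.functional as F

def threshold_report(logits, labels, cutoff=0.95):
    """Return (accuracy on accepted images, fraction of images accepted)."""
    probs = F.softmax(logits, dim=1)
    conf, preds = probs.max(dim=1)
    accepted = conf >= cutoff
    coverage = accepted.float().mean().item()
    if accepted.sum() == 0:
        return float("nan"), 0.0  # nothing cleared the cutoff
    accuracy = (preds[accepted] == labels[accepted]).float().mean().item()
    return accuracy, coverage
```

Sweeping the cutoff traces the trade-off the abstract describes: a stricter threshold raises accuracy on the accepted images but shrinks the share of the dataset that escapes manual classification.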