COS 95-5 - Insights and approaches using deep-learning to classify wildlife

Thursday, August 15, 2019: 2:50 PM
L005/009, Kentucky International Convention Center
Zhongqi Miao1, Kaitlyn M Gaynor2, Jiayun Wang3, Ziwei Liu4, Oliver C. Muellerklein1, Mohammad S. Norouzzadeh5, Alex McInturff6, Rauri Bowie7, Ran Nathan8, Stella X. Yu3 and Wayne M. Getz1, (1)Environmental Science, Policy, and Management, University of California, Berkeley, Berkeley, CA, (2)National Center for Ecological Analysis and Synthesis, University of California, Santa Barbara, Santa Barbara, CA, (3)Vision Science, University of California, Berkeley, Berkeley, CA, (4)The Chinese University of Hong Kong, Hong Kong, China, (5)Computer Science, University of Wyoming, Laramie, WY, (6)Environmental Science, Policy, and Management, University of California, Berkeley, Berkeley, CA, (7)Museum of Vertebrate Zoology, University of California, Berkeley, Berkeley, CA, (8)Department of Ecology, Evolution and Behavior, The Hebrew University of Jerusalem, Jerusalem, Israel
Background/Question/Methods

Machine learning, particularly deep-learning algorithms that employ convolutional neural networks (CNNs), has been the breakthrough technology of the past half-decade for identifying objects in images. To most scientists outside the artificial intelligence realm, this identification process remains somewhat mysterious: a black-box method that provides no insight into how identifications are made. If scientists are to apply deep-learning methods effectively and efficiently in their own fields of investigation, some understanding of the underlying mechanisms is needed.

This is certainly true for ecologists, who have captured millions of images remotely using satellites or movement-triggered cameras installed in the field; processing these images can consume tens of thousands of person-hours at great expense. To demystify aspects of artificial intelligence and thereby facilitate automated visual image processing, we deconstruct the features used by a CNN that we trained to identify animal species in more than 100,000 annotated wildlife images obtained in Mozambique. To the best of our knowledge, this is the first time such a deconstruction has been undertaken for wildlife classification.
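The abstract gives no implementation details, but a transfer-learning pipeline of the kind commonly used for camera-trap classification might look like the minimal PyTorch sketch below. The ResNet-50 backbone, the directory layout, and all hyperparameters are illustrative assumptions, not the authors' actual configuration.

```python
# Minimal transfer-learning sketch (PyTorch). The architecture, paths, and
# hyperparameters are illustrative assumptions, not the study's pipeline.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_SPECIES = 20  # number of wildlife classes in the annotated dataset

# Standard ImageNet preprocessing for a pretrained backbone.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Assumes images are organized as data/train/<species_name>/*.jpg
train_set = datasets.ImageFolder("data/train", transform=preprocess)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

# Start from an ImageNet-pretrained ResNet and replace the classifier head.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_SPECIES)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```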

Results/Conclusions

Here we outline current state-of-the-art methods and present results from training a CNN to classify 20 African wildlife species, with an overall accuracy of 87.5%, on a dataset of 111,467 images. We demonstrate the application of a gradient-weighted class activation mapping (Grad-CAM) procedure to extract the most salient pixels in the final convolutional layer, and we show that these pixels highlight features in particular images that, in some cases, resemble those used to train humans to identify these species. Further, we used mutual information methods to identify the neurons in the final convolutional layer that consistently respond most strongly across a set of images of one particular species, and we then interpreted the image features where the strongest responses occur. We also used hierarchical clustering of the feature vectors associated with each image to produce a visual-similarity dendrogram of the identified species, providing a cogent view of how a machine seems to “perceive” similarities and differences among species. Finally, we evaluated where images outside the training set fell within our dendrogram, contrasting images of the 20 species “known” to our CNN with images of species “unknown” to it.
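As a rough illustration of the Grad-CAM procedure mentioned above, the following sketch captures activations and gradients at the final convolutional block of a ResNet-style network via hooks, then weights each feature map by the spatial mean of its gradient. The model and layer choices are assumptions for illustration, not the authors' exact setup.

```python
# Sketch of gradient-weighted class activation mapping (Grad-CAM) over the
# final convolutional layer of a trained CNN.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.eval()

activations, gradients = {}, {}

def save_activation(module, inp, out):
    activations["value"] = out

def save_gradient(module, grad_in, grad_out):
    gradients["value"] = grad_out[0]

# Hook the last convolutional block (layer4 in a ResNet).
model.layer4.register_forward_hook(save_activation)
model.layer4.register_full_backward_hook(save_gradient)

def grad_cam(image, class_idx):
    """Return a heat map of the pixels most salient for class_idx."""
    scores = model(image.unsqueeze(0))   # forward pass, image is (3, H, W)
    model.zero_grad()
    scores[0, class_idx].backward()      # gradients of the class score

    # Weight each feature map by the spatial mean of its gradient, sum the
    # weighted maps, and keep only positive evidence (ReLU).
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["value"]).sum(dim=1))
    # Upsample to input resolution so the map can be overlaid on the image.
    cam = F.interpolate(cam.unsqueeze(1), size=image.shape[1:],
                        mode="bilinear", align_corners=False)
    return (cam.squeeze() / (cam.max() + 1e-8)).detach()
```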
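The mutual information analysis could be approximated along the following lines: pool each final-layer channel's activation into a single value per image, then rank channels by their estimated mutual information with the species label. Both the pooling step and the scikit-learn estimator are illustrative choices, not the authors' exact method.

```python
# Sketch of ranking final-convolutional-layer channels by mutual information
# with the species label, to find units that respond consistently to one species.
import numpy as np
import torch
from sklearn.feature_selection import mutual_info_classif

@torch.no_grad()
def pooled_features(model, images):
    """Average-pool final-conv activations into one value per channel."""
    feats = {}
    handle = model.layer4.register_forward_hook(
        lambda m, i, o: feats.update(value=o))
    model(images)                                    # images: (N, 3, H, W)
    handle.remove()
    return feats["value"].mean(dim=(2, 3)).numpy()   # (N, channels)

def rank_channels(model, images, labels):
    """Return channel indices sorted by MI with the species labels."""
    X = pooled_features(model, images)
    mi = mutual_info_classif(X, labels, random_state=0)
    return np.argsort(mi)[::-1], mi
```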
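The visual-similarity dendrogram could be sketched as follows: average the feature vectors of each species into a centroid, then cluster the centroids hierarchically with SciPy. The Ward linkage and centroid pooling are assumed choices.

```python
# Sketch of a visual-similarity dendrogram built from per-species mean
# feature vectors, using SciPy's agglomerative clustering.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

def species_dendrogram(features, labels, species_names):
    """features: (N, D) image feature vectors; labels: (N,) species ids."""
    # Average the feature vectors of each species into one centroid.
    centroids = np.stack([features[labels == k].mean(axis=0)
                          for k in range(len(species_names))])
    # Agglomerative clustering on centroid distances (Ward linkage here).
    Z = linkage(centroids, method="ward")
    dendrogram(Z, labels=species_names, leaf_rotation=90)
    plt.ylabel("feature-space distance")
    plt.tight_layout()
    plt.show()
```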
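One plausible reading of placing held-out images "within the dendrogram" is a nearest-centroid comparison in feature space, as sketched below; the study's actual placement procedure is not specified in the abstract.

```python
# Sketch of placing a held-out image in the learned feature space by
# comparing its feature vector with the per-species centroids above.
# The nearest-centroid reading is an assumption, not the authors' method.
import numpy as np

def place_image(feature, centroids, species_names):
    """Return the closest species and its distance for one feature vector."""
    dists = np.linalg.norm(centroids - feature, axis=1)
    nearest = int(np.argmin(dists))
    # Images of species "known" to the CNN should land near their own
    # centroid; images of "unknown" species tend to be far from all of them.
    return species_names[nearest], float(dists[nearest])
```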