2022 ESA Annual Meeting (August 14 - 19)

COS 281-2 Creating and Using Synthetic Data Sets to Train Machine Learning Models For Species Level Identification of Herpetofauna In-Situ

3:45 PM-4:00 PM
516D
Seth A. Frazer, University of California Santa Barbara / Cheadle Center for Biodiversity and Ecological Restoration; Chris Evelyn, PhD, University of California Santa Barbara / Cheadle Center for Biodiversity and Ecological Restoration (CCBER); Constance Woodman, PhD, Department of Pathobiology, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University
Background/Question/Methods

Machine Learning (ML) has the potential to greatly expand the diversity of species available for ecological field studies. By sorting through field images from trail cameras or citizen scientists, ML models can pick out tens of observations of a target species from millions of images. This is especially powerful for studying species that are rare or only intermittently active, factors that limit sample size in ecological studies. Image classifier models, a subset of ML, can recognize a number of species with a single model, even those with physical similarities, given the right quantity and quality of training data. Here we report a method for generating a training dataset that is robust to the problems posed by species that have similar morphology and are rarely seen, collected, or photographed. A library of 2,000 training images was created from dorsal photographs of a small number of preserved museum specimens, using a combination of animation, reorientation, and background effects in Adobe Photoshop and Adobe After Effects. The taxa chosen, three lizards and one salamander, are seasonally active and range in physical similarity. Ninety percent (1,800) of the training images were then used to train the ML model.
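The 90/10 partition described above can be illustrated with a minimal sketch. It assumes the 2,000 images are spread evenly over five classes (four species plus background-only) and that the split is stratified by class; the abstract does not state how the split was drawn, and the class names and file names below are hypothetical placeholders.

```python
import random


def split_dataset(items_by_class, train_fraction=0.9, seed=0):
    """Split each class separately so every class keeps the same train/validation ratio."""
    rng = random.Random(seed)
    train, validation = [], []
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(path, label) for path in shuffled[:cut]]
        validation += [(path, label) for path in shuffled[cut:]]
    return train, validation


# Hypothetical class names and file names; 400 synthetic images per class is assumed.
classes = ["lizard_a", "lizard_b", "lizard_c", "salamander", "background"]
library = {c: [f"{c}_{i:03d}.png" for i in range(400)] for c in classes}

train_set, validation_set = split_dataset(library)
print(len(train_set), len(validation_set))  # 1800 200
```

Splitting per class keeps rare classes from being under-represented in the validation set, which matters when the classes are as visually similar as the taxa described here.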

Results/Conclusions

The Google Vertex ML model was tested first against a validation set made up of the remaining 10% of the training images (200), and then against 50 dorsal images of the target species taken in the field by researchers and citizen scientists. The final ML model was trained using approximately 400 images per species and 400 background-only images. Each image contained synthetically generated and randomly arranged dirt, grass, leaves, and lighting effects. Once trained, the Google Vertex ML model identified species and background-only images from the validation set of 200 images with 98.98% precision and 98.48% recall. These figures indicate that, on synthetic images, the model rarely assigned a wrong label and rarely missed a class. Using the 50 images of target taxa and background-only images from researchers and citizen scientists, model accuracy was 92%. A primary source of error was the lack of available dorsal (top-down) photographs for our target salamander. We will address this in our own field studies by using automated trail cameras pointed downward. This method, which relies only on museum specimens and widely available software, could make a significant impact on the field of ecology and the conservation of rare taxa.
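For readers less familiar with the metrics reported above, the following sketch shows how per-class precision, per-class recall, and overall accuracy are computed from true and predicted labels. The labels are toy data invented for illustration, not the study's results, and the abstract does not say how Vertex aggregated its figures across classes. The last line only restates the arithmetic that 92% of 50 field images corresponds to 46 correct predictions.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class, treating every other class as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def accuracy(y_true, y_pred):
    """Fraction of images whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy labels (hypothetical): one salamander is mistaken for a lizard.
y_true = ["lizard", "lizard", "salamander", "salamander", "background", "background"]
y_pred = ["lizard", "lizard", "salamander", "lizard", "background", "background"]

lizard_precision, lizard_recall = precision_recall(y_true, y_pred, "lizard")
salamander_precision, salamander_recall = precision_recall(y_true, y_pred, "salamander")
toy_accuracy = accuracy(y_true, y_pred)

# 92% accuracy on 50 field images corresponds to 46 correct predictions.
field_accuracy = 46 / 50
```

In the toy data the single confusion lowers precision for the lizard class (a false positive) and recall for the salamander class (a miss), which is why the two metrics are reported separately.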