2020 ESA Annual Meeting (August 3 - 6)

PS 48 Abstract - Effects of sample size and network depth on a deep learning approach to species distribution modeling

Donald Benkendorf and Charles P. Hawkins, Watershed Sciences and Ecology Center, Utah State University, Logan, UT

Background/Question/Methods

Deep learning algorithms have improved predictive model performance in a variety of disciplines because of their ability to approximate complex functions. However, the amount of data and the network depth needed to improve model performance are not well understood and may depend on many factors specific to the field of research. Ecologists rely on accurate species distribution models to inform conservation and management efforts. Here, we present the first study to systematically examine the effects of sample size and network depth on the performance of species distribution models built with artificial neural networks. Specifically, our objective was to use a large freshwater macroinvertebrate dataset to assess the effects of sample size and neural network depth on model performance, measured by the true skill statistic (TSS). A secondary objective was to compare the performance of neural network and random forest models.
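For readers unfamiliar with the performance measure, the true skill statistic is conventionally defined as sensitivity + specificity - 1, ranging from -1 to 1, with 0 indicating no better than random skill. The abstract does not say how TSS was computed or how predicted probabilities were thresholded, so the following is only a minimal sketch of the standard definition. The function name `true_skill_statistic` and the example data are hypothetical, and predictions are assumed to be already converted to binary presence (1) or absence (0).

```python
def true_skill_statistic(observed, predicted):
    """Return TSS = sensitivity + specificity - 1 for binary presence/absence data.

    Assumes both sequences contain only 0 (absence) and 1 (presence) and
    that any probability thresholding has already been applied.
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)  # true presences
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # missed presences
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)  # true absences
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # false presences

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one observed presence and one observed absence")

    sensitivity = tp / (tp + fn)  # proportion of presences correctly predicted
    specificity = tn / (tn + fp)  # proportion of absences correctly predicted
    return sensitivity + specificity - 1


# Hypothetical example: 8 sites, 4 observed presences and 4 observed absences.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]
print(true_skill_statistic(observed, predicted))  # sensitivity 0.75, specificity 0.75
```

Because TSS weights sensitivity and specificity equally, it is less sensitive to species prevalence than raw accuracy, which is one common reason it is used to compare distribution models across taxa.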

Results/Conclusions

We found that, on average, deeper network architectures (2-6 hidden layers) consistently led to slightly higher model performance (mean TSS = 0.39 for all depths) than a neural network with a single hidden layer (mean TSS = 0.36) on validation data when trained with a large sample size (10,000 sites). However, random forest models generally performed as well as or slightly better than (mean TSS = 0.40) deep network models. There was no clear or consistent benefit of using deep neural networks with smaller sample sizes (100 and 1,000 sites). Our results suggest that, given sufficiently large datasets, increasing the number of hidden layers in a neural network can potentially improve species distribution model performance. As datasets become larger and high-performance computing resources become more available, a deep learning approach to species distribution modeling is likely to be used more frequently.