Mon, Aug 15, 2022: 5:00 PM-6:30 PM
ESA Exhibit Hall
Background/Question/Methods
Understanding species range dynamics is central to maintaining biodiversity in the Anthropocene. Quantifying how climate change and human activities affect the distributions of rare and endangered species is particularly important because these species face a higher risk of extinction. Species distribution models (SDMs) have become a common and powerful tool for analyzing species-environment relationships across geographic space. Although evaluating the distribution of rare species is integral to their conservation, doing so can be difficult when few distributional records have been collected. Community science platforms such as iNaturalist have emerged as alternative sources of species occurrence data. Although these observations are often considered lower in quality than natural history collections, they may improve SDMs when little other data are available for a species. Here we investigate the utility of iNaturalist data for developing SDMs for a rare high-elevation plant, Telesonix jamesii. Because the literature offers few methods for modeling rare species, we considered five modeling techniques, including profile methods, statistical models, and machine learning algorithms. We compared the performance of SDMs trained only on natural history (herbarium) data with that of SDMs trained on a combination of herbarium and iNaturalist data.
Results/Conclusions
We found that using an ensemble of iNaturalist and herbarium occurrences as training data was the most accurate approach for every modeling technique. Specifically, a random forest model trained on the ensemble data performed best of any model (Kappa = 0.908). All models relied heavily on climate variables (Bio9, Bio18), which suggests that this species could come under threat as the climate continues to change. The choice of validation dataset also affected model evaluation: models trained only on herbarium data performed better when evaluated with cross-validation than when validated externally with iNaturalist data. Although the ensemble dataset performed best, it is still important to consider the novelty of an independent testing dataset. External validation is a valuable tool for model evaluation, but it becomes problematic when both the training and the testing data are limited. As platforms like iNaturalist continue to develop, community-sourced datasets like the one used in this study will only grow in value to the scientific community. This study can serve as a framework for future SDM studies of species with similar data limitations.
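Model accuracy above is reported as Kappa. Cohen's kappa corrects raw agreement for the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of matching presence/absence calls and p_e is the proportion expected from the marginal totals alone. The sketch below is a minimal, illustrative implementation. It assumes model outputs have already been thresholded to binary presence (1) / absence (0). The function name and the example labels are hypothetical and do not reproduce the study's data or its reported value.

```python
def cohens_kappa(observed, predicted):
    """Cohen's kappa for paired binary labels (1 = presence, 0 = absence)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(v not in (0, 1) for v in (*observed, *predicted)):
        raise ValueError("labels must be 0 or 1")
    n = len(observed)
    pairs = list(zip(observed, predicted))
    a = pairs.count((1, 1))  # presence correctly predicted
    b = pairs.count((0, 1))  # predicted presence at an observed absence
    c = pairs.count((1, 0))  # observed presence the model missed
    d = pairs.count((0, 0))  # absence correctly predicted
    p_o = (a + d) / n  # observed agreement (overall accuracy)
    # Agreement expected by chance, from the row and column totals
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when every label is the same class")
    return (p_o - p_e) / (1 - p_e)


observed = [1, 1, 1, 1, 0, 0, 0, 0]   # hypothetical test-site labels
predicted = [1, 1, 1, 0, 0, 0, 0, 1]  # hypothetical thresholded model output
print(cohens_kappa(observed, predicted))  # 0.5
```

In this toy example the model agrees with 75% of test sites, but half of that agreement is expected by chance, so kappa is 0.5. This chance correction is why kappa is often preferred over raw accuracy when presences and absences are unbalanced.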