Wed, Aug 04, 2021: On Demand
Background/Question/Methods
Our overall goal is to predict the distribution of arthropod communities using only remotely sensed environmental covariates, so that arthropod biodiversity can be mapped continuously over large areas. In this project, we collected 121 Malaise-trap samples from 96 sample points in and around the HJ Andrews Experimental Forest, Oregon. We shotgun-sequenced each Malaise-trap sample and used the Kelpie software to carry out in silico PCR of the COI DNA barcode gene (BF3BR2 primer pair). This extracted 889 Operational Taxonomic Units (OTUs), which we filtered to 303 OTUs with ≥ 6 incidences. The resulting sample-by-species (OTU) table was paired with Landsat, GIS, and Lidar environmental covariates in a joint species distribution model (multivariate probit model). This model (sjSDM package in R) is one of the first to apply deep neural networks (DNNs) to ecological community data. It combines a DNN on the environmental covariates with a linear model on spatial position and a regularized species-species correlation matrix. Model tuning was carried out using cross-validation, and we measured model performance in terms of both explanatory power and predictive power.
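The incidence filter and the multivariate probit structure described above can be illustrated with a minimal Python sketch. This is not the sjSDM API (sjSDM is an R package), and every function name below is hypothetical. The sketch assumes the OTU table is stored as presence/absence columns keyed by OTU, and that each species' linear predictor (in the real model, the DNN output on the environmental covariates plus the linear spatial term) has already been computed. Under a multivariate probit model, a species is recorded as present when a latent Gaussian variable exceeds 0; correlation between the species' latent variables is what lets co-occurrence depart from independence.

```python
import math
import random
from typing import Dict, List, Tuple


def filter_by_incidence(table: Dict[str, List[int]], min_incidence: int = 6) -> Dict[str, List[int]]:
    """Keep only OTUs detected in at least `min_incidence` samples.

    Hypothetical helper: `table` maps an OTU id to its presence/absence column.
    """
    return {otu: col for otu, col in table.items()
            if sum(1 for v in col if v > 0) >= min_incidence}


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF (the probit inverse link), via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def marginal_occurrence_prob(linear_predictor: List[float], residual_var: List[float]) -> List[float]:
    """Marginal P(y_j = 1) when y_j = 1 iff z_j > 0 and z_j ~ N(mu_j, Sigma_jj)."""
    return [std_normal_cdf(mu / math.sqrt(s2)) for mu, s2 in zip(linear_predictor, residual_var)]


def simulate_pair(mu1: float, mu2: float, rho: float, n: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Draw n presence/absence pairs for two species whose latent variables have correlation rho."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        # Cholesky factor of the 2x2 correlation matrix [[1, rho], [rho, 1]]
        z1 = mu1 + e1
        z2 = mu2 + rho * e1 + math.sqrt(1.0 - rho * rho) * e2
        # Probit thresholding: present when the latent variable is above 0
        out.append((int(z1 > 0), int(z2 > 0)))
    return out


def cooccurrence_rate(pairs: List[Tuple[int, int]]) -> float:
    """Fraction of simulated samples in which both species are present."""
    return sum(1 for a, b in pairs if a and b) / len(pairs)
```

With both latent means at 0, each species occurs in about half of the simulated samples; with a latent correlation of 0.9 the two co-occur in roughly 43% of samples, versus about 25% when the correlation is 0. In sjSDM the correlation matrix is estimated (and regularized) for all species at once rather than specified by hand as here.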
Results/Conclusions
The model currently has a mean explanatory AUC (area under the ROC curve), averaged over all species, of 0.82 and a mean predictive AUC of 0.73. Our results show that it is possible to fit a model with reasonable predictive performance to large numbers of data-poor species from a mass-sampling campaign. Preliminary tests with linear models fit with sjSDM showed that elevation and forest age are important covariates for predictive performance. Next steps are continued technical improvements to model fitting and the use of explainable AI (xAI) to derive a mechanistic understanding of the fitted DNN model.
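A "mean AUC over all species" can be computed by scoring each species' predicted occurrence probabilities separately and averaging. The Python sketch below is an illustration under assumptions, not the authors' evaluation code: it uses the rank-based (Mann-Whitney) definition of AUC, stores data as samples × species lists, and skips species for which AUC is undefined because only one class appears in the evaluation set. It assumes explanatory AUC would be computed on predictions for the training samples and predictive AUC on held-out cross-validation predictions; the function names are hypothetical.

```python
from typing import List, Optional, Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> Optional[float]:
    """Rank-based AUC: probability that a random presence outscores a random absence."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined when only presences or only absences occur
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def mean_auc(Y: List[List[int]], P: List[List[float]]) -> float:
    """Average per-species AUC; Y and P are samples x species (observed, predicted)."""
    n_species = len(Y[0])
    values = []
    for j in range(n_species):
        a = auc([row[j] for row in Y], [row[j] for row in P])
        if a is not None:
            values.append(a)
    if not values:
        raise ValueError("AUC is undefined for every species in this evaluation set")
    return sum(values) / len(values)
```

The pairwise loop is quadratic in the number of samples per species, which is acceptable at the scale of roughly a hundred samples; larger evaluations would normally sort the scores instead.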