Tue, Aug 16, 2022: 10:30 AM-10:45 AM
516A
Background/Question/Methods
Multiple statistical algorithms have been applied in species distribution modeling (SDM) for decades. Because species distribution datasets often have shortcomings, presence-only methods (such as MaxEnt) have become widely used. However, sampling errors and bias remain challenging issues, particularly for profile-based models. Because these methods are optimized for normal instances, spatial sampling bias or errors can exacerbate the over-representation of some regions in environmental feature space and thereby cause environmental bias. Here we present itsdm, an R package for species distribution modeling with the Isolation Forest (iForest) algorithm, a presence-only approach that is less affected by sampling errors and bias than other profile-based modeling approaches. Like other anomaly detection methods, iForest distinguishes outliers from normal instances using a decision tree structure. It is unique in that it scores each instance by its path length in the trees, based on the assumption that anomalies are few and different, so they have shorter paths than normal instances. Because iForest is optimized for outliers, it is less sensitive to noise in the normal samples.
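The path-length scoring described above can be illustrated with a minimal, self-contained sketch. This is not the itsdm API (itsdm is an R package); it is a plain-Python toy version of the general Isolation Forest idea. All function names, the two synthetic "environmental" variables, and the parameter values (100 trees, subsample size 64) are assumptions made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant (approximation)


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over
    n points; used to normalize path lengths and to credit unsplit leaves."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively isolate points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # all values identical on this variable: cannot split
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x):
    """Number of splits needed to reach a leaf, plus a correction for the
    points left unseparated in that leaf."""
    depth = 0
    while "split" in tree:
        tree = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
        depth += 1
    return depth + c_factor(tree["size"])


def build_forest(points, n_trees, sample_size, rng):
    """Grow each tree on a random subsample drawn without replacement."""
    max_depth = math.ceil(math.log2(sample_size))
    return [
        build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]


def anomaly_score(forest, x, sample_size):
    """Score in (0, 1]; shorter average paths give higher scores."""
    mean_path = sum(path_length(t, x) for t in forest) / len(forest)
    return 2.0 ** (-mean_path / c_factor(sample_size))


# Hypothetical data: two standardized environmental variables at 200 presences.
rng = random.Random(42)
records = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
forest = build_forest(records, n_trees=100, sample_size=64, rng=rng)

print("typical conditions:", anomaly_score(forest, (0.0, 0.0), 64))
print("unusual conditions:", anomaly_score(forest, (8.0, 8.0), 64))
```

In this sketch, a point far from the bulk of the presence records is separated after only a few random splits and so receives a higher anomaly score, while conditions typical of the presence records need more splits and receive a lower one.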
Results/Conclusions
itsdm provides a wrapper and a complete workflow for running iForest-based species distribution models, including convenient tools for model explanation and post-modeling analysis. Among the provided tools are routines for evaluating the importance of environmental variables using Shapley values, including the ability to evaluate the relative importance of environmental variables to individual presence observations. Another set of features enables the calculation of response curves for environmental variables and the generation of spatial partial dependence maps, which indicate how environmental responses vary over space. We illustrate how the package can be applied using a virtual species as an example, and in the process demonstrate the robustness of the iForest modeling approach to sampling bias, as well as the potential for using Shapley values to explain and compare model variants, including ensemble models. itsdm, its documentation, and example data are available on CRAN.