Species distribution modeling has advanced considerably over the last two decades, driven by the increasing availability of species occurrence records and environmental datasets and by the development of sophisticated algorithms that aid computation. Correlative species distribution models (hereafter SDMs) forecast a species’ potential distribution by relating occurrence data to environmental data. Despite their widespread use, it remains unclear whether the choice of SDM algorithm has a major effect on predictive performance. We assembled historical occurrence records of four amphibian species [California red-legged frog (n=103), foothill yellow-legged frog (n=300), arroyo toad (n=102), and western spadefoot toad (n=155)] and 19 bioclimatic variables from publicly available databases, and used them to assess the performance of two SDM approaches [maximum entropy modeling (MaxEnt) and random forest (RF)]. Both models require information on areas without occurrence records to fit, so for each modeled species we generated “pseudo-absence” data equal in number to its occurrence records. We evaluated the predictive performance of the models using the area under the receiver operating characteristic curve (AUC) and the true skill statistic (TSS). We used variable importance plots and partial dependence plots to visualize which bioclimatic variables were most important in making predictions.
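The pseudo-absence step and the two evaluation metrics can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say how pseudo-absences were placed or which threshold was used for TSS, so the sketch assumes uniform random sampling of grid cells that lack occurrence records and a fixed 0.5 threshold. The function names and toy data are hypothetical. TSS is computed as sensitivity + specificity − 1, and AUC as the probability that a randomly chosen presence receives a higher score than a randomly chosen (pseudo-)absence, with ties counted as one half.

```python
import random
from typing import Hashable, List, Sequence, Set


def sample_pseudo_absences(presence_cells: Set[Hashable],
                           candidate_cells: Sequence[Hashable],
                           seed: int = 0) -> List[Hashable]:
    """Hypothetical helper: draw as many pseudo-absence cells as there are
    presence cells, uniformly at random from cells with no occurrence record."""
    available = [c for c in candidate_cells if c not in presence_cells]
    rng = random.Random(seed)
    # Sampling is without replacement; raises ValueError if too few cells remain.
    return rng.sample(available, len(presence_cells))


def true_skill_statistic(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TSS = sensitivity + specificity - 1, with 1 = presence and 0 = absence."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of every presence with every absence."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not study results): a 5 x 5 grid with 4 presence cells.
presences = {(0, 0), (0, 1), (1, 0), (1, 1)}
grid = [(r, c) for r in range(5) for c in range(5)]
pseudo_absences = sample_pseudo_absences(presences, grid, seed=42)

# Labels: 1 = presence, 0 = pseudo-absence; scores are made-up model outputs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed fixed 0.5 threshold

print(true_skill_statistic(y_true, y_pred))  # 0.5
print(auc(y_true, scores))                   # 0.9375
```

Because TSS depends on the threshold that converts continuous suitability scores into presence/absence predictions while AUC does not, the two metrics can rank the same model differently. In practice the threshold is often chosen to maximize TSS rather than fixed in advance.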
Results/Conclusions
Both approaches predicted reasonably well (AUC > 0.9 for most models, with TSS between 0.5 and 0.7). However, RF appeared to outperform MaxEnt. Critically, the two approaches differed in the environmental factors they identified as most important in predicting distributions. Although both methods performed comparably well, the choice between them depends on the available occurrence data: MaxEnt is often used when only presence records are available, whereas RF is best suited to modeling when both presence and “true absence” locations are available. Our results highlight the need to use multiple approaches to predict distributions and to tailor the overall approach to the species and data being used.