Databases such as the Global Biodiversity Information Facility (GBIF) give users access to massive amounts of species occurrence data. The aggregation and public dissemination of such data provide the opportunity to investigate spatial distributions of species, higher-level taxa or ecological communities at previously impossible scales. Such databases are particularly useful for investigating patterns among ecological communities, where large volumes of data on multiple species are required. However, much of the data in these databases consist of records of known occurrences of species (presence-only data) and contain no direct information about where a species has been searched for but not found. Normally, extensive spatial filtering and generalisation of these data are required before species observations can be aggregated into community-level species lists.
Here we present two new methods for estimating macroecological properties directly from sparse presence-only species occurrence data, thereby sidestepping the need for spatial filtering and generalisation. Both rest on a shift in how we view these kinds of data for macroecological purposes. Rather than aggregating to species lists, we work directly from binary classifications of randomly drawn pairs of species observations. From these we derive estimates of pair-wise turnover in species composition and of species richness.
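As a rough illustration of the pair-based view described above, the sketch below draws random pairs of occurrence records and gives each pair a binary label. It is not the method presented here: the record layout, the function names `draw_observation_pairs` and `mismatch_rate`, and the choice of label (1 if the two records belong to different species, 0 if they belong to the same species) are all assumptions made for the example, and the toy records are invented.

```python
import random


def draw_observation_pairs(records, n_pairs, seed=0):
    """Draw random pairs of occurrence records and label each pair.

    records: list of (site_id, species_name) tuples (hypothetical layout).
    Returns a list of (site_a, site_b, mismatch) tuples, where mismatch is
    1 if the two records are of different species and 0 otherwise.
    The labelling rule is an assumption made for illustration.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # Sample two distinct records without replacement.
        (site_a, sp_a), (site_b, sp_b) = rng.sample(records, 2)
        pairs.append((site_a, site_b, int(sp_a != sp_b)))
    return pairs


def mismatch_rate(pairs):
    """Proportion of drawn pairs whose two records differ in species."""
    return sum(mismatch for _, _, mismatch in pairs) / len(pairs)


# Invented toy records, for illustration only.
records = [("A", "sp1"), ("A", "sp2"), ("B", "sp2"), ("B", "sp3"), ("C", "sp1")]
pairs = draw_observation_pairs(records, n_pairs=1000, seed=42)
rate = mismatch_rate(pairs)
```

In a sketch like this, the binary labels could serve as the response in a model relating mismatch to the environmental or geographic separation of the two records. How the labels are converted into estimates of turnover and richness is not specified here and is left out.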
Results/Conclusions
We contrast the predictions from these new methods with those from more traditional methods of estimating community turnover and species richness. The predictions from the new and traditional methods agree well, and cross-validation against independent datasets also performs well, suggesting that these new methods are an effective solution for modelling community-level attributes using presence-only data such as those accessible through GBIF.
We subsequently used one of these new methods and the entire GBIF database to model the community turnover of three biological groups (vertebrates, plants and invertebrates) globally at 1 km² resolution. These models provided the foundation for two new global biodiversity indicators that, once paired with time-series habitat change information, are capable of generating predictions of biodiversity change through time. The new indicators, which leverage massive amounts of biological information, were made possible only through these advances in modelling methods, highlighting the power of this new way of inferring macroecological properties from sparse presence-only biological datasets.