Our goal is to develop, and make freely available, a generic pipeline capable of (i) linking biodiversity occurrence data to species ranges and (ii) forecasting and hindcasting those distributions. While making such links for one or a few species is now trivial (indeed, a few lines of code are all that is needed), scaling these computations to forecast the distributions of 1000s to 100,000s of species remains logistically prohibitive for most researchers. A lack of appropriate tools, and a failure to combine existing tools into an integrated pipeline, prevents such scaling. Key challenges include: 1) appropriately scrubbing data to remove taxonomic and geographic errors, 2) identifying clear best-practice methods for range modelling applicable across diverse species, 3) innovating range modelling methods that integrate diverse data such as presence-only museum collections and abundance-based plot data, 4) scaling computationally intensive range modelling in a high-performance computing (HPC) environment, and 5) placing the pipeline's outputs in a phylogenetic context, which is increasingly important to conservation efforts. The talk will focus on our progress toward developing such a pipeline using the Botanical Information and Ecology Network (BIEN), which has assembled a database of 110,000,000 observations of 300,000+ species of plants of the New World. The BIEN project provides a pre-existing user community spanning museum directors, plot ecologists, trait-data researchers, and biodiversity scientists.
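As an illustration of the geographic side of challenge 1, the following is a minimal sketch of a coordinate-scrubbing step. It is not the BIEN implementation: the `Occurrence` record, the `is_plausible` and `scrub` helpers, and the specific rules (range checks, rejecting the (0, 0) placeholder, dropping exact duplicates) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical occurrence record; field names are illustrative only."""
    species: str
    latitude: Optional[float]
    longitude: Optional[float]


def is_plausible(record: Occurrence) -> bool:
    """Return True if the record carries usable geographic coordinates."""
    lat, lon = record.latitude, record.longitude
    # Records without coordinates cannot be placed on a map.
    if lat is None or lon is None:
        return False
    # Coordinates must fall inside the valid latitude/longitude ranges.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return False
    # (0, 0) is treated here as a placeholder for missing coordinates.
    if lat == 0.0 and lon == 0.0:
        return False
    return True


def scrub(records: Iterable[Occurrence]) -> List[Occurrence]:
    """Drop implausible records and exact duplicates, preserving order."""
    seen = set()
    kept = []
    for record in records:
        if not is_plausible(record):
            continue
        # Duplicates are identified by species name and rounded coordinates.
        key = (
            record.species.strip().lower(),
            round(record.latitude, 4),
            round(record.longitude, 4),
        )
        if key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept


raw = [
    Occurrence("Quercus alba", 38.9, -77.0),
    Occurrence("Quercus alba", 38.9, -77.0),   # exact duplicate
    Occurrence("Quercus alba", 0.0, 0.0),      # placeholder coordinates
    Occurrence("Pinus taeda", 123.4, -80.0),   # latitude out of range
    Occurrence("Pinus taeda", None, -80.0),    # missing latitude
    Occurrence("Pinus taeda", 33.5, -84.4),
]
clean = scrub(raw)
```

A production pipeline would presumably add further checks, such as taxonomic name resolution and tests against political boundaries or known ranges; those are outside the scope of this sketch.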
Results/Conclusions
The BIEN project contains enough data and ecoinformatics tools to demonstrate scalability, and is used to test key assumptions in conservation biology about the phylogenetic conservatism of species' climatic niches and the geographic constancy of diversity hotspots over time. The ongoing research strives to contribute (i) to scientific infrastructure, through the development of a scientific codebase for integrating and standardizing heterogeneous sources of observation data in biodiversity science, and (ii) to the production of high-quality species ranges from primary biodiversity data. Refinements of the BIEN pipeline will further the development, and enable the public release, of a massive and freely accessible database compiling occurrence, community, phylogeny, and trait data for all plants. These products are increasingly becoming available online and will benefit both basic and applied research in biodiversity science, particularly in conservation and in applications to citizen science.