Mon, Aug 02, 2021:On Demand
Background/Question/Methods
The volume and accessibility of biodiversity data have surged in recent decades, particularly driven by widespread museum specimen digitization initiatives and the birth of large-scale community science platforms. Although open access to large volumes of biodiversity data is a logical step towards biodiversity synthesis from local to global scales, the real value of big data is in its use, not volume. Effective use of these digitized data requires the integration of disconnected datasets (in the many thousands) and disparate data streams (such as observation-only vs. specimen-based data). A growing number of studies focus on gaps and biases in the data themselves, but the scientific impacts and patterns of data use have not been well quantified. We carried out a computational text analysis and review of 4,035 studies published from 2003 to 2019 that used data mediated by the world’s largest biodiversity data network, the Global Biodiversity Information Facility (GBIF). We asked: 1) Is data growth matched by research use? 2) Are certain data types used more, and by whom? 3) What research topics are studied, and have research foci changed through time?
Results/Conclusions
GBIF-mediated data have grown by 1,150% since 2007, with 1.6 billion records currently available. The increase was strongly driven by rapid growth in community science-generated data (11% of data in 2007, currently 65%). Museum digitization efforts also played a strong role, with 187.7 million specimens mobilized since 2007. Research use has similarly risen, with 723 publications in 2019 alone compared to 148 total publications from 2003-2009. Citable digital object identifiers (DOIs) have been issued with each GBIF data download since 2016 to facilitate both data attribution and reproducibility, though only a minority of publications properly cite a DOI (38% in 2019). Data integration facilitated global science, both in study scope (69% of studies from 2016-2019 spanned more than one continent) and in authorship (affiliations from across the world). However, legacies of scientific colonialism remain, with proportionally more research on the biodiversity of the Global South authored by researchers in the Global North. Although centered on biological subfields, GBIF-enabled research extended across all major scientific disciplines. Further development is needed, including continued data creation, digitization, and tools for effectively linking disparate data into an extended framework that integrates a constellation of data streams.