2018 ESA Annual Meeting (August 5 -- 10)

PS 29-71 - Environmental Data Initiative (EDI): Enabling reproducible ecology and environmental science

Wednesday, August 8, 2018
ESA Exhibit Hall, New Orleans Ernest N. Morial Convention Center
Kristin Vanderbilt (1), Corinna Gries (2), Mark S. Servilla (3), Duane Costa (3), Susanne Grossman-Clarke (2), Paul Hanson (2), Margaret O'Brien (4), Colin A. Smith (2) and Robert Waide (3); (1) University of New Mexico, Albuquerque, NM, (2) Center for Limnology, University of Wisconsin, Madison, WI, (3) Biology, University of New Mexico, Albuquerque, NM, (4) Marine Science Institute, University of California, Santa Barbara, Santa Barbara, CA
Background/Question/Methods

In support of open and reproducible science, funding agencies and journal publishers require that ecologists make their research data products accessible. For an ecologist facing this requirement for the first time, selecting a data repository, preparing the data for archiving, and documenting the data appropriately can be challenging. To support this process, the National Science Foundation (NSF) funded the Environmental Data Initiative (EDI), which maintains the EDI Data Repository. EDI serves the LTER Network (from which EDI grew), LTREB, MSB, and OBFS grant recipients, and others receiving NSF funding. With a combined 60+ years of experience curating and archiving ecological data, EDI’s curators are a valuable resource for ecologists striving to accelerate environmental science by enabling data re-use.

Results/Conclusions

Education and outreach are a core part of EDI’s mission. EDI conducts frequent training events, both in person and as webinars. To make data documentation easier, EDI has developed an open-source tool, the EML Assembly Line, which is widely used by participants in EDI’s training sessions. Webinar topics have included tutorials on git, on R for cleaning and manipulating datasets, and on web services for programmatic access to data from the Repository. EDI has also engaged synthesis working groups to create a common data model for species community data, easing the burden of harmonizing datasets for meta-analysis.

The EDI Repository contains over 42,000 datasets. It stands apart from other repositories by requiring that datasets uploaded to the Repository pass a small suite of basic checks to ensure that data and metadata are complete and in agreement. Its support for dataset versions is well suited to ongoing time-series collections. Data entered into the Repository are documented using Ecological Metadata Language (EML), which facilitates the capture of detailed metadata. Data from the Repository can be easily ingested into analytical tools for exploratory analysis, a feature appreciated by researchers. The high quality of EDI data and metadata ensures their reusability.
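
As an illustration of what a check that data and metadata are "in agreement" might look like, the following is a minimal Python sketch. It is not EDI's actual implementation: the function name check_congruence, the plain list of declared column names standing in for the attribute list of an EML document, and the example table are all assumptions made for this sketch. It only compares a CSV header and the field count of each row against the declared columns.

```python
import csv
import io


def check_congruence(csv_text, declared_columns):
    """Compare a CSV data table against column names declared in metadata.

    Hypothetical helper for illustration only. Returns a list of
    human-readable problems; an empty list means the table passed.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["data table is empty"]

    problems = []
    header, records = rows[0], rows[1:]

    # The header must list the declared columns, in the declared order.
    if header != declared_columns:
        problems.append(
            f"header {header} does not match metadata {declared_columns}"
        )

    # Every record must have exactly one field per declared column.
    for line_number, record in enumerate(records, start=2):
        if len(record) != len(declared_columns):
            problems.append(
                f"line {line_number}: expected {len(declared_columns)} "
                f"fields, found {len(record)}"
            )
    return problems


# Example with made-up data: the second record is missing a field.
declared = ["site", "date", "temp_c"]
table = "site,date,temp_c\nA,2018-08-08,21.5\nB,2018-08-09\n"
for problem in check_congruence(table, declared):
    print(problem)
```

A real check of this kind would presumably also compare data types, units, and missing-value codes declared in the metadata; the sketch stops at column names and field counts to stay short.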