In support of open and reproducible science, funding agencies and journal publishers require that ecologists make their research data products accessible. For an ecologist facing this requirement for the first time, selecting a data repository, preparing the data for archiving, and documenting them appropriately can be a challenge. To support this process, the National Science Foundation (NSF) funded the Environmental Data Initiative (EDI), which maintains the EDI Data Repository. EDI serves the LTER Network (from which EDI grew), LTREB, MSB, and OBFS grant recipients, and others receiving NSF funding. With a combined 60+ years of experience curating and archiving ecological data, EDI’s curators are a valuable resource for ecologists striving to accelerate environmental science by enabling data reuse.
Results/Conclusions
Education and outreach are central to EDI’s mission. EDI conducts frequent training events, either in person or as webinars. To make data documentation easier, EDI has developed an open-source tool, the EML Assembly Line, which is widely used by participants in EDI’s training sessions. Webinar topics have included tutorials on Git, on R for cleaning and manipulating datasets, and on web services for programmatic access to data from the Repository. EDI has also engaged synthesis working groups to create a common data model for species community data, easing the burden of harmonizing datasets for meta-analysis.
The EDI Repository contains over 42,000 datasets. It stands apart from other repositories by requiring that datasets uploaded to the Repository pass a small suite of basic checks to ensure that data and metadata are complete and in agreement. Its support for dataset versions is well suited to ongoing time-series collections. Data entered into the Repository are documented using Ecological Metadata Language (EML), which facilitates the capture of detailed metadata. Data from the Repository can be easily ingested into analytical tools for exploratory analysis, a feature appreciated by researchers. The high quality of EDI data and metadata ensures their reusability.
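To illustrate what a check that data and metadata are "in agreement" might look like, the following is a minimal sketch in Python. It is not EDI's actual implementation. The metadata fragment is a hypothetical, heavily simplified EML-style snippet (real EML uses XML namespaces and many more elements), and the function name `check_congruence`, the sample table, and the two checks shown (column names and record count) are assumptions made for illustration only.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified EML-style metadata for one data table.
# Real EML documents are namespaced and far more detailed.
EML_SNIPPET = """
<dataset>
  <dataTable>
    <entityName>plant_counts</entityName>
    <attributeList>
      <attribute><attributeName>site</attributeName></attribute>
      <attribute><attributeName>date</attributeName></attribute>
      <attribute><attributeName>count</attributeName></attribute>
    </attributeList>
    <numberOfRecords>2</numberOfRecords>
  </dataTable>
</dataset>
"""

# Hypothetical data table that the metadata above describes.
CSV_TEXT = "site,date,count\nA,2020-05-01,12\nB,2020-05-01,7\n"


def check_congruence(eml_text, csv_text):
    """Return a list of problems; an empty list means data and metadata agree."""
    table = ET.fromstring(eml_text).find("dataTable")
    declared = [a.findtext("attributeName") for a in table.iter("attribute")]
    expected_rows = int(table.findtext("numberOfRecords"))

    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]

    problems = []
    # Check 1: column names in the data match the attributes in the metadata.
    if header != declared:
        problems.append(f"column names {header} do not match metadata {declared}")
    # Check 2: the number of data records matches the declared record count.
    if len(records) != expected_rows:
        problems.append(
            f"found {len(records)} records, metadata declares {expected_rows}"
        )
    return problems


print(check_congruence(EML_SNIPPET, CSV_TEXT) or "no problems found")
```

A data table with a missing column or a truncated row set would produce one message per failed check, which is the kind of feedback a data provider could act on before resubmitting.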