2018 ESA Annual Meeting (August 5–10)

PS 28-69 - Making ecology more reproducible: Case studies and lessons from across the data-intensive sciences

Wednesday, August 8, 2018
ESA Exhibit Hall, New Orleans Ernest N. Morial Convention Center
Justin Kitzes, Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA; Daniel Turek, Mathematics and Statistics, Williams College, Williamstown, MA; and Fatma Deniz, Berkeley Institute for Data Science, University of California, Berkeley
Background/Question/Methods

Researchers across ecology and other data-intensive sciences increasingly recognize the importance of making their published research easily reproducible by their colleagues. The use of open software, open access publication, and the release of data and code are important steps in this direction, and are likely familiar to most ecologists. These simple steps, however, are not sufficient to guarantee reproducibility for more complex research projects involving many collaborators, massive data sets, multiple interacting codebases, legal restrictions on data and code, and other similar challenges. The practices necessary for achieving reproducibility in these scenarios are understood in broad terms. However, there has been little effort to gather concrete evidence on how researchers have actually succeeded, or failed, in overcoming such challenges.

To help fill this gap, we surveyed the state of reproducible research practices across the data-intensive sciences, including ecology, with the goal of better understanding common research approaches, major challenges, and prospects for the future. We invited narrative case study contributions from academic researchers across disciplines, at all career stages from graduate students through full professors. Researchers were asked to consider a single recent project, approximately the scale of one published paper or software product, and describe how they organized their data, code, and other inputs into a coherent workflow that supported reproducibility.

Results/Conclusions

We collected 31 detailed case studies from contributors across ecology, biology, environmental science, neuroscience, engineering, economics, and other disciplines. Each case study offered specific suggestions for researchers doing similar work, and these are too varied to summarize briefly; several common themes, however, emerged across contributions. All contributors wrote their own computer code in support of their analyses, with 55% using Python and 42% using R. Over 80% of contributors used a formal version control system for their work, most commonly git.

Six themes were repeatedly raised as critical practices for reproducibility: use version control for code, share data openly, automate analyses, document processes, test everything, and use free and open tools. Across disciplines, researchers frequently encountered similar challenges in achieving reproducibility, leading to calls for better training in tools that support reproducibility, improved configuration and build systems for portably packaging software, greater adoption of software testing, greater academic uptake of industry-led data science tools, and standards for sharing scrubbed or representative data.
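
As a minimal, hypothetical sketch of two of these themes in practice (automating an analysis and testing it), the short Python script below bundles a small self-test with a data-reading step. The file name, column names, and summary statistic are illustrative only and are not drawn from any particular case study.

    # Illustrative sketch of the "automate analyses" and "test everything" themes.
    # All names below (sample.csv, the "abundance" column, mean_abundance) are hypothetical.
    import csv
    import statistics
    from pathlib import Path


    def mean_abundance(csv_path):
        """Read a survey CSV and return the mean of its 'abundance' column."""
        with open(csv_path, newline="") as f:
            rows = list(csv.DictReader(f))
        return statistics.mean(float(r["abundance"]) for r in rows)


    def test_mean_abundance():
        """Self-test: a known input must give a known output."""
        sample = Path("sample.csv")
        sample.write_text("plot,abundance\nA,2\nB,4\n")
        assert mean_abundance(sample) == 3.0
        sample.unlink()  # remove the temporary test file


    if __name__ == "__main__":
        # Running the script re-runs the check before any real analysis,
        # so a broken data-reading step fails loudly rather than silently.
        test_mean_abundance()
        print("self-test passed")

In a fuller project, the same idea would typically be scaled up with a test runner such as pytest and a workflow tool or Makefile that re-runs the entire analysis from the raw data.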

The 31 case studies, along with a guide to reproducible research practices, were published in The Practice of Reproducible Research (UC Press, 2018), which is also available online at http://practicereproducibleresearch.org/.