2020 ESA Annual Meeting (August 3 - 6)

PS 48 Abstract - R from start to finish: Organizing your dissertation work with a reproducibility mindset using R and RStudio

Javiera Rudolph, Department of Biology, University of Florida, Gainesville, FL

Background/Question/Methods

As ecologists, and as scientists more generally, we are trained to ask questions and seek answers. We focus on collecting data, and sometimes we receive training in analyzing those data. Reproducibility, however, is not always the focus, and it often comes up only much later, in the analysis and writing phase. Over the last few years as a graduate student, I’ve helped multiple researchers set up reproducible workflows and environments, and I am always faced with the same question: where do I even start? Most of us already use R for our data analysis, but how many times have we tried running code on someone else’s computer, or even our own, only to find that it doesn’t run anymore? Indeed, lack of reproducibility is a growing concern, and multiple tools and frameworks have been developed to address this crisis. Yet the overwhelming number of resources for reproducible research is intimidating and can actually lead scientists to put it off until the next project instead of starting now. Drawing on my experience as a collaborator and author, I will use my own work as an example of writing a reproducible dissertation and showcase the top five tools in R and RStudio for starting your project with a reproducible workflow.

Results/Conclusions

The first and hardest step in any reproducible research workflow is finding the right name for your project. With that name, we can move forward and create an actual project or package. The first tool is the `usethis` package, which creates a project with a default directory structure. The same package is used to link an online git repository for version control. Second, the `renv` package manages dependencies and makes sure that upgrading or installing new packages won’t break your code. Third, the `here` package manages file path referencing across platforms. Fourth, starting your manuscript writing with `rmarkdown` allows you to keep track of different versions of the text and to incorporate your analysis into the manuscript. As for the fifth and last reproducibility item in our toolkit, you can use a spreadsheet for data entry, but any sort of data management and cleaning should be done with a script, and the raw data should be included in your project. Through collaborating with researchers at various stages, I’ve refined this process into a simple introduction to building your reproducible project. It has been implemented in several projects, including my own dissertation.
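
The workflow above might look roughly like the following in R. This is a minimal sketch rather than the exact setup used in the dissertation: the helper names (`start_project`, `clean_counts`, `clean_raw_file`), the folder layout (`data/raw`, `data/clean`), and the file and column names (`counts.csv`, `site`, `count`) are all hypothetical, chosen only for illustration.

```r
# Sketch only: helper names, paths, and column names are hypothetical.

# Tools 1-2: create the project skeleton and put it under version control.
start_project <- function(path) {
  usethis::create_project(path, open = FALSE)  # new project folder with an R/ subdirectory
  usethis::local_project(path)                 # make it the active project for the next call
  usethis::use_git()                           # initialise a git repository in the project
  # Afterwards, from inside the project:
  #   usethis::use_github()  # link an online repository (needs GitHub credentials)
  #   renv::init()           # project-local package library and lockfile
  #   renv::snapshot()       # re-run after installing or upgrading packages
}

# Tool 5: cleaning lives in a script, never in the spreadsheet itself.
clean_counts <- function(raw) {
  raw <- raw[!is.na(raw$count), ]        # drop rows with a missing count
  raw$site <- trimws(tolower(raw$site))  # normalise site labels
  raw
}

# Tool 3: build paths relative to the project root with here::here(),
# so the same script runs on any machine and operating system.
clean_raw_file <- function() {
  raw <- read.csv(here::here("data", "raw", "counts.csv"))
  tidy <- clean_counts(raw)
  dir.create(here::here("data", "clean"), recursive = TRUE, showWarnings = FALSE)
  write.csv(tidy, here::here("data", "clean", "counts.csv"), row.names = FALSE)
}
```

The raw file in `data/raw` is only ever read, so the cleaned data can always be regenerated from it. An `rmarkdown` manuscript (the fourth tool) could then read the cleaned file through the same `here::here()` calls.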