PS 41-69 - Open data and accessible services and tools foster science and software collaboration

Wednesday, August 14, 2019
Exhibit Hall, Kentucky International Convention Center
Michele Thornton (1), Bruce E. Wilson (2), Rupesh Shrestha (3), Yaxing Wei (1) and Hannah L. Blanco (4); (1) Environmental Sciences Division & Climate Change Science Institute, Oak Ridge National Laboratory, Oak Ridge, TN, (2) Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, (3) Environmental Sciences Division, Oak Ridge National Laboratory, (4) Environmental Sciences Division, ORNL DAAC, Oak Ridge, TN
Background/Question/Methods

The objectives of the study are to (1) compare the usage of varied access methods to open data of ecological significance and (2) describe the impact of these access methods on the data needs of a broad ecological sciences community. While computer and data archive scientists construct the systems that provide these access methods, the goal is for a diverse, multidisciplinary community to focus on its science objectives and spend less time on complex file formats and access mechanisms. We examine access patterns for two highly used sets of access tools provided by the NASA-funded ORNL DAAC. One provides access to Daymet, an almost 40-year, daily, gridded dataset of several surface weather variables over North America, used within the ecological community to understand biometeorological and bioclimatological conditions. The other provides access to land-based data products from the MODIS/VIIRS satellite instruments (MODIS Subsets). The methods that deliver these multidimensional, multivariate data are designed to be inclusive, accommodating a diverse scientific community. We examine how varied data access fosters data usage, agency applications, shared source code development, and journal publications relevant to ecologists and educators who leverage these data and services.

Results/Conclusions

The usage categories include download information by format and download type, journal citation and category, additional federal agencies that leverage services, and community development of open software packages. Analysis of download file types shows support for providing data in a range of formats. Daymet files in netCDF format have yearly direct downloads of ~35,000 files, while pre-derived climatologic files formatted as netCDF and GeoTIFF have ~42,000 downloads. CSV files for a single geographic location see yearly downloads of over 10 million single granules. MODIS Subsets for a total of ~1 million unique locations were retrieved by ~3,500 unique users in 2018. Downloads through specialized tools, APIs, and servers demonstrate that the capabilities provided (subsetting, automation, programmatic access) are valuable in accommodating user needs. The interdisciplinary audience is widened by agencies that develop higher-level applications (e.g., GeoDataPortal, spatialEco, zonalDaymet, National Phenology Network, AmeriFlux, FLUXNET), by contributors of tools that facilitate data service (e.g., DaymetR, MODISTools R, MODISPythonClient, phenor), and by educators (GEOG 576, MODIS/VIIRS Classroom Exercises). Impact is assessed by the number and category of journal publications that cite the dataset DOIs. Dataset citations have increased each year, reaching totals of 230 for Daymet and 473 for MODIS Subsets.