COS 64-6 - Data democratization and NEON: Improving data quality with digital tools

Wednesday, August 14, 2019: 3:20 PM
L004, Kentucky International Convention Center
Cody Flagg, NEON, Matt R.V. Ross, Natural Resource Ecology Laboratory, Colorado State University, Fort Collins, CO, Robert H. Lee, Terrestrial Instrumented Systems, National Ecological Observatory Network, Boulder, CO and Joshua A. Roberti, National Ecological Observatory Network (NEON), Boulder, CO
Background/Question/Methods

The National Ecological Observatory Network (NEON) is the first federally supported observatory with a “data-first” mission. Because NEON delivers over 80 different data products, it is more important than ever that it not only provide high-quality data to the community, but also empower internal and external users to work with, understand, and analyze these data. During peak sampling periods, NEON employs several hundred field staff across 16 different offices to collect, process, and review data prior to publication, and their experience with these tasks varies widely. Several internal initiatives focused on controlling and increasing data quality have produced a diverse set of digital tools that “democratize” data access and understanding for field staff, the most common internal data users. Developing these digital applications has moved NEON toward an observatory built on the four pillars of data literacy: finding data, understanding data, analyzing data, and communicating with data. These concepts matter externally, where we want NEON data users to be data literate, and internally, where data-literate collectors can improve data quality and provide better outreach experiences. Here, we examine how deploying digital tools and applications for field staff affects data quality.

Results/Conclusions

In one example, ground beetle capture, the introduction of digital applications reduced the rate of mislabeled specimens by roughly 21 percentage points (pre-digitization error rate: 22%; post-digitization: 0.38%). The share of records with duplicate or conflicting sample identifiers also decreased, from 1.8% of total data records (21,128 records entered before digitization) to 1% (26,348 records entered after). In 2018, field staff used digital data quality control tools to identify, report, and correct 16,669 of 138,001 records before publication. Automated data quality detection tools flagged a further 5,461 of those 138,001 records, which field staff then corrected before publication. Internally developed data visualization tools have enabled staff to identify and correct records for hundreds of mapped trees; this type of quality control tool improves alignment between NEON’s in-situ and airborne data. Similarly, a digital tool for assessing photographic metadata has kept hundreds of manually collected digital photos, used to estimate leaf area index, from being published before their metadata could be corrected.
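The reported rates can be checked with simple arithmetic. The Python sketch below uses only the numbers given above to compute the percentage-point and relative change in the mislabeling rate and the share of 2018 records caught by staff-driven and automated checks. It also includes a hypothetical duplicate/conflict check (`find_id_problems`, assuming each record is a dict with a `sample_id` key); this illustrates the kind of rule such tools might apply and is not NEON's actual implementation.

```python
from collections import defaultdict


def point_and_relative_change(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (percentage-point drop, relative drop in percent) between two rates."""
    point_drop = before_pct - after_pct
    relative_drop = 100.0 * point_drop / before_pct
    return point_drop, relative_drop


def share_pct(count: int, total: int) -> float:
    """Express count as a percentage of total."""
    return 100.0 * count / total


def find_id_problems(records):
    """HYPOTHETICAL check, not NEON's tool: group records by an assumed
    'sample_id' key and report IDs that are duplicated (identical records)
    or conflicting (same ID, differing content)."""
    by_id = defaultdict(list)
    for rec in records:
        by_id[rec["sample_id"]].append(rec)
    duplicates, conflicts = [], []
    for sid, recs in by_id.items():
        if len(recs) < 2:
            continue  # a single record per ID is fine
        if all(r == recs[0] for r in recs[1:]):
            duplicates.append(sid)
        else:
            conflicts.append(sid)
    return sorted(duplicates), sorted(conflicts)


if __name__ == "__main__":
    pts, rel = point_and_relative_change(22.0, 0.38)
    print(f"Mislabeling: -{pts:.2f} percentage points ({rel:.1f}% relative)")
    print(f"Staff-corrected share of 2018 records: {share_pct(16_669, 138_001):.1f}%")
    print(f"Auto-flagged share of 2018 records: {share_pct(5_461, 138_001):.1f}%")

    # Hypothetical example records; IDs and fields are illustrative only.
    demo = [
        {"sample_id": "A1", "taxon": "Carabus"},
        {"sample_id": "A1", "taxon": "Carabus"},
        {"sample_id": "B2", "taxon": "Pterostichus"},
        {"sample_id": "B2", "taxon": "Harpalus"},
        {"sample_id": "C3", "taxon": "Calosoma"},
    ]
    print(find_id_problems(demo))
```

With these inputs, the drop from 22% to 0.38% is about 21.6 percentage points, which corresponds to roughly a 98% relative reduction in the mislabeling rate; staff-driven and automated checks account for about 12.1% and 4.0% of the 2018 records, respectively.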