PS 88-185 - Theory-guided data science improves understanding and predictions of lake phosphorus dynamics

Friday, August 16, 2019
Exhibit Hall, Kentucky International Convention Center
Paul Hanson (1), Cayelan Carey (2), Xiaowei Jia (3) and Vipin Kumar (3), (1)Center for Limnology, University of Wisconsin, Madison, WI, (2)Biological Sciences, Virginia Tech, Blacksburg, VA, (3)Computer Science, University of Minnesota, Minneapolis, MN
Background/Question/Methods

The traditions of modeling in ecology can be viewed as falling along two axes: one data-driven and one mechanistic. Each has its advantages and limitations. Machine learning techniques, such as “deep learning”, show promise for pattern recognition and classification in fields with extremely large volumes of data. However, data volumes in ecology tend to be relatively small, so machine learning models are prone to over-parameterization. In contrast, process models have a long tradition in ecology as a means of instantiating ecological knowledge. However, process models struggle to capture the complex ecological interactions that produce complex signals in observational data, and as a result they are easily biased or overfit to the data. Theory-Guided Data Science (TGDS) is a rapidly growing approach that melds machine learning with process modeling. We used TGDS to predict 35 years of phosphorus dynamics in Lake Mendota, Wisconsin. We used a simple process model that includes phosphorus loading to the lake, sedimentation and burial, recycling, and export to downstream ecosystems. We compared the skill of the process model with that of a recurrent neural network (RNN) and with the TGDS approach, implemented as a Process-Guided Recurrent Neural Network (PGRNN).
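A process model with the four components listed above can be sketched as a two-pool mass balance: phosphorus in the water column gains external load and recycled sediment phosphorus, and loses phosphorus to settling and downstream export, while the sediment pool gains settled phosphorus and loses it to recycling and permanent burial. The sketch below is a minimal illustration under stated assumptions, not the authors' model: the first-order rate structure, the daily time step, the names `RateParams` and `simulate`, and all rate values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RateParams:
    """Daily first-order rate constants (hypothetical values, not fitted to Lake Mendota)."""
    settling: float = 0.02    # fraction of water-column P settling to sediments per day
    recycling: float = 0.005  # fraction of sediment P released back to the water per day
    burial: float = 0.002     # fraction of sediment P permanently buried per day
    flushing: float = 0.001   # fraction of water-column P exported downstream per day


def simulate(loads, params, water_p=0.0, sediment_p=0.0):
    """Step a hypothetical two-pool phosphorus mass balance forward one day per load value.

    Returns the water-column P trajectory, the final sediment P pool,
    and cumulative totals of load, export, and burial.
    """
    trajectory = []
    totals = {"load": 0.0, "export": 0.0, "burial": 0.0}
    for load in loads:
        # All fluxes use the pool sizes at the start of the day (explicit Euler step).
        settled = params.settling * water_p
        released = params.recycling * sediment_p
        buried = params.burial * sediment_p
        exported = params.flushing * water_p
        # Water column: + load + recycling - settling - export.
        water_p += load - settled + released - exported
        # Sediments: + settling - recycling - burial.
        sediment_p += settled - released - buried
        totals["load"] += load
        totals["export"] += exported
        totals["burial"] += buried
        trajectory.append(water_p)
    return trajectory, sediment_p, totals


if __name__ == "__main__":
    # Ten years of a constant daily load in arbitrary illustrative units.
    traj, sed_final, totals = simulate([10.0] * 3650, RateParams())
    print(f"final water-column P: {traj[-1]:.1f}, final sediment P: {sed_final:.1f}")
```

Because every flux leaves one pool and enters another or exits the system, total load always equals the final pool sizes plus cumulative export and burial (starting from empty pools), which is a useful sanity check. Recycling here is a constant first-order rate; a variant that made it depend on water temperature would correspond to the kind of temperature-dependent recycling component the results below describe as missing from the process model.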

Results/Conclusions

Of the three models, the PGRNN provided the most accurate predictions of lake phosphorus dynamics. The PGRNN also revealed bias in the process model, as well as trends and patterns that the RNN alone missed. The PGRNN indicated that three processes were missing from the process model: a decrease in phosphorus loading to the lake through time, an annual temperature-dependent component of phosphorus recycling, and more abrupt changes driven by seasonal mixing. Using the more accurate model, we estimated phosphorus retention in the lake at 73%. Reducing surface-water phosphorus concentrations from eutrophic to mesotrophic levels would require a 70% reduction in phosphorus loading sustained for three or more decades.
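Lake phosphorus retention is commonly expressed as a retention coefficient, the fraction of incoming phosphorus that is not exported downstream: R = 1 - P_out / P_in. The abstract does not state exactly how the 73% figure was computed, so the sketch below only illustrates this standard definition; the function name `retention_coefficient` and the input values are hypothetical, not data from the study.

```python
def retention_coefficient(p_in, p_out):
    """Fraction of incoming phosphorus retained in a lake: R = 1 - P_out / P_in.

    p_in:  total phosphorus load to the lake over a period (any mass unit)
    p_out: total phosphorus exported downstream over the same period (same unit)
    """
    if p_in <= 0:
        raise ValueError("phosphorus input must be positive")
    return 1.0 - p_out / p_in


if __name__ == "__main__":
    # Hypothetical annual totals chosen only to illustrate the arithmetic.
    print(round(retention_coefficient(100.0, 27.0), 2))
```

With these illustrative totals, 27 of every 100 units of incoming phosphorus leave the lake, so about 73% is retained, in the sediments or the water column.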