Bacterial pathogens in surface water pose disease risks to aquatic communities and to people engaged in recreational activities. Sources of these pathogens include runoff from urban, suburban, and agricultural point and non-point sources, but hazardous microbial levels can occur a considerable distance from the source. Identifying runoff source areas for corrective action, and predicting where and when problematic pathogen levels will occur, both depend on predictive models. However, the population dynamics of bacterial pathogens in surface water, and some of their influential covariates, are often nonstationary, which confounds model prediction and interpretation. We implement a model selection approach to compare time series methods based on stationary processes (e.g., multiple regression with appropriate time lags) with nonstationary methods that test for an error-correction mechanism, which indicates whether random walk-type processes are cointegrated.
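As an illustration of the error-correction idea, the following is a minimal sketch of a two-step (Engle-Granger style) procedure on synthetic data. It is not the analysis code used in this study: the function names `ols` and `engle_granger_ecm`, the simulated series, and all parameter values are assumptions chosen for illustration. A negative coefficient on the lagged residual is consistent with an error-correction mechanism, but a formal cointegration test would compare the estimate against the appropriate nonstandard critical values (for example, as implemented in `statsmodels.tsa.stattools.coint`).

```python
import numpy as np


def ols(y, X):
    """Least-squares coefficients for y = X @ b + e."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def engle_granger_ecm(y, x):
    """Two-step sketch: levels regression, then error-correction regression.

    Hypothetical helper for illustration; returns the long-run slope and
    the coefficient on the lagged equilibrium error.
    """
    n = len(y)
    # Step 1: long-run relation in levels, y_t = a + beta * x_t + u_t
    a, beta = ols(y, np.column_stack([np.ones(n), x]))
    u = y - (a + beta * x)
    # Step 2: dy_t = c + gamma * u_{t-1} + phi * dx_t + e_t
    dy, dx = np.diff(y), np.diff(x)
    Z = np.column_stack([np.ones(n - 1), u[:-1], dx])
    c, gamma, phi = ols(dy, Z)
    return beta, gamma


# Synthetic data (assumed values, not the monitoring data):
# x is a random walk; y tracks x with a stationary AR(1) deviation.
rng = np.random.default_rng(0)
n = 500
x = np.cumsum(rng.normal(size=n))
eps = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.5 * u[t - 1] + eps[t]
y = 2.0 + 1.5 * x + u

beta, gamma = engle_granger_ecm(y, x)
print(f"long-run slope: {beta:.2f}, error-correction coefficient: {gamma:.2f}")
```

In this simulated setting the two series share a common stochastic trend by construction, so the estimated error-correction coefficient should be negative, reflecting the pull of the response back toward its long-run relation with the covariate.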
Results/Conclusions
We analyze fecal indicator bacteria data collected by routine monitoring over multiple summers (2003-2009) at South Shore Park in Milwaukee, Wisconsin, together with potential explanatory variables that also have high temporal resolution. Over the course of the summer, pathogen concentrations tend to increase in near-shore freshwater systems due to a variety of factors presumed to increase runoff and the subsequent survival and reproduction of the pathogens (e.g., increased fertilizer use, fecal inputs, higher rainfall intensity associated with convective storms, and warmer air and water temperatures). These concurrent trends create the potential for false positive associations between the explanatory and response variables in the multiple linear regression. Compared with the multiple linear regression model, the cointegration approach gave greater weight to rainfall variables in predicting near-shore pathogen concentrations, although the performance of the two models was similar. Both models were very sensitive to the choice of lags for the explanatory variables.
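The risk of false positive associations from concurrent seasonal trends can be illustrated with a small synthetic example. The sketch below assumes two series that are generated independently except for a shared upward trend; the variable names `warming` and `log_counts` and all numeric values are hypothetical and are not drawn from the South Shore Park data. The series appear strongly associated in levels, while their day-to-day changes are essentially unrelated.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200)  # hypothetical sampling days within a season

# Two series with independent noise that both trend upward (assumed values).
warming = 0.05 * t + rng.normal(size=t.size)
log_counts = 0.05 * t + rng.normal(size=t.size)

# Correlation in levels is driven by the shared trend.
r_levels = np.corrcoef(warming, log_counts)[0, 1]

# First differences remove the trend, leaving only the independent noise.
r_diffs = np.corrcoef(np.diff(warming), np.diff(log_counts))[0, 1]

print(f"correlation in levels:      {r_levels:.2f}")
print(f"correlation in differences: {r_diffs:.2f}")
```

A regression fitted to the levels of such series would attribute the shared trend to the covariate, which is the kind of association that nonstationary methods are intended to guard against.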