Wed, Aug 04, 2021:On Demand
Background/Question/Methods
Effective near-term forecasting facilitates evaluation of model predictions against observations and is urgently needed to inform environmental decision making and effect societal change. Despite this imperative, ecology lacks a set of robust, standardized, and general mathematical tools for evaluating probabilistic forecasts, which impedes quantitative model comparison.
We address this gap by bringing to bear an extensive literature on probabilistic forecast evaluation from diverse fields including climatology, economics, and epidemiology. Recognizing the breadth of ecological data and the variety of tools already developed, we do not advocate a single evaluation metric. Instead, we cover the range of options, highlight the mathematical concepts to follow, and note decision points for practitioners, so that general principles can be applied easily to specific forecasting endeavors. We discuss six functions for evaluating (scoring) forecasts, as well as frequentist, Bayesian, and likelihood approaches to analyzing the data and models.
We illustrate these forecasting concepts with over 30 years of data on the desert pocket mouse (Chaetodipus penicillatus) in Portal, AZ, building and comparing three models: random walk, first-order autoregressive (AR(1)), and cyclic AR(1).
Results/Conclusions
We leverage studies in other domains to guide our decision making in evaluating the desert pocket mouse forecasts. Given our questions and system (e.g., recurrent collections), we implement Bayesian prequential modeling and evaluate forecast performance with the log and ranked probability scores. These scores allow for comparison to likelihood methods and incorporation of full predictive distributions, respectively. Using simulations, we highlight patterns (e.g., bias, excess precision) that graphical analyses help diagnose, as in our desert pocket mouse study. Throughout validation-period testing, the random walk and cyclic AR(1) were both well calibrated to the rodent data, albeit with a slight excess of variance, as shown by the peaks in their Probability Integral Transform (PIT) histograms. By comparison, the AR(1)’s PIT histogram showed strong modality at the upper range, indicating negative bias. The cyclic AR(1) was also the best model across the suite of rolling-origin evaluations, according to both scores. For the final test, however, the AR(1) performed best because its negative bias better matched the realized data over the final test period. This provides an important lesson: the best long-term model was not the best in this specific short-term evaluation. We conclude by highlighting how ecology can continue to learn from and help drive forecasting science.
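To make the scoring concepts concrete, the following is a minimal sketch of how the log score, the ranked probability score, and a PIT value might be computed for count forecasts. It is an illustration under stated assumptions, not the analysis code behind this abstract: the function names are hypothetical, forecasts are assumed to be available as predictive draws over non-negative counts, both scores are written so that lower is better (a sign convention we assume here), and simulated Poisson counts stand in for the rodent data.

```python
import numpy as np

def pmf_from_samples(samples, max_count):
    """Empirical predictive pmf over counts 0..max_count from predictive draws."""
    clipped = np.clip(samples, 0, max_count)
    counts = np.bincount(clipped, minlength=max_count + 1)
    return counts / counts.sum()

def log_score(pmf, obs, eps=1e-12):
    """Negative log predictive probability of the observed count (assumed sign)."""
    # eps floors zero-probability bins so the score stays finite.
    return float(-np.log(max(pmf[obs], eps)))

def ranked_probability_score(pmf, obs):
    """Sum of squared differences between the predictive and observed CDFs."""
    cdf = np.cumsum(pmf)
    obs_cdf = (np.arange(len(pmf)) >= obs).astype(float)
    return float(np.sum((cdf - obs_cdf) ** 2))

def pit_value(pmf, obs, rng):
    """Randomized PIT for counts: a uniform draw between F(obs - 1) and F(obs)."""
    cdf = np.cumsum(pmf)
    lower = cdf[obs - 1] if obs > 0 else 0.0
    return float(rng.uniform(lower, cdf[obs]))

# Simulated stand-in data (an assumption, not the Portal rodent counts).
rng = np.random.default_rng(1)
max_count = 60
obs = np.minimum(rng.poisson(10, size=500), max_count)

# Two hypothetical forecasts: one matching the data-generating mean, one biased low.
calibrated = pmf_from_samples(rng.poisson(10, size=20000), max_count)
biased_low = pmf_from_samples(rng.poisson(6, size=20000), max_count)

mean_ls_cal = np.mean([log_score(calibrated, y) for y in obs])
mean_ls_bias = np.mean([log_score(biased_low, y) for y in obs])
mean_rps_cal = np.mean([ranked_probability_score(calibrated, y) for y in obs])
mean_rps_bias = np.mean([ranked_probability_score(biased_low, y) for y in obs])

pit_cal = np.array([pit_value(calibrated, y, rng) for y in obs])
pit_bias = np.array([pit_value(biased_low, y, rng) for y in obs])

print("mean log score:", mean_ls_cal, mean_ls_bias)
print("mean RPS:      ", mean_rps_cal, mean_rps_bias)
print("mean PIT:      ", pit_cal.mean(), pit_bias.mean())
```

In this sketch, a histogram of `pit_cal` should look roughly flat, while `pit_bias` piles up near 1, the same upper-range pattern that signals a forecast biased low. The log score uses only the probability assigned to the observed count, whereas the ranked probability score compares the whole predictive CDF with the observation.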