COS 82-2 - Scientist's guide to developing explanatory hypotheses

Thursday, August 15, 2019: 8:20 AM
L010/014, Kentucky International Convention Center
James B. Grace, U.S. Geological Survey Wetland and Aquatic Research Center, Lafayette, LA and Kathryn M. Irvine, Northern Rocky Mountain Science Center, US Geological Survey, Bozeman, MT
Background/Question/Methods

Scientists commonly look to statisticians for instructions on how to conduct quantitative studies and analyze their data. What many fail to realize is that the responsibility for forming clearly interpretable explanatory hypotheses lies with the scientist, not the statistician. As an illustration of the issue, Burnham and Anderson (2002), in their classic text on multimodel inference, admonish scientists for not doing enough “critical thinking”. Their advice to scientists is to “think hard” about candidate models; however, the rules and procedures for how to think hard are not to be found in most statistics books (including Burnham and Anderson’s). Substantial advances in recent years now provide a reasonably complete system for evaluating the causal content of candidate models. In this talk, we illustrate the fundamental problems associated with drawing scientific inferences from regression models. We then explicate the essential ingredients for developing explanatory hypotheses and the associated process of causal analysis. Data from a study of post-fire ecosystem recovery are used to illustrate our points.

Results/Conclusions

For the case where there is a set of possible explanatory variables and a response variable of primary interest, we illustrate the extreme difficulty of formulating a set of competing models with clearly interpretable causal content. We use graphical analysis to reveal the fundamental problem inherent in regression models of the form y = f(X): within this equational framework, there is no capacity for representing testable hypotheses that explain the correlations among the predictors. Using data from our example study, we show why the absence of this capacity creates unresolvable ambiguity and renders such models suitable only for describing associations (i.e., descriptive rather than explanatory inference). We then present a set of principles that give scientists guidance on how to specify hypotheses for explanatory studies, and we briefly discuss methods of analysis appropriate for explanatory hypotheses. Finally, we emphasize the importance of sequential learning as a method for building mechanistic understanding.
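
The following minimal sketch (not part of the study; it uses simulated data, and all variable names and coefficients are illustrative assumptions) shows the kind of ambiguity described above. Two different hypothetical causal structures are simulated, yet the regression y = f(x1, x2) returns essentially the same coefficients for both, so it cannot distinguish the explanations.

    # Sketch of the ambiguity in y = f(X) regression (illustrative only).
    # Model A (mediation):    x1 -> x2 -> y
    # Model B (common cause): x2 -> x1 and x2 -> y
    # Both imply correlated predictors; the regression fits them identically.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    def fit_ols(X, y):
        """Return least-squares coefficients (with intercept) and R^2."""
        X1 = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        resid = y - X1 @ beta
        return beta, 1 - resid.var() / y.var()

    # Model A: x1 causes x2, x2 causes y (x1 affects y only through x2)
    x1_a = rng.normal(size=n)
    x2_a = 0.8 * x1_a + rng.normal(size=n)
    y_a = 0.6 * x2_a + rng.normal(size=n)

    # Model B: x2 is a common cause of x1 and y (x1 has no effect on y)
    x2_b = rng.normal(size=n)
    x1_b = 0.8 * x2_b + rng.normal(size=n)
    y_b = 0.6 * x2_b + rng.normal(size=n)

    beta_a, r2_a = fit_ols(np.column_stack([x1_a, x2_a]), y_a)
    beta_b, r2_b = fit_ols(np.column_stack([x1_b, x2_b]), y_b)

    # In both cases the coefficient for x1 is near zero and that for x2 is
    # near 0.6, although the causal stories differ sharply: in Model A an
    # intervention on x1 would change y; in Model B it would not.
    print("Model A coefficients:", np.round(beta_a, 2), "R^2:", round(r2_a, 2))
    print("Model B coefficients:", np.round(beta_b, 2), "R^2:", round(r2_b, 2))

Distinguishing the two explanations requires a model form that also represents the relations among the predictors (e.g., a structural equation or graphical model), which is the capacity the y = f(X) framework lacks.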