The statistical evaluation of ecological theory – A continuum of refutation/confirmation

Aho, Ken

Background/Question/Methods

Overarching perspectives in statistical hypothesis testing and model evaluation (frequentist, information-theoretic, and Bayesian) have led to methods that vary widely in their underlying philosophy and intended purpose. Unfortunately, these distinctions are often poorly understood by ecologists, leading to model misapplication and misinterpretation. For example, many theoretical frameworks in ecology attempt to depict particular natural conditions mathematically, such as “default” or “no effect” patterns. This is done because an explicit effect (often 0) can be set for H₀, whereas H_A can define only “some effect” distinct from H₀. As a result, many ecologists have quantified the validity of such models and predictions by setting them as null hypotheses in frequentist significance tests. Frequentist significance tests, however, do not allow empirical confirmation of null (or alternative) hypotheses. Instead, under the conventional severe-falsificationist framework we “reject” or provisionally “fail to reject” H₀.

We can compare the perspectives of widely disparate statistical methods using the likelihood ratio test statistic, X² (two times the difference between the H_A and H₀ log-likelihoods). Under a conventional significance testing perspective (α = 0.05, with H_A containing one more parameter than H₀), the line of demarcation between H₀ and H_A is the critical value X² = 1.96². For AIC and BIC, by contrast, this line is X² = 2 and X² = log(n), respectively. For practical and comparative purposes, I. J. Good’s “Bayes/non-Bayes compromise” has the demarcation line X² = Φ^-1{1 – 0.25/n^0.5}, where Φ^-1(p) denotes the probit function at probability p. These lines can be used to demonstrate graphically how the aforementioned methods diverge as n increases.
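The demarcation lines described above can be tabulated with a short sketch. It uses only the Python standard library; the function name `demarcation_lines` and the chosen sample sizes are illustrative, not taken from the abstract. One assumption is flagged in the code: the expression given for Good’s compromise returns a probit (z-scale) value, and the sketch squares it to place it on the X² scale, which makes it coincide with 1.96² at n = 100.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution; inv_cdf is the probit function


def demarcation_lines(n):
    """X^2 cut-offs between H0 and HA for a one-parameter comparison at sample size n."""
    # Probit value from the expression given for Good's compromise.
    z_good = Z.inv_cdf(1 - 0.25 / math.sqrt(n))
    return {
        "significance": 1.96 ** 2,  # conventional two-sided test, alpha = 0.05
        "AIC": 2.0,                 # does not depend on n
        "BIC": math.log(n),         # grows with n
        # ASSUMPTION: squared here so the value sits on the X^2 scale.
        "Good": z_good ** 2,
    }


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        lines = demarcation_lines(n)
        print(n, {name: round(value, 2) for name, value in lines.items()})
```

Printing the lines over a range of n shows the significance-test and AIC cut-offs staying fixed while the BIC and Good cut-offs rise with sample size.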

By considering effect size, and thus defining the distribution of X² under H_A, our approach can also be used to intuitively demonstrate the strong consistency of BIC and Good’s compromise in model selection. Strong consistency requires that as sample size approaches infinity the true model, from a group of models, will be selected. Our approach can also be extended to consider the behavior of metrics for models with widely differing numbers of parameters.
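The consistency argument can be illustrated numerically. The sketch below is not the authors’ implementation; it assumes a one-parameter comparison in which X² behaves like (Z + δ√n)², where Z is standard normal and δ is a hypothetical standardized effect size (δ = 0 corresponds to H₀ being true). The helper names `cutoffs` and `prob_select_HA` are illustrative, and Good’s cut-off is squared to the X² scale as an assumption.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def cutoffs(n):
    """X^2 cut-offs for each method at sample size n (one-parameter comparison)."""
    z_good = Z.inv_cdf(1 - 0.25 / math.sqrt(n))
    return {
        "significance": 1.96 ** 2,
        "AIC": 2.0,
        "BIC": math.log(n),
        "Good": z_good ** 2,  # ASSUMPTION: probit value squared to the X^2 scale
    }


def prob_select_HA(cutoff, n, delta):
    """P(X^2 > cutoff) when X^2 = (Z + delta * sqrt(n))^2 (assumed model)."""
    root_c = math.sqrt(cutoff)
    shift = delta * math.sqrt(n)
    # Upper tail plus lower tail of the shifted normal.
    return (1 - Z.cdf(root_c - shift)) + Z.cdf(-root_c - shift)


if __name__ == "__main__":
    for delta in (0.0, 0.1):  # H0 true, then a small true effect
        print("delta =", delta)
        for n in (100, 10_000, 1_000_000):
            probs = {m: round(prob_select_HA(c, n, delta), 4)
                     for m, c in cutoffs(n).items()}
            print("  n =", n, probs)
```

Under these assumptions, every method selects H_A with probability approaching 1 when δ > 0, but only BIC and Good’s compromise drive the probability of selecting H_A toward 0 when δ = 0. The significance test and AIC keep a fixed error rate under H₀ regardless of n.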

Results/Conclusions

The demarcation lines/surfaces for significance testing, AIC, BIC, and Good’s compromise represent locations along a conceptual continuum of hypothesis refutation/confirmation. The demarcation lines for AIC and significance testing do not depend on sample size; thus, whenever any nonzero effect exists, however small, these methods will reject H₀ with probability approaching 1 as sample size grows large. These methods are therefore strongly refutative. On the other hand, BIC and Good’s compromise demand more evidence against H₀ for rejection as sample size increases. These methods were intended to confirm the correct hypothesis, whether H_A or H₀.

SYMP 9-4 - The statistical evaluation of ecological theory – A continuum of refutation/confirmation