Tue, Aug 16, 2022: 10:30 AM-10:45 AM
518B
Background/Question/Methods
The analysis of co-occurrence data is over a century old and has produced roughly 80 metrics of association. These indices, of which Jaccard, Sørensen-Dice, and Simpson are the most common, are used to estimate species co-existence and beta diversity. However, researchers have long struggled with their unruly behavior, which produces patterns that substantially contradict ecological theory, insight, and expectation. The indices are sensitive to the prevalences of the entities they describe, and this sensitivity undermines their interpretability. Over the last few decades, researchers have tried to deal with the problem by standardizing either the counts or the indices themselves. These approaches seem to address only part of the problem, leaving us to wonder where its real source lies.
As generations of researchers have tested various approaches to fix the existing indices without complete success, it is time to recognize that the whole effort to estimate similarity in co-occurrence data has followed the wrong approach: we have ignored the fact that a reliable metric should emerge from mapping the underlying ecological mechanisms to mathematical models and statistical distributions.
Results/Conclusions
We have now developed a novel metric of association (Science Advances 8, eabj9204, 2022), derived from an established hypergeometric distribution, that corrects for the pervasive flaws in co-occurrence analysis and properly estimates the departure from nullity with an interpretable degree of association, called alpha. Being insensitive to prevalence, alpha correctly characterizes positive and negative associations across the full spectrum of prevalence values. By mapping index values against the cumulative probability, we show that the same value of Jaccard’s index (J) can mean extreme positive affinity in one example and extreme negative affinity in another. Alpha, in contrast, exhibits the characteristics of a reliable statistic: the null expectation always maps to alpha = zero, and a particular value always maps to the same cumulative probability. Published datasets reanalyzed with both alpha and J yield profoundly different biological inferences. For example, a published analysis using J contradicted the predictions of island biogeography theory, finding that community stability increased with increasing physical isolation. Reanalysis of the same dataset with the maximum likelihood estimate (MLE) of alpha reversed that result and supported the theoretical predictions. We found similarly marked effects in reanalyses of antibiotic cross-resistance and human disease biomarkers. Our index is not merely an improvement; its use changes data interpretation in fundamental ways.
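The claim that one value of J can correspond to opposite affinities can be illustrated with a minimal sketch. It is not the authors' implementation and does not compute alpha; it only compares Jaccard's index with the cumulative probability of the observed co-occurrence count under a standard (null) hypergeometric model. The site counts, prevalences, and function names below are hypothetical values chosen for illustration.

```python
from math import comb


def jaccard(x, m_a, m_b):
    """Jaccard's index: shared sites divided by sites occupied by either species."""
    return x / (m_a + m_b - x)


def hypergeom_cdf(x, n_sites, m_a, m_b):
    """P(X <= x) when species B's m_b sites are drawn at random from n_sites,
    of which m_a are occupied by species A (null model of no association)."""
    lowest = max(0, m_a + m_b - n_sites)  # smallest possible overlap
    total = comb(n_sites, m_b)
    favorable = sum(
        comb(m_a, k) * comb(n_sites - m_a, m_b - k)
        for k in range(lowest, x + 1)
    )
    return favorable / total


# Hypothetical example 1: two rare species (10 of 100 sites each), 5 shared sites.
j_rare = jaccard(5, 10, 10)
p_rare = hypergeom_cdf(5, 100, 10, 10)

# Hypothetical example 2: two common species (60 of 100 sites each), 30 shared sites.
j_common = jaccard(30, 60, 60)
p_common = hypergeom_cdf(30, 100, 60, 60)

print(f"rare pair:   J = {j_rare:.3f}, null cumulative probability = {p_rare:.4f}")
print(f"common pair: J = {j_common:.3f}, null cumulative probability = {p_common:.4f}")
```

Both pairs have J = 1/3. For the rare pair, the observed overlap sits in the upper tail of the null distribution (far more co-occurrence than the expected 1 site), whereas for the common pair it sits in the lower tail (fewer than the expected 36 sites), so the same index value points in opposite directions.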