Ecologists have long been interested in ordination, that is, ordering ecological data according to some underlying gradient. For instance, one could consider a species survey along the elevational gradient of a mountainside. Early on, it was realized that Principal Component Analysis (PCA) could be a powerful, agnostic tool in this effort: when data following such a gradient are analyzed with PCA, the first two axes invariably depict a characteristic arch (or "horseshoe"), which, in the mountainside example, traces through the sites in order of elevation. For some time this arch was viewed as a statistical artifact, but due in part to an increasing understanding of the underlying mathematics, more recent efforts have shifted away from "correcting" the ordination coordinates and toward viewing the arch as informative in its own right. Nevertheless, the arch still presents challenges for extracting the optimal ordering of the data, especially for real ecological datasets, which are often small and noisy. We introduce a new approach for identifying the correct ordering of one-dimensional data with minimal assumptions, demonstrating its efficacy on both simulated and empirical datasets.
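The arch described above can be reproduced with a small numerical sketch. The setup below is an illustrative assumption, not the simulation framework used in this work: noiseless Gaussian species response curves with a single shared width, evenly spaced optima, and sites placed evenly on a unit gradient. Under those assumptions the first axis tracks site position and the second is close to a quadratic function of position, which is what draws the arch when the two are plotted against each other.

```python
import numpy as np

# Illustrative setup (all values are assumptions, not taken from the study)
n_sites, n_species = 50, 30
gradient = np.linspace(0.0, 1.0, n_sites)       # site positions on the gradient
optima = np.linspace(-0.2, 1.2, n_species)      # species optima, extending past both ends
width = 0.15                                    # shared niche width

# Site-by-species matrix of Gaussian (unimodal) responses
abundance = np.exp(-0.5 * ((gradient[:, None] - optima[None, :]) / width) ** 2)

# PCA via SVD of the column-centered matrix; rows of `scores` are site coordinates
centered = abundance - abundance.mean(axis=0)
u, s, _ = np.linalg.svd(centered, full_matrices=False)
scores = u * s
pc1, pc2 = scores[:, 0], scores[:, 1]

def rank(v):
    """Ranks of the entries of v (no tie handling; ties do not occur here)."""
    return np.argsort(np.argsort(v))

# Spearman-type correlation of PC1 with gradient position (the sign of a PC is arbitrary)
rho_pc1 = abs(np.corrcoef(rank(pc1), rank(gradient))[0, 1])

# Correlation of PC2 with a quadratic in position, which is what produces the arch
r_pc2 = abs(np.corrcoef(pc2, (gradient - 0.5) ** 2)[0, 1])

print(f"PC1 vs. position (rank correlation): {rho_pc1:.3f}")
print(f"PC2 vs. squared distance from midpoint: {r_pc2:.3f}")
```

Plotting `pc1` against `pc2` for this matrix shows the sites laid out along a curve in gradient order. Adding sampling noise or shortening the niche widths relative to the gradient makes that ordering harder to read off either axis alone.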
Results/Conclusions
The ability to easily identify one-dimensional gradients in ecological data using PCA is an oft-overlooked use for this otherwise widespread technique. Unfortunately, moving from simply identifying such a gradient to actually ordering the data in question has proven more difficult. Using a simulation framework that allows maximum flexibility in implementing levels of variation across site and species distributions, we introduce a method that recovers correct orderings for a wide range of simulated species distributions. We further demonstrate the usefulness of this technique with empirical datasets, extracting a known ordering from a nationwide plant survey and extracting an ordering for a previously unanalyzed ecological dataset, for which we then explore potentially explanatory ecological correlates. Using the simulation framework, we also demonstrate numerically some bounds for order recoverability and consider possible extensions to data corresponding to gradients in more than one dimension.
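As a rough illustration of how order recoverability might be scored in simulation, the sketch below generates noisy count data along a hidden gradient and measures how well a naive baseline, sorting sites by their first principal component, matches the true order. Both the simulator (`simulate_counts`) and the baseline (`pc1_recovery`) are hypothetical stand-ins with assumed parameter values. They are not the simulation framework or the ordering method introduced in this work.

```python
import numpy as np

def simulate_counts(rng, n_sites=40, n_species=30, mean_peak=20.0):
    """Hypothetical simulator: Poisson counts from Gaussian response curves
    whose optima, widths, and peak abundances vary by species."""
    positions = np.sort(rng.uniform(0.0, 1.0, n_sites))   # hidden site positions
    optima = rng.uniform(-0.2, 1.2, n_species)
    widths = rng.uniform(0.10, 0.25, n_species)
    peaks = rng.gamma(shape=4.0, scale=mean_peak / 4.0, size=n_species)
    expected = peaks * np.exp(-0.5 * ((positions[:, None] - optima) / widths) ** 2)
    return positions, rng.poisson(expected)

def rank(v):
    """Ranks of the entries of v (ties broken arbitrarily)."""
    return np.argsort(np.argsort(v))

def pc1_recovery(positions, counts):
    """Naive baseline: order sites by PC1 and return the absolute rank
    correlation between that order and the true positions."""
    centered = counts - counts.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pc1 = u[:, 0] * s[0]
    rho = np.corrcoef(rank(pc1), rank(positions))[0, 1]
    return abs(rho)   # the sign of a principal component is arbitrary

rng = np.random.default_rng(0)
scores = [pc1_recovery(*simulate_counts(rng)) for _ in range(20)]
mean_score = float(np.mean(scores))
print(f"mean rank correlation over 20 replicates: {mean_score:.3f}")
```

Varying `n_sites`, `mean_peak`, or the range of `widths` in such a loop gives one simple way to see where a baseline ordering starts to break down, for example with few sites or sparse counts.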
As ecological datasets increase in resolution and size and are increasingly supplemented with meta-information (e.g., taxonomy and traits) about the species being observed, this approach to ordination could allow better inference of the factors driving gradients in species distributions.