With the ongoing threat of climate change and the cumulative impact of other disturbances, the management of ecosystems requires a sound understanding of how they respond to forcing over long time periods. Palaeoecological time series provide one of the few ways in which ecologists can track ecosystem responses to environmental change at relatively high temporal resolution. However, the irregular temporal spacing and highly multivariate nature of the data have resulted in tools that are not especially well suited to addressing these questions: tools assume either gradual change or threshold-like responses, not both, and further analyses are required to interpret the fitted models and elucidate the effects of environmental change on species composition or community dynamics.
Here we illustrate Latent Dirichlet Allocation (LDA), one of several topic models introduced to the field of computational learning in the last decade. Topic models categorise documents into groups on the basis of the topics found within them. The words that are most associated with each topic, as well as the number of topics, are learned from the data. When applied to palaeoecological data, the documents are the samples, the words are the species, and the topics are associations of species that tend to co-occur. In LDA, the relative frequency of species associations in each sample and the relative frequency of species in each association are described by Dirichlet distributions.
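The generative structure described above can be sketched in a few lines of Python. This is only an illustration of how LDA assumes counts arise, not the fitting procedure used in this study: the function names, the symmetric Dirichlet parameters `alpha` and `beta`, and all sizes are hypothetical choices made for the example.

```python
import random


def dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_samples(n_samples, n_species, n_assoc, count_per_sample,
                     alpha=0.5, beta=0.5, seed=1):
    """Simulate species counts under the LDA generative model.

    alpha and beta are assumed symmetric Dirichlet parameters
    (illustrative values, not estimates from any data set).
    """
    rng = random.Random(seed)
    # Relative frequency of each species within each association.
    assoc_species = [dirichlet([beta] * n_species, rng) for _ in range(n_assoc)]
    counts, sample_assoc = [], []
    for _ in range(n_samples):
        # Relative frequency of each association within this sample.
        theta = dirichlet([alpha] * n_assoc, rng)
        sample_assoc.append(theta)
        row = [0] * n_species
        for _ in range(count_per_sample):
            # Each counted individual first picks an association, then a species.
            k = rng.choices(range(n_assoc), weights=theta)[0]
            s = rng.choices(range(n_species), weights=assoc_species[k])[0]
            row[s] += 1
        counts.append(row)
    return counts, sample_assoc, assoc_species


# Hypothetical sizes: 5 samples, 8 species, 3 associations, 300 individuals each.
counts, sample_assoc, assoc_species = simulate_samples(5, 8, 3, 300)
```

Fitting LDA runs this logic in reverse: given only the sample-by-species count matrix, it estimates the two sets of proportions that most plausibly generated it.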
Results/Conclusions
We apply LDA to published palaeoecological time series and compare the results obtained with those reported in the original sources. In Foy Lake (Montana), LDA recovers the main periods of change in the 7000-year diatom record and distinguishes between periods of gradual and rapid compositional change with a single model, capturing the period of instability prior to a regime shift in the diatom community at c. 1300 years BP. The resulting species associations simplify the description of the major compositional changes and allow an assessment of the degree to which the associations present after the regime shift are novel or were previously observed in the record. We further investigate the emergence of novel ecosystems via LDA using Holocene fossil pollen records from North America extracted from the Neotoma database.
We conclude by considering improvements to our models through the use of dynamic topic models, and we place these models in the broader class of hierarchical Dirichlet process models, which should enable more formal statistical approaches to fitting models and selecting the number of associations in a data set.