Microbial communities amplify the familiar ecological problem of pattern and scale (sensu Simon Levin). To understand how microbial communities function in materials recycling, and how their structure and function respond to climate change, we need to know the phylogenetic scale of investigation that corresponds to the pattern of interest: how should microbial sequences be grouped to give a coherent picture of community structure and function? Here, I present a new method, phylogenetic factorization, for factoring microbial community data and, in fact, any ecological community dataset. Phylogenetic factorization is a hierarchical clustering model that identifies the edges in the phylogeny along which putative functional ecological traits arose and iteratively splits the phylogeny along these edges into functional ecological groups with similar responses to environmental metadata.
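As a rough illustration of the iterative edge-splitting idea, the Python sketch below runs a greedy factorization on a toy five-tip tree. It is a simplified stand-in, not the 'phylofactor' package or its API: the tree, abundances, and pH values are made-up toy data, the function names (`clades`, `balance`, `factorize`) are hypothetical, and each candidate edge is scored with the R² of a simple linear regression rather than a generalized linear model. The contrast for an edge is the difference in mean log-abundance between the two groups of tips it separates (the scaling constant of an isometric log-ratio balance is omitted, since it does not affect R²).

```python
def clades(tree):
    """Return (tips of tree, list of tip sets below every edge, plus the root set)."""
    if isinstance(tree, str):
        return frozenset([tree]), [frozenset([tree])]
    tips, found = frozenset(), []
    for child in tree:
        child_tips, child_clades = clades(child)
        tips |= child_tips
        found += child_clades
    return tips, found + [tips]


def balance(log_abund, group, rest):
    """Per-sample contrast: mean log-abundance of `group` minus that of `rest`."""
    n_samples = len(next(iter(log_abund.values())))
    return [
        sum(log_abund[t][i] for t in group) / len(group)
        - sum(log_abund[t][i] for t in rest) / len(rest)
        for i in range(n_samples)
    ]


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def factorize(tree, log_abund, env, n_factors):
    """Greedily split the tips along the edge whose contrast is best explained by env."""
    all_tips, edge_clades = clades(tree)
    bins, factors = [all_tips], []
    for _ in range(n_factors):
        best = None
        for current in bins:
            seen = set()
            for clade in edge_clades:
                group, rest = clade & current, current - clade
                # Skip edges that do not split this bin, and duplicate splits.
                if not group or not rest or group in seen or rest in seen:
                    continue
                seen.add(group)
                score = r_squared(env, balance(log_abund, group, rest))
                if best is None or score > best[0]:
                    best = (score, current, group, rest)
        if best is None:  # every bin is a single tip; nothing left to split
            break
        score, current, group, rest = best
        bins.remove(current)
        bins += [group, rest]
        factors.append((group, rest, score))
    return factors, bins


# Toy data (assumed for illustration): tips D and E increase with pH, A-C do not.
tree = ((("A", "B"), "C"), ("D", "E"))
env = [4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical soil pH per sample
log_abund = {
    "A": [1.0, 1.2, 0.9, 1.1, 1.0],
    "B": [2.0, 1.9, 2.1, 2.0, 2.2],
    "C": [0.5, 0.4, 0.6, 0.5, 0.5],
    "D": [0.0, 1.0, 2.1, 2.9, 4.0],
    "E": [0.2, 1.1, 1.9, 3.1, 3.9],
}
factors, bins = factorize(tree, log_abund, env, n_factors=2)
for group, rest, score in factors:
    print(sorted(group), "vs", sorted(rest), "R^2 =", round(score, 3))
```

In this toy example the first split separates the clade {D, E} from {A, B, C}, the edge along which the pH response changes; later splits are then searched only within the resulting bins, which is what makes the clustering hierarchical.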
Results/Conclusions
Phylogenetic factorization of soil microbiome datasets reveals phylogenetic splits invisible to taxonomic analysis, yielding the best-explained functional ecological groups associated with changes in pH, nitrogen concentration, and other environmental variables. Identified phylogenetic features can be compared and cross-validated across datasets, and reproducible groups of microbes corresponding to features on the phylogeny enable researchers to explore the genomic differences underlying the microbial functional groups. An R package, 'phylofactor', is available. It is built around generalized linear modelling, which gives researchers the freedom to define their pattern of interest and objectively discover the corresponding phylogenetic scale of investigation.