COS 10-6 - On sampling and sampling effort in microbial ecology

Monday, August 12, 2019: 3:20 PM
L005/009, Kentucky International Convention Center
Mario E. Muscarella (1), Ashkaan K. Fahimipour (2) and James O'Dwyer (1); (1) Department of Plant Biology, University of Illinois, Urbana, IL, (2) National Oceanic and Atmospheric Administration, Santa Cruz, CA
Background/Question/Methods

One major challenge in microbial ecology is our ability to fully sample communities. Molecular approaches rely on extracting and sampling genomes and fragments of nucleic acid from gut, soil, water, or other environmental samples. But molecular surveys often capture only a small percentage of the total organisms in a sample, and two environments may have different total abundances. Likewise, observations may not be independent because organisms can be linked through species interactions and genes are packaged within genomes. Furthermore, these samples often reflect mixed pools of organisms where we may be sampling from active and dormant populations or even intact organisms and relic DNA remaining from dead organisms. As a result, we need to better understand how variation in sampling effort, sampling from communities that differ in abundances, and sampling from mixed pools affect our ability to describe microbial diversity. Here we use simulations based on microbial sequence abundance distributions to determine the impact of each of these sampling issues. We model our abundance distributions based on observed data across ecosystem types. In addition, we derive expectations for sampling individual taxa given their true abundances and various assumptions about organism and gene independence.
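
As an illustration of what an expectation for sampling an individual taxon can look like, the following minimal sketch covers only the simplest case: every organism is sampled independently and without replacement, so a taxon's count in the sample is hypergeometric. This is an assumption made for the example, not the derivation used in this study, and the function names are hypothetical. The non-independent cases mentioned above are not covered.

```python
import math


def detection_probability(abundance: int, community_size: int, sample_size: int) -> float:
    """Probability that a taxon appears at least once in the sample.

    Assumes sample_size individuals are drawn without replacement from
    community_size individuals, each equally likely to be drawn.
    """
    if abundance <= 0 or sample_size <= 0:
        return 0.0
    # If fewer than `abundance` individuals are left unsampled, the taxon
    # cannot be missed entirely.
    if abundance > community_size - sample_size:
        return 1.0
    # P(absent) = prod_{j=0}^{abundance-1} (N - m - j) / (N - j),
    # accumulated in log space to stay stable for large communities.
    log_p_absent = sum(
        math.log1p(-sample_size / (community_size - j)) for j in range(abundance)
    )
    return 1.0 - math.exp(log_p_absent)


def expected_count(abundance: int, community_size: int, sample_size: int) -> float:
    """Expected number of sampled individuals belonging to the taxon."""
    return sample_size * abundance / community_size


# Example: a taxon with 20 individuals in a community of 1,000,000,
# sampled at 10% and at 1% effort.
for effort in (0.10, 0.01):
    m = round(effort * 1_000_000)
    print(effort, detection_probability(20, 1_000_000, m), expected_count(20, 1_000_000, m))
```

Under this independence assumption, rare taxa are the ones most affected by reduced effort, because their detection probability falls fastest as the sample shrinks.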

Results/Conclusions

We found that, regardless of diversity metric, sampling effort has a major impact on our ability to accurately compare communities from molecular censuses. For example, if we repeatedly sample the same community with a 10% effort, we should expect ca. 15 – 20% variation between replicate samples, and if we only sample with a 1% effort our expected variation increases to > 50%. Results differ, however, depending on the assumed independence of organisms or genes, and non-independence decreases the expected variation. While these sampling efforts seem low, they represent typical microbiome studies where 10^5 – 10^6 sequences are generated for microbial communities which often contain 10^6 – 10^8 organisms. Last, we show that due to the processes generating mixed pools of organisms, our downstream analyses can be either robust or highly sensitive to the sampling of mixed pools when we intend to base our conclusions on only the active or intact organisms. For example, we found that mixed pools generated under stochastic processes are more robust than those generated deterministically. Our results highlight the importance of sampling effort in molecular surveys of microbial communities and the trade-off between effort and the ability to accurately address biological questions.
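
The kind of replicate-sampling experiment described above can be sketched as follows. This is an illustrative toy, not the simulation used in this study. The lognormal abundance model, the community size, the number of taxa, and the use of Bray-Curtis dissimilarity as the measure of variation are all assumptions made for the example, and the helper names are hypothetical. The values it prints depend on those choices and should not be expected to match the percentages reported above.

```python
import random
from collections import Counter


def build_community(n_taxa, community_size, sigma, rng):
    """Return one taxon label per individual.

    Taxon weights are drawn from a lognormal distribution (an assumed
    abundance model), then individuals are assigned to taxa by weight.
    """
    weights = [rng.lognormvariate(0.0, sigma) for _ in range(n_taxa)]
    return rng.choices(range(n_taxa), weights=weights, k=community_size)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two Counters of taxon counts."""
    taxa = set(a) | set(b)
    difference = sum(abs(a[t] - b[t]) for t in taxa)
    return difference / (sum(a.values()) + sum(b.values()))


def replicate_dissimilarity(community, effort, n_pairs, rng):
    """Mean dissimilarity between pairs of replicate samples.

    Each replicate draws a fraction `effort` of the individuals without
    replacement; replicates are drawn independently from the same community.
    """
    sample_size = max(1, round(effort * len(community)))
    total = 0.0
    for _ in range(n_pairs):
        first = Counter(rng.sample(community, sample_size))
        second = Counter(rng.sample(community, sample_size))
        total += bray_curtis(first, second)
    return total / n_pairs


rng = random.Random(42)  # fixed seed so the run is repeatable
community = build_community(n_taxa=200, community_size=100_000, sigma=2.0, rng=rng)
for effort in (0.10, 0.01):
    print(effort, replicate_dissimilarity(community, effort, n_pairs=10, rng=rng))
```

Because individuals are drawn independently here, the sketch corresponds to the independent-organism case only. Modeling linked organisms or genes packaged within genomes would require sampling groups rather than individuals.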