2021 ESA Annual Meeting (August 2 - 6)

An image is worth a thousand species: Combining deep neural networks and high-resolution satellite imagery to predict plant biodiversity

On Demand
Lauren Gillespie, Stanford University
Background/Question/Methods

Human-driven climate change and land-use change are causing worldwide declines in biodiversity and the deterioration of the world’s ecosystem services and natural capital. Current tools for mapping plant biodiversity and assessing risk across space rely on either expert-drawn range maps or bioclimatic species distribution models. Despite decades of effort to improve these models and maps, three challenges remain, which we aim to address using deep learning and high-resolution imagery. First, records of plant observations capture only presences, not absences, and there is no clear consensus on what constitutes an absence. The de facto standard species distribution model, MaxEnt, relies on potentially biased pseudo-absence generation schemes to overcome this limitation, whereas our approach relies solely on presence-only data. Second, the vast majority of species distribution models use low-resolution maps of climatic averages (1-25 km), which fail to capture the environmental variables most influential at the scale of local plant communities, such as topography and land use. Unlike traditional bioclimatic methods, our method learns directly from the landscape information in high-resolution satellite imagery, which provides the critical micro-geographical information needed to model plant communities and dispersal-limited species well. Third, while commonly observed species are typically well modeled by current methods, the majority of species are rare and have few observations, so Wallacean shortfall effects (gaps in knowledge of where species occur) lead to incorrectly predicted ranges. Our model attempts to overcome this commonness of rarity by modeling plant species jointly, bootstrapping learning for rare species from co-occurring common species. Our model is less prone to overfitting and generalizes better across the thousands of plant species predicted than the state-of-the-art multi-response Random Forest model.
Here, we present a joint species distribution model for plant biodiversity across California, a plant biodiversity hotspot, that learns from presence-only data and high-resolution satellite imagery. Specifically, we train a novel convolutional neural network on high-resolution imagery from the National Agriculture Imagery Program (NAIP), paired with geolocated plant presence observations from the Global Biodiversity Information Facility (GBIF), to jointly predict plant species presence across California.
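As a rough illustration of what joint prediction from presence-only records can look like on the data side, the sketch below groups observations into grid cells and builds one multi-hot label vector per cell, so that a single model output covers all species at once. The abstract does not describe the actual label construction, so this is only one plausible setup: the grid-cell size, the function names `tile_index` and `build_multi_hot_labels`, and the toy records are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def tile_index(lat, lon, tile_deg=0.01):
    """Map a coordinate to a grid-cell index (hypothetical tiling scheme)."""
    return (math.floor(lat / tile_deg), math.floor(lon / tile_deg))


def build_multi_hot_labels(observations, tile_deg=0.01):
    """Group presence-only records by grid cell into multi-hot species vectors.

    observations: iterable of (species_name, latitude, longitude).
    Returns the sorted species list and a dict mapping cell index to a 0/1 list.
    A 0 only means "not recorded in this cell"; it is not a confirmed absence.
    """
    species = sorted({name for name, _, _ in observations})
    column = {name: i for i, name in enumerate(species)}
    labels = defaultdict(lambda: [0] * len(species))
    for name, lat, lon in observations:
        labels[tile_index(lat, lon, tile_deg)][column[name]] = 1
    return species, dict(labels)


# Toy records for illustration only; these are not real GBIF observations.
toy_observations = [
    ("Quercus agrifolia", 37.4271, -122.1701),
    ("Eschscholzia californica", 37.4279, -122.1705),
    ("Pinus ponderosa", 41.2053, -122.3187),
]
species, labels = build_multi_hot_labels(toy_observations)
for cell, vector in sorted(labels.items()):
    print(cell, vector)
```

In a setup like this, each label vector would be paired with the image tile covering the same cell, and species that co-occur in a cell share one training example, which is one way a rare species could borrow signal from common neighbors.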

Results/Conclusions

Our “deepbiosphere” model predicts with significantly higher precision than standard aggregated MaxEnt and performs on par with bioclimatic Random Forest baselines. However, our model shines with rare species and data-limited regions, such as the Cascades ecoregion, and predicts with significantly higher accuracy for certain poorly sampled environments and species. Finally, we use this model to create an updated plant biodiversity map of California at 250-meter resolution, which we hope will aid future biodiversity modeling and conservation efforts.