Digital age of vegetation classification: Providing big data and tools to meet big questions in vegetation science

Evens, Julie

Background/Question/Methods

Vegetation science has wholeheartedly embraced the ecological data revolution, and vegetation
scientists have been at the forefront of ecology in developing data archives and databases of
ecological data. The Global Index of Vegetation Databases (GIVD) is one such metadata catalog
of vegetation plot data that at present contains more than 3.6 million records. Analyses of more than
one million plot records now appear in single vegetation publications. In many cases, databases
of phylogenetic relatedness and functional traits are also available. Finally, a proliferation of
global and local raster data sets of climate variables, topography, and sometimes soil properties
is emerging to enable correlative analyses of such data. Nonetheless, significant issues remain in
accessing, managing, and analyzing such data to make the best use of such a prominent resource.

Results/Conclusions

One of the fundamental issues is the extreme geographic variability in data. Regional vegetation
data density varies continentally and by country from zero to thousands of plots/km^2. In many
analyses, data need to be subsampled to reduce the density from local hotspots and achieve a more
balanced sample; in other areas, data are simply inadequate to perform any regional assessment
based on plot data. A second significant problem is reconciling taxonomic treatments and identifying
unique taxa. At regional scales several authoritative treatments exist but are not compatible.
The third significant issue is the lack or inadequacy of geo-referencing for data plots.
This issue complicates correlative analyses and regionalization of data. A fourth challenge can
be to simply operate on data sets with tens of thousands of sample units; software often needs to
be specifically designed to manage data sets many times larger than the memory available even on
large computer networks. While relational databases can store and structure such data sets, they do
not enable the sophisticated analyses employed in modern vegetation analysis. A final problem is the
attribution and acknowledgement of data suppliers; formal methods of recognition, including
co-authorship on published analyses, are required to secure the active participation of vegetation
scientists in sharing hard-won data. In this presentation we address current best practices in the
solution of the many identified problems, with examples of both global and local applications.
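
As an illustration of the subsampling step mentioned above, the following is a minimal sketch and not the method used in any particular study. It assumes plots arrive as (plot_id, latitude, longitude) tuples, a hypothetical layout, and it caps the number of plots retained per latitude/longitude grid cell. Reservoir sampling is used within each cell so that the input can be streamed in a single pass rather than loaded into memory at once. The function name, the cell size, and the cap are all illustrative assumptions; note also that equal-angle cells are not equal in area, so a real analysis might prefer an equal-area grid.

```python
import math
import random
from collections import defaultdict


def subsample_by_grid(plots, cell_deg=1.0, max_per_cell=2, seed=0):
    """Keep at most max_per_cell plots per lat/lon grid cell.

    plots: iterable of (plot_id, lat, lon) tuples (assumed layout).
    Each cell keeps a uniform random sample of the plots that fall in it,
    chosen by reservoir sampling in one pass over the input.
    """
    rng = random.Random(seed)
    kept = defaultdict(list)  # cell -> plots currently retained
    seen = defaultdict(int)   # cell -> number of plots encountered so far
    for plot in plots:
        _, lat, lon = plot
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        seen[cell] += 1
        if len(kept[cell]) < max_per_cell:
            # Reservoir for this cell is not yet full: always keep.
            kept[cell].append(plot)
        else:
            # Replace a retained plot with probability max_per_cell / seen.
            j = rng.randrange(seen[cell])
            if j < max_per_cell:
                kept[cell][j] = plot
    return dict(kept)


# Hypothetical example: four plots in one dense cell, one isolated plot.
example_plots = [
    ("a", 50.1, 8.2),
    ("b", 50.2, 8.3),
    ("c", 50.3, 8.4),
    ("d", 50.4, 8.5),
    ("e", -3.5, 120.7),
]
balanced = subsample_by_grid(example_plots, cell_deg=1.0, max_per_cell=2)
```

In this example the dense cell is reduced to two plots while the isolated plot is retained unchanged. Which two plots survive in the dense cell depends on the random seed.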

OOS 48 Abstract - Digital age of vegetation classification: Providing big data and tools to meet big questions in vegetation science