Dynamical systems models (DSMs), i.e. systems of differential equations, are widely used as mathematical models of ecological networks. Given initial conditions, these equations can be solved by numerical integration, yielding a ‘time-series’: a sequence of state values (e.g. population counts of predator and prey species) over time.
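As a minimal sketch of how a DSM is turned into a time-series, the Python below integrates a classic Lotka-Volterra predator-prey model with a fixed-step fourth-order Runge-Kutta scheme. This is only one possible form of a 2-species DSM; the abstract does not give the equations or name the solver, so the parameter values `A`, `B`, `D`, `G`, the initial state and the step size are hypothetical. The `invariant` function is a quantity conserved by the exact Lotka-Volterra equations, so its drift along the simulated series gives a rough check on solver accuracy.

```python
import math
from typing import Callable, List, Sequence

State = List[float]

# Hypothetical parameter values, chosen only for illustration.
A, B, D, G = 1.0, 0.1, 0.075, 1.5


def lotka_volterra(s: Sequence[float]) -> State:
    """Right-hand side of the classic 2-species predator-prey model."""
    x, y = s  # x: prey, y: predator
    return [A * x - B * x * y, D * x * y - G * y]


def invariant(s: Sequence[float]) -> float:
    """Quantity conserved by the exact Lotka-Volterra dynamics (solver accuracy check)."""
    x, y = s
    return D * x - G * math.log(x) + B * y - A * math.log(y)


def rk4_solve(f: Callable[[State], State], y0: Sequence[float],
              dt: float, n_steps: int) -> List[State]:
    """Integrate dy/dt = f(y) with fixed-step 4th-order Runge-Kutta.

    Returns the time-series [y(0), y(dt), ..., y(n_steps * dt)].
    """
    y = list(y0)
    series = [y]
    for _ in range(n_steps):
        k1 = f(y)
        k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
        k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
        k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
        # Weighted average of the four slope estimates.
        y = [yi + dt / 6.0 * (p1 + 2 * p2 + 2 * p3 + p4)
             for yi, p1, p2, p3, p4 in zip(y, k1, k2, k3, k4)]
        series.append(y)
    return series


# Hypothetical initial state: 10 prey, 5 predators; 1000 steps of 0.01 time units.
series = rk4_solve(lotka_volterra, [10.0, 5.0], dt=0.01, n_steps=1000)
print(len(series))  # 1001
print(abs(invariant(series[-1]) - invariant(series[0])))  # small drift for an accurate solve
```

A fixed-step scheme keeps every series on the same time grid, which is convenient when the series are later fed to an NN; an adaptive solver would need its output resampled onto a common grid.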
Neural network models (NNs) have been used successfully in many domains that involve time-series data (e.g. natural language processing or finance): a model is trained on large numbers of sequences, and its generalization performance is then tested on novel sequences. Our aim is to explore how well representations of ecological networks learned by NNs generalize, both by interpolation and by extrapolation. We study two simple DSMs: a 2-species predator-prey network (DSM-2) and a 3-species food chain (DSM-3). For each, we sample thousands of initial conditions, use numerical solvers to obtain the corresponding time-series, and split these into training and test sets. We measure interpolation on each DSM by fitting an NN on its training set and evaluating it on its test set, and extrapolation by training on one DSM and evaluating on the other. We explore how performance varies with (1) the variance of the initial conditions and (2) the dataset size.
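The sampling-and-splitting protocol described above can be sketched as follows, assuming a 3-species Lotka-Volterra food chain as a stand-in for DSM-3. The abstract gives neither the equations nor the parameters, so `food_chain`, its coefficients, the central state and the helper `make_dataset` are all hypothetical. Here `spread` stands in for the variance of initial conditions: it is the standard deviation of a log-normal perturbation around a central state, an assumption chosen so that populations stay positive. NN training itself is not shown.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

State = List[float]
Series = List[State]


def food_chain(s: Sequence[float]) -> State:
    """Hypothetical 3-species Lotka-Volterra food chain (illustrative parameters)."""
    x, y, z = s  # x: resource, y: consumer, z: top predator
    return [
        x * (1.0 - 0.1 * y),              # resource grows, eaten by consumer
        y * (-0.5 + 0.02 * x - 0.1 * z),  # consumer eats resource, eaten by predator
        z * (-0.5 + 0.05 * y),            # predator eats consumer
    ]


def rk4(f: Callable[[State], State], y0: Sequence[float],
        dt: float, n_steps: int) -> Series:
    """Fixed-step 4th-order Runge-Kutta integration; returns n_steps + 1 states."""
    y = list(y0)
    out = [y]
    for _ in range(n_steps):
        k1 = f(y)
        k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
        k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
        k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
        y = [yi + dt / 6.0 * (p1 + 2 * p2 + 2 * p3 + p4)
             for yi, p1, p2, p3, p4 in zip(y, k1, k2, k3, k4)]
        out.append(y)
    return out


def make_dataset(f: Callable[[State], State], center: Sequence[float], spread: float,
                 n: int, test_frac: float = 0.2, dt: float = 0.01,
                 n_steps: int = 500, seed: int = 0) -> Tuple[List[Series], List[Series]]:
    """Sample n initial conditions around `center`, simulate each, split train/test.

    `spread` is the standard deviation of a log-normal perturbation applied to each
    species: larger values give more varied (but always positive) initial conditions.
    """
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    ics = [[c * math.exp(rng.gauss(0.0, spread)) for c in center] for _ in range(n)]
    series = [rk4(f, ic, dt, n_steps) for ic in ics]
    n_test = round(test_frac * n)
    # Initial conditions are drawn i.i.d., so taking the first n_test is a random split.
    return series[n_test:], series[:n_test]


# [30, 10, 1] is an equilibrium of the hypothetical food_chain above.
train, test = make_dataset(food_chain, center=[30.0, 10.0, 1.0], spread=0.3, n=50)
print(len(train), len(test))  # 40 10
```

Sweeping `spread` and `n` in `make_dataset` corresponds to the two factors the study varies; with `spread=0.0` every series starts exactly at `center`.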
Results/Conclusions
For the dataset sizes and NN models tested, we found that interpolation performance on each DSM increased monotonically with training set size. The dependence on the variance of initial conditions was more complicated, showing a ‘goldilocks’ degree of variance for a given dataset size. Extrapolation from DSM-2 to DSM-3 did not typically outperform training directly on DSM-3 with the same dataset size, but we did find positive extrapolation performance from DSM-3 to DSM-2. This suggests that training on a DSM more complex than the target system may benefit extrapolation performance. If these results hold in further studies, our work could inform the pre-training of NN models on large amounts of simulated data for effective extrapolation to smaller amounts of real time-series data from actual ecosystems.