Two new publications improve the theoretical foundation of distribution modelling
Distribution modelling (DM), a core research area of the GEco (geo-ecology) group at the Natural History Museum (Univ. of Oslo), has long been criticised for being tenuously rooted in theory. Two papers by Rune Halvorsen, recently published in the peer-reviewed open-access monograph series Sommerfeltia, address important issues in DM theory and methods.
In the first paper, 'A gradient analytic perspective on distribution modelling'
(Sommerfeltia 35, issued December 2012), the distribution modelling process is described as an inductive scientific process with 12 steps, organised into three composite steps: ecological model, data model, and statistical model. Step 8, modelling of the overall ecological response, places DM unambiguously among gradient analysis techniques and motivates a gradient analytic (GA) perspective on DM. This perspective is taken as the basis for a review of DM concepts, leading to proposals for a revised, unified terminology and a new conceptual modelling framework for DM. This new framework, termed HED, can be used in the initial phases of a DM study to formulate a meta-model of the factors that influence distributions, and in the analytic phase to guide important choices of methods and options and to assist interpretation of modelling results. The presented theoretical foundation forms the basis for a discussion of the single steps in the DM process, ending with a list of seven challenges of particular importance for progress in DM: (1) that more knowledge of patterns of natural variation is needed; (2) that a better mechanistic understanding of the causes of patterns of natural variation is needed; (3) that the availability of relevant rasterised explanatory variables needs to be improved; (4) that more studies of patterns at local and micro spatial scales, in addition to multiple-scale studies using DM methods, are needed; (5) that evaluation by independent data should be established as a standard in DM; (6) that further insights into statistical modelling methods and their options, with particular reference to appropriateness for different types of data and DM purposes, are needed; and (7) that DM methods should be incorporated in studies with a broader scope.
A main conclusion of the paper is that clear potential exists for improving DM methods and practice; if realised, these improvements are expected to substantially increase the return from DM in terms of contributions that improve our understanding of patterns of natural variation and their causes.
The second paper, 'A strict maximum likelihood explanation of MaxEnt, and some implications for distribution modelling' (Sommerfeltia 36, issued April 2013), provides a detailed examination of the maximum entropy modelling method for distribution modelling (MaxEnt). Since its launch in 2004, MaxEnt has established itself as a state-of-the-art method for distribution modelling. A review of 87 recent publications in which MaxEnt was used for DM reveals the existence of a ‘standard MaxEnt practice’ from which few users depart – a set of default options and settings in the user-friendly Maxent software developed by Steven Phillips and co-workers. The review also summarises viewpoints and indications that the current usage of MaxEnt as a black box for DM is likely to be suboptimal, strongly motivating the need for a better understanding of how the method really works. The core of the paper consists of a detailed explanation of MaxEnt for ecologists, based upon the gradient analytic perspective on DM. Step by step, the MaxEnt method is derived from maximum likelihood principles and shown to be a sister method to the GLMs with which all ecologists are familiar. A discussion of key issues relating to the MaxEnt black box opens several new perspectives. Most importantly, the inclusion of MaxEnt in the family of maximum-likelihood regression methods enables the use of standard tools for model selection and model performance assessment. It is explained, theoretically and by examples, how the likelihood-ratio and F-ratio tests can be used to compare nested MaxEnt models. Furthermore, standard subset selection methods such as stepwise forward selection of variables, which are enabled by these tests, are outlined and shown by examples to be potentially superior to the shrinkage methods for model selection that are part of the ‘standard MaxEnt practice’.
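To illustrate the general idea behind such tests (this sketch is not taken from the paper, and uses ordinary logistic regression on simulated presence/absence data as a stand-in for MaxEnt), a likelihood-ratio test compares a nested pair of maximum-likelihood models by asking whether the extra variable raises the log-likelihood more than chance would allow:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)                        # hypothetical environmental gradient
p_true = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))   # species responds strongly to the gradient
y = rng.binomial(1, p_true)                   # simulated presence/absence observations

def neg_log_lik(beta, X, y):
    """Negative Bernoulli (logistic) log-likelihood."""
    eta = X @ beta
    return np.sum(np.logaddexp(0.0, eta) - y * eta)

X_null = np.ones((n, 1))                      # intercept-only (null) model
X_full = np.column_stack([np.ones(n), x])     # nested alternative: intercept + gradient

ll_null = -minimize(neg_log_lik, np.zeros(1), args=(X_null, y)).fun
ll_full = -minimize(neg_log_lik, np.zeros(2), args=(X_full, y)).fun

lr_stat = 2 * (ll_full - ll_null)             # likelihood-ratio statistic
p_value = chi2.sf(lr_stat, df=1)              # df = difference in number of parameters
print(f"LR = {lr_stat:.2f}, p = {p_value:.3g}")
```

A stepwise forward selection procedure simply repeats this comparison, at each round adding the candidate variable that gives the largest significant improvement.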
The possibility of developing a generally applicable ‘consensus MaxEnt practice’ for DM, and elements of such a practice, are discussed, and five main additions or amendments to the ‘standard MaxEnt practice’ are suggested: (1) development of flexible, interactive tools to assist in deriving variables from raw explanatory variables; (2) interactive tools that allow the user to combine freely model selection methods, methods and approaches for internal model performance assessment, and model improvement criteria into a data-driven modelling procedure; (3) integration of independent presence/absence data into the modelling process, for external model performance assessment, for model calibration, and for model evaluation; (4) new output formats, notably a probability-ratio output format which directly expresses the ‘relative suitability of one place vs. another’ for the modelled target; and (5) development of options for discriminative use of MaxEnt, i.e., use of MaxEnt with presence/absence data.
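As a rough illustration of what a probability-ratio style output means (the numbers and rescaling below are invented for illustration, not taken from the paper), raw MaxEnt-type cell probabilities, which sum to one over the study area, can be rescaled so that an ‘average’ cell scores 1; a cell scoring 2 is then twice as suitable as an average cell:

```python
import numpy as np

# Hypothetical raw output: probabilities over N grid cells that sum to 1,
# as produced by a MaxEnt-type model (values invented for illustration).
raw = np.array([0.05, 0.10, 0.25, 0.40, 0.20])
N = raw.size

# Probability-ratio style rescaling: an 'average' cell (probability 1/N)
# maps to 1, so values directly compare one place against another.
pro = raw * N
print(pro)
```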
Tools for (1–4) are currently being developed by S. Mazzoni together with GEco colleagues.
The paper concludes that the currently most important MaxEnt-related research needs are: (1) comparative studies of strategies for construction of parsimonious sets of derived variables for use in MaxEnt modelling; and (2) comparative tests on independent presence/absence data of the predictive performance of MaxEnt models obtained with different model selection strategies, different approaches for internal model performance assessment, and different model improvement criteria. These research questions are targeted by current GEco research.