R/do_any.R, R/do_many.R, R/model_fit.R
model_fit.Rd
do_any reads the output from setup_sdmdata and computes ecological niche
models for a species based on an algorithm specified by the user. It fits the
model, predicts it onto the current environmental layers and calculates basic
statistics for model evaluation. In addition to commonly adopted metrics such
as AUC and TSS, this package also calculates partial ROC
(Peterson et al. 2008; Cobos et al. 2019). For details on model evaluation see
Phillips et al. (2006) and Peterson et al. (2011). do_any performs one
algorithm at a time. do_many calls do_any internally and can be used to run
multiple algorithms at a time. Given that there are "no silver bullets in
correlative ecological niche modeling" (Qiao et al. 2015), the choice of which
algorithm to run is left to the user. See Details for a description of how
each algorithm supported in this package is implemented.
do_any(species_name, predictors, models_dir = "./models",
  algorithm = c("bioclim"), project_model = FALSE,
  proj_data_folder = "./data/proj", mask = NULL, write_rda = FALSE,
  png_partitions = FALSE, write_bin_cut = FALSE,
  dismo_threshold = "spec_sens", equalize = TRUE, sensitivity = 0.9,
  proc_threshold = 0.5, ...)

do_many(species_name, bioclim = FALSE, domain = FALSE, glm = FALSE,
  mahal = FALSE, maxent = FALSE, maxnet = FALSE, rf = FALSE,
  svmk = FALSE, svme = FALSE, brt = FALSE, ...)
species_name: A character string with the species name. Because the species
name is used as a directory name, avoid non-ASCII characters, spaces and
punctuation marks. The recommended format is "Genus_species". See the names in
example_occs for an example.
predictors: A Raster or RasterStack object with the environmental raster layers.
models_dir: Folder path to save the output files. Defaults to "./models".
algorithm: Character string of length 1 specifying the algorithm to be fit:
"bioclim", "brt", "domain", "glm", "maxent", "maxnet", "mahal", "svme",
"svmk", "rf".
project_model: Logical, whether to project the models to the variable sets in
the proj_data_folder directory.
proj_data_folder: Path to the directory with projections, containing one or more folders with the projection datasets (e.g. "./env/proj/proj1"). Each of these folders should contain only raster files corresponding to the environmental variables. If there is more than one projection, each projection should be in its own directory (e.g. "./env/proj/proj1" and "./env/proj/proj2") and equivalent raster files in different subdirectories must have the same names (e.g. "./env/proj/proj1/layer1.asc" and "./env/proj/proj2/layer1.asc").
mask: A SpatialPolygonsDataFrame to be used to mask the models. This mask can be used to save disk space if the final area of interest is smaller than the area used for model fitting.
write_rda: Logical, whether .rda objects with the fitted models will be written.
png_partitions: Logical, whether png files will be written.
write_bin_cut: Logical, whether binary and cut model files (.tif, .png) should be written.
dismo_threshold: Character string indicating the threshold (cut-off) used to
transform model predictions to a binary score, as in threshold: "kappa",
"spec_sens", "no_omission", "prevalence", "equal_sens_spec", "sensitivity".
Default value is "spec_sens".
equalize: Logical, whether the number of presences and absences should be equalized in randomForest and brt.
sensitivity: Numeric, value from 0 to 0.9 indicating the sensitivity value used to calculate the threshold. Defaults to 0.9, as in the dismo package.
proc_threshold: Numeric, value from 0 to 100 that will be used as (E) for
partial ROC calculations in kuenm_proc. The function signature sets
proc_threshold = 0.5.
...: Other arguments from kuenm_proc.
bioclim: Execute the bioclim algorithm from the dismo implementation with the
bioclim function.
domain: Execute domain from the dismo implementation with the domain function.
glm: Execute GLM as suggested by the dismo documentation with glm and step.
mahal: Execute Mahalanobis distance from the dismo implementation with mahal.
maxent: Execute the Maxent algorithm from the dismo implementation with the
maxent function.
maxnet: Execute the Maxent algorithm from the maxnet implementation with the
maxnet function.
rf: Execute the Random Forest algorithm from the randomForest package with
function tuneRF, as suggested by the dismo documentation.
svmk: Execute the Support Vector Machines (SVM) algorithm from the kernlab
package with the ksvm function.
svme: Execute the Support Vector Machines (SVM) algorithm from the e1071
package with the best.tune function.
brt: Execute Boosted Regression Trees with gbm.step from dismo.
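The logical switches of do_many described above can be read as one on/off flag per algorithm. As a minimal sketch of that idea (base R only, and not modleR's actual code), the hypothetical helper `selected_algorithms()` below collects the names of the flags set to TRUE; a wrapper in the spirit of do_many could then call do_any once per name.

```r
# Hypothetical helper, for illustration only: turn the logical flags into
# the vector of algorithm names that were switched on.
selected_algorithms <- function(bioclim = FALSE, domain = FALSE, glm = FALSE,
                                mahal = FALSE, maxent = FALSE, maxnet = FALSE,
                                rf = FALSE, svmk = FALSE, svme = FALSE,
                                brt = FALSE) {
  flags <- c(bioclim = bioclim, domain = domain, glm = glm, mahal = mahal,
             maxent = maxent, maxnet = maxnet, rf = rf, svmk = svmk,
             svme = svme, brt = brt)
  names(flags)[flags]
}

algos <- selected_algorithms(bioclim = TRUE, glm = TRUE, rf = TRUE)
print(algos)

# One do_any call per selected algorithm (not run: needs modleR and data).
# for (algo in algos) {
#   do_any(species_name = sp, predictors = example_vars, algorithm = algo)
# }
```

Keeping one flag per algorithm makes the call explicit about what will be fitted, at the cost of a longer argument list.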
Returns a data frame with some key threshold values and evaluation statistics of each algorithm (FNR, FPR, TSSmax, AUC, pROC, FScore, Jaccard dissimilarity etc.) for the selected threshold.
Writes to disk a .tif model for each partition of each algorithm.
Writes to disk a .csv file with thresholds and evaluation statistics of each algorithm for a given threshold.
Writes to disk a .csv file with evaluation statistics for all threshold values.
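Some of the statistics listed above can be illustrated on simulated scores. The sketch below uses base R only and is not the code used by modleR or dismo; the simulated scores and all object names are assumptions made for the example. It picks the threshold that maximises sensitivity plus specificity (the idea behind "spec_sens"), then derives TSS, FNR and FPR at that threshold, and computes AUC as the probability that a presence scores higher than an absence.

```r
set.seed(1)
# Simulated suitability scores: presences tend to score higher than absences.
p <- rbeta(50, 4, 2)   # scores at presence points
a <- rbeta(200, 2, 4)  # scores at absence (or background) points

# Candidate thresholds: every observed score.
thr <- sort(unique(c(p, a)))

# Sensitivity and specificity at each threshold
# (a score at or above the threshold is predicted as presence).
sens <- vapply(thr, function(t) mean(p >= t), numeric(1))
spec <- vapply(thr, function(t) mean(a < t), numeric(1))

# "spec_sens"-style threshold: maximise sensitivity + specificity.
best <- which.max(sens + spec)
spec_sens_thr <- thr[best]

# TSS at that threshold, plus false negative and false positive rates.
tss_max <- sens[best] + spec[best] - 1
fnr <- 1 - sens[best]
fpr <- 1 - spec[best]

# AUC in its Mann-Whitney form, counting ties as one half.
auc <- mean(outer(p, a, ">") + 0.5 * outer(p, a, "=="))

print(round(c(threshold = spec_sens_thr, TSSmax = tss_max,
              FNR = fnr, FPR = fpr, AUC = auc), 3))
```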
See below for a description of how the algorithms supported in this package are implemented.
Specified by algorithm = "bioclim", it uses the bioclim function in the dismo
package (Hijmans et al. 2017). Bioclim is the climate-envelope model
implemented by Henry Nix (Nix 1986), the first species distribution modelling
package. It is based on climate interpolation methods and, despite its
limitations, it is still used in ecological niche modeling, especially for
exploration and teaching purposes (see also Booth et al. 2014). In this
package it is implemented by the function bioclim, and evaluated and predicted
using evaluate and predict, also from the dismo package.
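To make the climate-envelope idea concrete, here is a minimal base R sketch. It is an illustration of percentile-based envelope scoring under simulated data, not the dismo implementation; `envelope_score()` and the variable names are hypothetical. Each variable at a site is ranked against the values at the presence points, both tails are treated alike, and the most limiting variable sets the score.

```r
set.seed(42)
# Hypothetical environmental values at 100 presence points (two variables).
train <- data.frame(temp = rnorm(100, 20, 2), prec = rnorm(100, 1200, 150))

# Envelope score for one site: percentile of each variable within the
# training values, folded so both tails count equally; the minimum across
# variables is kept and rescaled to the 0-1 range.
envelope_score <- function(site, train) {
  pct <- vapply(names(train), function(v) mean(train[[v]] <= site[[v]]),
                numeric(1))
  folded <- pmin(pct, 1 - pct)  # distance from the nearest tail
  2 * min(folded)               # the limiting variable sets the score
}

central  <- list(temp = 20, prec = 1200)  # near the centre of both variables
marginal <- list(temp = 25, prec = 1200)  # warm extreme
outside  <- list(temp = 40, prec = 1200)  # beyond every training value

scores <- c(central  = envelope_score(central, train),
            marginal = envelope_score(marginal, train),
            outside  = envelope_score(outside, train))
print(scores)
```

A site outside the range of any single variable scores zero, which is why envelope models are sensitive to the choice of predictors.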
Specified by algorithm = "brt", it uses the gbm.step function from the dismo
package, which runs the cross-validation procedure of Hastie et al. (2001)
(see also Elith et al. 2009). BRT is a regression modeling technique combined
with boosting, a method for combining many simple models. It is implemented by
the function gbm.step as a regression with the response variable set to a
Bernoulli distribution, and evaluated and predicted using evaluate and predict
from the dismo package.
Specified by algorithm = "domain", it uses the domain function from the dismo
package. It computes point-to-point similarity based on the Gower distance
between environmental variables (Carpenter et al. 1993). Hijmans et al. (2017)
state that one should use it with caution because it does not perform well
compared to other algorithms (Elith et al. 2006; Hijmans and Graham 2006). We
add that it is a slow algorithm. In this package it is implemented by the
function domain, and evaluated and predicted using evaluate and predict, also
from the dismo package.
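The point-to-point similarity described above can be sketched in a few lines of base R. This is an illustration with simulated data and hypothetical function names, not the dismo code: the Gower distance is taken as the mean absolute difference across variables, each scaled by the range of that variable at the presence points, and the score of a site is one minus its distance to the nearest presence.

```r
set.seed(7)
# Hypothetical environmental values at 50 presence points.
train <- cbind(temp = runif(50, 15, 25), prec = runif(50, 800, 1600))
rng <- apply(train, 2, function(x) diff(range(x)))  # per-variable range

# Gower distance from one site to every presence point: mean absolute
# difference across variables, each scaled by the variable's range.
gower_to_train <- function(site, train, rng) {
  scaled <- sweep(abs(sweep(train, 2, site)), 2, rng, "/")
  rowMeans(scaled)
}

# Domain-style similarity: one minus the distance to the nearest presence.
domain_score <- function(site, train, rng) {
  1 - min(gower_to_train(site, train, rng))
}

at_presence <- domain_score(train[1, ], train, rng)  # an occurrence itself
far_away <- domain_score(c(temp = 40, prec = 3000), train, rng)
print(c(at_presence = at_presence, far_away = far_away))
```

Every site is compared with every presence point, which is consistent with the remark above that the algorithm is slow on large rasters.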
Specified by algorithm = "glm", it runs a GLM modeling presences and absences
as a response variable following a binomial error distribution. It runs a
step-wise model selection based on AIC, both backward and forward, considering
all possible combinations of predictor variables in the RasterStack. In this
package it is implemented using functions glm and step to fit a model and
choose a model by AIC in a stepwise procedure. The model is evaluated using
the evaluate function from dismo and predicted using the predict function from
the raster package, both with argument type = "response" to return values on
the scale of the response variable.
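The glm and step combination described above can be tried on simulated data with base R alone. The sketch below is an assumption-laden illustration, not the code used in this package: the data frame, the variable names and the choice to start the search from the full model are all made up for the example.

```r
set.seed(123)
n <- 300
# Hypothetical predictors; only temp and prec drive presence in this simulation.
dat <- data.frame(temp = rnorm(n), prec = rnorm(n), noise = rnorm(n))
dat$pa <- rbinom(n, 1, plogis(-0.5 + 1.5 * dat$temp - 1 * dat$prec))

# Full and intercept-only binomial GLMs bound the stepwise search.
full <- glm(pa ~ temp + prec + noise, family = binomial, data = dat)
null <- glm(pa ~ 1, family = binomial, data = dat)

# Stepwise selection by AIC in both directions, starting from the full model.
best <- step(full,
             scope = list(lower = formula(null), upper = formula(full)),
             direction = "both", trace = 0)

# Predictions on the response scale are probabilities between 0 and 1.
pred <- predict(best, newdata = dat, type = "response")
print(formula(best))
print(summary(pred))
```

Using type = "response" returns probabilities rather than values on the logit scale, which is what makes the output comparable with the other algorithms.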
Specified by algorithm = "mahal", it uses the mahal function from the dismo
package. It corresponds to a distribution model based on Mahalanobis distance,
a measure of the distance between a point P and a distribution D
(Mahalanobis 1936). In this package it is implemented by the function mahal,
and evaluated and predicted using evaluate and predict, also from the dismo
package.
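The distance itself is available in base R as stats::mahalanobis(), which returns the squared distance. As a minimal sketch with simulated presence data (object names are assumptions, and this is not how dismo turns the distance into a suitability score), the distribution D is summarised by the mean vector and covariance matrix of the presence points:

```r
set.seed(99)
# Hypothetical environmental values at 80 presence points.
train <- cbind(temp = rnorm(80, 20, 2), prec = rnorm(80, 1200, 150))

# The distribution D is summarised by the mean vector and the covariance
# matrix of the presence points.
mu <- colMeans(train)
sigma <- cov(train)

# stats::mahalanobis() returns the squared distance, so take the root.
sites <- rbind(centre = mu, edge = c(24, 1500), far = c(35, 2500))
d <- sqrt(mahalanobis(sites, center = mu, cov = sigma))
print(round(d, 2))
```

Unlike a plain Euclidean distance, this measure accounts for the different scales of the variables and for their correlation.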
Specified either by algorithm = "maxent" or algorithm = "maxnet",
corresponding to the implementations in the dismo (Hijmans et al. 2017) and
maxnet (Phillips 2017) packages, respectively. Maxent is a machine learning
method for modeling species distributions based on incomplete data, allowing
ENM with presence-only data (Phillips et al. 2006). If algorithm = "maxent",
the model is fit by the function maxent, and evaluated and predicted using
evaluate and predict, also from the dismo package. If algorithm = "maxnet",
the model is fit by the function maxnet from the maxnet package, evaluated
using evaluate from the dismo package with argument type = "logistic" and
predicted using the predict function from the raster package.
Specified by algorithm = "rf", it uses the tuneRF function from the
randomForest package (Liaw and Wiener 2002). It corresponds to machine
learning regression based on decision trees. This package uses tuneRF to find
the optimal number of variables available for splitting at each tree node
(i.e. mtry) and, with parameter doBest = TRUE, to return the forest fitted
with that value. The Random Forest model is evaluated with the evaluate
function from dismo and predicted with the predict function from the raster
package.
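A call to tuneRF with doBest = TRUE can be sketched as follows. The data are simulated and the object names are assumptions; the block only runs if the randomForest package is installed, and it is not the exact call made by this package. With a numeric 0/1 response, randomForest fits a regression forest (and may warn that the response has few unique values), so predictions are continuous scores.

```r
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(2024)
  n <- 200
  # Hypothetical predictors and a presence/absence response.
  x <- data.frame(temp = rnorm(n), prec = rnorm(n), elev = rnorm(n))
  y <- rbinom(n, 1, plogis(1.5 * x$temp - x$prec))

  # tuneRF() searches over mtry; doBest = TRUE refits and returns the
  # forest grown with the best mtry found.
  fit <- randomForest::tuneRF(x = x, y = y, doBest = TRUE,
                              trace = FALSE, plot = FALSE)
  print(fit$mtry)
  print(head(predict(fit, newdata = x)))
}
```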
Specified either by algorithm = "svme" or algorithm = "svmk", corresponding to
the implementations in the e1071 (Meyer et al. 2017) and kernlab
(Karatzoglou et al. 2004) packages, respectively. SVM are supervised learning
models that use learning algorithms for classification and regression
analysis. In the e1071 package, SVM is implemented through function best.tune
with method set to "svm", which uses an RBF kernel (radial basis function
kernel) for classification. In the kernlab package, SVM is implemented through
function ksvm, also with an RBF kernel (in this case the default kernel
"rbfdot"). We expect both implementations to differ only in performance. Both
svme and svmk are evaluated with the evaluate function from dismo and
predicted with the predict function from the raster package.
Booth TH, Nix HA, Busby JR, Hutchinson MF (2014). “Bioclim: The First Species Distribution Modelling Package, Its Early Applications and Relevance to Most Current MaxEnt Studies.” Diversity and Distributions, 20(1), 1-9. ISSN 1472-4642, doi:10.1111/ddi.12144.
Carpenter G, Gillison AN, Winter J (1993). “DOMAIN: A Flexible Modelling Procedure for Mapping Potential Distributions of Plants and Animals.” Biodiversity & Conservation, 2(6), 667-680. ISSN 1572-9710, doi:10.1007/BF00051966.
Cobos ME, Peterson AT, Barve N, Osorio-Olvera L (2019). “Kuenm: An R Package for Detailed Development of Ecological Niche Models Using Maxent.” PeerJ, 7, e6281. ISSN 2167-8359, doi:10.7717/peerj.6281.
Elith J, Graham CH, Anderson RP, Dudík M, Ferrier S, Guisan A, Hijmans RJ, Huettmann F, Leathwick JR, Lehmann A, Li J, Lohmann LG, Loiselle BA, Manion G, Moritz C, Nakamura M, Nakazawa Y, Overton JMcCM, Peterson AT, Phillips SJ, Richardson K, Scachetti-Pereira R, Schapire RE, Soberón J, Williams S, Wisz MS, Zimmermann NE (2006). “Novel Methods Improve Prediction of Species' Distributions from Occurrence Data.” Ecography, 29(2), 129-151. ISSN 1600-0587, doi:10.1111/j.2006.0906-7590.04596.x.
Elith J, Leathwick JR, Hastie T (2009). “A Working Guide to Boosted Regression Trees.” Journal of Animal Ecology, 77(4), 802-813. ISSN 1365-2656, doi:10/fn6m6v.
Hastie T, Tibshirani R, Friedman J (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Heidelberg.
Hijmans RJ, Graham CH (2006). “The Ability of Climate Envelope Models to Predict the Effect of Climate Change on Species Distributions.” Global Change Biology, 12(12), 2272-2281. ISSN 1354-1013, 1365-2486, doi:10.1111/j.1365-2486.2006.01256.x.
Hijmans RJ, Phillips S, Leathwick J, Elith J (2017). “Dismo: Species Distribution Modeling.” R package version 1.1-4.
Karatzoglou A, Smola A, Hornik K, Zeileis A (2004). “Kernlab - An S4 Package for Kernel Methods in R.” Journal of Statistical Software, 11(9), 1-20.
Liaw A, Wiener M (2002). “Classification and Regression by randomForest.” R News, 2(3), 18-22.
Mahalanobis PC (1936). “On the Generalized Distance in Statistics.” In National Institute of Science of India.
Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F (2017). “E1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien.” R package version 1.6-8.
Nix HA (1986). A Biogeographic Analysis of the Australian Elapid Snakes. In 'Atlas of Elapid Snakes of Australia' (Ed. R. Longmore), pp. 4-15. Australian Flora and Fauna Series No. 7. Australian Government Publishing Service: Canberra.
Peterson AT, Papeş M, Soberón J (2008). “Rethinking Receiver Operating Characteristic Analysis Applications in Ecological Niche Modeling.” Ecological Modelling, 213(1), 63-72. ISSN 0304-3800, doi:10/ctk6cf.
Peterson AT, Soberón J, Pearson RG, Anderson RP, Martínez-Meyer E, Nakamura M (2011). Ecological Niches and Geographic Distributions, number 49 in Monographs in Population Biology. Princeton University Press, Princeton, N.J. ISBN 978-0-691-13686-8 978-0-691-13688-2, OCLC: ocn724664003.
Phillips S (2017). “Maxnet: Fitting 'Maxent' Species Distribution Models with 'Glmnet'.”
Phillips SJ, Anderson RP, Schapire RE (2006). “Maximum Entropy Modeling of Species Geographic Distributions.” Ecological Modelling, 190(3-4), 231-259. ISSN 0304-3800, doi:10.1016/j.ecolmodel.2005.03.026.
Qiao H, Soberón J, Peterson AT (2015). “No Silver Bullets in Correlative Ecological Niche Modelling: Insights from Testing among Many Potential Algorithms for Niche Estimation.” Methods in Ecology and Evolution, 6(10), 1126-1136.
if (FALSE) {
  # run setup_sdmdata first for one species in the example_occs data
  sp <- names(example_occs)[1]
  sp_coord <- example_occs[[1]]
  sp_setup <- setup_sdmdata(species_name = sp,
                            occurrences = sp_coord,
                            predictors = example_vars,
                            clean_uni = TRUE)

  # run bioclim for one species
  sp_any <- do_any(species_name = sp,
                   predictors = example_vars,
                   algorithm = "bioclim")

  # run do_many
  sp_many <- do_many(species_name = sp,
                     predictors = example_vars,
                     bioclim = TRUE)
}