do_any reads the output from setup_sdmdata and computes ecological niche models for a species based on an algorithm specified by the user. It fits the model, predicts it onto the current environmental layers and calculates basic statistics for model evaluation. In addition to commonly adopted metrics such as AUC and TSS, this package also calculates partial ROC (Peterson et al. 2008; Cobos et al. 2019). For details on model evaluation see Phillips et al. (2006) and Peterson et al. (2011). do_any runs one algorithm at a time. do_many calls do_any internally and can be used to run multiple algorithms at a time. Given that there are "no silver bullets in correlative ecological niche modeling" (Qiao et al. 2015), the choice of which algorithm to run is left to the user. See Details for a description of how each algorithm supported in this package is implemented.

do_any(species_name, predictors, models_dir = "./models",
  algorithm = c("bioclim"), project_model = FALSE,
  proj_data_folder = "./data/proj", mask = NULL, write_rda = FALSE,
  png_partitions = FALSE, write_bin_cut = FALSE,
  dismo_threshold = "spec_sens", equalize = TRUE, sensitivity = 0.9,
  proc_threshold = 0.5, ...)

do_many(species_name, bioclim = FALSE, domain = FALSE, glm = FALSE,
  mahal = FALSE, maxent = FALSE, maxnet = FALSE, rf = FALSE,
  svmk = FALSE, svme = FALSE, brt = FALSE, ...)

Arguments

species_name

A character string with the species name. Because the species name will be used as a directory name, avoid non-ASCII characters, spaces and punctuation marks. The recommended format is "Genus_species". See the names in example_occs for an example

predictors

A Raster or RasterStack object with the environmental raster layers

models_dir

Folder path to save the output files. Defaults to "./models"

algorithm

Character string of length 1 specifying the algorithm to be fit: "bioclim", "brt", "domain", "glm", "maxent", "maxnet", "mahal", "svme", "svmk", "rf"

project_model

Logical, whether to project the models to variable sets in proj_data_folder directory

proj_data_folder

Path to the directory with projections, containing one or more folders with the projection datasets (e.g. "./env/proj/proj1"). Each projection folder should contain only raster files corresponding to the environmental variables. If there is more than one projection, each projection should be in its own directory (e.g. "./env/proj/proj1" and "./env/proj/proj2") and equivalent raster files in different subdirectories must have the same names (e.g. "./env/proj/proj1/layer1.asc" and "./env/proj/proj2/layer1.asc")

mask

A SpatialPolygonsDataFrame to be used to mask the models. This mask can be used if the final area of interest is smaller than the area used for model fitting, to save disk space

write_rda

Logical, whether .rda objects with the fitted models will be written

png_partitions

Logical, whether png files will be written

write_bin_cut

Logical, whether binary and cut model files (.tif, .png) should be written

dismo_threshold

Character string indicating the threshold (cut-off) used to transform model predictions into a binary score, as in the threshold function from dismo: "kappa", "spec_sens", "no_omission", "prevalence", "equal_sens_spec", "sensitivity". Default value is "spec_sens"

equalize

Logical, whether the number of presences and absences should be equalized in randomForest and brt

sensitivity

Numeric, value from 0 to 1 indicating the sensitivity value used to calculate the threshold. Defaults to 0.9 as in dismo package

proc_threshold

Numeric, value from 0 to 100 that will be used as the threshold (E) for partial ROC calculations in kuenm_proc. The usage above sets proc_threshold = 0.5

...

Other arguments from kuenm_proc

bioclim

Execute bioclim algorithm from the dismo implementation with bioclim function

domain

Execute domain from the dismo implementation with domain function

glm

Execute GLM as suggested by the dismo documentation with glm and step

mahal

Execute Mahalanobis distance from the dismo implementation with mahal

maxent

Execute Maxent algorithm from the dismo implementation with maxent function

maxnet

Execute Maxent algorithm from the maxnet implementation with maxnet function

rf

Execute Random forest algorithm from randomForest package with function tuneRF as suggested by the dismo documentation

svmk

Execute Support Vector Machines (SVM) algorithm from kernlab package with ksvm function

svme

Execute Support Vector Machines (SVM) algorithm from e1071 package with best.tune function

brt

Execute Boosted Regression Trees with gbm.step from dismo
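The requirements on species_name and proj_data_folder described above can be checked before running a model. The following is a minimal base R sketch under stated assumptions: the helper clean_species_name and the temporary directory layout are hypothetical and are not part of this package.

```r
# Hypothetical helper, not part of the package: build a "Genus_species"
# string that is safe to use as a directory name. Non-ASCII letters are
# dropped rather than transliterated.
clean_species_name <- function(x) {
  x <- gsub("[^A-Za-z0-9]+", " ", x)  # spaces, punctuation etc. become one separator
  x <- trimws(x)
  gsub(" ", "_", x, fixed = TRUE)
}

sp <- clean_species_name("Abarema  langsdorffii.")
sp
# [1] "Abarema_langsdorffii"

# Toy projection directory in a temporary location (paths are illustrative)
proj_root <- file.path(tempdir(), "proj")
for (p in c("proj1", "proj2")) {
  dir.create(file.path(proj_root, p), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(proj_root, p, c("layer1.asc", "layer2.asc")))
}

# File names found in each projection subdirectory
proj_dirs <- list.dirs(proj_root, recursive = FALSE)
layer_names <- lapply(proj_dirs, function(d) sort(list.files(d)))
names(layer_names) <- basename(proj_dirs)

# TRUE when every projection holds the same set of file names
same_layers <- all(vapply(layer_names, identical, logical(1), layer_names[[1]]))
same_layers
```

A check like same_layers can catch a misnamed layer before a long model run starts.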

Value

Returns a data frame with some key threshold values and evaluation statistics of each algorithm (FNR, FPR, TSSmax, AUC, pROC, FScore, Jaccard dissimilarity etc.) for the selected threshold

Writes to disk a .tif model for each partition of each algorithm

Writes to disk a .csv file with thresholds and evaluation statistics of each algorithm for a given threshold

Writes to disk a .csv file with evaluation statistics for all threshold values
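To make the threshold and the statistics in the returned data frame more concrete, here is a minimal base R sketch that derives a "spec_sens" cut-off and a few threshold-dependent statistics from simulated predictions. It uses textbook definitions; the data are simulated, the column names are illustrative, and the exact formulas used inside the package may differ.

```r
set.seed(42)
pres_pred <- rbeta(50, 4, 2)    # simulated suitability at presence points
abs_pred  <- rbeta(200, 2, 4)   # simulated suitability at absence points

# "spec_sens": the cut-off that maximizes sensitivity + specificity
cutoffs <- sort(unique(c(pres_pred, abs_pred)))
sens <- vapply(cutoffs, function(th) mean(pres_pred >= th), numeric(1))
spec <- vapply(cutoffs, function(th) mean(abs_pred < th), numeric(1))
best <- which.max(sens + spec)
threshold <- cutoffs[best]

# Confusion matrix counts at that cut-off
tp <- sum(pres_pred >= threshold)
fn <- sum(pres_pred < threshold)
fp <- sum(abs_pred >= threshold)
tn <- sum(abs_pred < threshold)

stats <- data.frame(
  threshold = threshold,
  FNR     = fn / (tp + fn),               # false negative rate (omission)
  FPR     = fp / (fp + tn),               # false positive rate (commission)
  TSS     = sens[best] + spec[best] - 1,  # maximal TSS, since sens + spec is maximized
  FScore  = 2 * tp / (2 * tp + fp + fn),
  Jaccard = tp / (tp + fp + fn)           # similarity; a dissimilarity would be 1 minus this
)
stats
```

Because the cut-off maximizes sensitivity plus specificity, the TSS computed here is the largest TSS over all candidate thresholds.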

Details

See below for a description of the implementation of the algorithms supported in this package.

Bioclim

Specified by algorithm = "bioclim", it uses the bioclim function in the dismo package (Hijmans et al. 2017). Bioclim is the climate-envelope model implemented by Henry Nix (Nix 1986), the first species distribution modelling package. It is based on climate interpolation methods and, despite its limitations, it is still used in ecological niche modeling, especially for exploration and teaching purposes (see also Booth et al. 2014). In this package the model is fit with the function bioclim, then evaluated and predicted using evaluate and predict, also from the dismo package.
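The envelope idea can be illustrated with a short base R sketch. This is a simplified illustration of percentile-based envelope scoring, not the dismo implementation: the function bioclim_score, the variable names and the simulated data are all assumptions made for the example.

```r
set.seed(7)
# Simulated environmental values at known occurrence points
train <- data.frame(temp = rnorm(100, 20, 2), prec = rnorm(100, 1500, 200))
# Two sites to score: one near the centre of the envelope, one far outside it
new_sites <- data.frame(temp = c(20, 40), prec = c(1500, 1450))

# Hypothetical helper: percentile-based envelope score in [0, 1]
bioclim_score <- function(train, new) {
  pct <- sapply(names(train), function(v) {
    p <- ecdf(train[[v]])(new[[v]])  # percentile of each new value among training values
    pmin(p, 1 - p)                   # fold the upper tail onto the lower tail
  })
  pct <- matrix(pct, nrow = nrow(new))  # keep a matrix even when 'new' has one row
  2 * apply(pct, 1, min)                # most limiting variable, rescaled to [0, 1]
}

scores <- bioclim_score(train, new_sites)
scores
```

The second site lies beyond every training temperature, so its score is 0 regardless of how suitable its precipitation is.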

Boosted Regression Trees (BRT)

Specified by algorithm = "brt", it uses the gbm.step function from the dismo package, which runs the cross-validation procedure of Hastie et al. (2001) (see also Elith et al. 2009). BRT is a regression modeling technique combined with boosting, a method for combining many simple models. It is implemented by the function gbm.step as a regression with the response variable set to a Bernoulli distribution, then evaluated and predicted using evaluate and predict from the dismo package.

Domain

Specified by algorithm = "domain", it uses the domain function from the dismo package. It computes point-to-point similarity based on the Gower distance between environmental variables (Carpenter et al. 1993). Hijmans et al. (2017) state that one should use it with caution because it does not perform well compared to other algorithms (Elith et al. 2006; Hijmans and Graham 2006). We add that it is a slow algorithm. In this package the model is fit with the function domain, then evaluated and predicted using evaluate and predict, also from the dismo package.
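The point-to-point similarity can be sketched in base R as follows. This is an illustration of range-standardized Gower similarity to the nearest presence point, assuming simulated data and a hypothetical helper domain_similarity; it is not the dismo code.

```r
set.seed(3)
# Simulated environmental values at presence points
pres <- cbind(temp = rnorm(20, 20, 2), prec = rnorm(20, 1500, 200))
# Two sites to score: a copy of the first presence point and a very different site
new_sites <- rbind(pres[1, ], c(temp = 35, prec = 400))

# Range of each variable across presence points, used to standardize differences
ranges <- apply(pres, 2, function(v) diff(range(v)))

# Hypothetical helper: similarity of one site to its closest presence point
domain_similarity <- function(site, pres, ranges) {
  abs_diff <- abs(sweep(pres, 2, site))            # |presence - site| per variable
  gower <- rowMeans(sweep(abs_diff, 2, ranges, "/"))  # Gower distance to each presence
  max(1 - gower)                                   # can be negative for very distant sites
}

sim <- apply(new_sites, 1, domain_similarity, pres = pres, ranges = ranges)
sim
```

Each new site is compared against every presence point, which is one reason this kind of algorithm becomes slow on large rasters.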

Generalized Linear Model (GLM)

Specified by algorithm = "glm", it runs a GLM with presences and absences as the response variable, following a binomial error distribution. It runs a stepwise model selection based on AIC, both backward and forward, over the predictor variables in the RasterStack. In this package it is implemented using the functions glm and step to fit a model and choose a model by AIC in a stepwise procedure. The model is evaluated using the evaluate function from dismo and predicted using the predict function from the raster package, both with argument type = "response" to return values on the scale of the response variable.
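A minimal sketch of this kind of workflow with glm and step on simulated data is shown below. The variable names (bio1, bio2, bio3, pa) and the simulated data are assumptions for illustration, and a data frame stands in for the raster layers.

```r
set.seed(10)
n <- 300
# Simulated predictors and a presence/absence response driven by bio1 and bio2
env <- data.frame(bio1 = rnorm(n), bio2 = rnorm(n), bio3 = rnorm(n))
env$pa <- rbinom(n, 1, plogis(-0.5 + 1.5 * env$bio1 - 1 * env$bio2))

# Intercept-only and full binomial models define the search space
null_model <- glm(pa ~ 1, family = binomial, data = env)
full_model <- glm(pa ~ bio1 + bio2 + bio3, family = binomial, data = env)

# Stepwise selection by AIC, allowing terms to be dropped and re-added
best_model <- step(full_model,
                   scope = list(lower = formula(null_model),
                                upper = formula(full_model)),
                   direction = "both", trace = 0)

# type = "response" returns predictions on the probability scale
suit <- predict(best_model, newdata = env, type = "response")
formula(best_model)
```

Note that step follows a greedy path through candidate models, so it is not guaranteed to visit every combination of predictors.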

Mahalanobis

Specified by algorithm = "mahal", it uses the mahal function from the dismo package. It corresponds to a distribution model based on the Mahalanobis distance, a measure of the distance between a point P and a distribution D (Mahalanobis 1936). In this package the model is fit with the function mahal, then evaluated and predicted using evaluate and predict, also from the dismo package.
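The distance itself can be computed with the mahalanobis function from base R's stats package, as in the sketch below. The data are simulated and the sketch only shows the distance, not how dismo turns it into a suitability value.

```r
set.seed(5)
# Simulated environmental values at presence points (the distribution D)
pres <- cbind(temp = rnorm(50, 20, 2), prec = rnorm(50, 1500, 200))
# Two points P: the centroid of the presences and a distant site
new_sites <- rbind(colMeans(pres), c(temp = 30, prec = 600))

# stats::mahalanobis returns the squared distance to the centre,
# scaled by the covariance of the presence data
d2 <- mahalanobis(new_sites, center = colMeans(pres), cov = cov(pres))
d2
```

The centroid has distance zero, and sites farther from the presence cloud in environmental space get larger distances.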

Maximum Entropy (Maxent)

Specified either by algorithm = "maxent" or algorithm = "maxnet", corresponding to the implementations in the dismo (Hijmans et al. 2017) and maxnet (Phillips 2017) packages respectively. Maxent is a machine learning method for modeling species distributions based on incomplete data, allowing ENM with presence-only data (Phillips et al. 2006). If algorithm = "maxent", the model is fit by the function maxent, then evaluated and predicted using evaluate and predict, also from the dismo package. If algorithm = "maxnet", the model is fit by the function maxnet from the maxnet package, evaluated using evaluate from the dismo package with argument type = "logistic" and predicted using the predict function from the raster package.

Random Forest

Specified by algorithm = "rf", it uses the tuneRF function from the randomForest package (Liaw and Wiener 2002). It corresponds to machine learning regression based on decision trees. In this package tuneRF searches for the optimal number of variables available for splitting at each tree node (i.e. mtry), and the parameter doBest = TRUE makes it return a forest fit with that value. The Random Forest model is evaluated with the evaluate function from dismo and predicted with the predict function from the raster package.
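A hedged sketch of a tuneRF call with doBest = TRUE on simulated data follows. It assumes the randomForest package is installed and skips the example otherwise; the variable names and data are illustrative and the arguments used inside this package may differ.

```r
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(11)
  n <- 200
  # Simulated predictors and a 0/1 response driven by bio1 and bio2
  env <- data.frame(bio1 = rnorm(n), bio2 = rnorm(n),
                    bio3 = rnorm(n), bio4 = rnorm(n))
  pa <- rbinom(n, 1, plogis(1.5 * env$bio1 - env$bio2))

  # A numeric 0/1 response gives a regression forest; randomForest warns about
  # the small number of unique response values, so warnings are silenced here
  rf_fit <- suppressWarnings(
    randomForest::tuneRF(x = env, y = pa, trace = FALSE, plot = FALSE,
                         doBest = TRUE)
  )

  print(rf_fit$mtry)                      # mtry selected by the search
  suit <- predict(rf_fit, newdata = env)  # predictions on the training data
}
```

With doBest = TRUE the return value is the fitted forest itself rather than the table of mtry values and out-of-bag errors.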

Support Vector Machines (SVM)

Specified either by algorithm = "svme" or algorithm = "svmk", corresponding to the implementations in the e1071 (Meyer et al. 2017) and kernlab (Karatzoglou et al. 2004) packages respectively. SVMs are supervised learning models that use learning algorithms for classification and regression analysis. In the e1071 package, SVM is implemented through the function best.tune with method set to "svm", which uses an RBF kernel (radial basis function kernel) for classification. In the kernlab package, SVM is implemented through the function ksvm, also with an RBF kernel (in this case the default kernel "rbfdot"). We expect both implementations to differ only in performance. Both svme and svmk are evaluated with the evaluate function from dismo and predicted with the predict function from the raster package.

References

Booth TH, Nix HA, Busby JR, Hutchinson MF (2014). “Bioclim: The First Species Distribution Modelling Package, Its Early Applications and Relevance to Most Current MaxEnt Studies.” Diversity and Distributions, 20(1), 1-9. ISSN 1472-4642, doi:10.1111/ddi.12144.

Carpenter G, Gillison AN, Winter J (1993). “DOMAIN: A Flexible Modelling Procedure for Mapping Potential Distributions of Plants and Animals.” Biodiversity & Conservation, 2(6), 667-680. ISSN 1572-9710, doi:10.1007/BF00051966.

Cobos ME, Peterson AT, Barve N, Osorio-Olvera L (2019). “Kuenm: An R Package for Detailed Development of Ecological Niche Models Using Maxent.” PeerJ, 7, e6281. ISSN 2167-8359, doi:10.7717/peerj.6281.

Elith J, Graham CH, Anderson RP, Dudík M, Ferrier S, Guisan A, Hijmans RJ, Huettmann F, Leathwick JR, Lehmann A, Li J, Lohmann LG, Loiselle BA, Manion G, Moritz C, Nakamura M, Nakazawa Y, Overton JMcCM, Peterson AT, Phillips SJ, Richardson K, Scachetti-Pereira R, Schapire RE, Soberón J, Williams S, Wisz MS, Zimmermann NE (2006). “Novel Methods Improve Prediction of Species' Distributions from Occurrence Data.” Ecography, 29(2), 129-151. ISSN 1600-0587, doi:10.1111/j.2006.0906-7590.04596.x.

Elith J, Leathwick JR, Hastie T (2009). “A Working Guide to Boosted Regression Trees.” Journal of Animal Ecology, 77(4), 802-813. ISSN 1365-2656, doi:10/fn6m6v.

Hastie T, Tibshirani R, Friedman J (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Heidelberg.

Hijmans RJ, Graham CH (2006). “The Ability of Climate Envelope Models to Predict the Effect of Climate Change on Species Distributions.” Global Change Biology, 12(12), 2272-2281. ISSN 1354-1013, 1365-2486, doi:10.1111/j.1365-2486.2006.01256.x.

Hijmans RJ, Phillips S, Leathwick J, Elith J (2017). “Dismo: Species Distribution Modeling.” R package version 1.1-4.

Karatzoglou A, Smola A, Hornik K, Zeileis A (2004). “Kernlab - An S4 Package for Kernel Methods in R.” Journal of Statistical Software, 11(9), 1-20.

Liaw A, Wiener M (2002). “Classification and Regression by randomForest.” R News, 2(3), 18-22.

Mahalanobis PC (1936). “On the Generalized Distance in Statistics.” In National Institute of Science of India.

Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F (2017). “E1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien.” R package version 1.6-8.

Nix HA (1986). “A Biogeographic Analysis of the Australian Elapid Snakes.” In Longmore R (ed.), Atlas of Elapid Snakes of Australia, Australian Flora and Fauna Series No. 7, 4-15. Australian Government Publishing Service, Canberra.

Peterson AT, Papeş M, Soberón J (2008). “Rethinking Receiver Operating Characteristic Analysis Applications in Ecological Niche Modeling.” Ecological Modelling, 213(1), 63-72. ISSN 0304-3800, doi:10/ctk6cf.

Peterson AT, Soberón J, Pearson RG, Anderson RP, Martínez-Meyer E, Nakamura M (2011). Ecological Niches and Geographic Distributions, number 49 in Monographs in Population Biology. Princeton University Press, Princeton, N.J. ISBN 978-0-691-13686-8 978-0-691-13688-2, OCLC: ocn724664003.

Phillips S (2017). “Maxnet: Fitting 'Maxent' Species Distribution Models with 'Glmnet'.”

Phillips SJ, Anderson RP, Schapire RE (2006). “Maximum Entropy Modeling of Species Geographic Distributions.” Ecological Modelling, 190(3-4), 231-259. ISSN 0304-3800, doi:10.1016/j.ecolmodel.2005.03.026.

Qiao H, Soberón J, Peterson AT (2015). “No Silver Bullets in Correlative Ecological Niche Modelling: Insights from Testing among Many Potential Algorithms for Niche Estimation.” Methods in Ecology and Evolution, 6(10), 1126-1136.

See also

bioclim in dismo package

domain in dismo package

do_many

evaluate in dismo package

maxent in dismo package

maxnet in maxnet package

mahal in dismo package

predict in dismo package

predict in raster package

Examples

if (FALSE) {
# run setup_sdmdata first from one species in example_occs data
sp <- names(example_occs)[1]
sp_coord <- example_occs[[1]]
sp_setup <- setup_sdmdata(species_name = sp,
                          occurrences = sp_coord,
                          predictors = example_vars,
                          clean_uni = TRUE)

# run bioclim for one species
sp_any <- do_any(species_name = sp,
                 predictors = example_vars,
                 algorithm = "bioclim")

# run do_many
sp_many <- do_many(species_name = sp,
                   predictors = example_vars,
                   bioclim = TRUE)
}