This function reads the output of final_model for each species and for multiple algorithms, and builds a simple ensemble model by calculating the mean of the final models, so as to obtain one model per species. It also calculates the median, standard deviation and range (maximum - minimum).

ensemble_model(species_name, occurrences, lon = "lon", lat = "lat",
  models_dir = "./models", final_dir = "final_models",
  ensemble_dir = "ensemble", proj_dir = "present", algorithms = NULL,
  which_ensemble = c("average"), which_final = c("raw_mean"),
  performance_metric = "TSSmax", dismo_threshold = "spec_sens",
  consensus_level = 0.5, png_ensemble = TRUE, write_occs = FALSE,
  write_map = FALSE, scale_models = TRUE, uncertainty = TRUE, ...)

Arguments

species_name

A character string with the species name. Because the species name will be used as a directory name, avoid non-ASCII characters, spaces and punctuation marks. The recommended format is "Genus_species". See the names in example_occs for an example

occurrences

A data frame with occurrence data. It must contain at least the columns with the latitude and longitude of the species occurrences. See example_occs for an example

lon

The name of the longitude column. Defaults to "lon"

lat

The name of the latitude column. Defaults to "lat"

models_dir

Folder path to save the output files. Defaults to "./models"

final_dir

Character. Name of the folder to save the output files. A subfolder will be created. Defaults to "final_models"

ensemble_dir

Character string, name of the folder to save the output files. A subfolder will be created. Defaults to "ensemble"

proj_dir

Character. The name of the subfolder with the projection. Defaults to "present", but can be set to the name of another projection (i.e. to run the function on projected models)

algorithms

Character vector specifying which algorithms will be processed. Note that it can have length > 1, e.g. c("bioclim", "rf"). Defaults to NULL: if no name is given, all algorithms present in the final_models folder will be processed

which_ensemble

Which method will be used to build the consensus between algorithms? Current options are:

best

Selects models from the best-performing algorithm. A performance metric must be specified (performance_metric). Parameter which_final indicates which model will be returned

average

Computes the means between models. Parameter which_final indicates which model will be returned

weighted_average

Computes a weighted mean between models. A performance metric must be specified. Parameter which_final indicates which model will be returned

median

Computes the median between models. Parameter which_final indicates which model will be returned

frequency

Computes the mean between binary models, which is analogous to calculating a relative consensus

consensus

Computes a binary model with the final consensus area. A consensus_level must be specified

pca

Computes a PCA between the models for each algorithm and extracts the first axis, which summarizes the variation between them
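
The continuous options above can be sketched in base R on a toy matrix. This is only an illustration of the arithmetic, not the package's implementation: the values in preds and the tss scores are invented, and real inputs are raster layers rather than a matrix.

```r
# Toy suitability values: rows are raster cells, columns are algorithms.
# The numbers and the TSS scores are invented for illustration only.
preds <- cbind(bioclim = c(0.2, 0.8, 0.5),
               rf      = c(0.4, 0.6, 0.9),
               maxnet  = c(0.3, 1.0, 0.1))
tss <- c(bioclim = 0.5, rf = 0.75, maxnet = 0.25)

# "average": unweighted mean per cell
ens_average <- rowMeans(preds)

# "weighted_average": mean per cell, weighted by the performance metric
ens_weighted <- apply(preds, 1, weighted.mean, w = tss)

# "median": median per cell
ens_median <- apply(preds, 1, median)

# "best": keep only the algorithm with the highest metric
ens_best <- preds[, names(which.max(tss))]

# "pca": scores of the cells on the first principal axis
ens_pca <- prcomp(preds)$x[, 1]
```

In this sketch each result has one value per cell, so every method reduces the set of algorithms to a single layer.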

which_final

Which final_model will be used to calculate the average, weighted average or median ensembles? See final_model

performance_metric

Which performance metric will be used to define the "best" algorithm. Any one of c("AUC", "pROC", "TSSmax", "KAPPAmax", "CCR", "F_score", "Jaccard")

dismo_threshold

Character string indicating the threshold (cut-off) used to transform raw_mean final models into binary models for the frequency and consensus methods. The options are those of dismo::threshold: "kappa", "spec_sens", "no_omission", "prevalence", "equal_sens_spec", "sensitivity". Default value is "spec_sens"

consensus_level

Which proportion of binary models will be kept when creating bin_consensus. Defaults to 0.5
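
As a rough illustration of how dismo_threshold and consensus_level interact in the frequency and consensus methods, the sketch below binarizes toy predictions and combines them. The cut-off values are hypothetical (in practice they would come from the chosen threshold statistic), and the use of >= for both comparisons is an assumption, not something stated in this documentation.

```r
# Toy continuous predictions (rows: cells, columns: algorithms) and
# hypothetical cut-off values, one per algorithm.
preds <- cbind(bioclim = c(0.2, 0.8, 0.5),
               rf      = c(0.4, 0.6, 0.9),
               maxnet  = c(0.3, 1.0, 0.1))
cutoffs <- c(bioclim = 0.4, rf = 0.5, maxnet = 0.6)

# Binary models: 1 where a prediction reaches its algorithm's cut-off
# (the >= comparison is an assumption)
bin <- sweep(preds, 2, cutoffs, FUN = ">=") * 1

# "frequency": proportion of algorithms predicting presence in each cell
ens_frequency <- rowMeans(bin)

# "consensus": presence where that proportion reaches consensus_level
consensus_level <- 0.5
ens_consensus <- as.integer(ens_frequency >= consensus_level)
```

With consensus_level = 0.5, a cell is kept in this sketch when at least half of the binary models predict presence.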

png_ensemble

Logical. If TRUE writes png files of the ensemble models

write_occs

Logical. If TRUE writes the occurrence points on the png file of the ensemble model

write_map

Logical. If TRUE adds a map contour to the png file of the ensemble models

scale_models

Logical. Whether input models should be scaled between 0 and 1
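
One common way to scale values between 0 and 1 is min-max rescaling, sketched below. Whether the function uses exactly this formula is an assumption, and rescale01() is a hypothetical helper that is not part of the package.

```r
# Min-max rescaling of a vector of model values to the 0-1 interval.
# rescale01() is a hypothetical helper written for this example only.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)      # minimum and maximum, ignoring NA
  (x - rng[1]) / (rng[2] - rng[1])   # undefined if all values are equal
}

raw <- c(12, 30, 48, NA, 21)
scaled <- rescale01(raw)
```

Scaling of this kind puts algorithms with different output ranges on a common footing before they are averaged.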

uncertainty

Logical. If TRUE, calculates the uncertainty between models, as a range (maximum - minimum)
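
The range can be illustrated with a few lines of base R. The matrix below is invented toy data standing in for the per-algorithm models, so this is a sketch of the calculation only.

```r
# Toy predictions: rows are raster cells, columns are algorithms.
preds <- cbind(bioclim = c(0.2, 0.8, 0.5),
               rf      = c(0.4, 0.6, 0.9),
               maxnet  = c(0.3, 1.0, 0.1))

# Uncertainty per cell as the range across algorithms (maximum - minimum)
uncert <- apply(preds, 1, function(x) max(x) - min(x))
```

Cells where the algorithms disagree strongly get larger values in this sketch.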

...

Other parameters from writeRaster

Value

Returns a RasterStack with all generated statistics written in the ensemble_dir subfolder

Writes raster files to disk with the median, mean, standard deviation and range of the assembled models

If png_ensemble = TRUE, writes .png figures in the ensemble_dir subfolder

Examples

if (FALSE) {
# run setup_sdmdata
sp <- names(example_occs)[1]
sp_coord <- example_occs[[1]]
sp_setup <- setup_sdmdata(species_name = sp,
                          occurrences = sp_coord,
                          predictors = example_vars,
                          clean_uni = TRUE)

# run do_many
sp_many <- do_many(species_name = sp,
                   predictors = example_vars,
                   bioclim = TRUE)

# run final_model
sp_final <- final_model(species_name = sp,
                        algorithms = c("bioclim"),
                        select_partitions = TRUE,
                        select_par = "TSSmax",
                        select_par_val = 0,
                        which_models = c("raw_mean"),
                        consensus_level = 0.5,
                        overwrite = TRUE)

# run ensemble model
sp_ensemble <- ensemble_model(species_name = sp,
                              occurrences = sp_coord,
                              overwrite = TRUE)
}