% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ellipsoid_selection.R
\name{ellipsoid_selection}
\alias{ellipsoid_selection}
\title{ellipsoid_selection: Performs model selection for ellipsoid models}
\usage{
ellipsoid_selection(
  env_train,
  env_test = NULL,
  env_vars,
  nvarstest,
  level = 0.95,
  mve = TRUE,
  env_bg = NULL,
  omr_criteria,
  parallel = FALSE,
  ncores = NULL,
  comp_each = 100,
  proc = FALSE,
  proc_iter = 100,
  rseed = TRUE
)
}
\arguments{
\item{env_train}{A data frame with the environmental training data.}

\item{env_test}{A data frame with the environmental testing data.
Default is \code{NULL}.}

\item{env_vars}{A vector with the names of the environmental variables used in
the selection process. To help choose which variables to use, see
\code{\link[tenm]{correlation_finder}}.}

\item{nvarstest}{A numeric vector indicating the number(s) of variables used
to fit the ellipsoids during model selection.}

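\item{level}{Proportion of points to be included in the ellipsoids. This
parameter is related to the error (E) proposed by Peterson et al. (2008),
since the proportion of points left outside the ellipsoid is \code{1 - level}.
Default is 0.95.}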

\item{mve}{Logical. If \code{TRUE}, a minimum volume ellipsoid will be computed
using \code{\link[MASS]{cov.rob}} from \pkg{MASS}. If \code{FALSE}, the covariance
matrix of the input data will be used. Default is \code{TRUE}.}

\item{env_bg}{Environmental data used to compute the approximate prevalence
of the model. It should be a sample of the environmental layers of
the calibration area. Default is \code{NULL}.}

\item{omr_criteria}{Numeric. Omission rate criterion: the maximum allowable
omission rate for the selection process (e.g. \code{0.1}).}

\item{parallel}{Logical. If \code{TRUE}, computations will run in parallel.
Default is \code{FALSE}.}

\item{ncores}{Number of cores to use for parallel processing. Default is
\code{NULL}, which uses all available cores minus one.}

\item{comp_each}{Number of models to run in each job in parallel computation.
Default is 100.}

\item{proc}{Logical. If \code{TRUE}, a partial ROC test will be run.
Default is \code{FALSE}.}

\item{proc_iter}{Numeric. Total number of iterations for the partial ROC
bootstrap. Default is 100.}

\item{rseed}{Logical. If \code{TRUE}, a random seed is set for the partial ROC
bootstrap. Default is \code{TRUE}.}
}
\value{
A data.frame with the following columns:
\itemize{
\item "fitted_vars": Names of variables that were fitted.
\item "nvars": Number of fitted variables.
\item "om_rate_train": Omission rate of the training data.
\item "non_pred_train_ids": Row IDs of non-predicted training data.
\item "om_rate_test": Omission rate of the testing data.
\item "non_pred_test_ids": Row IDs of non-predicted testing data.
\item "bg_prevalence": Approximate prevalence of the model (see details).
\item "pval_bin": p-value of the binomial test.
\item "pval_proc": p-value of the partial ROC test.
\item "env_bg_paucratio": Environmental partial AUC ratio value.
\item "env_bg_auc": Environmental AUC value.
\item "mean_omr_train_test": Mean value of omission rates (train and test).
\item "rank_by_omr_train_test": Rank of the model in the selection process
by omission rate.
\item "rank_omr_aucratio": Rank of the model by omission rate and partial
AUC ratio.
}
}
\description{
The function performs model selection for ellipsoid models
using three criteria: a) the omission rate, b) the significance of partial
ROC and binomial tests and c) the AUC value.
}
\details{
Model selection occurs in environmental space (E-space). For each variable
combination specified in \code{nvarstest}, the omission rate (omr) in E-space
is computed using the \code{\link[tenm]{inEllipsoid}} function.
Results are ordered by the omr of the testing data. If \code{env_bg} is
provided, an approximate prevalence is computed and results are additionally
ordered by partial AUC. Model selection can be run in parallel (see
\code{parallel}, \code{ncores} and \code{comp_each}).
For more details and examples, see the help page of
\code{\link[tenm]{ellipsoid_omr}}.
}
\examples{
\donttest{
library(tenm)
data("abronia")
tempora_layers_dir <- system.file("extdata/bio",package = "tenm")
abt <- tenm::sp_temporal_data(occs = abronia,
                              longitude = "decimalLongitude",
                              latitude = "decimalLatitude",
                              sp_date_var = "year",
                              occ_date_format="y",
                              layers_date_format= "y",
                              layers_by_date_dir = tempora_layers_dir,
                              layers_ext="*.tif$")
abtc <- tenm::clean_dup_by_date(abt,threshold = 10/60)
future::plan("multisession",workers=2)
abex <- tenm::ex_by_date(this_species = abtc,train_prop=0.7)
abbg <- tenm::bg_by_date(this_species = abex,
                         buffer_ngbs=10,n_bg=50000)
future::plan("sequential")
varcorrs <- tenm::correlation_finder(environmental_data =
                                     abex$env_data[,-ncol(abex$env_data)],
                                     method = "spearman",
                                     threshold = 0.8,
                                     verbose = FALSE)
edata <- abex$env_data
etrain <- edata[edata$trian_test=="Train",] |> data.frame()
etest <- edata[edata$trian_test=="Test",] |> data.frame()
bg <- abbg$env_bg
res1 <- tenm::ellipsoid_selection(env_train = etrain,
                                  env_test = etest,
                                  env_vars = varcorrs$descriptors,
                                  nvarstest = 3,
                                  level = 0.975,
                                  mve = TRUE,
                                  env_bg = bg,
                                  omr_criteria = 0.1,
                                  parallel = FALSE,
                                  proc = TRUE)
head(res1)
}
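# A minimal base-R sketch of the E-space omission rate described in Details.
# This is NOT the internal code of tenm: it assumes that a record lies inside
# an ellipsoid when its squared Mahalanobis distance to the centroid does not
# exceed the chi-square quantile at 'level', with one degree of freedom per
# variable. The variables bio1 and bio12 and their values are simulated,
# hypothetical data, and quantile.used = floor(level * n) is an assumption.
set.seed(111)
sim_train <- data.frame(bio1 = rnorm(100, mean = 20, sd = 2),
                        bio12 = rnorm(100, mean = 800, sd = 50))
sim_test <- data.frame(bio1 = rnorm(30, mean = 20, sd = 2),
                       bio12 = rnorm(30, mean = 800, sd = 50))
sim_level <- 0.95
# Minimum volume ellipsoid estimates (the mve = TRUE case); with mve = FALSE,
# colMeans(sim_train) and stats::cov(sim_train) would be used instead
mve_fit <- MASS::cov.rob(sim_train, method = "mve",
                         quantile.used = floor(sim_level * nrow(sim_train)))
centroid <- mve_fit$center
shape <- mve_fit$cov
# Chi-square cutoff for the chosen level and number of variables
cutoff <- stats::qchisq(sim_level, df = ncol(sim_train))
# Flag testing records that fall inside the ellipsoid
inside <- stats::mahalanobis(sim_test, center = centroid, cov = shape) <= cutoff
# Omission rate: proportion of testing records outside the ellipsoid
sim_om_rate_test <- mean(!inside)
sim_om_rate_test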

}
\references{
Peterson, A.T. et al. (2008) Rethinking receiver operating
characteristic analysis applications in ecological niche modeling. Ecol.
Modell. 213, 63–72. \doi{10.1016/j.ecolmodel.2007.11.008}
}
\author{
Luis Osorio-Olvera \href{mailto:luismurao@gmail.com}{luismurao@gmail.com}
}
