% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/fast_mda.R
\name{fast_mda}
\alias{fast_mda}
\title{Fast MDA-style variable selection using ranger permutation importance}
\usage{
fast_mda(
  X,
  Y,
  ntree = 1000,
  nbf = 0,
  nthreads = max(1L, parallel::detectCores() - 1L),
  mtry = NULL,
  sample_fraction = 1,
  min_node_size = 1L,
  seed = 123
)
}
\arguments{
\item{X}{Numeric matrix (n x p); samples in rows, features in columns. Column
names should be feature IDs (e.g., m/z). Non-finite values are set to zero
internally for modeling.}

\item{Y}{Factor (classification) or numeric (regression) response of length n.
The default \code{mtry} is chosen based on the task: floor(sqrt(p)) for
classification; max(floor(p/3), 1) for regression.}

\item{ntree}{Integer; number of trees. Default 1000.}

\item{nbf}{Integer (>= 0); number of artificial “false” (noise) features to
append to X to estimate the null distribution. Default 0 disables this and
uses mirrored negative importances as the null.}

\item{nthreads}{Integer; total number of threads for ranger. Default is
max(1, parallel::detectCores() - 1).}

\item{mtry}{Optional integer; variables tried at each split. If NULL (default),
computed as floor(sqrt(p)) for classification or max(floor(p/3), 1) for regression.}

\item{sample_fraction}{Numeric in (0, 1]; fraction of samples drawn for each tree
(a speed/regularization knob). Default 1.}

\item{min_node_size}{Integer; ranger minimum node size. Larger values speed up
training and yield simpler trees. Default 1.}

\item{seed}{Integer; RNG seed for reproducibility. Default 123.}
}
\value{
A list with:
\itemize{
\item nb_to_sel: integer; number of selected features (floor(p_true * (1 - pi0))).
\item sel_moz: character vector of selected feature names (columns of X).
\item imp_sel: named numeric vector of importances for selected features (true features only).
\item all_imp: named numeric vector of importances for all true features.
\item pi0: estimated proportion of null features.
}
}
\description{
Computes feature importances with a single multiclass (or regression) random
forest using the ranger engine and its C++ permutation importance. The null
distribution of importances is estimated either by appending artificial “false”
(noise) features (when \code{nbf > 0}) or by mirroring negative importances (when
\code{nbf = 0}). An estimate of pi0, the proportion of null (uninformative) features, is then
computed, and the top (1 - pi0) fraction of true features is selected. This
implementation is a fast, OS-agnostic alternative to
repeated/random-forest-based MDA schemes.
}
\details{
\itemize{
\item A single ranger model is fit with \code{importance = "permutation"}.
Permutation importance is computed in C++ on out-of-bag (OOB) samples, which is fast and stable.
\item Null and pi0:
\itemize{
\item If \code{nbf > 0}, \code{nbf} false features (drawn uniformly between \code{min(X)} and \code{max(X)}) are appended;
negative importances among them help shape the null. pi0, the proportion of null features, is
estimated over high quantiles (e.g., 0.75–1) and adjusted for the number
of false features.
\item If \code{nbf = 0}, the null is approximated by mirroring the negative importances of
the true features. If no negative importances occur, pi0 is set to 0 (conservative), so
all true features are retained, since nb_to_sel = floor(p_true * (1 - pi0)).
}
\item Task: a factor \code{Y} triggers classification with \code{probability = TRUE}; a numeric \code{Y} triggers regression.
\item Robustness: any non-finite importances are set to zero. Selection is performed
only among the original (true) features; false features are discarded.
\item Performance: this is typically 5–20x faster than randomForest-based MDA and
is fully multithreaded via \code{nthreads}.
}
}
\examples{
\dontrun{
set.seed(1)
n <- 100; p <- 300
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("mz_", seq_len(p))
Y <- factor(sample(letters[1:3], n, replace = TRUE))

if (requireNamespace("ranger", quietly = TRUE)) {
  out <- fast_mda(
    X, Y,
    ntree = 500,
    nbf = 50,
    nthreads = max(1L, parallel::detectCores() - 1L),
    seed = 42
  )
  out$nb_to_sel
  head(out$sel_moz)
  # Top importances
  head(sort(out$all_imp, decreasing = TRUE))
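
  # Sketch of the mirrored-null mode described in Details (nbf = 0):
  # no false features are appended, and pi0 is estimated by mirroring
  # negative importances. The object names out0, y_num and out_reg are
  # illustrative only; results depend on the data and the seed.
  out0 <- fast_mda(X, Y, ntree = 500, nbf = 0, nthreads = 1L, seed = 42)
  out0$pi0
  # nb_to_sel is documented as floor(p_true * (1 - pi0)); here p_true = ncol(X)
  out0$nb_to_sel == floor(ncol(X) * (1 - out0$pi0))

  # Regression sketch: a numeric Y switches the task to regression, and the
  # default mtry is then max(floor(p/3), 1) as stated under Arguments.
  y_num <- rnorm(n)
  out_reg <- fast_mda(X, y_num, ntree = 500, nbf = 50, nthreads = 1L, seed = 42)
  # all_imp holds importances for the true features only
  length(out_reg$all_imp)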
}
}

}
\references{
Alexandre Godmer, Yahia Benzerara, Emmanuelle Varon, Nicolas Veziris, Karen Druart, Renaud Mozet, Mariette Matondo, Alexandra Aubry, Quentin Giai Gianetto, MSclassifR: An R package for supervised classification of mass spectra with machine learning methods, Expert Systems with Applications, Volume 294, 2025, 128796, ISSN 0957-4174, \doi{10.1016/j.eswa.2025.128796}.
}
\seealso{
\code{ranger::ranger}; for cross-validated permutation importance, see
\code{fast_cvpvi}. For a wrapper that plugs MDA/CVP into broader selection workflows,
see \code{SelectionVar} (\code{MethodSelection = "mda"} or \code{"cvp"}).
}
