% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/integration.R
\name{runOnlineINMF}
\alias{runOnlineINMF}
\alias{runOnlineINMF.liger}
\alias{runOnlineINMF.Seurat}
\title{Perform online iNMF on scaled datasets}
\usage{
runOnlineINMF(object, k = 20, lambda = 5, ...)

\method{runOnlineINMF}{liger}(
  object,
  k = 20,
  lambda = 5,
  newDatasets = NULL,
  projection = FALSE,
  maxEpochs = 5,
  HALSiter = 1,
  minibatchSize = 5000,
  WInit = NULL,
  VInit = NULL,
  AInit = NULL,
  BInit = NULL,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

\method{runOnlineINMF}{Seurat}(
  object,
  k = 20,
  lambda = 5,
  datasetVar = "orig.ident",
  layer = "ligerScaleData",
  assay = NULL,
  reduction = "onlineINMF",
  maxEpochs = 5,
  HALSiter = 1,
  minibatchSize = 5000,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
}
\arguments{
\item{object}{A \linkS4class{liger} object or a Seurat object. Scaled data is required.}

\item{k}{Inner dimension of factorization--number of metagenes. A value in
the range 20-50 works well for most analyses. Default \code{20}.}

\item{lambda}{Regularization parameter. Larger values penalize
dataset-specific effects more strongly (i.e. alignment should increase as
lambda increases). We recommend always using the default value except
possibly for analyses with relatively small differences (biological
replicates, male/female comparisons, etc.) in which case a lower value such
as 1.0 may improve reconstruction quality. Default \code{5.0}.}

\item{...}{Arguments passed to other S3 methods of this function.}

\item{newDatasets}{Named list of \linkS4class{dgCMatrix}. New datasets for
scenario 2 or scenario 3. Default \code{NULL} triggers scenario 1.}

\item{projection}{Whether to perform data integration with scenario 3
(projection) when \code{newDatasets} is specified. See Description. Default
\code{FALSE}.}

\item{maxEpochs}{The number of epochs to iterate through. See Details.
Default \code{5}.}

\item{HALSiter}{Maximum number of block coordinate descent (HALS
algorithm) iterations to perform for each update of \eqn{W} and \eqn{V}.
Default \code{1}. Changing this parameter is not recommended.}

\item{minibatchSize}{Total number of cells in each minibatch. See Details.
Default \code{5000}.}

\item{WInit, VInit, AInit, BInit}{Optional initialization for \eqn{W}, \eqn{V},
\eqn{A}, and \eqn{B} matrices, respectively. Must all be provided together.
See Details. Default \code{NULL}.}

\item{seed}{Random seed to allow reproducible results. Default \code{1}.}

\item{nCores}{The number of parallel tasks used to speed up the computation.
Default \code{2L}. Only supported on platforms with OpenMP support.}

\item{verbose}{Logical. Whether to show progress information. Default
\code{getOption("ligerVerbose")}, or \code{TRUE} if the option has not been
set.}

\item{datasetVar}{Metadata variable name that stores the dataset source
annotation. Default \code{"orig.ident"}.}

\item{layer}{For Seurat>=4.9.9, the name of the layer from which to retrieve
the input non-negative scaled data. Default \code{"ligerScaleData"}. For older
Seurat versions, the data is always retrieved from the \code{scale.data} slot.}

\item{assay}{Name of assay to use. Default \code{NULL} uses current active
assay.}

\item{reduction}{Name of the reduction in which to store the result. Also
used as the feature key. Default \code{"onlineINMF"}.}
}
\value{
\itemize{
 \item{liger method - Returns updated input \linkS4class{liger} object.
 \itemize{
     \item{A list of all \eqn{H} matrices can be accessed with
         \code{getMatrix(object, "H")}}
     \item{A list of all \eqn{V} matrices can be accessed with
         \code{getMatrix(object, "V")}}
     \item{The \eqn{W} matrix can be accessed with
         \code{getMatrix(object, "W")}}
     \item{The intermediate matrices \eqn{A} and \eqn{B} produced in the
         HALS update can be accessed similarly.}
 }
 }
 \item{Seurat method - Returns updated input Seurat object.
 \itemize{
     \item{\eqn{H} matrices for all datasets will be concatenated and
         transposed (all cells by k), and form a DimReduc object in the
         \code{reductions} slot named by argument \code{reduction}.}
     \item{\eqn{W} matrix will be presented as \code{feature.loadings} in the
         same DimReduc object.}
     \item{\eqn{V} matrices, \eqn{A} matrices, \eqn{B} matrices, an objective
         error value and the dataset variable used for the factorization are
         currently stored in the \code{misc} slot of the same DimReduc object.}
 }}
}
}
\description{
Perform online integrative non-negative matrix factorization to
represent multiple single-cell datasets in terms of \eqn{H}, \eqn{W}, and
\eqn{V} matrices. It optimizes the iNMF objective function (see
\code{\link{runINMF}}) using online learning (non-negative least squares for
\eqn{H} matrices, and hierarchical alternating least squares (HALS) for
\eqn{V} matrices and \eqn{W}), where the number of factors is set by
\code{k}. The function allows online learning in 3 scenarios:

\enumerate{
 \item Fully observed datasets;
 \item Iterative refinement using continually arriving datasets;
 \item Projection of new datasets without updating the existing factorization.
}

All three scenarios require fixed memory independent of the number of cells.

For each dataset, this factorization produces an \eqn{H} matrix (k by cells),
a \eqn{V} matrix (genes by k), and a shared \eqn{W}
matrix (genes by k). The \eqn{H} matrices represent the cell factor loadings.
\eqn{W} is identical among all datasets, as it represents the shared
components of the metagenes across datasets. The \eqn{V} matrices represent
the dataset-specific components of the metagenes.
}
\details{
To perform scenario 2 or 3, a complete set of factorization results from
a run of scenario 1 is required. Given the structure of a \linkS4class{liger}
object, all of the required information can be retrieved automatically.
For cases where users need to supply customized information for an existing
factorization, the arguments \code{WInit}, \code{VInit}, \code{AInit} and
\code{BInit} are exposed. The requirements for these arguments are as follows:
\itemize{
 \item{WInit - A matrix object of size \eqn{m \times k}. (see
     \code{\link{runINMF}} for notation)}
 \item{VInit - A list object of matrices each of size \eqn{m \times k}.
     Number of matrices should match with \code{newDatasets}.}
 \item{AInit - A list object of matrices each of size \eqn{k \times k}.
     Number of matrices should match with \code{newDatasets}.}
 \item{BInit - A list object of matrices each of size \eqn{m \times k}.
     Number of matrices should match with \code{newDatasets}.}
}
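
The size requirements above can be checked before calling the function. The
following is a minimal sketch and not part of the package:
\code{checkInitShapes} is a hypothetical helper, and \code{m} (number of
genes), \code{k} and \code{n} (the number of matrices expected in each list)
are assumed to be supplied by the user.

\preformatted{
# Hypothetical helper (not exported by this package): returns TRUE when the
# custom initializations have the sizes listed above.
checkInitShapes <- function(WInit, VInit, AInit, BInit, m, k, n) {
    isSize <- function(x, nr, nc) {
        is.matrix(x) && nrow(x) == nr && ncol(x) == nc
    }
    allSize <- function(l, nr, nc) {
        is.list(l) && length(l) == n &&
            all(vapply(l, isSize, logical(1), nr = nr, nc = nc))
    }
    isSize(WInit, m, k) &&        # W: m x k
        allSize(VInit, m, k) &&   # each V: m x k
        allSize(AInit, k, k) &&   # each A: k x k
        allSize(BInit, m, k)      # each B: m x k
}

# Usage with placeholder matrices of the right sizes
m <- 100
k <- 20
W <- matrix(0, m, k)
checkInitShapes(W, list(W), list(matrix(0, k, k)), list(W), m, k, n = 1)
}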

Minibatch iterations are performed on small subsets of cells. The exact
minibatch size applied to each dataset is \code{minibatchSize} multiplied by
the proportion of cells in this dataset out of all cells. In general,
\code{minibatchSize} should be no larger than the number of cells in the
smallest dataset (considering both \code{object} and \code{newDatasets}).
Therefore, a smaller value may be necessary for analyzing very small
datasets.
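
As a worked example of the proportional split described above, the sketch
below uses made-up cell counts. The rounding step is an assumption made for
illustration; the actual implementation may round differently.

\preformatted{
# Illustrative cell counts for two datasets (made-up numbers)
nCells <- c(ctrl = 3000, stim = 7000)
minibatchSize <- 1000
# Cells drawn from each dataset per minibatch:
# minibatchSize * (cells in this dataset / all cells).
# round() is an assumption, not necessarily what the package does.
perDataset <- round(minibatchSize * nCells / sum(nCells))
perDataset
}

Here \code{minibatchSize} is smaller than the smallest dataset (3000 cells),
in line with the recommendation above.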

An epoch is one complete pass over all cells, made up of a number of
minibatch iterations. Therefore, the total number of iterations is
determined by the setting of \code{maxEpochs}, the total number of cells, and
\code{minibatchSize}.
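
The relationship above can be approximated with simple arithmetic. This is
only a sketch: the numbers are made up, and how a partial final minibatch is
handled (rounding up or down) is not specified here, so the example uses
values that divide evenly.

\preformatted{
totalCells <- 10000      # cells across all datasets (made-up number)
minibatchSize <- 1000
maxEpochs <- 5
# One epoch covers every cell once, so it takes about
# totalCells / minibatchSize minibatch iterations.
itersPerEpoch <- totalCells / minibatchSize
totalIters <- maxEpochs * itersPerEpoch
totalIters
}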

Currently, the Seurat S3 method does not support scenarios 2 and 3,
because there is no simple way to organize the required miscellaneous
matrices within a single Seurat object. For these scenarios, we strongly
recommend that users create a \linkS4class{liger} object, which has the
structure needed to hold them.
}
\examples{
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    # Scenario 1
    pbmc <- runOnlineINMF(pbmc, minibatchSize = 200)
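    # Retrieve the factorization result with getMatrix(), as described in
    # the Value section: "W" is a single shared matrix, while "H" and "V"
    # are lists with one matrix per dataset.
    W <- getMatrix(pbmc, "W")
    Hs <- getMatrix(pbmc, "H")
    Vs <- getMatrix(pbmc, "V")
    dim(W)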
    # Scenario 2
    # Fake a new dataset by increasing all non-zero values in "ctrl" by 1
    ctrl2 <- rawData(dataset(pbmc, "ctrl"))
    ctrl2@x <- ctrl2@x + 1
    colnames(ctrl2) <- paste0(colnames(ctrl2), 2)
    pbmc2 <- runOnlineINMF(pbmc, k = 20, newDatasets = list(ctrl2 = ctrl2),
                           minibatchSize = 100)
    # Scenario 3
    pbmc3 <- runOnlineINMF(pbmc, k = 20, newDatasets = list(ctrl2 = ctrl2),
                           projection = TRUE)
}
}
\references{
Chao Gao et al., Iterative single-cell multi-omic integration
using online learning, Nat Biotechnol., 2021
}
