% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/jackstraw_kmeanspp.R
\name{jackstraw_kmeanspp}
\alias{jackstraw_kmeanspp}
\title{Non-Parametric Jackstraw for K-means Clustering using RcppArmadillo}
\usage{
jackstraw_kmeanspp(dat, kmeans.dat, s = NULL, B = NULL, covariate = NULL,
  verbose = FALSE, pool = TRUE, seed = NULL, ...)
}
\arguments{
\item{dat}{a matrix with \code{m} rows as variables and \code{n} columns as observations.}

\item{kmeans.dat}{the output of \code{ClusterR::KMeans_rcpp} applied to \code{dat}.}

\item{s}{the number of ``synthetic'' null variables. Out of \code{m} variables, \code{s} variables are independently permuted.}

\item{B}{the number of resampling iterations.}

\item{covariate}{a model matrix of covariates with \code{n} observations. Must include an intercept in the first column.}

\item{verbose}{a logical specifying whether to print the computational progress. By default, \code{FALSE}.}

\item{pool}{a logical specifying whether to pool the null statistics across all clusters. By default, \code{TRUE}.}

\item{seed}{a seed for the random number generator.}

\item{...}{optional arguments to control the k-means clustering algorithm (see \code{ClusterR::KMeans_rcpp}).}
}
\value{
\code{jackstraw_kmeanspp} returns a list consisting of
\item{F.obs}{\code{m} observed F statistics between variables and cluster centers.}
\item{F.null}{F null statistics between null variables and cluster centers, from the jackstraw method.}
\item{p.F}{\code{m} p-values of membership.}
}
\description{
Test the cluster membership for K-means clustering, using K-means++ initialization.
}
\details{
K-means clustering assigns \code{m} rows into \code{K} clusters. This function enables statistical
evaluation of whether the cluster membership is correctly assigned. Each of the \code{m} p-values refers to
the statistical test of that row with regard to its assigned cluster.
The resampling strategy accounts for the over-fitting that results from computing the clusters directly from the observed data,
and protects against an anti-conservative bias.

Generally, it functions identically to \code{jackstraw_kmeans}, but uses \code{ClusterR::KMeans_rcpp} instead of \code{stats::kmeans}.
A speed improvement is gained from K-means++ initialization and \code{RcppArmadillo}. If the input data are still too large,
consider using \code{jackstraw_MiniBatchKmeans}.

The input data (\code{dat}) must be of class \code{matrix}.
}
\examples{
\dontrun{
set.seed(1234)
library(ClusterR)
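# Center each row (variable) without scaling
dat <- t(scale(t(Jurkat293T), center = TRUE, scale = FALSE))
# Cluster the rows using K-means++ initialization
kmeans.dat <- KMeans_rcpp(dat, clusters = 10, num_init = 1,
                          max_iters = 100, initializer = 'kmeans++')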
jackstraw.out <- jackstraw_kmeanspp(dat, kmeans.dat)
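# A minimal sketch of inspecting the result, assuming the call above succeeded.
# The element name p.F is taken from the Value section of this page; the 0.05
# cutoff is an arbitrary illustration, not a recommendation.
hist(jackstraw.out$p.F, breaks = 20, main = "Jackstraw p-values")
sum(jackstraw.out$p.F < 0.05)
# s and B can also be set explicitly; the values below are hypothetical.
jackstraw.s.B <- jackstraw_kmeanspp(dat, kmeans.dat, s = 100, B = 10)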
}
}
\references{
Chung (2018) Statistical significance for cluster membership. bioRxiv, doi:10.1101/248633 \url{https://www.biorxiv.org/content/early/2018/01/16/248633}
}
\author{
Neo Christopher Chung \email{nchchung@gmail.com}
}
