% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ClusterMatch_func_20210910.R
\name{ClusterMatch}
\alias{ClusterMatch}
\title{ClusterMatch() function}
\usage{
ClusterMatch(filepath, path_out, k_summary_table)
}
\arguments{
\item{filepath}{a user-defined path to a folder that contains the set of
K-cluster files to be matched against each other. The algorithm attempts
to load every file in the folder, so the folder should contain only the
relevant K-cluster files. If the clusters were generated using the
BootKmeans() function, such a folder (named Clusters) was created by the
algorithm in the output path given by the user.
Each K-cluster file should correspond to the model$cluster object returned by
kmeans(), saved as a .Rdata file. Such files are generated as part of the
output from BootKmeans(). ClusterMatch() assumes that the file names contain
the string "model_" followed by a model number, which must match the
corresponding row number in k_summary_table. If the data were generated with
the BootKmeans() function, the formats and numbers will match by default.}

\item{path_out}{a user-defined path to the folder where the output files will
be saved.}

\item{k_summary_table}{a data frame summarizing the statistics of the kmeans()
models that produced the clusters in the K-cluster files. If the data were
generated with the BootKmeans() function, a compatible
k_summary_table was produced in the output path with the file name
"k_means_bootstrap_summary_stats_<date>.csv".
If other data are analysed, please observe these formatting requirements:
The k_summary_table must contain the data for each kmeans() model in rows
and, at a minimum, the following columns:
\itemize{
\item k-value (colname: k.est)
\item residual total within-cluster sum of squares (colname: Tot.withinss.resid)
\item residual AIC (colname: AIC.resid)
\item residual BIC (colname: BIC.resid)
\item delta BIC/max BIC (colname: prop.delta.BIC)
\item delta BIC/k.est (colname: delta.BIC.over.k)
}
It is crucial that the model numbers in the K-cluster file names match those
in the k_summary_table, and that the rows of the table are ordered by model
number.}
}
\value{
The function returns a summary table, which for each estimated number
  of clusters i (i.e. the k-values of the models) lists:
  \itemize{
  \item number of models that found i clusters
  \item mean residual total within-cluster sum of squares
  \item mean residual AIC
  \item mean residual BIC
  \item mean delta BIC/max BIC
  \item mean delta BIC/k
  \item mean number of allele assignments that fall outside of the i most
    abundant clusters across all pairwise comparisons between the models that
    found i clusters
  \item mean proportion of allele assignments that fall outside of the i most
    abundant clusters across all pairwise comparisons between the models that
    found i clusters
  }
  The summary table is also saved as a .csv file in the output path.
}
\description{
\code{\link{ClusterMatch}} is a tool for evaluating whether kmeans()
clustering models with similar estimated values of k identify similar
clusters. ClusterMatch() also summarizes model statistics as means for
each estimated value of k. It is designed to take files produced
by the BootKmeans() function as input, but other data can be analysed
if the data format requirements described under Arguments are followed
carefully.
}
\details{
If you publish data produced with MHCtools, please cite:
Roved, J. 2020. MHCtools: Analysis of MHC data in non-model species. CRAN.
Roved, J., Hansson, B., Stervander, M., Hasselquist, D., Westerdahl, H. 2020.
Non-random association of MHC-I alleles in favor of high diversity haplotypes
in wild songbirds revealed by computer-assisted MHC haplotype inference using
the R package MHCtools. bioRxiv.
}
\examples{
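# Path to the example K-cluster files shipped with MHCtools
filepath <- system.file("extdata/ClusterMatch", package="MHCtools")
# Write the output files to a temporary folder
path_out <- tempdir()
# Example summary table (presumably a data set provided with the package)
k_summary_table <- k_summary_table
ClusterMatch(filepath, path_out, k_summary_table)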
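
# A minimal sketch, not part of ClusterMatch() itself, of checks that can be
# run on user-supplied input before calling the function, based on the
# formatting requirements described under Arguments. The names required_cols,
# missing_cols, cluster_files, model_tags, model_nos and unmatched_nos are
# hypothetical helpers introduced here for illustration only.
required_cols <- c("k.est", "Tot.withinss.resid", "AIC.resid", "BIC.resid",
                   "prop.delta.BIC", "delta.BIC.over.k")
# Required columns that are absent from the table (empty if all are present)
missing_cols <- setdiff(required_cols, colnames(k_summary_table))
if (length(missing_cols) > 0) {
  message("Missing columns: ", paste(missing_cols, collapse = ", "))
}
# Extract the model numbers that follow "model_" in the K-cluster file names
cluster_files <- list.files(filepath)
model_tags <- regmatches(cluster_files, regexpr("model_[0-9]+", cluster_files))
model_nos <- as.integer(sub("model_", "", model_tags))
# Model numbers without a corresponding row number in k_summary_table
unmatched_nos <- setdiff(model_nos, seq_len(nrow(k_summary_table)))
if (length(unmatched_nos) > 0) {
  message("Model numbers without a matching row: ",
          paste(unmatched_nos, collapse = ", "))
}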
}
\seealso{
\code{\link{BootKmeans}}
}
