% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/modha_spangler.R
\name{gmsClust}
\alias{gmsClust}
\title{A general implementation of Modha-Spangler clustering for mixed-type data}
\usage{
gmsClust(conData, catData, nclust, searchDensity = 10,
  clustFun = wkmeans, conDist = squaredEuc, catDist = squaredEuc,
  ...)
}
\arguments{
\item{conData}{A data frame of continuous variables.}

\item{catData}{A data frame of categorical variables; the allowable variable types depend on the specific clustering function used.}

\item{nclust}{An integer specifying the number of clusters.}

\item{searchDensity}{An integer determining the number of distinct cluster weightings evaluated in the brute-force search.}

\item{clustFun}{The clustering function to be applied.}

\item{conDist}{The continuous distance function used to construct the objective function.}

\item{catDist}{The categorical distance function used to construct the objective function.}

\item{...}{Additional arguments passed to \code{clustFun}.}
}
\value{
A list containing the following elements:
\item{results}{A results object corresponding to the base clustering algorithm}
\item{objFun}{A numeric vector of length \code{searchDensity} containing the values of the objective function for each weight used}
\item{Qcon}{A numeric vector of length \code{searchDensity} containing the values of the continuous component of the objective function}
\item{Qcat}{A numeric vector of length \code{searchDensity} containing the values of the categorical component of the objective function}
\item{bestInd}{The index of the most successful run}
\item{weights}{A numeric vector of length \code{searchDensity} containing the continuous weights used}
}
\description{
Modha-Spangler clustering estimates the optimal weighting for continuous
vs categorical variables using a brute-force search strategy.
}
\details{
Modha-Spangler clustering uses a brute-force search strategy to estimate
the optimal weighting for continuous vs categorical variables. This
implementation admits an arbitrary clustering function and arbitrary
objective functions for continuous and categorical variables.

The input parameter \code{clustFun} must be a function accepting the inputs
\code{(conData, catData, conWeight, nclust, ...)} and returning a list
containing (at least) the elements \code{cluster}, \code{conCenters}, and
\code{catCenters}. The element \code{cluster} contains cluster memberships
denoted by the integers \code{1:nclust}. The elements \code{conCenters} and
\code{catCenters} must be data frames whose rows denote cluster centroids.
The function \code{clustFun} must allow \code{nclust = 1}, in which case
\code{conCenters} and \code{catCenters} must each be a data frame with a
single row.

Input parameters \code{conDist} and \code{catDist} are functions that must
each take two data frame rows as input and return a scalar distance measure.
}
\examples{
\dontrun{
# Generate toy data set with poor quality categorical variables and good
# quality continuous variables.
set.seed(1)
dat <- genMixedData(200, nConVar=2, nCatVar=2, nCatLevels=4, nConWithErr=2,
  nCatWithErr=2, popProportions=c(.5,.5), conErrLev=0.3, catErrLev=0.8)
catDf <- dummyCodeFactorDf(data.frame(apply(dat$catVars, 2, factor)))
conDf <- data.frame(scale(dat$conVars))

msRes <- gmsClust(conDf, catDf, nclust=2)
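
# Inspect the brute-force search using the documented return values. This is
# a sketch that assumes bestInd indexes into the weights vector: it prints the
# continuous weight selected, then plots the objective function at each
# candidate weight.
msRes$weights[msRes$bestInd]
plot(msRes$weights, msRes$objFun, type="b",
  xlab="Continuous weight", ylab="Objective function")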

table(msRes$results$cluster, dat$trueID)
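
# A minimal, untested sketch of user-supplied components that follow the
# interface described in Details. It assumes catDf is numeric (dummy coded).
# manhattanDist and kmeansClustFun are hypothetical names, not part of the
# package, and the weighting scheme below is illustrative only.

# Distance function: takes two data frame rows, returns a scalar.
manhattanDist <- function(x, y) sum(abs(unlist(x) - unlist(y)))

# Clustering function with the required signature and return elements.
kmeansClustFun <- function(conData, catData, conWeight, nclust, ...) {
  # Scale each block by its weight, then cluster the combined data.
  weighted <- cbind(conWeight * conData, (1 - conWeight) * catData)
  km <- kmeans(weighted, centers = nclust, ...)
  # Compute centroids on the unweighted data, one row per cluster.
  centroids <- function(df) {
    rows <- lapply(seq_len(nclust), function(j) {
      colMeans(df[km$cluster == j, , drop = FALSE])
    })
    as.data.frame(do.call(rbind, rows))
  }
  list(cluster = km$cluster, conCenters = centroids(conData),
    catCenters = centroids(catData))
}

msRes2 <- gmsClust(conDf, catDf, nclust=2, clustFun=kmeansClustFun,
  conDist=manhattanDist)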
}
}
\references{
Foss A, Markatou M; kamila: Clustering Mixed-Type Data in R and Hadoop. Journal of Statistical Software, 83(13). 2018. doi: 10.18637/jss.v083.i13

Modha DS, Spangler WS; Feature Weighting in k-Means Clustering. Machine Learning, 52(3). 2003. doi: 10.1023/a:1024016609528
}
