% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/DSD_Gaussians.R
\name{DSD_Gaussians}
\alias{DSD_Gaussians}
\title{Mixture of Gaussians Data Stream Generator}
\usage{
DSD_Gaussians(
  k = 2,
  d = 2,
  mu,
  sigma,
  p,
  noise = 0,
  noise_range,
  separation_type = c("auto", "Euclidean", "Mahalanobis"),
  separation = 0.2,
  space_limit = c(0.2, 0.8),
  variance_limit = 0.01,
  outliers = 0,
  outlier_options = NULL,
  verbose = FALSE
)
}
\arguments{
\item{k}{The number of clusters.}

\item{d}{The number of dimensions.}

\item{mu}{A \code{k}-by-\code{d} matrix of cluster means; each row is the
center of one cluster.}

\item{sigma}{A list of \code{k} covariance matrices, each of size
\code{d}-by-\code{d}.}

\item{p}{A vector of length \code{k} giving, for each cluster, the
probability that a data point is generated from that cluster.}

\item{noise}{The probability (between 0 and 1) that a generated data point is
noise. Noise is uniformly distributed within \code{noise_range}.}

\item{noise_range}{A matrix with \code{d} rows and 2 columns. The first column
contains the minimum and the second column the maximum value of the noise in
each dimension.}

\item{separation_type}{The distance used to enforce separation:
\code{"Euclidean"} (Euclidean norm) or \code{"Mahalanobis"} (Mahalanobis
distance). Defaults to \code{"auto"}.}

\item{separation}{The minimum separation distance between all generated
constructs, measured with the distance selected by \code{separation_type}.
Constructs include clusters when \code{k > 0} and outliers when
\code{outliers > 0}.}

\item{space_limit}{A vector giving the lower and upper bounds of the space in
which all constructs are generated. For clusters, this means that their
centroids must lie within these bounds.}

\item{variance_limit}{Upper limit for the randomly generated variance when
creating cluster covariance matrices.}

\item{outliers}{The number of data points generated as outliers.
Outliers generated by \code{DSD_Gaussians} are statistically separated from
clusters well enough that outlier detectors can find them in the overall
data stream. The separation distance between clusters and outliers is
determined by \code{separation} and by the \code{outlier_virtual_variance}
option of \code{outlier_options}. The outlier virtual variance defines an
empty space around each outlier, which separates it from its surroundings.
Unlike noise, outliers are data points of interest for end users, and the
goal of outlier detectors is to find them in data streams. For more details,
see the "Introduction to \pkg{stream}" vignette.}

\item{outlier_options}{Effective only when \code{outliers > 0}. A list with
the following options: \itemize{
\item \code{predefined_outlier_space_positions} - (Default: \code{NULL}) A
predefined list of outlier spatial positions, specified like \code{mu}.
\item \code{predefined_outlier_stream_positions} - (Default: \code{NULL}) A
predefined list of outlier stream positions. Must have the same number of
elements as \code{predefined_outlier_space_positions}.
\item \code{outlier_horizon} - (Default: 500) The number of data points in
the generated stream over which the requested number of outliers is placed.
\item \code{outlier_virtual_variance} - (Default: 1) The variance used to
create virtual covariance matrices for outliers. This virtual statistical
distribution defines an empty space around each outlier that separates it
from other constructs, both clusters and outliers.
}}

\item{verbose}{Logical; if \code{TRUE}, print information about the cluster
and outlier generation process.}
}
\value{
Returns a \code{DSD_Gaussians} object (subclass of \code{DSD_R},
\code{DSD}), which is a list of parameters that are either passed to the
function or generated internally. They include:

\item{description}{A brief description of the DSD object.}
\item{k}{The number of clusters.}
\item{d}{The number of dimensions.}
\item{mu}{The matrix of cluster means.}
\item{sigma}{The cluster covariance matrices.}
\item{p}{The probability vector for the clusters.}
\item{noise}{A flag that determines whether noise is generated.}
\item{outs}{Outlier spatial positions.}
\item{outs_pos}{Outlier stream positions.}
\item{outs_vv}{Outlier virtual variance.}
}
\description{
A data stream generator that produces a data stream with a mixture of static
Gaussians.
}
\details{
\code{DSD_Gaussians} creates a mixture of \code{k} static clusters and
\code{outliers} outliers in a \code{d}-dimensional space. The cluster
centers \code{mu} and the covariance matrices \code{sigma} can be supplied
or will be randomly generated. The probability vector \code{p} defines for
each cluster the probability that the next data point will be chosen from it
(defaults to equal probability). The outlier spatial positions
\code{predefined_outlier_space_positions} and the outlier stream positions
\code{predefined_outlier_stream_positions} can be supplied or will be
randomly generated.
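
As an illustration, the following sketch supplies all three cluster
parameters explicitly for \code{k = 2} clusters in \code{d = 2} dimensions
and then draws a few points. The particular values are arbitrary choices for
illustration, and the sketch assumes that each row of \code{mu} holds one
cluster center and that \code{p} sums to 1.
\preformatted{
library("stream")

mu <- rbind(c(0.3, 0.3),       # center of cluster 1
            c(0.7, 0.7))       # center of cluster 2 (one row per cluster)
sigma <- list(diag(0.005, 2),  # d-by-d covariance matrix of cluster 1
              diag(0.01, 2))   # d-by-d covariance matrix of cluster 2
p <- c(0.25, 0.75)             # cluster 2 is chosen three times as often

gauss <- DSD_Gaussians(k = 2, d = 2, mu = mu, sigma = sigma, p = p)
pts <- get_points(gauss, n = 5)  # draw five data points from the stream
}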

Separation between generated clusters and outliers can be enforced using
either the Euclidean or the Mahalanobis distance, as selected by the
\code{separation_type} parameter. The minimum separation value is given by
the \code{separation} parameter.
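
The following base-R sketch compares the two distance types for the same
pair of centers. The shared covariance matrix \code{S} is an illustrative
assumption: this page does not state which covariance \code{DSD_Gaussians}
uses for the Mahalanobis distance, or whether it compares the distance or
its square against \code{separation}, so this is not the package's internal
computation.
\preformatted{
mu1 <- c(0.3, 0.3)   # center of cluster 1
mu2 <- c(0.7, 0.7)   # center of cluster 2
S <- diag(0.01, 2)   # assumed shared covariance matrix (illustration only)

# Euclidean distance between the two centers
d_euc <- sqrt(sum((mu1 - mu2)^2))                        # about 0.566

# stats::mahalanobis() returns the squared distance, so take the root
d_mah <- sqrt(mahalanobis(mu1, center = mu2, cov = S))   # about 5.66
}
Because the Mahalanobis distance scales differences by the inverse
covariance, it is much larger than the Euclidean distance for tightly
concentrated clusters, so \code{separation} values are not comparable
between the two separation types.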

The generation method is similar to the one suggested by Jain and Dubes
(1988).
}
\examples{

# create data stream with three clusters in 3-dimensional data space
stream1 <- DSD_Gaussians(k=3, d=3)
plot(stream1)


# create data stream with specified cluster positions,
# 20\% noise in a given bounding box and
# with different densities (1 to 9 between the two clusters)
stream2 <- DSD_Gaussians(k=2, d=2,
    mu=rbind(c(-.5,-.5), c(.5,.5)),
    noise=0.2, noise_range=rbind(c(-1,1),c(-1,1)),
    p=c(.1,.9))
plot(stream2)

# create 2 clusters and 2 outliers. Clusters and outliers
# are separated by Euclidean distance of 0.5 or more.
stream3 <- DSD_Gaussians(k=2, d=2,
    separation_type="Euclidean", separation=0.5,
    space_limit=c(0,1),
    outliers=2)
plot(stream3)

# create 2 clusters and 2 outliers separated by a Mahalanobis
# distance of 6 or more.
stream4 <- DSD_Gaussians(k=2, d=2,
  separation_type="Mahalanobis", separation=6,
  space_limit=c(0,25), variance_limit=2,
  outliers=2)
plot(stream4)

# spread 20 outliers over 20000 data points
stream5 <- DSD_Gaussians(k=2, d=2,
  separation_type="Mahalanobis", separation=6,
  space_limit=c(0,45), variance_limit=2,
  outliers=20, outlier_options=list(
    outlier_horizon=20000,
    outlier_virtual_variance = 0.3))
plot(stream5, n=20000)
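
# The returned object is a list of generator parameters (see the Value
# section), so generated settings such as the cluster centers and the
# cluster selection probabilities can be inspected directly.
stream6 <- DSD_Gaussians(k=3, d=2)
stream6$mu   # cluster centers
stream6$p    # cluster selection probabilities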

}
\references{
Jain, A.K. and Dubes, R.C. (1988) Algorithms for Clustering Data.
Prentice-Hall, Inc., Upper Saddle River, NJ, USA.
}
\seealso{
\code{\link{DSD}}
}
\author{
Michael Hahsler, Dalibor Krleža
}
