\name{makedata}
\alias{makedata}

\title{
Synthetic data generation for the basic unit-level SAE model (incl. outlier contamination)
}
\description{
This function generates synthetic data with area-level variation. It was written to test and compare estimation methods. Optionally, the distributions of the model errors and/or the area-level random effects can be contaminated with outliers (see Details, below).
}

\usage{
makedata(seed=1024, intercept=1, beta=1, n=4, g=20, areaID=NULL,
         ve=1, ve.contam=41, ve.epsilon=0, vu=1, vu.contam=41,
         vu.epsilon=0)
}

\arguments{
  \item{seed}{
integer, the seed passed to \code{set.seed} (default: \code{seed=1024})
}
  \item{intercept}{
either a scalar as intercept of the fixed-effects model or \code{NULL} (default: \code{intercept=1})
}
  \item{beta}{
scalar or vector defining the fixed-effect coefficients (default: \code{beta=1}). For each given coefficient, a vector of realizations is drawn from the standard normal distribution.
}
  \item{n}{
integer, defining the number of units per area in balanced-data setups (default: \code{n=4})
}
  \item{g}{
integer, defining the number of areas (default: \code{g=20})
}
  \item{areaID}{
vector of area identifiers or \code{NULL} (default: \code{areaID=NULL}). To generate synthetic unbalanced data, call \code{makedata} with a vector whose elements are (integer-valued) area IDs. The number of areas is set equal to the number of unique IDs; see the \code{rsae} vignette for more details.
}
  \item{ve}{
scalar, defining the model (residual) variance
}
  \item{ve.contam}{
scalar, defining the model variance of the outlier part in a mixture distribution (Tukey-Huber-type contamination model): \code{e = (1-h)*N(0, ve) + h*N(0, ve.contam)}
}
  \item{ve.epsilon}{
scalar, defining the relative number of outliers (i.e., epsilon or h in the contamination mixture distribution). Typically, it takes values between 0 and 0.5 (but it is not restricted to this interval)
}
  \item{vu}{
scalar, defining the (area-level) random-effect variance
}
  \item{vu.contam}{
scalar, defining the (area-level) random-effect variance of the outlier part in the contamination mixture distribution (cf., \code{ve.contam})
}
  \item{vu.epsilon}{
scalar, defining the relative number of outliers in the contamination mixture distribution of the (area-level) random effects (cf., \code{ve.epsilon})
}

}
\details{
The function \code{makedata} generates synthetic datasets that may be used to study the behavior of different estimation methods. Let \eqn{y_i} denote an area-specific \eqn{n_i}-vector of the response variable for the areas \eqn{i=1,\ldots,g}. Define an \eqn{(n_i \times p)}-matrix \eqn{X_i} of realizations from the standard normal distribution, \eqn{N(0,1)}, and let \eqn{\beta} denote a \eqn{p}-vector of regression coefficients. The \eqn{y_i} are then drawn from the law \eqn{y_i \sim N(X_i\beta, v_e I_i + v_u J_i)}, where \eqn{v_e} and \eqn{v_u} are the variances of the model error and the random effect, respectively, and \eqn{I_i} and \eqn{J_i} denote the identity matrix and the matrix of ones, respectively.

In addition, the distributions of the model error and the area-level random effect may be contaminated (cf. Stahel and Welsh, 1997). Specifically, the laws of the model errors \eqn{e_{i,j}} and the random effects \eqn{u_i} are replaced by the Tukey-Huber contamination mixture:

   \itemize{
      \item \eqn{e_{i,j} \sim (1-\epsilon^{ve})N(0,v_e) + \epsilon^{ve}N(0, v_e^{\epsilon})},
      \item \eqn{u_{i} \sim (1-\epsilon^{vu})N(0,v_u) + \epsilon^{vu}N(0, v_u^{\epsilon})},
   }

where \eqn{\epsilon^{ve}} and \eqn{\epsilon^{vu}} regulate the degree of contamination; \eqn{v_e^{\epsilon}} and \eqn{v_u^{\epsilon}} define the variances of the contamination parts of the two mixture distributions.
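
As a worked illustration (assuming \code{ve.epsilon=0.1}, which is not the default, together with the defaults \code{ve=1} and \code{ve.contam=41}), the marginal variance of a zero-mean mixture is the weighted average of the component variances:

\deqn{(1-\epsilon^{ve}) v_e + \epsilon^{ve} v_e^{\epsilon} = 0.9 \cdot 1 + 0.1 \cdot 41 = 5,}

so in this setting a 10\% share of outliers inflates the model-error variance fivefold.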


Four different contamination setups are possible: 

   \itemize{
      \item no contamination (i.e., \code{ve.epsilon=vu.epsilon=0}),
      \item contaminated model error (i.e., \code{ve.epsilon!=0} and \code{vu.epsilon=0}), 
      \item contaminated random effect (i.e., \code{ve.epsilon=0} and \code{vu.epsilon!=0}), 
      \item both are contaminated (i.e., \code{ve.epsilon!=0} and \code{vu.epsilon!=0}).
   }
}
\value{
An instance of the class \code{saemodel}.
}
\references{
Stahel, W.A. and A. Welsh (1997): Approaches to robust estimation in the simplest variance components model, \emph{Journal of Statistical Planning and Inference} 57, pp. 295-319.
}
\author{
Tobias Schoch
}
\examples{
#generate synthetic data
mymodel <- makedata()
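
# The calls below sketch the contaminated and unbalanced setups described in
# Details. The chosen values (0.1, 0.05, and the areaID vector) are
# illustrative assumptions, not recommended settings.

# contaminated model errors: a share of 0.1 stems from N(0, ve.contam)
mymodel.e <- makedata(ve.epsilon=0.1)

# contamination in both the model errors and the area-level random effects
mymodel.eu <- makedata(ve.epsilon=0.1, vu.epsilon=0.05)

# unbalanced data: five areas with 2, 3, 4, 5, and 6 units, respectively
# (assumes areaID accepts a plain integer vector, as stated under Arguments)
mymodel.unbal <- makedata(areaID=rep(1:5, times=2:6))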
}
 
