% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/lmvar.R
\name{lmvar}
\alias{lmvar}
\title{Linear regression with non-constant variances}
\usage{
lmvar(y, X_mu = NULL, X_sigma = NULL, intercept_mu = TRUE,
  intercept_sigma = TRUE, sigma_min = 0, slvr_options = list(method =
  "NR"), control = list(), ...)
}
\arguments{
\item{y}{Vector of observations}

\item{X_mu}{Model matrix for the expected values \eqn{\mu}}

\item{X_sigma}{Model matrix for the logarithms of the standard deviations \eqn{\sigma}}

\item{intercept_mu}{Boolean, if TRUE a column with the name '(Intercept)' is added to the matrix \code{X_mu}. This column
represents the intercept term in the model for \eqn{\mu}.}

\item{intercept_sigma}{Boolean, if TRUE a column with the name '(Intercept_s)' is added to the matrix \code{X_sigma}. This column
represents the intercept term in the model for \eqn{\log \sigma}.}

\item{sigma_min}{Numeric, the minimum value for the standard deviations sigma. Can be a single number which applies to all observations or
a vector containing a separate minimum for each observation.}

\item{slvr_options}{A list with options to be passed on to the function
\code{\link[maxLik]{maxLik}} which maximizes the log-liklihood}

\item{control}{A list with further options. The following options can be set.
\itemize{
\item \code{check_hessian} Boolean, if TRUE it is checked whether the Hessian is negative-definite, i.e., whether the
log-likelihood is at a maximum. The default value is TRUE.
\item \code{slvr_log} Boolean, if TRUE the output of \code{maxLik} is added to the 'lmvar' object.
The default value is FALSE.
\item \code{monitor} Boolean, if TRUE diagnostic messages about errors that occur will be
printed during the run. The default value is FALSE.
\item \code{remove_df_sigma_pre} Warning: this is an experimental option which could cause
unexpected issues! Boolean, if TRUE the algorithm removes degrees of freedom of
the model for \eqn{\sigma} to avoid convergence problems. They are removed before carrying out
the fit. See 'Details'. The default value is FALSE.
\item \code{remove_df_sigma_post} Boolean, if TRUE the algorithm will remove degrees of
freedom of the model for \eqn{\sigma} if this is necessary to achieve
a satisfactory fit. They are removed after a fit has been attempted and turned out to fail.
This option has no effect if \code{sigma_min} (or one of its elements) is larger than zero.
See 'Details'. The default value is FALSE.
\item \code{mu_full_rank} Boolean, if TRUE it is assumed that \code{X_mu} (with the intercept term added in
case \code{intercept_mu = TRUE}) is full-rank. The default value is FALSE.
\item \code{sigma_full_rank} Boolean, if TRUE it is assumed that \code{X_sigma} (with the intercept term added in
case \code{intercept_sigma = TRUE}) is full-rank. The default value is FALSE.
}}

\item{...}{Additional arguments, not used in the current implementation}
}
\value{
An object of class 'lmvar', which is a list. Users are discouraged to access list-members directly.
Instead, list-members are to be accessed with the various accessor and utility functions in the package.
Exceptions are the following list members for which no accessor functions exist:
\itemize{
\item \code{y} the vector of observations
\item \code{X_mu} the  model matrix for \eqn{\mu}. In general, it differs from the user-supplied \code{X_mu} because
\code{lmvar} adds an intercept-column (unless \code{intercept_mu} is \code{FALSE}) and makes the matrix full-rank.
\item \code{X_sigma} the  model matrix for \eqn{\log \sigma}. In general, it differs from the user-supplied
\code{X_sigma} because \code{lmvar} adds an intercept-column (unless \code{intercept_sigma} is \code{FALSE}) and makes
the matrix full-rank.
\item \code{intercept_mu} boolean which tells whether or not an intercept column \code{(Intercept)} has been added to the
model matrix \eqn{X_\mu}
\item \code{intercept_sigma} boolean which tells whether or not an intercept column \code{(Intercept_s)} has been added to the
model matrix \eqn{X_\sigma}
\item \code{sigma_min} the value of the argument \code{sigma_min} in the call of \code{lmvar}
\item \code{slvr_options} the value of the argument \code{slvr_options} in the call of \code{lmvar}
\item \code{control} the value of the argument \code{control} in the call of \code{lmvar}
\item \code{slvr_log} the output of \code{maxLik} (the solver routine used to maximize the likelihood). Included only
if the argument \code{slvr_log} has the value \code{TRUE}.
See \code{\link[maxLik]{maxLik}} for details about this output.
}
}
\description{
Performs a Gaussian regression with non-constant variances and outputs an 'lmvar' object.
}
\details{
\subsection{Response vector}{
The vector \code{y} must be a numeric vector. It can not contain special values like \code{NULL}, \code{NaN}, etc.
}
\subsection{Model matrices}{
If the matrix \code{X_mu} is not specified, the model for the expected values \eqn{\mu} will consist of an intercept term only.
The argument \code{intercept_mu} is ignored in this case. Likewise, the model for \eqn{\log \sigma} will consist of an
intercept term only if \code{X_sigma} is not specified.
In the latter case, the model reduces to a standard linear model.

Both model matrices must be numeric matrices. They can not contain special values like \code{NULL}, \code{NaN}, etc.

The model matrices can be of class \code{\link{matrix}} or of a class from the package \code{\link[Matrix]{Matrix}}.
They can also be of class \code{\link{numeric}} in case they are matrices with one column only.

All columns in the matrix \code{X_mu} must either have a name, or no column has a name at all. It is not allowed that some
colums have a
name while others don't. The same is true for \code{X_sigma}.

If supplied, the column names
must be unique within \code{X_mu}. The same is true for \code{X_sigma}. A column name can be used in both \code{X_mu} and
\code{X_sigma} though.

In case the matrix \code{X_mu} has no columns with a column name, \code{lmvar}  gives the names \code{v1},
\code{v2} etc. to the columns. Likewise, if the matrix \code{X_sigma} has no columns with a column name, \code{lmvar}
gives the names \code{v1_s}, \code{v2_s} etc. to the columns.

Matrix \code{X_mu} can not have a column with the name '(Intercept)'.
Matrix \code{X_sigma} can not have a column with the name '(Intercept_s)'. Both names are reserved names.
}

\subsection{Minimum sigma}{
The argument \code{sigma_min} allows one to run a regression under the constraint of a minimum standard deviation for each
observation. The argument can be a single number, which applies to all observations, or a vector which contains
a separate
minimum for each observation. In the latter case, all vector elements must be zero
(in which case no constraint is applied) or all vector elements must be larger than zero.

The option of a minimum sigma is intended for situations in which an unconstrained regression attempts to make
sigma equal to zero for one or more observations. This will cause extreme values for the \eqn{\beta_\sigma} and
numerical instabilities.Such a situation can be remedied by bounding sigma from below.

In case \code{sigma_min} has a value (or a vector of values) larger than zero and the solve-method is "NR", the
solve method is set to "BFGS". If the argument \code{constraints} is passed on to \code{maxlik} (as a list member of
\code{slvr_options}), it is ignored.

Error estimates and confidence intervals (e.g. such as given by \code{\link{summary.lmvar}} and
\code{\link{predict.lmvar}}) can be unreliable
if minimum sigmas are specified.
}

\subsection{Likelihood maximization}{
The function \code{\link[maxLik]{maxLik}} from the \code{maxLik} package, is used to maximize the (profile) log-likelihood.
\code{maxLik} returns a
termination code which reports whether a maximum was found, see \code{maxLik}.
For the method "NR", a potential problem is reported by a \code{maxLik} return code different from \code{1}, \code{2} or \code{8}.
For other methods, a code different from \code{0} flags a potential problem.
In case the return code flags a potential problem, the message from \code{maxLik} is output as a warning.

All list elements in \code{slvr_options} are passed on as arguments to \code{maxLik}. The name of the list element is
the argument name, the value of the list element is the argument value.
It is not allowed to pass on the arguments \code{fn}, \code{grad} or \code{hess}. In case the list does not contain an element
\code{method}, it is set to "NR". In case the list does not contain an element \code{start}, an initial estimate for
\eqn{\beta_\sigma} is set by \code{lmvar}.

In case one wants to supply an initial estimate for the coefficients, one has to supply an initial estimate
for \eqn{\beta_\sigma}. If \code{beta_sigma_initial} is the initial estimate, one passes on the argument
\code{slvr_options = list(start = beta_sigma_initial)}.
The inital estimate \code{beta_sigma_initial} must be a numeric vector. Its length must be as follows.
\itemize{
\item In case \code{X_sigma} is not specified or has the value \code{NULL}, the initial estimate must be
a single value.
\item In case \code{X_sigma} is specified and \code{intercept_sigma = TRUE}: the length must be equal to the number
of columns of \code{X_sigma} plus one. The first element of the vector is the initial estimate of the
intercept term for \eqn{\log \sigma}, the second element is the inital estimate corresponding to the first
column in \code{X_sigma}, the third element is the inital estimate corresponding to the second
column in \code{X_sigma}, etc.
\item  In case \code{X_sigma} is specified and \code{intercept_sigma = FALSE}: the length must be equal to the
number of columns of \code{X_sigma}. The first element of the vector is the initial estimate corresponding to the
first column in \code{X_sigma}, the second element is the inital estimate corresponding to the second
column in \code{X_sigma}, etc.
}

There is no need to supply an inital estimate for \eqn{\beta_\mu}, see the vignette 'Math' for details.
}
\subsection{Reducing the degrees of freedom to improve fit}{
When \code{maxLik} exits with return code 3 (and corresponding warning 'Last step could not find a value
above the current. Boundary of parameter space?'), it
somehow did not succeed to fit an 'lmvar' model properly. The same is true if the the Hessian if the log-likelihood is not
negative-definite.

In this situation, a proper fit can sometimes be achieved if one drops a few extra columns (sometimes just one column) from
\code{X_sigma}. See the vignette 'Math' for details. The options
\code{control = list(remove_df_sigma_pre = TRUE, remove_df_sigma_post = TRUE)}) do just that.
They attempt to achieve a proper fit by dropping
columns (i.e., reducing the degrees of freedom of the model for \eqn{\sigma}) if necessary.

The option \code{remove_df_sigma_pre} inspects the model matrices and the response vector before
carrying out the fit, and drops columns from \code{X_sigma} if necessary. Warning: this is an
experimental option which could cause unexpected issues!

The option \code{remove_df_sigma_post = TRUE} attempts to achieve a proper fit in the following two cases.
\itemize{
\item \code{maxLik} uses the solve-method
"NR" (the default method) or "BFGSR" and exits with return code 3. Note that this not the case when \code{sigma_min} (or one
of its elements) has been set to a value larger than zero because then the method "BFGS" is used.
\item The option \code{check_hessian} is \code{TRUE} and the Hessian of the log-likelihood is not negative-definite.
}
I.e., this option drops columns from \code{X_sigma} based on the results of a failed fit.
}
\subsection{Other}{
When \code{check_hessian = TRUE}, it is checked whether the fitted log-likelihood is at a maximum. A warning will be issued if
that is not the case.

The control options \code{mu_full_rank} and \code{sigma_full_rank} are for efficiency purposes. If set to TRUE, the
corresponding model matrices will not be made full-rank because it is assumed they are full-rank already. However,
setting one of these to TRUE while the corresponding model matrix is not full-rank will cause unpredictable and
possibly unnoticed errors. These options should therefore only be changed from their default value with the
utmost care.

See the vignettes
that come with the \code{lmvar} package for more info. Run \code{vignette(package="lmvar")} to list the available vignettes.
}
}
\examples{
# As example we use the dataset 'attenu' from the library 'datasets'. The dataset contains
# the response variable 'accel' and two explanatory variables 'mag'  and 'dist'.
library(datasets)

# For more info on the data, study the dataset
help("attenu")

# Create the model matrix for the expected values
X = cbind(attenu$mag, attenu$dist)
colnames(X) = c("mag", "dist")

# Create the model matrix for the standard deviations. The standard deviation
# is large for small distances and small for large distances. The use of 'dist'
# as explanatory variable makes the beta for the intercept term blow up.
# Therefore we use '1 / dist' as explanatory variable
X_s = cbind(attenu$mag, 1 / attenu$dist)
colnames(X_s) = c("mag", "dist_inv")

# Carry out the fit
fit = lmvar(attenu$accel, X, X_s)

# Inspect the results
summary(fit)

# Carry out the fit with an inital estimate for beta and ask for
# a report of the solver-routine
beta_sigma_start = c(-4, 0, 0)
fit = lmvar(attenu$accel, X, X_s,
            slvr_options = list(start = beta_sigma_start),
            control = list(slvr_log = TRUE))
fit$slvr_log
}
\seealso{
\code{\link{lmvar_no_fit}} to create an object which is like an 'lmvar' object without carrying out a
model fit.
}
