% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/FormatData.R
\name{FormatData}
\alias{FormatData}
\title{Formats Data Into Correct Form}
\usage{
FormatData(
  data,
  idvar,
  timevar,
  An,
  varying,
  Cn = NA,
  GenerateHistory = FALSE,
  GenerateHistoryMax = NA
)
}
\arguments{
\item{data}{A data frame in long format containing the data to be analysed.}

\item{idvar}{A character string specifying the name of the variable that holds
each individual's identifier.}

\item{timevar}{A character string specifying the name of the time variable.
Note that time periods must be labeled as integers starting from 1
(\eqn{1,2,\ldots}).}

\item{An}{A character string specifying the name of the exposure variable.}

\item{varying}{A vector of character strings specifying the names of the time-varying
variables to be included in the analysis, namely the exposure, the time-varying
confounders and (if applicable) the time-varying outcome.
If \code{Cn} is specified, it is added to \code{varying} automatically.}

\item{Cn}{Optional character string specifying the name of the censoring indicator if present.}

\item{GenerateHistory}{A logical value (TRUE or FALSE). If set to TRUE, variables are generated
corresponding to the lagged histories of all variables included in \code{varying}.
These are labeled \code{LagVari}, where \code{Var} is the variable name and \code{i}
is the number of time periods by which the variable is lagged. For example, \code{LagAn2} is the
value of \code{An} 2 time periods prior. Note that \code{LagAn1} is not generated, as it is
automatically included in the g-estimation functions.}

\item{GenerateHistoryMax}{An optional positive integer. If specified, \code{GenerateHistory}
generates histories only up to \code{GenerateHistoryMax} time periods prior.}
}
\value{
A data frame in long format with additional rows added as necessary. If
\code{data} is already in the correct format then no additional rows will be added.
}
\description{
Takes a dataset in long format and puts it into the format required by
the g-estimation functions. Specifically, it ensures that there is a data
entry for each individual at each time period, adding empty rows where needed, and orders the
dataset by time and identifier. It can also create variables for the lagged histories of all
time-varying variables in the data.
}
\details{
Note that any variable in \code{varying} that is strictly categorical must be stored as
a factor (for example, converted with \code{as.factor()}). Binary or continuous variables
should be stored as numeric (for example, converted with \code{as.numeric()}).
}
\examples{
data<-dataexamples(n=1000,seed=3456,Censoring=TRUE)$datagest
# To demonstrate the function, delete the third row,
# which corresponds to the entry for ID 1 at time 3.
data<-data[-3,]
datanew<-FormatData(data=data,idvar="id",timevar="time",An="A",
varying=c("A","L"),GenerateHistory=FALSE,GenerateHistoryMax=NA)
head(datanew)
# Note that the missing entry has been re-added,
# with missing values for A and L in the third row.
# An example with lagged histories of the time-varying variables created.
data<-dataexamples(n=1000,seed=3456,Censoring=TRUE)$datagestmultcat
datanew<-FormatData(data=data,idvar="id",timevar="time",An="A",
Cn="C",varying=c("A","L"),GenerateHistory=TRUE,GenerateHistoryMax=NA)
head(datanew)
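# A minimal base-R sketch of the two ideas described above, using a
# hypothetical toy data frame. This is an illustration only and is not
# the code that FormatData itself runs; the names toy, grid and toyfull
# are assumptions made for this example.
toy <- data.frame(id = c(1, 1, 2, 2, 2),
                  time = c(1, 3, 1, 2, 3),
                  A = c(0, 1, 1, 0, 1))
# Build every id-by-time combination, then left-join the observed rows,
# so that the missing entry (id 1, time 2) appears with A set to NA.
grid <- expand.grid(id = unique(toy$id), time = seq_len(max(toy$time)))
toyfull <- merge(grid, toy, by = c("id", "time"), all.x = TRUE)
# Ordered here by identifier and then time (an assumption for this sketch).
toyfull <- toyfull[order(toyfull$id, toyfull$time), ]
# Lag A by 2 time periods within each individual, in the spirit of LagA2.
toyfull$LagA2 <- ave(toyfull$A, toyfull$id,
                     FUN = function(x) c(NA, NA, head(x, -2)))
toyfull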
}
