% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cata_code.R
\name{cata_code}
\alias{cata_code}
\title{Code check-all-that-apply responses into a single variable}
\usage{
cata_code(
  data,
  id,
  categ,
  resp,
  approach,
  endorse = 1,
  time = NULL,
  priority = NULL,
  new.name = "Variable",
  multi.name = "Multiple",
  sep = "-"
)
}
\arguments{
\item{data}{A data frame with one row for each \code{id} (by \code{time}, if specified) by category combination.
If \code{data} are currently in "wide" format where each response category is its own column,
use \code{\link[=cata_prep]{cata_prep()}} first to transform \code{data} into the proper format. See \emph{Examples}.}

\item{id}{The column in \code{data} to uniquely identify each participant.}

\item{categ}{Column in \code{data} indicating the check-all-that-apply category labels.}

\item{resp}{Column in \code{data} indicating the check-all-that-apply responses.}

\item{approach}{One of "all", "counts", "multiple", "priority", or "mode". See \emph{Details}.}

\item{endorse}{The value in \code{resp} indicating endorsement of the category in \code{categ}. This must be the same for all categories.
Common values are 1 (default), "yes", TRUE, or 2 (for SPSS data).}

\item{time}{The column in \code{data} for the time variable; used to reshape longitudinal data with multiple observations for each \code{id}.}

\item{priority}{Character vector of one or more categories in the \code{categ} column, given in the order in which
response categories should be prioritized when \code{approach} is "priority" or "mode".}

\item{new.name}{Character; column name for the created variable.}

\item{multi.name}{Character; value given to participants with multiple category endorsements when \code{approach \%in\% c("multiple", "priority", "mode")}.}

\item{sep}{Character; separator to use between values when \code{approach = "all"}.}
}
\value{
A \code{data.frame} whose structure depends on \code{approach}. See \emph{Details}.
}
\description{
In a cross-sectional or longitudinal context, select a set of decision rules
to combine responses to multiple categories from a check-all-that-apply
survey question into a single variable.
}
\details{
For all \code{approach} options, participants with missing data for all categories in \code{categ} are removed and not present in the output.

There are two options for \code{approach} that provide summary information rather than a single code for each \code{id}.

*\code{"all"} returns a data frame with a \code{new.name} variable listing all categories
a participant endorsed, separated by \code{sep}. The \code{time} argument is ignored when \code{approach = "all"}. Rather,
if \code{data} includes a column for time, then output includes a row for each \code{id} at each time point.
This approach is a useful exploratory first step for identifying all of the response patterns present in the data.

*\code{"counts"} is only relevant for longitudinal data and returns a data frame with the number of times each \code{id} endorsed
a category. Only categories with >= 1 endorsement are included for a particular \code{id}. As with \code{"all"}, the \code{time} argument
is ignored; instead, \code{data} are assumed to be in long format with a row for each \code{id} by \code{time} combination. If not,
the column of counts will be 1 for all rows.

The three remaining options for \code{approach} produce a single code for each \code{id}.
The output is a data frame with one row for each \code{id}. The choice of approach
only matters for participants who selected more than one category;
participants who selected only one category are given that code in the output
regardless of which approach is chosen.

*\code{"multiple"} If a participant endorsed multiple categories within or across time, they are coded as \code{multi.name}.

*\code{"priority"} Same as \code{"multiple"} unless the participant endorsed a category listed in the \code{priority} argument at any point.
If so, the participant is coded as the first endorsed category in the order specified in \code{priority}.

*\code{"mode"} Participant is coded as the category with the mode (i.e., most common) endorsement across all time points.
Ties are coded as the value given in \code{multi.name}. If the \code{priority} argument is specified, these categories are prioritized
first, followed by the mode response. The \code{"mode"} approach is only relevant if \code{time} is specified.
When \code{time = NULL} it operates as \code{"priority"} (when specified) or \code{"multiple"}.
}
\examples{
# prepare data
data(sources_race)
sources_long <- cata_prep(data = sources_race, id = ID, cols = Black:White, time = Wave)
  
# Identify all unique response patterns
all <- cata_code(sources_long, id = ID, categ = Category, resp = Response,
approach = "all", time = Wave, new.name = "Race_Ethnicity")
unique(all$Race_Ethnicity)

\donttest{  
# Coding endorsement of multiple categories as "Multiple"
multiple <- cata_code(sources_long, id = ID, categ = Category, resp = Response,
approach = "multiple", time = Wave, new.name = "Race_Ethnicity")

# Prioritizing "Native_American" and "Pacific_Islander" endorsements
# If participant endorsed both, they are coded as "Native_American" because it is listed first
# in the priority argument.
priority <- cata_code(sources_long, id = ID, categ = Category, resp = Response,
approach = "priority", time = Wave, new.name = "Race_Ethnicity",
priority = c("Native_American", "Pacific_Islander"))

# Code as category with the most endorsements. In the case of ties, code as "Multiple"
mode <- cata_code(sources_long, id = ID, categ = Category, resp = Response,
approach = "mode", time = Wave, new.name = "Race_Ethnicity")

# Compare frequencies across coding schemes
table(multiple$Race_Ethnicity)
table(priority$Race_Ethnicity)
table(mode$Race_Ethnicity)
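
# Illustrative sketch only (not the package internals): the "multiple" and
# "priority" decision rules described in Details, applied in base R to a small
# hypothetical data frame. The toy data and the helper name code_one() are
# assumptions made for this illustration.
toy <- data.frame(
  ID = c(1, 1, 2, 2, 3, 3),
  Category = c("A", "B", "A", "B", "A", "B"),
  Response = c(1, 1, 1, 0, 0, 1)
)
# Keep only endorsed categories (Response equal to the endorse value, here 1)
endorsed <- toy[toy$Response == 1, ]

code_one <- function(cats, priority = NULL, multi.name = "Multiple") {
  cats <- unique(cats)
  # Prioritized categories that were endorsed, kept in priority order
  hit <- priority[is.element(priority, cats)]
  if (length(hit) > 0) return(hit[1])
  # Otherwise: more than one endorsed category is coded as multi.name
  if (length(cats) > 1) multi.name else cats
}

# "multiple" rule: one code per ID
toy_multiple <- sapply(split(endorsed$Category, endorsed$ID), code_one)
# "priority" rule: an endorsed category listed in priority takes precedence
toy_priority <- sapply(split(endorsed$Category, endorsed$ID), code_one, priority = "B")
toy_multiple
toy_priority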
}

}
