% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/correctHeap.R
\name{correctHeaps}
\alias{correctHeaps}
\alias{correctHeaps2}
\title{Correct Age Heaping}
\usage{
correctHeaps(
  x,
  heaps = "10year",
  method = "lnorm",
  start = 0,
  fixed = NULL,
  model = NULL,
  dataModel = NULL,
  seed = NULL,
  na.action = "omit",
  verbose = FALSE,
  sd = NULL
)

correctHeaps2(
  x,
  heaps = "10year",
  method = "lnorm",
  start = 0,
  fixed = NULL,
  model = NULL,
  dataModel = NULL,
  seed = NULL,
  na.action = "omit",
  verbose = FALSE,
  sd = NULL
)
}
\arguments{
\item{x}{numeric vector of ages (typically integers).}

\item{heaps}{character string specifying the heaping pattern:
\describe{
  \item{\code{"5year"}}{heaps are assumed every 5 years (0, 5, 10, 15, ...)}
  \item{\code{"10year"}}{heaps are assumed every 10 years (0, 10, 20, ...)}
}
Alternatively, a numeric vector specifying custom heap positions.}

\item{method}{character string specifying the distribution used for correction:
\describe{
  \item{\code{"lnorm"}}{truncated log-normal distribution (default).
    Parameters are estimated from the input data.}
  \item{\code{"norm"}}{truncated normal distribution.
    Parameters are estimated from the input data.}
  \item{\code{"unif"}}{uniform distribution within the truncation bounds.}
  \item{\code{"kernel"}}{kernel density estimation for nonparametric sampling.}
}}

\item{start}{numeric value for the starting point of the heap sequence
(default 0). Use 5 if heaps occur at 5, 15, 25, ... instead of 0, 10, 20, ...
Ignored if \code{heaps} is a numeric vector.}

\item{fixed}{numeric vector of indices indicating observations that should
not be changed. Useful for preserving known accurate values.}

\item{model}{optional formula for model-based correction. When provided,
a random forest model is fit to predict age from other variables, and
the correction direction is adjusted to be consistent with this prediction.
Requires packages \pkg{ranger} and \pkg{VIM}.}

\item{dataModel}{data frame containing variables for the model formula.
Required when \code{model} is specified. Missing values are imputed
using k-nearest neighbors via \code{\link[VIM]{kNN}}.}

\item{seed}{optional integer for random seed to ensure reproducibility.
If \code{NULL} (default), no seed is set.}

\item{na.action}{character string specifying how to handle \code{NA} values:
\describe{
  \item{\code{"omit"}}{remove \code{NA} values before processing, then restore their positions (default)}
  \item{\code{"fail"}}{stop with an error if \code{NA} values are present}
}}

\item{verbose}{logical. If \code{TRUE}, return a list with corrected values
and diagnostic information. If \code{FALSE} (default), return only the
corrected vector.}

\item{sd}{optional numeric value for standard deviation when \code{method = "norm"}.
If \code{NULL} (default), estimated from the data using MAD (median absolute deviation)
of non-heap ages, which is robust to the heaping.}
}
\value{
If \code{verbose = FALSE}, a numeric vector of the same length as
  \code{x} with heaping corrected. If \code{verbose = TRUE}, a list with:
  \describe{
    \item{corrected}{the corrected numeric vector}
    \item{n_changed}{total number of values changed}
    \item{changes_by_heap}{named vector of changes per heap age}
    \item{ratios}{named vector of heaping ratios per heap age}
    \item{method}{method used}
    \item{seed}{seed used (if any)}
  }
}
\description{
Age heaping can cause substantial bias in important demographic measures
and thus should be corrected. This function corrects heaping at regular
intervals (every 5 or 10 years) or at user-supplied heap ages by replacing a
proportion of heaped observations with draws from fitted truncated distributions.
}
\details{
Correct for age heaping at regular intervals using truncated distributions.


For method \dQuote{lnorm}, a log-normal distribution is fit to the whole
age distribution. Then, for each age heap (at 0, 5, 10, 15, ... or
0, 10, 20, ...), replacement values are drawn from this distribution
truncated to lower and upper bounds around the heap.

The correction range depends on the heap type:
\itemize{
  \item For 5-year heaps: values are drawn from \eqn{\pm 2} years around the heap
  \item For 10-year heaps: values are drawn in two groups, \eqn{\pm 4} and
    \eqn{\pm 5} years around the heap
}

The ratio of observations to replace is calculated by comparing the count
at each heap age with the arithmetic mean of the counts at the two
neighboring ages. For example, for age heap 5, the ratio is
\code{count(age = 5) / mean(count(age = 4), count(age = 6))}.

Method \dQuote{norm} uses truncated normal distributions instead. The choice
between \dQuote{lnorm} and \dQuote{norm} depends on whether the age
distribution is right-skewed (use \dQuote{lnorm}) or more symmetric
(use \dQuote{norm}). Many distributions with heaping problems are right-skewed.

Method \dQuote{unif} draws from truncated uniform distributions around the
age heaps, providing a simpler baseline approach.

Method \dQuote{kernel} uses kernel density estimation to sample replacement
values, providing a nonparametric alternative that adapts to the local
data distribution.

Repeated calls to this function mimic multiple imputation: repeating the
procedure \eqn{m} times yields \eqn{m} corrected data sets that reflect
the uncertainty introduced by the correction. Use the \code{seed} argument,
with a different value for each call, to make every replicate reproducible.
}
\examples{
# Create artificial age data with log-normal distribution
set.seed(123)
age <- rlnorm(10000, meanlog = 2.466869, sdlog = 1.652772)
age <- round(age[age < 93])

# Artificially introduce 5-year heaping
year5 <- seq(0, max(age), 5)
age5 <- sample(c(age, age[age \%in\% year5]))

# Correct with reproducible results
age5_corrected <- correctHeaps(age5, heaps = "5year", method = "lnorm", seed = 42)
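
# Illustrative sketch only (not the package internals): one way to draw from
# a log-normal distribution truncated to [lower, upper] is the inverse-CDF
# method. The helper rtrunc_lnorm and the object names below are hypothetical.
# The window of +/- 2 years around heap age 10 follows the Details section.
rtrunc_lnorm <- function(n, lower, upper, meanlog, sdlog) {
  # Sample probabilities between the CDF values of the two bounds
  p <- runif(n, plnorm(lower, meanlog, sdlog), plnorm(upper, meanlog, sdlog))
  # Map the probabilities back to ages inside [lower, upper]
  qlnorm(p, meanlog, sdlog)
}
pos_age <- age5[age5 > 0]
draws <- rtrunc_lnorm(5, lower = 8, upper = 12,
                      meanlog = mean(log(pos_age)), sdlog = sd(log(pos_age)))
round(draws)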

# Get diagnostic information
result <- correctHeaps(age5, heaps = "5year", verbose = TRUE, seed = 42)
print(result$n_changed)
print(result$ratios)
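
# Sketch: recompute the heaping ratio for heap age 5 by hand, following the
# formula given in Details (count at the heap divided by the mean count of
# the two neighboring ages). The name ratio5 is hypothetical, and the value
# need not match result$ratios exactly if the internal computation differs.
ratio5 <- sum(age5 == 5) / mean(c(sum(age5 == 4), sum(age5 == 6)))
ratio5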

# Use kernel method for nonparametric correction
age5_kernel <- correctHeaps(age5, heaps = "5year", method = "kernel", seed = 42)
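
# Hedged sketch: protect selected observations and handle missing values.
# keep_idx, age5_fixed, age5_na and age5_na_corrected are hypothetical names.
keep_idx <- 1:100
age5_fixed <- correctHeaps(age5, heaps = "5year", fixed = keep_idx, seed = 42)
# According to the description of the fixed argument, these should be unchanged
all(age5_fixed[keep_idx] == age5[keep_idx])

age5_na <- age5
age5_na[c(3, 50)] <- NA
age5_na_corrected <- correctHeaps(age5_na, heaps = "5year",
                                  na.action = "omit", seed = 42)
# With the omit option, missing values should reappear at their positions
which(is.na(age5_na_corrected))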

# Custom heap positions (e.g., heaping at 12, 18, 21)
custom_heaps <- c(12, 18, 21)
age_custom <- correctHeaps(age5, heaps = custom_heaps, method = "lnorm", seed = 42)
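
# Sketch of the multiple-imputation idea from Details: repeat the correction
# m times, each with its own seed, to obtain m corrected data sets.
# The names m and imputations are hypothetical and used only for illustration.
m <- 5
imputations <- lapply(seq_len(m), function(i) {
  correctHeaps(age5, heaps = "5year", method = "lnorm", seed = i)
})
# Spread of the mean age across the m corrected data sets
sapply(imputations, mean)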
}
\references{
Templ, M. (2026). Correction of heaping on individual level.
\emph{Journal TBD}.

Templ, M., Meindl, B., Kowarik, A., Alfons, A., Dupriez, O. (2017).
Simulation of Synthetic Populations for Survey Data Considering Auxiliary
Information. \emph{Journal of Statistical Software}, \strong{79}(10), 1-38.
\doi{10.18637/jss.v079.i10}
}
\seealso{
\code{\link{correctSingleHeap}} for correcting a single specific heap.

Other heaping correction: 
\code{\link{correctSingleHeap}()}
}
\author{
Matthias Templ, Bernhard Meindl
}
\concept{heaping correction}
