Help for package NMAR

Type:

Package

Title:

Estimation under not Missing at Random Nonresponse

Version:

0.1.2

Description:

Methods to estimate finite-population parameters under nonresponse that is not missing at random (NMAR, nonignorable). Incorporates auxiliary information and user-specified response models, and supports independent samples and complex survey designs via objects from the 'survey' package. Provides diagnostics and optional variance estimates. For methodological background see Qin, Leung and Shao (2002) <doi:10.1198/016214502753479338> and Riddles, Kim and Im (2016) <doi:10.1093/jssam/smv047>.

License:

MIT + file LICENSE

URL:

https://github.com/ncn-foreigners/NMAR, https://ncn-foreigners.ue.poznan.pl/NMAR/index.html

BugReports:

https://github.com/ncn-foreigners/NMAR/issues

Encoding:

UTF-8

Imports:

stats, nleqslv, utils, generics, Formula

RoxygenNote:

7.3.3

Suggests:

knitr, rmarkdown, testthat (≥ 3.0.0), numDeriv, survey, svrep, broom, progressr, future, future.apply, spelling

VignetteBuilder:

knitr

Config/testthat/edition:

Depends:

R (≥ 3.5)

LazyData:

true

Language:

en-US

NeedsCompilation:

Packaged:

2026-02-04 15:05:13 UTC; runner

Author:

Maciej Beresewicz

[aut, cre], Igor Kołodziej [aut, ctb], Mateusz Iwaniuk [aut, ctb]

Maintainer:

Maciej Beresewicz <maciej.beresewicz@ue.poznan.pl>

Repository:

CRAN

Date/Publication:

2026-02-05 07:40:23 UTC

Apply scaling to a matrix using a recipe

Description

Apply scaling to a matrix using a recipe

Usage

apply_nmar_scaling(matrix_to_scale, recipe)

Arguments

matrix_to_scale

A numeric matrix with column names present in recipe.

recipe

An object of class nmar_scaling_recipe.

Value

A numeric matrix with each column centered and scaled using recipe.

Bootstrap variance estimation module

Description

Estimates the variance of a scalar estimator via bootstrap resampling for IID data or bootstrap replicate weights for survey designs.

Usage

bootstrap_variance(data, estimator_func, point_estimate, ...)

Arguments

data

A data.frame or a survey.design.

estimator_func

Function returning an object with a numeric scalar component y_hat and an optional logical component converged.

point_estimate

Numeric scalar; used for survey bootstrap variance (passed to survey::svrVar() as coef).

...

Additional arguments. Some are consumed by bootstrap_variance() itself (for example resample_guard for IID bootstrap or bootstrap_settings/bootstrap_options/bootstrap_type/bootstrap_mse or survey bootstrap). Remaining arguments are forwarded to estimator_func.

Details

For data.frame inputs, performs IID bootstrap by resampling rows and rerunning estimator_func on each resample, then computing the empirical variance of the replicate estimates.
For survey.design inputs, converts the design to a bootstrap replicate-weight design with svrep::as_bootstrap_design(), evaluates estimator_func on each replicate weight vector by injecting the replicate analysis weights into a copy of the input design, and passes the resulting replicate estimates and replicate scaling factors to survey::svrVar().

Bootstrap-specific options

resample_guard: IID bootstrap only. A function function(indices, data) that returns TRUE to accept a resample and FALSE to reject it.
bootstrap_settings: A list of arguments forwarded to svrep::as_bootstrap_design().
bootstrap_options: Alias for bootstrap_settings.
bootstrap_type: The type argument for svrep::as_bootstrap_design().
bootstrap_mse: The mse argument for svrep::as_bootstrap_design().

Progress Reporting

If the optional progressr package is installed, bootstrap calls indicate progress via a progressr::progressor inside progressr::with_progress(). Users control if and how progress is shown by registering handlers with progressr::handlers(). When progressr is not installed or no handlers are active, bootstrap runs silently.

Parallelization

By default, bootstrap replicate evaluation runs sequentially via base::lapply() for both IID resampling and survey replicate-weight bootstrap. If the optional future.apply package is installed, bootstrap can use future.apply::future_lapply(future.seed = TRUE) when the user has set a parallel future::plan(). The backend is controlled by the package option nmar.bootstrap_apply:

"auto": (default) Use base::lapply() unless the current future plan has more than one worker, in which case use future.apply::future_lapply() if available.
"base": Always use base::lapply(), even if future.apply is installed.
"future": Always use future.apply::future_lapply().

When future.apply is used, random-number streams are parallel-safe and backend-independent under the future framework. When base::lapply() is used, results are reproducible under set.seed() but will likely not match the future.seed streams.

Bootstrap for IID data frames

Description

Bootstrap for IID data frames

Usage

## S3 method for class 'data.frame'
bootstrap_variance(
  data,
  estimator_func,
  point_estimate,
  bootstrap_reps = 500,
  ...
)

Arguments

data

A data.frame.

estimator_func

Function returning an object with a numeric scalar component y_hat and an optional logical component converged.

point_estimate

Unused for IID bootstrap, included for signature consistency.

bootstrap_reps

integer; number of resamples.

...

Value

A list with components se, variance, and replicates.

Default dispatch

Description

Default dispatch

Usage

## Default S3 method:
bootstrap_variance(data, estimator_func, point_estimate, ...)

Bootstrap for survey designs

Description

Bootstrap for survey designs

Usage

## S3 method for class 'survey.design'
bootstrap_variance(
  data,
  estimator_func,
  point_estimate,
  bootstrap_reps = 500,
  survey_na_policy = c("strict", "omit"),
  ...
)

Arguments

data

A survey.design.

estimator_func

Function returning an object with a numeric scalar component y_hat and an optional logical component converged.

point_estimate

Numeric scalar; used for survey bootstrap variance (passed to survey::svrVar() as coef).

bootstrap_reps

integer; number of bootstrap replicates.

survey_na_policy

Character string specifying how to handle replicates that fail to produce estimates. Options:

"strict": (default) Any failed replicate causes an error. This is a conservative default that makes instability explicit.
"omit": Failed replicates are omitted. The corresponding rscales are also omitted to maintain correct variance scaling. Use with caution: if failures are non-random, variance may be biased.

...

Details

This path constructs a replicate-weight design using svrep::as_bootstrap_design() and evaluates the estimator on each set of bootstrap replicate analysis weights. Replicate evaluation starts from a shallow template copy of the input survey design (including its ids/strata/fpc structure) and injects each replicate's analysis weights by updating the design's probability slots (prob/allprob) so that weights(design) returns the desired replicate weights. This avoids replaying or reconstructing a svydesign() call and therefore supports designs created via subset() and update(). NA policy: By default, survey bootstrap uses a strict NA policy: if any replicate fails to produce a finite estimate, the entire bootstrap fails with an error. Setting survey_na_policy = "omit" drops failed replicates and proceeds with the remaining replicates.

Value

A list with components se, variance, and replicates.

Limitations

Calibrated/post-stratified designs: Post-hoc adjustments applied via survey::calibrate(), survey::postStratify(), or survey::rake() are not supported here and will cause the function to error. These adjustments are not recomputed when replicate weights are injected, so the replicate designs would not reflect the intended calibrated/post-stratified analysis.

Replicate-weight designs not supported

Description

Replicate-weight designs not supported

Usage

## S3 method for class 'svyrep.design'
bootstrap_variance(data, estimator_func, point_estimate, ...)

Default coefficients for NMAR results

Description

Returns missingness-model coefficients.

Usage

## S3 method for class 'nmar_result'
coef(object, ...)

Arguments

object

An 'nmar_result' object.

...

Ignored.

Value

A named numeric vector or 'NULL'.

Coefficient table for summary objects

Description

Returns a coefficients table (Estimate, Std. Error, statistic, p-value) from a 'summary_nmar_result*' object when missingness-model coefficients and a variance matrix are available. If the summary does not carry missingness-model coefficients, returns 'NULL'.

Usage

## S3 method for class 'summary_nmar_result'
coef(object, ...)

Arguments

object

An object of class 'summary_nmar_result' (or subclass).

...

Ignored.

Details

The statistic column is labelled "t value" when finite degrees of freedom are present in survey designs, otherwise it is labelled "z value".

Value

A data.frame with rows named by coefficient, or 'NULL' if not available.

Compute mean and standard deviation

Description

Compute mean and standard deviation

Usage

compute_weighted_stats(values, weights = NULL)

Wald confidence interval for NMAR results

Description

Wald confidence interval for NMAR results

Usage

## S3 method for class 'nmar_result'
confint(object, parm, level = 0.95, ...)

Arguments

object

An object of class 'nmar_result'.

parm

Ignored.

level

Confidence level.

...

Ignored.

Value

A 1x2 numeric matrix with confidence limits.

Confidence intervals for summary objects

Description

Returns Wald-style confidence intervals for missingness-model coefficients from a 'summary_nmar_result*' object. Uses t-quantiles when finite degrees of freedom are available, otherwise normal quantiles.

Usage

## S3 method for class 'summary_nmar_result'
confint(object, parm, level = 0.95, ...)

Arguments

object

An object of class 'summary_nmar_result' (or subclass).

parm

A specification of which coefficients are to be given confidence intervals, either a vector of names or a vector of indices. By default, all coefficients are considered.

level

The confidence level required.

...

Ignored.

Value

A numeric matrix with columns giving lower and upper confidence limits for each parameter. Row names correspond to coefficient names. Returns 'NULL' if coefficients are unavailable.

Constraint summaries for diagnostics

Description

Reports the constraint sums used in the estimating equations.

Usage

constraint_summaries(w_i_hat, W_hat, mass_untrim, X_centered)

Build a scaling recipe from one or more design matrices

Description

Build a scaling recipe from one or more design matrices

Usage

create_nmar_scaling_recipe(
  ...,
  intercept_col = "(Intercept)",
  weights = NULL,
  weight_mask = NULL,
  tol_constant = 1e-08,
  warn_on_constant = TRUE
)

Arguments

...

One or more numeric matrices with column names.

intercept_col

Name of an intercept column that should remain unscaled.

weights

Optional nonnegative numeric vector used to compute weighted means and standard deviations.

weight_mask

Optional logical mask or nonnegative numeric multipliers applied to weights before computing moments (useful for respondents-only scaling). If weights is NULL, weight_mask is treated as weights.

tol_constant

Numeric tolerance below which columns are treated as constant and left unscaled.

warn_on_constant

Logical; warn when a column is treated as constant.

Create Verbose Printer Factory

Description

Creates a verbose printing function based on trace level settings. Messages are printed only if their level is <= trace_level.

Usage

create_verboser(trace_level = 0)

Arguments

trace_level

Integer 0-3; controls verbosity detail: - 0: No output (silent mode) - 1: Major steps only (initialization, convergence) - 2: Moderate detail (iteration summaries, key diagnostics) - 3: Full detail (all diagnostics, intermediate values)

Value

A function with signature: 'verboser(msg, level = 1, type = c("info", "step", "detail", "result"))'

Empirical likelihood estimator

Description

Empirical likelihood estimator

Usage

el(data, ...)

Arguments

data

A data.frame or a survey.design.

...

el_augment_strata_aux(
  aux_design_full,
  strata_factor,
  weights_full,
  N_pop,
  auxiliary_means
)

Empirical likelihood estimating equations for SRS

Description

Returns a function that evaluates the stacked EL system for \theta = (\beta, z, \lambda_x) with z = \operatorname{logit}(W). Blocks correspond to:

missingness model score equations in \beta,
the response-rate equation in W,
auxiliary moment constraints in \lambda_x.

Usage

el_build_equation_system(
  family,
  missingness_model_matrix,
  auxiliary_matrix,
  respondent_weights,
  N_pop,
  n_resp_weighted,
  mu_x_scaled
)

Details

When no auxiliaries are present the last block is omitted. The system matches QLS equations 7-10. We cap \eta, clip w_i in ratios, and guard D_i away from zero to ensure numerical stability.

Guarding policy:

Cap \eta: eta <- pmax(pmin(eta, get_eta_cap()), -get_eta_cap()).
Compute w <- family$linkinv(eta) and clip to [1e-12, 1 - 1e-12] when used in ratios.
Denominator floor: Di <- pmax(Di_raw, nmar_get_el_denom_floor()). In the Jacobian, terms that depend on d(1/Di)/d(.) are multiplied by active = 1(Di_raw > floor) to match the clamped equations.

Empirical likelihood equations for survey designs

Description

Returns a function that evaluates the stacked EL system for survey designs using design weights. Unknowns are \theta = (\beta, z, \lambda_W, \lambda_x) with z = \operatorname{logit}(W). Blocks correspond to:

response-model score equations in \beta,
the response-rate equation in W based on \sum d_i (w_i - W)/D_i = 0,
auxiliary moment constraints \sum d_i (X_i - \mu_x)/D_i = 0,
and the design-based linkage between \lambda_W and the nonrespondent total: T_0/(1-W) - \lambda_W \sum d_i / D_i = 0, where T_0 = N_{\mathrm{pop}} - \sum d_i on the analysis scale.

Usage

el_build_equation_system_survey(
  family,
  missingness_model_matrix,
  auxiliary_matrix,
  respondent_weights,
  N_pop,
  n_resp_weighted,
  mu_x_scaled
)

Details

When all design weights are equal and N_{\mathrm{pop}} and the respondent count match the simple random sampling setup, this system reduces to the QLS equations 6-10.

Empirical likelihood analytical jacobian for srs

Description

Builds the block Jacobian A = \partial F/\partial \theta for the EL system with \theta = (\beta, z, \lambda_x) and z = \operatorname{logit}(W). Blocks follow QLS equations 7-10.

Usage

el_build_jacobian(
  family,
  missingness_model_matrix,
  auxiliary_matrix,
  respondent_weights,
  N_pop,
  n_resp_weighted,
  mu_x_scaled
)

Empirical likelihood analytical jacobian for survey designs

Description

Empirical likelihood analytical jacobian for survey designs

Usage

el_build_jacobian_survey(
  family,
  missingness_model_matrix,
  auxiliary_matrix,
  respondent_weights,
  N_pop,
  n_resp_weighted,
  mu_x_scaled
)

Usage

el_check_auxiliary_inconsistency_matrix(
  auxiliary_matrix_resp,
  provided_means = NULL
)

Arguments

auxiliary_matrix_resp

Respondent-side auxiliary design matrix.

provided_means

Computes the capped linear predictor, response probabilities, derivatives, and stable scores with respect to the linear predictor for a given family. Centralizes the numerically delicate pieces (capping, clipping, score derivatives) to be reused in EL equations and jacobian.

Usage

el_core_eta_state(family, eta_raw, eta_cap)

Arguments

family

Response family.

eta_raw

Numeric vector of unconstrained linear predictors.

eta_cap

Scalar cap applied symmetrically to eta_raw.

Value

A list with components:

eta: Capped linear predictor.
w: Mean function family$linkinv(eta).
w_clipped: w clipped to [1e-12, 1-1e-12] for use in ratios.
mu_eta: Derivative family$mu.eta(eta).
d2mu: Second derivative family$d2mu.deta2(eta) when available, otherwise NULL.
s_eta: Score with respect to eta.
ds_eta_deta: Derivative of s_eta with respect to eta when d2mu is available, otherwise NULL.

Compute denominator

Description

Compute denominator

Usage

el_denominator(lambda_W, W, Xc_lambda, p_i, floor)

Arguments

lambda_W

numeric scalar

W

numeric scalar in (0,1)

Xc_lambda

numeric vector (X_centered %*% lambda_x) or 0

p_i

numeric vector of response probabilities

floor

numeric scalar > 0, denominator floor

Value

list with denom, active, inv, inv_sq

Empirical likelihood engine for NMAR

Description

Constructs an engine specification for the empirical likelihood estimator of a full-data mean under nonignorable nonresponse.

Usage

el_engine(
  standardize = TRUE,
  trim_cap = Inf,
  on_failure = c("return", "error"),
  variance_method = c("bootstrap", "none"),
  bootstrap_reps = 500,
  auxiliary_means = NULL,
  control = list(),
  strata_augmentation = TRUE,
  n_total = NULL,
  start = NULL,
  family = c("logit", "probit")
)

Arguments

standardize

logical; standardize predictors. Default TRUE.

trim_cap

numeric; cap for EL weights (Inf = no trimming).

on_failure

character; "return" or "error" on solver failure.

variance_method

character; one of "bootstrap" or "none".

bootstrap_reps

integer; number of bootstrap replicates when variance_method = "bootstrap".

auxiliary_means

named numeric vector; population means for auxiliary design columns. Names must match the materialized model.matrix column names on the first RHS after formula expansion, e.g., factor indicator columns created by model.matrix() or transformed terms like I(X^2). Auxiliary intercepts are always dropped automatically, so do not supply (Intercept). If NULL (default) and the outcome contains at least one NA, auxiliary means are estimated from the full input (including nonrespondents): IID uses unweighted column means of the auxiliary design, while survey designs use the design-weighted means based on weights(design). This corresponds to the QLS case where \mu_x is replaced by the full-sample mean \bar X when auxiliary variables are observed for all sampled units.

control

Optional solver configuration forwarded to nleqslv::nleqslv(). Provide a single list that may include solver tolerances (e.g., xtol, ftol, maxit) and, optionally, top-level entries global and xscalm for globalization and scaling. Example: control = list(maxit = 500, xtol = 1e-10, ftol = 1e-10, global = "qline", xscalm = "auto").

strata_augmentation

logical; when TRUE (default), survey designs with an identifiable strata structure are augmented with stratum indicators and corresponding population shares in the auxiliary block (proposed by Wu 2005). Has no effect for data.frame inputs or survey designs without strata.

n_total

numeric; optional when supplying respondents-only data (no NA in the outcome). For data.frame inputs, set to the total number of sampled units before filtering to respondents. For survey.design inputs, set to the total design-weight total on the same analysis scale as weights(design) (default sum(weights(design))). If omitted and the outcome contains no NAs, the estimator errors, requesting n_total.

start

list; optional starting point for the solver. Fields:

beta: named numeric vector of missingness-model coefficients on the original scale, including (Intercept).
W or z: starting value for population response rate (0 < W < 1) or its logit (z). If both are provided, z takes precedence.
lambda: named numeric vector of auxiliary multipliers on the original scale (names must match auxiliary design columns). Values are mapped to the scaled space internally.

family

Missingness model family. Either "logit" (default) or "probit", or a custom family object: a list with components name, linkinv, mu.eta, score_eta, and optionally d2mu.deta2. When d2mu.deta2 is absent the solver uses Broyden/numeric Jacobians.

Details

The implementation follows Qin, Leung, and Shao (2002): the response mechanism is modeled as w(y, x; \beta) = P(R = 1 \mid Y = y, X = x) and the joint distribution of (Y, X) is represented nonparametrically by respondent masses that satisfy empirical likelihood constraints. The mean is estimated as a respondent weighted mean with weights proportional to \tilde w_i = a_i / D_i(\beta, W, \lambda), where a_i are base weights (a_i \equiv 1 for IID data and a_i = d_i for survey designs) and D_i is the EL denominator.

For data.frame inputs the estimator solves the Qin-Leung-Shao estimating equations for (\beta, W, \lambda_x) with W reparameterized as z = \operatorname{logit}(W), and profiles out the response multiplier \lambda_W using the closed-form QLS identity (their Eq. 10). For survey.design inputs the estimator uses a design-weighted analogue (Chen and Sitter 1999, Wu 2005) with an explicit \lambda_W and an additional linkage equation involving the nonrespondent design-weight total T_0.

Numerical stability:

W is optimized on the logit scale so 0 < W < 1.
The response-model linear predictor is capped and EL denominators D_i are floored at a small positive value. The analytic Jacobian is consistent with this guard via an active-set mask.
Optional trimming (trim_cap) is applied only after solving, to the unnormalized masses \tilde w_i = a_i/D_i. This changes the returned weights and therefore the point estimate.

Formula syntax and data constraints: nmar() accepts a partitioned right-hand side y_miss ~ auxiliaries | response_only. Variables left of | enter auxiliary moment constraints. Variables right of | enter only the response model. The outcome (LHS) is always included as a response-model predictor through the evaluated LHS expression. Explicit use of the outcome on the RHS is rejected. The response model always includes an intercept, while the auxiliary block never includes an intercept.

To include a covariate in both the auxiliary constraints and the response model, repeat it on both sides, e.g. y_miss ~ X | X.

Auxiliary means: If auxiliary_means = NULL (default) and the outcome contains at least one NA, auxiliary means are estimated from the full input and used as \bar X in the QLS constraints. For respondents-only data (no NA in the outcome), n_total must be supplied, and if the auxiliary RHS is non-empty, auxiliary_means must also be supplied. When standardize = TRUE, supply auxiliary_means on the original data scale, the engine applies the same standardization internally.

Survey scale: For survey.design inputs, n_total must be on the same analysis scale as weights(design). The default is sum(weights(design)).

Convergence and identification: the stacked EL system can have multiple solutions. Adding response-only predictors (variables to the right of |) can make the problem sensitive to starting values. Inspect diagnostics such as jacobian_condition_number and consider supplying start = list(beta = ..., W = ...) when needed.

Variance: The EL engine supports bootstrap standard errors via variance_method = "bootstrap" or can skip variance with variance_method = "none".

Bootstrap uses no additional packages for IID resampling, and will run sequentially by default. If the suggested future.apply package is installed, IID bootstrap can use future.apply::future_lapply() according to the user's future::plan() for parallel execution. Bootstrap backend is controlled by the package option nmar.bootstrap_apply:

"auto": (default) Use base::lapply() unless the current future plan has more than one worker, in which case use future.apply::future_lapply() if available.
"base": Always use base::lapply(), even if future.apply is installed.
"future": Always use future.apply::future_lapply().

For survey.design inputs, replicate-weight bootstrap requires the suggested packages survey and svrep.

Value

A list of class "nmar_engine_el" containing configuration fields to be supplied to nmar() together with a formula and data.

References

Qin, J., Leung, D., and Shao, J. (2002). Estimation with survey data under nonignorable nonresponse or informative sampling. Journal of the American Statistical Association, 97(457), 193-200. doi:10.1198/016214502753479338

Chen, J., and Sitter, R. R. (1999). A pseudo empirical likelihood approach for the effective use of auxiliary information in complex surveys. Statistica Sinica, 9, 385-406.

Wu, C. (2005). Algorithms and R codes for the pseudo empirical likelihood method in survey sampling. Survey Methodology, 31(2), 239-243.

Examples

set.seed(1)
n <- 200
X <- rnorm(n)
Y <- 2 + 0.5 * X + rnorm(n)
p <- plogis(-0.7 + 0.4 * scale(Y)[, 1])
R <- runif(n) < p
if (all(R)) R[1] <- FALSE
df <- data.frame(Y_miss = Y, X = X)
df$Y_miss[!R] <- NA_real_

# Estimate auxiliary mean from full data
eng <- el_engine(auxiliary_means = NULL, variance_method = "none")

# Put X in both the auxiliary block and the response model
fit <- nmar(Y_miss ~ X | X, data = df, engine = eng)
summary(fit)


# Response-only predictors can be placed to the right of |
Z <- rnorm(n)
df2 <- data.frame(Y_miss = Y, X = X, Z = Z)
df2$Y_miss[!R] <- NA_real_
eng2 <- el_engine(auxiliary_means = NULL, variance_method = "none")
fit2 <- nmar(Y_miss ~ X | Z, data = df2, engine = eng2)
print(fit2)

# Survey design usage
if (requireNamespace("survey", quietly = TRUE)) {
  des <- survey::svydesign(ids = ~1, weights = ~1, data = df)
  eng3 <- el_engine(auxiliary_means = NULL, variance_method = "none")
  fit3 <- nmar(Y_miss ~ X, data = des, engine = eng3)
  summary(fit3)
}

Core of the empirical likelihood estimator

Description

Core of the empirical likelihood estimator

Usage

el_estimator_core(
  missingness_design,
  aux_matrix,
  aux_means,
  respondent_weights,
  analysis_data,
  outcome_expr,
  N_pop,
  formula,
  standardize,
  trim_cap,
  control,
  on_failure,
  family = logit_family(),
  variance_method,
  bootstrap_reps,
  start = NULL,
  trace_level = 0,
  auxiliary_means = NULL
)

Arguments

missingness_design

Respondent-side missingness model design matrix (intercept + predictors).

aux_matrix

Auxiliary design matrix on respondents (may have zero columns).

aux_means

Named numeric vector of auxiliary population means (aligned to columns of aux_matrix).

respondent_weights

Numeric vector of respondent weights aligned with missingness_design rows.

analysis_data

Data object used for logging and variance.

outcome_expr

Character string identifying the outcome expression displayed in outputs.

N_pop

Population size on the analysis scale.

formula

Original model formula used for estimation.

standardize

Logical. Whether to standardize predictors during estimation.

trim_cap

Numeric. Upper bound for empirical likelihood weight trimming.

control

List of control parameters for the nonlinear equation solver.

on_failure

Character. Action when solver fails.

family

List. Link function specification.

variance_method

Character. Variance estimation method.

bootstrap_reps

Integer. Number of bootstrap replications.

auxiliary_means

Named numeric vector of known population means supplied by the user.

Value

List containing estimation results, diagnostics, and metadata.

Extract strata factor

Description

Looks for strata already materialized in the survey.design object. When unavailable, attempts to reconstruct strata from the original svydesign() call. When multiple stratification variables are supplied, their interaction is used.

Usage

el_extract_strata_factor(design)

Compute lambda_W

Description

Compute lambda_W

Usage

el_lambda_W(C_const, W)

Arguments

C_const

numeric scalar: (N_pop / sum(d_resp) - 1)

W

el_masses(weights, denom, floor, trim_cap)

Arguments

weights

numeric respondent base weights (d_i)

denom

numeric denominators Di after floor guard

floor

numeric small positive guard

trim_cap

numeric cap (>0) or Inf

Value

list with mass_untrim, mass_trimmed, prob_mass, trimmed_fraction

Compute the mean

Description

Compute the mean

Usage

el_mean(prob_mass, y)

Input preprocessing

Description

Parses the two-part Formula, constructs EL design matrices, injects the respondent delta indicator, attaches weights and survey metadata, and returns the pieces needed by the EL core.

Usage

el_prepare_inputs(
  formula,
  data,
  weights = NULL,
  n_total = NULL,
  design_object = NULL
)

Details

Enforeces the following format required by the rest of el code:

LHS references exactly one outcome source variable in data; any transforms are applied via the formula environment and must be defined for all respondent rows.
The outcome is never allowed to appear on RHS1 (auxiliaries) or RHS2 (missingness predictors), either explicitly in the formula or implicitly via dot (.) expansion. The missingness model uses the evaluated LHS expression as a dedicated predictor column instead.
RHS1 always yields an intercept-free auxiliary design matrix with k-1 coding for factor auxiliaries, regardless of user +0/-1 syntax or custom contrasts. Auxiliary columns are validated to be fully observed and non-constant among respondents.
RHS2 always yields a missingness-design matrix for respondents that includes an intercept column and zero-variance predictors emit warnings. NA among respondents is rejected.
respondent_mask is defined from the raw outcome in data, not from the transformed LHS. An injected ..nmar_delta.. indicator in analysis_data must match this mask.
N_pop is the analysis-scale population size: for IID it is nrow(data) unless overridden by n_total. For survey designs it is sum(weights) or n_total when supplied.

Prepare nleqslv args

Description

Prepare nleqslv args

Usage

el_prepare_nleqslv(control)

Auxiliary design computation

Description

Computes the respondent-side auxiliary matrix and the population means vector used for centering X - \mu_x. When auxiliary_means is supplied, only respondent rows are required to be fully observed. NA values are permitted on nonrespondent rows. When auxiliary_means is NULL, auxiliaries must be fully observed in the full data used to estimate population means.

Usage

el_resolve_auxiliaries(
  aux_design_full,
  respondent_mask,
  auxiliary_means,
  weights_full = NULL
)

Solver orchestration

Description

Solver orchestration

Usage

el_run_solver(
  equation_system_func,
  analytical_jac_func,
  init,
  final_control,
  top_args,
  solver_method,
  use_solver_jac,
  K_beta,
  K_aux,
  respondent_weights,
  N_pop,
  trace_level = 0
)

Arguments

equation_system_func

Function mapping parameter vector to equation residuals.

analytical_jac_func

Analytic Jacobian function; may be NULL if unavailable or when forcing Broyden.

init

Numeric vector of initial parameter values.

final_control

List passed to nleqslv::nleqslv(control = ...).

top_args

List of top-level nleqslv::nleqslv args (e.g., global, xscalm).

solver_method

Character; one of "auto", "newton", or "broyden".

use_solver_jac

Logical; whether to pass analytic Jacobian to Newton.

K_beta

Integer; number of response model parameters.

K_aux

Integer; number of auxiliary constraints.

respondent_weights

Numeric vector of base sampling weights.

N_pop

Numeric; population total.

trace_level

enforce_nonneg_weights(weights, tol = 1e-08)

Arguments

weights

numeric vector of weights.

tol

numeric tolerance below which negative values are treated as numerical noise and clipped to zero.

Details

Values below -tol are treated as clearly negative. Values in [-tol, 0) are clipped to zero.

Value

A list with components:

ok: logical; TRUE if no clearly negative weights were found.
message: character; diagnostic message when ok is FALSE, otherwise NULL.
weights: numeric vector of adjusted weights (original if ok is FALSE, otherwise with small negatives clipped to zero).

Extract engine configuration

Description

Extract engine configuration

Usage

engine_config(x)

Arguments

x

An object inheriting from class 'nmar_engine'.

Value

A named list of configuration fields.

Canonical engine name

Description

Returns identifier for an engine object.

Usage

engine_name(x)

Arguments

x

An object inheriting from class 'nmar_engine'.

Value

A single character string, e.g. "empirical_likelihood".

Exponential tilting estimator

Description

Generic for the exponential tilting (ET) estimator under NMAR. Methods are provided for 'data.frame' and 'survey.design'.

Usage

exptilt(data, ...)

Arguments

data

A 'data.frame' or a 'survey.design'.

...

Passed to class-specific methods.

Value

An engine-specific NMAR result object (for example nmar_result_exptilt).

Exponential tilting engine for NMAR

Description

Constructs a configuration for the exponential tilting estimator under nonignorable nonresponse. The estimator solves S_2(\boldsymbol{\phi}, \hat{\boldsymbol{\gamma}}) = 0, using nleqslv to apply EM algorithm.

Usage

exptilt_engine(
  standardize = FALSE,
  on_failure = c("return", "error"),
  variance_method = c("bootstrap", "none"),
  bootstrap_reps = 10,
  supress_warnings = FALSE,
  control = list(),
  family = c("logit", "probit"),
  y_dens = c("normal", "lognormal", "exponential", "binomial"),
  stopping_threshold = 1,
  sample_size = 2000
)

Arguments

standardize

logical; standardize predictors. Default TRUE.

on_failure

character; "return" or "error" on solver failure

variance_method

character; one of "bootstrap", or "none".

bootstrap_reps

integer; number of bootstrap replicates when variance_method = "bootstrap".

supress_warnings

Logical; suppress variance-related warnings.

control

Named list of control parameters passed to nleqslv::nleqslv. Common parameters include:

maxit: Maximum number of iterations (default: 100)
method: Solver method - "Newton" or "Broyden" (default: "Newton")
global: Global strategy - "dbldog", "pwldog", "qline", "gline", "hook", or "none" (default: "dbldog")
xtol: Tolerance for relative error in solution (default: 1e-8)
ftol: Tolerance for function value (default: 1e-8)
btol: Tolerance for backtracking (default: 0.01)
allowSingular: Allow singular Jacobians (default: TRUE)

See ?nleqslv::nleqslv for full details.

family

character; response model family, either "logit" or "probit", or a family object created by logit_family() / probit_family().

y_dens

Outcome density model ("auto", "normal", "lognormal", "exponential", or "binomial").

stopping_threshold

Numeric; early stopping threshold. If the maximum absolute value of the score function falls below this threshold, the algorithm stops early (default: 1).

sample_size

Integer; maximum sample size for stratified random sampling (default: 2000). When the dataset exceeds this size, a stratified random sample is drawn to optimize memory usage. The sampling preserves the ratio of respondents to non-respondents in the original data.

Details

The method is a robust Propensity-Score Adjustment (PSA) approach for Not Missing at Random (NMAR). It uses Maximum Likelihood Estimation (MLE), basing the likelihood on the observed part of the sample (f(\boldsymbol{Y}_i | \delta_i = 1, \boldsymbol{X}_i)), making it robust against outcome model misspecification. The propensity score is estimated by assuming an instrumental variable (X_2) that is independent of the response status given other covariates and the study variable. Estimator calculates fractional imputation weights w_i. The final estimator is a weighted average, where the weights are the inverse of the estimated response probabilities \hat{\pi}_i, satisfying the estimating equation:

\sum_{i \in \mathcal{R}} \frac{\boldsymbol{g}(\boldsymbol{Y}_i, \boldsymbol{X}_i ; \boldsymbol{\theta})}{\hat{\pi}_i} = 0,

where \mathcal{R} is the set of observed respondents.

Value

An engine object of class c("nmar_engine_exptilt","nmar_engine"). This is a configuration list; it is not a fit. Pass it to nmar.

References

Minsun Kim Riddles, Jae Kwang Kim, Jongho Im A Propensity-score-adjustment Method for Nonignorable Nonresponse Journal of Survey Statistics and Methodology, Volume 4, Issue 2, June 2016, Pages 215–245.

Examples


generate_test_data <- function(
  n_rows = 500,
  n_cols = 1,
  case = 1,
  x_var = 0.5,
  eps_var = 0.9,
  a = 0.8,
  b = -0.2
) {
# Generate X variables - fixed to match comparison
  X <- as.data.frame(replicate(n_cols, rnorm(n_rows, 0, sqrt(x_var))))
  colnames(X) <- paste0("x", 1:n_cols)

# Generate Y - fixed coefficients to match comparison
  eps <- rnorm(n_rows, 0, sqrt(eps_var))
  if (case == 1) {
# Use fixed coefficient of 1 for all x variables to match: y = -1 + x1 + epsilon
    X$Y <- as.vector(-1 + as.matrix(X) %*% rep(1, n_cols) + eps)
  }
  else if (case == 2) {
    X$Y <- -2 + 0.5 * exp(as.matrix(X) %*% rep(1, n_cols)) + eps
  }
  else if (case == 3) {
    X$Y <- -1 + sin(2 * as.matrix(X) %*% rep(1, n_cols)) + eps
  }
  else if (case == 4) {
    X$Y <- -1 + 0.4 * as.matrix(X)^3 %*% rep(1, n_cols) + eps
  }

  Y_original <- X$Y

# Missingness mechanism - identical to comparison
  pi_obs <- 1 / (1 + exp(-(a + b * X$Y)))

# Create missing values
  mask <- runif(nrow(X)) > pi_obs
  mask[1] <- FALSE # Ensure at least one observation is not missing
  X$Y[mask] <- NA

  return(list(X = X, Y_original = Y_original))
}
res_test_data <- generate_test_data(n_rows = 500, n_cols = 1, case = 1)
x <- res_test_data$X

exptilt_config <- exptilt_engine(
  y_dens = 'normal',
  control = list(maxit = 10),
  stopping_threshold = 0.1,
  standardize = FALSE,
  family = 'logit',
  bootstrap_reps = 5
)
formula = Y ~ x1
res <- nmar(formula = formula, data = x, engine = exptilt_config, trace_level = 1)
summary(res)

Nonparametric Exponential Tilting (Internal Generic)

Description

Nonparametric Exponential Tilting (Internal Generic)

Usage

exptilt_nonparam(data, ...)

Arguments

data

A data.frame or survey.design object

...

Other arguments passed to methods

Value

An engine-specific NMAR result object for the nonparametric exponential tilting estimator.

Nonparametric exponential tilting engine for NMAR

Description

Constructs a configuration for the nonparametric exponential tilting estimator under nonignorable nonresponse. This engine implements the "Fully Nonparametric Approach" from **Appendix 2** of Riddles et al. (2016). The estimator uses an Expectation-Maximization (EM) algorithm to directly estimate the nonresponse odds O(x_1, y) for aggregated, categorical data.

Usage

exptilt_nonparam_engine(refusal_col = "", max_iter = 100, tol_value = 1e-06)

Arguments

refusal_col

character; the column name in data that contains the aggregated counts of non-respondents (refusals).

max_iter

integer; the maximum number of iterations for the EM algorithm.

tol_value

numeric; the convergence tolerance for the EM algorithm. The loop stops when the sum of absolute changes in the odds matrix is less than this value.

Details

This engine is designed for cases where all variables (outcomes $Y$, response predictors $X_1$, and instrumental variables $X_2$) are categorical, and the input data is pre-aggregated into strata.

The method assumes an instrumental variable X_2 is available. The response probability is assumed to depend on X_1 and $Y$, but *not* on X_2.

The EM algorithm iteratively solves for the nonresponse odds:

O^{(t+1)}(x_1^*, y^*) = \frac{M_{y^*x_1^*}^{(t)}}{N_{y^*x_1^*}}

where M_{y^*x_1^*}^{(t)} is the expected count of non-respondents (calculated in the E-step) and N_{y^*x_1^*} is the observed count of respondents for a given stratum $(x_1, y)$.

The final output from the nmar call is an object containing data_to_return, an aggregated data frame where the original 'refusal' counts have been redistributed into the outcome columns (e.g., 'Voted_A', 'Voted_B') as expected non-respondent counts.

Value

An engine object of class c("nmar_engine_exptilt_nonparam","nmar_engine"). This is a configuration list; it is not a fit. Pass it to nmar.

References

Examples

# Test data (Riddles 2016, Table 9)
voting_data_example <- data.frame(
  Gender = rep(c("Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female"), 1),
  Age_group = c("20-29", "30-39", "40-49", ">=50", "20-29", "30-39", "40-49", "50+"),
  Voted_A = c(93, 104, 146, 560, 106, 129, 170, 501),
  Voted_B = c(115, 233, 295, 350, 159, 242, 262, 218),
  Other = c(4, 8, 5, 3, 8, 5, 5, 7),
  Refusal = c(28, 82, 49, 174, 62, 70, 69, 211),
  Total = c(240, 427, 495, 1087, 335, 446, 506, 937)
)

np_em_config <- exptilt_nonparam_engine(
  refusal_col = "Refusal",
  max_iter = 100,
  tol_value = 0.001
)

# Formula: Y1 + Y2 + ... ~ X1_vars | X2_vars
# Here, Y = Voted_A, Voted_B, Other
#      x1 = Gender (response model)
#      x2 = Age_group (instrumental variable)
em_formula <- Voted_A + Voted_B + Other ~ Gender | Age_group


results_em_np <- nmar(
  formula = em_formula,
  data = voting_data_example,
  engine = np_em_config,
  trace_level = 0
)

# View the final adjusted counts
# (Original counts + expected non-respondent counts)
print(results_em_np$data_final)

Extract top-level nleqslv arguments from a control-like list

Description

Extract top-level nleqslv arguments from a control-like list

Usage

extract_nleqslv_top(ctrl)

Default fitted values for NMAR results

Description

Returns fitted response probabilities.

Usage

## S3 method for class 'nmar_result'
fitted(object, ...)

Arguments

object

An 'nmar_result' object.

...

Ignored.

Value

A numeric vector (possibly length 0).

Formatter for engines

Description

Returns a single concise line summarizing an engine configuration.

Usage

## S3 method for class 'nmar_engine'
format(x, ...)

Arguments

x

An engine object inheriting from 'nmar_engine'.

...

Unused.

Value

A length-1 character vector.

Default formula for NMAR results

Description

Returns the estimation formula if available.

Usage

## S3 method for class 'nmar_result'
formula(x, ...)

Arguments

x

An 'nmar_result' object.

...

Ignored.

Value

A formula or 'NULL'.

Generate conditional density

Description

Generate conditional density

Usage

generate_conditional_density(model)

Arguments

model

An internal exptilt object

Glance summary for NMAR results

Description

One-row diagnostics for NMAR fits.

Usage

## S3 method for class 'nmar_result'
glance(x, ...)

Arguments

x

An object of class 'nmar_result'.

...

Ignored.

Value

A one-row data frame with diagnostics and metadata.

Construct logit response family

Description

Construct logit response family

Usage

logit_family()

Usage

new_nmar_result(...)

Details

Engine-level constructors should call this helper with named arguments rather than assembling result lists by hand. At minimum, engines should supply estimate (numeric scalar) and converged (logical). All other fields are optional:

estimate_name: label for the primary estimand (defaults to NA_character_ if omitted).
se: standard error for the primary estimand (defaults to NA_real_ when not available).
model, weights_info, sample, inference, diagnostics, meta, extra: lists that may be partially pecified or NULL; validate_nmar_result() will back-fill missing subfields with safe defaults. item class: engine-specific result subclass name, e.g. "nmar_result_el"; it is combined with the parent class "nmar_result".

Calling new_nmar_result() ensures that every engine returns objects that satisfy the shared schema and are immediately compatible with parent S3 methods such as vcov(), confint(), tidy(), glance(), and weights().

Construct EL Result Object

Description

Construct EL Result Object

Usage

new_nmar_result_el(
  y_hat,
  se,
  weights,
  coefficients,
  vcov,
  converged,
  diagnostics,
  inputs,
  nmar_scaling_recipe,
  fitted_values,
  call,
  formula = NULL
)

Not Missing at Random Estimation

Description

Interface for NMAR estimation. nmar() validates basic inputs and dispatches to an engine (for example el_engine). The engine controls the estimation method and interprets formula. See the engine documentation for model-specific requirements.

Usage

nmar(formula, data, engine, trace_level = 0)

Arguments

formula

A two-sided formula. Engines support a partitioned right-hand side via |, for example y_miss ~ block1_vars | block2_vars. The meaning of these blocks is engine-specific (see the engine documentation). In the common "missing values indicate nonresponse" workflow, the left-hand side is the outcome with NA values for nonrespondents.

data

A data.frame or a survey.design containing the variables referenced by formula.

engine

An NMAR engine configuration object created by el_engine, exptilt_engine, or exptilt_nonparam_engine. This object defines the estimation method and tuning parameters.

trace_level

Integer 0-3; controls verbosity during estimation (default 0):

0: no output,
1: major steps only (initialization, convergence, final results),
2: iteration summaries and key diagnostics,
3: full diagnostic output.

Value

An object of class "nmar_result" with an engine-specific subclass (for example "nmar_result_el"). Use summary(), se, confint(), weights(), coef(), fitted(), and generics::tidy() / generics::glance() to access estimates, standard errors, weights, and diagnostics.

Examples

set.seed(1)
n <- 200
x1 <- rnorm(n)
z1 <- rnorm(n)
y_true <- 0.5 + 0.3 * x1 + 0.2 * z1 + rnorm(n, sd = 0.3)
resp <- rbinom(n, 1, plogis(2 + 0.1 * y_true + 0.1 * z1))
if (all(resp == 1)) resp[sample.int(n, 1)] <- 0L
y_obs <- ifelse(resp == 1, y_true, NA_real_)

# Empirical likelihood engine
df_el <- data.frame(Y_miss = y_obs, X = x1, Z = z1)
eng_el <- el_engine(variance_method = "none")
fit_el <- nmar(Y_miss ~ X | Z, data = df_el, engine = eng_el)
summary(fit_el)


# Exponential tilting engine
dat_et <- data.frame(y = y_obs, x2 = z1, x1 = x1)
eng_et <- exptilt_engine(
  y_dens = "normal",
  family = "logit",
  variance_method = "none"
)
fit_et <- nmar(y ~ x2 | x1, data = dat_et, engine = eng_et)
summary(fit_et)

# Survey design
if (requireNamespace("survey", quietly = TRUE)) {
  w <- runif(n, 0.5, 2)
  des <- survey::svydesign(ids = ~1, weights = ~w,
                           data = data.frame(Y_miss = y_obs, X = x1, Z = z1))
  eng_svy <- el_engine(variance_method = "none")
  fit_svy <- nmar(Y_miss ~ X | Z, data = des, engine = eng_svy)
  summary(fit_svy)
}

# Bootstrap variance usage
# future.apply is optional, if installed, bootstrap may run in parallel under
# the user's future::plan()
set.seed(2)
eng_boot <- el_engine(
  variance_method = "bootstrap",
  bootstrap_reps = 20
)
fit_boot <- nmar(Y_miss ~ X | Z, data = df_el, engine = eng_boot)
se(fit_boot)

Format a number with fixed decimal places using nmar.digits

Description

Format a number with fixed decimal places using nmar.digits

Usage

nmar_fmt_num(x, digits = nmar_get_digits())

Format an abridged call line for printing

Description

Builds a concise one-line summary of the original call without materializing large objects (e.g., full data frames). Intended for use by print/summary methods.

Usage

nmar_format_call_line(x)

Usage

nmar_get_numeric_settings()

Details

Centralized access to numeric thresholds used across the package.

- 'nmar.eta_cap': scalar > 0. Caps the response-model linear predictor to avoid extreme link values in Newton updates. Default 50. - 'nmar.grad_eps': finite-difference step size epsilon for numeric gradients of smooth functionals. Default 1e-6. - 'nmar.grad_d': relative step adjustment for numeric gradients. Default 1e-3.

Value

A named list with entries 'eta_cap', 'grad_eps', and 'grad_d'.

Internal helpers for nmar_result objects

Description

Internal helpers for nmar_result objects

Usage

nmar_result_get_estimate(x)

Polish Household Budget Data with Simulated Nonignorable Nonresponse

Description

This dataset is derived from the 'h05' dataset (Polish household budgets for 2005) found in the 'RClas' package. The original data was cleaned to remove all rows with missing values.

Usage

polish_households

Format

A data frame with 19,330 rows and 17 columns. The key variables are:

class: TODO
voi: TODO
bio: TODO
type: TODO
d345: TODO
d347: TODO
d348: TODO
d36: TODO
d38: TODO
d61: TODO
noper: TODO
income: TODO
expenditure: TODO
y_exp: Numeric. The **true** scaled expenditure ('expenditure / mean(expenditure)'). This is the complete study variable without missingness.
resp: TODO
R: Integer. The simulated response indicator (1=responded, 0=nonresponse).
y_exp_miss: Numeric. The **observed** scaled expenditure, containing 7,778 'NA' values where 'R = 0'. This is the variable to be used as the NMAR-affected outcome.

Details

To create a realistic test case for nonignorable nonresponse (NMAR), a nonresponse mechanism was simulated and applied to the scaled expenditure variable ('y_exp').

The key simulation steps were: 1. 'y_exp' (true study variable) was created by scaling total expenditure. 2. A true response probability ('resp') was created using the logistic model: 'plogis(1 - 0.6 * y_exp)'. 3. A response indicator ('R') was simulated based on this probability. 4. The final variable 'y_exp_miss' was generated by setting 'y_exp' to 'NA' wherever 'R' was 0.

The response is **nonignorable** because the probability of missingness depends directly on the value of the expenditure variable itself.

Source

TODO

Prepare scaled matrices and moments

Description

Prepare scaled matrices and moments

Usage

prepare_nmar_scaling(
  Z_un,
  X_un,
  mu_x_un,
  standardize,
  weights = NULL,
  weight_mask = NULL
)

Arguments

Z_un

response model matrix (with intercept column).

X_un

auxiliary model matrix (no intercept), or NULL.

mu_x_un

named numeric vector of auxiliary means on the original scale (names must match colnames(X_un)), or NULL.

standardize

logical; apply standardization if TRUE.

weights

Optional numeric vector used for weighted scaling.

weight_mask

Optional logical mask or nonnegative numeric multipliers applied to weights.

Value

A list with components Z, X, mu_x, and recipe.

Print method for engines

Description

Compact summary for 'nmar_engine' objects.

Usage

## S3 method for class 'nmar_engine'
print(x, ...)

Arguments

x

An engine object inheriting from 'nmar_engine'.

...

Unused.

Value

'x', invisibly.

Print method for nmar_result

Description

Print method for nmar_result

Usage

## S3 method for class 'nmar_result'
print(x, ...)

Arguments

x

nmar_result object

...

Additional parameters

Value

'x', invisibly.

Print method for EL results

Description

Print for objects of class nmar_result_el.

Usage

## S3 method for class 'nmar_result_el'
print(x, ...)

Arguments

x

An object of class nmar_result_el.

...

Ignored.

Value

x, invisibly.

Print method for Exponential Tilting results (engine-specific)

Description

This print method is tailored for 'nmar_result_exptilt' objects and shows a concise, human-friendly summary of the estimation result together with exptilt-specific diagnostics (loss, iterations) and a compact view of the response coefficients stored in the fitted model.

Usage

## S3 method for class 'nmar_result_exptilt'
print(x, ...)

Arguments

x

An object of class 'nmar_result_exptilt'.

...

Ignored.

Value

'x', invisibly.

Print method for summary.nmar_result

Description

Print method for summary.nmar_result

Usage

## S3 method for class 'summary_nmar_result'
print(x, ...)

Arguments

x

summary_nmar_result object

...

Additional parameters

Value

'x', invisibly.

Construct probit response family

Description

Construct probit response family

Usage

probit_family()

Value

A list with components name, linkinv, mu.eta, d2mu.deta2, and score_eta.

Riddles Simulation, Case 1: Linear Mean

Description

A simulated dataset of 500 observations based on Simulation Study I (Model 1, Case 1) of Riddles, Kim, and Im (2016). The data features a nonignorable nonresponse (NMAR) mechanism where the response probability depends on the study variable 'y'.

Usage

riddles_case1

Format

A data frame with 500 rows and 4 variables:

x: Numeric. The auxiliary variable, x ~ Normal(0, 0.5).
y: Numeric. The study variable with nonignorable nonresponse. 'y' contains 'NA's for nonrespondents.
y_true: Numeric. The complete, true value of 'y' before missingness was introduced.
delta: Integer. The response indicator (1 = responded, 0 = nonresponse).

Details

This dataset was generated using the following model parameters (n = 500):

Density for x:: x ~ Normal(mean = 0, variance = 0.5)
Density for error:: e ~ Normal(mean = 0, variance = 0.9)
True Model (Case 1):: y_true = -1 + x + e
Response Model (NMAR):: logit(pi) = 0.8 - 0.2 * y_true

Source

Riddles, M. K., Kim, J. K., & Im, J. (2016). A Propensity-Score-Adjustment Method for Nonignorable Nonresponse. Journal of Survey Statistics and Methodology, 4(1), 1-31.

Riddles Simulation, Case 2: Exponential Mean

Description

A simulated dataset of 500 observations based on Simulation Study I (Model 1, Case 2) of Riddles, Kim, and Im (2016). The data features a nonignorable nonresponse (NMAR) mechanism where the response probability depends on the study variable 'y'.

Usage

riddles_case2

Format

A data frame with 500 rows and 4 variables:

x: Numeric. The auxiliary variable, x ~ Normal(0, 0.5).
y: Numeric. The study variable with nonignorable nonresponse. 'y' contains 'NA's for nonrespondents.
y_true: Numeric. The complete, true value of 'y' before missingness was introduced.
delta: Integer. The response indicator (1 = responded, 0 = nonresponse).

Details

This dataset was generated using the following model parameters (n = 500):

Density for x:: x ~ Normal(mean = 0, variance = 0.5)
Density for error:: e ~ Normal(mean = 0, variance = 0.9)
True Model (Case 2):: y_true = -2 + 0.5 * exp(x) + e
Response Model (NMAR):: logit(pi) = 0.8 - 0.2 * y_true

Source

Riddles, M. K., Kim, J. K., & Im, J. (2016). A Propensity-Score-Adjustment Method for Nonignorable Nonresponse. Journal of Survey Statistics and Methodology, 4(1), 1-31.

Riddles Simulation, Case 3: Sine Wave Mean

Description

A simulated dataset of 500 observations based on Simulation Study I (Model 1, Case 3) of Riddles, Kim, and Im (2016). The data features a nonignorable nonresponse (NMAR) mechanism where the response probability depends on the study variable 'y'.

Usage

riddles_case3

Format

A data frame with 500 rows and 4 variables:

x: Numeric. The auxiliary variable, x ~ Normal(0, 0.5).
y: Numeric. The study variable with nonignorable nonresponse. 'y' contains 'NA's for nonrespondents.
y_true: Numeric. The complete, true value of 'y' before missingness was introduced.
delta: Integer. The response indicator (1 = responded, 0 = nonresponse).

Details

This dataset was generated using the following model parameters (n = 500):

Density for x:: x ~ Normal(mean = 0, variance = 0.5)
Density for error:: e ~ Normal(mean = 0, variance = 0.9)
True Model (Case 3):: y_true = -1 + sin(2 * x) + e
Response Model (NMAR):: logit(pi) = 0.8 - 0.2 * y_true

Source

Riddles, M. K., Kim, J. K., & Im, J. (2016). A Propensity-Score-Adjustment Method for Nonignorable Nonresponse. Journal of Survey Statistics and Methodology, 4(1), 1-31.

Riddles Simulation, Case 4: Cubic Mean

Description

A simulated dataset of 500 observations based on Simulation Study I (Model 1, Case 4) of Riddles, Kim, and Im (2016). The data features a nonignorable nonresponse (NMAR) mechanism where the response probability depends on the study variable 'y'.

Usage

riddles_case4

Format

A data frame with 500 rows and 4 variables:

x: Numeric. The auxiliary variable, x ~ Normal(0, 0.5).
y: Numeric. The study variable with nonignorable nonresponse. 'y' contains 'NA's for nonrespondents.
y_true: Numeric. The complete, true value of 'y' before missingness was introduced.
delta: Integer. The response indicator (1 = responded, 0 = nonresponse).

Details

This dataset was generated using the following model parameters (n = 500):

Density for x:: x ~ Normal(mean = 0, variance = 0.5)
Density for error:: e ~ Normal(mean = 0, variance = 0.9)
True Model (Case 4):: y_true = -1 + 0.4 * x^3 + e
Response Model (NMAR):: logit(pi) = 0.8 - 0.2 * y_true

Source

Riddles, M. K., Kim, J. K., & Im, J. (2016). A Propensity-Score-Adjustment Method for Nonignorable Nonresponse. Journal of Survey Statistics and Methodology, 4(1), 1-31.

Run method for EL engine

Description

Run method for EL engine

Usage

## S3 method for class 'nmar_engine_el'
run_engine(engine, formula, data, trace_level = 0)

Arguments

engine

An object of class nmar_engine_el.

formula

A two-sided formula passed through by nmar().

data

A data.frame or survey.design.

trace_level

Integer 0-3 controlling verbosity.

Value

An object of class nmar_result_el.

Parse nleqslv control list for compatibility

Description

Parse nleqslv control list for compatibility

Usage

sanitize_nleqslv_control(ctrl)

Map unscaled auxiliary multipliers to scaled space

Description

Map unscaled auxiliary multipliers to scaled space

Usage

scale_aux_multipliers(lambda_unscaled, recipe, columns)

Arguments

lambda_unscaled

named numeric vector of auxiliary multipliers aligned to auxiliary design columns on original scale.

recipe

Scaling recipe of class nmar_scaling_recipe.

columns

character vector of auxiliary column names (order) for the scaled design.

Value

numeric vector of multipliers in the scaled space.

Map unscaled coefficients to scaled space

Description

Map unscaled coefficients to scaled space

Usage

scale_coefficients(beta_unscaled, recipe, columns)

Arguments

beta_unscaled

named numeric vector of coefficients for the response model on the original scale, including an intercept named "(Intercept)".

recipe

Scaling recipe of class nmar_scaling_recipe, or NULL.

columns

character vector of column names (order) for the scaled design matrix (including intercept).

Value

numeric vector of coefficients in the scaled space, ordered by columns.

Extract standard error for NMAR results

Description

Returns the standard error of the primary mean estimate.

Usage

se(object, ...)

Arguments

object

An 'nmar_result' or subclass.

...

Ignored.

Value

Numeric scalar.

Weighted linear algebra

Description

Compute X' diag(w) X efficiently. If w >= 0, use SPD crossprod(X*sqrt(w)). Otherwise, fall back to X' (diag(w) X) via crossprod(X, X*w).

Usage

shared_weighted_gram(X, w)

Summary method for nmar_result

Description

Summary method for nmar_result

Usage

## S3 method for class 'nmar_result'
summary(object, conf.level = 0.95, ...)

Arguments

object

nmar_result object

conf.level

Confidence level for intervals.

...

Additional parameters

Value

An object of class 'summary_nmar_result'.

Summary method for EL results

Description

Summarize estimation, standard error and missingness-model coefficients.

Usage

## S3 method for class 'nmar_result_el'
summary(object, ...)

Arguments

object

An object of class nmar_result_el.

...

Ignored.

Value

An object of class summary_nmar_result_el.

Summary method for Exponential Tilting results (engine-specific)

Description

Summarize estimation, standard error and model coefficients.

Usage

## S3 method for class 'nmar_result_exptilt'
summary(object, conf.level = 0.95, ...)

Arguments

object

An object of class 'nmar_result_exptilt'.

conf.level

Confidence level for confidence interval (default 0.95).

...

Ignored.

Value

An object of class 'summary_nmar_result_exptilt'.

Tidy summary for NMAR results

Description

Return a data frame with the primary estimate and missingness-model coefficients.

Usage

## S3 method for class 'nmar_result'
tidy(x, conf.level = 0.95, ...)

Arguments

x

An object of class 'nmar_result'.

conf.level

Confidence level for the primary estimate.

...

Ignored.

Value

A data frame with one row for the primary estimate and, when available, additional rows for the response-model coefficients.

Trim weights by capping and proportional redistribution

Description

Applies a cap to a nonnegative weight vector and, when feasible, redistributes excess mass across the remaining positive entries so that the total sum is preserved. When the requested cap is too tight to preserve the total mass, all positive entries are set to the cap and the total sum decreases.

Usage

trim_weights(weights, cap, tol = 1e-12, warn_tol = 1e-08)

Arguments

weights

numeric vector of weights.

cap

positive numeric scalar; maximum allowed weight, or Inf to disable trimming.

tol

numeric tolerance used when testing whether a rescaling step respects the cap.

warn_tol

numeric tolerance used when testing whether the total sum has been preserved.

Details

Zero weights remain zero. Only entries that are positive after nonnegativity enforcement can absorb redistributed mass.

Internally, a simple water-filling style algorithm is used on the positive weights: the largest weights are successively saturated at the cap and the remaining weights are rescaled by a common factor chosen to maintain the total sum.

Value

A list with components:

weights: numeric vector of trimmed weights.
trimmed_fraction: fraction of entries at or very close to the cap (within tol).
preserved_sum: logical; TRUE if the total sum of weights is preserved to within warn_tol.
total_before: numeric; sum of the original weights.
total_after: numeric; sum of the trimmed weights.

Unscale coefficients and covariance

Description

Unscale coefficients and covariance

Usage

unscale_coefficients(scaled_coeffs, scaled_vcov, recipe)

Arguments

scaled_coeffs

named numeric vector of coefficients estimated on the scaled space.

scaled_vcov

covariance matrix of scaled_coeffs.

recipe

Scaling recipe of class nmar_scaling_recipe.

Value

A list with components coefficients and vcov.

Validate and apply scaling for engines

Description

Validate and apply scaling for engines

Usage

validate_and_apply_nmar_scaling(
  standardize,
  has_aux,
  response_model_matrix_unscaled,
  aux_matrix_unscaled,
  mu_x_unscaled,
  weights = NULL,
  weight_mask = NULL
)

Arguments

standardize

logical; apply standardization if TRUE.

has_aux

logical; whether the engine uses auxiliary constraints.

response_model_matrix_unscaled

response model matrix (with intercept).

aux_matrix_unscaled

auxiliary matrix (no intercept) or an empty matrix.

mu_x_unscaled

named auxiliary means on original scale, or NULL.

weights

Optional numeric vector used for weighted scaling.

weight_mask

Optional logical mask or nonnegative numeric multipliers applied to weights.

Value

A list with components nmar_scaling_recipe, response_model_matrix_scaled, auxiliary_matrix_scaled, and mu_x_scaled.

Validate Data for NMAR Analysis

Description

Little sanity-check for data

Usage

validate_data(data)

Arguments

data

Ensures both the child class and the parent schema are satisfied. The validator also back-fills defaults so downstream code can rely on the presence of optional components without defensive checks.

Usage

validate_nmar_result(x, class_name)

Details

This helper is the single authority on the 'nmar_result' schema. It expects a list that already carries class c(class_name, "nmar_result") and at least a primary estimate stored in y_hat. All other components are optional. When they are NULL or missing, the validator supplies safe defaults:

Core scalars: se (numeric, default NA_real_), estimate_name (character, default NA_character_), converged (logical, default NA).
model: list with coefficients and vcov, both defaulting to NULL.
weights_info: list with values (default NULL) and trimmed_fraction (default NA_real_).
sample: list with n_total, n_respondents, is_survey, and design, defaulted to missing/empty values.
inference: list with variance_method, df, and message, all defaulted to missing values.
diagnostics, meta, and extra: defaulted to empty lists, with meta carrying engine_name, call, and formula when unset.

Engine constructors should normally call new_nmar_result() rather than invoking this function directly. new_nmar_result() attaches classes and funnels all objects through validate_nmar_result() so downstream S3 methods can assume a consistent structure.

Variance-covariance for NMAR results

Description

Variance-covariance for NMAR results

Usage

## S3 method for class 'nmar_result'
vcov(object, ...)

Arguments

object

An object of class 'nmar_result'.

...

Ignored.

Value

A 1x1 numeric matrix (the variance of the primary estimate).

Aggregated Exit Poll Data for Gangdong-Gap (2012)

Description

This dataset contains the aggregated exit poll results for the Gangdong-Gap district in Seoul from the 2012 nineteenth South Korean legislative election. The data is transcribed directly from Table 9 of Riddles, Kim, and Im (2016).

Usage

voting

Format

A data frame with 8 rows and 7 variables:

Gender: Factor. The gender of the voter ("Male", "Female").
Age_group: Character. The age group of the voter.
Voted_A: Numeric. Count of respondents voting for Party A.
Voted_B: Numeric. Count of respondents voting for Party B.
Other: Numeric. Count of respondents voting for another party.
Refusal: Numeric. Count of sampled individuals who refused to respond (this is the nonresponse count).
Total: Numeric. Total individuals sampled in the group (Responders + Refusals).

Details

In the paper's application, 'Gender' is used as the nonresponse instrumental variable and 'Age_group' is the primary auxiliary variable .

Source

Riddles, M. K., Kim, J. K., & Im, J. (2016). A Propensity-Score-Adjustment Method for Nonignorable Nonresponse. *Journal of Survey Statistics and Methodology*, 4(1), 1–31. (Data from Table 9, p. 20).

Extract weights from an 'nmar_result'

Description

Return analysis weights stored in an 'nmar_result' as either probability-scale (summing to 1) or population-scale (summing to 'sample$n_total'). The function normalizes stored masses and attaches informative attributes.

Usage

## S3 method for class 'nmar_result'
weights(object, scale = c("probability", "population"), ...)

Arguments

object

An 'nmar_result' object.

scale

One of '"probability"' (default) or '"population"'.

...

Additional arguments (ignored).

Value

Numeric vector of weights with length equal to the number of respondents.