Helper function to estimate the mediator distribution. Returns a list of length n, where each entry is a list of length 2 giving the mediator distribution under each treatment assignment. Each of these is in turn a list with three entries: the bivariate distribution of (M1, M2), the marginal distribution of M1, and the marginal distribution of M2.
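For discrete mediators, the three entries are linked: the bivariate distribution factors as p(m1, m2 | a, c) = p(m1 | m2, a, c) p(m2 | a, c), and each marginal follows by summing the bivariate distribution over the other mediator. The sketch below illustrates this for a single observation under a single treatment level. It assumes the discrete hazards have already been estimated and uses the standard discrete-time identity P(M = m_k) = λ_k ∏_{j<k} (1 − λ_j). The helpers `hazard_to_pmf()` and `mediator_dists()` and the entry names `M1M2`, `M1`, and `M2` are hypothetical and are not the package's internals.

```r
# Hypothetical helper: convert discrete-time hazards,
# hazard[k] = P(M = m_k | M >= m_k), into a probability mass function.
hazard_to_pmf <- function(hazard) {
  K <- length(hazard)
  hazard[K] <- 1                              # last level takes the remaining mass
  survival <- cumprod(c(1, 1 - hazard[-K]))   # survival[k] = P(M >= m_k)
  hazard * survival                           # P(M = m_k)
}

# Hypothetical helper: bivariate and marginal distributions for one
# observation under one treatment level.
#   hazard_M2:    length-K2 hazards for M2 given (A = a, C)
#   hazard_M1_M2: K2 x K1 matrix; row j holds the hazards for M1 given
#                 (A = a, C, M2 = j-th level of M2)
mediator_dists <- function(hazard_M2, hazard_M1_M2) {
  p_M2 <- hazard_to_pmf(hazard_M2)            # p(m2 | a, c)
  p_M1_given_M2 <- matrix(                    # rows index m2, columns index m1
    apply(hazard_M1_M2, 1, hazard_to_pmf),
    nrow = nrow(hazard_M1_M2), byrow = TRUE
  )
  joint <- p_M1_given_M2 * p_M2               # scales row j by p(m2_j | a, c)
  list(
    M1M2 = joint,                             # bivariate distribution
    M1   = colSums(joint),                    # marginal of M1: sum over m2
    M2   = rowSums(joint)                     # marginal of M2 (equals p_M2)
  )
}

# Example with three levels for each mediator
h_M2 <- c(0.4, 0.5, 0.3)
h_M1 <- matrix(c(0.2, 0.6, 0.9,
                 0.5, 0.5, 0.5,
                 0.7, 0.1, 0.4), nrow = 3, byrow = TRUE)
d <- mediator_dists(h_M2, h_M1)
d$M2  # 0.4 0.3 0.3 (up to floating-point rounding)
```

Repeating this for both treatment levels and all n observations yields the nested layout described above; the element names and ordering actually used by estimate_Q_M may differ.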

estimate_Q_M(
  A,
  M1,
  M2,
  C,
  DeltaA,
  DeltaM,
  SL_Q_M,
  glm_Q_M = NULL,
  a_0,
  stratify,
  verbose = FALSE,
  return_models = FALSE,
  valid_rows,
  all_mediator_values,
  return_list_by_a_0 = FALSE,
  ...
)

Arguments

A

A vector of binary treatment assignments (assumed to be equal to 0 or 1).

M1

A vector containing the first mediator.

M2

A vector containing the second mediator.

C

A data.frame of named covariates.

DeltaA

Indicator of missing treatment (assumed to be equal to 0 if missing and 1 if observed).

DeltaM

Indicator of missing mediators (assumed to be equal to 0 if missing and 1 if observed).
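Both indicators use the same coding convention. A minimal sketch of building them from vectors in which missing values are stored as `NA`; the data are made up, and treating `DeltaM` as flagging observations with both mediators observed is an assumption for illustration.

```r
# Hypothetical data with missing treatment and mediator values
A  <- c(1, 0, NA, 1, 0)
M1 <- c(2, NA, 1, 3, 1)
M2 <- c(1, 1, 2, NA, 2)

# 1 if observed, 0 if missing
DeltaA <- as.integer(!is.na(A))
# Assumption: DeltaM is 1 only when both mediators are observed
DeltaM <- as.integer(!is.na(M1) & !is.na(M2))
```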

glm_Q_M

A character describing a formula to be used in the call to glm for the mediator regressions.

a_0

A list of fixed treatment values.

stratify

A boolean indicating whether to estimate the mediator regressions separately for observations with A equal to 0/1 (if TRUE) or to pool across A (if FALSE).
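To illustrate the two options, the sketch below uses base R `glm()` with a logistic link for a hypothetical binary mediator `M2` and simulated covariates. The data, variable names, and use of a plain `glm()` are assumptions; the package itself may fit these regressions with Super Learner or a hazard-based approach instead.

```r
set.seed(1)
n <- 200
C <- data.frame(C1 = rnorm(n), C2 = rbinom(n, 1, 0.5))
A <- rbinom(n, 1, 0.5)
M2 <- rbinom(n, 1, plogis(-0.5 + A + 0.5 * C$C1))  # hypothetical binary mediator
dat <- data.frame(C, A = A, M2 = M2)

a_0 <- 1

# Pooled (stratify = FALSE): one fit with A as a covariate,
# then predict for every row with A set to a_0
fit_pooled <- glm(M2 ~ A + C1 + C2, family = binomial(), data = dat)
p_pooled <- predict(fit_pooled, newdata = data.frame(C, A = a_0),
                    type = "response")

# Stratified (stratify = TRUE): fit only on rows with A == a_0,
# then predict for every row from that arm-specific fit
fit_strat <- glm(M2 ~ C1 + C2, family = binomial(), data = dat[dat$A == a_0, ])
p_strat <- predict(fit_strat, newdata = C, type = "response")
```

Pooling uses all observations but relies on the regression capturing how A enters the model; stratifying avoids that modeling assumption at the cost of fitting each arm on only the rows with A equal to that value.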

verbose

A boolean indicating whether to print status updates.

return_models

A boolean indicating whether to return the fitted mediator regression models.

valid_rows

A list of length cvFolds containing the row indexes of observations to include in each validation fold.
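One common way to build such a list is to assign each row a random fold label and split the row indexes by label. A minimal sketch, with `n` and `cvFolds` chosen arbitrarily; how the package itself constructs the folds is not stated here.

```r
n <- 10
cvFolds <- 3
set.seed(2)

# Balanced random fold labels, then row indexes grouped by fold
fold_id <- sample(rep(seq_len(cvFolds), length.out = n))
valid_rows <- split(seq_len(n), fold_id)

# Under standard cross-validation, the training rows for fold 1
# would be the complement of its validation rows
train_rows_1 <- setdiff(seq_len(n), valid_rows[[1]])
```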

all_mediator_values

All combinations of the values of M1 and M2.
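A minimal sketch of one way to form these combinations, assuming a data.frame built with `expand.grid()` over the observed levels of each mediator; the exact format the function expects may differ.

```r
M1 <- c(0, 1, 2, 1, 0)
M2 <- c(1, 1, 0, 0, 1)

# Every pairing of an observed M1 level with an observed M2 level
all_mediator_values <- expand.grid(
  M1 = sort(unique(M1)),
  M2 = sort(unique(M2))
)
nrow(all_mediator_values)  # 6 (3 levels of M1 x 2 levels of M2)
```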

return_list_by_a_0

A boolean indicating whether to return the list before it is reformatted (intended for power users).

...

Additional arguments (not currently used).

SL_Q_M

A vector of characters or a list describing the Super Learner library to be used for the mediator regressions.

family

A character passed to SuperLearner

Details

The bivariate distribution is estimated by estimating the conditional distribution of M1 given A, C, and M2 and the marginal distribution of M2 given A and C. In each case, we use a hazard-based approach to estimate these distributions. The