Worker function to make the long-form data set needed for the CVTMLE targeting step when nested CV is used
.make_long_data_nested_cv( x, prediction_list, folds, gn, update = FALSE, epsilon_0 = 0, epsilon_1 = 0, tol = 0.001 )
x | The outer validation fold |
prediction_list | The full prediction list |
folds | Vector of CV folds |
gn | An estimate of the marginal distribution of Y |
update | Boolean indicating whether this is called for the initial construction of the long data set or as part of the targeting loop. If the former, cross-validated empirical "density" estimates are used. If the latter, these are derived from the targeted cdf. |
epsilon_0 | If |
epsilon_1 | Ditto above |
tol | A truncation level when taking logit transformations. |
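The role of tol can be illustrated with a minimal sketch. It assumes that truncation means clamping the cdf estimate to [tol, 1 - tol] before the logit is taken; the helper name trunc_logit is hypothetical and is not part of the package.

```r
# Hypothetical helper (not the package's own code): clamp a probability
# to [tol, 1 - tol], then apply the logit transform.
trunc_logit <- function(p, tol = 0.001) {
  p_trunc <- pmin(pmax(p, tol), 1 - tol)
  stats::qlogis(p_trunc)  # qlogis(p) is log(p / (1 - p))
}

# Without truncation, cdf values of exactly 0 or 1 would map to -Inf / Inf.
Fn <- c(0, 0.25, 0.5, 1)
logit_Fn <- trunc_logit(Fn)
all(is.finite(logit_Fn))
```

Clamping keeps the offset finite at the boundaries, which matters when the logit-scale cdf is used as an offset in a regression.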
A long-form data list with the following columns:
id | Observation identifier (multiple rows per observation in the validation sample) |
u | If Yi = 0, the unique values of psi(x) in the inner validation samples, for psi fit on the inner training samples, among observations with Y = 1. If Yi = 1, the values of psi(x) in the inner validation samples, for psi fit on the inner training samples, among observations with Y = 0. |
Yi | This id's value of Y |
Fn | Cross-validation estimated value of the cdf of psi(X) given Y = Yi in the training sample |
dFn | Cross-validated estimate of the density of psi(X) given Y = (1-Yi) in the training sample |
psi | The value of this observation's Psihat(P_n,B_n^0) |
gn | Estimate of the marginal of Y, e.g., computed in the whole sample |
outcome | Indicator that psi <= u |
logit_Fn | The cdf estimate on the logit scale, needed for the offset in the targeting model |
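To make the layout concrete, here is a rough sketch of the rows that a single validation observation might contribute. All inputs are toy values invented for illustration. The use of ecdf() for Fn, empirical mass for dFn, a data.frame as the container, and the clamped logit are assumptions, not the package's actual implementation.

```r
# Toy inputs (all hypothetical)
psi_i  <- 0.4                     # prediction for one validation observation
Y_i    <- 1                       # that observation's outcome
psi_y1 <- c(0.3, 0.4, 0.6, 0.9)   # inner-validation predictions with Y = 1
psi_y0 <- c(0.2, 0.5, 0.5, 0.7)   # inner-validation predictions with Y = 0
gn     <- 0.5                     # assumed marginal estimate for Y
tol    <- 0.001

# Predictions from the same class as Y_i and from the opposite class
psi_same <- if (Y_i == 1) psi_y1 else psi_y0
psi_opp  <- if (Y_i == 1) psi_y0 else psi_y1

# u: unique predictions among observations with Y = 1 - Y_i
u <- sort(unique(psi_opp))

# Fn: empirical cdf of psi given Y = Y_i, evaluated at each u
Fn <- stats::ecdf(psi_same)(u)

# dFn: empirical mass of psi given Y = 1 - Y_i at each u
dFn <- vapply(u, function(v) mean(psi_opp == v), numeric(1))

long_data <- data.frame(
  id       = 1,
  u        = u,
  Yi       = Y_i,
  Fn       = Fn,
  dFn      = dFn,
  psi      = psi_i,
  gn       = gn,
  outcome  = as.numeric(psi_i <= u),
  logit_Fn = stats::qlogis(pmin(pmax(Fn, tol), 1 - tol))
)
long_data
```

Each unique u yields one row for the observation, so the id column repeats. In this toy setup the dFn column sums to 1 within an id because it is an empirical mass function over the opposite-class predictions.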