cv_auc.Rd

This function computes K-fold cross-validated estimates of the area under the receiver operating characteristic (ROC) curve (hereafter, AUC). This quantity can be interpreted as the probability that a randomly selected case will have higher predicted risk than a randomly selected control.
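The probability interpretation above can be made concrete with a small base-R sketch (the predicted risks below are simulated stand-ins, not output of `cv_auc`): the empirical AUC is the proportion of case/control pairs in which the case is ranked higher, counting ties as one half.

```r
# Base-R illustration: AUC as P(case risk > control risk).
set.seed(1)
risk_case <- runif(50, 0.4, 1)     # hypothetical predicted risks for cases
risk_control <- runif(50, 0, 0.6)  # hypothetical predicted risks for controls

# Proportion of case/control pairs ranked correctly, ties counted as 1/2.
pairs <- outer(risk_case, risk_control,
               FUN = function(a, b) (a > b) + 0.5 * (a == b))
auc <- mean(pairs)
```

Because the case risks stochastically dominate the control risks here, the resulting `auc` is well above 0.5.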
```r
cv_auc(
  Y,
  X,
  K = 10,
  learner = "glm_wrapper",
  nested_cv = TRUE,
  nested_K = K - 1,
  parallel = FALSE,
  max_cvtmle_iter = 10,
  cvtmle_ictol = 1 / length(Y),
  prediction_list = NULL,
  ...
)
```
| Argument | Description |
|---|---|
| `Y` | A numeric vector of outcomes, assumed to equal `0` or `1`. |
| `X` | A `data.frame` of variables used for prediction. |
| `K` | The number of cross-validation folds (default is `10`). |
| `learner` | A wrapper that implements the desired method for building a prediction algorithm. See, e.g., `glm_wrapper`. |
| `nested_cv` | A boolean indicating whether nested cross-validation should be used to estimate the distribution of the prediction function. Default is `TRUE`. |
| `nested_K` | If nested cross-validation is used, how many inner folds should there be? Default is `K - 1`. |
| `parallel` | A boolean indicating whether prediction algorithms should be trained in parallel. Defaults to `FALSE`. |
| `max_cvtmle_iter` | Maximum number of iterations for the bias correction step of the CV-TMLE estimator (default `10`). |
| `cvtmle_ictol` | The CV-TMLE will iterate the targeting step until the empirical mean of the estimated influence function is smaller than `cvtmle_ictol`, or until `max_cvtmle_iter` iterations are reached. Default is `1/length(Y)`. |
| `prediction_list` | For power users: a list of predictions made by `learner` (see the format of the `prediction_list` entry in the return value). |
| `...` | Other arguments, not currently used. |
An object of class `"cvauc"` with the following components:

| Component | Description |
|---|---|
| `est_cvtmle` | cross-validated targeted minimum loss-based estimator of K-fold CV AUC |
| `iter_cvtmle` | iterations needed to achieve convergence of the CV-TMLE algorithm |
| `cvtmle_trace` | the value of the CV-TMLE at each iteration of the targeting algorithm |
| `se_cvtmle` | estimated standard error based on targeted nuisance parameters |
| `est_init` | plug-in estimate of CV AUC where nuisance parameters are estimated in the training sample |
| `est_empirical` | the standard K-fold CV AUC estimator |
| `se_empirical` | estimated standard error for the standard estimator |
| `est_onestep` | cross-validated one-step estimate of K-fold CV AUC |
| `se_onestep` | estimated standard error for the one-step estimator |
| `est_esteq` | cross-validated estimating-equations estimate of K-fold CV AUC |
| `se_esteq` | estimated standard error for the estimating-equations estimator (same as for the one-step) |
| `folds` | list of observation indexes in each validation fold |
| `ic_cvtmle` | influence function evaluated at the targeted nuisance parameter estimates |
| `ic_onestep` | influence function evaluated at the training-fold-estimated nuisance parameters |
| `ic_esteq` | influence function evaluated at the training-fold-estimated nuisance parameters |
| `ic_empirical` | influence function evaluated at the validation-fold-estimated nuisance parameters |
| `prediction_list` | a list of output from the cross-validated model training; see the individual wrapper function documentation for further details |
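The point estimates and standard errors returned above combine in the usual way. A minimal sketch (base R only; the numbers below are hypothetical placeholders for a real fitted object, which in practice would come from `fit <- cv_auc(...)`) builds a 95% Wald-style confidence interval from `est_cvtmle` and `se_cvtmle`:

```r
# Hypothetical "cvauc"-like object; in practice obtained via cv_auc().
fit <- list(est_cvtmle = 0.75, se_cvtmle = 0.03)

# 95% Wald-style confidence interval: estimate +/- 1.96 * SE.
ci <- fit$est_cvtmle + c(-1, 1) * qnorm(0.975) * fit$se_cvtmle
```

The same construction applies to the one-step, estimating-equations, and empirical estimators using their respective `se_*` components.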
To estimate the AUC of a particular prediction algorithm, K-fold cross-validation is commonly used: data are partitioned into K distinct groups and the prediction algorithm is developed using K-1 of these groups. In standard K-fold cross-validation, the AUC of this prediction algorithm is estimated using the remaining fold. This can be problematic when the number of observations is small or the number of cross-validation folds is large.
Here, we estimate relevant nuisance parameters in the training sample and use the validation sample to perform some form of bias correction, either through cross-validated targeted minimum loss-based estimation, estimating equations, or one-step estimation. When aggressive learning algorithms are applied, it is necessary to use an additional layer of cross-validation in the training sample to estimate the nuisance parameters. This is controlled via the `nested_cv` option.
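For intuition about the standard K-fold CV AUC estimator that the bias-corrected estimators improve on, it can be computed by hand with base R: fit a model on K-1 folds, compute the rank-based AUC on the held-out fold, and average across folds. This sketch uses ordinary logistic regression on simulated data and is independent of the package's wrapper machinery:

```r
# Manual standard K-fold CV AUC estimator (base R only, simulated data).
set.seed(2)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(x))

K <- 5
folds <- sample(rep(seq_len(K), length.out = n))  # random fold assignment

# Rank-based AUC on one fold, ties counted as 1/2.
auc_fold <- function(pred, lab) {
  mean(outer(pred[lab == 1], pred[lab == 0], ">") +
         0.5 * outer(pred[lab == 1], pred[lab == 0], "=="))
}

# Train on K-1 folds, evaluate AUC on the held-out fold, then average.
cv_auc_empirical <- mean(sapply(seq_len(K), function(k) {
  train_fit <- glm(y ~ x, family = binomial(), subset = folds != k)
  preds <- predict(train_fit,
                   newdata = data.frame(x = x[folds == k]),
                   type = "response")
  auc_fold(preds, y[folds == k])
}))
```

With few observations or many folds, each validation fold contains few case/control pairs, which is exactly the setting where this simple averaged estimator becomes unstable and the bias-corrected estimators are useful.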
```r
# simulate data
n <- 200
p <- 10
X <- data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
Y <- rbinom(n, 1, plogis(X[, 1] + X[, 10]))

# get cv auc estimates for logistic regression
cv_auc_ests <- cv_auc(Y = Y, X = X, K = 5, learner = "glm_wrapper")

# get cv auc estimates for random forest
# using nested cross-validation for nuisance parameter estimation
fit <- cv_auc(Y = Y, X = X, K = 5, learner = "randomforest_wrapper",
              nested_cv = TRUE)
```