The broadly neutralizing antibodies (bNAbs) studied in this analysis are 10-1074 and PG9. The analysis considered two measures of neutralization sensitivity: estimated sensitivity and multiple sensitivity. Estimated IC\(_{50}\) was computed from the additive model of Wagh et al. (2016); for \(J\) bNAbs, \[ \mbox{estimated IC}_{50} = \left( \sum_{j=1}^J \mbox{IC}_j^{-1} \right)^{-1} \ , \] where \(\mbox{IC}_j\) denotes the measured IC\(_{50}\) for bNAb \(j\). Estimated sensitivity is defined as the binary indicator that estimated IC\(_{50}\) < 1. Multiple sensitivity is defined as the binary indicator that measured IC\(_{50}\) < 1 for at least one bNAb. Based on this specification of bNAbs and outcomes:
- 441 sequences were extracted from the CATNAP database (Yoon et al. 2015);
- 441 sequences had complete geographic and genetic sequence information;
- 441 of these sequences had measured IC\(_{50}\);
- out of the sequences with complete data, 385 were estimated to be sensitive to the combination of bNAbs, while 56 were estimated to be resistant;
- out of the sequences with complete data, 384 were sensitive to at least one bNAb, while 57 were resistant to both bNAbs.
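For concreteness, the sketch below shows how these two endpoints could be computed from per-bNAb IC\(_{50}\) measurements in R; the data frame, its values, and its column names (`ic50_10_1074`, `ic50_PG9`) are hypothetical stand-ins, not the CATNAP extract itself.

```r
# Hypothetical per-sequence IC50 measurements (one column per bNAb);
# values and column names are illustrative only.
dat <- data.frame(
  ic50_10_1074 = c(0.08, 2.5, 10.0),
  ic50_PG9     = c(0.50, 0.9, 25.0)
)

# Additive-model estimated IC50 (Wagh et al. 2016): the inverse of the
# sum of the inverses of the measured IC50s.
dat$est_ic50 <- 1 / rowSums(1 / dat[, c("ic50_10_1074", "ic50_PG9")])

# Binary endpoints as defined above.
dat$estimated_sensitivity <- as.numeric(dat$est_ic50 < 1)
dat$multiple_sensitivity  <- as.numeric(dat$ic50_10_1074 < 1 | dat$ic50_PG9 < 1)
```

Because the estimated IC\(_{50}\) is never larger than the smallest measured IC\(_{50}\), every sequence counted as multiply sensitive is also counted as estimated sensitive, consistent with the counts above (384 versus 385).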
Prediction of each outcome was performed using a super learner ensemble (van der Laan, Polley, and Hubbard 2007) of random forests (Breiman 2001), gradient boosted trees (Chen and Guestrin 2016), and elastic net regressions (Zou and Hastie 2005), each with varied tuning parameters, along with an intercept-only regression. Each algorithm (except xgboost) was additionally implemented in combination with variable pre-screening procedures requiring that each binary feature have at least 0 or at least 4 minority variants, respectively; these screens retained 5554 and 3232 of the 5567 candidate features, respectively.
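For illustration, a screen requiring at least 4 minority variants per binary feature could be expressed as a SuperLearner-style screening function along the following lines; this is a sketch, not the analysis code, and the name `screen_minvar4` is hypothetical.

```r
# Screening function in the form expected by the R SuperLearner package:
# given the outcome Y and feature matrix X, return a logical vector saying
# which columns of X to keep. A binary feature is kept only if its minority
# (less frequent) value occurs in at least 4 sequences; other feature types
# pass through unchanged. X is assumed to be all numeric.
screen_minvar4 <- function(Y, X, family, obsWeights, id, ...) {
  apply(X, 2, function(x) {
    if (length(unique(x)) <= 2) {
      counts <- table(x)
      length(counts) > 1 && min(counts) >= 4
    } else {
      TRUE
    }
  })
}
```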
The specific algorithms used in the learning process are described in Table 1.1.
Label | Description |
---|---|
rf_tune1 | random forest with mtry equal to one-half times the square root of the number of predictors
rf_default | random forest with mtry equal to the square root of the number of predictors
rf_tune2 | random forest with mtry equal to two times the square root of the number of predictors
xgboost_default | boosted regression trees with maximum depth of 4 |
xgboost_tune3 | boosted regression trees with maximum depth of 8 |
xgboost_tune4 | boosted regression trees with maximum depth of 12 |
lasso_default | elastic net with \(\lambda\) selected by CV and \(\alpha\) equal to 0 |
lasso_tune1 | elastic net with \(\lambda\) selected by 5-fold CV and \(\alpha\) equal to 0.25 |
lasso_tune2 | elastic net with \(\lambda\) selected by 5-fold CV and \(\alpha\) equal to 0.5 |
lasso_tune3 | elastic net with \(\lambda\) selected by 5-fold CV and \(\alpha\) equal to 0.75 |
mean | intercept-only regression
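Assuming the analysis was carried out with the R SuperLearner package, a library of candidate learners like the one in Table 1.1 could be assembled roughly as follows. The wrappers (`SL.ranger`, `SL.xgboost`, `SL.glmnet`, `SL.mean`), the `screen_minvar4` function sketched earlier, and the exact tuning grids are illustrative assumptions, not a record of the report's configuration.

```r
library(SuperLearner)

p <- 5567  # total number of candidate features (see above)

# Random forests with mtry at one-half, one, and two times sqrt(p).
rf_learners <- create.Learner(
  "SL.ranger",
  tune = list(mtry = round(c(0.5, 1, 2) * sqrt(p))),
  detailed_names = TRUE
)

# Boosted trees with maximum depth 4, 8, and 12.
xgb_learners <- create.Learner(
  "SL.xgboost",
  tune = list(max_depth = c(4, 8, 12)),
  detailed_names = TRUE
)

# Elastic nets over a grid of alpha; lambda is selected by CV inside SL.glmnet.
enet_learners <- create.Learner(
  "SL.glmnet",
  tune = list(alpha = c(0, 0.25, 0.5, 0.75)),
  detailed_names = TRUE
)

# Pair every random forest and elastic net with both screens ("All" keeps
# every feature, screen_minvar4 is the 4-minority-variant screen); xgboost
# and the intercept-only learner are used without screening.
sl_library <- c(
  lapply(c(rf_learners$names, enet_learners$names),
         function(lrn) c(lrn, "All", "screen_minvar4")),
  as.list(xgb_learners$names),
  list("SL.mean")
)
```

This construction yields the 18 learner-screen combinations listed in Tables 2.1 and 3.1.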
The predictive ability of the super learner was assessed using cross-validation. The estimated cross-validated area under the receiver operating characteristic curve (CV-AUC) of the super learner for predicting each binary sensitivity measure is shown in Table 1.2, along with 95% confidence intervals.
Outcome | CV-AUC | Lower 95% CI | Upper 95% CI
---|---|---|---
Estimated sensitivity | 0.815 | 0.640 | 0.916 |
Multiple sensitivity | 0.836 | 0.671 | 0.927 |
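One way such a CV-AUC and confidence interval could be computed is sketched below, using CV.SuperLearner together with the cvAUC package; the simulated data, the reduced library, and the fold counts are placeholders rather than the report's settings.

```r
library(SuperLearner)
library(cvAUC)

# Small simulated stand-in for the analysis data (illustrative only).
set.seed(20)
n <- 200; p <- 30
X <- as.data.frame(matrix(rbinom(n * p, 1, 0.3), nrow = n))
Y <- rbinom(n, 1, plogis(-1 + X[[1]] + X[[2]]))

# Outer cross-validation of the whole super learner; in the real analysis
# the sl_library from the previous sketch would be used.
cv_fit <- CV.SuperLearner(
  Y = Y, X = X, family = binomial(),
  SL.library = c("SL.glmnet", "SL.ranger", "SL.mean"),
  cvControl = list(V = 10, stratifyCV = TRUE)
)

# Cross-validated AUC with an influence-function-based 95% confidence interval.
ci.cvAUC(
  predictions = cv_fit$SL.predict,
  labels      = cv_fit$Y,
  folds       = cv_fit$folds
)
```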
Out of the sequences with complete data, 385 were estimated to be sensitive to the combination of bNAbs, while 56 were estimated to be resistant, where estimated sensitivity was defined as the indicator that estimated IC\(_{50}\) was less than 1.
The weights assigned to each algorithm by the super learner for predicting estimated sensitivity are shown in Table 2.1.
Learner | Weight |
---|---|
rf_tune1_screen0 | 0.00 |
rf_default_screen0 | 0.00 |
rf_tune2_screen0 | 0.00 |
lasso_default_screen0 | 0.00 |
lasso_tune1_screen0 | 0.00 |
lasso_tune2_screen0 | 0.00 |
lasso_tune3_screen0 | 0.00 |
rf_tune1_screen4 | 0.00 |
rf_default_screen4 | 0.00 |
rf_tune2_screen4 | 0.76 |
lasso_default_screen4 | 0.00 |
lasso_tune1_screen4 | 0.00 |
lasso_tune2_screen4 | 0.00 |
lasso_tune3_screen4 | 0.00 |
xgboost_default | 0.24 |
xgboost_tune3 | 0.00 |
xgboost_tune4 | 0.00 |
mean | 0.00 |
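For reference, weights of this kind are the normalized coefficients of the super learner's convex combination. Under the assumptions of the earlier sketches they could be read off a fit as follows; the 5-fold inner cross-validation shown here is an illustrative choice.

```r
# Fit the super learner once on the full data (Y, X, and sl_library as in the
# earlier sketches) and inspect the weight given to each learner-screen
# combination; with the default NNLS method these weights are non-negative
# and sum to one, as in Table 2.1.
sl_fit <- SuperLearner(
  Y = Y, X = X, family = binomial(),
  SL.library = sl_library,
  cvControl = list(V = 5)
)
round(sl_fit$coef, 2)
```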
The cross-validated AUC of the super learner for predicting estimated sensitivity, relative to each candidate algorithm, is shown in Figure 2.1; this compares the learner whose tuning parameters and pre-screening are selected via cross-validation against learners with each individual value of the tuning parameters. Figure 2.2 shows the cross-validated ROC curve for predicting estimated sensitivity.
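A cross-validated ROC curve of the kind shown in Figure 2.2 could be drawn from the CV.SuperLearner predictions, for example with the ROCR package; this is again a sketch, reusing `cv_fit` from the CV-AUC example above.

```r
library(ROCR)

# ROC curve of the cross-validated super learner predictions
# (cv_fit from the CV.SuperLearner sketch above).
roc_pred <- prediction(cv_fit$SL.predict, cv_fit$Y)
roc_perf <- performance(roc_pred, measure = "tpr", x.measure = "fpr")
plot(roc_perf, main = "Cross-validated ROC: estimated sensitivity")
abline(0, 1, lty = 2)  # chance line
```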
Out of the 441 sequences, 384 were sensitive to at least one bNAb, while 57 were resistant to both bNAbs, where multiple sensitivity was defined as the indicator that measured IC\(_{50}\) was less than 1 for at least one bNAb.
The weights assigned to each algorithm by the super learner for predicting multiple sensitivity are shown in Table 3.1.
Learner | Weight |
---|---|
rf_tune1_screen0 | 0.00 |
rf_default_screen0 | 0.00 |
rf_tune2_screen0 | 0.40 |
lasso_default_screen0 | 0.00 |
lasso_tune1_screen0 | 0.00 |
lasso_tune2_screen0 | 0.00 |
lasso_tune3_screen0 | 0.00 |
rf_tune1_screen4 | 0.00 |
rf_default_screen4 | 0.49 |
rf_tune2_screen4 | 0.00 |
lasso_default_screen4 | 0.00 |
lasso_tune1_screen4 | 0.00 |
lasso_tune2_screen4 | 0.00 |
lasso_tune3_screen4 | 0.00 |
xgboost_default | 0.11 |
xgboost_tune3 | 0.00 |
xgboost_tune4 | 0.00 |
mean | 0.00 |
The cross-validated AUC of the super learner for predicting multiple sensitivity, relative to each candidate algorithm, is shown in Figure 3.1; this compares the learner whose tuning parameters and pre-screening are selected via cross-validation against learners with each individual value of the tuning parameters. Figure 3.2 shows the cross-validated ROC curve for predicting multiple sensitivity.
Breiman, Leo. 2001. “Random Forests.” Machine Learning 45 (1): 5–32. https://doi.org/10.1023/A:1010933404324.
Chen, Tianqi, and Carlos Guestrin. 2016. “XGBoost: A Scalable Tree Boosting System.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–94. https://doi.org/10.1145/2939672.2939785.
van der Laan, Mark J, Eric C Polley, and Alan E Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1). https://doi.org/10.2202/1544-6115.1309.
Wagh, Kshitij, Tanmoy Bhattacharya, Carolyn Williamson, Alex Robles, Madeleine Bayne, Jetta Garrity, Michael Rist, et al. 2016. “Optimal Combinations of Broadly Neutralizing Antibodies for Prevention and Treatment of HIV-1 Clade C Infection.” PLoS Pathogens 12 (3). https://doi.org/10.1371/journal.ppat.1005520.
Yoon, Hyejin, Jennifer Macke, Anthony P West Jr, Brian Foley, Pamela J Bjorkman, Bette Korber, and Karina Yusim. 2015. “CATNAP: A Tool to Compile, Analyze and Tally Neutralizing Antibody Panels.” Nucleic Acids Research 43 (W1): W213–W219. https://doi.org/10.1093/nar/gkv404.
Zou, Hui, and Trevor Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2): 301–20. https://doi.org/10.1111/j.1467-9868.2005.00503.x.