The broadly neutralizing antibodies (bNAbs) studied in this analysis are 10-1074 and PG9. The analysis considered two measures of neutralization sensitivity: estimated sensitivity and multiple sensitivity. Estimated IC\(_{50}\) was computed from the additive model of Wagh et al. (2016); for \(J\) bNAbs, \[ \mbox{estimated IC}_{50} = \left( \sum_{j=1}^J \mbox{IC}_j^{-1} \right)^{-1} \ , \] where \(\mbox{IC}_j\) denotes the measured IC\(_{50}\) for bNAb \(j\). Estimated sensitivity is defined as the binary indicator that estimated IC\(_{50}\) < 1. Multiple sensitivity is defined as the binary indicator that measured IC\(_{50}\) < 1 for at least one bNAb. Based on this specification of bNAbs and outcomes:
- 441 sequences were extracted from the CATNAP database (Yoon et al. 2015);
- 441 sequences had complete geographic and genetic sequence information;
- 441 of these sequences had measured IC\(_{50}\);
- out of the sequences with complete data, 385 were estimated to be sensitive to the combination of bNAbs, while 56 were estimated to be resistant;
- out of the sequences with complete data, 384 were sensitive to at least one bNAb, while 57 were resistant to both bNAbs.
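For concreteness, the sketch below shows how these two endpoints could be computed from per-bNAb IC\(_{50}\) measurements in R; the data frame, its values, and its column names (`ic50_10_1074`, `ic50_PG9`) are hypothetical stand-ins, not the CATNAP extract itself.

```r
# Hypothetical per-sequence IC50 measurements (one column per bNAb);
# values and column names are illustrative only.
dat <- data.frame(
  ic50_10_1074 = c(0.08, 2.5, 10.0),
  ic50_PG9     = c(0.50, 0.9, 25.0)
)

# Additive-model estimated IC50 (Wagh et al. 2016): the inverse of the
# sum of the inverses of the measured IC50s.
dat$est_ic50 <- 1 / rowSums(1 / dat[, c("ic50_10_1074", "ic50_PG9")])

# Binary endpoints as defined above.
dat$estimated_sensitivity <- as.numeric(dat$est_ic50 < 1)
dat$multiple_sensitivity  <- as.numeric(dat$ic50_10_1074 < 1 | dat$ic50_PG9 < 1)
```

Because the estimated IC\(_{50}\) is never larger than the smallest measured IC\(_{50}\), every sequence counted as multiply sensitive is also counted as estimated sensitive, consistent with the counts above (384 versus 385).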
Prediction of each outcome was performed using a super learner ensemble (van der Laan, Polley, and Hubbard 2007) of random forests (Breiman 2001), gradient boosted trees (Chen and Guestrin 2016), and elastic net regressions (Zou and Hastie 2005), each with varied tuning parameters, along with an intercept-only regression. Each algorithm (except xgboost) was additionally implemented in combination with variable pre-screening procedures requiring that each binary feature have at least 0 or at least 4 minority variants, respectively; these screens retained 5554 and 3232 of the 5567 candidate features, respectively.
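For illustration, a screen requiring at least 4 minority variants per binary feature could be expressed as a SuperLearner-style screening function along the following lines; this is a sketch, not the analysis code, and the name `screen_minvar4` is hypothetical.

```r
# Screening function in the form expected by the R SuperLearner package:
# given the outcome Y and feature matrix X, return a logical vector saying
# which columns of X to keep. A binary feature is kept only if its minority
# (less frequent) value occurs in at least 4 sequences; other feature types
# pass through unchanged. X is assumed to be all numeric.
screen_minvar4 <- function(Y, X, family, obsWeights, id, ...) {
  apply(X, 2, function(x) {
    if (length(unique(x)) <= 2) {
      counts <- table(x)
      length(counts) > 1 && min(counts) >= 4
    } else {
      TRUE
    }
  })
}
```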
The specific algorithms used in the learning process are described in Table 1.1.
Label | Description |
---|---|
rf_tune1 | random forest with mtry equal to one-half times the square root of the number of predictors
rf_default | random forest with mtry equal to the square root of the number of predictors
rf_tune2 | random forest with mtry equal to two times the square root of the number of predictors
xgboost_default | boosted regression trees with maximum depth of 4 |
xgboost_tune3 | boosted regression trees with maximum depth of 8 |
xgboost_tune4 | boosted regression trees with maximum depth of 12 |
lasso_default | elastic net with \(\lambda\) selected by CV and \(\alpha\) equal to 0 |
lasso_tune1 | elastic net with \(\lambda\) selected by 5-fold CV and \(\alpha\) equal to 0.25 |
lasso_tune2 | elastic net with \(\lambda\) selected by 5-fold CV and \(\alpha\) equal to 0.5 |
lasso_tune3 | elastic net with \(\lambda\) selected by 5-fold CV and \(\alpha\) equal to 0.75 |
mean | intercept-only regression
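Assuming the analysis was carried out with the R SuperLearner package, a library of candidate learners like the one in Table 1.1 could be assembled roughly as follows. The wrappers (`SL.ranger`, `SL.xgboost`, `SL.glmnet`, `SL.mean`), the `screen_minvar4` function sketched earlier, and the exact tuning grids are illustrative assumptions, not a record of the report's configuration.

```r
library(SuperLearner)

p <- 5567  # total number of candidate features (see above)

# Random forests with mtry at one-half, one, and two times sqrt(p).
rf_learners <- create.Learner(
  "SL.ranger",
  tune = list(mtry = round(c(0.5, 1, 2) * sqrt(p))),
  detailed_names = TRUE
)

# Boosted trees with maximum depth 4, 8, and 12.
xgb_learners <- create.Learner(
  "SL.xgboost",
  tune = list(max_depth = c(4, 8, 12)),
  detailed_names = TRUE
)

# Elastic nets over a grid of alpha; lambda is selected by CV inside SL.glmnet.
enet_learners <- create.Learner(
  "SL.glmnet",
  tune = list(alpha = c(0, 0.25, 0.5, 0.75)),
  detailed_names = TRUE
)

# Pair every random forest and elastic net with both screens ("All" keeps
# every feature, screen_minvar4 is the 4-minority-variant screen); xgboost
# and the intercept-only learner are used without screening.
sl_library <- c(
  lapply(c(rf_learners$names, enet_learners$names),
         function(lrn) c(lrn, "All", "screen_minvar4")),
  as.list(xgb_learners$names),
  list("SL.mean")
)
```

This construction yields the 18 learner-screen combinations listed in Tables 2.1 and 3.1.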
The predictive ability of the super learner was assessed using cross-validation. The estimated cross-validated area under the receiver operating characteristic curve (CV-AUC) of the super learner for predicting each binary sensitivity measure is shown in Table 1.2, along with 95% confidence intervals.
Outcome | CV-AUC | Lower 95% CI | Upper 95% CI
---|---|---|---
Estimated sensitivity | 0.815 | 0.640 | 0.916 |
Multiple sensitivity | 0.836 | 0.671 | 0.927 |
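One way such a CV-AUC and confidence interval could be computed is sketched below, using CV.SuperLearner together with the cvAUC package; the simulated data, the reduced library, and the fold counts are placeholders rather than the report's settings.

```r
library(SuperLearner)
library(cvAUC)

# Small simulated stand-in for the analysis data (illustrative only).
set.seed(20)
n <- 200; p <- 30
X <- as.data.frame(matrix(rbinom(n * p, 1, 0.3), nrow = n))
Y <- rbinom(n, 1, plogis(-1 + X[[1]] + X[[2]]))

# Outer cross-validation of the whole super learner; in the real analysis
# the sl_library from the previous sketch would be used.
cv_fit <- CV.SuperLearner(
  Y = Y, X = X, family = binomial(),
  SL.library = c("SL.glmnet", "SL.ranger", "SL.mean"),
  cvControl = list(V = 10, stratifyCV = TRUE)
)

# Cross-validated AUC with an influence-function-based 95% confidence interval.
ci.cvAUC(
  predictions = cv_fit$SL.predict,
  labels      = cv_fit$Y,
  folds       = cv_fit$folds
)
```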
Out of the sequences with complete data, 385 were estimated to be sensitive to the combination of bNAbs, while 56 were estimated to be resistant, where estimated sensitivity was defined as the indicator that estimated IC\(_{50}\) was less than 1.
The weights assigned to each algorithm by the super learner for predicting estimated sensitivity are shown in Table 2.1.
Learner | Weight |
---|---|
rf_tune1_screen0 | 0.00 |
rf_default_screen0 | 0.00 |
rf_tune2_screen0 | 0.00 |
lasso_default_screen0 | 0.00 |
lasso_tune1_screen0 | 0.00 |
lasso_tune2_screen0 | 0.00 |
lasso_tune3_screen0 | 0.00 |
rf_tune1_screen4 | 0.00 |
rf_default_screen4 | 0.00 |
rf_tune2_screen4 | 0.76 |
lasso_default_screen4 | 0.00 |
lasso_tune1_screen4 | 0.00 |
lasso_tune2_screen4 | 0.00 |
lasso_tune3_screen4 | 0.00 |
xgboost_default | 0.24 |
xgboost_tune3 | 0.00 |
xgboost_tune4 | 0.00 |
mean | 0.00 |
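For reference, weights of this kind are the normalized coefficients of the super learner's convex combination. Under the assumptions of the earlier sketches they could be read off a fit as follows; the 5-fold inner cross-validation shown here is an illustrative choice.

```r
# Fit the super learner once on the full data (Y, X, and sl_library as in the
# earlier sketches) and inspect the weight given to each learner-screen
# combination; with the default NNLS method these weights are non-negative
# and sum to one, as in Table 2.1.
sl_fit <- SuperLearner(
  Y = Y, X = X, family = binomial(),
  SL.library = sl_library,
  cvControl = list(V = 5)
)
round(sl_fit$coef, 2)
```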
The cross-validated AUC of the super learner for predicting estimated sensitivity, relative to each candidate algorithm, is shown in Figure 2.1; this compares the learner whose tuning parameters and pre-screening are selected via cross-validation against learners with each individual value of the tuning parameters. Figure 2.2 shows the cross-validated ROC curve for predicting estimated sensitivity.
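A cross-validated ROC curve of the kind shown in Figure 2.2 could be drawn from the CV.SuperLearner predictions, for example with the ROCR package; this is again a sketch, reusing `cv_fit` from the CV-AUC example above.

```r
library(ROCR)

# ROC curve of the cross-validated super learner predictions
# (cv_fit from the CV.SuperLearner sketch above).
roc_pred <- prediction(cv_fit$SL.predict, cv_fit$Y)
roc_perf <- performance(roc_pred, measure = "tpr", x.measure = "fpr")
plot(roc_perf, main = "Cross-validated ROC: estimated sensitivity")
abline(0, 1, lty = 2)  # chance line
```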
Out of the 441 sequences, 384 were sensitive to at least one bNAb, while 57 were resistant to both bNAbs, where multiple sensitivity was defined as the indicator that measured IC\(_{50}\) was less than 1 for at least one bNAb.
The weights assigned to each algorithm by the super learner for predicting multiple sensitivity are shown in Table 3.1.
Learner | Weight |
---|---|
rf_tune1_screen0 | 0.00 |
rf_default_screen0 | 0.00 |
rf_tune2_screen0 | 0.40 |
lasso_default_screen0 | 0.00 |
lasso_tune1_screen0 | 0.00 |
lasso_tune2_screen0 | 0.00 |
lasso_tune3_screen0 | 0.00 |
rf_tune1_screen4 | 0.00 |
rf_default_screen4 | 0.49 |
rf_tune2_screen4 | 0.00 |
lasso_default_screen4 | 0.00 |
lasso_tune1_screen4 | 0.00 |
lasso_tune2_screen4 | 0.00 |
lasso_tune3_screen4 | 0.00 |
xgboost_default | 0.11 |
xgboost_tune3 | 0.00 |
xgboost_tune4 | 0.00 |
mean | 0.00 |
The cross-validated AUC of the super learner for predicting multiple sensitivity, relative to each candidate algorithm, is shown in Figure 3.1; this compares the learner whose tuning parameters and pre-screening are selected via cross-validation against learners with each individual value of the tuning parameters. Figure 3.2 shows the cross-validated ROC curve for predicting multiple sensitivity.
Breiman, Leo. 2001. “Random Forests.” Machine Learning 45 (1): 5–32. https://doi.org/10.1023/A:1010933404324.
Chen, Tianqi, and Carlos Guestrin. 2016. “XGBoost: A Scalable Tree Boosting System.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–94. https://doi.org/10.1145/2939672.2939785.
van der Laan, Mark J, Eric C Polley, and Alan E Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1). https://doi.org/10.2202/1544-6115.1309.
Wagh, Kshitij, Tanmoy Bhattacharya, Carolyn Williamson, Alex Robles, Madeleine Bayne, Jetta Garrity, Michael Rist, et al. 2016. “Optimal Combinations of Broadly Neutralizing Antibodies for Prevention and Treatment of HIV-1 Clade C Infection.” PLoS Pathogens 12 (3). https://doi.org/10.1371/journal.ppat.1005520.
Yoon, Hyejin, Jennifer Macke, Anthony P West Jr, Brian Foley, Pamela J Bjorkman, Bette Korber, and Karina Yusim. 2015. “CATNAP: A Tool to Compile, Analyze and Tally Neutralizing Antibody Panels.” Nucleic Acids Research 43 (W1): W213–W219. https://doi.org/10.1093/nar/gkv404.
Zou, Hui, and Trevor Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2): 301–20. https://doi.org/10.1111/j.1467-9868.2005.00503.x.