The broadly neutralizing antibody (bNAb) studied in this analysis is VRC01. The analysis considered two measures of neutralization sensitivity: IC\(_{80}\) and sensitivity, where sensitivity is the binary indicator that IC\(_{80}\) < 1. Based on this specification of bNAb and outcomes:
828 sequences were extracted from the CATNAP database (Yoon et al. 2015);
827 sequences had complete geographic and genetic sequence information;
572 of these sequences had measured IC\(_{80}\);
out of the sequences with complete data, 223 were sensitive to the bNAb, while 349 were resistant.
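The binary sensitivity outcome follows from IC\(_{80}\) by a single threshold comparison. The sketch below illustrates the definition only; the helper name `is_sensitive` and the IC\(_{80}\) values are hypothetical and are not taken from CATNAP.

```python
def is_sensitive(ic80, threshold=1.0):
    """Return 1 if IC80 is strictly below the threshold, else 0."""
    return int(ic80 < threshold)

# Hypothetical IC80 values, for illustration only.
ic80_values = [0.05, 0.8, 1.0, 3.2, 50.0]
sensitive = [is_sensitive(v) for v in ic80_values]
print(sensitive)  # [1, 1, 0, 0, 0]
```

Note that a value exactly equal to 1 is classified as resistant, because the definition uses a strict inequality.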
Prediction of each outcome was performed using a super learner ensemble (van der Laan, Polley, and Hubbard 2007) of several random forests (Breiman 2001), several gradient boosted trees (Chen and Guestrin 2016), and several elastic net regressions (Zou and Hastie 2005), each with varied tuning parameters, together with an intercept-only regression. Each algorithm (excepting xgboost) was additionally combined with a variable pre-screening procedure that retained only binary features with at least 0, 4, or 8 minority variants. These thresholds retained 6500/6513, 3841/6513, and 3074/6513 features, respectively.
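One way to read the pre-screening rule is that a binary feature is kept only if its less common value occurs in at least a given number of sequences. The sketch below implements that reading on toy data. The function names, feature names, and the treatment of constant columns are assumptions made for illustration and may differ from the screening code used in the analysis.

```python
from collections import Counter

def minority_count(column):
    """Number of observations carrying the less common value of a binary feature."""
    counts = Counter(column)
    # Assumption: a constant column has zero minority variants.
    if len(counts) < 2:
        return 0
    return min(counts.values())

def screen_features(features, min_variants):
    """Names of binary features with at least `min_variants` minority variants."""
    return [name for name, col in features.items()
            if minority_count(col) >= min_variants]

# Toy data: 10 sequences, three binary features (hypothetical names).
features = {
    "site_a": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],  # 5 minority variants
    "site_b": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # 1 minority variant
    "site_c": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # constant column
}

for k in (0, 4, 8):
    print(k, screen_features(features, k))
```

Raising the threshold removes rarely varying features first, which is consistent with the feature counts shrinking as the threshold grows.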
The specific algorithms used in the learning process are described in Table 1.1.
Label | Description |
---|---|
rf_tune1 | random forest with mtry equal to one-half times square root of number of predictors |
rf_default | random forest with mtry equal to square root of number of predictors |
rf_tune2 | random forest with mtry equal to two times square root of number of predictors |
xgboost_default | boosted regression trees with maximum depth of 4 |
xgboost_tune3 | boosted regression trees with maximum depth of 8 |
xgboost_tune4 | boosted regression trees with maximum depth of 12 |
lasso_default | elastic net with \(\lambda\) selected by CV and \(\alpha\) equal to 0 |
lasso_tune1 | elastic net with \(\lambda\) selected by 5-fold CV and \(\alpha\) equal to 0.25 |
lasso_tune2 | elastic net with \(\lambda\) selected by 5-fold CV and \(\alpha\) equal to 0.5 |
lasso_tune3 | elastic net with \(\lambda\) selected by 5-fold CV and \(\alpha\) equal to 0.75 |
mean | intercept only regression |
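To make the random forest settings in Table 1.1 concrete, the sketch below computes the three `mtry` values for a given number of predictors. Rounding down and enforcing a minimum of 1 are assumptions; the report does not state how non-integer values are handled. The function name `mtry_values` is hypothetical.

```python
import math

def mtry_values(n_predictors):
    """mtry for the three random forest learners, as multiples of sqrt(p)."""
    base = math.sqrt(n_predictors)
    multipliers = {"rf_tune1": 0.5, "rf_default": 1.0, "rf_tune2": 2.0}
    # Assumption: round down, and never go below 1 predictor per split.
    return {label: max(1, math.floor(m * base))
            for label, m in multipliers.items()}

# Example with 100 predictors, where sqrt(p) is exactly 10.
print(mtry_values(100))  # {'rf_tune1': 5, 'rf_default': 10, 'rf_tune2': 20}
```

Because the number of predictors depends on the pre-screening threshold, the same label corresponds to a different `mtry` under each screen.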
The predictive ability of the learner was assessed using cross-validation. The estimated cross-validated \(R^2\) of the learner for predicting IC\(_{80}\) is shown in Table 1.2. The estimated cross-validated area under the receiver operating characteristic curve (AUC) of the learner for predicting sensitivity is shown in Table 1.3.
Outcome | CV-R\(^2\) | Lower 95% CI | Upper 95% CI |
---|---|---|---|
IC\(_{80}\) | 0.342 | 0.262 | 0.414 |
Outcome | CV-AUC | Lower 95% CI | Upper 95% CI |
---|---|---|---|
Sensitivity | 0.777 | 0.676 | 0.852 |
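The two performance measures can be illustrated on held-out predictions. The sketch below computes \(R^2\) as one minus the ratio of residual to total sum of squares, and AUC as the proportion of positive-negative pairs ranked correctly, with ties counted as one half. It is a simplified illustration on hypothetical data: it does not reproduce the fold-wise estimation or the confidence intervals reported in Tables 1.2 and 1.3.

```python
def r_squared(observed, predicted):
    """1 minus the ratio of residual to total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

def auc(labels, scores):
    """Share of (positive, negative) pairs where the positive scores higher."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical held-out predictions for a continuous outcome.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
print(r_squared(observed, predicted))

# Hypothetical held-out scores for a binary outcome.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auc(labels, scores))  # 0.75
```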
We define the marginal biological importance of a subgroup of features as the difference in population predictiveness between the best possible prediction function based on the features under consideration plus geographic confounders and the best possible prediction function based on geographic confounders only (Williamson et al. 2020). In Table 1.4, we display the groups of variables and their ranked marginal biological variable importance for predicting each outcome. The groups are ordered by average rank across outcomes, from most to least important. For variable group definitions, please refer to Table 4.1.
Variable group | IC\(_{80}\) | Sensitivity |
---|---|---|
gp120 CD4 binding sites | 1* | 2 |
gp120 V2 | 2 | 1 |
gp120 V3 | 4 | 3 |
Region-specific counts of PNG sites | 5 | 5 |
gp41 MPER | 3 | 7 |
Cysteine counts | 7 | 4 |
Viral geometry | 6 | 6 |
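The ordering of Table 1.4 can be checked by averaging the two ranks for each group and sorting. The sketch below uses the ranks from the table (the asterisk on the first entry is dropped) and relies on Python's stable sort to keep tied groups in their listed order.

```python
# (IC80 rank, sensitivity rank) for each variable group, from Table 1.4.
ranks = {
    "gp120 CD4 binding sites": (1, 2),
    "gp120 V2": (2, 1),
    "gp120 V3": (4, 3),
    "Region-specific counts of PNG sites": (5, 5),
    "gp41 MPER": (3, 7),
    "Cysteine counts": (7, 4),
    "Viral geometry": (6, 6),
}

average_rank = {group: sum(r) / len(r) for group, r in ranks.items()}

# sorted() is stable, so groups with equal average rank keep table order.
ordered = sorted(average_rank, key=average_rank.get)

for group in ordered:
    print(f"{average_rank[group]:.1f}  {group}")
```

The average ranks are 1.5, 1.5, 3.5, 5.0, 5.0, 5.5, and 6.0, so the table lists the groups from the best average rank to the worst.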
A summary of the distribution of IC\(_{80}\) for the selected bNAb is shown in Figure 2.1.
The weights assigned to each algorithm for Super Learner predicting IC\(_{80}\) are shown in Table 2.1.
Learner | Weight |
---|---|
rf_tune1_screen0 | 0.00 |
rf_default_screen0 | 0.00 |
rf_tune2_screen0 | 0.00 |
lasso_default_screen0 | 0.21 |
lasso_tune1_screen0 | 0.00 |
lasso_tune2_screen0 | 0.00 |
lasso_tune3_screen0 | 0.00 |
rf_tune1_screen4 | 0.00 |
rf_default_screen4 | 0.00 |
rf_tune2_screen4 | 0.01 |
lasso_default_screen4 | 0.00 |
lasso_tune1_screen4 | 0.00 |
lasso_tune2_screen4 | 0.09 |
lasso_tune3_screen4 | 0.00 |
rf_tune1_screen8 | 0.00 |
rf_default_screen8 | 0.00 |
rf_tune2_screen8 | 0.00 |
lasso_default_screen8 | 0.00 |
lasso_tune1_screen8 | 0.00 |
lasso_tune2_screen8 | 0.00 |
lasso_tune3_screen8 | 0.00 |
xgboost_default | 0.38 |
xgboost_tune3 | 0.24 |
xgboost_tune4 | 0.06 |
mean | 0.00 |
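The weights in Table 2.1 can be read as the coefficients of a weighted combination of candidate predictions. The sketch below assumes the ensemble prediction is the weighted sum of the individual learners' predictions; the candidate predictions are hypothetical, and only learners with non-zero weight are listed.

```python
# Non-zero weights from Table 2.1 (rounded to two decimals in the report).
weights = {
    "lasso_default_screen0": 0.21,
    "rf_tune2_screen4": 0.01,
    "lasso_tune2_screen4": 0.09,
    "xgboost_default": 0.38,
    "xgboost_tune3": 0.24,
    "xgboost_tune4": 0.06,
}

def ensemble_prediction(weights, predictions):
    """Weighted sum of candidate predictions for one observation."""
    return sum(w * predictions[name] for name, w in weights.items())

# Hypothetical predictions from each candidate for a single sequence.
predictions = {name: 2.0 for name in weights}

total_weight = sum(weights.values())
print(round(total_weight, 2))  # 0.99
print(ensemble_prediction(weights, predictions))
```

The reported weights sum to 0.99 rather than 1 because of rounding. The three xgboost learners account for 0.68 of the total weight.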
The cross-validated \(R^2\) of the super learner and constituent algorithms (descriptions of algorithms shown in Table 1.1) in predicting IC\(_{80}\) is shown in Figure 2.2.
Figure 2.3 shows cross-validated predictions of IC\(_{80}\) plotted against observed values of IC\(_{80}\), colored by cross-validation fold.
We show the biological variable importance of groups of features (defined in Table 4.1) in predicting IC\(_{80}\) in Figure 2.4. Importance is defined using the difference in \(R^2\) values. The plot shows the marginal biological importance of the group relative to the null model with geographic confounders only.
We show the biological variable importance of individual features in predicting IC\(_{80}\) in Figure 2.5. Importance is defined using the difference in \(R^2\) values. The plot shows the marginal biological importance of the feature relative to the null model with geographic confounders only.
Table 2.2 shows the top 20 features in terms of their predictive importance. Specifically, the algorithm with the largest weight in the super learner ensemble was selected, and the variable importance metrics associated with this algorithm are shown. In this case, the highest weight was assigned to an xgboost algorithm, so xgboost gain importance measures were computed and are shown by their rank. Gain measures the improvement in accuracy brought by a given feature to the tree branches on which it appears. The essential idea is that before a split on a given feature is added to a branch, some observations may be poorly predicted, while after the split is added, each resultant branch is more accurate. Gain measures this change in accuracy.
Feature | Importance (rank) |
---|---|
hxb2.456.R.1mer | 1 |
hxb2.459.G.1mer | 2 |
hxb2.234.sequon_actual.1mer | 3 |
num.sequons.gp120 | 4 |
hxb2.364.H.1mer | 5 |
hxb2.471.G.1mer | 6 |
hxb2.268.E.1mer | 7 |
hxb2.279.D.1mer | 8 |
hxb2.65.A.1mer | 9 |
num.sequons.env | 10 |
hxb2.853.A.1mer | 11 |
length.gp120 | 12 |
hxb2.106.T.1mer | 13 |
hxb2.403.T.1mer | 14 |
hxb2.154.M.1mer | 15 |
hxb2.463.D.1mer | 16 |
hxb2.363.P.1mer | 17 |
hxb2.202.A.1mer | 18 |
hxb2.223.F.1mer | 19 |
subtype.is.D | 20 |
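The idea behind gain can be illustrated with a simplified calculation: the reduction in the sum of squared errors when a node is split on a binary feature. This is a sketch of the intuition only. The gain computed by xgboost is derived from gradient statistics and includes regularization terms, so its values will not match this calculation. The function names and data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def split_gain(feature, outcome):
    """Reduction in squared error from splitting a node on a binary feature."""
    left = [y for x, y in zip(feature, outcome) if x == 0]
    right = [y for x, y in zip(feature, outcome) if x == 1]
    return sse(outcome) - (sse(left) + sse(right))

# Hypothetical outcome and two candidate binary features.
outcome = [1.0, 2.0, 4.0, 5.0]
informative = [0, 0, 1, 1]    # separates low from high outcomes
uninformative = [0, 1, 0, 1]  # mixes low and high outcomes

print(split_gain(informative, outcome))    # 9.0
print(split_gain(uninformative, outcome))  # 1.0
```

The informative feature removes most of the within-node error, so it receives a much larger gain than the feature that mixes low and high outcomes.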
Out of the sequences with complete data, 223 were estimated to be sensitive to the bNAb, while 349 were estimated to be resistant, where sensitivity was defined as the indicator that IC\(_{80}\) was less than 1.
The weights assigned to each algorithm for Super Learner predicting sensitivity are shown in Table 3.1.
Learner | Weight |
---|---|
rf_tune1_screen0 | 0.00 |
rf_default_screen0 | 0.00 |
rf_tune2_screen0 | 0.00 |
lasso_default_screen0 | 0.00 |
lasso_tune1_screen0 | 0.00 |
lasso_tune2_screen0 | 0.00 |
lasso_tune3_screen0 | 0.00 |
rf_tune1_screen4 | 0.00 |
rf_default_screen4 | 0.19 |
rf_tune2_screen4 | 0.00 |
lasso_default_screen4 | 0.00 |
lasso_tune1_screen4 | 0.00 |
lasso_tune2_screen4 | 0.00 |
lasso_tune3_screen4 | 0.00 |
rf_tune1_screen8 | 0.00 |
rf_default_screen8 | 0.00 |
rf_tune2_screen8 | 0.25 |
lasso_default_screen8 | 0.34 |
lasso_tune1_screen8 | 0.00 |
lasso_tune2_screen8 | 0.00 |
lasso_tune3_screen8 | 0.00 |
xgboost_default | 0.21 |
xgboost_tune3 | 0.00 |
xgboost_tune4 | 0.00 |
mean | 0.00 |
The cross-validated area under the ROC curve of super learner predictions of sensitivity relative to candidate algorithms is shown in Figure 3.1. Figure 3.2 shows cross-validated ROC curves for this endpoint.
The cross-validated areas under the ROC curve of the learner with tuning parameters and optimal pre-screening selected via cross-validation, and of the learners with each individual value of the tuning parameters, are shown in Figure 3.2.
Figure 3.2 shows the cross-validated ROC curve for predicting sensitivity.
We show the biological variable importance of groups of features (defined in Table 4.1) in predicting sensitivity in Figure 3.4. Importance is defined using the difference in AUCs. The plot shows the marginal biological importance of the group relative to the null model with geographic confounders only.
We show the biological variable importance of individual features in predicting sensitivity in Figure 3.5. Importance is defined using the difference in AUCs. The plot shows the marginal biological importance of the feature relative to the null model with geographic confounders only.
Table 3.2 shows the top 20 features in terms of their predictive importance. Specifically, the algorithm with the largest weight in the super learner ensemble was selected, and the variable importance metrics associated with this algorithm are shown. In this case, the highest weight was assigned to a lasso algorithm, so the variable importance measures presented are the coefficients of the model with \(\lambda\) chosen via cross-validation, ordered by magnitude. Overall, 87 features had a non-zero coefficient in the final lasso fit.
Feature | Importance |
---|---|
hxb2.459.G.1mer | 1.092 |
hxb2.147.M.1mer | 0.882 |
hxb2.463.D.1mer | 0.569 |
hxb2.252.R.1mer | 0.460 |
hxb2.460.sequon_actual.1mer | -0.448 |
hxb2.456.R.1mer | 0.398 |
hxb2.149.N.1mer | -0.396 |
hxb2.172.T.1mer | 0.396 |
hxb2.304.R.1mer | 0.376 |
hxb2.719.T.1mer | -0.356 |
hxb2.805.R.1mer | 0.338 |
hxb2.164.A.1mer | 0.337 |
hxb2.236.T.1mer | -0.332 |
hxb2.403.T.1mer | -0.327 |
hxb2.463.R.1mer | -0.322 |
hxb2.234.T.1mer | 0.291 |
hxb2.268.E.1mer | -0.278 |
hxb2.234.sequon_actual.1mer | -0.277 |
hxb2.471.G.1mer | 0.273 |
hxb2.106.T.1mer | 0.258 |
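Ordering features by coefficient magnitude means sorting on the absolute value while keeping the sign for interpretation. The sketch below applies this to five coefficients taken from Table 3.2, supplied in arbitrary order.

```python
# A subset of coefficients from Table 3.2, in arbitrary order.
coefficients = {
    "hxb2.460.sequon_actual.1mer": -0.448,
    "hxb2.459.G.1mer": 1.092,
    "hxb2.252.R.1mer": 0.460,
    "hxb2.147.M.1mer": 0.882,
    "hxb2.463.D.1mer": 0.569,
}

# Rank by absolute value, largest first; the sign is kept in the output.
ranked = sorted(coefficients.items(), key=lambda kv: abs(kv[1]), reverse=True)

for name, coef in ranked:
    print(f"{coef:+.3f}  {name}")
```

A negative coefficient such as the one for `hxb2.460.sequon_actual.1mer` can therefore rank above a positive coefficient of smaller magnitude.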
Table 4.1 provides the individual HXB2 coordinates and variable names of the variables that make up each of the variable groups considered for biological importance.
Variable group | Variables |
---|---|
gp120_cd4bs | 61.F, 61.H, 61.I, 61.L, 61.Q, 61.T, 61.V, 61.Y, 62.A, 62.D, 62.E, 62.G, 62.H, 62.I, 62.K, 62.M, 62.N, 62.R, 62.S, 62.T, 62.V, 62.Y, 66.H, 66.R, 66.X, 120.I, 120.T, 120.V, 124.F, 124.H, 124.I, 124.P, 124.Y, 125.F, 125.I, 125.L, 125.M, 125.X, 127.I, 127.V, 182.A, 182.E, 182.H, 182.I, 182.K, 182.L, 182.M, 182.N, 182.Q, 182.S, 182.T, 182.V, 182.X, 197.D, 197.I, 197.K, 197.N, 197.R, 197.S, 197.T, 198.A, 198.I, 198.S, 198.T, 198.V, 204.A, 204.E, 204.S, 204.T, 204.V, 206.P, 206.S, 206.T, 209.N, 209.S, 209.T, 274.A, 274.C, 274.F, 274.G, 274.S, 274.T, 274.V, 274.gap, 276.D, 276.E, 276.H, 276.K, 276.N, 276.S, 276.X, 276.gap, 279.A, 279.C, 279.D, 279.E, 279.I, 279.K, 279.N, 279.Q, 279.R, 279.S, 280.A, 280.D, 280.N, 280.S, 280.T, 280.X, 281.A, 281.E, 281.G, 281.H, 281.I, 281.L, 281.R, 281.S, 281.T, 281.V, 282.E, 282.G, 282.H, 282.K, 282.N, 282.P, 282.Q, 282.R, 282.S, 282.Y, 283.A, 283.I, 283.N, 283.P, 283.S, 283.T, 283.V, 283.X, 304.E, 304.G, 304.I, 304.K, 304.L, 304.R, 304.S, 304.V, 304.W, 318.F, 318.H, 318.N, 318.Q, 318.R, 318.S, 318.V, 318.W, 318.Y, 326.A, 326.I, 326.M, 326.P, 326.S, 326.T, 362.A, 362.C, 362.D, 362.E, 362.F, 362.G, 362.K, 362.M, 362.N, 362.Q, 362.R, 362.S, 362.T, 362.V, 362.X, 362.gap, 363.A, 363.E, 363.G, 363.H, 363.I, 363.K, 363.L, 363.M, 363.N, 363.P, 363.Q, 363.R, 363.S, 363.T, 363.V, 363.X, 365.A, 365.G, 365.I, 365.L, 365.N, 365.P, 365.R, 365.S, 365.T, 365.V, 366.E, 366.G, 367.G, 367.S, 367.X, 369.A, 369.E, 369.I, 369.L, 369.P, 369.Q, 369.S, 369.T, 369.V, 370.E, 370.X, 374.F, 374.H, 374.L, 374.X, 374.Y, 386.D, 386.K, 386.N, 386.S, 386.T, 386.X, 386.Y, 392.D, 392.E, 392.F, 392.H, 392.I, 392.K, 392.L, 392.N, 392.P, 392.Q, 392.S, 392.T, 392.X, 392.Y, 392.gap, 425.K, 425.N, 425.R, 425.X, 426.A, 426.I, 426.K, 426.L, 426.M, 426.R, 426.S, 426.T, 426.V, 427.L, 427.W, 427.gap, 428.H, 428.I, 428.K, 428.M, 428.Q, 428.T, 428.V, 428.X, 429.A, 429.D, 429.E, 429.G, 429.K, 429.Q, 429.R, 429.S, 429.T, 430.A, 430.G, 430.I, 430.Q, 430.S, 430.T, 430.V, 
430.X, 431.A, 431.E, 431.G, 431.R, 431.V, 432.I, 432.K, 432.L, 432.Q, 432.R, 432.S, 432.X, 455.A, 455.D, 455.E, 455.I, 455.L, 455.Q, 455.S, 455.T, 455.V, 456.H, 456.L, 456.M, 456.N, 456.R, 456.S, 456.V, 456.W, 456.Y, 457.A, 457.D, 457.N, 457.S, 457.X, 458.A, 458.D, 458.E, 458.G, 458.K, 458.N, 458.Q, 458.S, 458.T, 458.Y, 459.A, 459.D, 459.E, 459.G, 459.I, 459.N, 459.P, 459.S, 459.T, 459.V, 459.X, 459.gap, 460.A, 460.C, 460.D, 460.E, 460.G, 460.H, 460.I, 460.K, 460.L, 460.M, 460.N, 460.P, 460.Q, 460.R, 460.S, 460.T, 460.V, 460.W, 460.X, 460.gap, 461.A, 461.D, 461.E, 461.F, 461.G, 461.H, 461.I, 461.K, 461.L, 461.M, 461.N, 461.P, 461.Q, 461.R, 461.S, 461.T, 461.V, 461.X, 461.Y, 461.gap, 462.A, 462.D, 462.E, 462.G, 462.H, 462.I, 462.K, 462.L, 462.M, 462.N, 462.P, 462.Q, 462.R, 462.S, 462.T, 462.V, 462.X, 462.Y, 462.gap, 463.A, 463.C, 463.D, 463.E, 463.G, 463.H, 463.I, 463.K, 463.L, 463.M, 463.N, 463.P, 463.Q, 463.R, 463.S, 463.T, 463.V, 463.X, 463.Y, 463.gap, 469.K, 469.R, 469.S, 469.Y, 469.gap, 471.A, 471.E, 471.G, 471.I, 471.L, 471.Q, 471.S, 471.T, 471.V, 474.D, 474.E, 474.N, 474.Y, 475.I, 475.M, 475.T, 475.V, 476.G, 476.K, 476.M, 476.Q, 476.R, 476.T, 476.V, 477.D, 477.G, 477.N, 197.sequon_actual, 276.sequon_actual, 363.sequon_actual, 386.sequon_actual, 392.sequon_actual, 460.sequon_actual, 461.sequon_actual, 462.sequon_actual, 463.sequon_actual |
gp120_v2 | 121.E, 121.K, 121.M, 121.Q, 121.R, 121.X, 123.A, 123.T, 123.X, 124.F, 124.H, 124.I, 124.P, 124.Y, 127.I, 127.V, 157.C, 157.X, 158.D, 158.E, 158.S, 158.T, 159.D, 159.F, 159.L, 159.X, 159.Y, 160.D, 160.E, 160.H, 160.I, 160.K, 160.N, 160.R, 160.S, 160.T, 160.V, 160.X, 160.Y, 160.gap, 161.A, 161.I, 161.L, 161.M, 161.S, 161.T, 161.V, 161.X, 161.gap, 162.A, 162.H, 162.I, 162.N, 162.P, 162.Q, 162.S, 162.T, 162.X, 162.gap, 163.A, 163.G, 163.I, 163.K, 163.P, 163.R, 163.S, 163.T, 163.X, 163.gap, 164.A, 164.D, 164.E, 164.F, 164.G, 164.H, 164.I, 164.K, 164.L, 164.M, 164.N, 164.P, 164.Q, 164.R, 164.S, 164.T, 164.V, 164.X, 164.gap, 165.G, 165.I, 165.L, 165.M, 165.P, 165.Q, 165.R, 165.S, 165.T, 165.V, 165.W, 165.X, 166.A, 166.D, 166.G, 166.H, 166.I, 166.K, 166.M, 166.N, 166.Q, 166.R, 166.S, 166.T, 166.V, 166.W, 166.X, 167.D, 167.E, 167.G, 167.K, 167.N, 167.P, 167.Q, 167.R, 167.T, 167.X, 168.D, 168.E, 168.G, 168.I, 168.K, 168.L, 168.R, 168.S, 168.V, 168.X, 168.gap, 169.A, 169.E, 169.G, 169.H, 169.I, 169.K, 169.L, 169.M, 169.N, 169.P, 169.Q, 169.R, 169.S, 169.T, 169.V, 169.W, 169.X, 169.Y, 169.gap, 170.C, 170.E, 170.H, 170.K, 170.L, 170.N, 170.Q, 170.R, 170.S, 170.T, 170.X, 170.gap, 171.A, 171.D, 171.E, 171.G, 171.H, 171.K, 171.L, 171.M, 171.N, 171.P, 171.Q, 171.R, 171.S, 171.T, 171.V, 171.X, 171.gap, 172.A, 172.D, 172.E, 172.G, 172.I, 172.K, 172.M, 172.N, 172.Q, 172.R, 172.T, 172.V, 172.X, 172.Y, 173.A, 173.D, 173.E, 173.F, 173.G, 173.H, 173.K, 173.N, 173.Q, 173.R, 173.S, 173.T, 173.X, 173.Y, 174.A, 174.D, 174.G, 174.N, 174.S, 174.T, 174.V, 174.X, 174.Y, 175.A, 175.E, 175.F, 175.H, 175.I, 175.L, 175.M, 175.N, 175.Q, 175.S, 175.T, 175.V, 175.X, 175.Y, 176.F, 176.L, 176.S, 176.X, 177.A, 177.D, 177.F, 177.H, 177.N, 177.Q, 177.X, 177.Y, 178.A, 178.D, 178.E, 178.G, 178.I, 178.K, 178.L, 178.N, 178.R, 178.S, 178.T, 178.V, 178.X, 178.Y, 179.A, 179.E, 179.F, 179.I, 179.K, 179.L, 179.M, 179.P, 179.Q, 179.R, 179.S, 179.T, 179.V, 179.X, 179.Y, 180.D, 180.L, 180.S, 180.X, 181.D, 
181.I, 181.K, 181.L, 181.M, 181.T, 181.V, 181.X, 182.A, 182.E, 182.H, 182.I, 182.K, 182.L, 182.M, 182.N, 182.Q, 182.S, 182.T, 182.V, 182.X, 183.A, 183.D, 183.E, 183.H, 183.K, 183.L, 183.N, 183.P, 183.Q, 183.R, 183.S, 183.V, 183.X, 184.A, 184.F, 184.I, 184.L, 184.M, 184.N, 184.S, 184.T, 184.V, 184.X, 184.gap, 185.A, 185.D, 185.E, 185.F, 185.G, 185.H, 185.I, 185.K, 185.L, 185.N, 185.P, 185.Q, 185.R, 185.S, 185.T, 185.V, 185.X, 185.Y, 185.gap, 186.A, 186.D, 186.E, 186.G, 186.H, 186.I, 186.K, 186.L, 186.N, 186.P, 186.Q, 186.R, 186.S, 186.T, 186.V, 186.X, 186.gap, 187.A, 187.C, 187.D, 187.E, 187.G, 187.H, 187.I, 187.K, 187.N, 187.P, 187.Q, 187.R, 187.S, 187.T, 187.X, 187.Y, 187.gap, 188.A, 188.D, 188.E, 188.F, 188.G, 188.H, 188.I, 188.K, 188.N, 188.P, 188.Q, 188.R, 188.S, 188.T, 188.V, 188.W, 188.X, 188.Y, 188.gap, 189.A, 189.D, 189.E, 189.G, 189.H, 189.I, 189.K, 189.L, 189.M, 189.N, 189.P, 189.Q, 189.R, 189.S, 189.T, 189.X, 189.Y, 189.gap, 190.A, 190.D, 190.E, 190.F, 190.G, 190.H, 190.I, 190.K, 190.L, 190.M, 190.N, 190.P, 190.Q, 190.R, 190.S, 190.T, 190.V, 190.X, 190.Y, 191.F, 191.H, 191.S, 191.W, 191.Y, 192.A, 192.G, 192.I, 192.K, 192.M, 192.R, 192.S, 192.T, 192.V, 193.F, 193.I, 193.L, 193.M, 193.P, 194.I, 194.K, 194.L, 194.M, 194.R, 194.T, 194.V, 195.D, 195.H, 195.K, 195.N, 195.Q, 195.S, 195.T, 195.Y, 197.D, 197.I, 197.K, 197.N, 197.R, 197.S, 197.T, 202.A, 202.K, 202.P, 202.R, 202.S, 202.T, 203.K, 203.Q, 203.R, 312.A, 312.G, 312.V, 315.A, 315.G, 315.H, 315.K, 315.M, 315.Q, 315.R, 315.S, 315.T, 315.V, 160.sequon_actual, 171.sequon_actual, 173.sequon_actual, 174.sequon_actual, 185.sequon_actual, 186.sequon_actual, 187.sequon_actual, 188.sequon_actual, 189.sequon_actual, 195.sequon_actual, 197.sequon_actual |
gp120_v3 | 296.C, 296.R, 297.A, 297.E, 297.I, 297.K, 297.L, 297.M, 297.N, 297.Q, 297.R, 297.S, 297.T, 297.V, 297.X, 298.G, 298.R, 299.E, 299.F, 299.H, 299.L, 299.N, 299.P, 299.T, 299.V, 300.A, 300.C, 300.D, 300.F, 300.G, 300.H, 300.N, 300.Q, 300.S, 300.T, 300.W, 300.X, 300.Y, 301.D, 301.E, 301.H, 301.I, 301.K, 301.N, 301.Q, 301.R, 301.T, 301.V, 301.X, 301.Y, 301.gap, 302.A, 302.G, 302.H, 302.K, 302.L, 302.N, 302.Q, 302.S, 302.Y, 303.E, 303.I, 303.K, 303.M, 303.Q, 303.R, 303.S, 303.T, 303.V, 304.E, 304.G, 304.I, 304.K, 304.L, 304.R, 304.S, 304.V, 304.W, 305.D, 305.E, 305.G, 305.H, 305.I, 305.K, 305.N, 305.Q, 305.R, 305.T, 305.X, 305.Y, 306.A, 306.D, 306.E, 306.G, 306.K, 306.Q, 306.R, 306.S, 306.X, 306.gap, 307.A, 307.E, 307.F, 307.H, 307.I, 307.L, 307.M, 307.T, 307.V, 307.X, 307.Y, 308.A, 308.G, 308.H, 308.K, 308.N, 308.P, 308.Q, 308.R, 308.S, 308.T, 308.W, 308.X, 309.F, 309.I, 309.L, 309.M, 309.R, 309.T, 309.V, 309.X, 310.G, 310.Q, 310.gap, 311.I, 311.R, 311.gap, 312.A, 312.G, 312.V, 313.A, 313.G, 313.L, 313.P, 313.Q, 313.S, 313.T, 313.V, 313.W, 314.A, 314.G, 314.M, 314.P, 314.X, 315.A, 315.G, 315.H, 315.K, 315.M, 315.Q, 315.R, 315.S, 315.T, 315.V, 316.A, 316.E, 316.G, 316.I, 316.L, 316.M, 316.R, 316.S, 316.T, 316.V, 316.W, 316.X, 316.gap, 317.F, 317.I, 317.L, 317.M, 317.R, 317.S, 317.V, 317.W, 317.X, 317.Y, 318.F, 318.H, 318.N, 318.Q, 318.R, 318.S, 318.V, 318.W, 318.Y, 319.A, 319.G, 319.I, 319.K, 319.L, 319.M, 319.N, 319.Q, 319.R, 319.S, 319.T, 319.V, 319.Y, 319.gap, 320.A, 320.E, 320.G, 320.H, 320.I, 320.K, 320.M, 320.N, 320.P, 320.Q, 320.R, 320.S, 320.T, 320.W, 320.X, 320.Y, 320.gap, 321.A, 321.D, 321.E, 321.F, 321.G, 321.H, 321.I, 321.K, 321.L, 321.N, 321.R, 321.S, 321.T, 321.V, 321.Y, 321.gap, 322.E, 322.G, 322.I, 322.K, 322.L, 322.N, 322.Q, 322.T, 322.V, 322.Y, 322.gap, 323.D, 323.G, 323.I, 323.K, 323.M, 323.N, 323.Q, 323.R, 323.S, 323.T, 323.V, 323.gap, 324.E, 324.G, 324.L, 324.N, 324.P, 324.R, 324.S, 324.T, 325.D, 325.E, 325.G, 325.I, 325.K, 325.N, 325.Q, 
325.R, 325.S, 325.T, 325.Y, 326.A, 326.I, 326.M, 326.P, 326.S, 326.T, 327.G, 327.K, 327.R, 328.A, 328.D, 328.E, 328.G, 328.H, 328.I, 328.K, 328.L, 328.M, 328.N, 328.P, 328.Q, 328.R, 328.S, 328.V, 329.A, 329.V, 329.X, 330.F, 330.H, 330.N, 330.Q, 330.R, 330.S, 330.Y, 331.C, 331.X, 332.D, 332.E, 332.H, 332.I, 332.K, 332.L, 332.N, 332.Q, 332.R, 332.S, 332.T, 332.V, 333.I, 333.L, 333.V, 333.Y, 334.A, 334.D, 334.E, 334.F, 334.G, 334.I, 334.K, 334.N, 334.R, 334.S, 334.T, 334.Y, 334.gap, 300.sequon_actual, 301.sequon_actual, 302.sequon_actual, 322.sequon_actual, 323.sequon_actual, 324.sequon_actual, 330.sequon_actual, 332.sequon_actual, 334.sequon_actual |
gp41_mper | 609.A, 609.F, 609.H, 609.K, 609.L, 609.P, 609.Q, 609.R, 609.S, 609.X, 609.Y, 657.E, 657.K, 657.V, 658.E, 658.H, 658.K, 658.L, 658.N, 658.Q, 658.R, 658.X, 659.A, 659.D, 659.E, 659.K, 659.N, 659.R, 659.S, 659.X, 661.F, 661.L, 661.S, 661.X, 662.A, 662.E, 662.G, 662.K, 662.Q, 662.S, 662.T, 663.F, 663.L, 663.M, 663.W, 664.D, 664.E, 664.G, 664.N, 664.S, 665.E, 665.H, 665.K, 665.N, 665.Q, 665.R, 665.S, 665.T, 665.X, 667.A, 667.D, 667.E, 667.G, 667.K, 667.N, 667.Q, 667.S, 667.T, 668.D, 668.F, 668.G, 668.H, 668.N, 668.Q, 668.S, 668.T, 668.X, 669.I, 669.L, 669.X, 671.D, 671.G, 671.K, 671.N, 671.S, 671.T, 672.L, 672.W, 673.F, 673.L, 673.S, 674.A, 674.D, 674.E, 674.G, 674.K, 674.N, 674.S, 674.T, 674.X, 674.Y, 675.I, 675.L, 675.M, 675.V, 676.A, 676.S, 676.T, 676.V, 677.E, 677.H, 677.K, 677.N, 677.Q, 677.R, 677.S, 677.T, 677.X, 679.I, 679.L, 680.G, 680.R, 680.S, 680.W, 681.D, 681.H, 681.S, 681.Y, 682.I, 682.T, 682.V, 683.K, 683.Q, 683.R, 683.X, 684.I, 684.L, 684.M, 684.T, 684.V, 684.X, 674.sequon_actual |
glyco | num.sequons.env, num.sequons.gp120, num.sequons.v2, num.sequons.v3, num.sequons.v5 |
cysteines | num.cysteine.env, num.cysteine.gp120, num.cysteine.v2, num.cysteine.v3, num.cysteine.v5 |
geometry | length.env, length.gp120, length.v2, length.v3, length.v5 |
Breiman, Leo. 2001. “Random Forests.” Machine Learning 45 (1). Springer: 5–32. doi:10.1023/A:1010933404324.
Chen, Tianqi, and Carlos Guestrin. 2016. “XGBoost: A Scalable Tree Boosting System.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–94. doi:10.1145/2939672.2939785.
van der Laan, Mark J, Eric C Polley, and Alan E Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1). De Gruyter. doi:10.2202/1544-6115.1309.
Williamson, Brian D, Peter B Gilbert, Noah R Simon, and Marco Carone. 2020. “A Unified Approach for Inference on Algorithm-Agnostic Variable Importance.” arXiv Preprint. https://arxiv.org/abs/2004.03683.
Yoon, Hyejin, Jennifer Macke, Anthony P West Jr, Brian Foley, Pamela J Bjorkman, Bette Korber, and Karina Yusim. 2015. “CATNAP: A Tool to Compile, Analyze and Tally Neutralizing Antibody Panels.” Nucleic Acids Research 43 (W1). Oxford University Press: W213–W219. doi:10.1093/nar/gkv404.
Zou, Hui, and Trevor Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2). Wiley Online Library: 301–20. doi:10.1111/j.1467-9868.2005.00503.x.