The broadly neutralizing antibody (bNAb) studied in this analysis is 10-1074. The analysis considered one measure of neutralization sensitivity: a binary sensitivity outcome, defined by the indicator that IC\(_{50}\) < 50. Based on this specification of bNAb and outcome:
- 581 sequences were extracted from the CATNAP database (Yoon et al. 2015);
- 581 sequences had complete geographic and genetic sequence information;
- 581 of these sequences had measured IC\(_{50}\);
- out of the sequences with complete data, 440 were sensitive to the bNAb, while 141 were resistant.
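The outcome definition can be sketched in Python as follows. This is a minimal illustration rather than the pipeline's own code: the function name `is_sensitive`, the constant name, and the toy IC\(_{50}\) values are hypothetical, and only the threshold of 50 comes from the text.

```python
from collections import Counter

IC50_THRESHOLD = 50.0  # sensitivity cutoff stated in the text

def is_sensitive(ic50: float, threshold: float = IC50_THRESHOLD) -> bool:
    """Return True when the measured IC50 is strictly below the threshold."""
    return ic50 < threshold

# Toy IC50 values (hypothetical, not taken from CATNAP).
toy_ic50 = [0.02, 1.7, 49.9, 50.0, 100.0]

# Dichotomize each value and tally the two classes.
labels = ["sensitive" if is_sensitive(x) else "resistant" for x in toy_ic50]
counts = Counter(labels)
print(counts)
```

Because the inequality is strict, a sequence with IC\(_{50}\) exactly equal to 50 is counted as resistant in this sketch.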
Prediction of the outcome was performed using a super learner ensemble (van der Laan, Polley, and Hubbard 2007) comprising several random forests (Breiman 2001), several gradient boosted trees (Chen and Guestrin 2016), and several elastic net regressions (Zou and Hastie 2005), each with varied tuning parameters, together with an intercept-only regression.
The specific algorithms used in the learning process are described in Table 1.1.
Label | Description |
---|---|
rf_tune1 | random forest with mtry equal to one-half times square root of number of predictors |
rf_default | random forest with mtry equal to square root of number of predictors |
rf_tune2 | random forest with mtry equal to two times square root of number of predictors |
xgboost_default | boosted regression trees with maximum depth of 4 |
xgboost_tune3 | boosted regression trees with maximum depth of 8 |
xgboost_tune4 | boosted regression trees with maximum depth of 12 |
lasso_default | elastic net with \(\lambda\) selected by CV and \(\alpha\) equal to 0 |
lasso_tune1 | elastic net with \(\lambda\) selected by 5-fold CV and \(\alpha\) equal to 0.25 |
lasso_tune2 | elastic net with \(\lambda\) selected by 5-fold CV and \(\alpha\) equal to 0.5 |
lasso_tune3 | elastic net with \(\lambda\) selected by 5-fold CV and \(\alpha\) equal to 0.75 |
mean | intercept-only regression |
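To make the random forest rows of Table 1.1 concrete, the sketch below computes the three `mtry` settings for a given number of predictors. The function name, the rounding rule (round down, keep at least 1, never exceed the number of predictors), and the example predictor count of 400 are assumptions for illustration; the report does not state how non-integer values are handled or how many predictors were used.

```python
import math

def mtry_values(n_predictors: int) -> dict:
    """Map each random forest label in Table 1.1 to an integer mtry."""
    base = math.sqrt(n_predictors)
    # Multipliers of sqrt(number of predictors), as listed in Table 1.1.
    multipliers = {"rf_tune1": 0.5, "rf_default": 1.0, "rf_tune2": 2.0}
    # Assumption: floor the value, then clamp it to the range [1, n_predictors].
    return {
        label: min(n_predictors, max(1, math.floor(m * base)))
        for label, m in multipliers.items()
    }

# Hypothetical predictor count, chosen so the square root is exact.
print(mtry_values(400))
```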
The predictive ability of the learner was assessed using cross-validation. The estimated cross-validated area under the receiver operating characteristic curve (AUC) of the learner for predicting sensitivity is shown in Table 1.2.
Outcome | CV-AUC | Lower 95% CI | Upper 95% CI |
---|---|---|---|
Sensitivity | 0.938 | 0.865 | 0.973 |
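As a rough illustration of what a cross-validated AUC measures, the sketch below computes the AUC within each validation fold and averages across folds. The function names and toy fold data are hypothetical, and averaging fold-specific AUCs is an assumption: the report does not describe how fold-level estimates were combined or how the confidence interval was obtained.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("each fold needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cv_auc(folds):
    """Average the fold-specific AUCs (assumed aggregation rule)."""
    return sum(auc(scores, labels) for scores, labels in folds) / len(folds)

# Toy validation folds: (predicted probabilities, true 0/1 outcomes).
toy_folds = [
    ([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]),  # positives always ranked higher
    ([0.7, 0.4, 0.6, 0.2], [1, 1, 0, 0]),  # one positive ranked below a negative
]
print(cv_auc(toy_folds))
```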
We define the marginal biological importance of a subgroup of features as the difference in population predictiveness between the best possible prediction function based on the features under consideration plus geographic confounders versus only geographic confounders (Williamson et al. 2020). In Table 1.3, we display the groups of variables and their ranked marginal biological variable importance for predicting sensitivity. For variable group definitions, please refer to Table 3.1.
Variable group | Sensitivity |
---|---|
gp120 V3 | 1* |
gp120 CD4 binding sites | 2 |
gp120 V2 | 3 |
gp41 MPER | 4 |
Region-specific counts of PNG sites | 5 |
Cysteine counts | 6 |
Viral geometry | 7 |
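The ranking in Table 1.3 rests on a difference in predictiveness between two models. A minimal sketch of that comparison, assuming AUC as the predictiveness measure, is given here; the function names, group labels, and AUC values are hypothetical and are not results from this analysis.

```python
def marginal_importance(auc_group_plus_geo: float, auc_geo_only: float) -> float:
    """Predictiveness gained by adding a feature group to the geographic confounders."""
    return auc_group_plus_geo - auc_geo_only

def rank_groups(full_aucs: dict, auc_geo_only: float) -> list:
    """Order feature groups from most to least important."""
    importance = {
        group: marginal_importance(value, auc_geo_only)
        for group, value in full_aucs.items()
    }
    return sorted(importance, key=importance.get, reverse=True)

# Hypothetical AUCs for models using each group plus geographic confounders.
toy_full_aucs = {"group_a": 0.80, "group_b": 0.65, "group_c": 0.72}
ranking = rank_groups(toy_full_aucs, auc_geo_only=0.60)
print(ranking)
```

In this sketch a group whose model does no better than the confounders-only model receives an importance near zero and falls to the bottom of the ranking.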
Out of the sequences with complete data, 440 were estimated to be sensitive to the bNAb, while 141 were estimated to be resistant, where sensitivity was defined as the indicator that IC\(_{50}\) was less than 50.
The weights assigned to each algorithm in the super learner for predicting sensitivity are shown in Table 2.1.
Learner | Weight |
---|---|
rf_tune1 | 0.00 |
rf_default | 0.00 |
rf_tune2 | 0.00 |
xgboost_default | 0.42 |
xgboost_tune3 | 0.00 |
xgboost_tune4 | 0.58 |
lasso_default | 0.00 |
lasso_tune1 | 0.00 |
lasso_tune2 | 0.00 |
lasso_tune3 | 0.00 |
mean | 0.00 |
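The weights in Table 2.1 can be read as the coefficients of a weighted combination of the candidate learners' predictions. The sketch below assumes the ensemble prediction is a weighted average of predicted probabilities; the report does not state the scale on which predictions are combined, and the function name and toy candidate predictions are hypothetical. Only the two nonzero weights are taken from Table 2.1.

```python
# Nonzero weights from Table 2.1; all other learners received weight 0.
weights = {"xgboost_default": 0.42, "xgboost_tune4": 0.58}

def ensemble_prediction(candidate_preds: dict, weights: dict) -> float:
    """Weighted average of candidate predictions (assumed combination rule)."""
    if abs(sum(weights.values()) - 1.0) > 1e-8:
        raise ValueError("weights are expected to sum to 1")
    return sum(w * candidate_preds[name] for name, w in weights.items())

# Hypothetical predicted probabilities of sensitivity for one sequence.
toy_preds = {"xgboost_default": 0.90, "xgboost_tune4": 0.70}
p = ensemble_prediction(toy_preds, weights)
print(round(p, 3))
```

Learners with weight 0.00 do not influence the ensemble prediction, which is why only the two boosted tree learners appear in the sketch.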
The cross-validated area under the ROC curve of super learner predictions of sensitivity relative to candidate algorithms is shown in Figure 2.1. Figure 2.2 shows cross-validated ROC curves for this endpoint.
The cross-validated areas under the ROC curve of the learner with tuning parameters selected via cross-validation and of the learners with each individual value of the tuning parameters are shown in Figure 2.1.
Figure 2.2 shows the cross-validated ROC curve for predicting sensitivity.
We show the biological variable importance of groups of features (defined in Table 3.1) in predicting sensitivity in Figure 2.4. Importance is defined using the difference in AUCs. The plot shows the marginal biological importance of the group relative to the null model with geographic confounders only.
Table 2.2 shows the top 20 features in terms of their predictive importance. Specifically, the algorithm with the largest weight in the super learner ensemble was selected, and the variable importance metrics associated with this algorithm are shown. In this case, the highest weight was assigned to an xgboost algorithm, so xgboost gain importance measures were computed and the features are shown by their rank. Gain measures the improvement in accuracy brought by a given feature to the tree branches on which it appears. The essential idea is that before a split on a given feature is added to a branch, some observations may be poorly predicted, while after the split is added, each resultant branch is more accurate. Gain measures this change in accuracy.
Feature |
---|
hxb2.334.S.1mer |
hxb2.492.E.1mer |
hxb2.332.N.1mer |
hxb2.325.D.1mer |
hxb2.800.L.1mer |
hxb2.816.N.1mer |
hxb2.30.T.1mer |
length.v2 |
hxb2.778.A.1mer |
hxb2.440.A.1mer |
hxb2.704.I.1mer |
hxb2.293.E.1mer |
hxb2.507.E.1mer |
geographic.region.of.origin.is.S.Africa |
hxb2.300.N.1mer |
hxb2.316.A.1mer |
hxb2.837.C.1mer |
hxb2.818.T.1mer |
hxb2.26.I.1mer |
hxb2.178.K.1mer |
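For readers who want the quantity behind the gain measure, the sketch below evaluates the split gain formula given by Chen and Guestrin (2016) for a single candidate split, using the sums of first-order gradients (G) and second-order gradients (H) of the loss on each side of the split. The function names and numeric inputs are hypothetical. A feature's gain importance aggregates this quantity over the splits that use the feature; how the report aggregates or normalizes it is not stated.

```python
def leaf_score(g_sum: float, h_sum: float, lam: float) -> float:
    """Structure score G^2 / (H + lambda) for one leaf."""
    return g_sum * g_sum / (h_sum + lam)

def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Loss reduction from splitting one leaf into a left and a right child.

    lam is the L2 penalty on leaf weights; gamma is the per-leaf complexity penalty.
    """
    parent = leaf_score(g_left + g_right, h_left + h_right, lam)
    children = leaf_score(g_left, h_left, lam) + leaf_score(g_right, h_right, lam)
    return 0.5 * (children - parent) - gamma

# Hypothetical gradient sums: the split separates observations whose
# gradients point in opposite directions, so the gain is large.
gain = split_gain(g_left=-4.0, h_left=3.0, g_right=4.0, h_right=3.0)
print(gain)
```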
Table 3.1 provides the individual HXB2 coordinates and variable names of the variables that make up each of the variable groups considered for biological importance.
Variable group | Variables |
---|---|
gp120_cd4bs | 61.F, 61.H, 61.I, 61.L, 61.Q, 61.T, 61.V, 61.Y, 62.A, 62.D, 62.E, 62.G, 62.H, 62.I, 62.K, 62.M, 62.N, 62.R, 62.S, 62.T, 62.V, 62.Y, 120.I, 120.T, 120.V, 124.F, 124.H, 124.I, 124.P, 124.Y, 125.F, 125.I, 125.L, 125.M, 127.I, 127.V, 182.A, 182.E, 182.H, 182.I, 182.K, 182.L, 182.M, 182.N, 182.Q, 182.S, 182.T, 182.V, 197.D, 197.I, 197.K, 197.N, 197.R, 197.T, 198.A, 198.I, 198.S, 198.T, 198.V, 204.A, 204.E, 204.S, 204.T, 204.V, 206.P, 206.S, 206.T, 209.N, 209.S, 209.T, 274.A, 274.C, 274.F, 274.G, 274.S, 274.T, 274.V, 274.gap, 276.D, 276.E, 276.H, 276.K, 276.N, 276.S, 276.gap, 279.A, 279.C, 279.D, 279.E, 279.K, 279.N, 279.Q, 279.S, 280.D, 280.N, 280.S, 280.T, 281.A, 281.E, 281.G, 281.H, 281.I, 281.R, 281.S, 281.T, 281.V, 282.E, 282.G, 282.H, 282.K, 282.N, 282.P, 282.Q, 282.R, 282.S, 282.Y, 283.A, 283.I, 283.N, 283.P, 283.S, 283.T, 283.V, 304.G, 304.I, 304.K, 304.L, 304.R, 304.S, 304.V, 318.F, 318.H, 318.N, 318.Q, 318.R, 318.S, 318.V, 318.W, 318.Y, 326.A, 326.I, 326.M, 326.P, 326.S, 326.T, 362.A, 362.C, 362.D, 362.E, 362.F, 362.G, 362.K, 362.M, 362.N, 362.Q, 362.R, 362.S, 362.T, 362.V, 362.gap, 363.A, 363.E, 363.G, 363.H, 363.I, 363.K, 363.M, 363.N, 363.P, 363.Q, 363.R, 363.S, 363.T, 363.V, 365.A, 365.I, 365.L, 365.N, 365.P, 365.R, 365.S, 365.T, 365.V, 366.E, 366.G, 367.G, 367.S, 369.A, 369.E, 369.I, 369.L, 369.P, 369.Q, 369.S, 369.T, 369.V, 374.F, 374.H, 374.L, 386.D, 386.K, 386.N, 386.S, 386.T, 386.Y, 392.D, 392.E, 392.F, 392.H, 392.I, 392.K, 392.L, 392.N, 392.P, 392.S, 392.T, 392.Y, 392.gap, 425.N, 425.R, 426.A, 426.I, 426.K, 426.L, 426.M, 426.R, 426.S, 426.T, 426.V, 427.L, 427.W, 428.I, 428.M, 428.Q, 428.T, 429.A, 429.D, 429.E, 429.G, 429.K, 429.Q, 429.R, 429.S, 429.T, 430.A, 430.G, 430.I, 430.Q, 430.S, 430.T, 430.V, 431.A, 431.E, 431.G, 431.R, 431.V, 432.I, 432.K, 432.L, 432.Q, 432.R, 432.S, 455.A, 455.D, 455.E, 455.I, 455.L, 455.Q, 455.S, 455.T, 455.V, 456.H, 456.L, 456.M, 456.N, 456.R, 456.S, 456.V, 456.W, 456.Y, 457.A, 457.D, 457.N, 457.S, 458.A, 458.D, 458.E, 458.G, 458.K, 458.N, 458.Q, 458.S, 458.T, 458.Y, 459.D, 459.E, 459.G, 459.I, 459.N, 459.P, 459.S, 459.T, 459.V, 459.gap, 460.A, 460.C, 460.D, 460.E, 460.G, 460.I, 460.K, 460.L, 460.N, 460.P, 460.Q, 460.R, 460.S, 460.T, 460.V, 460.W, 460.gap, 461.A, 461.D, 461.E, 461.F, 461.G, 461.H, 461.I, 461.K, 461.L, 461.M, 461.N, 461.P, 461.Q, 461.R, 461.S, 461.T, 461.V, 461.Y, 461.gap, 462.A, 462.D, 462.E, 462.G, 462.H, 462.I, 462.K, 462.L, 462.M, 462.N, 462.P, 462.Q, 462.R, 462.S, 462.T, 462.V, 462.Y, 462.gap, 463.A, 463.C, 463.D, 463.E, 463.G, 463.H, 463.I, 463.K, 463.M, 463.N, 463.P, 463.R, 463.S, 463.T, 463.V, 463.Y, 463.gap, 469.K, 469.R, 469.S, 469.Y, 469.gap, 471.A, 471.E, 471.G, 471.I, 471.L, 471.Q, 471.S, 471.T, 471.V, 474.D, 474.E, 474.N, 474.Y, 475.I, 475.M, 475.V, 476.G, 476.K, 476.M, 476.Q, 476.R, 476.T, 476.V, 477.D, 477.N, 197.sequon_actual, 276.sequon_actual, 363.sequon_actual, 386.sequon_actual, 392.sequon_actual, 460.sequon_actual, 461.sequon_actual, 462.sequon_actual, 463.sequon_actual |
gp120_v2 | 121.E, 121.K, 121.M, 121.Q, 121.R, 124.F, 124.H, 124.I, 124.P, 124.Y, 127.I, 127.V, 158.D, 158.E, 158.S, 158.T, 159.D, 159.F, 159.L, 159.Y, 160.D, 160.H, 160.I, 160.K, 160.N, 160.R, 160.S, 160.V, 160.Y, 160.gap, 161.A, 161.I, 161.L, 161.M, 161.S, 161.T, 161.V, 161.gap, 162.A, 162.H, 162.I, 162.N, 162.P, 162.Q, 162.S, 162.T, 162.gap, 163.A, 163.G, 163.K, 163.P, 163.R, 163.S, 163.T, 163.gap, 164.A, 164.D, 164.E, 164.F, 164.G, 164.I, 164.L, 164.M, 164.N, 164.P, 164.Q, 164.R, 164.S, 164.T, 164.V, 164.gap, 165.G, 165.I, 165.L, 165.M, 165.P, 165.Q, 165.R, 165.S, 165.T, 165.V, 166.A, 166.D, 166.G, 166.H, 166.I, 166.K, 166.M, 166.N, 166.Q, 166.R, 166.S, 166.T, 166.W, 167.D, 167.G, 167.K, 167.N, 167.P, 167.Q, 167.R, 167.T, 168.E, 168.I, 168.K, 168.L, 168.R, 168.S, 168.gap, 169.A, 169.E, 169.G, 169.H, 169.I, 169.K, 169.M, 169.N, 169.P, 169.Q, 169.R, 169.S, 169.T, 169.V, 169.W, 169.Y, 169.gap, 170.C, 170.E, 170.H, 170.K, 170.N, 170.Q, 170.R, 170.S, 170.T, 170.gap, 171.D, 171.E, 171.G, 171.H, 171.K, 171.L, 171.M, 171.N, 171.P, 171.Q, 171.R, 171.S, 171.T, 171.V, 171.gap, 172.A, 172.D, 172.E, 172.G, 172.I, 172.K, 172.M, 172.Q, 172.R, 172.T, 172.V, 172.Y, 173.A, 173.D, 173.E, 173.F, 173.G, 173.H, 173.N, 173.Q, 173.R, 173.S, 173.T, 173.Y, 174.A, 174.D, 174.S, 174.T, 174.V, 175.F, 175.H, 175.I, 175.L, 175.N, 175.Q, 175.S, 175.T, 175.V, 175.Y, 176.F, 176.L, 177.D, 177.F, 177.H, 177.N, 177.Y, 178.A, 178.D, 178.E, 178.I, 178.K, 178.L, 178.N, 178.R, 178.S, 178.T, 178.V, 179.A, 179.E, 179.I, 179.L, 179.M, 179.P, 179.Q, 179.R, 179.S, 179.T, 179.V, 179.Y, 181.I, 181.L, 181.T, 181.V, 182.A, 182.E, 182.H, 182.I, 182.K, 182.L, 182.M, 182.N, 182.Q, 182.S, 182.T, 182.V, 183.A, 183.E, 183.H, 183.K, 183.L, 183.N, 183.P, 183.Q, 183.S, 184.A, 184.F, 184.I, 184.L, 184.M, 184.N, 184.S, 184.T, 184.V, 184.gap, 185.A, 185.D, 185.E, 185.F, 185.G, 185.H, 185.I, 185.K, 185.L, 185.N, 185.Q, 185.R, 185.S, 185.T, 185.V, 185.Y, 185.gap, 186.A, 186.D, 186.E, 186.G, 186.H, 186.I, 186.K, 186.L, 186.N, 186.P, 186.Q, 186.R, 186.S, 186.T, 186.V, 186.gap, 187.A, 187.C, 187.D, 187.E, 187.G, 187.H, 187.K, 187.N, 187.Q, 187.R, 187.S, 187.T, 187.Y, 187.gap, 188.A, 188.D, 188.E, 188.F, 188.G, 188.H, 188.I, 188.K, 188.N, 188.P, 188.Q, 188.R, 188.S, 188.T, 188.V, 188.W, 188.Y, 188.gap, 189.A, 189.D, 189.E, 189.G, 189.H, 189.I, 189.K, 189.L, 189.M, 189.N, 189.P, 189.Q, 189.R, 189.S, 189.T, 189.Y, 189.gap, 190.A, 190.D, 190.E, 190.F, 190.G, 190.I, 190.K, 190.L, 190.M, 190.N, 190.P, 190.Q, 190.R, 190.S, 190.T, 190.V, 190.Y, 191.F, 191.H, 191.S, 191.Y, 192.G, 192.I, 192.K, 192.M, 192.R, 192.S, 192.T, 192.V, 193.F, 193.L, 193.M, 193.P, 194.I, 194.K, 194.L, 194.M, 194.R, 194.T, 194.V, 195.D, 195.H, 195.K, 195.N, 195.Q, 195.S, 195.T, 197.D, 197.I, 197.K, 197.N, 197.R, 197.T, 202.A, 202.K, 202.R, 202.S, 202.T, 203.K, 203.Q, 203.R, 312.A, 312.G, 315.A, 315.G, 315.K, 315.M, 315.Q, 315.R, 315.S, 315.T, 315.V, 160.sequon_actual, 171.sequon_actual, 173.sequon_actual, 185.sequon_actual, 186.sequon_actual, 187.sequon_actual, 188.sequon_actual, 189.sequon_actual, 197.sequon_actual |
gp120_v3 | 296.C, 296.R, 297.A, 297.E, 297.I, 297.K, 297.L, 297.M, 297.N, 297.Q, 297.R, 297.S, 297.T, 297.V, 299.E, 299.F, 299.H, 299.L, 299.N, 299.P, 299.T, 299.V, 300.A, 300.D, 300.F, 300.G, 300.H, 300.N, 300.Q, 300.S, 300.T, 300.W, 300.Y, 301.D, 301.E, 301.H, 301.K, 301.N, 301.R, 301.T, 301.V, 301.Y, 301.gap, 302.G, 302.H, 302.K, 302.L, 302.N, 302.Q, 303.E, 303.I, 303.K, 303.Q, 303.R, 303.S, 303.T, 303.V, 304.G, 304.I, 304.K, 304.L, 304.R, 304.S, 304.V, 305.D, 305.E, 305.G, 305.H, 305.I, 305.K, 305.N, 305.Q, 305.R, 305.T, 305.Y, 306.D, 306.E, 306.G, 306.K, 306.Q, 306.R, 306.S, 306.gap, 307.A, 307.E, 307.F, 307.H, 307.I, 307.L, 307.M, 307.T, 307.V, 307.Y, 308.A, 308.G, 308.H, 308.K, 308.N, 308.P, 308.Q, 308.R, 308.S, 308.T, 308.W, 309.F, 309.I, 309.L, 309.M, 309.R, 309.T, 309.V, 310.Q, 310.gap, 311.R, 311.gap, 312.A, 312.G, 313.A, 313.L, 313.P, 313.Q, 313.S, 313.T, 313.V, 314.A, 314.G, 314.M, 314.P, 315.A, 315.G, 315.K, 315.M, 315.Q, 315.R, 315.S, 315.T, 315.V, 316.A, 316.E, 316.G, 316.I, 316.M, 316.R, 316.S, 316.T, 316.V, 316.W, 317.F, 317.I, 317.L, 317.M, 317.R, 317.S, 317.V, 317.W, 317.X, 317.Y, 318.F, 318.H, 318.N, 318.Q, 318.R, 318.S, 318.V, 318.W, 318.Y, 319.A, 319.G, 319.I, 319.K, 319.L, 319.M, 319.N, 319.R, 319.S, 319.T, 319.V, 319.Y, 319.gap, 320.A, 320.E, 320.G, 320.H, 320.I, 320.K, 320.M, 320.N, 320.P, 320.Q, 320.R, 320.S, 320.T, 320.W, 320.Y, 320.gap, 321.A, 321.D, 321.E, 321.F, 321.G, 321.H, 321.I, 321.K, 321.L, 321.N, 321.R, 321.S, 321.T, 321.V, 321.Y, 321.gap, 322.E, 322.G, 322.I, 322.K, 322.N, 322.Q, 322.T, 322.V, 322.Y, 322.gap, 323.D, 323.G, 323.I, 323.K, 323.M, 323.N, 323.Q, 323.S, 323.T, 323.V, 323.gap, 324.E, 324.G, 324.L, 324.N, 324.P, 324.S, 324.T, 325.D, 325.E, 325.G, 325.I, 325.K, 325.N, 325.Q, 325.R, 325.S, 325.T, 326.A, 326.I, 326.M, 326.P, 326.S, 326.T, 327.G, 327.K, 327.R, 328.A, 328.D, 328.E, 328.G, 328.I, 328.K, 328.L, 328.M, 328.N, 328.Q, 328.R, 328.S, 328.V, 329.A, 329.V, 330.F, 330.H, 330.N, 330.Q, 330.R, 330.S, 330.Y, 332.D, 332.E, 332.H, 332.I, 332.K, 332.N, 332.Q, 332.R, 332.S, 332.T, 332.V, 333.I, 333.L, 333.V, 333.Y, 334.A, 334.D, 334.E, 334.F, 334.G, 334.I, 334.K, 334.N, 334.R, 334.S, 334.T, 334.Y, 334.gap, 301.sequon_actual, 302.sequon_actual, 322.sequon_actual, 323.sequon_actual, 324.sequon_actual, 330.sequon_actual, 334.sequon_actual |
gp41_mper | 609.A, 609.F, 609.H, 609.K, 609.L, 609.P, 609.Q, 609.R, 609.S, 609.Y, 657.E, 657.K, 657.V, 658.E, 658.H, 658.K, 658.L, 658.N, 658.Q, 658.R, 659.A, 659.D, 659.E, 659.K, 659.N, 659.R, 659.S, 661.F, 661.L, 661.S, 662.A, 662.E, 662.K, 662.Q, 662.S, 662.T, 663.L, 663.M, 663.W, 664.D, 664.E, 664.G, 664.N, 664.S, 665.E, 665.K, 665.N, 665.Q, 665.R, 665.S, 665.T, 667.A, 667.D, 667.E, 667.G, 667.K, 667.N, 667.Q, 667.S, 667.T, 668.D, 668.F, 668.G, 668.H, 668.N, 668.Q, 668.S, 668.T, 669.I, 669.L, 671.D, 671.G, 671.K, 671.N, 671.S, 671.T, 672.L, 672.W, 673.F, 673.L, 673.S, 674.A, 674.D, 674.E, 674.G, 674.K, 674.N, 674.S, 674.T, 674.Y, 675.I, 675.L, 675.M, 676.A, 676.S, 676.T, 676.V, 677.E, 677.H, 677.K, 677.N, 677.Q, 677.R, 677.S, 680.G, 680.W, 681.D, 681.H, 681.S, 681.Y, 682.I, 682.T, 682.V, 683.K, 683.Q, 683.R, 684.I, 684.L, 684.M, 684.T, 684.V, 674.sequon_actual |
glyco | num.sequons.env, num.sequons.gp120, num.sequons.v2, num.sequons.v3, num.sequons.v5 |
cysteines | num.cysteine.env, num.cysteine.gp120, num.cysteine.v2, num.cysteine.v3, num.cysteine.v5 |
geometry | length.env, length.gp120, length.v2, length.v3, length.v5 |
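To relate the feature names in Table 2.2 to the groups in Table 3.1, the sketch below parses a residue-indicator name such as `hxb2.334.S.1mer` into an HXB2 position and residue, then looks the pair up in a group. The naming pattern is inferred from the feature names shown in this report and is an assumption, as are the function names. The group sets are short excerpts of Table 3.1 for illustration, not the full definitions.

```python
import re
from typing import Optional, Tuple

# Assumed naming pattern: hxb2.<position>.<residue or "gap">.1mer
FEATURE_PATTERN = re.compile(r"^hxb2\.(\d+)\.([A-Za-z]+)\.1mer$")

def parse_residue_feature(name: str) -> Optional[Tuple[int, str]]:
    """Return (HXB2 position, residue) or None if the name is not a residue indicator."""
    match = FEATURE_PATTERN.match(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)

# Abbreviated excerpts of Table 3.1 (illustrative only).
GROUP_EXCERPTS = {
    "gp120_v3": {"325.D", "332.N", "334.S"},
    "gp120_cd4bs": {"61.F", "62.A"},
}

def groups_containing(name: str) -> list:
    """List the excerpted groups whose entries include this feature."""
    parsed = parse_residue_feature(name)
    if parsed is None:
        return []
    key = f"{parsed[0]}.{parsed[1]}"
    return sorted(g for g, members in GROUP_EXCERPTS.items() if key in members)

print(parse_residue_feature("hxb2.334.S.1mer"))
print(groups_containing("hxb2.332.N.1mer"))
```

Names that do not follow the residue-indicator pattern, such as `length.v2`, are returned as `None` by the parser and would need a separate lookup against the count and length groups.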
Breiman, Leo. 2001. “Random Forests.” Machine Learning 45 (1). Springer: 5–32. doi:10.1023/A:1010933404324.
Chen, Tianqi, and Carlos Guestrin. 2016. “XGBoost: A Scalable Tree Boosting System.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–94. doi:10.1145/2939672.2939785.
van der Laan, Mark J, Eric C Polley, and Alan E Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1). De Gruyter. doi:10.2202/1544-6115.1309.
Williamson, Brian D, Peter B Gilbert, Noah R Simon, and Marco Carone. 2020. “A Unified Approach for Inference on Algorithm-Agnostic Variable Importance.” arXiv Preprint. https://arxiv.org/abs/2004.03683.
Yoon, Hyejin, Jennifer Macke, Anthony P West Jr, Brian Foley, Pamela J Bjorkman, Bette Korber, and Karina Yusim. 2015. “CATNAP: A Tool to Compile, Analyze and Tally Neutralizing Antibody Panels.” Nucleic Acids Research 43 (W1). Oxford University Press: W213–W219. doi:10.1093/nar/gkv404.
Zou, Hui, and Trevor Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2). Wiley Online Library: 301–20. doi:10.1111/j.1467-9868.2005.00503.x.