3 Running slapnap

To run the slapnap container, we make use of the docker run command. Note that administrator (sudo) privileges are needed to execute this command. Additionally, note that slapnap operates in UTC+0 time – this will be important when inspecting the files generated by slapnap.

Several options need to be included in this command to control the behavior of slapnap. Each is discussed in a separate subsection below.

3.1 slapnap run options

The user has control over many aspects of slapnap's behavior. These options are passed in using the -e option (see footnote 1), and semicolon-separated strings are used to set their values. For example, to provide input for the option option_name, we would use -e option_name="a;semicolon;separated;string". Note that there are no spaces between the option name and its value, and no spaces after the semicolons in the separated list. See Section 4 for full syntax.
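Because the quoting and spacing rules above are easy to get wrong by hand, it can help to build the -e arguments programmatically. The sketch below is an illustration only: the helper env_option and the image name slapnap/slapnap are assumptions made for this example and are not part of slapnap itself. It joins a list of values with semicolons (no spaces) and uses Python's shlex.join to print a correctly quoted shell command.

```python
import shlex
from typing import List, Sequence, Union

# Assumed image name for illustration; use the image/tag you actually pulled.
SLAPNAP_IMAGE = "slapnap/slapnap"


def env_option(name: str, values: Union[str, Sequence[str]]) -> List[str]:
    """Hypothetical helper: build the two argv entries for one slapnap -e option.

    A single string is passed through; a list is joined with semicolons,
    with no spaces around the '=' or after any semicolon.
    """
    if isinstance(values, str):
        values = [values]
    joined = ";".join(v.strip() for v in values)
    return ["-e", f"{name}={joined}"]


# Example: two bnAbs (example names) and two outcomes; sudo is included
# because running docker requires administrator privileges.
cmd = [
    "sudo", "docker", "run",
    *env_option("nab", ["VRC01", "PGT121"]),
    *env_option("outcomes", ["ic50", "sens"]),
    SLAPNAP_IMAGE,
]
print(shlex.join(cmd))
# sudo docker run -e 'nab=VRC01;PGT121' -e 'outcomes=ic50;sens' slapnap/slapnap
```

shlex.join wraps any argument containing a semicolon in single quotes, which the shell treats the same way as the double quotes shown in the text; without quoting, the shell would interpret each semicolon as the end of a command.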

Each description below lists the default value that is assumed if the option is not specified. Note that many of the default values are chosen simply so that naive calls to slapnap compile quickly. Proper values should be determined based on scientific context.

-e options for slapnap

  • nab: A semicolon-separated list of bnAbs (default = "VRC01"). A list of possible bnAbs can be found here. If multiple bnAbs are listed, it is assumed that the analysis should be of estimated outcomes for a combination of bnAbs (see Section 5.1 for details on how estimated outcomes for multiple bnAbs are computed).
  • outcomes: A semicolon-separated string of outcomes to include in the analysis. Possible values are "ic50" (included in default), "ic80", "iip", "sens" (included in default), "estsens", and "multsens". If only a single nab is specified, use sens to include a dichotomous endpoint. If multiple nabs are specified, use estsens and/or multsens. For detailed definitions of outcomes see Section 5.1.
  • combination_method: A string defining the method to use for predicting combination IC\(_{50}\) and/or IC\(_{80}\). Possible values are "additive" (the default, for the additive model defined in Wagh et al. 2016) or "Bliss-Hill" (for the Bliss-Hill model defined in Wagh et al. 2016).
  • binary_outcomes: A string defining the measure of neutralization to use for defining binary outcomes. Possible values are "ic50" (the default, for using IC\(_{50}\) to define sensitivity) or "ic80" (for using IC\(_{80}\) to define sensitivity).
  • sens_thresh: A numeric value defining the neutralization threshold for defining a sensitive versus resistant pseudovirus (default = 1). The dichotomous sensitivity/resistant outcome is defined as the indicator that (estimated) IC\(_{50}\) (or IC\(_{80}\), if binary_outcomes="ic80") is greater than or equal to sens_thresh.
  • multsens_nab: A numeric value used for defining whether a pseudovirus is resistant to a multi-nAb cocktail. Only used if multsens is included in outcomes and more than one nab is requested. The dichotomous outcome multsens is defined as the indicator that a virus has IC\(_{50}\) (or IC\(_{80}\), if binary_outcomes="ic80") greater than sens_thresh for at least multsens_nab nAbs.
  • learners: A semicolon-separated string of machine learning algorithms to include in the analysis. Possible values include "rf" (random forest, default), "xgboost" (eXtreme gradient boosting), "h2oboost" (gradient boosting using H2O.ai) and "lasso" (elastic net). See Section 5.2 for details on how tuning parameters are chosen. If more than one algorithm is included, then it is assumed that a cross-validation-based ensemble (i.e., a super learner) is desired (see Section 5.3).
  • cvtune: A boolean string (i.e., either "TRUE" or "FALSE" [default]) indicating whether the learners should be tuned using cross validation and a small grid search. If multiple learners are specified, then the super learner ensemble includes up to three versions of each of the requested learners with different tuning parameters.
  • cvperf: A boolean string (i.e., either "TRUE" or "FALSE" [default]) indicating whether the learners' performance should be evaluated using cross validation. If cvtune="TRUE" or learners includes multiple algorithms, then nested cross validation is used to evaluate the performance of the cross validation-selected best value of tuning parameters for the specified algorithm or the super learner, respectively.
  • var_thresh: A numeric string that defines a threshold for pre-screening features. If a single positive number, all binary features with fewer than var_thresh 0's or 1's are removed before the specified learner is trained. If several values are included in var_thresh and a single learner is specified, then cross-validation is used to select the optimal threshold. If multiple learners are specified, then each learner is included in the super learner with pre-screening based on each value of var_thresh.
  • nfolds: A numeric string indicating the number of folds to use in cross validation procedures (default = "2").
  • importance_grp: A semicolon-separated string indicating which group-level variable importance measures should be computed. Possible values are none "" (default), marginal "marg" and conditional "cond". See Section 5.4.1 for details on these measures.
  • importance_ind: A semicolon-separated string indicating which individual-level variable importance measures should be computed. Possible values are none "" (default), learner-level "pred", marginal "marg" and conditional "cond". The latter two are computationally intensive. See Sections 5.4.1 and 5.4.2 for details on these measures.
  • same_subset: If "FALSE" (default), all data available for each outcome will be used in the analysis. If "TRUE", when multiple outcomes are requested, the data will be subset to just those sequences that have all measured outcomes and, if iip is requested, for which iip can be computed (i.e., measured IC\(_{50}\) and IC\(_{80}\) values are different). Thus, if "TRUE", all requested outcomes will be evaluated using the same subset of the CATNAP data.
  • report_name: A string indicating the desired name of the output report (default = report_[_-separated list of nabs]_[date].html).
  • return: A semicolon-separated string of the desired output. Possible values are "report" (default), "learner" for a .rds object that contains the algorithm for each endpoint trained using the full analysis data, "data" for the analysis dataset, "figures" for all figures from the report, and "vimp" for variable importance objects.
  • view_port: A boolean string indicating whether the compiled report should be made viewable on localhost (default "FALSE"). If "TRUE", then the -p option should be used in the docker run command to specify the port. See the example in Section 4.2 for details.
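The dichotomous outcomes described under sens_thresh and multsens_nab can be written as two small indicator functions. This is a minimal sketch of the definitions exactly as stated in the list above (including the "greater than or equal to" for sens and the "greater than" for multsens), not slapnap's own code; the function names are hypothetical, and the sketch takes already-computed (or estimated) IC values as plain numbers, ignoring censoring and the combination methods of Section 5.1.

```python
from typing import Sequence


def sens_indicator(ic: float, sens_thresh: float = 1.0) -> int:
    """Hypothetical helper for the single-outcome definition above:
    1 if the (estimated) IC50 or IC80 is >= sens_thresh, else 0."""
    return int(ic >= sens_thresh)


def multsens_indicator(ics: Sequence[float], sens_thresh: float,
                       multsens_nab: int) -> int:
    """Hypothetical helper for multsens: 1 if at least multsens_nab of the
    per-nAb IC values are strictly greater than sens_thresh, else 0."""
    n_above = sum(ic > sens_thresh for ic in ics)
    return int(n_above >= multsens_nab)


# Example values for one virus and three nAbs (made-up numbers).
print(sens_indicator(0.5), sens_indicator(1.0))
print(multsens_indicator([0.2, 3.0, 50.0], sens_thresh=1.0, multsens_nab=2))
```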
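The single-threshold case of var_thresh amounts to a simple filter on binary feature columns: a feature is kept only if both its count of 0's and its count of 1's are at least var_thresh. The sketch below illustrates that rule on a dictionary of made-up columns; the function and feature names are hypothetical, and it does not cover the cross-validated choice among several thresholds.

```python
from typing import Dict, List


def screen_binary_features(features: Dict[str, List[int]],
                           var_thresh: int) -> Dict[str, List[int]]:
    """Hypothetical sketch: drop binary features with fewer than
    var_thresh 0's or fewer than var_thresh 1's."""
    kept = {}
    for name, column in features.items():
        n_ones = sum(column)
        n_zeros = len(column) - n_ones
        if n_ones >= var_thresh and n_zeros >= var_thresh:
            kept[name] = column
    return kept


features = {
    "site_a": [0, 0, 0, 1],  # only one 1 -> removed when var_thresh = 2
    "site_b": [0, 1, 0, 1],  # two 0's and two 1's -> kept
    "site_c": [1, 1, 1, 1],  # no 0's -> removed
}
print(screen_binary_features(features, var_thresh=2))
```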

3.2 Returning output

At the end of a slapnap run, user-specified output will be saved (see option return in Section 3.1). To retrieve these files from the container, there are two options: mounting a local directory (Section 3.2.1) or, if the report is the only desired output, viewing and saving the report in a web browser (Section 3.2.2).

3.2.1 Mounting a local directory

To mount a local directory to the output directory in the container (/home/output/), use the -v option. Any items that slapnap saves to /home/output/ inside the container will be available in the mounted local directory. Conversely, all files in the mounted local directory will be visible to programs running inside the container.

Suppose /path/to/local/dir is the file path on a local computer in which we wish to save the output files from a slapnap run. A docker run of slapnap would include the option -v /path/to/local/dir:/home/output. After a run completes, the requested output should be viewable in /path/to/local/dir. See Section 4 for full syntax.

To avoid possible naming conflicts and file overwrites in the mounted directory, we recommend mounting an empty directory to store the output.
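The mounting advice above (use an absolute local path and start from an empty directory) can be checked before launching the container. The following is a minimal sketch with a hypothetical helper, output_mount: it creates the local directory if needed, refuses to proceed if the directory already contains files, and returns the -v argument pair pointing at /home/output. Resolving to an absolute path matters because docker treats a bare name in -v as a named volume rather than a host directory.

```python
from pathlib import Path
from typing import List

# Output directory inside the slapnap container (from the text above).
CONTAINER_OUTPUT_DIR = "/home/output"


def output_mount(local_dir: str) -> List[str]:
    """Hypothetical helper: return ['-v', '<abs local dir>:/home/output'],
    creating the local directory and requiring that it be empty."""
    path = Path(local_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    if any(path.iterdir()):
        raise ValueError(
            f"{path} is not empty; mount an empty directory to avoid overwrites"
        )
    return ["-v", f"{path}:{CONTAINER_OUTPUT_DIR}"]


# Usage (path is an example):
# output_mount("/path/to/local/dir")
# -> ['-v', '/path/to/local/dir:/home/output']
```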

Windows users need to enable shared drives by clicking Settings > Shared Drives in the Docker Desktop Daemon and sharing the drive that contains /path/to/local/dir.

3.2.2 Viewing report in browser

An alternative to mounting a local directory for viewing and downloading the report is to set the view_port option to "TRUE" and open a port to the container via the -p option in the docker run statement. In this case, rather than exiting upon completion of the analysis, the container will continue to run and broadcast the compiled report to localhost at the specified port (see examples below). The report can then be downloaded directly from the web browser.
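A command for this mode combines view_port="TRUE" with a -p host:container port mapping. The sketch below only assembles the argument list; the helper name, the image name slapnap/slapnap and the container-side port 80 used in the example are assumptions for illustration, so check the examples in Section 4.2 for the port slapnap actually broadcasts on.

```python
import shlex
from typing import List

# Assumed image name for illustration; use the image/tag you actually pulled.
SLAPNAP_IMAGE = "slapnap/slapnap"


def view_port_command(host_port: int, container_port: int,
                      nab: str = "VRC01", use_sudo: bool = True) -> List[str]:
    """Hypothetical helper: build a docker run command that keeps the
    container running and maps container_port to host_port on localhost."""
    cmd = ["sudo"] if use_sudo else []
    cmd += [
        "docker", "run",
        "-e", f"nab={nab}",
        "-e", "view_port=TRUE",
        "-p", f"{host_port}:{container_port}",  # host:container
        SLAPNAP_IMAGE,
    ]
    return cmd


# Container port 80 is an assumption; the report would then be at http://localhost:8080
cmd = view_port_command(host_port=8080, container_port=80)
print(shlex.join(cmd))
# sudo docker run -e nab=VRC01 -e view_port=TRUE -p 8080:80 slapnap/slapnap
```

Because the container does not exit on its own in this mode, it has to be stopped manually once the report has been saved, for example with docker stop.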

References

Wagh, Kshitij, Tanmoy Bhattacharya, Carolyn Williamson, Alex Robles, Madeleine Bayne, Jetta Garrity, Michael Rist, et al. 2016. “Optimal Combinations of Broadly Neutralizing Antibodies for Prevention and Treatment of HIV-1 Clade C Infection.” PLoS Pathogens 12 (3). https://doi.org/10.1371/journal.ppat.1005520.


  1. This sets an environment variable in the container environment. These variables are accessed by the various R and bash scripts in the container to dictate how the container executes code.↩︎