R at Warp Speed:

# R at Warp Speed:
## Reproducible coding for COVID vaccine trials
### David Benkeser, PhD MPH<br> <span style="font-size: 75%;"> Emory University<br> Department of Biostatistics and Bioinformatics </span>
### <a href = 'https://twitter.com/biosbenk' style = 'color: white;'><span class="citation">@biosbenk</span></a> <br> <a href='https://bit.ly/warpspeedr' style='color: white;'>bit.ly/warpspeedr</a> <br>

---

## Acknowledgments

__CoVPN Correlates Team__
* Chenchen Yu
* Yiwen Lu
* Ellis Hughes
* Lars van der Laan
* Kendrick Li
* Brian Simpkins
* Di Lu (BARDA)

]]

__Government__
* Dean Follmann (NIAID)
* James Zhou (BARDA)

]
]

---

???

Two main ways companies can interface with OWS: 
* purchasing/manufacturing funding
* OWS-run trials (agreements through BARDA @ NIH)

---

## COVID-19 Prevention Network

[CoVPN](https://www.coronaviruspreventionnetwork.org/) was [formed by NIAID](https://www.nih.gov/news-events/news-releases/nih-launches-clinical-trials-network-test-covid-19-vaccines-other-prevention-tools) to establish a unified clinical trial network for evaluating vaccines and monoclonal antibodies.

* pooling of resources across __four existing trials networks__
* clinical sites, laboratories, recruitment specialists, statisticians, ...

]

* primary trial __design and analysis__
* sequential __efficacy monitoring__
* __safety__ monitoring
* DSMB/FDA comments
* __immune correlates__

]

---

## Correlates of risk/protection

Two, interrelated goals of correlates analysis are to

* identify/validate possible __surrogate endpoints__;
* understand __protective mechanisms__ of vaccines.

If an __immune correlate__ is established to __reliably predict vaccine efficacy__, then subsequent efficacy trials may use the CoP as the __primary endpoint__.

__Accelerates approval__ of

* existing vaccines in __different populations__ (e.g., children);
* __new vaccines__ in the same class.

---

## Immunogenicity

* Descriptive tables 
* E.g., sampling strata and cases

]

* Descriptive plots of marker distributions
* E.g., bivariate scatter plots

]

---

## Correlates of risk

* Risk given immune response + baseline covariate adjustment
* E.g., Cox model

]

* Machine learning prediction using different sets of immune responses.\*

]

---

## Correlates considerations

From the outset, we knew that this science would be:

* __heavily scrutinized__

* need for transparent, open science.

* __high impact__

* need to "get it right"

* __reproduced for many trials__

* five USG-funded trials + others

---

## Correlates challenges

From the outset, we recognized there would be challenges:

* .red[complex statistical methodology]

* different analysts need to implement different analyses

* .red[difficult-to-reproduce methodology]

* ensemble machine learning with cross-validation

* .red[regulatory compliant computing framework]

* FDA interested in results

* .red[under extreme time pressure]

* information needed ASAP to inform policy

---

## Correlates challenges

For Moderna, we also faced the key challenge that we had

---

*Unknown, 2021*

---

## Open science

Early on, we posted a version-controlled SAP [online](https://figshare.com/articles/online_resource/CoVPN_COVID-19_Vaccine_Efficacy_Trial_Immune_Correlates_SAP/13198595).

]

Open software development on GitHub.
* [CoVPN/correlates_reporting](https://github.com/CoVPN/correlates_reporting_usgcove_archive)
* 3,042 commits and counting... 😎

<br>

✅ need for transparent, open science

---

## Project organization

.center[.red[*File organization and naming are powerful weapons against chaos.*]]
.center[Jenny Bryan]

<br>

]

---

## Project organization

Loose guidelines for __sub-directory structure__.

* consistency vs. expediency

Each sub-directory produces a __child R Markdown report__.

* `Makefile` used to document dependencies

Master `Makefile` and `bookdown` run analyses and __produce final reports__.

<br>

✅ different analysts need to implement different analyses

✅ under extreme time pressure

---

## Code verification

* original programmer = generate specification document
* independent programmer = generate same results
* confirm identical results

[Ellis Hughes](https://twitter.com/ellis_hughes) was instrumental in this effort.

Double programming included __ensemble machine learning__ results!

✅ need to "get it right"

✅ difficult-to-reproduce methodology

---

## Remote computing

Approaches used to prepare for Moderna analysis:

* "Practice" data set for local development

* Detailed specifications given to Moderna stats to align data formatting, definitions, etc...

* `R` package control using [`{renv}`](https://rstudio.github.io/renv/index.html)

* 💯

* Continuous integration using [Travis CI](https://www.travis-ci.com/)
  
  * configure environment to be "Moderna"-like
  * automated report builds on practice data, pushed back to GitHub

---

*Me, a foolish optimist, early 2021*

---

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Sure, being bitten by a thousand fire ants is bad, but have you tried debugging someone else’s code... over WebEx?!</p>&mdash; David Benkeser (@biosbenk) <a href="https://twitter.com/biosbenk/status/1387486297930018818?ref_src=twsrc%5Etfw">April 28, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

---

## Happy endings

After many hours, and with great patience from Weiping Deng, we were able to get the report to compile.

* 1129 pages in total 😦

A preprint is available on [medRxiv](https://www.medrxiv.org/content/10.1101/2021.08.09.21261290v1)

* doi: 10.1101/2021.08.09.21261290v1
* Currently under revision at *Science*

---

## Looking forward

❓❓ reproduced for many trials ❓❓

Code is being __generalized__ to adapt to additional trials

* A more [general correlates reporting](https://github.com/CoVPN/correlates_reporting2) process
* __Custom configurations__ depending on trial specifications

Analyses in the pipeline:

* HVTN 705 (HIV vaccine)
* Janssen (Johnson and Johnson) COVID
* AstraZeneca, SinoVac, CureVac, NovaVax, Sanofi-Pasteur

---

## Lessons learned

* Standardizing computing architectures is __relatively straightforward__. .red[Standardizing data formatting/processing is not].

* .red[Code defensively]: check for missing values, sanity check inputs, ...

* .red[Continuous integration] is your (best) friend as a code supervisor.

* Always a difficult tension between .red[speed vs. quality coding].

---

class: center, inverse
background-image: url("img/worldseries.jpeg")
background-position: center
background-size: cover