Data Science Toolkit
- In pursuit of better science through better coding (cartoon source: geek and poke)
This course is an elective for Master's and PhD students at
Emory University's Rollins School of Public Health.
The course covers fundamental tools used in modern, reproducible data science.
Together, these tools provide the ability to develop fully reproducible pipelines for data
analysis. By the end of the course, students will have learned the tools necessary to:
develop reproducible workflows collaboratively using version control based on Git and GitHub;
execute these workflows on a local computer using command-line operations, R Markdown, and GNU Make;
execute the workflows in a containerized environment using Docker; and
execute the workflows in a cloud environment using the Amazon Web Services EC2 and S3 services.
Along the way, we will cover a few other data science tools and practices, including
coding best practices, basic Python, software unit testing, and continuous integration services.