Data Science Toolkit

In pursuit of better science through better coding (cartoon source: <a href='http://geek-and-poke.com/'>geek and poke</a>)
This course is an elective for Master's and PhD students at Emory University's Rollins School of Public Health. It covers fundamental tools of modern, reproducible data science. Together, these tools make it possible to develop fully reproducible pipelines for data analysis.

By the end of the course, students will have learned the tools necessary to:

- develop reproducible workflows collaboratively using version control based on Git and GitHub;
- execute these workflows on a local computer using command line operations, R Markdown, and GNU Make;
- execute these workflows in a containerized environment using Docker;
- execute these workflows in a cloud environment using the Amazon Web Services EC2 and S3 services.

Along the way, we will cover a few other data science topics, including coding best practices, basic Python, software unit testing, and continuous integration services.