Version control using git and GitHub

class: left, middle, inverse, title-slide

.title[
# Version control using git and GitHub
]
.author[
### David Benkeser, PhD MPH<br> <span style="font-size: 50%;"> Emory University<br> Department of Biostatistics and Bioinformatics </span>
]
.date[
### INFO550<br><br><svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#ffffff;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> <a href="https://bit.ly/info550">.white[bit.ly/info550]</a> <br> <svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#ffffff;" xmlns="http://www.w3.org/2000/svg"> <path d="M448 360V24c0-13.3-10.7-24-24-24H96C43 0 0 43 0 96v320c0 53 43 96 96 96h328c13.3 0 24-10.7 24-24v-16c0-7.5-3.5-14.3-8.9-18.7-4.2-15.4-4.2-59.3 0-74.7 5.4-4.3 8.9-11.1 8.9-18.6zM128 134c0-3.3 2.7-6 6-6h212c3.3 0 6 2.7 6 6v20c0 3.3-2.7 6-6 6H134c-3.3 0-6-2.7-6-6v-20zm0 64c0-3.3 2.7-6 6-6h212c3.3 0 6 2.7 6 6v20c0 3.3-2.7 6-6 6H134c-3.3 
0-6-2.7-6-6v-20zm253.4 250H96c-17.7 0-32-14.3-32-32 0-17.6 14.4-32 32-32h285.4c-1.9 17.1-1.9 46.9 0 64z"></path></svg> <a href="https://benkeser.github.io/info550/readings#git-and-github">.white[Additional reading]</a> <br> <svg viewBox="0 0 640 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#ffffff;" xmlns="http://www.w3.org/2000/svg"> <path d="M278.9 511.5l-61-17.7c-6.4-1.8-10-8.5-8.2-14.9L346.2 8.7c1.8-6.4 8.5-10 14.9-8.2l61 17.7c6.4 1.8 10 8.5 8.2 14.9L293.8 503.3c-1.9 6.4-8.5 10.1-14.9 8.2zm-114-112.2l43.5-46.4c4.6-4.9 4.3-12.7-.8-17.2L117 256l90.6-79.7c5.1-4.5 5.5-12.3.8-17.2l-43.5-46.4c-4.5-4.8-12.1-5.1-17-.5L3.8 247.2c-5.1 4.7-5.1 12.8 0 17.5l144.1 135.1c4.9 4.6 12.5 4.4 17-.5zm327.2.6l144.1-135.1c5.1-4.7 5.1-12.8 0-17.5L492.1 112.1c-4.8-4.5-12.4-4.3-17 .5L431.6 159c-4.6 4.9-4.3 12.7.8 17.2L523 256l-90.6 79.7c-5.1 4.5-5.5 12.3-.8 17.2l43.5 46.4c4.5 4.9 12.1 5.1 17 .6z"></path></svg> <a href="versioncontrol.sh" download="versioncontrol.sh" style="color:white">Code chunks</a>
]

---

background-color: #007dba
class: title-slide, center, inverse, middle

???

Don't be like this guy.

---

### Why formal version control?

* Painlessly revert to older versions of a document
  * Uh oh, everything broke, revert! Revert!

* See history of changes made to a document
  * What did I do that broke everything?

* Try new things out without breaking things that work

* Collaborate with multiple people

* Expected skill in modern data science

* It's required for this course

???

Formal version control has a somewhat steep learning curve, but 
in the end, it is worth it.

Version control is __most useful__ for coding projects. It can be used
for purposes beyond that (e.g., manuscript collaboration). However, in my 
experience, Google Docs/SharePoint/etc. (for Word docs) and Overleaf
(for LaTeX docs) are easier for manuscript collaboration, though I have
worked with collaborators on manuscripts using GitHub.

The biggest point here is that knowing your way around git/GitHub is 
an expected job market skill for data scientist positions. Knowing at least
the basics is critical.

We will also use it from here on out for assignments in class, so better to 
get used to it now.

---

### Git

__git__ is a formal system for version control.

* developed by Linus Torvalds, the developer of Linux
* can be used to track any content, but intended for __plain text files__
  * source code, data analysis code, manuscripts, websites, presentations

Why git?
* fast, offline, good at merging changes, industry standard

???

Git was originally developed to manage the source code for Linux.

Most features of git are aimed towards tracking changes made to plain
text code. It's not very useful to look at the raw source code of a 
`.pdf` or `.docx` file. Though these files can definitely still be tracked
by git.

I use git for all my software projects, almost all data analysis projects, 
some manuscripts (see previous comments), and for managing this course's 
website.

Git stores its history as compressed snapshots, which makes it incredibly fast and 
lightweight (i.e., doesn't eat up all your computer's disk space). The full history
lives on your computer, so it works offline (Dropbox needs a connection to sync).

---

### Terminology

* __Repository__ 
  * a directory of (ideally) plain-text files included in a project

* __Commit__ 
  * a snapshot of your project at a particular point in time, labeled with a unique ID

* __History__
  * a history of all of the commits for a project

???

We'll see that a repository is literally just a directory that includes
a particular (hidden) subfolder that the git program interacts with.

If you've ever used Dropbox (or similar), a commit is like the version 
history that you can view on Dropbox. However, rather than automatically
remembering a version of a file __every time you save the file__ (what
Dropbox does), we need to explicitly give a command to git to make a 
commit and a commit can include multiple files. So it really is a version
history for a __project__ rather than a version history for an individual 
file (though it's also that).

---

### Example repository

.center[![An example GitHub repository](repos.png)]

???

This is a repository of the code that builds a piece of software
for a recent research project (one that inspired much of the contents
of this course!). It is a directory of files, in this case, hosted on
GitHub. The README.md file describes the contents of the folder.

---

### Example commit

.center[![An example GitHub commit](commit.png)]

???

If you view a commit on GitHub, you can easily see what lines 
were modified -- the old version appears in red and the new version
in green. This can be very useful for targeted debugging of software.

---

### Example history

.center[![History of commits](history.png)]

???

One of the beauties of git is that your entire commit history can 
be viewed. You can see how a project evolves over time and revert to 
any point in its history. If it is a software repository, you can 
easily access past versions of software. Etc...

---

### Configuring git

The first time you use `git`, you will need to configure it.

```bash
# set user name
git config --global user.name "Jane Doe"
# set email
git config --global user.email "janedoe@emory.edu"
```

* Other options to consider:
  * [`core.editor`](https://docs.github.com/en/get-started/getting-started-with-git/associating-text-editors-with-git) controls which editor is used
  * `core.excludesFile` points to a file of patterns to ignore in every repository (more later) 
  * all options [in docs](https://git-scm.com/docs/git-config)
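
To check what git has on record, something like the following should work. It is only a sketch: it writes to a scratch repository's own config (`--local`) instead of `--global`, so it will not touch your real settings. The scratch directory and the choice of `nano` as editor are placeholders, not recommendations.

```bash
# scratch repository so the real global settings are left alone
repo="$(mktemp -d)"
cd "$repo"
git init -q

# same options as above, but stored for this repository only (.git/config)
git config --local user.name "Jane Doe"
git config --local user.email "janedoe@emory.edu"
# nano is just an example editor; use whichever you prefer
git config --local core.editor "nano"

# read a single value back
git config --get user.name
# list everything set for this repository
git config --local --list
```

Swap `--local` for `--global` once you are happy with the values.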

---

### Initialize repository

Before starting the workflow, we'll use `git init` to set up a repository.

```bash
# make a directory (tmp_gitdir is "git repository")
mkdir ~/tmp_gitdir
# move into directory
cd ~/tmp_gitdir
# initialize repository
git init
```

```
## hint: Using 'master' as the name for the initial branch. This default branch name
## hint: is subject to change. To configure the initial branch name to use in all
## hint: of your new repositories, which will suppress this warning, call:
## hint: 
## hint: 	git config --global init.defaultBranch <name>
## hint: 
## hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
## hint: 'development'. The just-created branch can be renamed via this command:
## hint: 
## hint: 	git branch -m <name>
## Initialized empty Git repository in /Users/dbenkes/tmp_gitdir/.git/
```

```bash
# see what it did
ls -lha ~/tmp_gitdir
```

```
## total 0
## drwxr-xr-x   3 dbenkes  staff    96B Aug 18 10:10 .
## drwxr-x---+ 40 dbenkes  staff   1.3K Aug 18 10:10 ..
*## drwxr-xr-x   9 dbenkes  staff   288B Aug 18 10:10 .git
```

???

All the information about a repository is stored in this `.git` directory. 
Don't ask me how. It's just magic, as far as I can tell.
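
For the curious, peeking inside is harmless. The sketch below uses a throwaway directory (an assumption for the demo, not the `tmp_gitdir` from the slide) and only reads files; it does not explain the magic, it just shows where it lives.

```bash
# throwaway repository just for looking around
repo="$(mktemp -d)"
cd "$repo"
git init -q

# the pieces of the hidden .git directory (objects, refs, config, ...)
ls .git
# HEAD is a small text file naming the branch you are on
cat .git/HEAD
# ask git where the repository data lives
git rev-parse --git-dir
```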

---

### Basic git workflow

* Make modifications to files. Save files.
* `git status`
  * shows what files have changed since your last commit
* `git add` 
  * tell git which files to add to this commit
* `git commit -m "I fixed so many things!"`
  * make a commit with a message `-m`
* `git push origin master`
  * `push` the `master` branch to the `origin` remote

???

These are the git commands you will use 90-95% of the time (unless you 
are heavily involved in collaborative production of software). We will talk 
through each of these steps in depth and what they mean.

---

### Produce some files

```bash
# make a README.md file from the command line
echo "## My first repository" >> README.md
echo "This is just a toy repository for demonstration." >> README.md

# make a silly bash script
echo "#! /bin/bash" >> silly_shell.sh
echo "echo 'Hello world'" >> silly_shell.sh

# check what has changed
git status
```

```
## On branch master
## 
## No commits yet
## 
## Untracked files:
##   (use "git add <file>..." to include in what will be committed)
## 	README.md
## 	silly_shell.sh
## 
## nothing added to commit but untracked files present (use "git add" to track)
```

???

The `echo` commands here are making two files: `README.md` and `silly_shell.sh`.
These are just meant to provide examples of files you might create for a 
coding project.

We use `git status` to list what has changed since our last commit.

At this point all we see is that there are "Untracked files" in the `tmp_gitdir`
directory. This means that there are files in the directory that we have not
yet told `git` that we want to track.

---

### Add files to a commit

```bash
# add the README to the commit
git add README.md
git status
```

```
## On branch master
## 
## No commits yet
## 
*## Changes to be committed:
##   (use "git rm --cached <file>..." to unstage)
## 	new file:   README.md
## 
## Untracked files:
##   (use "git add <file>..." to include in what will be committed)
## 	silly_shell.sh
```

We have now .gold[staged] `README.md` to be committed.

???

Staging just means that we are getting ready to make a commit, but nothing
has yet been committed.
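
A small sketch of staging and un-staging, run in a scratch directory so it does not interfere with `tmp_gitdir`. The `staged` variable is only there to show what was staged at that moment.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
echo "## My first repository" > README.md
echo "echo 'Hello world'" > silly_shell.sh

# stage one file only
git add README.md
# list what is staged for the next commit
staged="$(git diff --cached --name-only)"
echo "staged: $staged"

# changed my mind: unstage it again (the file itself is untouched)
git rm -q --cached README.md
git status --short
```

After the last step both files should be back to untracked, and nothing has been committed at any point.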

---

### Make a commit

```bash
# make a commit
git commit -m "added README to repo"
```

```
## [master (root-commit) 2e4453f] added README to repo
##  Committer: dbenkes <dbenkes@bioc02wg0k1hx8f.wireless.emory.edu>
## Your name and email address were configured automatically based
## on your username and hostname. Please check that they are accurate.
## You can suppress this message by setting them explicitly. Run the
## following command and follow the instructions in your editor to edit
## your configuration file:
## 
##     git config --global --edit
## 
## After doing this, you may fix the identity used for this commit with:
## 
##     git commit --amend --reset-author
## 
##  1 file changed, 2 insertions(+)
##  create mode 100644 README.md
```

* `-m` provides a message as to what the commit contains
  * __try__ to make these messages meaningful
* the commit is assigned a unique alpha-numeric code

???

Without `-m`, git will spawn a text editor and force you to type something.

You can use wildcards when staging files:
* `git add *` will add all files (except those starting with `.`)
* `git add *.R` will add all files with `.R` extension
* etc...

Try to use frequent small commits. If you are constantly committing ten 
different files with changes that accomplish ten different goals, then it
sort of defeats the purpose of versioning.

It's difficult in the midst of big coding projects to always use meaningful 
commit messages, but it is worth it to try your hardest.
  * If you dig through my commit history, you may find lots of `blah`, `fix`,
  `fix again`, `fix for the last time?` messages -- this is not good practice. 
  Do as I say, not as I do...
  * If you can tolerate adult language, [Commit Logs from Last Night](http://www.commitlogsfromlastnight.com/) is good for a few laughs.

Again, `git` is most useful for storing code. Commit the source code, but 
consider whether you really need to commit derived files (images, pdfs, 
etc...). It's OK to do so, but generally "frowned upon".

---

### Viewing history of commits

A history of commits can be viewed using `git log`.

```bash
# view history of commits
git log
```

```
## commit 2e4453fe27a1d6d6cf8afb58c0bcfb175e48f574
## Author: dbenkes <dbenkes@bioc02wg0k1hx8f.wireless.emory.edu>
## Date:   Thu Aug 18 10:10:10 2022 -0400
## 
##     added README to repo
```

???

This can be useful for retrieving the commit hash.
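
If all you want is the identifier, a few variations on `git log` may be handier than the full output. A minimal sketch in a scratch repository (the name and email are set locally only so the commit succeeds):

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "janedoe@emory.edu"
echo "## My first repository" > README.md
git add README.md
git commit -q -m "added README to repo"

# one line per commit: short hash, then the message
git log --oneline
# just the short hash of the most recent commit
git rev-parse --short HEAD
# which files that commit touched
git show --stat HEAD
```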

---

class: center, middle
background-color: #006c5b
background-image: url(honnold.jpg)
background-size: contain

.pull-left[.transparentbox[.small[*Using a commit is like using anchors when climbing to catch you if you fall. Commits play a similar role: if you make a mistake, you can't fall past the previous commit. Coding without commits is like free-climbing: you can travel much faster in the short-term, but the long-term chances of catastrophic failure are high! But you want to be judicious in your use of commits. Committing too frequently will slow your progress; use more commits when you're in dangerous territory. Commits are also helpful to others, because they show your journey, not just the destination.*
<div> - Hadley Wickham (not pictured)]]
]

---

### `.gitignore`

A `.gitignore` file tells `git` which files you __do not__ want to track.

```bash
# make a sandbox folder
mkdir sandbox
# add a (blank) file to it
touch sandbox/a_test_file.sh
# make a .gitignore file that ignores a sandbox folder
echo "sandbox/*" >> .gitignore
# check status of repo 
git status
```

```
## On branch master
## Untracked files:
##   (use "git add <file>..." to include in what will be committed)
## 	.gitignore
## 	silly_shell.sh
## 
## nothing added to commit but untracked files present (use "git add" to track)
```

* Note that `sandbox/` and contents are not detected by `git`.
  * but `.gitignore` itself is

???

`.gitignore` files are useful for several purposes:
* ignoring derived output files (e.g., images)
* including a `sandbox` directory with embarrassing code that I don't want others to see
* ignoring stupid files that your computer makes for reasons I don't understand (like `.DS_Store` files that Macs love to produce for some reason)

I think it's a good idea to track your .gitignore file as part of your repository.
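
Here is a sketch of a slightly fuller `.gitignore` covering the three cases above. The file names (`figure.png`, `analysis.R`) are made up for the demo. `git check-ignore -v` reports which rule is responsible for ignoring a path, which helps when a file is mysteriously missing from `git status`.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
mkdir sandbox
touch sandbox/a_test_file.sh figure.png .DS_Store analysis.R

# one pattern per line: a folder, a file type, and a Mac system file
printf '%s\n' 'sandbox/*' '*.png' '.DS_Store' > .gitignore

# ask git which rule causes each path to be ignored
git check-ignore -v sandbox/a_test_file.sh figure.png .DS_Store
# only .gitignore and analysis.R should show up as untracked
git status --short
```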

---

### Breakout exercise

* Add `silly_shell.sh` and commit.
* Add another line to `README.md`.
  * Either use `echo ... >> README.md` or edit by hand
* Stage `README.md` for commit, check `git status`, commit the change.
* Rename `silly_shell.sh` to `my_silly_shell.sh` (hint `git mv`).
* Stage `my_silly_shell.sh` for commit, check `git status`, commit the change. 
* Remove `my_silly_shell.sh` from repository (hint `git rm`).
* Stage commit, check `git status`, commit

???

The goal here is to get you familiar with what commits can look like. We've 
seen what it looks like when we add a file, now we're seeing what it looks like when
* file is modified
* file is renamed
* file is deleted

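Note that plain `rm` and `mv` alone are __not sufficient__ for removing or renaming files 
in a git repo: the change still has to be staged. `git rm` and `git mv` change the file and stage the change in one step.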

---

### Branching

* A key feature of __git__ is the ability to develop __branches__.
  * Keep working code available on GitHub
  * Try something new that might break things
  * When working, __merge__ the code back to main branch

```bash
# make a branch called devel
git branch devel
# checkout that branch
git checkout devel
# see all branches
git branch
```

```
## Switched to branch 'devel'
## * devel
##   master
```

???

A shortcut to create and checkout a branch is `git checkout -b devel`
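
Newer versions of git (2.23 or later, as far as I know) also have `git switch`, which does the branch-changing part of `checkout` and nothing else. A sketch in a scratch repository; the `symbolic-ref` line is only there so the first branch is called `master` as on the slides, whatever your default is.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Jane Doe"
git config user.email "janedoe@emory.edu"
echo "## My first repository" > README.md
git add README.md
git commit -q -m "added README to repo"

# create devel and move onto it in one step (like git checkout -b devel)
git switch -q -c devel
git branch --show-current
# and back again (like git checkout master)
git switch -q master
git branch --show-current
```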

---

### Branching

On the `devel` branch, we can modify files in the repository.

```bash
# add a line to README and commit
echo "Some cool new info for the README!" >> README.md
git add README.md 
git commit -m "update the README"
```

```
## [devel e535530] update the README
##  Committer: dbenkes <dbenkes@bioc02wg0k1hx8f.wireless.emory.edu>
## Your name and email address were configured automatically based
## on your username and hostname. Please check that they are accurate.
## You can suppress this message by setting them explicitly. Run the
## following command and follow the instructions in your editor to edit
## your configuration file:
## 
##     git config --global --edit
## 
## After doing this, you may fix the identity used for this commit with:
## 
##     git commit --amend --reset-author
## 
##  1 file changed, 1 insertion(+)
```

```bash
# look at contents of README
cat README.md
```

```
## ## My first repository
## This is just a toy repository for demonstration.
*## Some cool new info for the README!
```

---

### Branching

When we switch back to the `master` branch, changes are "gone."

```bash
# switch back to master
git checkout master
```

```
## Switched to branch 'master'
```

```bash
# look at contents of README
cat README.md
```

```
## ## My first repository
## This is just a toy repository for demonstration.
```

---

### Merging

* When we're satisfied with changes, we `merge` branches.

```bash
# **!! on the master branch !!**
git merge devel
```

```
## Updating 2e4453f..e535530
## Fast-forward
##  README.md | 1 +
##  1 file changed, 1 insertion(+)
```

```bash
cat README.md
```

```
## ## My first repository
## This is just a toy repository for demonstration.
*## Some cool new info for the README!
```

---

### Branching

* Branches are a name for a __particular commit__ and its __ancestors__

.pull-left[
.center[
  .monobox[.left[
      A---C---E (master)
      <br> |
      <br> B---D------F (new-idea)
    ]
  ]
]

```bash
git checkout master
git merge new-idea
```
.center[
  .monobox[.left[
      A---C---E---G (master)
      <br> | &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; /
      <br> B---D------F (new-idea)
    ]
  ]
]

]

.pull-right[
.bottom-col-pad[
* `master` = `A`, `C`, `E`
* `new-idea` = `A`, `B`, `D`, `F`
]

* `master` = `A`, `B`, `C`, `D`, `E`, `F`, `G`
* `new-idea` = `A`, `B`, `D`, `F`
]

---

### Branching

* You can continue developing on each of these branches.

.center[
  .monobox[.left[
      A---C---E---G---H---I (master)
      <br> | &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; /
      <br> B---D------F---J---K (new-idea)
    ]
  ]
]
  * `master` = `A`, `B`, `C`, `D`, `E`, `F`, `G`, `H`, `I`
  * `new-idea` = `A`, `B`, `D`, `F`, `J`, `K`

???

Branching can get as complex as you want/need. Helpful to remember that a branch is just a history of commits.
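
To convince yourself of this, the first diagram (through the merge) can be rebuilt in a scratch repository. This is a sketch: `mk` is a made-up helper that makes one commit per letter, and the merge commit plays the role of `G` even though git gives it its own message.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Jane Doe"
git config user.email "janedoe@emory.edu"

# helper (hypothetical): one commit per letter, each adding its own file
mk() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

mk A
git checkout -q -b new-idea
mk B; mk D; mk F
git checkout -q master
mk C; mk E
# histories have diverged, so this creates a merge commit (G in the diagram)
git merge -q --no-edit new-idea

# draw the history of every branch
git log --graph --oneline --all
# count the commits reachable from each branch
git rev-list --count master
git rev-list --count new-idea
```

The two counts should come out as 7 and 4, matching the lists on the slide.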

---

### Merge conflicts

* What if commits on different branches affect the same code?

```bash
git checkout devel
# replace a word in README
sed -i "" 's/cool/great/g' README.md
git add README.md
git commit -m "great info, not cool info."
```

```
## Switched to branch 'devel'
## [devel 490abd2] great info, not cool info.
##  Committer: dbenkes <dbenkes@bioc02wg0k1hx8f.wireless.emory.edu>
## Your name and email address were configured automatically based
## on your username and hostname. Please check that they are accurate.
## You can suppress this message by setting them explicitly. Run the
## following command and follow the instructions in your editor to edit
## your configuration file:
## 
##     git config --global --edit
## 
## After doing this, you may fix the identity used for this commit with:
## 
##     git commit --amend --reset-author
## 
##  1 file changed, 1 insertion(+), 1 deletion(-)
```

```bash
cat README.md
```

```
## ## My first repository
## This is just a toy repository for demonstration.
## Some great new info for the README!
```

???

The most common place I run into merge conflicts is in collaborative projects, where it happens all the time.

The `sed -i ""` form shown here is the one macOS (BSD `sed`) needs. With GNU `sed` (most Linux systems), drop the empty string: `sed -i 's/cool/great/g' README.md`.

---

### Merge conflicts

```bash
git checkout master
sed -i "" 's/cool/excellent/g' README.md
git add README.md
git commit -m "excellent info, not cool info."
```

```
## Switched to branch 'master'
## [master f57f7fb] excellent info, not cool info.
##  Committer: dbenkes <dbenkes@bioc02wg0k1hx8f.wireless.emory.edu>
## Your name and email address were configured automatically based
## on your username and hostname. Please check that they are accurate.
## You can suppress this message by setting them explicitly. Run the
## following command and follow the instructions in your editor to edit
## your configuration file:
## 
##     git config --global --edit
## 
## After doing this, you may fix the identity used for this commit with:
## 
##     git commit --amend --reset-author
## 
##  1 file changed, 1 insertion(+), 1 deletion(-)
```

```bash
cat README.md
```

```
## ## My first repository
## This is just a toy repository for demonstration.
## Some excellent new info for the README!
```
* There is now a conflict between `devel` and `master`!

---

### Merge conflicts

* Merging `devel` into `master` results in an error.

```bash
git merge devel
```

```
## Auto-merging README.md
## CONFLICT (content): Merge conflict in README.md
## Automatic merge failed; fix conflicts and then commit the result.
```

---

### Merge conflicts

* text between `<<<<<<<` and `=======` is `master` code
* text between `=======` and `>>>>>>>` is `devel` code

```bash
cat README.md
```

```
## ## My first repository
## This is just a toy repository for demonstration.
## <<<<<<< HEAD
## Some excellent new info for the README!
## =======
## Some great new info for the README!
## >>>>>>> devel
```

???

This occurs because both branches edited the same code. What happens now is git edits all files that have merge conflicts and indicates where the conflicts occur using the `<<<` `===` `>>>` symbols.

You can search files on these symbols to quickly identify all merge conflicts.
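
One way to do that search, sketched in a scratch repository that recreates the same kind of conflict (files are rewritten with `printf` instead of `sed` to sidestep the Mac/Linux difference). The `|| true` is only there so a script keeps going after the failed merge.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Jane Doe"
git config user.email "janedoe@emory.edu"

printf '%s\n' "## My first repository" "Some cool new info for the README!" > README.md
git add README.md && git commit -q -m "add README"

git checkout -q -b devel
printf '%s\n' "## My first repository" "Some great new info for the README!" > README.md
git commit -q -am "great info, not cool info."

git checkout -q master
printf '%s\n' "## My first repository" "Some excellent new info for the README!" > README.md
git commit -q -am "excellent info, not cool info."

# the merge stops with a conflict
git merge devel || true

# files that still have unresolved conflicts
git diff --name-only --diff-filter=U
# line numbers of the conflict markers
grep -n -E '^(<<<<<<<|=======|>>>>>>>)' README.md
```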

---

### Merge conflicts

* We need to edit `README.md` to resolve conflicts.

```bash
# keep the devel version: remove lines 3-5 and 7
sed -i "" '3,5d;7d' README.md
cat README.md
```

```
## ## My first repository
## This is just a toy repository for demonstration.
## Some great new info for the README!
```

* Now we can successfully complete the merge.

```bash
git add README.md
git commit -m "fixed merge conflicts"
```

```
## [master b6595d4] fixed merge conflicts
##  Committer: dbenkes <dbenkes@bioc02wg0k1hx8f.wireless.emory.edu>
## Your name and email address were configured automatically based
## on your username and hostname. Please check that they are accurate.
## You can suppress this message by setting them explicitly. Run the
## following command and follow the instructions in your editor to edit
## your configuration file:
## 
##     git config --global --edit
## 
## After doing this, you may fix the identity used for this commit with:
## 
##     git commit --amend --reset-author
```

???

Here, it looks like we decided that we liked the `devel` version of `README.md` better, so we delete the `master` code (and the `<<<`, `===`, `>>>` lines), add and commit the result.

Check out `git log --graph` to see how git is thinking about this.

---

### GitHub

[GitHub](https://github.com) is an __online host__ for git repositories.

Publish your software/code in a truly __open source__ way. 
* E.g., not behind .red[journal paywalls]

GitHub is a __de facto social network__ site for programmers.
* See what friends are working on, collaborate on projects

???

GitHub is a website that hosts git repositories. It has a nice graphical user interface for exploring repositories and collaborating. We'll learn how to do many git operations from the command line, but it's nice to know that many things can also be done in a web browser, for collaborators who may be scared of the command line.

GitHub provides an appealing way of publishing code associated with published projects. It gives you space to provide a nicely formatted README, make the code easily downloadable for anyone. You can even add a digital object identifier (e.g., through [Zenodo](https://zenodo.org/)).

Public and private repositories are both free on GitHub; paid plans add extra features rather than extra repositories. There are also free benefits for students through GitHub Education.

[Bitbucket](https://bitbucket.org) is an alternative host that also offers free private repositories.

---

### Why GitHub?

Online __backup__ for your code.

Provides an __appealing GUI__ for git. 
* Look through code and history
* Track issues, create to do lists, etc...

Community-oriented __collaborative__ coding projects.
* See a typo in someone's code? Fix it!
* Find a bug in someone's code? Fix it! (or at least report it)

???

Services like Dropbox provide continuous online backup of saved files. As we've said, git only records versions at the points where you commit. GitHub allows this backup to go online, i.e., if your computer blows up, you'll still be able to pull down your coding project.

Personally, I use git within Dropbox folders. I hardly ever use Dropbox to revert to saved versions of files, because I save files after every sentence I type out of habit. However, I very often use git revert when developing code.

---

### GitHub

* Create an account 
* Click + (upper right), New repository
* Give it a name and description
* .red[Do not initialize] with a `README`

```bash
# replace username with your user name and repo with 
# your repository name
git remote add origin https://github.com/username/repo
# push the repository
git push -u origin master
```

???

Here we are setting a `remote` for our repository, i.e., an online location that we (presumably) want to keep up to date with our local repository. 
* the `-u` option is only needed the first time you push; it sets the 
"upstream" remote branch
  * once it is set `git push` will assume that's where you want to `push`
* remotes can be viewed using `git remote -v`

The `git remote` command just adds lines to the `.git/config` file. 
* If you mess up, just edit that file

If you push over HTTPS, you may be prompted for credentials when you `git push`. GitHub no longer accepts your account password here, so you need a personal access token instead. It is worth setting up an `ssh key`, which is described in our video tutorials.

Technically, `origin` is just a nickname for the remote location. However, it is convention to call the original source of code (i.e., main directory from which code originated) the `origin`.
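
You can try all of this without a GitHub account. In the sketch below a bare repository on disk stands in for GitHub (that is an assumption for the demo; with GitHub the only difference is the URL).

```bash
# a bare repository on disk plays the role of GitHub here
remote="$(mktemp -d)/repo.git"
git init -q --bare "$remote"

repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Jane Doe"
git config user.email "janedoe@emory.edu"
echo "## My first repository" > README.md
git add README.md && git commit -q -m "added README to repo"

# register the remote under the nickname origin, then list remotes
git remote add origin "$remote"
git remote -v
# first push: -u records origin/master as the upstream of master
git push -q -u origin master
# afterwards git knows where a plain `git push` should go
git rev-parse --abbrev-ref 'master@{upstream}'
# the remote really is just an entry in .git/config
git config --get remote.origin.url
```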

---

### Issues and pull requests

* Bug in someone's code?
  * File an __issue__ 
  * But __PLEASE__ read documentation first!

* Even better: submit a __pull request__!
  * Fork the repo on GitHub
  * `git clone` to download to local machine
  * Modify code (read contributing guidelines first!)
  * `commit` changes
  * `git push` back to GitHub
  * Submit a [pull request](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests)

???

* First, find the repository on github and click "Fork". This will copy
the repository to your GitHub account. 
* Use `git clone https://github.com/username/repo` to download your forked
version of the repository to your local computer. 
  * `git clone` automatically sets this GitHub repo as `origin`
* Modify code on your local computer. Make sure it works. 
* `git add` and `git commit`. Make sure your commit messages are informative. Remember, now somebody else will definitely be looking at them.
* `git push` your changes to GitHub. 
* Go to *your* GitHub repo and click "Pull Requests" and "New pull request"

More in-depth description of this workflow can be viewed [here](https://happygitwithr.com/fork-and-clone.html). Of note:
* consider submitting a pull request from non-`master` branch
* this allows you to free up `master` to track the original repo remotely
  * i.e., set an upstream branch
  * if owner makes changes to original repo, you'll need to integrate those into your pull request branch. Easier to do that from a non-`master` branch

---

### Pull requests

Suggested workflow for testing out pull request:

```bash
# add friend's repo as a remote
git remote add friend https://github.com/friend/repo
# download friend's branch, but do not merge it yet
git fetch friend master
# view all local and remote branches
git branch -a
# checkout friend's remote branch
git checkout remotes/friend/master
# make a local branch based on friend's repo
git checkout -b friend
# test out the branch; make sure it works as expected
[...]
# checkout local master, merge, push
git checkout master
git merge friend
git push origin master
```

---

### `fetch` vs. `pull`

If you trust the PR code, you can merge directly on GitHub.
* Then need to update your local repository with the changes.

Two ways of updating local repository with PR-related code.
* `git fetch`
  * Download code separately from your local repository.
  * Have to explicitly `merge` into your local repository.

* `git pull`
  * `fetch` and `merge` at the same time
  * Faster, but .red[no chance to verify that the code works] before updating your local repository.

???

I use `git pull` when I'm working on the same repository from two different computers (e.g., my laptop and a remote computer) to keep code tracked between the two. If you have Dropbox, then this is not needed, as Dropbox will automatically sync.

Best practice is to use `git fetch` always.
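
What "look before you merge" can look like, sketched with a bare repository on disk standing in for GitHub and two working copies, `laptop` and `server` (all names are assumptions for the demo).

```bash
base="$(mktemp -d)"
cd "$base"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# the laptop makes the first commit and pushes it
git init -q laptop
cd laptop
git symbolic-ref HEAD refs/heads/master
git config user.name "Jane Doe"
git config user.email "janedoe@emory.edu"
echo "first line" > notes.txt
git add notes.txt && git commit -q -m "first line"
git remote add origin "$base/remote.git"
git push -q -u origin master

# a second copy (say, a remote computer) clones it
cd "$base"
git clone -q remote.git server

# back on the laptop, more work gets pushed
cd "$base/laptop"
echo "second line" >> notes.txt
git commit -q -am "second line"
git push -q

# the second copy is now behind; fetch downloads without touching local files
cd "$base/server"
git fetch -q origin
# commits we do not have yet
git log --oneline master..origin/master
# what they would change
git diff master origin/master
# merge once we are happy with what we saw
git merge -q origin/master
```

`git pull` would have done the last four commands' worth of work in one go, minus the chance to look.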

---

### PR workflow visualized

.center[
<iframe src="https://emory-my.sharepoint.com/personal/dbenkes_emory_edu/_layouts/15/Doc.aspx?sourcedoc={42c1835d-98d0-4b3a-a9a7-d115df0b605d}&amp;action=embedview&amp;wdAr=1.7777777777777777" width="800px" height="505px" frameborder="0">This is an embedded <a target="_blank" href="https://office.com">Microsoft Office</a> presentation, powered by <a target="_blank" href="https://office.com/webapps">Office</a>.</iframe>
]

---

### Useful git tricks

* Help! I have __too many commits__!
  * Use `--amend` to make work in progress (`WIP`) local commits.

```bash
# assume project starts in functional state
# make a small change to project
[...]
# commit current changes under WIP heading BUT DON'T PUSH!
git add * && git commit -m "WIP"
# make more changes
[...]
# amend past commit
git commit --amend --no-edit
# make more changes
[...]
# amend past commit
git commit --amend --no-edit
...
# push only after confirming everything works
git commit --amend -m "awesome new feature that fixes everything"
git push origin master
```

???

The benefit of this is that as far as git and GitHub are concerned, you just made a single commit. However, it is frowned upon to "rewrite git history"; that is, amending commits once they have been published to GitHub, because presumably somebody could have `pulled` that code already. However, adopting this workflow locally is OK.

I personally just maintain an embarrassingly large number of commits 🤷‍♂️.
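
To see that the amends really do collapse into one commit, here is the same workflow as a runnable sketch in a scratch repository, with a made-up `analysis.R` standing in for the project.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "janedoe@emory.edu"

# project starts in a functional state
echo "v1" > analysis.R
git add analysis.R && git commit -q -m "working state"

# small change, committed as WIP (not pushed)
echo "v2" > analysis.R
git add analysis.R && git commit -q -m "WIP"

# more changes, folded into the same WIP commit
echo "v3" > analysis.R
git add analysis.R && git commit -q --amend --no-edit

# everything works: give the commit its real message
git commit -q --amend -m "awesome new feature that fixes everything"

# still only two commits: the starting point and the amended one
git log --oneline
```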

---

### Useful git tricks

Help! I was in the middle of my `amends` and __everything broke__.
* Use `git reset --hard` to fall back
* Resets tracked files to their state at the most recent commit (uncommitted changes are lost)

```bash
# assume project starts in functional state
# make a small change to project
[...]
# commit current changes under WIP heading BUT DON'T PUSH!
git add * && git commit -m "WIP"
# make more changes
[...]
# amend past commit
git commit --amend --no-edit
# make more changes
[...]
# !!! UH OH EVERYTHING BROKE !!! #
git reset --hard
```
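
A runnable sketch of what `git reset --hard` does and does not touch, in a scratch repository with made-up file names. Be careful with this command on real work: changes to tracked files that were never committed cannot be recovered afterwards.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "janedoe@emory.edu"
echo "working code" > analysis.R
git add analysis.R && git commit -q -m "WIP"

# break something, and create a brand new file along the way
echo "broken code" > analysis.R
echo "scratch" > new_file.txt

# tracked files go back to the last commit...
git reset -q --hard
cat analysis.R
# ...but untracked files are left alone
git status --short
```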

---

### Useful git tricks

Help! I tried to `push` but got this error.

```
## To https://github.com/username/repo.git
##  ! [rejected]        master -> master (fetch first)
## error: failed to push some refs to 'https://github.com/YOU/REPO.git'
## hint: Updates were rejected because the remote contains work that you do
## hint: not have locally. This is usually caused by another repository pushing
## hint: to the same ref. You may want to first integrate the remote changes
## hint: (e.g., 'git pull ...') before pushing again.
## hint: See the 'Note about fast-forwards' in 'git push --help' for details.
```

* Your local branch and the branch on GitHub have __diverged__. 
  * `fetch`/`merge` or `pull` changes from GitHub then `push`

???

This can happen frequently when multiple people are pushing to the same location. This can be avoided by:
* being the first to push
* using branches judiciously 
* getting in the habit of `pull`ing every time you start to work, and `push`ing every time you finish

---

### Git tricks

Help! I need to `pull` but I have local work!
* E.g., I wanted to `push` but have __diverged__.

* Simplest case:
  * `git pull` will "just work", i.e., you haven't __diverged__

* More complicated case:
  * `commit` your local work
  * `fetch` remote branch, `merge`, resolve conflicts, `push`

* `git stash` can sometimes be used

???

This is a very common situation. Ideally, there will be no conflicts and you can just `pull` down the new repo without affecting your code at all. But remember `git pull` will try to `fetch` AND `merge` with a new commit. So you may get kicked into a text editor to enter a message explaining the "reason for the merge." Beyond that you may also have merge conflicts that will need to be resolved, as discussed previously.

`git stash` can sometimes be useful. I use it when I'm lazy and I don't really care about overwriting my local changes with what's in the repo. This happens often when I'm working on a remote computer and I've made trivial changes to scripts (like `chmod +x` or similar) that I've forgotten about. It's faster to type `git stash && git pull && chmod +x *` than to do something more clever.
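
Worth knowing: `git stash` does not throw your changes away, it shelves them, and `git stash pop` brings them back. A minimal sketch in a scratch repository (no remote here, so a comment marks where the `git pull` would go; `script.sh` is a made-up file).

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "janedoe@emory.edu"
echo "echo 'Hello world'" > script.sh
git add script.sh && git commit -q -m "add script"

# an uncommitted local change
echo "echo 'local tweak'" >> script.sh

# shelve it; the working tree now matches the last commit
git stash -q
git status --short

# ... git pull would go here ...

# bring the local change back
git stash pop -q
git diff --stat
```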

---

### Breakout exercise

Download <a href="https://raw.githubusercontent.com/benkeser/info550/master/lectures/06_versioncontrol/github_exercise.md" download>`github_exercise`</a>.
* With your breakout partner, follow the instructions.
* Determine who is User A and who is User B.
* Practice the workflows we've just learned!

---

background-color: #84754e
class: title-slide, center, inverse, middle

*Open source means everyone can see my stupid
mistakes.*
<div>

*Version control means everyone can see every stupid
mistake I’ve ever made.*
<div> - Karl Broman

???

If you store your code on GitHub, everyone can see everything there is and everything that ever was. You may be shy or embarrassed about your code, __but__ probably no one is looking and if they are, that is a good thing!