Version control
with Git and GitHub

Carlos Matos // ISPUP // November 2023

What is Git?

  • Application that runs on your computer
  • Helps you manage work done on projects
  • It’s like the Track Changes option in Word, but on steroids
  • Rigorous, powerful, scales up to multiple files (data, figures, reports, code)

Why use version control for data science?

  • Organizing and recording your work becomes part of your workflow, instead of a separate and burdensome task
  • Collaboration is more structured, with better support for asynchronous work and for managing versions
  • After the initial setup, the extra effort needed to put your project online is marginal
  • Online exposure
  • Works as a backup (safe, shareable and accessible from different places)

What is GitHub?

  • GitHub complements Git
    • Distribution mechanism for Git repositories
    • Git works locally, GitHub online
    • Control who has what type of access (view, edit, admin)
    • Easy to regularly reconcile files between contributors

GitHub features

  • Markdown integration
  • Issues let users of your code report problems they have found
  • Pull requests let people who want to add features to your work propose their changes
  • GitHub Pages to quickly host static websites (like the one where these slides are hosted!)
  • Easy to deploy packages or other functionality (e.g. some R packages are only available on GitHub, not on CRAN)

Git and GitHub jargon

  • repo or repository - your project folder
  • commit - save a snapshot to your repo
  • hash - a computer-generated ID for each commit
  • checkout - time travel to a specific commit
  • branch - a label that points to a commit
  • merge - combination of two or more branches
  • remote - a repo that exists in a different location (e.g. GitHub, other PC)
  • pull - get new commits from the remote into your local repo
  • push - send your new commits to the remote
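
The local terms above can be tried out in a throwaway folder. This is a minimal sketch, assuming Git is installed; the folder, the file name analysis.R, the identity and the branch name new_feature are placeholders chosen for illustration.

```shell
set -eu

# repo: a throwaway project folder turned into a repository
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main       # call the first branch "main" (works on old and new Git versions)
git config user.name "Demo User"            # placeholder identity, stored only in this repo
git config user.email "demo@example.org"

# commit: save a snapshot of the staged files
echo "x <- 1" > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"

# hash: the ID Git generated for that commit
first_hash="$(git rev-parse --short HEAD)"
echo "first commit: $first_hash"

# branch: a new label pointing at the current commit; new commits move it forward
git checkout -q -b new_feature
echo "y <- 2" >> analysis.R
git commit -q -am "Add y"

# checkout: go back to the commit that main points to
git checkout -q main

# merge: bring the work from new_feature into main
git merge -q new_feature

git log --oneline                           # both commits are now part of main
```

RStudio's Git pane runs commands of this kind for you; seeing them once in the terminal mainly helps to connect the jargon to what actually happens.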

Git and GitHub workflow

  • We will work almost exclusively with RStudio, not the command line!

How to revert to a previous commit?


gitGraph
    commit
    commit
    branch new_feature
    checkout new_feature
    commit
    commit
    checkout main
    merge new_feature
    commit id: "5-106eefa"
    commit
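  • Suppose that you made some mistakes in commit #6 and want to go back to the state of your repo at commit #5.
  • You would write in the terminal git reset 106eefa. This moves the branch back to commit #5 but leaves the changes from commit #6 in your files as uncommitted edits; use git reset --hard 106eefa to discard them as well.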
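
The effect of git reset can be checked safely in a throwaway repository. This is a minimal sketch, assuming Git is installed; the file report.txt and the two commits are made up for the demo, and the variable good plays the role of the hash 106eefa from the diagram.

```shell
set -eu

repo="$(mktemp -d)"                         # throwaway repo for the demo
cd "$repo"
git init -q
git config user.name "Demo User"            # placeholder identity, stored only in this repo
git config user.email "demo@example.org"

echo "good" > report.txt
git add report.txt
git commit -q -m "Commit 5: good state"
good="$(git rev-parse --short HEAD)"        # stands in for 106eefa

echo "mistake" >> report.txt
git commit -q -am "Commit 6: mistake"

# Plain reset: the branch points at commit 5 again, but the file on disk
# still contains the mistake, now shown as an uncommitted modification.
git reset -q "$good"
git status --short

# Hard reset: the files are restored to commit 5 as well.
# Careful: this throws away uncommitted changes.
git reset -q --hard "$good"
cat report.txt
```

If commit #6 was already pushed and shared with others, rewriting history this way can cause trouble for collaborators, so it is best kept for commits that only exist on your machine.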

Initial setup

  1. Register a free account on GitHub
  2. Install Git
  3. Introduce yourself to Git (username, email)
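# Run once if usethis is not installed yet
# install.packages("usethis")

library(usethis)

# Introduce yourself to Git (name and email are attached to every commit you make)
use_git_config(user.name = "Your Name", user.email = "email@example.org")

# Configure the default branch name for new repositories
usethis::git_default_branch_configure()

# Create a personal access token (opens GitHub in your browser)
# I highly recommend selecting the “repo”, “user”, and “workflow” scopes (the default ones)
usethis::create_github_token()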
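
For reference, the name, email and default branch settings roughly correspond to the plain Git commands below. This is a sketch under assumptions: it treats the usethis helpers as writing to your user-level Git configuration, and it points HOME at a temporary folder so that running the demo does not touch your real settings.

```shell
set -eu

# Sandbox for the demo: use a temporary HOME so your real Git settings are untouched.
# Remove these two lines to configure Git for real.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

# Name and email that will be attached to your commits (placeholders)
git config --global user.name "Your Name"
git config --global user.email "email@example.org"

# Default branch name for repositories created with git init
git config --global init.defaultBranch main

# Show what was stored
git config --global --list
```

The token step has no equivalent here, because the token is created on the GitHub website rather than by Git itself.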

Initial setup

  1. Tell RStudio where to find Git (Tools > Global Options > Git/SVN should point to your Git executable)
    1. usually something like “C:/Program Files/Git/bin/git.exe” on Windows or “/usr/bin/git” on macOS
  2. Create a project on GitHub. More detailed instructions available here
  3. On the GitHub repository, click on Code and copy the HTTPS URL
  4. After that, create a new RStudio project
    1. File > New Project > Version Control > Git, then paste the URL you copied
  5. If all went well, you should have a new folder with the same name as your GitHub repo
  6. If you already had a project, move the contents to the repo directory
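
Creating a project from version control in RStudio amounts, roughly, to a git clone. This is a minimal sketch, assuming Git is installed; a local bare repository named my-project.git (a hypothetical name) stands in for the GitHub repository, where you would normally use the HTTPS URL copied from the Code button.

```shell
set -eu

work="$(mktemp -d)"

# Stand-in for the repository on GitHub: an empty bare repo in a local folder
git init -q --bare "$work/my-project.git"

# Clone it; Git warns that the repository is empty, which is expected here
cd "$work"
git clone -q "$work/my-project.git" my-project

cd my-project
git remote -v        # "origin" points back to the place you cloned from
ls -a                # the new folder contains the hidden .git folder
```

The new folder takes its name from the repository, which is why the project folder ends up with the same name as the GitHub repo.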

Folder structure

  • A hidden .git folder exists in the project directory; it stores the repository history and should not be deleted
  • A .gitignore file will be present if you select the R .gitignore template on GitHub when first creating your repo (recommended)
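
What .gitignore does can be seen in a throwaway repository. This is a minimal sketch, assuming Git is installed; the three patterns are an illustrative subset of what an R-oriented .gitignore typically lists, not the full template, and script.R is a placeholder file.

```shell
set -eu

repo="$(mktemp -d)"                  # throwaway repo for the demo
cd "$repo"
git init -q

# Illustrative subset of an R-oriented .gitignore
printf '%s\n' '.Rhistory' '.RData' '.Rproj.user/' > .gitignore

# Two files that should be ignored and one that should be tracked
touch .Rhistory .RData script.R

git status --short                   # lists .gitignore and script.R only
git check-ignore .Rhistory .RData    # prints the paths that Git ignores
```

Ignored files never show up in the Git pane, which keeps session files and local settings out of your commits.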

Exercises

  • After moving the course scripts to the newly created project, stage the files, commit and push to GitHub.
  • Make some changes and commit again. Check the results in Git and on GitHub.
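
For comparison with the RStudio buttons, the exercise can be sketched in the terminal as follows. Assumptions: Git is installed, a local bare repository stands in for GitHub, and script.R with its contents is a placeholder for the course scripts.

```shell
set -eu

work="$(mktemp -d)"
git init -q --bare "$work/remote.git"              # local stand-in for the GitHub repo
git clone -q "$work/remote.git" "$work/project"    # Git warns that the repo is empty
cd "$work/project"
git symbolic-ref HEAD refs/heads/main              # call the first branch "main"
git config user.name "Demo User"                   # placeholder identity, stored only in this repo
git config user.email "demo@example.org"

# Exercise 1: stage, commit and push
echo 'print("hello")' > script.R                   # placeholder for the course scripts
git add script.R                                   # stage
git commit -q -m "Add course scripts"              # commit
git push -q -u origin main                         # push, and remember origin/main as the upstream

# Exercise 2: change something, commit again and push
echo 'print("changed")' >> script.R
git commit -q -am "Update script"
git push -q

# Check the results
git log --oneline                                  # two commits
git status -sb                                     # branch is in sync with origin/main
```

In RStudio the same steps are the Staged checkbox, the Commit button and the Push button in the Git pane.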