Introduction to Biostatistical Computing PHC 6937
GitHub
Zhiguang Huo (Caleb)
Monday Oct 10th, 2022
Outline
- Git
- Version control
- Basic Git commands
- GitHub
- Introduction
- GitHub commands
- Host R packages on GitHub
Git
- Git is a distributed version-control system for tracking changes in source code during software development.
- It is designed for coordinating work among programmers
- It can be used to track changes in any set of files.
Benefits of version control
- For yourself:
- Keep complete history of changes, and rationale for all changes
- Go back to a previous version of your code
- Support multiple versions of the same basic project
- Collaborative project:
- Simplifies concurrent work and merging changes
- You can back up your code remotely (on GitHub), where it can be easily shared
Basic Git commands
- Install Git
- Start Git
- Set up your user name and email (if this is your first time using Git)
- You only need to do this once
git config --global user.name "Sam Smith"
git config --global user.email sam@example.com
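To confirm what Git has recorded, the values can be read back. The sketch below is a minimal illustration that assumes Git is installed. It sets the identity inside a throwaway repository only (no --global), so it does not alter your real configuration; the temporary directory is an assumption made for the demo.

```shell
set -e
repo=$(mktemp -d)                 # throwaway directory, demo only
cd "$repo"
git init -q

# Without --global, the setting applies to this repository only
git config user.name "Sam Smith"
git config user.email sam@example.com

git config user.name              # prints: Sam Smith
git config user.email             # prints: sam@example.com
```

A repository-level value overrides the global one, which can be handy when one project needs a different email address.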
Git basics
Git basics (The three states)
- The Git directory (repository)
- stores the metadata and object database for your project.
- what is copied when you clone a repository from another computer.
- holds the snapshots (commits) of your project history, which you can go back to.
- The working directory
- a single checkout of one version of the project.
- usually pulled from the Git directory for you to use and modify
- The staging area
- stores information about what will go into your next commit
- sometimes referred to as the index
- intermediate step between the working directory and the Git directory
Git workflow
- Modify files in your working directory.
- Stage the files, adding snapshots of them to your staging area (index).
- Do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.
Initialize a git repository
- initialize git on your local computer
cd Desktop
mkdir testGit
cd testGit
git init
- initialize on GitHub and clone to your PC
- We will revisit this method shortly
- This is the recommended way to connect to GitHub
The repository
- The repository contains everything about your project
- code
- data
- documentation
- .git, which is a hidden folder
- At any time,
- You can make some changes, take a snapshot of your project, and commit to the repository (called a commit object)
- you can revisit the snapshot at any time
Make changes in the git repository
- Your changes happen in your working directory (not yet staged; brand-new files are untracked)
- Index/Stage your changes:
git add XXX
- Commit the indexed/staged changes to the git repository
git commit -m "your message"
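The modify, stage, commit cycle can be watched step by step with git status. This is a minimal sketch, assuming Git is installed; the temporary directory, the file name analysis.R, and the repository-local identity are hypothetical choices for the demo.

```shell
set -e
repo=$(mktemp -d)                        # throwaway directory, demo only
cd "$repo"
git init -q
git config user.name "Sam Smith"         # repository-local identity for the demo
git config user.email sam@example.com

echo "x <- 1" > analysis.R               # 1. modify: a new file in the working directory
git status --short                       # shows "?? analysis.R" (untracked)

git add analysis.R                       # 2. stage the file
git status --short                       # shows "A  analysis.R" (staged)

git commit -q -m "add analysis script"   # 3. commit the staged snapshot
git status --short                       # prints nothing: the working directory is clean
```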
Example of making changes in a Git repository (using R code as an example)
WD <- '~/Desktop'
setwd(WD)
usethis::create_package("GatorPKG", open = FALSE) ## open = FALSE prevents R from opening a new RStudio session.
WD2 <- '~/Desktop/GatorPKG'
setwd(WD2)
- Put f.R in the R folder
##' Add up two numbers (Description)
##'
##' We want to add up two numbers, blalala... (Details)
##' @title add two numbers
##' @param x first number
##' @param y second number
##' @return sum of two numbers
##' @author Caleb
##' @export
##' @examples
##' f(1,2)
f <- function(x, y) x + y
Example of making changes in a Git repository (Git part)
cd ~/Desktop/GatorPKG
git init
- Check the status of files in the working directory
git status
git add -A ## -A stages all your files
git status
git commit -m "first commit"
git status
git status
- Tells what Git thinks is going on
- Do this frequently!
- before staging your workspace
- after staging your workspace, before committing
- after committing
git add
git add newFile
git add newFile1 newFile2 newFile3
git add -A
git commit
- committing makes a snapshot of everything that has been staged in your repository
- a short message is required
git commit -m "any message you want to make here"
- The message will be helpful for future you and your teammates
- If you don’t type in a message, an editor (usually vi) will open for you to enter it. Vi is hard for beginners to use. See https://kb.iu.edu/d/afcz for how to quit vi, just in case.
- After you commit your changes, Git creates a commit object
- You can view all commit objects by
git log
- git log lists the SHA1 number of each commit, which you can use to revisit a previously committed snapshot
git checkout
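Go to a previous snapshot
git checkout SHA1number
- SHA1 number: the identifier of a commit, computed with Secure Hash Algorithm 1
- you can give the full 40-character SHA1 number
- or an unambiguous prefix (the first 6 or 7 characters are usually enough)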
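Revisiting an earlier snapshot by its abbreviated SHA1 number might look like the sketch below. It assumes Git is installed; the temporary directory, the file notes.txt, and the use of git rev-parse --short to capture the abbreviated hash are choices made for this demo.

```shell
set -e
repo=$(mktemp -d)                     # throwaway directory, demo only
cd "$repo"
git init -q
git config user.name "Sam Smith"      # repository-local identity for the demo
git config user.email sam@example.com

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "first version"
git branch -M main                    # make sure the branch is called main
first=$(git rev-parse --short HEAD)   # abbreviated SHA1 number of the first commit

echo "version 2" > notes.txt
git commit -q -am "second version"

git log --oneline                     # two commits, newest first
git checkout -q "$first"              # go to the first snapshot (detached HEAD)
cat notes.txt                         # prints: version 1
git checkout -q main                  # return to the newest commit on main
cat notes.txt                         # prints: version 2
```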
More on git
- Git stores each commit as a snapshot, but unchanged files are stored only once and objects are compressed, so frequent commits won’t waste space
- Make commits often
- We will revisit more Git commands later
- branching: have multiple versions of your software
- diff: compare two commits
- revert
So far, local version control
Distributed version control
- GitHub
- GitHub provides free public and private repositories.
- GitHub provides individual accounts and organizational accounts.
- Academic users (you need to apply) can get additional benefits through GitHub Education.
- GitHub recommends keeping each repository under 1 GB.
- https://github.com
- GitHub is commonly used when you prepare R packages for CRAN or Bioconductor
- BitBucket
- Similar to GitHub, and provides free private repositories.
Connect the local repository with GitHub
- Benefit
- You can work anywhere with any computer, as long as you can pull/fetch your project from the remote.
- You and your teammates can work on the same project from the same remote.
Set up GitHub and connect with your local repository (1)
- Create a new repository on GitHub (GatorPKG3).
Set up GitHub and connect with your local repository (2)
- clone to your local computer (e.g., desktop)
cd ~/Desktop
git clone https://github.com/Caleb-Huo/GatorPKG3.git
Use this approach to initialize your repository so that it is connected to GitHub from the start
Set up GitHub and connect with your local repository (3)
Put your code in the GatorPKG3 package
- You should have at least the following files or folders:
Set up GitHub and connect with your local repository (4)
- At the directory of your local repository (e.g., ~/Desktop/GatorPKG3), do the following in the terminal
git add -A
git commit -m "this is an R package"
git log
- Push your changes in local repository to GitHub
git push
- Refresh your GitHub page to see if the changes are made on GitHub
Exercise: put the R package on GitHub
devtools::document()
- add, commit, and push back to GitHub
git add -A
git commit -m "1st R package"
git push
git log
You will see a SHA1 number (Secure Hash Algorithm 1); this is the identifier of a commit object
- Install the package from GitHub in R
devtools::install_github("Caleb-Huo/GatorPKG3")
library(GatorPKG3)
f(1,2)
?f
Your turn
Goals:
- make an R package on your GitHub repository
- install this package by devtools::install_github
Add README.md on GitHub
- md stands for Markdown
- .md is very similar to .Rmd
- except it won’t evaluate R code
- Below is a toy example of README.md
# testGatorPKG
This is a fancy R package
## how to install the R package
devtools::install_github("Caleb-Huo/GatorPKG3")
## example
f(1,2)
- You can add it locally, and push back to GitHub
- Or you can directly edit on the remote, then fetch/pull to the local repository
More on the remote
Two ways to get your PC updated with the latest remote repository
- If you already have a copy of the remote repository on your computer
- If you don’t have a copy of the remote repository on your computer
Keep updated with the remote
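- Update your local repository
git fetch ## download new commits from the remote without changing your working files
git pull ## fetch, then merge the remote changes into your current branch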
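The difference between the two commands can be tried without a network connection. In the sketch below, a local bare repository stands in for GitHub, and two clones named alice and bob stand in for two computers; all of these names, the temporary directory, and the --ff-only option are assumptions made for the demo. It assumes Git is installed.

```shell
set -e
work=$(mktemp -d)                       # throwaway directory, demo only
cd "$work"
git init -q --bare remote.git           # local stand-in for a GitHub repository

git clone -q remote.git alice 2>/dev/null   # cloning an empty repository prints a warning
cd alice
git config user.name "Sam Smith"        # repository-local identity for the demo
git config user.email sam@example.com
echo "v1" > README.md
git add README.md
git commit -q -m "first commit"
git push -q -u origin HEAD              # publish the current branch to the remote
cd ..

git clone -q remote.git bob             # the second "computer"

cd alice                                # a new commit appears on the remote
echo "v2" > README.md
git commit -q -am "update README"
git push -q

cd ../bob
git fetch -q                            # download only; working files are untouched
cat README.md                           # still prints: v1
git rev-list --count "HEAD..@{u}"       # prints 1: one new commit is waiting
git pull -q --ff-only                   # fetch and bring the current branch up to date
cat README.md                           # now prints: v2
```

Running git fetch first is a safe way to see what changed on the remote before deciding to merge it.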
Clone an existing remote repository to your local PC
git clone https://github.com/cran/mclust.git
After some changes of this package
If you are the owner/team member of the origin remote repository, you are able to push back.
If you are not the owner, you won’t be able to push back
Fork a repository on your GitHub account
- Click on Fork, the repository will be copied to your GitHub account
- Then you can do anything you want on the forked repository, under their license
Other files in the repository
ls -a ## list all files, including hidden ones
Git whole picture
- We have talked about initializing, updating, and making changes
- We will talk about branching before revert and diff
Create another branch
- By default, we are on the main branch
- Now we want to create some new features on the develop/feature branch
- Create a new branch and switch to it:
git checkout -b <branchname>
- Switch from one branch to another:
git checkout <branchname>
- List all branches and show which one you are on:
git branch
- Delete a branch:
git branch -d <branchname>
- Push the branch to the remote
git push origin <branchname>
Git branches
- main
- hotfixes
- release branches
- develop
- feature branches
Exercise for creating another branch
git checkout -b vignette
usethis::use_vignette("GatorPKG3") ## run this line in R, not in the terminal
git add -A
git commit -m "include vignette"
- Check all available branches (your current branch has a * on it)
git branch
- After going back to main, the edits on the vignette branch do not exist in main
git checkout main
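The exercise can be rehearsed in a throwaway repository before touching the real package. This is a minimal sketch, assuming Git is installed; the temporary directory and the files DESCRIPTION and vignettes/intro.Rmd are hypothetical stand-ins for what usethis would generate.

```shell
set -e
repo=$(mktemp -d)                     # throwaway directory, demo only
cd "$repo"
git init -q
git config user.name "Sam Smith"      # repository-local identity for the demo
git config user.email sam@example.com

echo "Package: demo" > DESCRIPTION
git add -A
git commit -q -m "first commit"
git branch -M main                    # make sure the branch is called main

git checkout -q -b vignette           # create the vignette branch and switch to it
mkdir vignettes
echo "draft" > vignettes/intro.Rmd    # stand-in for the generated vignette
git add -A
git commit -q -m "include vignette"
git branch                            # lists main and vignette; * marks vignette

git checkout -q main
ls                                    # only DESCRIPTION: the vignette lives on its own branch
```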
Git revert
Go back to the previous snapshot by SHA1 number
git checkout -b old-state f0d8506
git branch
- To go back to where you were, just check out the branch you were on again.
git checkout main
Delete commits
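Deleting commits is not recommended.
If you want to try it, do so in an experimental branch.
- Undo commits but keep their changes staged
- You still keep all the files
git reset --soft HEAD@{1} ## HEAD@{1}: where HEAD pointed before its most recent move
git reset --soft deb4555 ## or name the target commit by its SHA1 number
git reflog ## list the recent positions of HEAD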
- Delete commits in a hard way
git reset --hard 2278f51
Be cautious with git reset --hard: uncommitted changes are lost for good, although discarded commits can usually still be recovered through git reflog.
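The two kinds of reset, and recovery of a discarded commit, can be tried safely in a throwaway repository. The sketch below assumes Git is installed. It uses HEAD~1 (the parent of the current commit) rather than the SHA1 numbers shown above; the temporary directory and the file log.txt are hypothetical.

```shell
set -e
repo=$(mktemp -d)                     # throwaway directory, demo only
cd "$repo"
git init -q
git config user.name "Sam Smith"      # repository-local identity for the demo
git config user.email sam@example.com

for i in 1 2 3; do                    # three commits, one line added each time
  echo "line $i" >> log.txt
  git add log.txt
  git commit -q -m "commit $i"
done

git reset -q --soft HEAD~1            # undo "commit 3"; its change stays staged
git status --short                    # shows "M  log.txt"
git commit -q -m "commit 3, redone"
tip=$(git rev-parse HEAD)             # remember the newest commit

git reset -q --hard HEAD~1            # discard the newest commit and its changes
cat log.txt                           # only "line 1" and "line 2" remain
git reflog -n 3                       # the discarded commit is still listed here
git reset -q --hard "$tip"            # recover it by its SHA1 number
cat log.txt                           # "line 3" is back
```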
Compare with another commit object
- compare changes in the working directory that are not yet staged with the staging area (use git diff HEAD to compare with the last commit)
git diff
- compare a specific file in the working directory with the last commit
git diff HEAD -- <filename>
- compare two commits
git diff aSHA1 bSHA1
- compare two branches
git diff <sourcebranch> <targetbranch>
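Which changes each form of git diff reports depends on what has been staged. The following sketch, assuming Git is installed, puts one change in the staging area and one in the working directory only; the temporary directory and the file f.txt are hypothetical.

```shell
set -e
repo=$(mktemp -d)                     # throwaway directory, demo only
cd "$repo"
git init -q
git config user.name "Sam Smith"      # repository-local identity for the demo
git config user.email sam@example.com

echo "a" > f.txt
git add f.txt
git commit -q -m "first commit"

echo "b" >> f.txt
git add f.txt                         # "b" is staged
echo "c" >> f.txt                     # "c" is only in the working directory

git diff                              # working directory vs staging area: shows +c
git diff --staged                     # staging area vs last commit: shows +b
git diff HEAD                         # working directory vs last commit: shows +b and +c
```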
Git merge (when there are no conflicts)
- keep the merge history (--no-ff always creates a merge commit)
- If you want to merge branch B (vignette) into branch A (main)
- Go to branch A (main)
- merge --no-ff
git checkout main
git merge --no-ff vignette
git log
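A conflict-free merge can be rehearsed as sketched below, assuming Git is installed. The temporary directory and file names are hypothetical, and --no-edit is added here so that Git accepts the default merge message instead of opening an editor.

```shell
set -e
repo=$(mktemp -d)                     # throwaway directory, demo only
cd "$repo"
git init -q
git config user.name "Sam Smith"      # repository-local identity for the demo
git config user.email sam@example.com

echo "Package: demo" > DESCRIPTION
git add -A
git commit -q -m "first commit"
git branch -M main                    # make sure the branch is called main

git checkout -q -b vignette
echo "draft" > vignette.Rmd
git add -A
git commit -q -m "include vignette"

git checkout -q main
git merge -q --no-ff --no-edit vignette   # always record a merge commit
git log --oneline --graph             # the graph shows the branch joining main
git rev-parse HEAD^2                  # second parent of the merge: the tip of vignette
```

Without --no-ff, Git would simply fast-forward main here and no merge commit would appear in the history.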
Set an alias for a Git command
- my favorite colorful logs
git config --global alias.lg "log --color --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit --"
git lg
Git merge (when there are conflicts)
- make some changes on the main branch
- make some changes (with conflicts) on the vignette branch
- notice that after you switch to the vignette branch, the changes you added/committed on the main branch disappear from the working directory
git checkout main
git merge --no-ff vignette ## fail
git diff ## check the difference
## open the conflict file, resolve conflicts
git commit -a -m "conflict resolved"
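The whole conflict scenario can be reproduced in a throwaway repository. This is a minimal sketch, assuming Git is installed; the temporary directory and the file notes.txt are hypothetical, and the conflict is resolved here by overwriting the file, where in practice you would edit it by hand.

```shell
set -e
repo=$(mktemp -d)                     # throwaway directory, demo only
cd "$repo"
git init -q
git config user.name "Sam Smith"      # repository-local identity for the demo
git config user.email sam@example.com

echo "title: draft" > notes.txt
git add notes.txt
git commit -q -m "first commit"
git branch -M main                    # make sure the branch is called main

git checkout -q -b vignette           # change the line on the vignette branch
echo "title: vignette version" > notes.txt
git commit -q -am "edit on vignette"

git checkout -q main                  # change the same line on main
echo "title: main version" > notes.txt
git commit -q -am "edit on main"

# The merge stops with a non-zero exit status because both branches changed the same line
git merge --no-ff --no-edit vignette || echo "merge stopped: conflict"
git status --short                    # shows "UU notes.txt" (unmerged)
cat notes.txt                         # contains <<<<<<<, =======, >>>>>>> markers

echo "title: main and vignette version" > notes.txt   # resolve the conflict
git add notes.txt                     # mark the file as resolved
git commit -q -m "conflict resolved"  # conclude the merge
```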
Display html on github
- the html file won’t automatically show up in a webpage
- need the following trick
Summarize all important things about Git/GitHub
Reference:
- GitHub pro
- Summary of Git command
- Many contents are from the following resources:
Things you should have done, by the end of this week
- Install Python (>3.8)
- Install Anaconda