Biostatistical Computing, PHC 6068
GitHub
Zhiguang Huo (Caleb)
Monday September 30, 2019
Outline
- Git
- Version control
- Basic Git commands
- GitHub
- Introduction
- GitHub commands
- Host R packages on GitHub
Git
- Git is a distributed version-control system for tracking changes in source code during software development.
- It is designed for coordinating work among programmers.
- It can be used to track changes in any set of files.
Benefits of version control
- For yourself:
- Keep complete history of changes, and rationale for all changes
- Go back to a previous version of your code
- Support multiple versions of the same basic project
- Collaborative project:
- Simplifies concurrent work and merging changes
- You can back up your code remotely (on GitHub), where it can be easily distributed
Basic Git commands
- Install Git
- Start Git
- Windows: open MobaXterm
- Mac: open Terminal
- Set up your username and email (if this is your first time using Git)
- You only need to do this once
git config --global user.name "Sam Smith"
git config --global user.email sam@example.com
Git basics
Git basics (The three states)
- The Git directory (repository)
- stores the metadata and object database for your project.
- what is copied when you clone a repository from another computer.
- holds the snapshots (commits) of your project history, which you can go back to.
- The working directory
- a single checkout of one version of the project.
- usually pulled from the Git directory for you to use and modify
- The staging area
- stores information about what will go into your next commit
- sometimes referred to as the index
Git workflow
- You modify files in your working directory.
- You stage the files, adding snapshots of them to your staging area (index).
- You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.
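The three steps above can be traced in a throwaway repository. This is a minimal sketch, assuming Git is installed; the temporary directory, the file name notes.txt and the identity are illustrative only and are not part of the course material.

```shell
set -e
# Work in a throwaway directory so nothing on your machine is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity stored for this repository only (no --global), for illustration.
git config user.name "Sam Smith"
git config user.email sam@example.com

# 1. Modify the working directory: Git reports the new file as untracked.
echo "first line" > notes.txt
state_modified=$(git status --short)    # "?? notes.txt"

# 2. Stage the file: a snapshot of it is added to the index.
git add notes.txt
state_staged=$(git status --short)      # "A  notes.txt"

# 3. Commit: the staged snapshot is stored in the Git directory.
git commit -q -m "add notes"
state_committed=$(git status --short)   # empty: nothing left to commit

echo "after edit:   $state_modified"
echo "after add:    $state_staged"
echo "after commit: [$state_committed]"
```

In the short status format, the first column describes the staging area and the second the working directory, which is why the staged file shows `A` in the first column.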
Initialize a git repository
- initialize git on your local computer
- Not recommended if you plan to connect the repository to GitHub
cd Desktop
mkdir testGit
cd testGit
git init
- initialize on GitHub and clone to your PC
- We will revisit this item shortly
The repository
- The repository contains everything about your project
- code
- data
- documentation
- .git, which is a hidden folder
- At any time,
- You can make some changes, take a snapshot of your project, and commit to the repository (called a commit object)
- you can revisit the snapshot at any time
Make changes in the git repository
- Your changes: happen in your working directory (untracked)
- Index/Stage your changes:
git add XXX
- Commit the indexed/staged changes to the git repository
git commit -m "your message"
Example of making changes in a Git repository (using R code as an example)
WD <- '~/Desktop'
setwd(WD)
devtools::create("GatorPKG")
WD2 <- '~/Desktop/GatorPKG'
setwd(WD2)
- Put f.R in the R folder
##' Add up two numbers (Description)
##'
##' We want to add up two numbers, blalala... (Details)
##' @title add two numbers
##' @param x first number
##' @param y second number
##' @return sum of two numbers
##' @author Caleb
##' @export
##' @examples
##' f(1,2)
f <- function(x, y) x + y
Example of making changes in a Git repository (Git part)
- Initialize git (in git bash)
cd ~/Desktop/GatorPKG
git init
- Check the status of files in the working directory
git status
git add -A ## -A stages all your files
git status
git commit -m "first commit"
git status
git status
- Tells you what Git thinks is going on
- Do this frequently!
- before staging your workspace
- after staging your workspace, before committing
- after committing
git add
git add newFile
git add newFile1 newFile2 newFile3
git add -A
git rm --cached <file>... ## remove a file from the staging area but keep it on disk
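The staging commands above can be tried together in a scratch repository. This is a minimal sketch, assuming Git is installed; the file names a.txt, b.txt and scratch.txt are hypothetical.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

echo a > a.txt
echo b > b.txt
echo s > scratch.txt

git add a.txt b.txt               # stage the named files
git add -A                        # stage everything else (here: scratch.txt)
git rm -q --cached scratch.txt    # take scratch.txt out of the index again

# git ls-files lists what the index currently tracks.
staged=$(git ls-files)
echo "staged files:"
echo "$staged"
ls scratch.txt                    # the file itself is still on disk
```

After these commands the index holds a.txt and b.txt only, while scratch.txt remains in the working directory as an untracked file.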
git commit
- committing makes a snapshot of everything that has been staged in your repository
- a short message is required
git commit -m "any message you want to make here"
- The message will be helpful for future you and your teammates
- If you don’t type in a message, an editor will open for you to enter one
git commit
- After you commit your changes, Git creates a commit object
- You can view all commit objects by
git log
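To see how commit objects accumulate, the commands above can be run twice in a scratch repository. This is a minimal sketch, assuming Git is installed; the file name f.txt, the messages and the identity are illustrative.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Sam Smith"
git config user.email sam@example.com

echo "v1" > f.txt
git add f.txt
git commit -q -m "first commit"

echo "v2" >> f.txt
git add f.txt
git commit -q -m "second commit"

# One line per commit object, newest first: abbreviated hash plus message.
git log --oneline

n_commits=$(git rev-list --count HEAD)
latest_msg=$(git log -1 --pretty=format:%s)
echo "$n_commits commits; latest message: $latest_msg"
```

`git log --oneline` is a compact alternative to the full `git log` output and is convenient once a repository has many commits.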
More on git
- Git stores each commit as a snapshot, but unchanged files are stored only once and objects are compressed, so frequent commits waste little space
- Make commits often
- We will revisit more Git commands later
So far, local version control
Distributed version control
- GitHub
- GitHub provides free repositories, provided that you make your project public.
- GitHub provides individual accounts and organizational accounts.
- Academic users (you need to apply) can have free private repositories.
- Usually a 1 GB limit per repository.
- https://github.com
- When you prepare R packages for CRAN or Bioconductor, you will need GitHub
- BitBucket
- Similar to GitHub, but provides free private repositories.
Connect the local repository with GitHub
- Benefit
- You can work anywhere with any computer, as long as you can pull/fetch your project from the remote.
- You and your teammates can work on the same project from the same remote.
Set up GitHub and connect with your local repository (1)
- Create a new repository on GitHub (GatorPKG3).
Set up GitHub and connect with your local repository (2)
- clone to your local computer (e.g., desktop)
git clone https://github.com/Caleb-Huo/GatorPKG3.git
Make sure to initialize your repository this way, so that it is connected to GitHub
Set up GitHub and connect with your local repository (3)
Put your code in the GatorPKG3 package
- You should have at least the following files:
Set up GitHub and connect with your local repository (4)
- At the directory of your local repository (e.g., ~/Desktop/GatorPKG3), do the following in the terminal
git add -A
git commit -m "this is an R package"
git log
- Push your changes in local repository to GitHub
git push
- Refresh your GitHub page to see if the changes are made on GitHub
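The add, commit and push sequence can be rehearsed offline. In this minimal sketch a local bare repository stands in for the GitHub remote (an assumption made so the example runs without a network or an account); with a real GitHub URL the clone and push commands are used the same way.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the GitHub remote in this sketch.
git init -q --bare remote.git

# Clone it, just as you would clone from a GitHub URL.
# (Git warns that the repository is empty; the warning is silenced here.)
git clone -q remote.git GatorPKG3 2>/dev/null
cd GatorPKG3
git config user.name "Sam Smith"
git config user.email sam@example.com

echo "f <- function(x, y) x + y" > f.R
git add -A
git commit -q -m "this is an R package"

# Naming the branch explicitly also works for a remote that is still empty.
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# The remote now holds the same commit as the local repository.
local_sha=$(git rev-parse HEAD)
remote_sha=$(git --git-dir="$work/remote.git" rev-parse "$branch")
echo "local:  $local_sha"
echo "remote: $remote_sha"
```

Comparing the two hashes is the command-line equivalent of refreshing the GitHub page to check that the push arrived.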
Exercise: put the R package on GitHub
devtools::document()
- add, commit, and push back to GitHub
git add -A
git commit -m "1st R package"
git push
git log
You will see a SHA-1 (Secure Hash Algorithm 1) hash, which is the unique identifier of a commit object
- Install the package from GitHub in R
devtools::install_github("Caleb-Huo/GatorPKG3")
library(GatorPKG3)
f(1,2)
?f
Your turn
Goals:
- make an R package on your GitHub repository
- install this package by devtools::install_github
Add README.md on GitHub
- md stands for Markdown
- .md is very similar to .Rmd
- except it won’t evaluate R code
- Below is a toy example of README.md
# testGatorPKG
This is a fancy R package
## how to install the R package
devtools::install_github("Caleb-Huo/GatorPKG3")
## example
f(1,2)
- Of course, you can add it locally, and push back to GitHub
- Now the remote repository is ahead of our local repository
More on the remote
Two ways to get your PC updated with the remote repository
- If you already have a copy of the remote repository
- If you don’t have a copy of the remote repository
Keep updated with the remote
- Update your local repository
git fetch
git pull
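The difference between the two commands is easiest to see with two copies of the same project. This is a minimal sketch, assuming Git is installed; a local bare repository stands in for GitHub, and the directory names pc1 and pc2 are hypothetical stand-ins for two computers.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git            # stand-in for the GitHub remote

# First computer: create a commit and push it.
git clone -q remote.git pc1 2>/dev/null
cd pc1
git config user.name "Sam Smith"
git config user.email sam@example.com
echo "v1" > f.txt
git add f.txt
git commit -q -m "v1"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# Second computer: clone the project.
cd "$work"
git clone -q remote.git pc2

# First computer pushes a new version.
cd "$work/pc1"
echo "v2" > f.txt
git commit -q -a -m "v2"
git push -q origin "$branch"

# Second computer catches up.
cd "$work/pc2"
git fetch -q                    # downloads the new commit, leaves files alone
after_fetch=$(cat f.txt)        # still "v1"
git pull -q                     # fetch, then merge into the current branch
after_pull=$(cat f.txt)         # now "v2"
echo "after fetch: $after_fetch, after pull: $after_pull"
```

`git fetch` is the safe way to look at what changed on the remote before deciding to merge; `git pull` does both steps at once.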
Clone an existing remote repository to your local PC
git clone https://github.com/cran/mclust.git
After some changes of this package
If you are the owner or a team member of the origin remote repository, you are able to push back.
If you are not, you won’t be able to push back.
Fork a repository on your GitHub account
- Click on Fork, the repository will be copied to your GitHub account
- Then you can do anything you want on the forked repository, under their license
Other files in the repository
ls -a ## list all files, including hidden ones
Git whole picture
- We have covered initializing, updating, and making changes
- We will cover branching before revert and diff
Create another branch
- By default, we are on the master branch
- Now we want to create some new features on the develop/feature branch
- Create a new branch and switch to it:
git checkout -b <branchname>
- Switch from one branch to another:
git checkout <branchname>
- List all branches and show which one you are on:
git branch
- Delete a branch:
git branch -d <branchname>
- Push the branch to the remote
git push origin <branchname>
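The local branch commands above can be tried in a scratch repository. This is a minimal sketch, assuming Git is installed; the branch name feature and the file f.txt are illustrative, and the first branch is named master explicitly because newer Git installations may default to a different name.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Sam Smith"
git config user.email sam@example.com
echo "v1" > f.txt
git add f.txt
git commit -q -m "first commit"

git checkout -q -b feature                # create "feature" and switch to it
on_feature=$(git symbolic-ref --short HEAD)

git checkout -q master                    # switch back
git branch                                # lists both; '*' marks the current one

git branch -q -d feature                  # delete the branch (it has no unmerged work)
remaining=$(git for-each-ref --format='%(refname:short)' refs/heads)
echo "was on: $on_feature; remaining branches: $remaining"
```

`git branch -d` refuses to delete a branch that contains unmerged commits, which protects against losing work by accident.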
Git branches
- master
- hotfixes
- release branches
- develop
- feature branches
Exercise: create another branch
git checkout -b vignette
usethis::use_vignette("GatorPKG3")
Git revert
Go back to a previous snapshot by its SHA-1 hash
git checkout -b old-state f0d8506
git branch
- To go back to where you were, just check out the branch you were on again.
git checkout master
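The hash f0d8506 above belongs to the lecturer's repository. To try the same steps yourself, the hash has to come from your own history; this minimal sketch builds a two-commit history in a scratch repository and looks the hash up with git rev-parse (the file name f.txt and the identity are illustrative).

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Sam Smith"
git config user.email sam@example.com

echo "v1" > f.txt
git add f.txt
git commit -q -m "v1"
old_sha=$(git rev-parse --short HEAD)     # hash of the snapshot to return to

echo "v2" > f.txt
git commit -q -a -m "v2"

# Check out the old snapshot on its own branch.
git checkout -q -b old-state "$old_sha"
old_content=$(cat f.txt)                  # "v1"

# Return to where we were.
git checkout -q master
new_content=$(cat f.txt)                  # "v2"
echo "old-state has $old_content, master has $new_content"
```

Putting the old snapshot on a named branch keeps master untouched, so nothing is lost by looking back.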
Delete commits
This is not recommended, but it can be useful if you don’t want other people to see your code history.
If you want to try it, do so in an experimental branch.
Compare with another commit object
- compare all unstaged changes in the working directory with the staging area (which equals the last commit if nothing is staged)
git diff
- compare a specific file in the working directory with the last commit
git diff HEAD -- <filename>
- compare two commits by their SHA-1 hashes
git diff aSHA1 bSHA1
- compare two branches
git diff <sourcebranch> <targetbranch>
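One detail worth checking by hand: git diff with no arguments compares the working directory with the staging area, so a change disappears from its output once it has been staged, while git diff HEAD still compares against the last commit. This is a minimal sketch in a scratch repository; the file name f.txt and the identity are illustrative.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Sam Smith"
git config user.email sam@example.com
echo "line 1" > f.txt
git add f.txt
git commit -q -m "first commit"

echo "line 2" >> f.txt
unstaged=$(git diff --name-only)          # "f.txt": modified but not staged

git add f.txt
after_add=$(git diff --name-only)         # empty: the change is now staged
vs_head=$(git diff --name-only HEAD)      # "f.txt": still differs from the last commit

echo "before add: [$unstaged]  after add: [$after_add]  vs HEAD: [$vs_head]"
```

Dropping `--name-only` shows the full line-by-line differences instead of only the file names.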
Git merge (when there are no conflicts)
- keep the merge history
- If you want to merge branch B (vignette) into branch A (master)
- Go to branch A (master)
- merge with --no-ff
git checkout master
git merge --no-ff vignette
git log
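What --no-ff buys you can be verified directly: the merge is recorded as its own commit with two parents, even though a fast-forward would have been possible. This is a minimal sketch in a scratch repository; the file names and commit messages are illustrative.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Sam Smith"
git config user.email sam@example.com
echo "code" > f.txt
git add f.txt
git commit -q -m "first commit"

# Work on the vignette branch.
git checkout -q -b vignette
echo "vignette text" > vignette.txt
git add vignette.txt
git commit -q -m "add vignette"

# Merge it into master, forcing a merge commit.
git checkout -q master
git merge -q --no-ff -m "merge vignette" vignette

# The merge commit's second parent is the tip of the merged branch.
second_parent=$(git rev-parse HEAD^2)
vignette_tip=$(git rev-parse vignette)
git log --oneline --graph
```

Without --no-ff, Git would simply move master forward to the vignette commit, and the log would not show that the work happened on a separate branch.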
Set an alias for a Git command
- my favorite colorful log
git config --global alias.lg "log --color --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit --"
git lg
Git merge (when there are conflicts)
- make some changes in the master branch
- make some changes (with conflicts) in the vignette branch
- notice that after you switch to the vignette branch, the changes committed on the master branch are no longer visible in the working directory
git checkout master
git merge --no-ff vignette ## fail
git diff ## check the difference
## open the conflict file, resolve conflicts
git commit -a -m "conflict resolved"
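The conflict workflow above can be reproduced end to end in a scratch repository. This is a minimal sketch, assuming Git is installed; the file contents are made up, and the conflict is resolved here by overwriting the file from the script, whereas in practice you would open it in an editor and remove the conflict markers by hand.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Sam Smith"
git config user.email sam@example.com
echo "original" > f.txt
git add f.txt
git commit -q -m "first commit"

# Change the same line on both branches.
git checkout -q -b vignette
echo "vignette version" > f.txt
git commit -q -a -m "edit on vignette"
git checkout -q master
echo "master version" > f.txt
git commit -q -a -m "edit on master"

# The merge stops with a conflict; '|| true' keeps this script running.
git merge --no-ff -m "merge vignette" vignette > /dev/null 2>&1 || true
conflict_status=$(git status --short)     # "UU f.txt": both sides modified it

# Resolve the conflict (here: keep both lines), then conclude the merge.
printf 'master version\nvignette version\n' > f.txt
git commit -q -a -m "conflict resolved"

merged_parent=$(git rev-parse HEAD^2)
vignette_tip=$(git rev-parse vignette)
echo "status during conflict: $conflict_status"
```

The commit that resolves the conflict is the merge commit itself, so the history still records both branches as its parents.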
Display HTML on GitHub
- an HTML file won’t automatically be displayed as a web page
- you need the following trick
Summarize all important things about Git/GitHub
Reference:
- GitHub pro
- Summary of Git commands
- Many contents are from the following resources:
A typical Git workflow (revisit)