Programming basics for Biostatistics 6099
GitHub
Zhiguang Huo (Caleb)
Thursday Oct 5th, 2023
Outline
- Git
- Version control
- Basic Git commands
- GitHub
- Introduction
- GitHub commands
- Host R packages on GitHub
Git
- Git is a distributed version-control system for tracking changes in
source code during software development.
- It is designed for coordinating work among programmers
- It can be used to track changes in any set of files.
Benefits of version control
- For yourself:
- Keep complete history of changes, and rationale for all changes
- Go back to a previous version of your code
- Support multiple versions of the same basic project
- Collaborative project:
- Simplifies concurrent work and merging changes
- You can back up your code remotely (on GitHub), where it can be easily
distributed
Basic Git commands
- Install Git
- Start Git
- Set up your username and email (if this is your first time using Git)
- You only need to do this once
git config --global user.name "Sam Smith"
git config --global user.email sam@example.com
Git basics
Git basics (The three states)
- The Git directory (repository)
- stores the metadata and object database for your project.
- what is copied when you clone a repository from another
computer.
- holds the snapshots (commits) of your project history, which you can go
back to.
- The working directory
- a single checkout of one version of the project.
- usually pulled from the Git directory for you to use and modify
- The staging area
- stores information about what will go into your next commit
- sometimes referred to as the index
- intermediate step between the working directory and the Git
directory
Git workflow
- Modify files in your working directory.
- Stage the files, adding snapshots of them to your staging area
(index).
- Do a commit, which takes the files as they are in the staging area
and stores that snapshot permanently to your Git directory.
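The three steps above can be sketched in a throwaway repository. This is a minimal sketch, assuming Git is installed; the scratch directory from mktemp, the file name notes.txt, and the commit message are hypothetical, and the identity is set for this repository only (no --global).

```shell
set -e
repo=$(mktemp -d)                       # hypothetical scratch directory
cd "$repo"
git init -q
git config user.name "Sam Smith"        # repository-local identity
git config user.email sam@example.com

# 1. Modify files in the working directory
echo "first draft" > notes.txt
git status --short                      # notes.txt is listed as untracked (??)

# 2. Stage the file: add a snapshot of it to the staging area (index)
git add notes.txt
git status --short                      # notes.txt is listed as added (A)

# 3. Commit: store the staged snapshot in the Git directory
git commit -q -m "add notes"
git log --oneline                       # shows the single commit
```

Running git status between the steps shows the file moving from untracked, to staged, to committed.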
Initialize a git repository
- initialize git on your local computer
cd Desktop
mkdir testGit
cd testGit
git init
- initialize on GitHub and clone to your PC
- We will revisit this method shortly
- This is the recommended way to connect to GitHub
The repository
- The repository contains everything about your project
- code
- data
- documentation
- .git, which is a hidden folder
- At any time,
- You can make some changes, take a snapshot of your project, and
commit to the repository (called a commit object)
- you can revisit the snapshot at any time
Make changes in the git repository
- Your changes: happen in your working directory (not yet staged)
- Index/Stage your changes:
git add XXX
- Commit the indexed/staged changes to the git repository
git commit -m "your message"
Example of making changes in a Git repository (using R code as an
example)
WD <- '~/Desktop'
setwd(WD)
usethis::create_package("GatorPKG", open = FALSE) ## open = FALSE prevents RStudio from opening a new session.
WD2 <- '~/Desktop/GatorPKG'
setwd(WD2)
- Put f.R in the R folder
##' Add up two numbers (Description)
##'
##' We want to add up two numbers, blalala... (Details)
##' @title add two numbers
##' @param x first number
##' @param y second number
##' @return sum of two numbers
##' @author Caleb
##' @export
##' @examples
##' f(1,2)
f <- function(x, y) x + y
Example of making changes in a Git repository (Git part)
cd ~/Desktop/GatorPKG
git init
- Check the status of files in the working directory
git status
git add -A ## -A stages all your changes
git status
git commit -m "first commit"
git status
git status
- Tells what Git thinks is going on
- Do this frequently!
- before staging your workspace
- after staging your workspace, before committing
- after committing
git add
git add newFile
git add newFile1 newFile2 newFile3
git add -A
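The difference between staging one file and staging everything can be checked in a scratch repository. This is a rough sketch, assuming Git is installed; the directory comes from mktemp and the three empty files only reuse the placeholder names above.

```shell
set -e
repo=$(mktemp -d)                       # hypothetical scratch directory
cd "$repo"
git init -q
touch newFile1 newFile2 newFile3        # three empty, untracked files

git add newFile1                        # stage a single file
git status --short                      # newFile1 is staged (A), the other two are untracked (??)
staged_one=$(git diff --cached --name-only | wc -l)

git add -A                              # stage every change in the repository
staged_all=$(git diff --cached --name-only | wc -l)
echo "staged after first add:" $staged_one "/ after add -A:" $staged_all
```

git diff --cached --name-only lists the files currently in the staging area, so the two counts show what each form of git add picked up.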
git commit
- committing makes a snapshot of everything that has been staged in
your repository
- a short message is necessary
git commit -m "any message you want to make here"
- The message will be helpful for future you and your teammates
- If you don’t type in a message, an editor (usually vi) will open
for you to enter the message. Vi is hard for beginners to use. See https://kb.iu.edu/d/afcz
for how to quit vi, just in case.
- After you commit your changes to Git, it creates a commit object
- You can view all commit objects by
git log
- git log shows the SHA1 number of each commit, which you can use to revisit a
previous snapshot
git checkout
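Go to a previous snapshot
git checkout SHA1number
- SHA1 number: a hash from Secure Hash Algorithm 1 that identifies the commit; you can give
- the full SHA1 number, or
- its first 6 digits, provided they are unique in the repository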
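Going back to an earlier snapshot and returning can be tried safely in a scratch repository. This is a sketch under stated assumptions: Git is installed, the directory comes from mktemp, and the file f.txt and its contents are hypothetical. git rev-parse --short is used here only to capture the abbreviated SHA1 number that git log would show.

```shell
set -e
repo=$(mktemp -d)                       # hypothetical scratch directory
cd "$repo"
git init -q
git config user.name "Sam Smith"        # repository-local identity
git config user.email sam@example.com

echo "version 1" > f.txt
git add f.txt
git commit -q -m "first version"
first=$(git rev-parse --short HEAD)     # abbreviated SHA1 number of the first commit

echo "version 2" > f.txt
git commit -q -a -m "second version"

git checkout -q "$first"                # go back to the first snapshot
cat f.txt                               # prints: version 1

git checkout -q -                       # return to the branch you were on
cat f.txt                               # prints: version 2
```

Checking out a commit by its SHA1 number leaves you on no branch (a "detached HEAD"), which is why the last step switches back before any further work.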
More on git
- Git stores unchanged files only once and compresses its history, so
frequent commits won’t waste space
- Make commits often
- We will revisit more Git commands later
- branching: have multiple versions of your software
- diff: compare two commits
- revert
So far, local version control
Distributed version control
- GitHub
- GitHub provides free public repositories; since 2019, free accounts can
also create private repositories.
- GitHub provides individual accounts and organizational
accounts.
- Academic users (you need to apply) can have free private
repositories.
- Usually 1 GB limit per repository.
- https://github.com
- When you prepare R packages for CRAN or Bioconductor, you will need
GitHub
- BitBucket
- Similar to GitHub; also provides free private repositories.
Connect the local repository with GitHub
- Benefit
- You can work anywhere with any computer, as long as you can
pull/fetch your project from the remote.
- You and your teammates can work on the same project from the same
remote.
Set up GitHub and connect with your local repository (1)
- Create a new repository on GitHub (GatorPKG3).
Set up GitHub and connect with your local repository (2)
- clone to your local computer (e.g., desktop)
cd "~/Desktop"
git clone https://github.com/Caleb-Huo/GatorPKG3.git
Initialize your repository this way so that it is already connected
to GitHub
Set up GitHub and connect with your local repository (3)
Put your code in the GatorPKG3 package
- You should have at least the following files or folders:
Set up GitHub and connect with your local repository (4)
- At the directory of your local repository (e.g.,
~/Desktop/GatorPKG3), do the following in the terminal
git add -A
git commit -m "this is an R package"
git log
- Push your changes in local repository to GitHub
git push
- Refresh your GitHub page to see if the changes are made on
GitHub
Exercise: put the R package on GitHub
## setwd("to your package folder")
devtools::document()
- In Terminal, add, commit, and push back to GitHub
git add -A
git commit -m "1st R package"
git push
git log
You will see a SHA1 number (Secure Hash Algorithm 1); this is the
identifier of a commit object
- Install the package from GitHub in R
devtools::install_github("Caleb-Huo/GatorPKG3")
library(GatorPKG3)
f(1,2)
?f
Your turn
Goals:
- make an R package on your GitHub repository
- install this package by devtools::install_github
Add README.md on GitHub
- md stands for Markdown
- .md is very similar to .Rmd
- except it won’t evaluate R code
- Below is a toy example of README.md
# testGatorPKG
This is a fancy R package
## how to install the R package
devtools::install_github("Caleb-Huo/GatorPKG3")
## example
f(1,2)
- You can add it locally, and push back to GitHub
- Or you can directly edit on the remote, then fetch/pull to the local
repository
More on the remote
Two ways to get your PC updated with the latest remote repository
- If you already have a copy of the remote repository on your computer: fetch/pull
- If you don’t have a copy of the remote repository on your computer: clone
Keep updated with the remote
- Update your local repository
git fetch ## download new commits without changing your files
git pull ## fetch, then merge into your current branch
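The difference between the two update commands can be seen without a GitHub account by letting a local bare repository play the role of the remote. This is a sketch under stated assumptions: Git is installed, and remote.git, pc1, and pc2 are hypothetical names standing in for the GitHub repository and two computers.

```shell
set -e
base=$(mktemp -d)                       # hypothetical scratch directory
cd "$base"
git init -q --bare remote.git           # local stand-in for a GitHub repository

# First computer: clone, commit, and push
git clone -q remote.git pc1
cd pc1
git config user.name "Sam Smith"        # repository-local identity
git config user.email sam@example.com
echo "hello" > README.md
git add README.md
git commit -q -m "add README"
git push -q origin HEAD

# Second computer: clone the current state of the remote
cd "$base"
git clone -q remote.git pc2

# First computer pushes one more commit
cd "$base/pc1"
echo "more" >> README.md
git commit -q -a -m "update README"
git push -q origin HEAD

# Second computer catches up
cd "$base/pc2"
git fetch -q                                     # download only; local files untouched
behind=$(git rev-list --count "HEAD..@{u}")      # remote commits not yet merged locally
git pull -q                                      # fetch + merge into the current branch
after=$(git rev-list --count "HEAD..@{u}")
echo "behind after fetch: $behind, behind after pull: $after"
```

After git fetch the second clone knows about the new commit but is still one commit behind; git pull brings the local branch and files up to date.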
Clone an existing remote repository to your local PC
git clone https://github.com/cran/mclust.git
After making some changes to this package
If you are the owner or a team member of the origin remote repository,
you are able to push back.
If you are not, you won’t be able to push back
Fork a repository on your GitHub account
- Click on Fork, the repository will be copied to your GitHub
account
- Then you can do anything you want on the forked repository, under
their license
Other files in the repository
ls -a ## list all files, including hidden ones
Git whole picture
- We have talked about initializing, updating, and making changes
- We will talk about branching before revert and
diff
Create another branch
- By default, we are on the main branch
- Now we want to create some new features on the develop/feature
branch
- Create a new branch and switch to it:
git checkout -b <branchname>
- Switch from one branch to another:
git checkout <branchname>
- List all branches and show which one you are on:
git branch
- Delete a branch:
git branch -d <branchname>
- Push the branch to the remote
git push origin <branchname>
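The branch commands above can be tried end to end in a scratch repository. This is a minimal sketch, assuming Git is installed; the branch name feature and the file f.txt are hypothetical, and the push step is left out because the scratch repository has no remote.

```shell
set -e
repo=$(mktemp -d)                       # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch main, whatever the local default is
git config user.name "Sam Smith"        # repository-local identity
git config user.email sam@example.com
echo "x" > f.txt
git add f.txt
git commit -q -m "first commit"

git checkout -q -b feature              # create a new branch and switch to it
git branch                              # lists feature and main; * marks the current branch
git checkout -q main                    # switch back to main
git branch -d feature                   # delete the branch (it has no unmerged commits)
git branch                              # only main is left
```

git branch -d refuses to delete a branch whose commits have not been merged, which protects against losing work by accident.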
Git branches
- main
- hotfixes
- release branches
- develop
- feature branches
Set an alias for a Git command
- my favorite colorful logs
git config --global alias.lg "log --color --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit --"
git lg
Exercise: creating another branch
git checkout -b vignette
usethis::use_vignette("GatorPKG3") ## run this line in R, not in the terminal
git add -A
git commit -m "include vignette"
- Check all available branches (your current branch has a * on
it)
git branch
- After going back to main, the edits on the vignette branch do not
exist in main
git checkout main
Git revert
Go back to the previous snapshot by SHA1 number
git checkout -b old-state f0d8506
git branch
- To go back to where you were, just check out the branch you were on
again.
git checkout main
Delete commits
This is not recommended.
If you want to try it, do so in an experimental
branch.
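Undo commits but keep their changes as staged files
- You still keep all the files
git reset --soft HEAD@{1} ## HEAD@{1}: where HEAD pointed before its most recent move
git reset --soft deb4555 ## or give the SHA1 number of the commit to go back to
git reflog ## list the recent positions of HEAD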
Delete commits in a hard way
git reset --hard 2278f51
Be cautious with git reset --hard: it discards all uncommitted changes. The
commits themselves can usually still be recovered through git reflog.
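The difference between the soft and the hard reset is easiest to see side by side in a scratch repository. This is a sketch under stated assumptions: Git is installed, the file f.txt and its contents are hypothetical, and HEAD~1 (the parent of the current commit) is used in place of the SHA1 numbers shown above.

```shell
set -e
repo=$(mktemp -d)                       # hypothetical scratch directory
cd "$repo"
git init -q
git config user.name "Sam Smith"        # repository-local identity
git config user.email sam@example.com

echo "v1" > f.txt
git add f.txt
git commit -q -m "first"
echo "v2" > f.txt
git commit -q -a -m "second"
echo "v3" > f.txt
git commit -q -a -m "third"

git reset -q --soft HEAD~1              # drop the "third" commit, keep its changes staged
git status --short                      # f.txt is listed as modified and staged (M)
soft_content=$(cat f.txt)               # still v3

git reset -q --hard HEAD~1              # drop "second" too and discard all uncommitted changes
hard_content=$(cat f.txt)               # back to v1
echo "after soft: $soft_content, after hard: $hard_content"

git reflog                              # the dropped commits are still listed here
```

The soft reset only moves the branch pointer, so nothing in the working directory is lost; the hard reset also overwrites the files, which is why it deserves the extra caution.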
Compare with another commit object
- compare unstaged changes in the working directory with the staging area
(the same as comparing with the last commit when nothing is staged)
git diff
- compare a specific file in the working directory with
the last commit
git diff HEAD -- <filename>
- compare two commits
git diff aSHA1 bSHA1
- compare two branches
git diff <sourcebranch> <targetbranch>
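What git diff compares depends on what has been staged, which a scratch repository makes visible. This is a minimal sketch, assuming Git is installed; the file f.txt and its lines are hypothetical, and git diff --staged and git diff HEAD are standard forms added here for comparison.

```shell
set -e
repo=$(mktemp -d)                       # hypothetical scratch directory
cd "$repo"
git init -q
git config user.name "Sam Smith"        # repository-local identity
git config user.email sam@example.com

printf 'line 1\n' > f.txt
git add f.txt
git commit -q -m "first"

printf 'line 1\nline 2\n' > f.txt
git add f.txt                                   # stage "line 2"
printf 'line 1\nline 2\nline 3\n' > f.txt       # leave "line 3" unstaged

git diff                # working directory vs. staging area: shows +line 3
git diff --staged       # staging area vs. last commit: shows +line 2
git diff HEAD           # working directory vs. last commit: shows +line 2 and +line 3
```

Plain git diff shows only what git add would still pick up; git diff HEAD shows everything that differs from the last commit.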
Git merge (when there are no conflicts)
- --no-ff keeps the merge in the history (a merge commit is always created)
- If you want to merge branch B (vignette) into branch A (main)
- Go to branch A (main)
- merge with --no-ff
git checkout main
git merge --no-ff vignette
git log
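A conflict-free merge with --no-ff can be rehearsed in a scratch repository. This is a sketch under stated assumptions: Git is installed, the files f.R and vignette.Rmd are hypothetical stand-ins for package content, and -m supplies the merge message so that no editor opens.

```shell
set -e
repo=$(mktemp -d)                       # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch main, whatever the local default is
git config user.name "Sam Smith"        # repository-local identity
git config user.email sam@example.com

echo "code" > f.R
git add f.R
git commit -q -m "first commit"

git checkout -q -b vignette             # work on a separate branch
echo "doc" > vignette.Rmd
git add vignette.Rmd
git commit -q -m "include vignette"

git checkout -q main                    # go to the branch that receives the merge
git merge -q --no-ff -m "merge vignette" vignette
git log --oneline --graph               # the graph shows the vignette branch joining main

parents=$(git rev-list --parents -n 1 HEAD | wc -w)   # commit hash + its parents
echo "hashes on the merge commit line:" $parents
```

The merge commit has two parents, one from each branch, which is how the history records that a merge happened.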
Git merge (when there are conflicts)
- make some changes in the main branch
- make some changes (with conflicts) in the vignette branch
- notice that after you switch to the vignette branch, the changes
committed on the main branch are no longer visible in your working directory
git checkout main
git merge --no-ff vignette ## fail
git diff ## check the difference
## open the conflict file, resolve conflicts
git commit -a -m "conflict resolved"
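The conflict scenario above can be reproduced in a scratch repository. This is a rough sketch, assuming Git is installed; the file f.txt and its contents are hypothetical, and the conflict is resolved by overwriting the file, which stands in for editing it by hand.

```shell
set -e
repo=$(mktemp -d)                       # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch main, whatever the local default is
git config user.name "Sam Smith"        # repository-local identity
git config user.email sam@example.com

echo "original" > f.txt
git add f.txt
git commit -q -m "first commit"

git checkout -q -b vignette
echo "edited on vignette" > f.txt
git commit -q -a -m "edit on vignette"

git checkout -q main
echo "edited on main" > f.txt
git commit -q -a -m "edit on main"

# The same line changed on both branches, so the merge stops with a conflict
git merge --no-ff vignette || echo "merge stopped: conflict in f.txt"
git status --short                      # f.txt is listed as unmerged (UU)

# Resolve the conflict by writing the final content, then commit
echo "edited on main and vignette" > f.txt
git commit -q -a -m "conflict resolved"
git log --oneline --graph
```

Until the resolving commit is made, the conflicted file contains markers (<<<<<<<, =======, >>>>>>>) around the two competing versions.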
Display HTML on GitHub
- an HTML file won’t automatically be rendered as a webpage
- need the following trick
Summarize all important things about Git/GitHub
Reference:
- GitHub pro
- Summary of Git command
- Many contents are from the following resources:
Things you should have done, by the end of this week
- Install Python (>=3.10)
- Install Anaconda