Biostatistical Computing, PHC 6068

HiperGator

Zhiguang Huo (Caleb)

Monday September 23, 2018

Advanced usage of HiperGator (Optional)

These techniques also apply to other Linux machines.

Rstudio on HiperGator

Review:

Preparation

How to login HiperGator (Windows)

How to login HiperGator (Mac, Linux)

Your HiperGator Home directory

Common Linux commands:

FileZilla

You can transfer files between your local computer and HiperGator.

Login node and working node

Open an interactive session on HiperGator

## open interactive R session
srun --account=phc6068 --qos=phc6068 --ntasks=1 --cpus-per-task 1 --mem=8gb  --time=04:00:00 --pty bash -i

module load R ## load R

R
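To end the interactive session when you are done (not covered above): quit R first, then exit the shell so the allocation is released.

```
q()   ## quit R; you will be asked whether to save the workspace
exit  ## leave the srun session and return to the login node
```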

Open an interactive session on HiperGator

## do the following on hiperGator
getwd()
dir()

head(cars)
mycars <- cars
write.csv(mycars, "mycars.csv")
dir()

Submit a job (I)

  1. R script (saveCars.R): contains your R code
  2. SLURM job script (saveCars.slurm): tells the scheduler how to run your job
  3. Submit: run sbatch on the SLURM file

Prepare R script (a simple one) (I)

WD <- "/ufrc/phc6068/share/zhuo/example/testR" ## change to your own directory
dir.create(WD, recursive = TRUE) ## create the folder (and any parent folders) if needed

setwd(WD) ## set to your own directory!
mycars <- mtcars
write.csv(mycars,"mycars.csv")

Prepare SLURM job script (I)

#!/bin/sh
#SBATCH --job-name=serial_job_test    # Job name
#SBATCH --account=phc6068             # your sponsor or account for this class
#SBATCH --qos=phc6068                 # your QOS (quality of service) for this class
#SBATCH --mail-type=ALL               # Mail events
#SBATCH --mail-user=xx@xx.xx          # Where to send email 
#SBATCH --ntasks=1                    # Run on a single machine (node)
#SBATCH --cpus-per-task 1             # Run on a single CPU
#SBATCH --mem=8gb                     # Memory limit
#SBATCH --time=04:00:00               # Time: hrs:min:sec
#SBATCH --output=serial_test_%j.out   # Output and error log 

pwd; hostname; date 

module load R 

echo "Running save cars script on a single CPU core" 

R CMD BATCH saveCars.R ## make sure saveCars.R is at your current working directory
## R --no-save --quiet --slave < saveCars.R ## alternative way

date

Submit the job (I)

cd /ufrc/phc6068/share/zhuo/example/testR
sbatch saveCars.slurm ## submit job
  1. Submit the slurm job (saveCars.slurm)
  2. The slurm job will submit the R job (saveCars.R)
  3. The R job will return the result

Check log file (I)

cd /ufrc/phc6068/share/zhuo/example/testR
cat serial_test_25280301.out ## your log file name will differ
head serial_test_25280301.out
more serial_test_25280301.out
cat saveCars.Rout
cat mycars.csv

Exercise (I)

  1. Copy the saveCars.R and saveCars.slurm into your own working directory
  2. Submit the job
  3. Try to revise the saveCars.R
    • Change to your own working directory
    • Just try to output any other results, and save them.
  4. Try to revise the saveCars.slurm
    • Try to specify your email, revise time and memory
  5. Submit your job again
cp /ufrc/phc6068/share/zhuo/example/testR/saveCars.R .
cp /ufrc/phc6068/share/zhuo/example/testR/saveCars.slurm .

Submit a job with external argument (II)

R script (with external arguments) (II)

args <- commandArgs(trailingOnly = TRUE) ## read external arguments

rowID <- args[1]
aarg <- as.numeric(rowID)

setwd("/ufrc/phc6068/share/zhuo/example/testR2")
mycars <- mtcars[aarg,]
filename <- paste0("arg",aarg,".csv")
write.csv(mycars,filename)

SLURM job script (II)

#!/bin/sh
#SBATCH --job-name=serial_job_test    # Job name
#SBATCH --account=phc6068             # your sponsor or account for this class
#SBATCH --qos=phc6068                 # your QOS (quality of service) for this class
#SBATCH --mail-type=ALL               # Mail events
#SBATCH --mail-user=xx@xx.xx          # Where to send email 
#SBATCH --ntasks=1                    # Run on a single machine (node)
#SBATCH --cpus-per-task 1             # Run on a single CPU
#SBATCH --mem=8gb                     # Memory limit
#SBATCH --time=04:00:00               # Time: hrs:min:sec
#SBATCH --output=serial_test_%j.out   # Output and error log 

pwd; hostname; date 

module load R 

echo "Running save cars script on a single CPU core" 

R --no-save --quiet --slave --args 1 < saveCarsArgs.R ## pass 1 as the external argument

date

Submit the job (II)

cd /ufrc/phc6068/share/zhuo/example/testR2
sbatch saveCarsArgs.slurm ## submit job
  1. Submit the slurm job (saveCarsArgs.slurm)
  2. The slurm job will submit the R job (saveCarsArgs.R) with extra argument
  3. The R job will return the result

Check log file (II)

cd /ufrc/phc6068/share/zhuo/example/testR2
cat serial_test_25280860.out ## you may have your own log file name
cd /ufrc/phc6068/share/zhuo/example/testR2
cat arg1.csv

Submit a job with loops (III)

SLURM job script (III)

#!/bin/sh
#SBATCH --job-name=serial_job_test    # Job name
#SBATCH --account=phc6068             # your sponsor or account for this class
#SBATCH --qos=phc6068                 # your QOS (quality of service) for this class
#SBATCH --mail-type=ALL               # Mail events
#SBATCH --mail-user=xx@xx.xx          # Where to send email 
#SBATCH --ntasks=1                    # Run on a single machine (node)
#SBATCH --cpus-per-task 1             # Run on a single CPU
#SBATCH --mem=8gb                     # Memory limit
#SBATCH --time=04:00:00               # Time: hrs:min:sec
#SBATCH --output=serial_test_%j.out   # Output and error log 

pwd; hostname; date 

module load R 

for i in {2..10}
do
  echo "Running save cars" $i
  R --no-save --quiet --slave --args $i < saveCarsArgs.R
done

date

Submit the job (III)

cd /ufrc/phc6068/share/zhuo/example/testR3
sbatch saveCarsArgsLoops.slurm ## submit a loop job
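The loop above runs the nine R jobs one after another inside a single allocation. As an alternative sketch (not from the slides), a SLURM job array runs each argument as its own job, potentially in parallel; --array and $SLURM_ARRAY_TASK_ID are standard SLURM features, but the script name saveCarsArgsArray.slurm and any array-size limits on HiperGator are assumptions to confirm.

```
#!/bin/sh
#SBATCH --job-name=array_job_test     # Job name
#SBATCH --account=phc6068             # your sponsor or account for this class
#SBATCH --qos=phc6068                 # your QOS (quality of service) for this class
#SBATCH --ntasks=1                    # Run on a single machine (node)
#SBATCH --mem=8gb                     # Memory limit (per array task)
#SBATCH --time=04:00:00               # Time: hrs:min:sec (per array task)
#SBATCH --array=2-10                  # one task per value of the index
#SBATCH --output=array_test_%A_%a.out # %A = job ID, %a = array index

module load R
R --no-save --quiet --slave --args $SLURM_ARRAY_TASK_ID < saveCarsArgs.R
```

Each array task sees its own index in $SLURM_ARRAY_TASK_ID, so the same R script produces arg2.csv through arg10.csv without a shell loop.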

Check job status
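A minimal sketch of the standard SLURM monitoring commands, run from a HiperGator login node; the job ID 25280301 is just an example from earlier slides.

```
squeue -u $USER        ## list your pending and running jobs
squeue -j 25280301     ## check a specific job by its ID
scancel 25280301       ## cancel a job
sacct -j 25280301      ## accounting info for a finished job
```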

Burst mode

If all computing resources under your regular QOS are occupied, you can request burst resources to use otherwise idle capacity.
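A sketch of requesting the burst QOS, assuming HiperGator's convention that the burst QOS name is the group QOS with a "-b" suffix; confirm the exact QOS name for your group before using it.

```
#SBATCH --account=phc6068             # account stays the same
#SBATCH --qos=phc6068-b               # burst QOS (group QOS + "-b"; confirm the name)
```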