Component of a R function

All R functions have three parts:
- body(), the code inside the function
- formals(), list of arguments which controls how you can call the function.
- environment(), the map of the location of the function’s variables.

cubic <- function(x){
  return(x^3)
} 
## x: function argument
## cubic: function name

body(cubic)

## {
##     return(x^3)
## }

formals(cubic)

## $x

environment(cubic)

## <environment: R_GlobalEnv>

Primitive functions

Primitive funcitons are exceptions to the rule.
Primitive funcitons, like sum(), call C code directly .Primitive() and contain no R code.
Primitive funcitons are only in the base package, they can be more efficient.

sum

## function (..., na.rm = FALSE)  .Primitive("sum")

body(sum)

## NULL

formals(sum)

## NULL

environment(sum)

## NULL

Example - Odds Ratio Function

Suppose we want to write a function that calculates the odds ratio \(\theta\) of a 2 x 2 contingency table and the asymptotic confidence interval for \(\theta\).

Sample odds ratio, \[\hat{\theta}=\frac{ad}{bc} \]
The asymptotic standard error for \(log(\hat{\theta})\) is \[SE(log\hat{\theta})=\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}} \]
The asymptotic 100(1-\(\alpha\))% confidence interval for \(log\hat{\theta}\) is, \[ log\hat{\theta} \pm z_{\alpha/2}SE(log\hat{\theta}) \]
Exponentiate the upper and lower bounds to get a confidence interval for \(\theta\).

Example - Odds Ratio Function

Consider the following data that describes the relationship between myocardial infarction and aspirin use (Agresti 1996).

X <- matrix(c(189, 104, 10845, 10933), nrow=2,
            dimnames=list(Treatment=c("Placebo","Aspirin"), 
                          "Myocardial Infarction"=c("Yes", "No")))
X

##          Myocardial Infarction
## Treatment Yes    No
##   Placebo 189 10845
##   Aspirin 104 10933

Returning Objects

Often we will want a function to return an object that can be assigned. The function for returning objects is return().
The return() function prints and returns its arguments.

odds.ratio0 <- function(X){
  result <- X[1,1]*X[2,2]/(X[1,2]*X[2,1])
  return(result)
}
odds.ratio0(X)

## [1] 1.832054

Returning Objects

If the end of a function is reached without calling return(), the value of the last evaluated expression is returned and outputted.
You can also omit {} if there is only one line for the body

odds.ratio1 <- function(X){
  X[1,1]*X[2,2]/(X[1,2]*X[2,1])
}
odds.ratio1(X)

## [1] 1.832054

odds.ratio2 <- function(X) ## if the function has only one line, don't even need brackets
  X[1,1]*X[2,2]/(X[1,2]*X[2,1])

odds.ratio2(X)

## [1] 1.832054

Return multiple variables

A list is often a good tool for returning multiple objects.

odds.ratio <- function(X, conf.level=0.95){
  OR <- X[1,1]*X[2,2]/(X[1,2]*X[2,1])
  logOR.SE <- sqrt(sum(1/X))
  
  alpha <- 1 - conf.level
  CI.lower <- exp(log(OR) - qnorm(1 - alpha/2)*logOR.SE)
  CI.upper <- exp(log(OR) + qnorm(1 - alpha/2)*logOR.SE)
  # qnorm returns the quantiles of a Gaussian distribution.
  
  out <- list(OR=OR, CI=c(CI.lower, CI.upper), conf.level=conf.level)
  return(out)
}
odds.ratio(X)

## $OR
## [1] 1.832054
## 
## $CI
## [1] 1.440042 2.330780
## 
## $conf.level
## [1] 0.95

odds.ratio(X)$OR

## [1] 1.832054

tracking progress of a function

The cat() and print() functions can be used to output results;
- the cat() function gives more control over the appearance of the output.

odds.ratio <- function(X, conf.level=0.95){
  cat("calculating odds ratio...","\n")
  OR <- X[1,1]*X[2,2]/(X[1,2]*X[2,1])
  logOR.SE <- sqrt(sum(1/X))
  
  cat("calculating confidence interval...","\n")
  alpha <- 1 - conf.level
  CI.lower <- exp(log(OR) - qnorm(1 - alpha/2)*logOR.SE)
  CI.upper <- exp(log(OR) + qnorm(1 - alpha/2)*logOR.SE)
  
  cat("done, returning output...","\n")
  out <- list(OR=OR, CI=c(CI.lower, CI.upper), conf.level=conf.level)
  return(out)
}
out <- odds.ratio(X)

## calculating odds ratio... 
## calculating confidence interval... 
## done, returning output...

Verifying Arguments

If you are writing a function for others to use it is often a good idea to include code that verifies that the appropriate arguments were entered.
If an argument value is not valid we want to stop executing the expression and return an error message.
- missing(): Can be used to test whether a value was specified as an argument to a function; returns TRUE if a value is not specified and FALSE if a value is specified.
- stop(): Stop execution of the current expression and prints an error.
- warning(): Generate a warning message.
- message(): Generate a diagnostic message.
- stopifnot(): If any of the arguments are not all TRUE then stop() is called and an error message is produced that indicates the first element of the argument list that is not TRUE.

examples for verifying arguments

odds.ratio <- function(X, conf.level=0.95){
  stopifnot(!missing(X), is.matrix(X),dim(X)==c(2,2),X>0)

  cat("calculating odds ratio...","\n")
  OR <- X[1,1]*X[2,2]/(X[1,2]*X[2,1])
  logOR.SE <- sqrt(sum(1/X))
  
  cat("calculating confidence interval...","\n")
  alpha <- 1 - conf.level
  CI.lower <- exp(log(OR) - qnorm(1 - alpha/2)*logOR.SE)
  CI.upper <- exp(log(OR) + qnorm(1 - alpha/2)*logOR.SE)
  
  cat("done, returning output...","\n")
  out <- list(OR=OR, CI=c(CI.lower, CI.upper), conf.level=conf.level)
  return(out)
}
out <- odds.ratio(X)

## calculating odds ratio... 
## calculating confidence interval... 
## done, returning output...

Try catch

tryCatch function allows your code to be excuted even error exists

# result = tryCatch({
#     expr
# }, warning = function(w) {
#     warning-handler-code
# }, error = function(e) {
#     error-handler-code
# }
# )

success

result = tryCatch({
  2^6
}, warning = function(w) {
  print(paste("MY_WARNING:  ",w))
  return("warning")
}, error = function(e) {
  print(paste("MY_ERROR:  ",e))
  return("error")
}
)
result

## [1] 64

warning

result = tryCatch({
  1:3 + 1:2
}, warning = function(w) {
  print(paste("MY_WARNING:  ",w))
  return("warning")
}, error = function(e) {
  print(paste("MY_ERROR:  ",e))
  return("error")
}
)

## [1] "MY_WARNING:   simpleWarning in 1:3 + 1:2: longer object length is not a multiple of shorter object length\n"

result

## [1] "warning"

error

result = tryCatch({
  "A" + "B"
}, warning = function(w) {
  print(paste("MY_WARNING:  ",w))
  return("warning")
}, error = function(e) {
  print(paste("MY_ERROR:  ",e))
  return("error")
}
)

## [1] "MY_ERROR:   Error in \"A\" + \"B\": non-numeric argument to binary operator\n"

result

## [1] "error"

Lexical scoping for R functions

Scoping is the set of rules that govern how R looks up the value of a symbol.

Name masking
fresh start
dynamic lookup

Name masking

If a function doesn’t find a variable/function inside a function, it will search one level up

x <- 1
h <- function(){
  y <- 2
  i <- function(){
    z <- 3
    c(x, y, z)
  }
  i()
}

h()

## [1] 1 2 3

A fresh start

j <- function(){
  if(!exists("a")){
    a <- 1
  } else {
    a <- a + 1
  }
  print(a)
}
j()

## [1] 1

j()

## [1] 1

Dynamic lookup

x <- 15
f <- function() x
f()

## [1] 15

x <- 20
f()

## [1] 20

Every operation is a function call

infix operators: +, -, *, /

x <- 10; y <- 5; 
x + y

## [1] 15

'+'(x, y)

## [1] 15

Control flow operators: for, if while

for(i in 1:2) cat(i,' ')

## 1  2

'for'(i, 1:2, cat(i,' '))

## 1  2

Subsetting operator

x <- 1:10; x[3]

## [1] 3

'['(x,3)

## [1] 3

Other infix functions

## %%: Remainder operator
5 %% 3

## [1] 2

## %/%: Integer division
5 %/% 3

## [1] 1

## %*% Matrix multiplication
## to be discussed next week.

## %o% Outer product
1:2 %o% 1:3

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    2    4    6

## %in% Match operator
c(1,2,3,4,5,6) %in% c(2,6,8)

## [1] FALSE  TRUE FALSE FALSE FALSE  TRUE

Function arguments – default value

default argument of a function can be defined in the function formals

f <- function(arg, brg=10) 
  list(arg = arg, brg = brg)
str(f(arg=2, brg=10))

## List of 2
##  $ arg: num 2
##  $ brg: num 10

str(f(arg=2))

## List of 2
##  $ arg: num 2
##  $ brg: num 10

revisit = and <-
- = can be used for assigning value and specifying function argument
- <- can only be used for assigning value.

Function arguments

How to specify function arguments
- complete name
- partial name.
- by position
Arguments match order
1. first by exact name (perfect matching),

f <- function(arg, brg=10) 
  list(arg = arg, brg = brg)
str(f(arg=2, brg=10))

## List of 2
##  $ arg: num 2
##  $ brg: num 10

2. then by prefix matching,

f <- function(arg, brg=10) 
  list(arg = arg, brg = brg)
str(f(b=10, ar=2))

## List of 2
##  $ arg: num 2
##  $ brg: num 10

3. finally by position matching

f <- function(arg, brg=10) 
  list(a = arg, b = brg)
str(f(2, 10))

## List of 2
##  $ a: num 2
##  $ b: num 10

function calling habits

good calls

mean(1:10); mean(1:10, trim = 0.05)

## [1] 5.5

## [1] 5.5

overkill

mean(x=1:10)

## [1] 5.5

confusing

mean(1:10, n = T); mean(1:10, , FALSE); mean(1:10, 0.05); mean(, TRUE, x = c(1:10, NA))

## [1] 5.5

## [1] 5.5

## [1] 5.5

## [1] 5.5

Extra argument: …

…: allows variability in the arguments given to R functions
First check the function argument of plot()
We will create a new function which will pass extra argument to another function

myPlot <- function(x,y,...){
  plot(x,y,...)
}
myPlot(1:4,1:4,col=1:4)

calling a function given a list of arguments

If you have a list of function arguments:
- args <- list(1:10, na.rm=TRUE)
How could you then send that list to mean()?
- Need do.call()

do.call(mean, list(1:10, na.rm=TRUE))

## [1] 5.5

mean(1:10, na.rm=TRUE)

## [1] 5.5

R path function

getwd(): get current working directory

getwd()

## [1] "/Users/zhuo/Dropbox (UFL)/teaching/2018FALL/lectures/Week2_basic/functions"

setwd(): set current working directly

WD0 <- getwd()
setwd("~") ## ~ represent user home directly 
getwd()

## [1] "/Users/zhuo"

setwd(WD0)
getwd()

## [1] "/Users/zhuo/Dropbox (UFL)/teaching/2018FALL/lectures/Week2_basic/functions"

Directory information

dir(): show all files available under current working directory.

dir()

## [1] "functions_files" "functions.html"  "functions.rmd"   "tmp"

dir("~")

##  [1] "Applications"                        
##  [2] "Desktop"                             
##  [3] "Documents"                           
##  [4] "Downloads"                           
##  [5] "Dropbox"                             
##  [6] "Dropbox (Personal)"                  
##  [7] "Dropbox (UFL)"                       
##  [8] "httpsanalysis.ingenuity.compa_206442"
##  [9] "Library"                             
## [10] "Movies"                              
## [11] "Music"                               
## [12] "Pictures"                            
## [13] "Public"

file.exist(): check a certain filename already

file.exists("result.Rdata")

## [1] FALSE

file.exists("amatrix.csv")

## [1] FALSE

dir.create(): create a new folder

dir.create("tmp")

## Warning in dir.create("tmp"): 'tmp' already exists

dir.create("tmp") # there is a warning message if the folder already exist.

## Warning in dir.create("tmp"): 'tmp' already exists

dir.exists(): chech if a directory already exists.

Time function - Keep track the time of your progress

Sys.time()

start <- Sys.time()
sum(1:1e4)

## [1] 50005000

end <- Sys.time()
end - start

## Time difference of 0.001954079 secs

start <- Sys.time()
result <- 0
for(i in 1:1e4){
  result <- result + i
}
result

## [1] 50005000

end <- Sys.time()
end - start

## Time difference of 0.008316994 secs

Lazy Evaluation

R performs lazy evaluation of function arguments. The arguments are not evaluated until they are required.
- Lazy evaluation can save time and memory if the arguments are not needed.
- The function force() forces the evaluation of a function argument.
Consider the function

f <- function(x, y=x){
  x <- x + 1
  return(y)
}
f(1)

## [1] 2

Since there is no argument for y, R evaluates the default value of y only when it is needed.

f <- function(x, y=x){
  force(y) # force to evaluate y.
  x <- x + 1
  return(y)
}
f(1)

## [1] 1

Anonymous functions

You use an anonymous function when it’s not worth the effort to give it a name
Like all functions in R, anonymous functions have formals(), body() and a parent environment()

function(x=4) g(x) + h(x)

## function(x=4) g(x) + h(x)

formals(function(x=4) g(x) + h(x))

## $x
## [1] 4

body(function(x=4) g(x) + h(x))

## g(x) + h(x)

environment(function(x=4) g(x) + h(x))

## <environment: R_GlobalEnv>

Call functions from another library

You can use functions from other packages by library

library(survival)

## seek for help about the survival package
help(package=survival)

## get help for a specific function
?coxph

## head(coxph)
## args(coxph)
## coxph


detach("package:survival", unload=TRUE) ## remove survival package from R loaded environment

Environments

Only introduce the basic structure of R environments.
Refer to Chapter 8 of Advanced R book for details. http://adv-r.had.co.nz/Environments.html

Basic Environments

The job of an environment is to associate, or bind, a set of names to a set of values.
You can think of an environment as a bag of names.

e <- new.env()
e$a <- FALSE
e$b <- "a"
e$c <- 2:3
e$d <- 1:3

Basic environments

Generally, an environment is similar to a list, with following important exceptions.

Every object in an environment has a unique name.
The objects in an environment are not ordered.
An environment has a parent.

There are four special environments:

The globalenv(), or global environemnt, is the interactive workspace which you normally work on. The parent of the global environment is the last package you attached.
The baseenv(), or base environment, is the environment of the base package. Its parent is the empty environment.
emptyenv(), or empty environment, is the ultimate ancestor of all environments, and the only environment without a parent.
environment() is the current environment.

Environment search path

search() ## lists all parents of the global environment.

## [1] ".GlobalEnv"        "package:stats"     "package:graphics" 
## [4] "package:grDevices" "package:utils"     "package:datasets" 
## [7] "package:methods"   "Autoloads"         "package:base"

parent.env(globalenv())

## <environment: package:stats>
## attr(,"name")
## [1] "package:stats"
## attr(,"path")
## [1] "/Library/Frameworks/R.framework/Versions/3.5/Resources/library/stats"

Load a new package

library(mclust)

## Package 'mclust' version 5.4.1
## Type 'citation("mclust")' for citing this R package in publications.

search() ## lists all parents of the global environment.

##  [1] ".GlobalEnv"        "package:mclust"    "package:stats"    
##  [4] "package:graphics"  "package:grDevices" "package:utils"    
##  [7] "package:datasets"  "package:methods"   "Autoloads"        
## [10] "package:base"

Load a new package 2

environment(combn)

## <environment: namespace:utils>

library(combinat)

## 
## Attaching package: 'combinat'

## The following object is masked from 'package:utils':
## 
##     combn

search() ## lists all parents of the global environment.

##  [1] ".GlobalEnv"        "package:combinat"  "package:mclust"   
##  [4] "package:stats"     "package:graphics"  "package:grDevices"
##  [7] "package:utils"     "package:datasets"  "package:methods"  
## [10] "Autoloads"         "package:base"

environment(combn)

## <environment: namespace:combinat>

environment(utils::combn)

## <environment: namespace:utils>

environment(combinat::combn)

## <environment: namespace:combinat>

Use :: to specify namespace

An experiment

x <- matrix(1:4,nrow=2,ncol=2)
nrow

## function (x) 
## dim(x)[1L]
## <bytecode: 0x7fcd37861258>
## <environment: namespace:base>

nrow(x)

## [1] 2

dim(x)

## [1] 2 2

dim <- function() c(1,1)
## guess what is nrow(x)

nrow(x)

## [1] 2

environment(nrow)

## <environment: namespace:base>

environment(dim)

## <environment: R_GlobalEnv>

environment(base::nrow)

## <environment: namespace:base>

Biostatistical Computing, PHC 6068

R function

Why do we need functions

Component of a R function

Primitive functions

Example - Odds Ratio Function

Example - Odds Ratio Function

Returning Objects

Returning Objects

Return multiple variables

tracking progress of a function

Verifying Arguments

examples for verifying arguments

Try catch

Lexical scoping for R functions

Name masking

A fresh start

Dynamic lookup

Every operation is a function call

Other infix functions

Function arguments – default value

Function arguments

function calling habits

Extra argument: …

calling a function given a list of arguments

R path function

Directory information

Time function - Keep track the time of your progress

Lazy Evaluation

Anonymous functions

Call functions from another library

Environments

Basic Environments

Basic environments

Environment search path

Load a new package

Load a new package 2

An experiment

References

Other exercises and readings