Biostatistical Computing, PHC 6068

Simulation study

Zhiguang Huo (Caleb)

Monday November 8, 2021

Outline

What is a simulation study?

Simulation (a.k.a. Monte Carlo simulation)

Why we need simulation studies

Common questions a simulation study can answer

How to approximate

Typical Monte Carlo simulation involves the following procedures:

  1. Generate \(B\) independent datasets \((1, 2, \ldots, B)\) under conditions of interest.
  2. Compute the numerical value of an estimator/test statistic \(T\) from each dataset. Then we have \(T_1, T_2, \ldots, T_b, \ldots, T_B\).
  3. If \(B\) is large enough, summary statistics across \(T_1, \ldots, T_B\) should be good approximations to the true properties of the estimator/test statistic.

Illustration:

\(X \sim f(\theta)\), \(T = T(X)\), where \(T\) is an estimator, or a procedure
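A minimal sketch of this recipe in R (the choices of rnorm as the generating distribution and mean as the statistic are placeholders; any \(f(\theta)\) and \(T\) can be plugged in):

B <- 1000; n <- 20; theta <- 0

Ts <- numeric(B)
for(b in 1:B){
  set.seed(b)
  x <- rnorm(n, mean = theta)   ## step 1: generate one dataset under the condition of interest
  Ts[b] <- mean(x)              ## step 2: compute the statistic T on this dataset
}
c(mean = mean(Ts), var = var(Ts)) ## step 3: summarize T_1, ..., T_B to approximate properties of T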

Example 1, Compare estimators (normal distribution)

Note: the 20% trimmed mean is the average of the observations remaining after removing the smallest 10% and the largest 10% of the data.

E.g., for the data 1, 2, …, 10, remove 1 and 10 and average the remaining values 2 through 9.
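In R, mean() with trim = 0.1 drops 10% of the observations from each end, which matches the hand calculation:

x <- 1:10
mean(x, trim = 0.1)  ## drops 1 and 10, averages 2 through 9
## [1] 5.5
mean(2:9)            ## same value
## [1] 5.5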

Evaluation criteria

Let \(k\) index the estimators under comparison: mean, median, and 20% trimmed mean.
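Written out (notation added here for clarity, matching what the simulation code below computes; \(T^{(k)}_b\) is the value of estimator \(k\) on the \(b\)-th simulated dataset and \(\bar{T}^{(k)} = \frac{1}{B}\sum_{b=1}^B T^{(k)}_b\)):

\[\begin{aligned} \widehat{bias}_k &= \Big|\bar{T}^{(k)} - \mu\Big| \\ \widehat{Var}_k &= \frac{1}{B-1}\sum_{b=1}^B \big(T^{(k)}_b - \bar{T}^{(k)}\big)^2 \\ \widehat{MSE}_k &= \widehat{bias}_k^2 + \widehat{Var}_k \\ \widehat{RE}_k &= \widehat{MSE}_{mean} \big/ \widehat{MSE}_k \end{aligned}\]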

\(T^{mean}\)

\[T^{mean} = \frac{1}{n}\sum_{i=1}^n X_i\]

Its variance, \(Var(T^{mean}) = \sigma^2/n\), can be calculated analytically.

\(T^{median}\)

\[T^{median} = \text{Median}(X_1, \ldots, X_n)\]

Its variance is hard to calculate analytically.

\(T^{mean20\%}\)

\[T^{mean20\%} = \frac{1}{0.8n}\sum_{i=0.1n+1}^{0.9n} X_{(i)}\]

where \(X_{(i)}\) denotes the \(i\)-th order statistic (the sorted values). The estimator itself is easy to compute, but its variance is hard to calculate analytically.

Evaluation via simulation

When there is no analytic solution, we can evaluate the estimators via simulation studies.

Simulation experiment

mu <- 1; B <- 1000; n <- 20

Ts <- matrix(0,nrow=B,ncol=3)
colnames(Ts) <- c("mean", "median", "mean 20% trim")


for(b in 1:B){
  set.seed(b)
  x <- rnorm(n, mean = mu)
  
  Ts[b,1] <- mean(x)
  Ts[b,2] <- median(x)
  Ts[b,3] <- mean(x, trim = 0.1)
}

bias <- abs(apply(Ts,2,function(x) mean(x) - mu))
vars <- apply(Ts,2,var)

MSE <- bias^2 + vars
RE <- MSE[1]/MSE

comparisonTable <- rbind(bias, vars, MSE, RE)
colnames(comparisonTable) <- c("mean", "median", "mean 20% trim")
knitr::kable(comparisonTable)
       mean       median     mean 20% trim
bias   0.0000383  0.0066731  0.0006889
vars   0.0517483  0.0746801  0.0545505
MSE    0.0517483  0.0747247  0.0545509
RE     1.0000000  0.6925196  0.9486234

Difference between simulation and real data application

Example 2, Compare estimators (exponential distribution)

Simulation experiment

mu <- 1; B <- 1000; n <- 20

Ts <- matrix(0,nrow=B,ncol=3)
colnames(Ts) <- c("mean", "median", "mean 20% trim")

for(b in 1:B){
  set.seed(b)
  x <- rexp(n, rate = mu)
  
  Ts[b,1] <- mean(x)
  Ts[b,2] <- median(x)
  Ts[b,3] <- mean(x, trim = 0.1)
}

bias <- abs(apply(Ts,2,function(x) mean(x) - mu))
vars <- apply(Ts,2,var)

MSE <- bias^2 + vars
RE <- MSE[1]/MSE

comparisonTable <- rbind(bias, vars, MSE, RE)
colnames(comparisonTable) <- c("mean", "median", "mean 20% trim")
knitr::kable(comparisonTable)
       mean       median     mean 20% trim
bias   0.0104002  0.2653976  0.1324314
vars   0.0502112  0.0505222  0.0424702
MSE    0.0503194  0.1209581  0.0600083
RE     1.0000000  0.4160068  0.8385403

Example 3, Compare estimators (Cauchy distribution)

Simulation experiment

mu <- 0; B <- 1000; n <- 20

Ts <- matrix(0,nrow=B,ncol=3)
colnames(Ts) <- c("mean", "median", "mean 20% trim")

for(b in 1:B){
  set.seed(b)
  x <- rcauchy(n, location  = mu)
  
  Ts[b,1] <- mean(x)
  Ts[b,2] <- median(x)
  Ts[b,3] <- mean(x, trim = 0.1)
}

bias <- abs(apply(Ts,2,function(x) mean(x) - mu))
vars <- apply(Ts,2,var)

MSE <- bias^2 + vars
RE <- MSE[1]/MSE

comparisonTable <- rbind(bias, vars, MSE, RE)
colnames(comparisonTable) <- c("mean", "median", "mean 20% trim")
knitr::kable(comparisonTable)
       mean         median        mean 20% trim
bias   0.1187952    0.0058119     0.0029693
vars   742.5037597  0.1467307     0.3738797
MSE    742.5178721  0.1467645     0.3738885
RE     1.0000000    5059.2468890  1985.9341052

UMVUE

\[\hat{\beta}^{ols} = \arg \min_\beta \frac{1}{2}\|y - X\beta\|^2_2\]

\[\hat{\beta}^{lasso} = \arg \min_\beta \frac{1}{2} \|y - X\beta\|^2_2 + \lambda \|\beta\|_1\]

Example 4, evaluate bias, variance and MSE of OLS and lasso

Model setup: \(Y = X\beta + \varepsilon\) with a fixed design \(X\) (an intercept column plus \(p = 5\) predictors), \(\beta = (1, 3, 5, 0, 0, 0)^\top\), \(n = 30\), and \(\varepsilon \sim N(0, 3^2)\), matching the code below.

Goal: compare the bias, variance, and MSE of the OLS estimator and of the lasso estimator over a grid of tuning parameters \(\lambda\).

For the simulation of OLS (ordinary least squares): in each of \(B = 1000\) replicates, simulate a new \(Y\) and record the OLS coefficients from lm().

For the simulation of lasso: in each replicate, fit the whole lasso path with lars() and record the coefficients over a grid of \(\lambda\) values.

Simulation experiment

library(lars)
## Loaded lars 1.2
beta0 <- 1; beta1 <- 3; beta2 <- 5
beta3 <- beta4 <- beta5 <- 0
beta <- c(beta0,beta1,beta2,beta3,beta4,beta5)
p <- 5; n <- 30
set.seed(32611)
X0 <- matrix(rnorm(p*n),nrow=n)
X <- cbind(1, X0) ## X fixed

Y0 <- X %*% matrix(beta, ncol = 1)

B <- 1000
beta_ols <- replicate(n = B, expr = list())
beta_lasso <- replicate(n = B, expr = list())

for(b in 1:B){
  set.seed(b)

  error <- rnorm(length(Y0), sd = 3)
  Y <- Y0 + error
  
  abeta_ols <- lm(Y~X0)$coefficients
  beta_ols[[b]] <- abeta_ols
  
  alars <- lars(x = X, y = Y, normalize = FALSE, intercept=F)
  beta_lasso[[b]] <- alars
}

Simulation experiment (continued, part 2)

beta_ols_hat <- Reduce("+",beta_ols)/length(beta_ols)
beta_ols_bias <- sqrt(sum((beta_ols_hat - beta)^2))
beta_ols_variance <-  sum(Reduce("+", lapply(beta_ols,function(x) (x - beta_ols_hat)^2))/(length(beta_ols) - 1))
beta_ols_MSE <- beta_ols_bias^2 + beta_ols_variance ## MSE = bias^2 + variance

as <- seq(0,30,0.1)
lassoCoef <- lapply(beta_lasso, function(alars){
  alassoCoef <- t(coef(alars, s=as, mode="lambda"))
  alassoCoef
} )

beta_lasso_hat <- Reduce("+",lassoCoef)/length(lassoCoef)
beta_lasso_bias <- sqrt(colSums((beta_lasso_hat - beta)^2))
beta_lasso_variance <-  colSums(Reduce("+", lapply(lassoCoef,function(x) (x - beta_lasso_hat)^2))/(length(lassoCoef) - 1))
beta_lasso_MSE <- beta_lasso_bias^2 + beta_lasso_variance ## MSE = bias^2 + variance

Simulation experiment (continued, part 3)

par(mfrow=c(2,2))
plot(x = as, y = beta_lasso_bias, type = "l", xlab="lambda", main="bias")
abline(h = beta_ols_bias, lty=2)
legend("topleft", legend = c("lasso", "OLS"), col = 1, lty = c(1,2))

plot(x = as, y = beta_lasso_variance, col=4, type = "l", xlab="lambda", main="variance")
abline(h = beta_ols_variance, lty=2, col=4)
legend("bottomleft", legend = c("lasso", "OLS"), col = 4, lty = c(1,2))

plot(x = as, y = beta_lasso_MSE, col=2, type = "l", xlab="lambda", main="MSE")
abline(h = beta_ols_MSE, lty=2, col=2)
legend("bottomright", legend = c("lasso", "OLS"), col = 2, lty = c(1,2))


plot(x = as, y = beta_lasso_MSE, ylim = c(min(beta_lasso_bias^2), max(beta_lasso_MSE)), 
     type="n", xlab="lambda", main="lasso MSE = bias^2 + variance")
lines(x = as, y = beta_lasso_MSE, type = "l", col=2)
lines(x = as, y = beta_lasso_variance, col=4)
lines(x = as, y = beta_lasso_bias^2, col=1)
legend("bottomright", legend = c("bias^2", "variance", "MSE"), col = c(1,4,2), lty = c(1))

Tuning parameter selection for lasso?

  1. Split the data to part A and part B.
  2. Using part A, find the tuning parameter such that the MSE is minimized.
  3. Evaluate the performance (with the tuning parameter selected in step 2) using part B; a minimal sketch of these steps follows below.
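A minimal sketch of these three steps, reusing X, the lambda grid as, and the last simulated response Y from the code above (the 22/8 split and the 15/7 hold-out within part A are arbitrary illustration choices):

set.seed(32611)
idxA <- sample(1:n, size = 22)   ## step 1: part A (n = 30 here, from the simulation above)
idxB <- setdiff(1:n, idxA)       ## part B, kept aside for the final evaluation

## step 2: within part A, hold out a few observations to pick lambda
idxTrain <- idxA[1:15]
idxVal   <- idxA[16:22]
fitA  <- lars(x = X[idxTrain, ], y = Y[idxTrain], normalize = FALSE, intercept = FALSE)
coefA <- coef(fitA, s = as, mode = "lambda")   ## length(as) x ncol(X) coefficient matrix
valMSE  <- colMeans((Y[idxVal] - X[idxVal, ] %*% t(coefA))^2)
bestIdx <- which.min(valMSE)
as[bestIdx]                      ## selected tuning parameter

## step 3: evaluate prediction error at the selected lambda on part B
mean((Y[idxB] - X[idxB, ] %*% coefA[bestIdx, ])^2)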

Hypothesis testing (one-sided)

Hypothesis testing (two-sided)

Size and Power

Does a hypothesis testing procedure attain the nominal (advertised) size?
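In symbols (standard definitions, stated here for reference):

\[\mbox{size} = P(\mbox{reject } H_0 \mid H_0 \mbox{ true}), \qquad \mbox{power} = P(\mbox{reject } H_0 \mid H_1 \mbox{ true})\]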

Consider the following tests: the one-sample t-test (t.test()), the Wilcoxon signed-rank test (wilcox.test()), and the t-test applied to only the first 10 of the \(n = 20\) samples.

Simulation for sizes of hypothesis tests

Consider the following simulation setting: generate \(X_1, \ldots, X_n \sim N(0, 1)\) (so the null hypothesis \(H_0: \mu = 0\) holds), with \(n = 20\), \(\alpha = 0.05\), and \(B = 1000\) replications; the size is estimated by the proportion of replications in which the p-value falls below \(\alpha\).

Example 5, t.test() size

alpha <- 0.05; mu=0; n <- 20; B <- 1000

counts <- 0
for(b in 1:B){
  set.seed(b)
  
  x <- rnorm(n, mean=mu)
  atest <- t.test(x)
  
  if(atest$p.value < alpha){
    counts <- counts + 1
  }
}

counts/B
## [1] 0.05

Example 6, wilcox.test() size

alpha <- 0.05; mu=0; n <- 20; B <- 1000

counts <- 0
for(b in 1:B){
  set.seed(b)
  
  x <- rnorm(n, mean=mu)
  atest <- wilcox.test(x)
  
  if(atest$p.value < alpha){
    counts <- counts + 1
  }
}

counts/B
## [1] 0.047

Example 7, size of t.test() using only the first 10 samples

alpha <- 0.05; mu=0; n <- 20; B <- 1000

counts <- 0
for(b in 1:B){
  set.seed(b)
  
  x <- rnorm(n, mean=mu)[1:10]
  atest <- t.test(x)
  
  if(atest$p.value < alpha){
    counts <- counts + 1
  }
}

counts/B
## [1] 0.056

Compare these test procedures

Example 8, t.test() power

alpha <- 0.05; mu=0; mu1 <- 1; n <- 20; B <- 1000

counts <- 0
for(b in 1:B){
  set.seed(b)
  
  x <- rnorm(n, mean=mu1)
  atest <- t.test(x)
  
  if(atest$p.value < alpha){
    counts <- counts + 1
  }
}
counts / B
## [1] 0.985

Example 8, Validate the result using R function

library(pwr)
alpha <- 0.05; mu=0; mu1 <- 1; n <- 20; B <- 1000
pwr.t.test(n = n, d = mu1, sig.level = alpha, type = "one.sample", alternative = "two.sided")
## 
##      One-sample t test power calculation 
## 
##               n = 20
##               d = 1
##       sig.level = 0.05
##           power = 0.9885913
##     alternative = two.sided

Example 9, wilcox.test() power

alpha <- 0.05; mu=0; mu1 <- 1; n <- 20; B <- 1000

counts <- 0
for(b in 1:B){
  set.seed(b)
  
  x <- rnorm(n, mean=mu1)
  atest <- wilcox.test(x)
  
  if(atest$p.value < alpha){
    counts <- counts + 1
  }
}
counts / B
## [1] 0.979

Example 10, power of t.test() using only the first 10 samples

alpha <- 0.05; mu=0; mu1 <- 1; n <- 20; B <- 1000

counts <- 0
for(b in 1:B){
  set.seed(b)
  
  x <- rnorm(n, mean=mu1)[1:10]
  atest <- t.test(x)
  
  if(atest$p.value < alpha){
    counts <- counts + 1
  }
}
counts / B
## [1] 0.805

In general, the larger the sample size, the larger the power.
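A quick check of this claim with the theoretical power of the one-sample two-sided t-test at \(d = 1\), over a grid of sample sizes (the grid itself is an arbitrary illustration choice):

library(pwr)
ns <- c(5, 10, 15, 20, 30, 50)
powers <- sapply(ns, function(an)
  pwr.t.test(n = an, d = 1, sig.level = 0.05, type = "one.sample",
             alternative = "two.sided")$power)
round(powers, 3)  ## increases monotonically in n (0.803 at n = 10 and 0.989 at n = 20, as above)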

Example 10, Validate the result using R function

alpha <- 0.05; mu=0; mu1 <- 1; n <- 20; B <- 1000
pwr.t.test(n = 10, d = mu1, sig.level = alpha, type = "one.sample", alternative = "two.sided")
## 
##      One-sample t test power calculation 
## 
##               n = 10
##               d = 1
##       sig.level = 0.05
##           power = 0.8030969
##     alternative = two.sided

Confidence interval

A \(1 - \alpha\) confidence interval for parameter \(\mu\) is an interval \(C_n = (a,b)\) where \(a = a(X_1, \ldots, X_n)\) and \(b = b(X_1, \ldots, X_n)\) are functions of the data such that \[P(\mu \in C_n) \ge 1 - \alpha\] In other words, \((a,b)\) traps \(\mu\) with probability \(1 - \alpha\). We call \(1 - \alpha\) the coverage of the confidence interval.

Interpretation: the interval \((a, b)\) is random while \(\mu\) is a fixed unknown constant. Over repeated samples, about a proportion \(1 - \alpha\) of the intervals constructed this way will contain \(\mu\); it is not a statement that \(\mu\) lies in one particular computed interval with probability \(1 - \alpha\).
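Written out, the \(1-\alpha\) interval computed by t.test() (and by hand) in the example below is

\[\bar{X} \pm t_{n-1,\, 1-\alpha/2}\, \frac{s}{\sqrt{n}},\]

where \(s\) is the sample standard deviation and \(t_{n-1,\, 1-\alpha/2}\) is the \(1-\alpha/2\) quantile of the \(t\) distribution with \(n-1\) degrees of freedom.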

Example 11, Confidence interval

mu <- 1
B <- 1000
n <- 10

x <- rnorm(n, mean=mu)
c(mean(x) - qt(0.975, n-1) * sd(x) / sqrt(n), mean(x) + qt(0.975, n-1) * sd(x) / sqrt(n))
## [1] 0.400803 2.219041
t.test(x)
## 
##  One Sample t-test
## 
## data:  x
## t = 3.2595, df = 9, p-value = 0.009847
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.400803 2.219041
## sample estimates:
## mean of x 
##  1.309922
counts <- 0
for(b in 1:B){
  set.seed(b)

  x <- rnorm(n, mean=mu)
  atest <- t.test(x)
  
  if(atest$conf.int[1] < mu & mu < atest$conf.int[2]){
    counts <- counts + 1
  }
}

counts/B
## [1] 0.944
Repeats <- 100
plot(c(-2,4),c(0,Repeats), type="n", xlab="95% CI", ylab="repeats index")
abline(v = mu, col=2)

counts <- 0
for(r in 1:Repeats){
  set.seed(r)
  x <- rnorm(n, mean=mu)
  atest <- t.test(x)
  
  segments(atest$conf.int[1], r, atest$conf.int[2], r)
  if(atest$conf.int[1] < mu & mu < atest$conf.int[2]){
    counts <- counts + 1
  }
}

counts/Repeats
## [1] 0.97

Example 12, Confidence interval (using normal approximation, n=100)

set.seed(32611)
B <- 1000 ## number of simulation replications
counts <- 0 ## number of intervals that cover mu
n <- 100
mu <- 1
alpha <- 0.05
for(b in 1:B){
  x <- rnorm(n, mean=mu, sd = 1)
  lb <- mean(x) - qnorm(alpha/2, lower.tail = F) * sd(x)/sqrt(n)
  ub <- mean(x) + qnorm(alpha/2, lower.tail = F) * sd(x)/sqrt(n)
  if(lb < mu & ub > mu){
    counts <- counts + 1
  }
}
print(counts/B)
## [1] 0.945

Example 12, Confidence interval (using normal approximation, n=10)

set.seed(32611)
B <- 1000 ## number of simulation replications
counts <- 0 ## number of intervals that cover mu
n <- 10
mu <- 1
alpha <- 0.05
for(b in 1:B){
  x <- rnorm(n, mean=mu, sd = 1)
  lb <- mean(x) - qnorm(alpha/2, lower.tail = F) * sd(x)/sqrt(n)
  ub <- mean(x) + qnorm(alpha/2, lower.tail = F) * sd(x)/sqrt(n)
  if(lb < mu & ub > mu){
    counts <- counts + 1
  }
}
print(counts/B)
## [1] 0.909

Multiple testing

Suppose we perform 20 independent tests, each at level \(\alpha = 0.05\), and all of the null hypotheses are true. Then

\[\begin{aligned} P(\mbox{at least one significant result}) &= 1 - P(\mbox{no significant result}) \\ & = 1 - (1 - 0.05)^{20} \\ & \approx 0.64 \\ \end{aligned}\]
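A quick simulation check of this 0.64 figure, using the same skeleton as the Bonferroni simulation further below but comparing the raw p-values to \(\alpha\):

B <- 1000
n <- 20 ## number of tests
N <- 15 ## number of samples per test
alpha <- 0.05

count <- 0
for(b in 1:B){
  set.seed(b)
  ax <- replicate(n, rnorm(N), simplify = FALSE)        ## all 20 null hypotheses are true
  ap <- sapply(ax, function(avec) t.test(avec)$p.value)
  
  if(any(ap < alpha)){   ## no multiplicity correction
    count <- count + 1
  }
}
count/B                  ## should be close to 0.64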

Bonferroni correction

With the Bonferroni correction, each of the 20 tests is performed at level \(0.05/20 = 0.0025\), so that

\[\begin{aligned} P(\mbox{reject at least one test}|H_0) &= 1 - P(\mbox{reject nothing}|H_0) \\ & = 1 - (1 - 0.0025)^{20} \\ & \approx 0.0488 \\ \end{aligned}\]

Simulation experiment

Under the null hypothesis, \(X \sim N(0,1)\).

B <- 1000
n <- 20 ## number of tests
N <- 15 ## number of samples per test
alpha <- 0.05

count <- 0
for(b in 1:B){
  set.seed(b)
  ax <- replicate(n, rnorm(N),simplify = F)
  ap <- sapply(ax,function(avec) t.test(avec)$p.value)
  
  if(any(ap < alpha/n)){
    count <- count + 1
  }
}
count/B
## [1] 0.045

False discovery rate

The false discovery rate (FDR) is the expected proportion of false positives among all rejected hypotheses.

Q-value

The q-value of a test is the smallest nominal FDR at which that test would be declared significant; in the example below, Benjamini-Hochberg adjusted p-values from p.adjust(pval, "BH") play this role.

Simulation setting

B1 <- 9000 ## from null
B2 <- 1000 ## from alternative
B <- B1 + B2
n <- 10

SigLabel <- c(rep(F,  B1), rep(T, B2))

pval <- numeric(B)

for(b in 1:B){
    set.seed(b)
    if(b <= B1){
    x <- rnorm(n) ## null
  } else {
    x <- rnorm(n, mean = 2) ## alternative
  }
  
  pval[b] <- t.test(x=x)$p.value ## get raw p-value
}

False discovery rate example

qvalue <- p.adjust(pval, "BH")
test_qvalue <- qvalue < 0.05 ## set a nominal FDR
table_qvalue <- table(SigLabel, test_qvalue)
table_qvalue
##         test_qvalue
## SigLabel FALSE TRUE
##    FALSE  8951   49
##    TRUE     27  973
fdr_qvalue <-  sum(test_qvalue & !SigLabel) / sum(test_qvalue)
fdr_qvalue ## this is the actual FDR
## [1] 0.04794521

Comparison - raw p-value

test_raw <- pval < 0.05
table_raw <- table(SigLabel, test_raw)
table_raw
##         test_raw
## SigLabel FALSE TRUE
##    FALSE  8562  438
##    TRUE      0 1000
fdr_raw <-  sum(test_raw & !SigLabel) / sum(test_raw)
fdr_raw ## FDR is about 30%: 1438 tests are rejected, but far too many are false positives (too liberal)
## [1] 0.3045897

Comparison - Bonferroni correction

test_bonferroni <- pval < 0.05/length(pval)
table_bonferroni <- table(SigLabel, test_bonferroni)
table_bonferroni
##         test_bonferroni
## SigLabel FALSE TRUE
##    FALSE  9000    0
##    TRUE    862  138
fdr_bonferroni <-  sum(test_bonferroni & !SigLabel) / sum(test_bonferroni)
fdr_bonferroni ## FDR is 0, but only 138 of the 1000 true signals are rejected: too conservative
## [1] 0

Principle for simulation study

  1. A Monte Carlo simulation is just like any other experiment
    • Each combination of factors is a separate simulation, so many factors can lead to a very large number of combinations, and thus a very large number of simulations (time consuming)
    • Results must be recorded and saved in a systematic way
    • Sample size B (number of data sets) must deliver acceptable precision.
  2. Save everything!
    • Save the individual estimates in a file and then analyze
    • as opposed to computing these summaries and saving only them
    • Critical if the simulation takes a long time to run!
  3. Keep B small at first
    • Test and refine code until you are sure everything is working correctly before carrying out final production runs
    • Get an idea of how long it takes to process one data set
  4. Set a different seed for each run and keep records (see the sketch after this list)
    • Ensure simulation runs are independent
    • Runs may be replicated if necessary
  5. Document your code
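A minimal sketch of points 2 and 4 combined, recording each run's seed next to its estimates and saving everything for later analysis (the file name sim_results.rds and the saved statistics are illustrative choices):

B <- 1000; n <- 20; mu <- 1

results <- vector("list", B)
for(b in 1:B){
  seed_b <- b                         ## a different, recorded seed for each run
  set.seed(seed_b)
  x <- rnorm(n, mean = mu)
  results[[b]] <- data.frame(run = b, seed = seed_b,
                             mean = mean(x), median = median(x))
}
resultTable <- do.call(rbind, results)
saveRDS(resultTable, "sim_results.rds") ## save the individual estimates; summarize later from this file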
