Biostatistical Computing, PHC 6068

Monte Carlo methods

Zhiguang Huo (Caleb)

Wednesday Nov 20, 2019

Sampling method

Monte Carlo methods.
- Direct sampling.
- Rejection sampling.
- Importance sampling.
- Sampling-importance resampling.
Markov chain Monte Carlo (MCMC).
- Metropolis-Hastings.
- Gibbs sampling.

Notation

Denote \(p\), \(q\) as unnormalized density functions.
- E.g. \(p(x) = x(1 - x)\), \(x \in (0, 1)\).
Denote \(p^*\), \(q^*\) as normalized density functions.
- E.g. \(p^*(x) = \frac{p(x)}{\int_x p(x) dx} = 6x(1 - x)\), \(x \in (0, 1)\).
- E.g. \(q^*(x) = dnorm(x,0,1) = \frac{1}{\sqrt{2\pi}} \exp(-\frac{x^2}{2})\).
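The relation between \(p\) and \(p^*\) in the first example can be checked numerically. The following is a minimal sketch; the names `p_unnorm`, `p_star`, and `x0` are introduced here for illustration only.

```r
# Unnormalized density p(x) = x(1 - x) on (0, 1), zero elsewhere
p_unnorm <- function(x) ifelse(x > 0 & x < 1, x * (1 - x), 0)

# Normalizing constant Z = integral of p over (0, 1); analytically Z = 1/6
Z <- integrate(p_unnorm, lower = 0, upper = 1)$value

# Normalized density p*(x) = p(x) / Z, which should agree with 6x(1 - x)
p_star <- function(x) p_unnorm(x) / Z

# Compare p*(x) with the closed form at a few points
x0 <- c(0.1, 0.5, 0.9)
cbind(x = x0, p_star = p_star(x0), closed_form = 6 * x0 * (1 - x0))
```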

Our targets

Target 1: To generate Monte Carlo samples \(x_m\) from a given probability distribution \(p^*(x)\) or \(p(x)\).
Target 2: To estimate expectations of functions under this distribution, for example \[\mathbb{E}(g (x) | p^*(x)) = \int g(x) p^*(x) dx.\]

Examples:

\(\mathbb{E} (x | p^*(x)) \doteq A\)
\(\mathbb{V}\mbox{ar} (x | p^*(x)) = \mathbb{E} [(x - A)^2| p^*(x)]\)

A simulation approach

Problem: We want to estimate \[\mathbb{E}(g (x) | p^*(x)) = \int g(x) p^*(x) dx\] given the distribution \(p^*(x)\).

Examples: \(\mathbb{E} (x | p^*(x))\) or \(\mathbb{V}\mbox{ar} (x | p^*(x))\)

To generate samples \(x_m\) from a given probability distribution \(p^*(x)\).
- sample using R: (e.g. rnorm).
- CDF transformation: sample from UNIF(0,1). Then use inverse CDF transformation.
To estimate expectations of functions under this empirical distribution \[\mathbb{E}(g (x) | p^*(x)) \approx \frac{1}{M} \sum_{m=1}^M g(x^{(m)})\]
- \(M\) is the number of Monte Carlo samples.
- By the law of large numbers, this approximation is valid as \(M\) grows.
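The two steps above (inverse CDF sampling, then averaging) can be sketched as follows. The exponential distribution and the `rate` value are assumptions chosen for illustration because the inverse CDF has a closed form, \(F^{-1}(u) = -\log(1-u)/\mbox{rate}\); they are not part of the example used in the rest of these notes.

```r
set.seed(1)
M <- 100000                 # number of Monte Carlo samples
rate <- 2                   # hypothetical rate parameter, for illustration only

# Step 1: sample from UNIF(0,1), then apply the inverse CDF
u <- runif(M)
x <- -log(1 - u) / rate     # x now follows an exponential distribution

# Step 2: estimate expectations by sample averages
g <- function(x) x^2        # example function g(x)
est_mean <- mean(x)         # estimates E(x)   = 1 / rate   = 0.5
est_g <- mean(g(x))         # estimates E(x^2) = 2 / rate^2 = 0.5
c(est_mean = est_mean, est_g = est_g)
```

Both estimates should be close to 0.5, and the error shrinks at the rate \(1/\sqrt{M}\).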

Un-normalized density function problems

Unnormalized density function: \(p(x) = \exp [ 0.4(x-0.4)^2 - 0.08x^4 ]\)
- \(x \in (-4, 4)\)

p <- function(x, a=.4, b=.08){
  res <- exp(a*(x-a)^2 - b*x^4)
  res[x < -4 | x > 4] <- 0  # the density is zero outside the domain (-4, 4)
  res
}
x <- seq(-4, 4, 0.01)
plot(x,p(x),type="l",  main = expression(p(x) == exp (0.4(x-0.4)^{2}  - 0.08 * x^{4})))

In general, the normalization factor \(Z\) is hard to compute. In this one-dimensional example, numerical integration still works:

integrate(f = p, lower = -4, upper = 4)

## 7.852178 with absolute error < 9.1e-06

Even if we know \(Z\), it is still challenging to draw samples.
Direct solution: partition the distribution into bins and perform direct sampling.

Direct Sampling

Partition the distribution into bins and perform direct sampling.

x <- seq(-4, 4, 0.01)
plot(x,p(x),type="l",  main = expression(p(x) == exp (0.4(x-0.4)^{2}  - 0.08 * x^{4})))

x2 <- seq(-4, 4, 0.1)
plot(x,p(x),type="n",  main = expression(p(x) == exp (0.4(x-0.4)^{2}  - 0.08 * x^{4})))
segments(x2,0,x2,p(x2))

Difficulty
- Higher dimensions: say the dimension is \(5\).
- For each dimension, we divide the domain equally into \(50\) bins.
- The sampling space then contains \(50^{5} \approx 3.1 \times 10^{8}\) bins, which is huge!
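In one dimension, the binning idea can be turned into an actual sampler. The following is a minimal sketch under stated assumptions: the bin width of 0.1 matches the plot above, while the uniform jitter within each bin and the names `mids`, `prob`, and `x.direct` are choices made here for illustration.

```r
# Unnormalized target density, zero outside (-4, 4)
p <- function(x, a = 0.4, b = 0.08) {
  res <- exp(a * (x - a)^2 - b * x^4)
  res[x < -4 | x > 4] <- 0
  res
}

set.seed(1)
width <- 0.1
mids <- seq(-4 + width / 2, 4 - width / 2, by = width)  # bin midpoints
prob <- p(mids) / sum(p(mids))   # discrete approximation; Z is not needed

M <- 10000
# Pick a bin with probability proportional to p at its midpoint
bin <- sample(seq_along(mids), size = M, replace = TRUE, prob = prob)
# Spread each draw uniformly within its bin
x.direct <- mids[bin] + runif(M, -width / 2, width / 2)

c(m = mean(x.direct), s = sd(x.direct))
```

Note that only ratios of \(p\) are used, so the normalization factor never has to be computed. The cost is the number of bins, which is what explodes in higher dimensions.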

Rejection sampling

\(p^*(x)\) is difficult to sample from directly, but \(p(x)\) is easy to evaluate.

(e.g. \(p(x) = \exp [ 0.4(x-0.4)^2 - 0.08x^4 ]\))

Sample from a simpler proposal distribution \(q^*(x)\), with a constant \(c\) chosen such that \(c q^*(x) \ge p(x)\) for all \(x\).
Rejection sampling algorithm:

Step 1: Draw \(x \sim q^*(x)\).
Step 2: Accept \(x\) with probability \(p(x)/[c q^*(x)]\):
- Sample \(u \sim \mbox{UNIF}(0,1)\), accept if \(p(x)/[c q^*(x)] > u\).
Repeat Steps 1 and 2 many times.
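As a small worked instance of these steps, consider the unnormalized density \(p(x) = x(1-x)\) from the Notation slide with a UNIF(0,1) proposal. This is a sketch, not the example used in the following slides; the name `c.env` for the constant \(c\) is introduced here. Since \(x(1-x) \le 0.25\), the choice \(c = 0.25\) guarantees \(c q^*(x) \ge p(x)\).

```r
set.seed(1)
p <- function(x) ifelse(x > 0 & x < 1, x * (1 - x), 0)  # unnormalized target
qstar <- function(x) dunif(x, 0, 1)                      # proposal density
c.env <- 0.25   # envelope constant: the maximum of x(1 - x) is 0.25 at x = 0.5

N <- 50000
x.h <- runif(N, 0, 1)                      # Step 1: draw proposals from q*
u <- runif(N)                              # Step 2: uniform draws for accept/reject
acc <- u < p(x.h) / (c.env * qstar(x.h))   # accept with probability p / (c q*)
x.acc <- x.h[acc]

c(rate = mean(acc), m = mean(x.acc), v = var(x.acc))
```

Here \(p^*(x) = 6x(1-x)\) is the Beta(2, 2) density, so the accepted draws should have mean close to 0.5 and variance close to 0.05, and about two thirds of the proposals should be accepted.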

Rejection sampling

x <- seq(-4, 4, 0.01)
plot(x,p(x),type="l",  main = expression(p(x) == exp (0.4(x-0.4)^{2}  - 0.08 * x^{4})))

x <- seq(-4, 4, 0.01)
qstar <- function(x, C = 30){
  C*dnorm(x,sd = 3) 
}
plot(x,p(x),type="l", ylim = c(0,5))
curve(qstar,add = T)
text(0, 5, expression({q^"*"} (x) == N (x , 0, 3^2) ))
text(0, 4.5, expression({cq^"*"} (x) == 30* N (x , 0, 3^2) ))
text(1, 2, expression(p(x) == exp (0.4(x-0.4)^{2}  - 0.08 * x^{4})))
x0 <- -2.5
segments(x0,0,x0,qstar(x0),col=2)
N <- 10
for(i in 1:N){
  set.seed(i)
  ay <- runif(1,0,qstar(x0))
  acol = ifelse(ay < p(x0),2,4)
  points(x0,ay,col=acol,pch=19)
}

Rejection sampling

Proof: \[\begin{align*} p^*(x) &= \frac{p(x)}{Z} \\ &= \frac{p(x)}{\int_x p(x) dx} \\ &= \frac{\left[\frac{p(x)}{c q^*(x)}\right] q^*(x)}{\int_x \left[\frac{p(x)}{c q^*(x)}\right] q^*(x) dx} \end{align*}\]

Interpretation of the numerator:

\(q^*(x):\) Sampling from the proposed distribution.
\(p(x)/[c q^*(x)]:\) Acceptance probability.

Rejection sampling

## rejection sampling
#p <- function(x, a=.4, b=.08){exp(a*(x-a)^2 - b*x^4)}
x <- seq(-4, 4, 0.1)
qstar <- function(x){
  dnorm(x,sd = 3) 
}
# we can find the constant C in this case (maximum of p/q* on the grid, rounded up):
C <- round(max(p(x)/qstar(x))) + 1; C

## [1] 28

# number of samples
N <- 1000
# generate proposals and u
x.h <- rnorm( N, sd = 3)
u <- runif( N )
acc <- u < p(x.h) / (C * qstar(x.h))
x.acc <- x.h[ acc ]
# how many proposals are accepted
sum( acc ) /N

## [1] 0.285

# calculate some statistics
c(m=mean(x.acc), s=sd(x.acc))

##          m          s 
## -0.6207873  1.4258200

par(mfrow=c(1,2), mar=c(2,2,1,1))
plot(x,p(x),type="l")
barplot(table(round(x.acc,1))/length(x.acc))

Discussion: What does the acceptance rate depend on?
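One way to explore this question numerically: integrating the acceptance probability over the proposal gives an overall acceptance rate of \(\int q^*(x) \frac{p(x)}{c q^*(x)} dx = Z/c\). The sketch below compares simulated and theoretical rates for two values of the constant; the second value (56) is an arbitrary choice made here to show the effect of a looser envelope.

```r
# Unnormalized target density, zero outside (-4, 4)
p <- function(x, a = 0.4, b = 0.08) {
  res <- exp(a * (x - a)^2 - b * x^4)
  res[x < -4 | x > 4] <- 0
  res
}
qstar <- function(x) dnorm(x, sd = 3)            # proposal density
Z <- integrate(p, lower = -4, upper = 4)$value   # normalization factor

set.seed(1)
N <- 100000
C.values <- c(28, 56)   # 28 is the constant found above; 56 is a looser envelope
acc_rate <- sapply(C.values, function(C) {
  x.h <- rnorm(N, sd = 3)                # proposals from q*
  u <- runif(N)
  mean(u < p(x.h) / (C * qstar(x.h)))    # fraction of accepted proposals
})

rbind(simulated = acc_rate, theoretical = Z / C.values)
```

Doubling the constant halves the acceptance rate, so the rate depends on how tightly \(c q^*(x)\) envelopes \(p(x)\).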

Importance sampling

Importance sampling is not a method for generating samples from \(p(x)\) (target 1); it is only a method for estimating the expectation of a function \(g(x)\) (target 2).

Goal: calculate the expectation of a function \(\phi(x)\) under \(p^*(x)\).
Sampling from \(p^*(x)\) is hard.
Suppose we can sample from a simpler proposal distribution \(q^*\) instead.
If \(q^*\) dominates \(p^*\) (i.e., \(q^*(x)>0\) whenever \(p^*(x)>0\)), we can sample from \(q^*\) and reweight: \(w(x) = \frac{p^*(x)}{q^*(x)}\)

Importance sampling, algorithm

Underlying distribution: \(p(x)\) or \(p^*(x) = \frac{p(x)}{Z}\).
Function of interest \(\phi (x)\).

\[\mathbb{E} (\phi (x) | p^* ) = \int \phi (x) p^*(x) dx\]

If we can sample \(x_m\) from \(p^*(x)\), then we can use \(\frac{1}{M} \sum_{m=1}^M \phi(x_m)\) to estimate \(\mathbb{E} (\phi (x) | p^* )\)
If we cannot sample \(x_m\) from \(p^*(x)\),
- Rely on a proposed distribution (Sampler): \(q^*(x)\).

\[\begin{align*} \mathbb{E} (\phi (x) | p^* ) &= \int \phi (x) p^*(x) dx\\ &= \frac{\int \phi (x) p^*(x) dx}{\int p^*(x) dx}\\ &= \frac{\int [\phi (x) p(x)/Z] dx}{\int [p(x)/Z] dx}\\ &= \frac{\int [\phi (x) p(x)/q^*(x)] q^*(x) dx}{\int [p(x)/q^*(x)] q^*(x) dx}, \end{align*}\]

\(\mathbb{E} (\phi (x) | p^* )\) can be estimated using M draws \(x_1, \ldots, x_M\) from \(q^*(x)\) by the following expression.

\[\hat{\mathbb{E}} (\phi (x) | p^* ) = \frac{\frac{1}{M} \sum_{m=1}^M[\phi (x_m) p(x_m)/q^*(x_m)] }{ \frac{1}{M} \sum_{m=1}^M[p(x_m)/q^*(x_m)]}\]

Defining the importance weight \(w(x_m) =\frac{p(x_m)}{q^*(x_m)}\), this becomes

\[\hat{\mathbb{E}} (\phi (x) | p^* ) = \frac{ \sum_{m=1}^M \phi(x_m) w(x_m)}{ \sum_{m=1}^M w(x_m)} \]

Importance ratio: \(\frac{w(x_m)}{ \sum_{m=1}^M w(x_m)}\)
When \(q^* = p^*\), all weights equal the constant \(Z\), so the estimator reduces to the regular sample mean: the regular mean estimator is a special case of importance sampling.

Importance sampling, examples

Problem setting:

par(mfrow=c(1,2), mar=c(2,2,2,1))

x <- seq(-4, 4, 0.01)
plot(x,p(x),type="l",  main = expression(p(x) == exp (0.4(x-0.4)^{2}  - 0.08 * x^{4})))

phi <- function(x){ (- 1/3*x^3 + 1/2*x^2 + 12*x - 12) / 30 + 1.3}
x <- seq(-4, 4, 0.01)
plot(x,phi(x),type="l",main= expression(phi(x)))

\(p(x) = \exp [0.4 (x - 0.4) ^ 2 - 0.08 x^4]\)
\(\phi (x) = (- \frac{1}{3}x^3 + \frac{1}{2}x^2 + 12x - 12) / 30 + 1.3\) (right panel).

Underlying solution

ep <- function(x) p(x)*phi(x)
truthE <- integrate(f = ep, lower = -4, upper = 4)$value/integrate(f = p, lower = -4, upper = 4)$value
truthE

## [1] 0.6971733

Importance sampling, examples (2)

q.r <- rnorm
q.d <- dnorm

par(mfrow=c(1,2))
plot(x,q.d(x),type="l",main='sampler distribution Gaussian')
curve(p, from = -4,to = 4 ,col=2 ,  main = expression(p(x) == exp (0.4(x-0.4)^{2}  - 0.08 * x^{4})))

M <- 1000
x.m <- q.r(M)
ww <- p(x.m) / q.d(x.m)
qq <- ww / sum(ww)
x.g <- phi(x.m)
sum(x.g * qq)

## [1] 0.7022795

Number of samples for importance sampling

M <- 10^seq(1,7,length.out = 30)

result.g <- numeric(length(M))
for(i in 1:length(M)){
  aM <- M[i]
  x.m <- q.r(aM)
  ww <- p(x.m) / q.d(x.m)
  
  qq.g <- ww / sum(ww)
  x.g <- phi(x.m)
  
  result.g[i] <- sum(x.g * qq.g)/sum(qq.g)
}

plot(M,result.g,log = "x", main='importance sampling result Gaussian')
abline(h = truthE, col = 2)

Sampling from a narrow Gaussian distribution

q.r_narrow <- function(n){rnorm(n,0,1/2)}  # draw n samples from N(0, 0.5^2)
q.d_narrow <- function(x){dnorm(x,0,1/2)}  # density of N(0, 0.5^2) at x

par(mfrow=c(1,2))
plot(x,q.d_narrow(x),type="l",main='sampler narrow distribution Gaussian')
curve(p, from = -4,to = 4 ,col=2 ,  main = expression(p(x) == exp (0.4(x-0.4)^{2}  - 0.08 * x^{4})))

Number of samples for importance sampling (narrow Gaussian distribution)

M <- 10^seq(1,7,length.out = 30)

result.narrow <- numeric(length(M))
for(i in 1:length(M)){
  aM <- M[i]
  x.m <- q.r_narrow(aM)
  ww <- p(x.m) / q.d_narrow(x.m)
  
  qq.c <- ww / sum(ww)
  x.c <- phi(x.m)
  
  result.narrow[i] <- sum(x.c * qq.c)/sum(qq.c)
}
plot(M,result.narrow, log="x")
abline(h = truthE, col = 2)

Remarks for importance sampling

Want to estimate the \(\mathbb{E} (\phi (x) | p^* )\).
If the proposal density \(q^*(x)\) is small in a region where \(|\phi(x) p^*(x)|\) is large, it is quite possible that after many points \(x_m\) have been generated, none of them fell in that region. This leads to a wrong estimate of \(\mathbb{E} (\phi (x) | p^* )\).
The importance sampler \(q^*(x)\) should therefore have heavy tails relative to \(p^*(x)\).
If \(q^* (x)\) can be chosen such that \(\frac{\phi p}{q^*}\) is roughly constant, then fairly precise estimates of the integral can be obtained.
Importance sampling is not a useful method if the importance ratios vary substantially. The worst possible scenario occurs when the importance ratios are small with high probability but with a low probability are huge.
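How much the importance ratios vary can be summarized by an effective sample size, \(1/\sum_m W_m^2\), where \(W_m\) are the normalized weights. This diagnostic is an addition to the material above (a commonly used summary, discussed for example in BDA3), and the helper name `ess` is introduced here. The sketch compares the standard Gaussian sampler with the narrow Gaussian sampler.

```r
# Unnormalized target density, zero outside (-4, 4)
p <- function(x, a = 0.4, b = 0.08) {
  res <- exp(a * (x - a)^2 - b * x^4)
  res[x < -4 | x > 4] <- 0
  res
}

# Effective sample size from unnormalized importance weights (hypothetical helper)
ess <- function(w) {
  W <- w / sum(w)   # importance ratios
  1 / sum(W^2)      # equals length(w) when all weights are equal
}

set.seed(1)
M <- 10000
x.wide <- rnorm(M, 0, 1)                       # standard Gaussian sampler
w.wide <- p(x.wide) / dnorm(x.wide, 0, 1)
x.narrow <- rnorm(M, 0, 1/2)                   # narrow Gaussian sampler
w.narrow <- p(x.narrow) / dnorm(x.narrow, 0, 1/2)

c(ess_gaussian = ess(w.wide), ess_narrow = ess(w.narrow))
```

A small effective sample size relative to \(M\) indicates that a few draws with huge importance ratios dominate the estimate, which is the worst-case scenario described above.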

Importance resampling (SIR)

SIR: sampling-importance resampling. (target 1, generate samples)
This is an alternative when the rejection sampling constant \(c\) is not immediately available.
Algorithm (BDA3, reference)
- Draw samples \(x_1, \ldots, x_M \sim q^*(x)\).
- Calculate importance weights \(w_m = p(x_m)/q^*(x_m)\).
- Normalize the weights as \(W_m = \frac{w_m}{\sum_{m'} w_{m'}}\) (importance ratio).
- Resample \(K\) out of \(M\) values from \(\{ x_1, \ldots, x_M \}\) without replacement, where each \(y_k\), \(1\le k \le K\), is drawn as \(x_m\) with probability proportional to \(W_m\).

Remark:

Some authors instead resample \(M\) out of \(M\) with replacement.
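The with-replacement variant mentioned in the remark can be sketched as follows. This is a minimal illustration, assuming a UNIF(-4, 4) proposal; the names `x.q`, `W`, and `y` are introduced here.

```r
# Unnormalized target density, zero outside (-4, 4)
p <- function(x, a = 0.4, b = 0.08) {
  res <- exp(a * (x - a)^2 - b * x^4)
  res[x < -4 | x > 4] <- 0
  res
}

set.seed(1)
M <- 10000
x.q <- runif(M, -4, 4)             # draws from the proposal q*(x) = 1/8
w <- p(x.q) / dunif(x.q, -4, 4)    # importance weights
W <- w / sum(w)                    # importance ratios

# Resample M out of M with replacement, with probabilities W
y <- sample(x.q, size = M, replace = TRUE, prob = W)

c(m = mean(y), s = sd(y), n_unique = length(unique(y)))
```

With replacement, draws with large importance ratios appear several times in the resample, so the number of distinct values is smaller than \(M\).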

Implement importance resampling (SIR)

#p <- function(x, a=.4, b=.08){exp(a*(x-a)^2 - b*x^4)}
x <- seq(-4, 4, 0.01)
plot(x,p(x),type="l")

qstar <- function(x){rep.int(0.125,length(x))} ## proposal distribution: uniform distribution.
N <- 10000
S <- 1000
x.qstar <- runif( N, -4, 4 )
ww <- p(x.qstar) / qstar(x.qstar)
qq <- ww / sum(ww)
x.acc <-sample(x.qstar, size = S, prob=qq, replace=F)
par(mfrow=c(1,2), mar=c(2,2,1,1))
plot(x,p(x),type="l")
barplot(table(round(x.acc,1))/length(x.acc))

Summary

Direct sampling. (target 1)
Rejection sampling. (target 1)
Importance sampling. (target 2)
Sampling-importance resampling. (target 1)

Limitations of Monte Carlo methods

Direct sampling
- Often hard to compute the normalization factor \(Z\).
- Hard to get rare events, especially in higher dimensional spaces.
Rejection sampling, importance sampling.
- Do not work well if the proposed distribution \(q^*(x)\) is very different from \(p(x)\).
- Constructing a \(q^*(x)\) similar to \(p(x)\) can be difficult.
  - Making a good proposal usually requires knowledge of the analytic form of \(p(x)\) - but if we had that, we wouldn’t even need to sample!
Solution: instead of a fixed proposed distribution \(q^*(x)\), we can use an adaptive proposal.

Motivation of Metropolis-Hastings

A drawback of rejection sampling and SIR is that it is difficult to propose an efficient distribution \(q^*(x)\).
For rejection sampling, it is also difficult to find the constant \(c\).
A smart idea is to let the proposal distribution depend on the last accepted value.
Instead of a fixed \(q(x')\), we use \(q(x'|x)\), where \(x'\) is the new state being sampled and \(x\) is the previous sample.
As \(x\) changes, \(q(x'|x)\) (as a function of \(x'\)) can also change.
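To make the idea of a state-dependent proposal concrete, the sketch below uses a Gaussian centered at the previous sample, \(q(x'|x) = N(x'; x, \sigma^2)\). This particular form, the value of `sigma`, and the names `q.d_cond` and `q.r_cond` are assumptions made here for illustration; the sketch only shows the proposal, not a full sampling algorithm.

```r
sigma <- 1   # hypothetical proposal standard deviation

# Density of the proposal q(x' | x): a Gaussian centered at the previous sample x
q.d_cond <- function(x_new, x_old) dnorm(x_new, mean = x_old, sd = sigma)
# Draw n new states from q(. | x)
q.r_cond <- function(n, x_old) rnorm(n, mean = x_old, sd = sigma)

set.seed(1)
x_old <- c(-2, 0, 2)            # three different previous samples
x_grid <- seq(-6, 6, 0.01)

# The proposal density, as a function of x', peaks at the previous sample
modes <- sapply(x_old, function(x) x_grid[which.max(q.d_cond(x_grid, x))])
# The average of many proposed states also follows the previous sample
centers <- sapply(x_old, function(x) mean(q.r_cond(2000, x)))

rbind(x_old = x_old, mode = modes, mean_of_proposals = centers)
```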

Monte Carlo vs Markov Chain Monte Carlo

Monte Carlo methods: simulation/sampling methods.
- Simulation
- Rejection sampling
- SIR
Markov Chain Monte Carlo: A type of Monte Carlo method – next step samples depend on previous samples.
- Metropolis-Hastings
- Gibbs Sampling