Zhiguang Huo (Caleb)
Monday Nov 26, 2017
Problem: We want to estimate \[\mathbb{E}(g (x) | p^*(x)) = \int g(x) p^*(x) dx,\] given the distribution \(p^*(x)\).
Examples: \(\mathbb{E} (x | p^*(x))\) or \(\mathbb{V}\mbox{ar} (x | p^*(x))\)
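If we can sample from \(p^*(x)\) directly, a plain Monte Carlo average already solves this problem. A minimal sketch, using \(N(0,1)\) as a stand-in target (an assumption for illustration):

set.seed(32611)
xs <- rnorm(1e5)   # draws from the stand-in target p*(x) = N(0, 1)
mean(xs)           # estimates E(x | p*(x)); truth is 0
var(xs)            # estimates Var(x | p*(x)); truth is 1

The harder (and typical) case is when \(p^*(x)\) can only be evaluated up to a normalizing constant, which motivates the methods below.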
p <- function(x, a = 0.4, b = 0.08){ exp(a*(x - a)^2 - b*x^4) }  # unnormalized density
# normalizing constant Z = integral of p(x)
integrate(f = p, lower = -4, upper = 4)
## 7.852178 with absolute error < 9.1e-06
x <- seq(-4, 4, 0.01)
plot(x,p(x),type="l", main = expression(p(x) == exp (0.4(x-0.4)^{2} - 0.08 * x^{4})))
x2 <- seq(-4, 4, 0.1)
plot(x,p(x),type="n", main = expression(p(x) == exp (0.4(x-0.4)^{2} - 0.08 * x^{4})))
segments(x2,0,x2,p(x2))
Suppose we can evaluate \(p(x)\), an unnormalized version of the target (e.g. \(p(x) = \exp [ 0.4(x-0.4)^2 - 0.08x^4 ]\)), and we can sample from a simpler distribution \(q^*(x)\), scaled so that \(c q^*(x) \ge p(x)\) for all \(x\).
Rejection sampling algorithm:

1. Sample \(x\) from \(q^*(x)\).
2. Sample \(u\) uniformly from \([0, c q^*(x)]\).
3. Accept \(x\) if \(u < p(x)\); otherwise reject it.
x <- seq(-4, 4, 0.01)
qstar <- function(x, C = 30){
C*dnorm(x,sd = 3)
}
plot(x,p(x),type="l", ylim = c(0,5))
curve(qstar,add = T)
text(0, 5, expression({q^"*"} (x) == N (x , 0, 3^2) ))
text(0, 4.5, expression({cq^"*"} (x) == 30* N (x , 0, 3^2) ))
text(1, 2, expression(p(x) == exp (0.4(x-0.4)^{2} - 0.08 * x^{4})))
x0 <- -2.5                         # one fixed proposal value x0
segments(x0,0,x0,qstar(x0),col=2)  # the slice [0, c q*(x0)] above x0
N <- 10
for(i in 1:N){
  set.seed(i)
  ay <- runif(1,0,qstar(x0))      # uniform draw on the slice
  acol = ifelse(ay < p(x0),2,4)   # red: accepted (below p(x0)); blue: rejected
  points(x0,ay,col=acol,pch=19)
}
Proof: \[\begin{align*} p^*(x) &= \frac{p(x)}{Z} \\ &= \frac{p(x)}{\int_x p(x) dx} \\ &= \frac{[p(x)/c q^*(x)]q^*(x)}{\int_x [p(x)/c q^*(x)]q^*(x)dx} \\ \end{align*}\]
Interpretation of the numerator: sample \(x\) from \(q^*(x)\), then accept it with probability \(p(x)/cq^*(x)\); the denominator is the overall acceptance probability, so the accepted draws are distributed as \(p^*(x)\).
## rejection sampling
#p <- function(x, a=.4, b=.08){exp(a*(x-a)^2 - b*x^4)}
x <- seq(-4, 4, 0.1)
qstar <- function(x){
dnorm(x,sd = 3)
}
# we can find the constant C in this case:
C <- round(max(p(x)/qstar(x))) + 1; C
## [1] 28
# number of samples
N <- 1000
# generate proposals and u
x.h <- rnorm( N, sd = 3)
u <- runif( N )
acc <- u < p(x.h) / (C * qstar(x.h))
x.acc <- x.h[ acc ]
# how many proposals are accepted
sum( acc ) /N
## [1] 0.285
c(m = mean(x.acc), s = sd(x.acc))  # summary of the accepted samples (call reconstructed)
##          m          s
## -0.6207873  1.4258200
par(mfrow=c(1,2), mar=c(2,2,1,1))
plot(x,p(x),type="l")
barplot(table(round(x.acc,1))/length(x.acc))
Discussion: What does the acceptance rate depend on?
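One answer: the acceptance probability is \(\int \frac{p(x)}{c q^*(x)} q^*(x) dx = Z/c\), so it depends on how tightly \(c q^*(x)\) hugs \(p(x)\). A quick numerical check against the run above:

# theoretical acceptance rate Z / C, with Z from the integrate() call above
integrate(f = p, lower = -4, upper = 4)$value / C  # about 0.28, vs empirical 0.285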
Importance sampling is not a method for generating samples from \(p(x)\) (target 1), it is just a method for estimating the expectation of a function \(g(x)\) (target 2).
\[\mathbb{E} (\phi (x) | p^* ) = \int \phi (x) p^*(x) dx\]
If we can sample \(x_m\) from \(p^*(x)\) directly, then we can use \(\frac{1}{M} \sum_{m=1}^M \phi(x_m)\) to estimate \(\mathbb{E} (\phi (x) | p^* )\). If we can only sample from \(q^*(x)\), we instead reweight the draws:
\[\hat{\mathbb{E}} (\phi (x) | p^* ) = \frac{\frac{1}{M} \sum_{m=1}^M[\phi (x_m) p(x_m)/q^*(x_m)] }{ \frac{1}{M} \sum_{m=1}^M[p(x_m)/q^*(x_m)]}\]
Writing \(w(x_m) =\frac{p(x_m)}{q^*(x_m)}\) for the importance weights, this becomes
\[\hat{\mathbb{E}} (\phi (x) | p^* ) = \frac{ \sum_{m=1}^M \phi(x_m) w(x_m)}{ \sum_{m=1}^M w(x_m)} \]
Importance ratio: \(\frac{w(x_m)}{ \sum_{m=1}^M w(x_m)}\)
When \(q^* = p^*\), all weights are equal, and the regular mean estimator is recovered as a special case of importance sampling.
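A quick sanity check of this special case (a sketch; \(N(0,1)\) plays the role of both \(p^*\) and \(q^*\), and \(x^2\) stands in for \(\phi\)):

set.seed(32611)
xs <- rnorm(1000)            # q* = p* = N(0, 1)
w <- dnorm(xs) / dnorm(xs)   # w(x_m) = p*(x_m)/q*(x_m) = 1 for every draw
all.equal(sum(xs^2 * w) / sum(w), mean(xs^2))  # TRUE: the plain sample mean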
par(mfrow=c(1,2), mar=c(2,2,2,1))
x <- seq(-4, 4, 0.01)
plot(x,p(x),type="l", main = expression(p(x) == exp (0.4(x-0.4)^{2} - 0.08 * x^{4})))
phi <- function(x){ (- 1/3*x^3 + 1/2*x^2 + 12*x - 12) / 30 + 1.3}
x <- seq(-4, 4, 0.01)
plot(x,phi(x),type="l",main= expression(phi(x)))
ep <- function(x) p(x)*phi(x)
truthE <- integrate(f = ep, lower = -4, upper = 4)$value/integrate(f = p, lower = -4, upper = 4)$value
truthE
## [1] 0.6971733
q.r <- rnorm
q.d <- dnorm
par(mfrow=c(1,2))
plot(x,q.d(x),type="l",main='sampler distribution Gaussian')
curve(p, from = -4,to = 4 ,col=2 , main = expression(p(x) == exp (0.4(x-0.4)^{2} - 0.08 * x^{4})))
# a single importance sampling estimate with the Gaussian sampler
# (reconstructed chunk: the sample size and seed behind this value are not shown)
x.m <- q.r(10000)
ww <- p(x.m) / q.d(x.m)
sum(phi(x.m) * ww) / sum(ww)
## [1] 0.7022795
M <- 10^seq(1,7,length.out = 30)
result.g <- numeric(length(M))
for(i in 1:length(M)){
  aM <- M[i]
  x.m <- q.r(aM)                  # proposals from the Gaussian sampler
  ww <- p(x.m) / q.d(x.m)         # importance weights w(x_m)
  qq.g <- ww / sum(ww)            # normalized importance ratios (sum to 1)
  x.g <- phi(x.m)
  result.g[i] <- sum(x.g * qq.g)  # self-normalized estimate
}
plot(log10(M),result.g,main='importance sampling result Gaussian')
abline(h = truthE, col = 2)
Remark: the importance weights can also be used to generate approximate samples from \(p^*(x)\): draw many proposals from \(q^*\), then resample them with probability proportional to their weights (sampling importance resampling):
#p <- function(x, a=.4, b=.08){exp(a*(x-a)^2 - b*x^4)}
x <- seq(-4, 4, 0.01)
plot(x,p(x),type="l")
qstar <- function(x){rep.int(0.125, length(x))}  # uniform density on [-4, 4]
N <- 10000   # number of proposals
S <- 1000    # number of resampled draws
x.qstar <- runif( N, -4, 4 )       # proposals from q*
ww <- p(x.qstar) / qstar(x.qstar)  # importance weights
qq <- ww / sum(ww)                 # normalized weights
x.acc <- sample(x.qstar, size = S, prob = qq, replace = FALSE)  # resample
par(mfrow=c(1,2), mar=c(2,2,1,1))
plot(x,p(x),type="l")
barplot(table(round(x.acc,1))/length(x.acc))
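A common diagnostic for such weighted samples (not from the original notes) is the effective sample size of the normalized weights, \(1/\sum_m \tilde{w}_m^2\):

# close to N means q* covers p well; close to 1 means severe weight degeneracy
1 / sum(qq^2)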
\(\frac{p(x')}{q(x'|x)}/\frac{p(x)}{q(x|x')}\) is a ratio of importance sampling weights. The Metropolis-Hastings algorithm proposes \(x' \sim q(x'|x)\) and accepts it with probability \(A(x'|x) = \min\left(1, \frac{p(x')q(x|x')}{p(x)q(x'|x)}\right)\), otherwise keeping the current \(x\):
N <- 10000
x.acc5 <- rep.int(NA, N)
u <- runif(N)
acc.count <- 0
std <- 1 ## Spread of proposal distribution
xc <- 0; ## Starting value
for (ii in 1:N){
xp <- rnorm(1, mean=xc, sd=std) ## proposal
alpha <- min(1, (p(xp)/p(xc)) *
(dnorm(xc, mean=xp,sd=std)/dnorm(xp, mean=xc,sd=std)))
x.acc5[ii] <- xc <- ifelse(u[ii] < alpha, xp, xc)
## find number of accepted proposals:
acc.count <- acc.count + (u[ii] < alpha)
}
## Fraction of accepted *new* proposals
acc.count/N
## [1] 0.7341
N <- 1000
x.acc5 <- rep.int(NA, N)
u <- runif(N)
acc.count <- 0
std <- 1 ## Spread of proposal distribution
xc <- 8; ## Starting value
for (ii in 1:N){
xp <- rnorm(1, mean=xc, sd=std) ## proposal
alpha <- min(1, (p(xp)/p(xc)) *
(dnorm(xc, mean=xp,sd=std)/dnorm(xp, mean=xc,sd=std)))
x.acc5[ii] <- xc <- ifelse(u[ii] < alpha, xp, xc)
## find number of accepted proposals:
acc.count <- acc.count + (u[ii] < alpha)
}
## Fraction of accepted *new* proposals
acc.count/N
## [1] 0.732
N <- 1000
x.acc5 <- rep.int(NA, N)
u <- runif(N)
acc.count <- 0
std <- 0.1 ## Spread of proposal distribution
xc <- 0; ## Starting value
for (ii in 1:N){
xp <- rnorm(1, mean=xc, sd=std) ## proposal
alpha <- min(1, (p(xp)/p(xc)) *
(dnorm(xc, mean=xp,sd=std)/dnorm(xp, mean=xc,sd=std)))
x.acc5[ii] <- xc <- ifelse(u[ii] < alpha, xp, xc)
## find number of accepted proposals:
acc.count <- acc.count + (u[ii] < alpha)
}
## Fraction of accepted *new* proposals
acc.count/N
## [1] 0.974
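The three runs illustrate the usual trade-off: a small proposal sd (std = 0.1) is almost always accepted but explores the target slowly, while a larger sd moves in bigger steps at the cost of more rejections. A trace plot of the last chain makes this visible:

# trace plot of the std = 0.1 chain: high acceptance, slow random-walk exploration
plot(x.acc5, type = "l", xlab = "iteration", ylab = "x")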
\[ A(x'|x) = \min\left(1, \frac{p(x')q(x|x')}{p(x)q(x'|x)}\right)\] Suppose (without loss of generality) that \(p(x')q(x|x') \le p(x)q(x'|x)\), so that \(A(x'|x) = \frac{p(x')q(x|x')}{p(x)q(x'|x)}\) while \(A(x|x') = 1\). Then \[ p(x)q(x'|x) A(x'|x) = p(x')q(x|x') \] \[ p(x)q(x'|x) A(x'|x) = p(x')q(x|x') A(x|x')\] Writing \(T(x'|x) = q(x'|x)A(x'|x)\) for the transition kernel (for \(x' \ne x\)), \[ p(x)T(x'|x) = p(x')T(x|x') \] The last line is called the detailed balance condition.
Integrating both sides over \(x\) and using \(\int_{x} T(x|x') dx = 1\): \[ \int_{x} p(x)T(x'|x) dx = \int_{x} p(x')T(x|x') dx = p(x')\] that is, \[ p(x') = \int_{x} p(x)T(x'|x) dx\] so \(p\) is a stationary distribution of the chain.
Since \(p(x)\) is stationary for the chain, the MH algorithm eventually converges to the true distribution.
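These identities are easy to verify numerically. A minimal sketch on a toy 3-state chain (the target and the uniform symmetric proposal are assumptions for illustration):

p.target <- c(0.2, 0.3, 0.5)   # toy target distribution
Q <- matrix(1/3, 3, 3)         # symmetric proposal q(x'|x)
A <- pmin(1, outer(p.target, p.target, function(pi., pj.) pj. / pi.))  # A[i,j] = min(1, p_j/p_i)
T.mh <- Q * A                                    # T(x'|x) = q(x'|x) A(x'|x) for x' != x
diag(T.mh) <- diag(T.mh) + (1 - rowSums(Q * A))  # rejected proposals stay put
all.equal(as.vector(p.target %*% T.mh), p.target)  # stationarity: pT = p
db <- p.target * T.mh          # db[i,j] = p(i) T(j|i)
all.equal(db, t(db))           # detailed balance: db is symmetric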
Similarly, a Gibbs sampler updates one variable at a time from its full conditional distribution; the table below tracks the three binary variables \(I\), \(G\), and \(S\) across iterations.
| Iteration | I | G | S |
|---|---|---|---|
| init | \(1\) | \(1\) | \(1\) |
| 1 | | | |
| 2 | | | |
| 3 | | | |
| … | … | … | … |
| K | | | |
I <- 1; G <- 1; S <- 1           # initial values
pG <- c(0.5, 0.8)                # P(G = 1 | I = 0), P(G = 1 | I = 1)
pS <- c(0.5, 0.7)                # P(S = 1 | I = 0), P(S = 1 | I = 1)
pI <- c(0.19, 0.36, 0.49, 0.69)  # P(I = 1 | G, S), indexed by 2G + S + 1
i <- 1
plot(1:3,ylim=c(0,10),type="n", xaxt="n", ylab="iteration")
axis(1, at=1:3,labels=c("I", "G", "S"), col.axis="red")
text(x = 1:3, y= i, label = c(I, G, S))
set.seed(32611)
while(i<10){
I <- rbinom(1,1,pI[2*G+S+1])
G <- rbinom(1,1,pG[I+1])
S <- rbinom(1,1,pS[I+1])
i <- i + 1
text(x = 1:3, y= i, label = c(I, G, S))
}
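Running the chain longer gives a Monte Carlo estimate of the marginal \(P(I=1)\) (a sketch; the chain length and burn-in are arbitrary choices):

set.seed(32611)
B <- 10000
Is <- numeric(B)
I <- G <- S <- 1             # restart from the same initial values
for (b in 1:B){
  I <- rbinom(1,1,pI[2*G+S+1])
  G <- rbinom(1,1,pG[I+1])
  S <- rbinom(1,1,pS[I+1])
  Is[b] <- I
}
mean(Is[-(1:1000)])          # estimate of P(I = 1) after burn-in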
The full conditional posteriors can be derived using Bayes' rule:
\(var_\alpha \doteq 1/(1/\tau_a^2 + n/\sigma^2)\), \(var_\beta \doteq 1/(1/\tau_b^2 + \sum_{i=1}^n x_i^2/\sigma^2)\)
\(\alpha | x_1^n, y_1^n, \beta, \sigma^2 \sim N(var_\alpha(\sum_{i=1}^n (y_i - \beta x_i)/\sigma^2 + \alpha_0 / \tau_a^2), var_\alpha)\)
\(\beta | x_1^n, y_1^n, \alpha, \sigma^2 \sim N(var_\beta(\sum_{i=1}^n \big( (y_i - \alpha) x_i \big) /\sigma^2 + \beta_0 / \tau_b^2), var_\beta)\)
\(\sigma^2 | x_1^n, y_1^n, \alpha, \beta \sim IG(\nu + n/2, \mu + \sum_{i=1}^n (y_i - \alpha - \beta x_i)^2/2)\)
set.seed(32611)
n = 100; alpha = 0; beta=2; sig2=0.5;true=c(alpha,beta,sig2)
x=rnorm(n)
y=rnorm(n,alpha+beta*x,sqrt(sig2))
# Prior hyperparameters
alpha0=0;tau2a=10;beta0=0;tau2b=10;nu0=3;mu0=3
# Setting up starting values
alpha=0;beta=0;sig2=1
# Gibbs sampler
M = 1000
draws = matrix(0,M,3)
draws[1,] <- c(alpha,beta,sig2)
for(i in 2:M){
  # sample alpha | beta, sig2, data
  var_alpha = 1/(1/tau2a + n/sig2)
  mean = var_alpha*(sum(y-beta*x)/sig2 + alpha0/tau2a)
  alpha = rnorm(1,mean,sqrt(var_alpha))
  # sample beta | alpha, sig2, data
  var_beta = 1/(1/tau2b + sum(x^2)/sig2)
  mean = var_beta*(sum((y-alpha)*x)/sig2+beta0/tau2b)
  beta = rnorm(1,mean,sqrt(var_beta))
  # sample sig2 | alpha, beta, data (inverse-gamma via 1/gamma)
  sig2 = 1/rgamma(1,(nu0+n/2),(mu0+sum((y-alpha-beta*x)^2)/2))
  draws[i,] = c(alpha,beta,sig2)
}
# Markov chain + marginal posterior
names = c('alpha','beta','sig2')
colnames(draws) <- names
ind = 101:M
par(mfrow=c(3,3))
for(i in 1:3){
ts.plot(draws[,i],xlab='iterations',ylab="",main=names[i])
abline(v=ind[1],col=4)
abline(h=true[i],col=2,lwd=2)
acf(draws[ind,i],main="")
hist(draws[ind,i],prob=T,main="",xlab="")
abline(v=true[i],col=2,lwd=2)
}
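Point estimates and interval summaries can then be read off the post-burn-in draws, for example:

# posterior means and 95% credible intervals (rows: alpha, beta, sig2)
t(apply(draws[ind,], 2, function(d) c(mean = mean(d), quantile(d, c(0.025, 0.975)))))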
\[R_x(k) = \frac {\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})} {\sum_{t=1}^{n} (x_t - \bar{x})^2} \]
High autocorrelation leads to a smaller effective sample size: for \(M\) post-burn-in draws, \(\text{ESS} \approx M / (1 + 2\sum_{k=1}^{\infty} R_x(k))\).
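For example, the effective sample size of the \(\beta\) draws can be estimated from the empirical autocorrelations (a sketch using the truncated-sum formula above):

# keep only the positive autocorrelations, a simple truncation rule
rho <- acf(draws[ind, "beta"], plot = FALSE)$acf[-1]
length(ind) / (1 + 2 * sum(rho[rho > 0]))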