This paper by Andrés Christen and Colin Fox is a reexamination of an earlier paper by Fox and Nicholls (1997), where the idea of testing for acceptance before computing the exact likelihood was first suggested. When the acceptance rate is low and the approximation good enough, the overall cost can obviously be reduced significantly. The issue is then to recover the exact target. In practice, the original proposal q is used to generate a value θ’ that is first tested against the approximate target f⁰. If accepted, θ’ is then tested against the true target f, using a pseudo-proposal q⁰ that simply translates the earlier step. A sort of delayed acceptance that only reduces the computing cost efficiently when the approximation is good enough, since, overall, the probability of accepting a given θ’ is smaller with this scheme. (Or, as stated in the paper, the original kernel dominates the new one in Peskun’s sense.) The most interesting question about this paper is how to compute the approximation: subsampling the original data, and hence using only a small part of the likelihood function, reduces the computing cost but induces variability.
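In algorithmic terms, one iteration of the scheme can be sketched as follows (a minimal sketch only, with hypothetical functions fapprox and ftrue standing for unnormalised versions of f⁰ and f, and a symmetric random walk proposal so that the q ratios cancel):

#one delayed-acceptance step (sketch): fapprox and ftrue are hypothetical
#unnormalised versions of f0 and f; symmetric random walk proposal
da.step=function(theta,ftrue,fapprox,scale=.1){
  prop=theta+runif(1,-scale,scale)
  if (runif(1)<fapprox(prop)/fapprox(theta)){      #stage 1: cheap screen by f0
    rho=ftrue(prop)*fapprox(theta)/(ftrue(theta)*fapprox(prop))
    if (runif(1)<rho) return(prop)                 #stage 2: exact correction by f
  }
  return(theta)                                    #rejection at either stage
}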
#prior p~Be(7.5,.5)
#data x~B(N,p)
#posterior p|x~Be(x+7.5,N+.5-x)
T=10^4;N=100;y=32;a=7.5;b=.5
p=rep(.2,T)
for (t in 2:T){
  pp=p[t-1]+runif(1,-.1,.1) #random walk
  p[t]=p[t-1]               #by default rejection
  if ((pp>0)&(pp<1)){
    ratio=dbinom(y,N,pp)/dbinom(y,N,p[t-1])
    if ((runif(1)<ratio)&(runif(1)<dbeta(pp,a,b)/dbeta(p[t-1],a,b)))
      p[t]=pp
  }}
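For what it is worth, since the Beta prior is conjugate here, the fit of the chain can be checked directly against the closed-form Be(y+a,N−y+b) posterior (a quick sketch, with an arbitrary burn-in of 10³ iterations):

#compare the chain with the exact Be(y+a,N-y+b) posterior (burn-in choice is arbitrary)
hist(p[-(1:10^3)],prob=TRUE,nclass=50,main="",xlab="p")
curve(dbeta(x,y+a,N-y+b),add=TRUE,lwd=2)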
One application of the method I checked during my stay in Leuven was to separate the likelihood from the prior and to test only one of the two terms against a uniform at a time. For instance, with a costly prior, the approximate target f⁰ could be the likelihood alone, in which case the first acceptance probability would be based on the ratio of the likelihoods and the second on the ratio of the priors. As shown by the above graph, on a standard normal-normal example, the true posterior and the density estimate of the simulated sample almost coincide (for 10⁵ MCMC iterations)…
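A minimal sketch of such a normal-normal implementation, under assumed values (a single observation x=1.2 from N(θ,1) and a N(0,10) prior, so the exact posterior is N(10x/11,10/11)), with the likelihood ratio tested first and the prior ratio second:

#normal-normal sketch (assumed values): x~N(theta,1), theta~N(0,10)
T=10^5; x=1.2
th=rep(0,T)
for (t in 2:T){
  prop=th[t-1]+runif(1,-.5,.5)                  #random walk
  th[t]=th[t-1]                                 #by default rejection
  if (runif(1)<dnorm(x,prop)/dnorm(x,th[t-1]))  #first: likelihood ratio
    if (runif(1)<dnorm(prop,0,sqrt(10))/dnorm(th[t-1],0,sqrt(10))) #second: prior ratio
      th[t]=prop
}
#exact posterior for reference: dnorm(.,10*x/11,sqrt(10/11))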
Actually, the principle also applies to a product of individual terms, as in a likelihood, so each term can be evaluated separately! This increases the variability and the potential for rejection, but if each likelihood term is costly to compute, this could bring some improvement. The graphs and the R code below illustrate an implementation in the Beta-binomial case, when the binomial observation is replaced with a sequence of Bernoulli observations. The fit is adequate over 10⁵ iterations, but the autocorrelation in the sequence is high (the ACF is for the 100-times thinned sequence) and the acceptance rate drops to 5%.
#even better, use the likelihood decomposition
T=10^5
yful=rep(0,N)
yful[sample(1:N,y)]=1 #pseudo Bernoulli sequence
p=rep(.2,T)
for (t in 2:T){
  pp=p[t-1]+runif(1,-.1,.1) #random walk
  p[t]=p[t-1]               #by default rejection
  if ((pp>0)&(pp<1)){
    n=1
    ratio=dbinom(yful[n],1,pp)/dbinom(yful[n],1,p[t-1])
    while ((n<N)&(runif(1)<ratio)){
      n=n+1
      ratio=dbinom(yful[n],1,pp)/dbinom(yful[n],1,p[t-1])}
    if ((n==N)&(runif(1)<ratio)){
      if (runif(1)<dbeta(pp,a,b)/dbeta(p[t-1],a,b))
        p[t]=pp
    }
  }}
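And a quick way to reproduce the diagnostics mentioned above (a sketch; the 5% figure refers to the run reported in the post):

#empirical acceptance rate (around 5% in the run above)
mean(diff(p)!=0)
#ACF of the 100 times thinned sequence
acf(p[seq(1,T,by=100)])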