## transformation MCMC

Posted in Books, pictures, Statistics, Travel, University life on January 3, 2022 by xi'an

For reasons too long to describe here, I recently came across a 2013 paper by Dutta and Bhattacharya (from ISI Kolkata) entitled MCMC based on deterministic transforms, which sounded a bit dubious until I realised the deterministic label applies to the choice of the transformation and not to the Metropolis-Hastings proposal… The core of the proposed method is to make a proposal that simultaneously considers a move and its inverse, namely from x to either x’=T(x,ε) or x”=T⁻¹(x,ε), where ε is an independent random noise, possibly degenerated to a manifold of lesser dimension. Due to the symmetry, the acceptance probability is then a ratio of target densities, multiplied by the x-Jacobian of T (as in reversible jump). I tried the method on a mixture of Gamma distributions target (in red) with an Exponential scale change and the resulting sample indeed fitted said target.
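A minimal R sketch of the move described above, for illustration only. The two-component Gamma mixture, the choice T(x,ε)=xε with ε~Exp(1), and the names `target` and `tmcmc` are all assumptions made here, not the settings of the paper or of the original experiment. The forward move multiplies by ε (Jacobian ε), the backward move divides by ε (Jacobian 1/ε), each being picked with probability ½.

```r
set.seed(1)

# Assumed target: equal-weight mixture of Gamma(2, 1) and Gamma(20, 2) densities
target <- function(x) 0.5 * dgamma(x, shape = 2, rate = 1) +
  0.5 * dgamma(x, shape = 20, rate = 2)

# Transformation-based sampler with assumed transform T(x, eps) = x * eps
tmcmc <- function(niter, x0 = 1) {
  x <- numeric(niter)
  x[1] <- x0
  for (t in 2:niter) {
    eps <- rexp(1)                       # noise driving the transform
    if (runif(1) < 0.5) {                # forward move x' = T(x, eps)
      prop <- x[t - 1] * eps
      ratio <- target(prop) / target(x[t - 1]) * eps  # Jacobian dT/dx = eps
    } else {                             # backward move x'' = T^{-1}(x, eps)
      prop <- x[t - 1] / eps
      ratio <- target(prop) / target(x[t - 1]) / eps  # Jacobian of the inverse
    }
    x[t] <- if (runif(1) < ratio) prop else x[t - 1]
  }
  x
}

smpl <- tmcmc(5e4)
# compare the sample mean with the mixture mean 0.5 * 2 + 0.5 * 10 = 6
mean(smpl)
```

A histogram of `smpl` overlaid with `curve(target, add = TRUE)` gives a visual check of the fit.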

The authors even make an argument in favour of a unidimensional noise, although this amounts to running an implicit Gibbs sampler. Argument based on a reduced simulation cost for ε, even though the full-dimensional transform x’=T(x,ε) still needs to be computed. And as noted in the paper this also requires checking for irreducibility. The claim for higher efficiency found therein is thus mostly unsubstantiated…

“The detailed balance requirement also demands that, given x, the regions covered by the forward and the backward transformations are disjoint.”

The above statement is also surprising in that the generic detailed balance condition does not impose such a restriction.

## distributed evidence

Posted in Books, pictures, Statistics, University life on December 16, 2021 by xi'an

Alexander Buchholz (who did his PhD at CREST with Nicolas Chopin), Daniel Ahfock, and my friend Sylvia Richardson published a great paper on the distributed computation of Bayesian evidence in Bayesian Analysis. The setting is one of distributed data from several sources with no communication between them, which relates to consensus Monte Carlo even though model choice has not been particularly studied from that perspective. The authors operate under the assumption of conditionally conjugate models, i.e., the existence of a data augmentation scheme into an exponential family so that conjugate priors can be used. For a division of the data into S blocks, the fundamental identity in the paper is

$p(y) = \alpha^S \prod_{s=1}^S \tilde p(y_s) \int \prod_{s=1}^S \tilde p(\theta|y_s)\,\text d\theta$

where α is the normalising constant of the sub-prior exp{log[p(θ)]/S} and the other terms are associated with this prior. Under the conditionally conjugate assumption, the integral can be approximated based on the latent variables. Most interestingly, the associated variance is directly connected with the variance of

$p(z_{1:S}|y)\Big/\prod_{s=1}^S \tilde p(z_s|y_s)$

under the joint:

“The variance of the ratio measures the quality of the product of the conditional sub-posterior as an importance sample proposal distribution.”

Assuming this variance is finite (which is likely). An approximate alternative is proposed, namely to replace the exact sub-posterior with a Normal distribution, as in consensus Monte Carlo, which should obviously require some consideration as to which parameterisation of the model produces the “most normal” (or the least abnormal!) posterior. And ensures a finite variance in the importance sampling approximation (as ensured by the strong bounds in Proposition 5). A problem shared by the bridgesampling package.

“…if the error that comes from MCMC sampling is relatively small and that the shard sizes are large enough so that the quality of the subposterior normal approximation is reasonable, our suggested approach will result in good approximations of the full data set marginal likelihood.”

The resulting approximation can also be handy in conjunction with reversible jump MCMC, in the sense that RJMCMC algorithms can be run in parallel on different chunks or shards of the entire dataset. Although the computing gain may be reduced by the need for separate approximations.
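As a sanity check on the fundamental identity of this post, here is a small R sketch on a toy conjugate model where every integral is replaced by a Riemann sum over a grid. The Normal data, the N(0,2²) prior, the number of shards, and all variable names are assumptions chosen for illustration; nothing here reproduces the estimator of the paper, which relies on latent variables rather than on a grid.

```r
set.seed(42)
S <- 3                                               # number of shards
y <- split(rnorm(12, mean = 1), rep(1:S, each = 4))  # assumed toy data, 4 points per shard
h <- 0.01
th <- seq(-20, 20, by = h)                           # grid over the parameter theta
int <- function(f) sum(f) * h                        # Riemann sum on the grid

prior <- dnorm(th, 0, 2)                             # assumed N(0, 2^2) prior
# likelihood of each shard on the grid (unit-variance Normal), one column per shard
lik <- sapply(y, function(ys)
  exp(colSums(dnorm(outer(ys, th, "-"), log = TRUE))))

full <- int(prior * apply(lik, 1, prod))             # direct evidence p(y)

alpha <- int(prior^(1 / S))                          # normalising constant of the sub-prior
subprior <- prior^(1 / S) / alpha
joint <- subprior * lik                              # sub-prior times shard likelihood
subev <- colSums(joint) * h                          # sub-evidences
subpost <- sweep(joint, 2, subev, "/")               # sub-posteriors on the grid
overlap <- int(apply(subpost, 1, prod))              # integral of their product

distributed <- alpha^S * prod(subev) * overlap
c(direct = full, distributed = distributed)
```

Both numbers agree up to rounding, since the identity only rewrites the prior as α^S times the product of the S sub-priors.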

## averaged acceptance ratios

Posted in Statistics on January 15, 2021 by xi'an

In another recent arXival, Christophe Andrieu, Sinan Yıldırım, Arnaud Doucet, and Nicolas Chopin study the impact of averaging estimators of acceptance ratios in Metropolis-Hastings algorithms. (It is connected with the earlier arXival rephrasing Metropolis-Hastings in terms of involutions discussed here.)

“… it is possible to improve performance of this algorithm by using a modification where the acceptance ratio r(ξ) is integrated with respect to a subset of the proposed variables.”

This interpretation of the current proposal makes it a form of Rao-Blackwellisation, explicitly mentioned on p.18, where, using a mixture proposal, with an adapted acceptance probability, it depends on the integrated acceptance ratio only. Somewhat magically using this ratio and its inverse with probability ½. And it increases the average Metropolis-Hastings acceptance probability (albeit with a larger number of simulations). Since the ideal averaging is rarely available, the authors implement a Monte Carlo averaging version. With applications to the exchange algorithm and to reversible jump MCMC. The major application is to pseudo-marginal settings with a high complexity (in the number T of terms) and where the authors’ approach does scale efficiently with T. There is even an ABC side to the story as one illustration is made of the ABC approximation to the posterior of an α-stable sample. As an encompassing proposal for handling Metropolis-Hastings environments with latent variables and several versions of the acceptance ratios, this is quite an interesting paper that I think we will study in further detail with our students.

## deterministic moves in Metropolis-Hastings

Posted in Books, Kids, R, Statistics on July 10, 2020 by xi'an

A curio on X validated where a hybrid Metropolis-Hastings scheme involves a deterministic transform, once in a while. The idea is to flip the sample from one mode, ν, towards the other mode, μ, with a symmetry of the kind

μ-α(x+μ) and ν-α(x+ν)

with α a positive coefficient. Or the reciprocal,

-μ+(μ-x)/α and -ν+(ν-x)/α

for… reversibility reasons. In that case, the acceptance probability simply involves the Jacobian of the transform to the proposal (multiplying the ratio of target densities), just as in reversible jump MCMC.

Why the (annoying) Jacobian? As explained in the above slides (and other references), the Jacobian is there to account for the change of measure induced by the transform.
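To spell out this change of measure in dimension one, here is a short sketch, assuming a differentiable and invertible transform T, a target density π, an acceptance probability written a(x,y), and a scheme that picks T and T⁻¹ with equal probability (these notations are introduced here for illustration). Detailed balance between x and y=T(x) reads

$\pi(x)\,a(x,y)\,\text dx = \pi(y)\,a(y,x)\,\text dy, \qquad \text dy = |T'(x)|\,\text dx$

and it is satisfied by the choice

$a(x,y) = \min\left\{1, \frac{\pi(y)}{\pi(x)}\,|T'(x)|\right\}, \qquad a(y,x) = \min\left\{1, \frac{\pi(x)}{\pi(y)}\,\frac{1}{|T'(x)|}\right\}$

since both sides then equal min{π(x), π(y)|T'(x)|} dx. For instance, a flip of the form x’=c-x/2 has |T'(x)|=1/2, while its inverse x’=2c-2x has Jacobian 2.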

Returning to the curio, the originator of the question had spotted some discrepancy between the target and the MCMC sample, as the moments did not fit well enough. For a similar toy model, a balanced Normal mixture, and an artificial flip consisting of

x’=±1-x/2 or x’=±2-2x

implemented by

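    # one step of the hybrid sampler, inside a loop over t;
    # gnorm() is the target density, defined elsewhere
    u=runif(5)
    if(u[1]<.5){
      # random walk Metropolis move with a Uniform(-1,1) increment
      mhp=mh[t-1]+2*u[2]-1
      mh[t]=ifelse(u[3]<gnorm(mhp)/gnorm(mh[t-1]),mhp,mh[t-1])
    }else{
      # deterministic flip: dx=1 for x'=±1-x/2, dx=2 for x'=±2-2x
      dx=1+(u[4]<.5)
      mhp=ifelse(dx==1,
        ifelse(mh[t-1]<0,1,-1)-mh[t-1]/2,
        2*ifelse(mh[t-1]<0,-1,1)-2*mh[t-1])
      # target ratio times the Jacobian (1/2 when dx=1, 2 when dx=2)
      mh[t]=ifelse(u[5]<dx*gnorm(mhp)/gnorm(mh[t-1])/(3-dx),mhp,mh[t-1])
    }
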
I could not spot said discrepancy beyond Monte Carlo variability.

## non-reversible jump MCMC

Posted in Books, pictures, Statistics on June 29, 2020 by xi'an

Philippe Gagnon and Arnaud Doucet have recently arXived a paper on a non-reversible version of reversible jump MCMC, the methodology introduced by Peter Green in 1995 to tackle Bayesian model choice/comparison/exploration. Which Philippe presented at BayesComp20.

“The objective of this paper is to propose sampling schemes which do not suffer from such a diffusive behaviour by exploiting the lifting idea (…)”

The idea is related to lifting, creating non-reversible behaviour by adding a direction index (a spin) to the exploration of the models, assumed to be totally ordered, as with nested models (mixtures, changepoints, &tc.).  As with earlier versions of lifting, the chain proceeds along one (spin) direction until the proposal is rejected in which case the spin spins. The acceptance probability in the event of a change of model (upwards or downwards) is essentially the same as the reversible one (meaning it includes the dreaded Jacobian!). The original difficulty with reversible jump remains active with non-reversible jump in that the move from one model to the next must produce plausible values. The paper recalls two methods proposed by Christophe Andrieu and his co-authors. One consists in buffering a tempering sequence, but this proves costly.  Pursuing the interesting underlying theme that both reversible and non-reversible versions are noisy approximations of the marginal ratio, the other one consists in marginalising out the parameter to approximate the marginal probability of moving between nearby models. Combined with multiple choice to preserve stationarity and select more likely moves at the same time. Still requiring a multiplication of the number of simulations but parallelisable. The paper contains an exact comparison result that non-reversible jump leads to a smaller asymptotic variance than reversible jump, but it is unclear to me whether or not this accounts for the extra computing time resulting from the multiple paths in the proposed algorithms. (Even though the numerical illustration shows an improvement brought by the non-reversible side for the same computational budget.)
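To illustrate the lifting mechanism alone, here is a toy R sketch over a totally ordered collection of models, in an idealised setting where the marginal model probabilities are known, so that no parameter needs to be proposed and no Jacobian appears. The weights `w` and the function `lifted` are assumptions made for this example and do not reproduce the algorithms of the paper. The chain keeps moving in the direction given by the spin `nu` until a proposal is rejected, at which point the spin is flipped.

```r
set.seed(3)

# Assumed marginal probabilities of K = 7 ordered models
w <- c(1, 2, 4, 8, 4, 2, 1)
w <- w / sum(w)

lifted <- function(niter, w) {
  K <- length(w)
  k <- integer(niter)
  k[1] <- 1L                 # start from the smallest model
  nu <- 1L                   # spin: +1 moves upwards, -1 downwards
  for (t in 2:niter) {
    prop <- k[t - 1] + nu    # deterministic proposal along the current direction
    # models outside 1..K have zero probability, hence are always rejected
    ratio <- if (prop < 1 || prop > K) 0 else w[prop] / w[k[t - 1]]
    if (runif(1) < ratio) {
      k[t] <- prop           # accept and keep the direction
    } else {
      k[t] <- k[t - 1]       # reject and flip the spin
      nu <- -nu
    }
  }
  k
}

chain <- lifted(1e5, w)
freq <- tabulate(chain, nbins = length(w)) / length(chain)
round(rbind(target = w, lifted = freq), 3)
```

The empirical frequencies can then be compared with `w`, as the pair (model, spin) admits the product of `w` and the uniform distribution on the two spins as stationary distribution.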