efficient MCMC sampling

Maxime Vono, Daniel Paulin and Arnaud Doucet recently arXived a paper about a regularisation technique that allows for efficient sampling from a complex posterior whose potential function factorises as a large sum of transforms of linear projections of the parameter θ

U(\theta)=\sum_i U_i(A_i\theta)

The central idea in the paper [which was new to me] is to introduce an auxiliary variate for each term in the sum, replacing the projection inside the transform, with an additional regularisation term forcing these auxiliary variates to be as close as possible to the corresponding projections

U(\theta,\mathbf z)=\sum_i \left\{U_i(z_i)+\varrho^{-1}\|z_i-A_i\theta\|^2\right\}

This is only an approximation to the true target, but it opens the possibility of running a massive Gibbs sampler in a much reduced dimension. As the variance ρ of the regularisation term goes to zero, the marginal posterior on the parameter θ converges to the true posterior. The authors derive precise convergence rates in both total variation and Wasserstein distance.
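For concreteness, here is a minimal sketch (not from the paper) of the augmented potential in Python, with the potentials U_i, the projection matrices A_i and the regularisation parameter rho all supplied by the user; the names are hypothetical:

```python
import numpy as np

def augmented_potential(theta, z_list, U_list, A_list, rho):
    """U(theta, z) = sum_i [ U_i(z_i) + (1/rho) * ||z_i - A_i theta||^2 ]."""
    total = 0.0
    for z_i, U_i, A_i in zip(z_list, U_list, A_list):
        resid = z_i - A_i @ theta
        total += U_i(z_i) + resid @ resid / rho
    return total
```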

From a practical point of view, judging only from the logistic example, it is hard to fathom how much this approach improves upon other approaches (provided they still apply), as the impact of the value of ρ should be assessed on top of the convergence of the high-dimensional Gibbs sampler. Or is there an annealing version in the pipeline? While parallelisation is a major argument, it also seems that the Gibbs sampler needs central monitoring for each new simulation of θ, unless some asynchronous version can be implemented.
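To make that central θ-update explicit, here is a hedged sketch of one sweep of such a split Gibbs sampler, assuming the quadratic coupling above is the only part of the potential involving θ (so that θ given z is Gaussian) and using a generic random-walk Metropolis stand-in for the z_i updates; all function names are hypothetical, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_theta(z_list, A_list, rho):
    # theta | z is Gaussian: precision (2/rho) * sum_i A_i^T A_i,
    # mean solving (sum_i A_i^T A_i) theta = sum_i A_i^T z_i.
    P = sum(A.T @ A for A in A_list)
    b = sum(A.T @ z for A, z in zip(A_list, z_list))
    mean = np.linalg.solve(P, b)
    cov = (rho / 2.0) * np.linalg.inv(P)
    return rng.multivariate_normal(mean, cov)

def sample_z_i(z, U_i, A_i, theta, rho, step=0.1, n_steps=20):
    # Stand-in update for z_i | theta, targeting the potential
    # U_i(z) + (1/rho) * ||z - A_i theta||^2 by random-walk Metropolis.
    def pot(z):
        r = z - A_i @ theta
        return U_i(z) + r @ r / rho
    for _ in range(n_steps):
        prop = z + step * rng.standard_normal(z.shape)
        if np.log(rng.uniform()) < pot(z) - pot(prop):
            z = prop
    return z

def split_gibbs_sweep(theta, z_list, U_list, A_list, rho):
    # The z-blocks can be updated in parallel; the theta-update is the
    # central step that pools information from all blocks.
    z_list = [sample_z_i(z, U, A, theta, rho)
              for z, U, A in zip(z_list, U_list, A_list)]
    theta = sample_theta(z_list, A_list, rho)
    return theta, z_list
```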

One Response to “efficient MCMC sampling”

  1. Thanks for commenting on our paper. The main goal of this work is to provide a theoretical explanation for the good empirical performance of this type of algorithm (first introduced independently by Rendell, Johansen, Lee & Whiteley and by Maxime, Dobigeon and Chainais), although we also propose therein a useful methodological extension.

    As you explained in your post, the state space for \theta is extended by auxiliary variables \mathbf z=(z_1,\ldots,z_b) and the joint log-likelihood is defined as U(\theta,\mathbf z)=\sum_i \left\{U_i(z_i)+\varrho^{-1}\|z_i-A_i\theta\|^2\right\}, where \varrho is a regularisation parameter. When \varrho\to 0, the marginal of \theta tends to the original target. Split Gibbs Sampling proceeds by updating z_1,\ldots,z_b conditioned on \theta, and then \theta conditioned on \mathbf z.

    The mixing and approximation error results in this paper show how the parameter \varrho governs the trade-off between faster mixing (when \varrho is large) and lower bias for the Gibbs sampler (when \varrho is small). An interesting feature of these bounds is that they only depend on the strong convexity constants of the potentials U_1, \ldots, U_b and require neither differentiability nor smoothness, hence they are applicable to problems with non-differentiable priors (e.g. our image inpainting example).

    We would like to emphasize that the running times reported in the paper are for an implementation of the algorithm on a serial computer. Even implemented on a serial computer, the algorithm provides state-of-the-art results for both high-dimensional logistic regression (d=784) and image inpainting. This is in agreement with earlier empirical results reported for such algorithms.

    Maxime, Daniel and Arnaud
