Archive for variable dimension models

a day for comments

Posted in Mountains, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , , , , , on April 21, 2014 by xi'an

As I was flying over Skye (with [maybe] a first if hazy perspective on the Cuillin ridge!) to Iceland, three long sets of replies to some of my posts appeared on the ‘Og:

Thanks to them for taking the time to answer my musings…


shrinkage-thresholding MALA for Bayesian variable selection

Posted in Statistics, University life with tags , , , , , on March 10, 2014 by xi'an

IMG_2515Amandine Shreck along with her co-authors Gersende Fort, Sylvain LeCorff, and Eric Moulines, all from Telecom Paristech, has undertaken to revisit the problem of large p small n variable selection. The approach they advocate mixes Langevin algorithms with trans-model moves with shrinkage thresholding. The corresponding Markov sampler is shown to be geometrically ergodic, which may be a première in that area. The paper was arXived in December but I only read it on my flight to Calgary, not overly distracted by the frozen plains of Manitoba and Saskatchewan. Nor by my neighbour watching Hunger Games II.)

A shrinkage-thresholding operator is defined as acting on the regressor matrix towards producing sparse versions of this matrix. (I actually had trouble picturing the model until Section 2.2 where the authors define the multivariate regression model, making the regressors a matrix indeed. With a rather unrealistic iid Gaussian noise. And with an unknown number of relevant rows, hence a varying dimension model. Note that this is a strange regression in that the regression coefficients are known and constant across all models.) Because the Langevin algorithm requires a gradient to operate, the log target is divided between a differentiable and a non-differentiable parts, the later accommodating the Dirac masses in the dominating measure. The new MALA moves involve applying the above shrinkage-thresholding operator to a regular Langevin proposal, hence moving to sub-spaces and sparser representations.

The thresholding functions are based on positive part operators, which means that the Markov chain does not visit some neighbourhoods of zero in the embedding and in the sparser spaces. In other words, the proposal operates between models of varying dimensions without further ado because the point null hypotheses are replaced with those neighbourhoods. Hence it is not exactly simulating from the “original” posterior, which may be a minor caveat or not. Not if defining the neighbourhoods is driven by an informed or at least spelled-out choice of a neighbourhood of zero where the coefficients are essentially identified with zero. The difficulty is then in defining how close is close enough. Especially since the thresholding functions seem to all depend on a single number which does not depend on the regressor matrix. It would be interesting to see if the g-prior version could be developed as well… Actually, I would have also included a dose of g-prior in the Langevin move, rather than using an homogeneous normal noise.

The paper contains a large experimental part where the performances of the method are evaluated on various simulated datasets. It includes a comparison with reversible jump MCMC, which slightly puzzles me: (a) I cannot see from the paper whether or not the RJMCMC is applied to the modified (thresholded) posterior, as a regular RJMCMC would not aim at the same target, but the appendix does not indicate a change of target; (b) the mean error criterion for which STMALA does better than RJMCMC is not defined, but the decrease of this criterion along iterations seems to indicate that convergence has not yet occured, since it does not completely level up after 3 10⁵ iterations.

I must have mentioned it in another earlier post, but I find somewhat ironical to see those thresholding functions making a comeback after seeing the James-Stein and smooth shrinkage estimators taking over the then so-called pre-test versions in the 1970’s (Judge and Bock, 1978) and 1980’s. There are obvious reasons for this return, moving away from quadratic loss being one.

new MCMC algorithm for Bayesian variable selection

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , on February 25, 2014 by xi'an

Flight from Bristol to Amsterdam, April 03, 2011Unfortunately, I will miss the incoming Bayes in Paris seminar next Thursday (27th February), as I will be flying to Montréal and then Québec at the time (despite having omitted to book a flight till now!). Indeed Amandine Shreck will give a talk at 2pm in room 18 of ENSAE, Malakoff, on A shrinkage-thresholding Metropolis adjusted Langevin algorithm for Bayesian variable selection, a work written jointly with Gersende Fort, Sylvain Le Corff, and Eric Moulines, and arXived at the end of 2013 (which may explain why I missed it!). Here is the abstract:

This paper introduces a new Markov Chain Monte Carlo method to perform Bayesian variable selection in high dimensional settings. The algorithm is a Hastings-Metropolis sampler with a proposal mechanism which combines (i) a Metropolis adjusted Langevin step to propose local moves associated with the differentiable part of the target density with (ii) a shrinkage-thresholding step based on the non-differentiable part of the target density which provides sparse solutions such that small components are shrunk toward zero. This allows to sample from distributions on spaces with different dimensions by actually setting some components to zero. The performances of this new procedure are illustrated with both simulated and real data sets. The geometric ergodicity of this new transdimensional Markov Chain Monte Carlo sampler is also established.

(I will definitely get a look at the paper over the coming days!)