As CHANCE book editor, I received the other day from Oxford University Press the proceedings of an École de Physique des Houches on Statistical Physics, Optimisation, Inference, and Message-Passing Algorithms that took place there from September 30 to October 11, 2013. As it is mostly unrelated to Statistics, and since Igor Caron already reviewed the book more than a year ago, I skimmed through the few chapters connected to my interests, from Devavrat Shah’s chapter on graphical models and belief propagation, to Andrea Montanari’s denoising and sparse regression, including LASSO, and only read in some detail Manfred Opper’s expectation propagation chapter. This chapter made me realise (or re-realise as I had presumably forgotten an earlier explanation!) that expectation propagation can be seen as a sort of variational approximation that produces, by a sequence of iterations, the distribution within a certain parametric (exponential) family that is the closest to the distribution of interest. By writing the Kullback-Leibler divergence the opposite way from the usual variational approximation, the solution equates the expectation of the natural sufficient statistic under both models… Another interesting aspect of this chapter is the connection with estimating normalising constants. (I noticed a slight typo on p.269 in the final form of the Kullback approximation q() to p().)
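To make the moment-matching remark concrete, here is a minimal sketch that is not taken from Opper's chapter. It assumes a made-up target p (a two-component Gaussian mixture with illustrative weights, means and standard deviations) and a Gaussian approximating family q. Minimising KL(p‖q) over Gaussians then amounts to matching the expectations of the sufficient statistics (x, x²), that is, the mean and variance of p. The grid-based check only compares the matched Gaussian with two arbitrary perturbations of it. A full expectation propagation scheme would iterate such a step over the factors of the target, which this sketch does not attempt.

```python
import numpy as np


def normal_logpdf(x, mean, sd):
    """Log-density of N(mean, sd^2) evaluated at x."""
    return -0.5 * ((x - mean) / sd) ** 2 - np.log(sd * np.sqrt(2.0 * np.pi))


# Hypothetical target p: a two-component Gaussian mixture (illustrative values).
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 1.0])
sds = np.array([0.5, 1.0])

# Moment matching: the Gaussian q minimising KL(p || q) has the same mean and
# variance as p, i.e. the same expectations of the sufficient statistics.
mm_mean = float(np.sum(weights * means))
mm_var = float(np.sum(weights * (sds ** 2 + means ** 2)) - mm_mean ** 2)
mm_sd = float(np.sqrt(mm_var))

# Numerical check of KL(p || q) by a Riemann sum on a fine grid.
grid = np.linspace(-12.0, 12.0, 48001)
dx = grid[1] - grid[0]
p = sum(w * np.exp(normal_logpdf(grid, m, s))
        for w, m, s in zip(weights, means, sds))
log_p = np.log(p)


def kl_p_to_q(mean, sd):
    """Approximate KL(p || N(mean, sd^2)) on the grid."""
    return float(np.sum(p * (log_p - normal_logpdf(grid, mean, sd))) * dx)


kl_matched = kl_p_to_q(mm_mean, mm_sd)
kl_shifted = kl_p_to_q(mm_mean + 0.3, mm_sd)   # perturbed mean
kl_rescaled = kl_p_to_q(mm_mean, 1.2 * mm_sd)  # perturbed scale

print("matched mean and variance:", mm_mean, mm_var)
print("KL matched / shifted / rescaled:", kl_matched, kl_shifted, kl_rescaled)
```

The usual variational approximation would instead minimise KL(q‖p), which typically locks onto one mode of such a mixture rather than covering both.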
Today, at JSM 2015, in Seattle, I attended several Bayesian sessions, having sadly missed the Dennis Lindley memorial session yesterday, as it clashed with my own session. In the morning sessions on Bayesian model choice, David Rossell (Warwick) defended non-local priors à la Johnson (& Rossell) as having better frequentist properties. Although I appreciate the concept of eliminating a neighbourhood of the null in the alternative prior, even from a Bayesian viewpoint since it forces us to declare explicitly when the null is no longer acceptable, I find the asymptotic motivation for the prior less commendable and open to arbitrary choices that may lead to huge variations in the numerical value of the Bayes factor. Another talk by Jin Wang merged spike and slab with EM with bootstrap with random forests in variable selection. But I could not fathom what the intended properties of the method were… Besides returning another type of MAP.
The second Bayesian session of the morn was mostly centred on sparsity and penalisation, with Carlos Carvalho and Rob McCulloch discussing a two-step method that goes through a standard posterior construction on the saturated model, before using a utility function to select the pertinent variables. Separation of utility from prior was a novel concept for me, if not for Jay Kadane who objected to Rob a few years ago that he put in the prior what should be in the utility… New for me because I always considered the product prior x utility as the main brick in building the Bayesian edifice… Following Herman Rubin’s motto! Veronika Rocková linked with this post-LASSO perspective by studying spike & slab priors based on Laplace priors. While Veronika’s goal was to achieve sparsity and consistency, this modelling made me wonder at the potential equivalent in our mixtures-for-testing approach. I concluded that having a mixture of two priors could be translated into a mixture over the sample with two different parameters, each with a different prior. A different topic, namely multiple testing, was treated by Jim Berger, who showed convincingly in my opinion that a Bayesian approach provides a significant advantage.
In the afternoon, finalists of the ISBA Savage Award presented their PhD work, both in the theory and methods section and in the application section. Besides Veronika Rocková’s work on a Bayesian approach to factor analysis, with a remarkable resolution via a non-parametric Indian buffet prior and a variable selection interpretation that avoids MCMC difficulties, Vinayak Rao wrote his thesis on MCMC methods for jump processes with a finite number of observations, using a highly convincing completion scheme that created independence between blocks and which reminded me of the Papaspiliopoulos et al. (2005) trick for continuous time processes. I do wonder at the potential impact of this method for processing the coalescent trees in population genetics. Two talks dealt with inference on graphical models, Masanao Yajima and Christine Peterson, inferring the structure of a sparse graph by Bayesian methods. With applications in protein networks. And with again a spike & slab prior in Christine’s work. The last talk by Sayantan Banerjee was connected to most others in this Savage session in that it also dealt with sparsity. When estimating a large covariance matrix. (It is always interesting to try to spot tendencies in awards and conferences. Following the Bayesian non-parametric era, are we now entering the Bayesian sparsity era? We will see if this is the case at ISBA 2016!) And the winner is..?! We will know tomorrow night! In the meanwhile, congrats to my friends Sudipto Banerjee, Igor Prünster, Sylvia Richardson, and Judith Rousseau who got nominated IMS Fellows tonight.
11am: Fast Dynamic Posterior Exploration for Factor Augmented Multivariate Regression by Veronika Rockova
Advancements in high-throughput experimental techniques have facilitated the availability of diverse genomic data, which provide complementary information regarding the function and organization of gene regulatory mechanisms. The massive accumulation of data has increased demands for more elaborate modeling approaches that combine the multiple data platforms. We consider a sparse factor regression model, which augments the multivariate regression approach by adding a latent factor structure, thereby allowing for dependent patterns of marginal covariance between the responses. In order to enable the identification of parsimonious structure, we impose spike and slab priors on the individual entries in the factor loading and regression matrices. The continuous relaxation of the point mass spike and slab enables the implementation of a rapid EM inferential procedure for dynamic posterior model exploration. This is accomplished by considering a nested sequence of spike and slab priors and various factor space cardinalities. Identified candidate models are evaluated by a conditional posterior model probability criterion, permitting trans-dimensional comparisons. Patterned sparsity manifestations such as an orthogonal allocation of zeros in factor loadings are facilitated by structured priors on the binary inclusion matrix. The model is applied to a problem of integrating two genomic datasets, where expression of microRNAs is related to the expression of genes with an underlying connectivity pathway network.
The València 9 meeting in Benidorm is now over, even for those who stay till the end of the party (!)… In retrospect, I found the scientific quality of this last meeting of the series quite high and I am thus sad this series comes to an end. This mythical gathering of “true believers” in a Valencian beach town certainly had a charm not found in other meetings (even though I have no particular love of beaches, of beach towns or of cabarets) in that it really brought people together for a rather long time in an intense and sometimes heated exchange of ideas. (This secluded perspective of course reinforced the caricatures of Bayesians as sectarians!) This was particularly true this time as the vast majority of people stayed in the same (awful) hotel. Also, the fact that there were no parallel sessions was a major factor in keeping people together… (The fact that the afternoon sessions were administered by ISBA rather than the València 9 scientific committee had the drawback of sometimes producing similar talks.) In my personal view, there were somehow too many non-parametric and sparsity sessions/talks, but this follows the research trends in the community (after all in the 1994 meeting, there were also “too many” MCMC talks!) And the discussions from the floor were much more limited than in the earlier meetings (but most invited discussions were a clear added value to the talks). Maybe this is due to the growing Bayesian community. As in earlier editions, the poster sessions were a strong moment with the frustrating drawback of having too many posters in a single session to allow for a complete coverage (unless you were ready to stay up till 2am…) Again a consequence of the size of the audience. But it was a pleasure to see how Bayesian statistics was alive and well and how the community was bridging old-timers having attended all of the nine Valencia meetings with newcomers still writing their PhD.
(Congrats to Emily Fox and to James Scott for their respective Savage awards!)
Darren Wilkinson also gives an overview of the “last Valencia meeting” on his blog. This post includes a detailed analysis of the GPU solution enthusiastically defended by Chris Holmes. Since I came back from the meeting with ideas towards parallel accelerations for MCMC algorithms, I will look carefully at his arguments.
For the final day of the meeting, after a good one hour run to the end of the Benidorm bay (for me at least!), we got treated to great talks, culminating with the fitting conclusion given by the conference originator, José Bernardo. The first talk of the day was Guido Consonni’s, who introduced a new class of non-local priors to deal with variable selection. From my understanding, those priors avoid a neighbourhood of zero by placing a polynomial prior on the regression coefficients in order to discriminate better between the null and the alternative,
but the influence of the power h seems to be drastic, judging from the example shown by Guido where a move from h=0 to h=1 modified the posterior probability from 0.091 to 0.99 for the same dataset. The discussion by Jim Smith was a perfect finale to the Valencia meetings, Jim being much more abrasive than the usual discussant (while always giving the impression of being near a heart attack!) The talk from Sylvia Frühwirth-Schnatter purposely borrowed Nick Polson’s title Shrink globally, act locally, and was also dealing with the Bayesian (re)interpretation of Lasso. (I was again left with the impression of hyperparameters that needed to be calibrated but this impression may change after I read the paper!) The talk by Xiao-Li Meng was as efficient as ever with Xiao-Li! Despite the penalising fact of being based on a discussion he wrote for Statistical Science, he managed to convey a global and convincing picture of likelihood inference in latent variable models, while having the audience laugh for most of the talk, a feat repeated by his discussant, Ed George. The basic issue of treating latent variables as parameters offers no particular difficulty in Bayesian inference but this is not true for likelihood models, as shown by both Xiao-Li and Ed. The last talk of the València series managed to make a unifying theory out of the major achievements of José Bernardo and, while I have some criticisms about the outcome, this journey back to decision theory, intrinsic losses and reference priors was nonetheless a very appropriate supplementary contribution of José to this wonderful series of meetings… Luis Pericchi discussed the paper in a very opinionated manner, defending the role of the Bayes factor, and the debate could have gone on forever… Hopefully, I will find time to post my comments on José’s paper.
I am quite sorry I had to leave before the Savage prize session where the four finalists for the prize gave a lecture. Those finalists are of the highest quality as the prize is not given in years when the quality of the theses is not deemed high enough. I will also miss the final evening during which the DeGroot Prize is attributed. (When I received the prize for Bayesian Core in 2004, I had also left Valparaiso in the morning, just before the banquet!)
At the monthly meeting of the Apprentissage et Sparsité group run by Sacha Tsybakov at CREST, Ismael Castillo will discuss tomorrow the recent paper General maximum likelihood empirical Bayes estimation of normal means by Wenhua Jiang and Cun-Hui Zhang just published in the Annals of Statistics (37(4), 1647-1684). (The paper is available on arXiv.) An interesting sentence from the abstract is that “the GMLEB outperforms the James-Stein and several state-of-the-art threshold estimators in a wide range of settings without much down side”. This attracted my attention given my earlier work on James-Stein estimators and I took a quick look at the paper to see what new aspects could be uncovered about 50 years after the original James and Stein paper. The setting is the original normal mean estimation problem under squared error loss, and the GMLEB estimate is based on the non-parametric (maximum likelihood) estimate Ĝ of the mixing distribution, being defined as the (empirical) Bayes estimator associated with Ĝ. The domination advertised in the abstract seems to be related to an integrated squared error loss under an unknown G, which thus does not clash with the robust minimaxity of the original James-Stein estimator… Anyway, if you are interested and in Paris next Thursday, Dec. 3, the discussion is from 3pm to 4:30pm at ENSAE, Salle S8.
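For readers who want to play with the two estimators mentioned in this post, here is a rough sketch under stated assumptions, not the algorithm of the paper. The non-parametric maximum likelihood estimate of the mixing distribution is approximated by EM over weights on a fixed grid (a common device, but the grid size and iteration count are arbitrary choices of mine). The James-Stein estimator is taken in its positive-part version shrinking towards zero. The sparse simulated means are made up for illustration.

```python
import numpy as np


def james_stein(x):
    """Positive-part James-Stein estimator, unit variance, shrinking to 0."""
    n = x.size
    shrink = max(0.0, 1.0 - (n - 2) / np.sum(x ** 2))
    return shrink * x


def gmleb_grid(x, n_grid=200, n_iter=200):
    """Grid approximation to a general maximum likelihood empirical Bayes rule.

    Assumption: the mixing distribution is restricted to weights on a fixed
    grid spanning the data, and those weights are fitted by EM.
    """
    grid = np.linspace(x.min(), x.max(), n_grid)
    # lik[i, j] = normal density of x_i when the mean is grid[j]
    lik = np.exp(-0.5 * (x[:, None] - grid[None, :]) ** 2) / np.sqrt(2 * np.pi)
    w = np.full(n_grid, 1.0 / n_grid)
    for _ in range(n_iter):
        post = lik * w
        post /= post.sum(axis=1, keepdims=True)  # posterior over grid points
        w = post.mean(axis=0)                    # EM update of the weights
    post = lik * w
    post /= post.sum(axis=1, keepdims=True)
    return post @ grid, grid, w                  # posterior means, grid, weights


# Hypothetical sparse setting: 5% of the means equal to 5, the rest equal to 0.
rng = np.random.default_rng(0)
n = 1000
theta = np.zeros(n)
theta[:50] = 5.0
x = theta + rng.standard_normal(n)

est_eb, grid, w = gmleb_grid(x)
loss_mle = float(np.sum((x - theta) ** 2))
loss_js = float(np.sum((james_stein(x) - theta) ** 2))
loss_eb = float(np.sum((est_eb - theta) ** 2))
print("squared error losses (MLE, James-Stein, grid GMLEB):",
      loss_mle, loss_js, loss_eb)
```

This single simulated dataset only illustrates the kind of sparse configuration where a linear shrinkage rule has little room to adapt. It says nothing about minimaxity, which is a worst-case statement over all mean vectors.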