Marco Banterle, Clara Grazian, Anthony Lee, and I just arXived our paper “Accelerating Metropolis-Hastings algorithms by delayed acceptance“, which is a major revision and upgrade of our “Delayed acceptance with prefetching” paper of last June, a paper we submitted at the last minute to NIPS but which did not get accepted. The difference with this earlier version is the inclusion of convergence results, in particular that, while the original Metropolis-Hastings algorithm dominates the delayed version in Peskun ordering, the latter can improve upon the original for an appropriate choice of the early-stage acceptance step. We thus included a new section on optimising the design of the delayed step, by picking the optimal scaling à la Roberts, Gelman and Gilks (1997) in the first step and by proposing a ranking of the factors in the Metropolis-Hastings acceptance ratio that speeds up the algorithm. The algorithm thus becomes adaptive. Compared with the earlier version, we have not pursued the second thread of prefetching as much, simply mentioning that prefetching and delayed acceptance could be merged. We have also included a section on the alternative suggested by Philip Nutzman on the ‘Og of using a growing ratio rather than individual terms, the advantage being that the probability of acceptance stabilises as the number of terms grows, the drawback being that expensive terms are not always computed last. In addition to our logistic and mixture examples, we also study in this version the MALA algorithm, since we can postpone computing the ratio of the proposals till the second step. The gain observed in one experiment is of the order of a ten-fold increase in efficiency. By comparison, and in answer to one comment on Andrew’s blog, we did not cover the HMC algorithm, since the preliminary acceptance step would require the construction of a proxy to the acceptance ratio, in order to avoid computing a costly number of derivatives in the discretised Hamiltonian integration.
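To make the two-stage mechanism concrete, here is a minimal sketch of a generic delayed-acceptance step, not the implementation of the paper: the acceptance ratio is factored into terms that are tested sequentially, with the cheap factors placed first so that an early rejection spares the evaluation of the expensive ones (names like `log_factors` are purely illustrative).

```python
import numpy as np

def delayed_acceptance_step(x, propose, log_factors, rng):
    """One delayed-acceptance Metropolis-Hastings step.

    log_factors: list of functions, each returning one term
    log rho_k(x, y) of a factorisation of the log acceptance ratio
    log pi(y)q(x|y) - log pi(x)q(y|x) = sum_k log rho_k(x, y).
    The overall acceptance probability is prod_k min(1, rho_k),
    so a rejection at any stage stops the computation early.
    """
    y = propose(x, rng)
    for log_rho in log_factors:
        if np.log(rng.uniform()) >= log_rho(x, y):
            return x   # reject at this stage: remaining factors never evaluated
    return y           # all stages passed: accept

# toy usage: cheap "prior" factor first, costlier "likelihood" factor second
rng = np.random.default_rng(0)
log_prior = lambda z: -0.5 * z ** 2
log_lik = lambda z: -0.5 * (z - 1.0) ** 2        # stand-in for an expensive term
log_factors = [lambda x, y: log_prior(y) - log_prior(x),
               lambda x, y: log_lik(y) - log_lik(x)]
propose = lambda x, rng: x + 0.5 * rng.standard_normal()  # symmetric: q-ratio cancels
x = 0.0
for _ in range(1000):
    x = delayed_acceptance_step(x, propose, log_factors, rng)
```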
Last and maybe most exciting day of the “High-dimensional Stochastic Simulation and Optimisation in Image Processing” workshop in Bristol, as it was exclusively about simulation (MCMC) methods, except for my own talk on ABC and Peter Green’s on the consistency of Bayesian inference in non-regular models. The talks today were indeed about using convex optimisation devices to speed up MCMC algorithms, with tools that were entirely new to me, like the Moreau transform discussed by Marcelo Pereyra. Or using auxiliary variables à la RJMCMC to bypass expensive Cholesky decompositions. Or optimisation steps from one dual space to the original space, for the same reason. Or using pseudo-gradients on partly differentiable functions, in the talk by Sylvain Lecorff on a paper commented earlier on the ‘Og. I particularly liked the notion of Moreau regularisation, which leads to more efficient Langevin algorithms when the target is not regular enough; the discretised diffusion itself may even be geometrically ergodic without the corrective step of the Metropolis-Hastings acceptance. This naturally raises the question of an extension to Hamiltonian Monte Carlo. And to multimodal targets, possibly requiring as many normalisation factors as there are modes. So, in the end, a highly informative workshop, with the perfect size and the perfect crowd (which happened to be predominantly French, albeit from a community I had not had the opportunity to interact with previously). Massive kudos to Marcelo for putting this workshop together, especially in a week when major happy family events should have kept him at home!
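As an aside for readers unfamiliar with the Moreau transform, here is a minimal one-dimensional sketch (entirely my own illustration, not Pereyra’s implementation): the Moreau envelope f_λ(x) = min_y { f(y) + (x−y)²/(2λ) } of a non-smooth f is differentiable, with gradient (x − prox_{λf}(x))/λ, so a Langevin move can use this smoothed gradient when the log-target has kinks, such as the Laplace |x| below.

```python
import numpy as np

def prox_abs(x, lam):
    # proximal map of f(x) = |x|: soft thresholding
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_grad(x, lam):
    # gradient of the Moreau envelope of |x|: smooth everywhere,
    # equal to sign(x) away from zero and to x/lam near zero
    return (x - prox_abs(x, lam)) / lam

def smoothed_langevin_step(x, step, lam, rng):
    # one (unadjusted) Langevin move on the Laplace target pi(x) ~ exp(-|x|),
    # replacing the non-existent gradient of -|x| at 0 by the envelope gradient
    drift = -moreau_grad(x, lam)
    return x + 0.5 * step * drift + np.sqrt(step) * rng.standard_normal()

rng = np.random.default_rng(1)
x, samples = 0.0, []
for _ in range(5000):
    x = smoothed_langevin_step(x, step=0.1, lam=0.1, rng=rng)
    samples.append(x)
```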
As the workshop ended in mid-afternoon, I had plenty of time for a long run with Florence Forbes down to the Avon river and back up among the deer of Ashton Court, avoiding most of the rain and all of the mountain bikes on a bike trail that sounded like trail-running practice, and building enough of an appetite for the South Indian cooking of the nearby Thali Café. Brilliant!
After a nice morning run down Leigh Woods and on the muddy banks of the Avon river, I attended a morning session on hyperspectral image non-linear modelling, a topic about which I knew nothing beforehand. Hyperspectral images are 3-D images made of measurements at several wavelengths, used to improve their classification as a mixture of several elements. The non-linearity is due to the multiple reflections from the ground as well as to imperfections in the data collection. I found this new setting of clear interest, from using mixtures to exploring Gaussian processes and Hamiltonian Monte Carlo techniques on constrained spaces… Not to mention the “debate” about using Bayesian inference versus optimisation. It was overall a day of discovery, as I was unaware of the image-processing community (being the outlier in this workshop!) and of its techniques. The problems mostly qualify as partly linear high-dimensional inverse problems, with rather standard if sometimes hybrid MCMC solutions. (The day ended even more nicely with another long run in the fields of Ashton Court and a conference dinner by the river…)
Even though I flew through Birmingham (and had to endure the fundamental randomness of trains in Britain), I managed to reach the “High-dimensional Stochastic Simulation and Optimisation in Image Processing” conference location (in Goldney Hall Orangery) in due time to attend the (second) talk by Christophe Andrieu. He started with an explanation of the notion of a controlled Markov chain, which reminded me of our early and famous-if-unpublished paper on controlled MCMC. (The label “controlled” was inspired by Peter Green, who pointed out to us the different meanings of controlled in French [meaning checked or monitored] and in English. We use it here in the English sense, obviously.) The main focus of the talk was on the stability of controlled Markov chains, with of course connections with our controlled MCMC of old, for instance the case of the coerced acceptance probability, which happened to be not that stable! The central tool was Lyapunov functions. (This made me wonder whether it would make sense to envision the meta-problem of adaptively estimating an adequate Lyapunov function from the MCMC output.)
As I had difficulties following the details of the convex optimisation talks in the afternoon, I slipped away to work on my own and returned for the posters & wine session, where the small number of posters allowed for the proper amount of interaction with the presenters! Talking about the relevance of variational Bayes approximations and possible tools to assess it, about the use of new metrics for MALA and possible extensions to Hamiltonian Monte Carlo, about Bayesian modellings of fMRI and possible applications of ABC in this framework. (No memorable wine to make the ‘Og!) Then a quick if reasonably hot curry and it was already bed-time after a rather long and well-filled day!
As the second day at MCqMC 2014 was mostly on multi-level Monte Carlo and quasi-Monte Carlo methods, I did not attend many talks but had a long run in the countryside (even saw a pheasant and a heron), worked at “home” on pressing recruiting evaluations, and had a long working session with Pierre Jacob. Plus an evening out sampling (just) a few Belgian beers in the shade of the city hall…
Today was more in my ballpark, as there were MCMC talks the whole day! The plenary talk was not about MCMC, as Erich Novak presented a survey of the many available results bounding the complexity of approximating an integral based on a fixed number of evaluations of the integrand, some involving the dimension (and its curse), some not, some as fast as 1/√n and some not as fast, all depending on the regularity and the size of the classes of integrands considered. In some cases, the solution was importance sampling, in other cases quasi-Monte Carlo, and yet other cases remain unsolved. Then Yves Atchadé gave a new perspective on computing the asymptotic variance in the central limit theorem on Markov chains when truncating the autocovariance, Matti Vihola talked about theoretical orderings of Markov chains that transmuted into the very practical consequence that using more simulations in a pseudo-marginal likelihood approximation improves acceptance rate and asymptotic variances (this applies to ABC-MCMC as well; see the sketch below), Radu Craiu proposed a novel processing of adaptive MCMC by treating various approximations to the true target as food for a multiple-try Metropolis algorithm, and Luca Martino had a go at resuscitating the ARMS algorithm of Gilks, Best and Tan (used for a while in BUGS), although the talk did not dissipate all of my misgivings about the multidimensional version! I had more difficulties following the “Warwick session”, which was made of four talks by current or former students from Warwick, although I appreciated the complexity of the results in infinite-dimensional settings and the novel approximations to diffusion-based Metropolis algorithms. No further session this afternoon, as the “social” activity was to visit the nearby Stella Artois brewery! This activity made us very social, for certain, even though there was hardly a soul around in this massively automated factory. (Maybe an ‘Og post to come one of these days…)
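For readers unfamiliar with the pseudo-marginal construction behind Vihola’s ordering result, here is a minimal sketch (my own illustration, with hypothetical names such as `lik_hat`): the intractable likelihood is replaced by a nonnegative unbiased estimator based on N simulations, the chain still targets the exact posterior whatever N is, and a larger N makes the estimator less noisy, hence the improved acceptance rate and asymptotic variance, at a higher cost per iteration.

```python
import numpy as np

def pseudo_marginal_mh(lik_hat, log_prior, propose, x0, n_iter, N, rng):
    # lik_hat(theta, N, rng): nonnegative unbiased estimate of the likelihood
    # based on N simulations; the key trick is to store the estimate alongside
    # the current state and never refresh it, which keeps the exact posterior
    # invariant for any N (symmetric proposal assumed, so the q-ratio cancels)
    x, L = x0, lik_hat(x0, N, rng)
    chain = [x]
    for _ in range(n_iter):
        y = propose(x, rng)
        Ly = lik_hat(y, N, rng)
        if Ly > 0.0 and np.log(rng.uniform()) < (
                np.log(Ly) - np.log(L) + log_prior(y) - log_prior(x)):
            x, L = y, Ly
        chain.append(x)
    return np.array(chain)

# toy usage: p(y0|theta) = int N(y0; z, 1) N(z; theta, 1) dz, estimated
# unbiasedly by averaging the kernel over N draws z ~ N(theta, 1)
npdf = lambda v: np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)
y0 = 0.5
lik_hat = lambda theta, N, rng: np.mean(npdf(y0 - theta - rng.standard_normal(N)))
chain = pseudo_marginal_mh(lik_hat, lambda t: -0.5 * t ** 2,
                           lambda t, rng: t + 0.5 * rng.standard_normal(),
                           0.0, 2000, N=10, rng=np.random.default_rng(2))
```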
The intense pace of the first two days of our workshop on MCMC at ICMS had apparently taken a heavy toll on the participants, as part of the audience was missing this morning! Although not as a consequence of the haggis of the previous night’s conference dinner, nor even as a result of the above pace. In fact, the missing participants had opted ahead of time for leaving the workshop early, which is understandable given everyone’s busy schedule, especially for those attending both the Bristol and Edinburgh workshops, even though it slightly impacted the atmosphere of the final day. (Except for Mark Girolami, who most unfortunately suffered such a severe tooth infection that he had to seek urgent medical assistance yesterday afternoon. Best wishes to Mark for a prompt recovery, say I with a dental appointment of my own tomorrow…!)
The plenary talk of the day was delivered by Heikki Haario, who provided us with a survey of the (adaptive) MCMC advances he and his collaborators have made in the analysis of complex and immensely high-dimensional weather models. This group of Finnish researchers, who started from inverse problem analysis rather than from MCMC, has had a major impact on the design and validation of adaptive MCMC algorithms, especially in the late 1990s. (Heikki also was a co-organiser of the Adap’ski workshops, workshops that may well be continued, stay tuned!) The next talk, by Marko Laine, was also about adaptive MCMC algorithms, with the difference that the application was climate modelling. It contained interesting directions about the early stopping (“early rejection”, as opposed to “delayed rejection”) of diverging proposals (gaining 80% in computing time!), sketched below, and about parallel adaptation. Still in the same theme, Gersende Fort explained the adaptive version of the equi-energy sampler she and her co-authors had recently developed. Although she had briefly presented this paper in Banff a month ago, I found the talk quite informative about the implementation of the method, and at the perfect technical level (for me!).
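Here is a minimal sketch of the early-rejection idea as I understood it (my own illustration, assuming the negative log-target decomposes into nonnegative terms, such as sum-of-squares residual blocks): the uniform variate is drawn up front, the proposal’s cost is accumulated term by term, and the computation stops as soon as the partial cost already exceeds the rejection threshold.

```python
import numpy as np

def early_rejection_step(x, propose, cost_terms, total_cost, rng):
    # cost_terms: list of functions, each returning one nonnegative term of
    # the negative log-target, so that -log pi(y) = sum_k cost_terms[k](y);
    # total_cost(x) = -log pi(x) for the current state (could be cached)
    y = propose(x, rng)
    # accept iff -log pi(y) < -log pi(x) - log(u); draw u first
    threshold = total_cost(x) - np.log(rng.uniform())
    cost = 0.0
    for term in cost_terms:
        cost += term(y)
        if cost > threshold:   # partial cost already too high:
            return x           # reject without computing remaining terms
    return y

# toy usage: Gaussian target split into two nonnegative cost blocks
rng = np.random.default_rng(3)
terms = [lambda z: 0.25 * z ** 2, lambda z: 0.25 * z ** 2]
total = lambda z: 0.5 * z ** 2
prop = lambda z, rng: z + rng.standard_normal()
x = 0.0
for _ in range(1000):
    x = early_rejection_step(x, prop, terms, total, rng)
```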
In [what I now perceive as] another recurrent theme of the workshop, namely the recourse to Gaussian structures like Gaussian processes (see, e.g., Ian Murray’s talk yesterday), Andrew Stuart gave us a light introduction to random walk Metropolis-Hastings algorithms on Hilbert spaces. In particular, he connected with Ian Murray’s talk about the definition of a “new” random walk (due to Radford Neal) that makes a proposal

$x' = \sqrt{1-\beta^2}\,x + \beta\,\xi, \qquad \xi \sim \mathcal{N}(0,C),$

that still preserves the acceptance probability of the original (“old”) random walk proposal. The final talks of the morning were Krys Latuszynski’s and Nick Whiteley’s very pedagogical presentations of the convergence properties of manifold MALA and of particle filters for hidden Markov models. In both cases, the speakers avoided overly technical details and provided clear intuition about the presented results, a great feat after three intense days of talks! (Having attended Nick’s talk in Paris two weeks ago helped, of course.)
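For the record, here is a minimal finite-dimensional sketch of this preconditioned Crank-Nicolson (pCN) proposal (my own illustration, not Stuart’s code): writing the target as exp(−Φ(x)) with respect to a Gaussian reference N(0,C), the autoregressive proposal leaves the Gaussian part invariant, so the acceptance probability only involves Φ and does not degenerate as the dimension grows.

```python
import numpy as np

def pcn_step(x, Phi, beta, sample_ref, rng):
    # target: pi(x) ~ exp(-Phi(x)) with respect to a Gaussian reference N(0, C);
    # sample_ref(rng) draws xi ~ N(0, C)
    xi = sample_ref(rng)
    y = np.sqrt(1.0 - beta ** 2) * x + beta * xi   # pCN / Neal's proposal
    # the Gaussian reference terms cancel exactly: only Phi enters the ratio
    if np.log(rng.uniform()) < Phi(x) - Phi(y):
        return y
    return x

# toy usage: standard Gaussian reference in dimension 10
rng = np.random.default_rng(4)
Phi = lambda z: 0.5 * np.sum((z - 1.0) ** 2)       # illustrative likelihood part
sample_ref = lambda rng: rng.standard_normal(10)   # xi ~ N(0, I)
x = sample_ref(rng)
for _ in range(1000):
    x = pcn_step(x, Phi, beta=0.3, sample_ref=sample_ref, rng=rng)
```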
Unfortunately, due to very limited flight options (after one week of travelling around the UK), and also being slightly worried at the idea of missing my flight, I had to leave the meeting along with all my French colleagues right after Jean-Michel Marin’s talk on (hidden) Potts-driven mixtures, explaining the computational difficulties in deriving marginal likelihoods. I thus missed the final talk of the workshop, by Gareth Tribello, and ended up delivering my final remarks at the lunch break.
Overall, when reflecting on those two Monte Carlo workshops, I feel I preferred the pace of the Bristol workshop, because it allowed for more interactions between the participants by scheduling fewer talks… This being said, the organisation at ICMS was superb (as usual!) and the talks were uniformly very good, so it also was a very profitable meeting, of a different kind! As written earlier, among other things, it induced (in me) some reflections on a possible new research topic with friends there. Looking forward to visiting Scotland again, of course!
Parsimonious Bayesian Factor Analysis when the Number of Factors is Unknown
We introduce a new and general set of identifiability conditions for factor models which handles the ordering problem associated with current common practice. In addition, the new class of parsimonious Bayesian factor analysis leads to a factor loading matrix representation which induces an intuitive and easy-to-implement factor selection scheme. We argue that structuring the factor loadings matrix is in concordance with recent trends in applied factor analysis. Our MCMC scheme for posterior inference makes several improvements over the existing alternatives while outlining various strategies for conditional posterior inference in a factor selection scenario. Four applications, two based on synthetic data and two based on well-known real data, are introduced to illustrate the applicability and generality of the new class of parsimonious factor models, as well as to highlight features of the proposed sampling schemes. (Joint work with Sylvia Frühwirth-Schnatter, Univ. of Linz, Austria.)
The seminar is at 3pm (maybe a wee bit later if I am running late, as I am registered for the annual 10k Bercy race two hours earlier in the Bois de Vincennes!), at the Institut Henri Poincaré. It will be followed by a second seminar by Andreas Eberle on the Metropolis-adjusted Langevin algorithm, MALA (a topic my coauthor Natesh Pillai recently worked on; a pity he only arrives in Paris the following Monday and thus misses the talk!).