Archive for Bayesian optimisation

parallel tempering on optimised paths

Posted in Statistics with tags , , , , , , , , , , , , , , , on May 20, 2021 by xi'an

Saifuddin Syed, Vittorio Romaniello, Trevor Campbell, and Alexandre Bouchard-Côté, whom I met and discussed with on my “last” trip to UBC, on December 2019, just arXived a paper on parallel tempering (PT), making the choice of tempering path an optimisation problem. They address the touchy issue of designing a sequence of tempered targets when the starting distribution π⁰, eg the prior, and the final distribution π¹, eg the posterior, are hugely different, eg almost singular.

“…theoretical analysis of reversible variants of PT has shown that adding too many intermediate chains can actually deteriorate performance (…) [while] on non reversible regime adding more chains is guaranteed to improve performances.”

The above applies to geometric combinations of π⁰ and π¹. Which “suffers from an arbitrarily suboptimal global communication barrier“, according to the authors (although the counterexample is not completely convincing since π⁰ and π¹ share the same variance). They propose a more non-linear form of tempering with constraints on the dependence of the powers on the temperature t∈(0,1).  Defining the global communication barrier as an average over temperatures of the rejection rate, the path characteristics (e.g., the coefficients of a spline function) can then be optimised in terms of this objective. And the temperature schedule is derived from the fact that the non-asymptotic round trip rate is maximized when the rejection rates are all equal. (As a side item, the technique exposed in the earlier tempering paper by Syed et al. was recently exploited for a night high resolution imaging of a black hole from the M87 galaxy.)

indecent exposure

Posted in Statistics with tags , , , , , , , , , , , , on July 27, 2018 by xi'an

While attending my last session at MCqMC 2018, in Rennes, before taking a train back to Paris, I was confronted by this radical opinion upon our previous work with Matt Moores (Warwick) and other coauthors from QUT, where the speaker, Maksym Byshkin from Lugano, defended a new approach for maximum likelihood estimation using novel MCMC methods. Based on the point fixe equation characterising maximum likelihood estimators for exponential families, when theoretical and empirical moments of the natural statistic are equal. Using a Markov chain with stationary distribution the said exponential family, the fixed point equation can be turned into a zero divergence equation, requiring simulation of pseudo-data from the model, which depends on the unknown parameter. Breaking this circular argument, the authors note that simulating pseudo-data that reproduce the observed value of the sufficient statistic is enough. Which is related with Geyer and Thomson (1992) famous paper about Monte Carlo maximum likelihood estimation. From there I was and remain lost as I cannot see why a derivative of the expected divergence with respect to the parameter θ can be computed when this divergence is found by Monte Carlo rather than exhaustive enumeration. And later used in a stochastic gradient move on the parameter θ… Especially when the null divergence is imposed on the parameter. In any case, the final slide shows an application to a large image and an Ising model, solving the problem (?) in 140 seconds and suggesting indecency, when our much slower approach is intended to produce a complete posterior simulation in this context.

European statistics in Finland [EMS17]

Posted in Books, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , on August 2, 2017 by xi'an

While this European meeting of statisticians had a wide range of talks and topics, I found it to be more low key than the previous one I attended in Budapest, maybe because there was hardly any talk there in applied probability. (But there were some sessions in mathematical statistics and Mark Girolami gave a great entry to differential geometry and MCMC, in the spirit of his 2010 discussion paper. Using our recent trip to Montréal as an example of geodesic!) In the Bayesian software session [organised by Aki Vetahri], Javier Gonzáles gave a very neat introduction to Bayesian optimisation: he showed how optimisation can be turned into Bayesian inference or more specifically as a Bayesian decision problem using a loss function related to the problem of interest. The point in following a Bayesian path [or probabilist numerics] is to reduce uncertainty by the medium of prior measures on functions, although resorting [as usual] to Gaussian processes whose arbitrariness I somehow dislike within the infinity of priors (aka stochastic processes) on functions! One of his strong arguments was that the approach includes the possibility for design in picking the next observation point (as done in some ABC papers of Michael Guttman and co-authors, incl. the following talk at EMS 2017) but again the devil may be in the implementation when looking at minimising an objective function… The notion of the myopia of optimisation techniques was another good point: only looking one step ahead in the future diminishes the returns of the optimisation and an alternative presented at AISTATS 2016 [that I do not remember seeing in Càdiz] goes against this myopia.

Umberto Piccini also gave a talk on exploiting synthetic likelihoods in a Bayesian fashion (in connection with the talk he gave last year at MCqMC 2016). I wondered at the use of INLA for this Gaussian representation, as well as at the impact of the parameterisation of the summary statistics. And the session organised by Jean-Michel involved Jimmy Olson, Murray Pollock (Warwick) and myself, with great talks from both other speakers, on PaRIS and PaRISian algorithms by Jimmy, and on a wide range of exact simulation methods of continuous time processes by Murray, both managing to convey the intuition behind their results and avoiding the massive mathematics at work there. By comparison, I must have been quite unclear during my talk since someone interrupted me about how Owen & Zhou (2000) justified their deterministic mixture importance sampling representation. And then left when I could not make sense of his questions [or because it was lunchtime already].

ABC in Stockholm [on-board again]

Posted in Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , on May 18, 2016 by xi'an

abcruiseAfter a smooth cruise from Helsinki to Stockholm, a glorious sunrise over the Ålend Islands, and a morning break for getting an hasty view of the city, ABC in Helsinki (a.k.a. ABCruise) resumed while still in Stockholm. The first talk was by Laurent Calvet about dynamic (state-space) models, when the likelihood is not available and replaced with a proximity between the observed and the simulated observables, at each discrete time in the series. The authors are using a proxy predictive for the incoming observable and derive an optimal—in a non-parametric sense—bandwidth based on this proxy. Michael Gutmann then gave a presentation that somewhat connected with his talk at ABC in Roma, and poster at NIPS 2014, about using Bayesian optimisation to reduce the rejections in ABC algorithms. Which means building a model of a discrepancy or distance by Bayesian optimisation. I definitely like this perspective as it reduces the simulation to one of a discrepancy (after a learning step). And does not require a threshold. Aki Vehtari expanded on this idea with a series of illustrations. A difficulty I have with the approach is the construction of the acquisition function… The last session while pretty late was definitely exciting with talks by Richard Wilkinson on surrogate or emulator models, which goes very much in a direction I support, namely that approximate models should be accepted on their own, by Julien Stoehr with clustering and machine learning tools to incorporate more summary statistics, and Tim Meeds who concluded with two (small) talks!, centred on the notion of deterministic algorithms that explicitly incorporate the random generators within the comparison, resulting in post-simulation recentering à la Beaumont et al. (2003), plus new advances with further incorporations of those random generators turned deterministic functions within variational Bayes inference

On Wednesday morning, we will land back in Helsinki and head back to our respective homes, after another exciting ABC in… workshop. I am terribly impressed by the way this workshop at sea operated, providing perfect opportunities for informal interactions and collaborations, without ever getting claustrophobic or dense. Enjoying very long days also helped. While it seems unlikely we can repeat this successful implementation, I hope we can aim at similar formats in the coming occurrences. Kitos paljon to our Finnish hosts!

AISTATS 2016 [#1]

Posted in pictures, R, Running, Statistics, Travel, Wines with tags , , , , , , , , , , , , on May 11, 2016 by xi'an

Travelling through Seville, I arrived in Càdiz on Sunday night, along with a massive depression [weather-speaking!]. Walking through the city from the station was nonetheless pleasant as this is an town full of small streets and nice houses. If with less churches than Seville! Richard Samworth gave the first plenary talk of AISTATS 2016  with a presentation on random projections for classification. His classifier is based on an average of a large number of linear random projections of the original data where the projections are chosen as minimising the prediction error over a subset of the components. The performances of this approach seem to be consistently higher than for random forests, which makes it definitely worth investigating further. (A related R package is available.)

The following talks that day covered Bayesian optimisation and probabilistic numerics, with Javier Gonzales introducing glasses for Bayesian optimisation in order to solve its myopia (!)—by which he meant predicting the output of the optimisation over n future steps. And a first mention of the Pima Indians by Daniel Hernandez-Lobato in his talk about EP with stochastic gradient steps towards optimisation. (As well as much larger datasets.) And Mark Girolami bringing quasi-Monte Carlo into control variates. A kernel based ABC by Mijung Park, which uses kernels and maximum mean discrepancy to avoid defining summary statistics, and a version of parallel MCMC by Guillaume Basse. Plus another session on deep learning.

As usual with AISTATS conferences, the central activity of the day was the noon poster session, including speakers discussing their paper, and I had several interesting chats about MCMC related topics, with e.g. one alternative notion of ensemble MCMC [centred on estimating the normalising constant].

We awarded the notable student paper awards before the welcoming cocktail: The winners are Bo DaiNedelina Teneva, and Ye Wang.  And this first day ended up with a companionable evening in a most genuine tapa bar, tasting local blood sausage and local blue cheese. (If you do not mind the corrida theme!)