**W**hile attending my last session at MCqMC 2018, in Rennes, before taking a train back to Paris, I was confronted by this radical opinion on our previous work with Matt Moores (Warwick) and other coauthors from QUT, where the speaker, Maksym Byshkin from Lugano, defended a new approach for maximum likelihood estimation using novel MCMC methods. The approach is based on the fixed point equation characterising maximum likelihood estimators for exponential families, namely that the theoretical and empirical moments of the natural statistic are equal. Using a Markov chain with the said exponential family as stationary distribution, the fixed point equation can be turned into a zero divergence equation, which requires simulating pseudo-data from the model, which itself depends on the unknown parameter. Breaking this circular argument, the authors note that simulating pseudo-data that reproduce the observed value of the sufficient statistic is enough. Which is related to Geyer and Thompson's (1992) famous paper on Monte Carlo maximum likelihood estimation. From there I was and remain lost, as I cannot see why a derivative of the expected divergence with respect to the parameter θ can be computed when this divergence is found by Monte Carlo rather than by exhaustive enumeration. And later used in a stochastic gradient move on the parameter θ… Especially when the null divergence is imposed on the parameter. In any case, the final slide shows an application to a large image and an Ising model, solving the problem (?) in 140 seconds and suggesting indecency, when our much slower approach is intended to produce a complete posterior simulation in this context.
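To make the moment-matching characterisation above concrete, here is a minimal sketch of a Robbins-Monro stochastic approximation that searches for the parameter at which the simulated mean of the sufficient statistic matches the observed one. This is not the algorithm presented in the talk: the toy Poisson model, the step-size schedule, and the names `moment_matching_mle` and `poisson_mean_stat` are all assumptions made for illustration.

```python
import numpy as np

def moment_matching_mle(s_obs, simulate_stat, theta0, n_iter=5000, seed=0):
    """Robbins-Monro search for theta such that E_theta[s(X)] = s_obs."""
    rng = np.random.default_rng(seed)
    theta = float(theta0)
    for t in range(1, n_iter + 1):
        # Statistic of pseudo-data simulated at the current parameter value
        s_sim = simulate_stat(theta, rng)
        # Move theta towards equality of moments, with a decreasing step
        # (the 1 / (10 + t) schedule is an assumed, untuned choice)
        theta += (s_obs - s_sim) / (10.0 + t)
    return theta

def poisson_mean_stat(theta, rng, m=50):
    # Hypothetical toy model: Poisson with natural parameter theta = log(rate),
    # whose sufficient statistic is (up to scaling) the sample mean
    return rng.poisson(np.exp(theta), size=m).mean()

s_obs = 3.2  # assumed observed value of the sufficient statistic
theta_hat = moment_matching_mle(s_obs, poisson_mean_stat, theta0=0.0)
print(theta_hat, np.log(s_obs))  # the exact MLE is log(s_obs) in this toy model
```

In this toy case the maximum likelihood estimator is available in closed form, which makes it easy to check that the stochastic recursion settles near it. Nothing of the sort holds for an Ising model, where each simulation step itself requires MCMC.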

## Archive for Bayesian optimisation

## indecent exposure

Posted in Statistics with tags ABC, Bayesian optimisation, Bretagne, Brittany, exponential families, image analysis, image processing, inference, Lugano, maximum likelihood estimation, MCqMC 2018, pre-processing, Rennes on July 27, 2018 by xi'an

## European statistics in Finland [EMS17]

Posted in Books, pictures, Running, Statistics, Travel, University life with tags ABC, AISTATS 2016, Amazon, AMIS, Bayesian optimisation, deterministic mixtures, EMS 2017, Europe, European Meeting of Statisticians, exact Monte Carlo, Helsinki, INLA, particle filters, probabilistic numerics, University of Helsinki on August 2, 2017 by xi'an

**W**hile this European meeting of statisticians had a wide range of talks and topics, I found it to be more low key than the previous one I attended in Budapest, maybe because there was hardly any talk there in applied probability. (But there were some sessions in mathematical statistics and Mark Girolami gave a great entry to differential geometry and MCMC, in the spirit of his 2010 discussion paper. Using our recent trip to Montréal as an example of geodesic!) In the Bayesian software session [organised by Aki Vehtari], Javier González gave a very neat introduction to Bayesian optimisation: he showed how optimisation can be turned into Bayesian inference or, more specifically, into a Bayesian decision problem using a loss function related to the problem of interest. The point in following a Bayesian path [or probabilistic numerics] is to reduce uncertainty by the medium of prior measures on functions, although resorting [as usual] to Gaussian processes whose arbitrariness I somehow dislike within the infinity of priors (aka stochastic processes) on functions! One of his strong arguments was that the approach includes the possibility for design in picking the next observation point (as done in some ABC papers of Michael Gutmann and co-authors, incl. the following talk at EMS 2017) but again the devil may be in the implementation when looking at minimising an objective function… The notion of the myopia of optimisation techniques was another good point: only looking one step ahead in the future diminishes the returns of the optimisation and an alternative presented at AISTATS 2016 [that I do not remember seeing in Cádiz] goes against this myopia.
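For readers unfamiliar with the mechanics, the following is a minimal sketch of one-step-ahead (hence myopic) Bayesian optimisation: a zero-mean Gaussian process prior on the objective, and an expected improvement criterion to pick the next evaluation point. It is an illustration under stated assumptions and not the method of the talk: the quadratic `objective`, the kernel length scale, the jitter, the grid of candidates, and all function names are hypothetical choices.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit prior variance."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)

def gp_posterior(x_train, y_train, x_test, jitter=1e-5):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    # Prior variance is 1; clip to avoid negative values from rounding
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)
    return mean, np.sqrt(var)

_erf = np.vectorize(math.erf)

def expected_improvement(mean, std, best):
    """Expected improvement over the current best value, for minimisation."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def objective(x):
    # Hypothetical cheap objective standing in for an expensive one
    return (x - 0.65) ** 2

grid = np.linspace(0.0, 1.0, 201)        # candidate evaluation points
x_train = np.array([0.0, 0.5, 1.0])      # initial design
y_train = objective(x_train)

for _ in range(10):
    mean, std = gp_posterior(x_train, y_train, grid)
    ei = expected_improvement(mean, std, y_train.min())
    x_next = grid[np.argmax(ei)]         # one-step-ahead choice of the next point
    x_train = np.append(x_train, x_next)
    y_train = np.append(y_train, objective(x_next))

best_x = x_train[np.argmin(y_train)]
best_y = y_train.min()
print(best_x, best_y)
```

The acquisition step only looks one evaluation ahead, which is exactly the myopia mentioned above. The arbitrariness of the Gaussian process prior shows up here in the hand-picked length scale.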

Umberto Picchini also gave a talk on exploiting synthetic likelihoods in a Bayesian fashion (in connection with the talk he gave last year at MCqMC 2016). I wondered at the use of INLA for this Gaussian representation, as well as at the impact of the parameterisation of the summary statistics. And the session organised by Jean-Michel involved Jimmy Olsson, Murray Pollock (Warwick) and myself, with great talks from both other speakers, on PaRIS and PaRISian algorithms by Jimmy, and on a wide range of exact simulation methods of continuous time processes by Murray, both managing to convey the intuition behind their results and avoiding the massive mathematics at work there. By comparison, I must have been quite unclear during my talk since someone interrupted me about how Owen & Zhou (2000) justified their deterministic mixture importance sampling representation. And then left when I could not make sense of his questions [or because it was lunchtime already].

## ABC in Stockholm [on-board again]

Posted in Kids, pictures, Statistics, Travel, University life with tags ABC, ABC in Helsinki, ABCruise, acquisition function, Baltic Sea, Bayesian optimisation, cabin, conference fees, cruise, Finland, gaussian process, Helsinki, sea, state space model, Stockholm, Sweden, workshop on May 18, 2016 by xi'an

**A**fter a smooth cruise from Helsinki to Stockholm, a glorious sunrise over the Åland Islands, and a morning break for getting a hasty view of the city, ABC in Helsinki (a.k.a. ABCruise) resumed while still in Stockholm. The first talk was by Laurent Calvet about dynamic (state-space) models, when the likelihood is not available and replaced with a proximity between the observed and the simulated observables, at each discrete time in the series. The authors are using a proxy predictive for the incoming observable and derive an optimal (in a non-parametric sense) bandwidth based on this proxy. Michael Gutmann then gave a presentation that somewhat connected with his talk at ABC in Roma, and poster at NIPS 2014, about using Bayesian optimisation to reduce the rejections in ABC algorithms. Which means building a model of a discrepancy or distance by Bayesian optimisation. I definitely like this perspective as it reduces the simulation to one of a discrepancy (after a learning step). And does not require a threshold. Aki Vehtari expanded on this idea with a series of illustrations. A difficulty I have with the approach is the construction of the acquisition function… The last session, while pretty late, was definitely exciting, with talks by Richard Wilkinson on surrogate or emulator models, which goes very much in a direction I support, namely that approximate models should be accepted on their own, by Julien Stoehr with clustering and machine learning tools to incorporate more summary statistics, and by Tim Meeds, who concluded with two (small) talks!, centred on the notion of deterministic algorithms that explicitly incorporate the random generators within the comparison, resulting in post-simulation recentering à la Beaumont et al. (2003), plus new advances with further incorporations of those random generators turned deterministic functions within variational Bayes inference…

On Wednesday morning, we will land back in Helsinki and head back to our respective homes, after another exciting ABC in… workshop. I am terribly impressed by the way this workshop at sea operated, providing perfect opportunities for informal interactions and collaborations, without ever getting claustrophobic or dense. Enjoying very long days also helped. While it seems unlikely we can repeat this successful implementation, I hope we can aim at similar formats in the coming occurrences. Kiitos paljon to our Finnish hosts!

## AISTATS 2016 [#1]

Posted in pictures, R, Running, Statistics, Travel, Wines with tags AISTATS 2016, Bayesian optimisation, Cadiz, conference, corrida, ensemble Monte Carlo, machine learning, MCMC, R, random forests, reproducing kernel Hilbert space, Spain, tapas on May 11, 2016 by xi'an

**T**ravelling through Seville, I arrived in Cádiz on Sunday night, along with a massive depression [weather-speaking!]. Walking through the city from the station was nonetheless pleasant as this is a town full of small streets and nice houses. If with fewer churches than Seville! Richard Samworth gave the first plenary talk of AISTATS 2016 with a presentation on random projections for classification. His classifier is based on an average of a large number of linear random projections of the original data where the projections are chosen as minimising the prediction error over a subset of the components. The performances of this approach seem to be consistently higher than for random forests, which makes it definitely worth investigating further. (A related R package is available.)

The following talks that day covered Bayesian optimisation and probabilistic numerics, with Javier González introducing *glasses* for Bayesian optimisation in order to solve its myopia (!), by which he meant predicting the output of the optimisation over n future steps. And a first mention of the Pima Indians by Daniel Hernández-Lobato in his talk about EP with stochastic gradient steps towards optimisation. (As well as much larger datasets.) And Mark Girolami bringing quasi-Monte Carlo into control variates. A kernel based ABC by Mijung Park, which uses kernels and maximum mean discrepancy to avoid defining summary statistics, and a version of parallel MCMC by Guillaume Basse. Plus another session on deep learning.

As usual with AISTATS conferences, the central activity of the day was the noon poster session, including speakers discussing their paper, and I had several interesting chats about MCMC related topics, with e.g. one alternative notion of ensemble MCMC [centred on estimating the normalising constant].

We awarded the notable student paper awards before the welcoming cocktail: the winners are Bo Dai, Nedelina Teneva, and Ye Wang. And this first day ended up with a companionable evening in a most genuine tapas bar, tasting local blood sausage and local blue cheese. (If you do not mind the corrida theme!)

## Inference for stochastic simulation models by ABC

Posted in Books, Statistics, University life with tags ABC, ABC validation, Bayesian optimisation, non-parametrics, sufficiency, synthetic likelihood on February 13, 2015 by xi'an

**H**artig et al. published a while ago (2011) a paper in *Ecology Letters* entitled “Statistical inference for stochastic simulation models – theory and application”, which is mostly about ABC. (Florian Hartig pointed out the paper to me in a recent blog comment about my discussion of the early parts of Gutmann and Corander’s paper.) The paper is largely a tutorial and it reminds the reader about related methods like indirect inference and methods of moments. The authors also insist on presenting ABC as a particular case of likelihood approximation, whether non-parametric or parametric. Making connections with pseudo-likelihood and pseudo-marginal approaches. And including a discussion of the possible misfit of the assumed model, handled by an external error model. And also introducing the notion of *informal likelihood* (which could have been nicely linked with *empirical likelihood*). A last class of approximations presented therein is called *rejection filters* and reminds me very much of Ollie Ratmann’s papers.

“Our general aim is to find sufficient statistics that are as close to minimal sufficiency as possible.” (p.819)

As in other ABC papers, and as often reported on this blog, I find the stress on sufficiency a wee bit too heavy as those models calling for approximation almost invariably do not allow for any form of useful sufficiency. Hence the mathematical statistics notion of sufficiency is mostly useless in such settings.

“A basic requirement is that the expectation value of the point-wise approximation of p(S^{obs}|φ) must be unbiased” (p.823)

As stated above, the paper is mostly in tutorial mode, for instance explaining what MCMC and SMC methods are. As illustrated by the above figure. There is however a final and interesting discussion section on the impact of estimating the likelihood function at different values of the parameter. However, the authors seem to focus solely on pseudo-marginal results to validate this approximation, hence on unbiasedness, which does not hold for most ABC approaches that I know. Nor for the approximations listed in the survey. Actually, it would be quite beneficial to devise a cheap tool to assess the bias or extra-variation due to the use of approximate techniques like ABC… A sort of 21st Century bootstrap?!

## Bayesian computation: fore and aft

Posted in Books, Statistics, University life with tags ABC, adaptive MCMC methods, Bayesian Analysis, Bayesian computation, Bayesian optimisation, expectation-propagation, MCMC algorithms, pseudo-marginal MCMC, Statistics and Computing, survey, University of Bristol, University of Warwick, variational Bayes methods on February 6, 2015 by xi'an

**W**ith my friends Peter Green (Bristol), Krzysztof Łatuszyński (Warwick) and Marcelo Pereyra (Bristol), we just arXived the first version of “Bayesian computation: a perspective on the current state, and sampling backwards and forwards”, whose first title was the title of this post. This is a survey of our own perspective on Bayesian computation, from what occurred in the last 25 years [a lot!] to what could occur in the near future [a lot as well!]. Submitted to Statistics and Computing towards the special 25th anniversary issue, as announced in an earlier post. Pulling strength and breadth from each other’s opinion, we have certainly attained more than the sum of our initial respective contributions, but we are welcoming comments about bits and pieces of importance that we miss and even more about promising new directions that are not covered in this survey. (A warning that should go with most of my surveys is that my input in this paper will not differ by a large margin from ideas expressed here or in previous surveys.)

## Bayesian optimization for likelihood-free inference of simulator-based statistical models

Posted in Books, Statistics, University life with tags ABC, ABC validation, Bayesian optimisation, non-parametrics, synthetic likelihood on January 29, 2015 by xi'an

**M**ichael Gutmann and Jukka Corander arXived this paper two weeks ago. I read part of it (mostly the extended introduction part) on the flight from Edinburgh to Birmingham this morning. I find the reflection it contains on the nature of the ABC approximation quite deep and thought-provoking. Indeed, the major theme of the paper is to visualise ABC (which is admittedly shorter than “likelihood-free inference of simulator-based statistical models”!) as a regular computational method based on an approximation of the likelihood function at the observed value, y_{obs}. This includes for example Simon Wood’s synthetic likelihood (who incidentally gave a talk on his method while I was in Oxford). As well as non-parametric versions. In both cases, the approximations are based on repeated simulations of pseudo-datasets for a given value of the parameter θ, either to produce an estimation of the mean and covariance of the sampling model as a function of θ or to construct genuine estimates of the likelihood function. As assumed by the authors, this calls for a small dimension θ. This approach actually allows for the inclusion of the synthetic approach as a lower bound on a non-parametric version.

In the case of Wood’s synthetic likelihood, two questions came to me:

- the estimation of the mean and covariance functions is usually not smooth because new simulations are required for each new value of θ. I wonder how frequent the case is where we can always use the same basic random variates for all values of θ. Because it would then give a smooth version of the above. In the other cases, provided the dimension is manageable, a Gaussian process could be first fitted before using the approximation. Or any other form of regularization.
- no mention is made [in the current paper] of the impact of the parametrization of the summary statistics. Once again, a Box-Cox transform could be applied to each component of the summary for a better proximity of/to the normal distribution.
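The first point can be illustrated with a minimal sketch of a Gaussian synthetic log-likelihood, computed once with the same basic random variates for all values of θ and once with fresh variates at each θ. Everything here is an assumption made for illustration: the toy Normal(θ, 1) simulator, the choice of summaries (sample mean and log sample variance), the simulation sizes, and the function names.

```python
import numpy as np

def summaries(x):
    """Summary statistics of each row: sample mean and log sample variance."""
    return np.column_stack([x.mean(axis=1), np.log(x.var(axis=1, ddof=1))])

def synthetic_loglik(theta, s_obs, noise):
    """Gaussian synthetic log-likelihood at theta for a toy Normal(theta, 1) simulator."""
    s = summaries(theta + noise)            # pseudo-datasets built from the supplied variates
    mu = s.mean(axis=0)                     # estimated mean of the summaries at theta
    cov = np.cov(s, rowvar=False)           # estimated covariance of the summaries at theta
    diff = s_obs - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (logdet + quad + len(s_obs) * np.log(2.0 * np.pi))

rng = np.random.default_rng(1)
n_rep, n_obs = 500, 50
y_obs = 1.0 + rng.standard_normal((1, n_obs))   # assumed observed dataset
s_obs = summaries(y_obs)[0]

theta_grid = np.linspace(-1.0, 3.0, 81)

# Same basic variates reused for every theta: a smooth function of theta
common = rng.standard_normal((n_rep, n_obs))
ll_common = np.array([synthetic_loglik(t, s_obs, common) for t in theta_grid])

# Fresh variates at every theta: a noisy, non-smooth function of theta
ll_fresh = np.array([synthetic_loglik(t, s_obs, rng.standard_normal((n_rep, n_obs)))
                     for t in theta_grid])

print(theta_grid[np.argmax(ll_common)], s_obs[0])
```

In this location model the variates can trivially be shared across values of θ, and the resulting curve is an exact quadratic in θ. Whether such a reparameterisation is available for a realistic simulator is precisely the question raised above.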

When reading about a non-parametric approximation to the likelihood (based on the summaries), the questions I scribbled on the paper were:

- estimating a complete density when using this estimate at the single point y_{obs} could possibly be superseded by a more efficient approach.
- the authors study a kernel that is a function of the difference or distance between the summaries and which is maximal at zero. This is indeed rather frequent in the ABC literature, but does it impact the convergence properties of the kernel estimator?
- the estimation of the tolerance, which happens to be a bandwidth in that case, does not appear to be addressed in this paper, which could explain the very low probabilities of acceptance mentioned in the paper.
- I am lost as to why lower bounds on likelihoods are relevant here. Unless this is intended for ABC maximum likelihood estimation.

Gutmann and Corander also comment on the first point, through the cost of producing a likelihood estimator. They therefore suggest resorting to regression and avoiding regions of low estimated likelihood. And relying on Bayesian optimisation. (Hopefully to be commented later.)
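As a rough illustration of the non-parametric approximation discussed in the list above, here is a minimal sketch of a kernel estimate of the likelihood of a summary at its observed value, where the bandwidth h plays the role of the ABC tolerance. The sketch is not the authors' estimator: the toy Normal(θ, 1) simulator with the sample mean as summary, the Gaussian and uniform kernels, the bandwidth, and the function names are assumptions.

```python
import numpy as np

def kernel_likelihood(theta, s_obs, simulate_summary, h, n_sim=2000, seed=0):
    """Kernel estimate of the likelihood of the summary at s_obs,
    together with the acceptance rate of the uniform kernel of width h."""
    rng = np.random.default_rng(seed)
    s_sim = np.array([simulate_summary(theta, rng) for _ in range(n_sim)])
    dist = np.abs(s_sim - s_obs)
    # Gaussian kernel, maximal at zero distance, with bandwidth h
    gauss = np.exp(-0.5 * (dist / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    # Uniform kernel: proportion of simulations within tolerance h
    return gauss.mean(), np.mean(dist <= h)

def normal_mean_summary(theta, rng, n_obs=50):
    # Hypothetical simulator: mean of n_obs draws from Normal(theta, 1)
    return rng.normal(theta, 1.0, size=n_obs).mean()

lik_near, acc_near = kernel_likelihood(1.0, 1.0, normal_mean_summary, h=0.1)
lik_far, acc_far = kernel_likelihood(3.0, 1.0, normal_mean_summary, h=0.1)
print(lik_near, acc_near, lik_far, acc_far)
```

Away from the region supported by the data, essentially no simulation falls within the tolerance, which is one way of seeing why avoiding regions of low estimated likelihood saves a lot of computing time.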