**L**ast Fall, George Papamakarios and Iain Murray from Edinburgh arXived an ABC paper on fast ε-free inference on simulation models with Bayesian conditional density estimation, a paper I had missed. The idea there is to approximate the posterior density by maximising the likelihood associated with a parameterised family of distributions on θ, conditional on the associated x, the training data being the ABC reference table. The family chosen there is a mixture of K Gaussian components, whose parameters are estimated by a (Bayesian) neural network using x as input and θ as output. The parameter values are simulated from an adaptive proposal that aims at approximating the posterior better and better. As in population Monte Carlo, actually. Except for the neural network part, for which I fail to see why it makes a significant improvement when compared with EM solutions. The overall difficulty with this approach is that I do not see a way out of the curse of dimensionality: when the dimension of θ increases, the approximation to the posterior distribution of θ deteriorates, even in the best of cases, as with any other non-parametric resolution. It would have been of (further) interest to see a comparison with a most rudimentary approach, namely the one we proposed based on empirical likelihoods.
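For intuition, here is a deliberately minimal sketch of the conditional density estimation idea, on a toy Gaussian simulator where the exact posterior is known in closed form. A single conditional Gaussian fitted by least squares stands in for the mixture-density neural network of the paper, and the prior plays the role of the (first-round) proposal:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy simulator: x | theta ~ N(theta, 1); prior theta ~ N(0, 4).
n = 20_000
theta = rng.normal(0.0, 2.0, size=n)       # reference-table parameter draws
x = theta + rng.normal(0.0, 1.0, size=n)   # one simulated datum per draw

# Fit q(theta | x) = N(a*x + b, s2) by maximum likelihood, which here
# reduces to ordinary least squares of theta on x.
A = np.column_stack([x, np.ones(n)])
(a, b), *_ = np.linalg.lstsq(A, theta, rcond=None)
s2 = np.mean((theta - (a * x + b)) ** 2)

# Read off the approximate posterior at the observed datum.
x_obs = 1.5
post_mean, post_var = a * x_obs + b, s2
# Exact conjugate posterior here is N(0.8 * x_obs, 0.8).
```

With a linear-Gaussian simulator the fit recovers the conjugate posterior almost exactly; the point of the mixture and neural network machinery in the paper is to handle simulators where no such closed form exists.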

## Archive for Edinburgh

## fast ε-free ABC

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life with tags ABC, ABC in Edinburgh, Arthur's Seat, arXiv, Edinburgh, empirical likelihood, Gaussian mixture, neural network, non-parametrics, Scotland on June 8, 2017 by xi'an

## asymptotically exact inference in likelihood-free models [a reply from the authors]

Posted in R, Statistics with tags ABC, ABC-MCMC, arXiv, Edinburgh, generator, Hamiltonian, HMC, Jacobian, likelihood-free methods, Lotka-Volterra, manifold, normalisation, pseudo-marginal MCMC, quasi-Newton resolution, reply, Riemann manifold, simulation under restrictions, simulator model on December 1, 2016 by xi'an

*[Following my post of last Tuesday, Matt Graham commented on the paper with a wealth of detail. Here are those comments. A nicer HTML version of the Markdown reply below is also available on Github.]*

Thanks for the comments on the paper!

A few additional replies to augment what Amos wrote:

“This however sounds somewhat intense in that it involves a quasi-Newton resolution at each step.”

The method is definitely computationally expensive. If the constraint function maps an M-dimensional space to an N-dimensional space, with M≥N, for large N the dominant costs at each time step are usually the evaluation of the constraint Jacobian ∂c/∂u (with reverse-mode automatic differentiation this can be evaluated at a cost of O(N) generator / constraint evaluations) and the Cholesky decomposition of the Jacobian product (∂c/∂u)(∂c/∂u)ᵀ, with O(N³) cost (though in many cases, e.g. i.i.d. or Markovian simulated data, structure in the generator Jacobian can be exploited to give a significantly reduced cost). Each inner quasi-Newton update involves a pair of triangular solve operations with O(N²) cost, two matrix-vector multiplications with O(MN) cost, and a single constraint / generator function evaluation; the number of quasi-Newton updates required for convergence in the numerical experiments tended to be much less than N, hence the quasi-Newton iteration tended not to be the main cost.
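To make the cost breakdown concrete, here is a toy sketch of such a projection step, with a hypothetical constraint on R³ (M=3, N=2) rather than the generator Jacobian of the paper: the Jacobian and the Cholesky factor of (∂c/∂u)(∂c/∂u)ᵀ are computed once, and each inner quasi-Newton update only performs the cheap triangular solves and matrix-vector products:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Hypothetical constraint c: R^3 -> R^2, the unit sphere cut by a plane:
# c(u) = (|u|^2 - 1, u_0 - u_1).
def c(u):
    return np.array([u @ u - 1.0, u[0] - u[1]])

def jac(u):
    return np.array([[2 * u[0], 2 * u[1], 2 * u[2]],
                     [1.0, -1.0, 0.0]])

def project(u, tol=1e-10, max_iter=50):
    """Quasi-Newton projection onto {c(u) = 0}: the Jacobian and the
    Cholesky factor of J J^T (the O(N^3) step) are computed once at the
    starting point and reused in each inner update."""
    J = jac(u)
    chol = cho_factor(J @ J.T)               # O(N^3), done once
    for _ in range(max_iter):
        r = c(u)                             # one constraint evaluation
        if np.max(np.abs(r)) < tol:
            return u
        u = u - J.T @ cho_solve(chol, r)     # triangular solves, O(N^2)
    raise RuntimeError("projection did not converge")

u = project(np.array([0.7, 0.72, 0.1]))
```

Starting close to the manifold, the fixed-Jacobian (chord) iteration contracts the residual very quickly, which matches the observation above that the number of inner updates stays far below N.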

The high computational cost per update is however traded off against often being able to make much larger proposed moves in high-dimensional state spaces, with a high chance of acceptance, compared to ABC MCMC approaches. Even in the relatively small Lotka-Volterra example we provide, which has an input dimension of 104 (four inputs which map to ‘parameters’, and 100 inputs which map to ‘noise’ variables), the ABC MCMC chains using the coarse ABC kernel radius ϵ=100, with comparably very cheap updates, were significantly less efficient in terms of effective sample size per unit of computation time than the proposed constrained HMC approach. This was in large part due to the elliptical slice sampling updates in the ABC MCMC chains generally collapsing down to very small moves, even for this relatively coarse ϵ. Performance was even worse using non-adaptive ABC MCMC methods and for smaller ϵ, and for higher input dimensions (e.g. using a longer sequence with correspondingly more random inputs) the comparison becomes even more favourable to the constrained HMC approach.
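For readers unfamiliar with the elliptical slice sampling updates mentioned above, here is a minimal sketch of the basic unconstrained algorithm (Murray, Adams & MacKay, 2010) for a standard Gaussian prior and a hypothetical one-dimensional Gaussian likelihood, not the ABC-constrained version used in the comparison:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_lik(f):
    # Hypothetical likelihood: the latent f is observed as 1.0 with unit noise.
    return -0.5 * (f - 1.0) ** 2

def ess_step(f, log_lik, rng):
    """One elliptical slice sampling update under a N(0,1) prior.
    The update always accepts, shrinking the angle bracket towards
    the current state until the likelihood threshold is met."""
    nu = rng.normal()                          # auxiliary prior draw
    log_y = log_lik(f) + np.log(rng.uniform()) # slice threshold
    theta = rng.uniform(0.0, 2 * np.pi)
    lo, hi = theta - 2 * np.pi, theta
    while True:
        f_new = f * np.cos(theta) + nu * np.sin(theta)
        if log_lik(f_new) > log_y:
            return f_new
        # shrink the bracket towards theta = 0 (the current state)
        if theta < 0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)

f, chain = 0.0, []
for _ in range(5000):
    f = ess_step(f, log_lik, rng)
    chain.append(f)
# exact posterior by conjugacy: N(0.5, 0.5)
```

In the ABC setting the likelihood term is replaced by the ϵ-kernel acceptance condition, and the bracket-shrinkage mechanism is exactly what collapses the moves to tiny angles when ϵ is small.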

## asymptotically exact inference in likelihood-free models

Posted in Books, pictures, Statistics with tags ABC, Arthur's Seat, Edinburgh, exact ABC, free energy, Hamiltonian Monte Carlo, manifold, Scotland on November 29, 2016 by xi'an

“We use the intuition that inference corresponds to integrating a density across the manifold corresponding to the set of inputs consistent with the observed outputs.”

Following my earlier post on that paper by Matt Graham and Amos Storkey (University of Edinburgh), I now read through it. The beginning is somewhat unsettling, albeit mildly!, as it starts by mentioning notions like variational auto-encoders, generative adversarial nets, and simulator models, by which they mean generative models represented by a (differentiable) function g that essentially turns basic variates with density p into the variates of interest (with intractable density). A setting similar to Meeds’ and Welling’s optimisation Monte Carlo. Another proximity pointed out in the paper is Meeds et al.’s Hamiltonian ABC.
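To make the generator notion concrete, an entirely standard (and, unlike the interesting cases, tractable) example of such a function g is the Box-Muller map, which differentiably turns two uniform random inputs into a Gaussian output:

```python
import numpy as np

rng = np.random.default_rng(2)

# Generator g: differentiable map from two uniform "random inputs"
# to one Gaussian output (the Box-Muller transform).
def g(u1, u2):
    return np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)

u = rng.uniform(size=(2, 100_000))
x = g(u[0], u[1])
# the pushforward density of x is N(0,1), tractable in this toy case
```

The simulator models of the paper have exactly this structure, only with a g complicated enough that the density of x cannot be written down.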

“…the probability of generating simulated data exactly matching the observed data is zero.”

The section on the standard ABC algorithms mentions the fact that ABC MCMC can be (re-)interpreted as a pseudo-marginal MCMC, albeit one targeting the ABC posterior instead of the original posterior. The starting point of the paper is the above quote, which echoes a conversation I had with Gabriel Stoltz a few weeks ago, when he presented me his free energy method and when I could not see how to connect it with ABC, because having an exact match seemed to cancel the appeal of ABC, all parameter simulations then producing an exact match under the right constraint. However, the paper maintains this can be done, by looking at the joint distribution of the parameters, latent variables, and observables. Under the implicit restriction imposed by keeping the observables constant. Which defines a manifold. The mathematical validation is achieved by designing the density over this manifold, which looks like

π(u) ∝ p(u) |∂g⁰/∂u (∂g⁰/∂u)ᵀ|^(−1/2)

if the constraint can be rewritten as g⁰(u)=0, with p the density of the random inputs u. (This actually follows from a 2013 paper by Diaconis, Holmes, and Shahshahani.) In the paper, the simulation is conducted by Hamiltonian Monte Carlo (HMC), the leapfrog steps consisting of an unconstrained move followed by a projection onto the manifold. This however sounds somewhat intense in that it involves a quasi-Newton resolution at each step. I also find it surprising that this projection step does not jeopardise the stationary distribution of the process, as the argument found therein about the approximation of the approximation is not particularly deep. But the main thing that remains unclear to me after reading the paper is how the constraint that the pseudo-data be equal to the observed data can be turned into a closed-form condition like g⁰(u)=0. As mentioned above, the authors assume a generative model based on uniform (or other simple) random inputs, but this representation seems impossible to achieve in reasonably complex settings.
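A quick sanity check on the Jacobian factor |∂g⁰/∂u (∂g⁰/∂u)ᵀ|^(−1/2) in this manifold density, under the assumption that g⁰ is the unit-circle constraint in R² (a toy stand-in, not an example from the paper):

```python
import numpy as np

# Hypothetical constraint g0(u) = |u|^2 - 1, i.e. the unit circle in R^2.
def jac_g0(u):
    return np.array([[2.0 * u[0], 2.0 * u[1]]])

def gram_factor(u):
    """The Jacobian correction |J J^T|^{-1/2} appearing in the
    density over the manifold {u : g0(u) = 0}."""
    J = jac_g0(u)
    return 1.0 / np.sqrt(np.linalg.det(J @ J.T))

# On the unit circle the factor is constant, so a flat base density p
# induces the uniform distribution on the circle, as it should.
vals = [gram_factor(np.array([np.cos(t), np.sin(t)])) for t in (0.1, 1.0, 2.5)]
```

Here J Jᵀ = 4(u₀²+u₁²) = 4 everywhere on the manifold, so the factor is the constant 1/2 and the induced distribution is uniform on the circle, matching the intuition behind the Diaconis, Holmes, and Shahshahani result.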

## MCqMC 2016 [#4]

Posted in Mountains, pictures, Running, Statistics, Travel, University life with tags Brittany, California, conference, Edinburgh, MCMC, MCqMC 2016, Monte Carlo Statistical Methods, population Monte Carlo, pseudo-marginal MCMC, quadrangle, quasi-Monte Carlo methods, Rennes, Scotland, simulation, Stanford University on August 21, 2016 by xi'an

In his plenary talk this morning, Arnaud Doucet discussed the application of pseudo-marginal techniques to the latent variable models he has been investigating for many years, and their limiting behaviour in terms of efficiency, with the idea of introducing correlation in the estimation of the likelihood ratio, reducing the complexity from O(T²) to O(T√T). With the very surprising conclusion that the correlation must go to 1 at a precise rate to get this reduction, since perfect correlation would induce a bias. A massive piece of work, indeed!
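A minimal illustration of why correlating the two likelihood estimators helps, with a hypothetical toy estimator rather than Arnaud Doucet's actual construction: sharing the auxiliary variates (common random numbers) between the estimates at the two parameter values drastically reduces the variance of the estimated difference:

```python
import numpy as np

rng = np.random.default_rng(3)

def loglik_hat(theta, u):
    """Toy Monte Carlo estimator of a log-likelihood at theta, driven
    by the auxiliary variates u (hypothetical, for illustration only)."""
    x = theta + u                              # simulated latent variables
    return np.log(np.mean(np.exp(-0.5 * x ** 2)))

T, reps = 1000, 500
diff_indep, diff_corr = [], []
for _ in range(reps):
    # independent auxiliary variates for the two parameter values...
    diff_indep.append(loglik_hat(0.0, rng.normal(size=T))
                      - loglik_hat(0.1, rng.normal(size=T)))
    # ...versus shared auxiliary variates
    v = rng.normal(size=T)
    diff_corr.append(loglik_hat(0.0, v) - loglik_hat(0.1, v))
# the correlated version has a much smaller variance
```

The actual correlated pseudo-marginal scheme is more delicate (hence the precise rate at which the correlation must approach 1), but this common-random-numbers effect is the variance-reduction mechanism at its core.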

The next session of the morning was another instance of conflicting talks and I hopped from one room to the next to listen to Hani Doss's empirical Bayes estimation with intractable constants (where maybe SAME could be of interest), Youssef Marzouk's transport maps for MCMC, which sounds like an attractive idea provided the construction of the map remains manageable, and Paul Russel's adaptive importance sampling, which somehow sounded connected with our population Monte Carlo approach. (With the additional step of considering transport maps.)

An interesting item of information I got from the final announcements at MCqMC 2016, just before heading to Monash, Melbourne, is that MCqMC 2018 will take place in the city of Rennes, Brittany, on July 2-6. Not only is it a nice location on its own, but it is most conveniently located in space and time to attend ISBA 2018 in Edinburgh the week after! Just moving from one Celtic city to another. Along with other planned satellite workshops, this occurrence should make ISBA 2018 more attractive [if need be!] for participants from overseas.

## even dogs in the wild

Posted in Books, Mountains, Travel with tags book review, Edinburgh, Glasgow, Ian Rankin, Rebus, Scotland, The Associates, The night of the hunter on August 10, 2016 by xi'an

**A** new Rankin, a new Rebus! (New as in 2015, since I waited to buy the paperback version.) Sounds like Ian Rankin cannot let his favourite character rest in retirement and hence sets him back into action, along with the newer Malcolm Fox [working in the Complaints] and most major characters of the Rebus series. Including the unbreakable villain, Big Ger Cafferty. This is as classical as you get, borrowing from half a dozen former Rebus novels, not to mention the neo-Holmes novel I reviewed a while ago. But it is gritty, deadly efficient and captivating. I read the book within a few days of returning from Warwick.

About the title, this is a song by The Associates that plays a role in the book. I did not know this band, but looking for it got me to a clip that used an excerpt from The Night of the Hunter. Fantastic movie, one of my favourites.

## ISBA 2016 [#7]

Posted in Mountains, pictures, Running, Statistics, Travel, Wines with tags Edinburgh, ISBA, ISBA 2016, ISBA 2018, Italian wines, Italy, pecorino sardo, poster session, Santa Margherita di Pula, Sardinia, Sardinian cheese, sunrise, trail running, Valencia conferences on June 20, 2016 by xi'an

**T**his series of posts is most probably becoming an imposition on the ‘Og readership, which either attended ISBA 2016 and does not need my impressions, or did not attend and hence does not need vague impressions about talks it did not see, but indulge me in reminiscing about this last ISBA meeting (or more reasonably ignore this post altogether). Now that I am back home (with most of my Sard wine bottles intact! and a good array of Sard cheeses).

This meeting seems to be the largest ISBA meeting ever, with hundreds of young statisticians taking part in it (despite my early misgivings about the deterrent represented by the overall cost of attending the meeting). I presume holding the meeting in Europe made it easier and cheaper for most Europeans to attend (and hopefully the same will happen in Edinburgh in 2018!), as did the (somewhat unsuspected) wide availability of rental alternatives in the close vicinity of the conference resort. I also presume the same travel opportunities would not have existed in Banff, although local costs would have been lower. It was fantastic to see so many new researchers interested in Bayesian statistics and to meet some of them. And to have more sessions run by the j-Bayes section of ISBA (although I found it counterproductive that such sessions do not focus on a coherent theme). As a result, the meeting was more intense than ever and I found it truly exhausting, despite skipping most poster sessions. Maybe also because I did not skip a single session, thanks to the availability of an interesting theme for each block in the schedule. (And because I attended more [great] Sard dinners than I originally intended.) Having five sessions in parallel indeed means there is a fabulous offer of themes for every taste. It also means there are inevitably conflicts when picking one's session.

Back to poster sessions: I feel I missed an essential part of the meeting, the part that makes ISBA meetings so unique, but it also seems to me the organisation of those sessions should be reconsidered against the rise in attendance. (And my growing inability to stay up late!) One solution, suggested by my recent AISTATS experience, is to select posters so as to lower their number in the four poster sessions. (The success rate for the Cadiz meeting was 35%.) The obvious downsides are the selection process (but this was done quite efficiently for AISTATS) and the potential reduction in the number of participants. A middle ground could see a smaller fraction of posters selected by this process (and published one way or another, as in machine-learning conferences) and presented during the evening poster sessions, with the other posters being presented during the coffee breaks [which certainly does not help in reducing the intensity of the schedule]. Another, altogether different, solution is to extend the parallelism of oral sessions to poster sessions, by regrouping them into five or six themes or keywords chosen by the presenters and having those presented in different rooms, to bring the attendance down to a human level and tolerable decibels. Nothing prevents participants from visiting several rooms in a given evening. Or from keeping posters up for several nights in a row if the number of rooms allows.

It may also be that ISBA 2016 sees the end of the resort-style meeting in the spirit of the early Valencia meetings. Edinburgh 2018 will certainly be an open-space conference, in that meals and lodgings will be "on" the participants, who may choose where and how much. I have heard many times the argument that conferences held in single hotels or resorts facilitated contacts between young and senior researchers, but I fear this is not sustainable against the growth of the audience. Holding the meeting in a reasonably close and compact location, such as a University building, should allow for a sufficient degree of interaction, as was the case at ISBA 2016. (Kerrie Mengersen also suggested that a few restaurants nearby could be designated as "favourites" for participants to interact at dinner time.) Another suggestion to reinforce networking and interaction would be to hold more satellite workshops before the main conference. It seems there could be a young Bayesian workshop in England the prior week, as well as a summer short course on simulation methods.

Organising meetings is getting increasingly complex and provides few rewards at the academic level, so I am grateful to the organisers of ISBA 2016 for having agreed to carry the burden this year. And to the scientific committee for setting the quality bar that high. (A special thought too for my friend Walter Racugno, who had the ultimate bad luck of having an accident the very week of the meeting he had helped organise!)

*[Even though I predict this is my last post on ISBA 2016 I would be delighted to have guest posts on others’ impressions on the meeting. Feel free to send me entries!]*