## Tractable Fully Bayesian inference via convex optimization and optimal transport theory

Posted in Books, Statistics, University life on October 6, 2015 by xi'an

“Recently, El Moselhy et al. proposed a method to construct a map that pushed forward the prior measure to the posterior measure, casting Bayesian inference as an optimal transport problem. Namely, the constructed map transforms a random variable distributed according to the prior into another random variable distributed according to the posterior. This approach is conceptually different from previous methods, including sampling and approximation methods.”

Yesterday, Kim et al. arXived a paper with the above title, linking transport theory with Bayesian inference. Rather strangely, they motivate the transport theory with Galton's quincunx, when the apparatus is a discrete version of the inverse cdf transform… Of course, in higher dimensions, there is no longer a straightforward transform, and the paper shows (or recalls) that there exists a unique solution with positive Jacobian for log-concave posteriors, for instance when both prior and likelihood are log-concave. This solution remains however a virtual notion in practice, and an approximation is constructed via a (finite) functional polynomial basis, by minimising an empirical version of the Kullback-Leibler divergence.
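In one dimension, the transport map in question reduces to composing the posterior inverse cdf with the prior cdf, and for a conjugate normal model this composition collapses to an affine map. A minimal sketch, with an illustrative standard normal prior, unit-variance likelihood and observation x = 1, none of which are taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Conjugate normal model: theta ~ N(0, 1), x | theta ~ N(theta, 1), observed x = 1.
# The posterior is then theta | x ~ N(x/2, 1/2).
x = 1.0
post_mean, post_var = x / 2.0, 0.5

# In one dimension the monotone transport map is the posterior inverse cdf
# composed with the prior cdf; for two Gaussians this is simply affine.
def transport(theta):
    return post_mean + np.sqrt(post_var) * theta  # prior is standard normal

prior_draws = rng.standard_normal(100_000)
post_draws = transport(prior_draws)

print(post_draws.mean(), post_draws.var())  # both close to 0.5
```

For non-Gaussian targets the map is no longer affine, which is where the paper's polynomial basis and empirical Kullback-Leibler minimisation come in.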

I am somewhat uncertain as to how and why to apply such a transform to simulations from the prior (which thus has to be proper). Producing simulations from the posterior certainly is a traditional way to approximate Bayesian inference, and this is thus one approach to this simulation problem. However, the discussion of the advantages of this approach over, say, MCMC, is quite limited. There is no comparison with alternative simulation or non-simulation methods, no account of the computing time required to derive the transport function, and no study of how the dimension of the parameter space impacts that computing time. In connection with recent discussions on probabilistic numerics and super-optimal convergence rates, and given that the method relies on simulations, I doubt optimal transport can do better than the O(1/√n) Monte Carlo rate. One side remark about deriving posterior credible regions as images of prior (HPD) credible regions: there is no reason the resulting region is optimal in volume (i.e., HPD), given that the transform is non-linear.

## the quincunx [book review]

Posted in Books, Kids, Statistics on July 1, 2013 by xi'an

“How then may we become free? Only by harmonising ourselves with the randomness of life through the untrammelled operation of the market.”

This is a 1989 book that I read around that time and had not re-read till last month… The Quincunx is a parody of several of Charles Dickens' novels, written by another Charles, Charles Palliser, far into the 20th century. The name is obviously what attracted me first to this book, since it reminded me of Francis Galton's amazing mechanical simulation device. Of course, there is nothing in the book that relates to Galton or his quincunx!

“Your employer has been speculating in bills with the company’s capital and, as you’ll conclude in the present panic, he has lost heavily. There’s no choice now but to declare the company bankrupt. And when that happens, the creditors will put you in Marshalsea.”

As I am a big fan of Dickens, I went through The Quincunx as an exercise in Dickensania, trying to spot characters and settings from the many books written by Dickens. I found connections with Great Expectations (for the John-Henrietta couple and the fantastic features in the thieves' den, but also encounters with poverty and crime), Bleak House (for the judicial intricacies), Little Dorrit (for the jail system and the expectation of inheritance), Our Mutual Friend (for the roles of the Thames, of money, of forced weddings), Martin Chuzzlewit (again for complex inheritance stories), Oliver Twist (for the gangs of thieves, usury, the private "schools" and the London underworld), David Copperfield (for the somewhat idiotic mother and the fall into poverty), and The Mystery of Edwin Drood (for the murder, of course!). And I certainly missed others. (Some literary critics wrote that Palliser managed to write all Dickens at once.)

“I added to the mixture a badly bent George II guinea which was the finest of all the charms.”

However, despite the perfect imitation in style, with its array of grotesque characters and unbelievable accidents, using Dickens' irony and tongue-in-cheek circumlocutions, with maybe an excess of deliberate misspellings, Palliser delivers a much bleaker picture of Dickens' era than Dickens himself. This was the worst of times, if any, where a multifaceted, unbridled capitalism exploited the working class through cheap wages, savage usury, and overpriced (!) slums, forcing women into prostitution and men into cemetery desecration and sewer exploration. There is no redemption at any point in Palliser's world, and the reader is left with the impression that the central character John Huffam (it would be hard to call him the hero of The Quincunx) is about to fall into the same spiral of debt and legal swindles as his entire family tree. A masterpiece. (Even though I do not buy the postmodern thread.)

## appliBUGS (wet)

Posted in Statistics, University life on December 27, 2012 by xi'an

This morning I gave my talk on ABC: computation or inference? at the appliBUGS seminar. Here, in Paris, BUGS stands for Bayesian United Group of Statisticians! Presumably in connection with a strong football culture, the talk after mine was Jean-Louis Foulley's ranking of the Euro 2012 teams. Quite an interesting talk (even though I am not particularly interested in football, and even though I dozed a little, steaming out the downpour I had received on my bike-ride there…). I am also sorry I missed the next talk by Jean-Louis, on Galton's quincunx. (Unfortunately, his slides are not [yet?] on-line.)

By coincidence, after launching a BayesComp page on Google+ (as an aside, I am quite nonplussed by the purpose of Google-), Nicolas Chopin also just started a Bayes in Paris webpage, in connection with our informal seminar/reading group at CREST. With the appropriate picture this time, i.e. a street plaque remembering… Laplace! May I suggest the RER stop Laplace and his statue in the Paris observatory as additional illustrations for the other pages…

## Galton & simulation

Posted in Books, R, Statistics on September 28, 2010 by xi'an

Stephen Stigler has written a paper in the Journal of the Royal Statistical Society Series A on Francis Galton's analysis of (his cousin) Charles Darwin's Origin of Species, leading to nothing less than Bayesian analysis and accept-reject algorithms!

“On September 10th, 1885, Francis Galton ushered in a new era of Statistical Enlightenment with an address to the British Association for the Advancement of Science in Aberdeen. In the process of solving a puzzle that had lain dormant in Darwin’s Origin of Species, Galton introduced multivariate analysis and paved the way towards modern Bayesian statistics. The background to this work is recounted, including the recognition of a failed attempt by Galton in 1877 as providing the first use of a rejection sampling algorithm for the simulation of a posterior distribution, and the first appearance of a proper Bayesian analysis for the normal distribution.”

The point of interest is that Galton proposes through his (multi-stage) quincunx apparatus a way to simulate from the posterior of a normal mean (here is an R link to the original quincunx). This quincunx has a vertical screen at the second level that acts as a way to physically incorporate the likelihood (it also translates the fact that the likelihood is in another "orthogonal" space, compared with the prior!):

“Take another look at Galton’s discarded 1877 model for natural selection (Fig. 6). It is nothing less than a workable simulation algorithm for taking a normal prior (the top level) and a normal likelihood (the natural selection vertical screen) and finding a normal posterior (the lower level, including the rescaling as a probability density with the thin front compartment of uniform thickness).”
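As a reminder of why the top level of the quincunx delivers a (near) normal prior: each ball sums independent left/right bounces, so the slot counts follow a binomial distribution that the CLT turns Gaussian. A minimal sketch of this first stage, with arbitrary row and ball counts:

```python
import numpy as np

rng = np.random.default_rng(1)

# A bare-bones quincunx: each ball hits `rows` pegs and bounces left (-1) or
# right (+1) with equal probability; its final slot is the sum of the bounces.
rows, balls = 30, 50_000
positions = rng.choice([-1, 1], size=(balls, rows)).sum(axis=1)

# By the CLT the slot counts approximate a N(0, rows) density, which is the
# sense in which the apparatus "simulates" draws from a normal distribution.
print(positions.mean(), positions.var())  # near 0 and near 30
```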

Besides providing a simulation machinery (steampunk Monte Carlo?!), it also offers the enormous appeal of deriving the normal-normal posterior for the very first time:

“Galton was not thinking in explicit Bayesian terms, of course, but mathematically he has posterior $\mathcal{N}(0,C_2)\propto\mathcal{N}(0,A_2)\times f(x=0|y)$. This may be the earliest appearance of this calculation; the now standard derivation of a posterior distribution in a normal setting with a proper normal prior. Galton gave the general version of this result as part of his 1885 development, but the 1877 version can be seen as an algorithm employing rejection sampling that could be used for the generation of values from a posterior distribution. If we replace $f(x)$ above by the density $\mathcal{N}(a,B_2)$, his algorithm would generate the posterior distribution of Y given X=a, namely $\mathcal{N}(aC_2/B_2, C_2)$. The assumption of normality is of course needed for the particular formulae here, but as an algorithm the normality is not essential; posterior values for any prior and any location parameter likelihood could in principle be generated by extending this algorithm.”
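Stigler's reading of the 1877 machine translates directly into a rejection sampler: draw from the normal prior, keep each draw with probability proportional to the normal likelihood of the observation (the vertical screen), and the survivors follow the normal posterior. A minimal sketch, with illustrative unit variances and observation a = 1 (these particular values are my choice, not Stigler's or Galton's):

```python
import numpy as np

rng = np.random.default_rng(2)

# Prior y ~ N(0, A2), likelihood x | y ~ N(y, B2), observation x = a.
A2, B2, a = 1.0, 1.0, 1.0
C2 = A2 * B2 / (A2 + B2)   # posterior variance
post_mean = a * C2 / B2    # posterior mean, i.e. a * A2 / (A2 + B2)

# Stage 1 (top of the quincunx): draws from the prior.
y = rng.normal(0.0, np.sqrt(A2), size=200_000)

# Stage 2 (the vertical screen): keep each draw with probability proportional
# to the likelihood of x = a; the Gaussian kernel is bounded by 1, so this is
# a valid rejection step.
keep = rng.uniform(size=y.size) < np.exp(-(a - y) ** 2 / (2 * B2))
posterior_draws = y[keep]

print(posterior_draws.mean(), posterior_draws.var())  # near 0.5 and 0.5
```

As the quote notes, the normality only matters for the closed-form posterior; the accept/reject step itself works for any prior and any bounded location-parameter likelihood.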