## LGM 2012, Trondheim

Posted in Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , on May 31, 2012 by xi'an

A break from the “snapshots from Guérande” that will be a relief for all ’Og readers, I am sure: I am now in Trondheim, Norway, for the second Latent Gaussian model meeting, organised by Håvard Rue and his collaborators. As in the earlier edition in Zürich, the main approach to those models adopted in the talks is the INLA methodology of Rue, Martino and Chopin. I nonetheless (given the theme) gave a presentation on Rao-Blackwellisation techniques for MCMC algorithms. As I had not printed the program of the meeting prior to my departure (blame Guérande!), I had not realised I had only 20 minutes for my talk and kept adding remarks and slides during the flight from Amsterdam to Trondheim [where the clouds prevented me from seeing Jotunheimen]. (So I had to cut the second half of the talk below, on parallelisation. Even with this cut, the 20 minutes went awfully fast!) Apart from my talk, I am afraid I was not in a sufficient state of awareness [due to a really early start] to give a comprehensive account of the afternoon talks….

Trondheim is a nice city that sometimes feels like a village despite its size. Walking up to the university past typical wooden houses, then going around the town and along the river tonight while running a 10k loop, left me with the impression of a very pleasant place (at least in the summer months).

## snapshot from Guérande (4)

Posted in pictures, Travel with tags , , , on May 30, 2012 by xi'an

## the universe in zero words

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , , , , , on May 30, 2012 by xi'an

The universe in zero words: The story of mathematics as told through equations is a book with a very nice cover: in case you cannot make out the details in the picture, what look like stars in a bright night sky are actually equations discussed in the book (plus actual stars!)…

The universe in zero words is written by Dana Mackenzie (check his website!) and published by Princeton University Press. (I received it in the mail from John Wiley for review, prior to its publication on May 16, nice!) It reads well and quickly: I took it with me in the métro one morning and was half-way through it the same evening, as the universe in zero words remains on the light side, esp. for readers with a high-school training in math. The book strongly reminded me (at times) of my high school years and of my fascination for Cardano’s formula and the non-Euclidean geometries. I was also reminded of studying quaternions for a short while as an undergraduate by the (arguably superfluous) chapter on Hamilton. So a pleasant if unsurprising read, with a writing style that is not always at its best, esp. after reading Bill Bryson’s “Seeing Further: The Story of Science, Discovery, and the Genius of the Royal Society“, and a book unlikely to bring major epiphanies to the mathematically inclined. It is nonetheless well-documented, free of typos, and willing to engage with some mathematical details (going against the folk rule that “For every equation you put in, you will lose half of your audience.”, already mentioned in Diaconis and Graham’s book). With, alas, a fundamental omission: no trace is found therein of Bayes’ formula! (The very opposite of Bryson’s introduction, which could arguably have stayed away from it.) The closest connection with statistics is the final chapter on the Black-Scholes equation, which does not say much about probability…. It is of course the major difficulty with the exercise of picking 24 equations out of the history of maths and physics that some major and influential equations had to be set aside… Maybe the error was in covering (or trying to cover) formulas from physics as well as from maths. Now, rather paradoxically (?), I learned more from the physics chapters: for instance, the chapters on Maxwell’s, Einstein’s, and Dirac’s formulae are very well done. The chapter on the fundamental theorem of calculus is also appreciable.

## snapshot from Guérande (3)

Posted in pictures, Running, Travel with tags , , , , , on May 29, 2012 by xi'an

## optimal direction Gibbs

Posted in Statistics, University life with tags , , , , , , on May 29, 2012 by xi'an

An interesting paper appeared on arXiv today: “On optimal direction Gibbs sampling”, by Andrés Christen, Colin Fox, Diego Andrés Pérez-Ruiz and Mario Santana-Cibrian. It defines optimality as picking the direction that induces the most independence between two successive realisations of the Gibbs sampler. More precisely, it aims at choosing the direction e that minimises the mutual information criterion

$\int\int f_{Y,X}(y,x)\log\dfrac{f_{Y,X}(y,x)}{f_Y(y)f_X(x)}\,\text{d}x\,\text{d}y$
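
For intuition about this criterion, here is a quick numerical sanity check (my own sketch, not from the paper): for a standard bivariate normal pair with correlation ρ, the mutual information has the closed form −½ log(1−ρ²), which a plain Monte Carlo estimate of the integral above recovers.

```python
import numpy as np

def mutual_info_mc(rho, n=200_000, seed=0):
    """Monte Carlo estimate of I(X;Y) for a standard bivariate normal pair."""
    rng = np.random.default_rng(seed)
    z1, z2 = rng.standard_normal(n), rng.standard_normal(n)
    x, y = z1, rho * z1 + np.sqrt(1 - rho**2) * z2
    # joint log density of the standard bivariate normal with correlation rho
    log_joint = (-np.log(2 * np.pi) - 0.5 * np.log(1 - rho**2)
                 - (x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2)))
    # sum of the two standard normal marginal log densities
    log_marg = -np.log(2 * np.pi) - 0.5 * (x**2 + y**2)
    return np.mean(log_joint - log_marg)

rho = 0.8
closed_form = -0.5 * np.log(1 - rho**2)   # ≈ 0.5108
estimate = mutual_info_mc(rho)
```

The criterion is zero iff the two realisations are independent, which is what motivates minimising it over directions.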

I have a bit of an issue with this choice, because it clashes with measure theory. Indeed, in one Gibbs step associated with e, the transition kernel is defined in terms of the Lebesgue measure over the line induced by e. Hence the joint density of the pair of successive realisations is defined with respect to the product of the Lebesgue measure on the overall space and the Lebesgue measure over the line induced by e, while the product in the denominator is defined with respect to the product of the Lebesgue measure on the overall space with itself. The two densities are therefore not comparable, since they are not defined against equivalent measures… The difference between numerator and denominator is actually clearly visible in the normal example (page 3), where the chain operates over an n-dimensional space but the conditional distribution of the next realisation is one-dimensional, and thus cannot be related to the multivariate normal target in the denominator. I therefore do not agree with the derivation of the mutual information produced there as (3).

The above difficulty is indirectly perceived by the authors, who note that “we cannot simply choose the best direction: the resulting Gibbs sampler would not be irreducible” (page 5), an objection I had raised several pages earlier… They instead pick directions at random over the unit sphere and (for the normal case) suggest using a density over those directions such that

$h^*(\mathbf{e})\propto(\mathbf{e}^\prime A\mathbf{e})^{1/2}$

which cannot truly be called “optimal”.

More globally, searching for “optimal” directions (or more generally transforms) is quite a worthwhile idea, esp. when linked with adaptive strategies…
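
To make the scheme concrete, here is a small toy rendering (mine, not the paper's implementation) of a random-direction Gibbs sampler for a Gaussian target, drawing directions on the unit sphere with density proportional to (e′Ae)^{1/2} by rejection from the uniform distribution, and assuming, purely for illustration, that A is the precision matrix of the target.

```python
import numpy as np

def direction_gibbs(mu, Sigma, n_iter=50_000, seed=0):
    """Random-direction Gibbs sampler for a N(mu, Sigma) target.

    Directions e are drawn on the unit sphere with density proportional
    to sqrt(e' A e), taking A = inv(Sigma) as an illustrative choice,
    via rejection sampling from the uniform distribution on the sphere.
    """
    rng = np.random.default_rng(seed)
    P = np.linalg.inv(Sigma)                  # precision matrix, used as A
    lam_max = np.linalg.eigvalsh(P)[-1]       # largest eigenvalue of P
    x = np.array(mu, dtype=float)
    chain = np.empty((n_iter, len(mu)))
    for t in range(n_iter):
        # rejection step: accept a uniform direction e with
        # probability sqrt(e'Pe) / sqrt(lam_max) <= 1
        while True:
            e = rng.standard_normal(len(mu))
            e /= np.linalg.norm(e)
            if rng.uniform() < np.sqrt(e @ P @ e / lam_max):
                break
        # exact one-dimensional Gaussian update along the line x + s e
        s2 = 1.0 / (e @ P @ e)                # conditional variance
        m = -s2 * (e @ P @ (x - mu))          # conditional mean
        x = x + (m + np.sqrt(s2) * rng.standard_normal()) * e
        chain[t] = x
    return chain

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 1.0], [1.0, 2.0]])
chain = direction_gibbs(mu, Sigma)
```

Since each move is an exact conditional draw along the chosen line, the target is preserved whatever the direction density, which is also why only irreducibility (not correctness) is at stake in the choice of h.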

## snapshot from Guérande (2)

Posted in pictures, Running, Travel with tags , , , , on May 28, 2012 by xi'an

## ABC+EL=no D(ata)

Posted in Books, pictures, R, Statistics, University life with tags , , , , , , , , , , , , on May 28, 2012 by xi'an

It took us a loooong while [for various and uninteresting reasons] but we finally ended up completing a paper on ABC using empirical likelihood (EL), which started with me listening to Brunero Liseo’s tutorial at O’Bayes-2011 in Shanghai… Brunero mentioned empirical likelihood as a semi-parametric technique without much of a Bayesian connection, and this got me thinking about a possible recycling within ABC. I won’t get into the details of empirical likelihood, referring instead to Art Owen’s book “Empirical Likelihood” for a comprehensive entry. The core idea of empirical likelihood is to use a maximum entropy discrete distribution supported by the data and constrained by estimating equations related with the parameters of interest/of the model. As such, it is a non-parametric approach in the sense that the distribution of the data does not need to be specified, only some of its characteristics. Econometricians have been quite busy developing this kind of approach over the years, see e.g. Gouriéroux and Monfort’s Simulation-Based Econometric Methods. However, this empirical likelihood technique can also be seen as a convergent approximation to the likelihood, and hence exploited in cases when the exact likelihood cannot be derived; for instance, as a substitute for the exact likelihood in Bayes’ formula. Here is for instance a comparison of a true normal-normal posterior with a sample of 10³ points simulated using the empirical likelihood based on the moment constraint.
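
To make the core idea concrete, here is a minimal sketch (mine, not from the paper) of the empirical likelihood for a mean: maximise the product of weights pᵢ on the data points subject to Σpᵢ = 1 and the estimating equation Σpᵢ(xᵢ − μ) = 0, which by Lagrangian duality reduces to a one-dimensional root-finding problem in the multiplier λ.

```python
import numpy as np

def el_log_ratio(mu, x, tol=1e-12):
    """Empirical likelihood log-ratio statistic -2 sum log(n p_i) for mean mu.

    The optimal weights are p_i = 1 / (n * (1 + lam * (x_i - mu))), where
    lam solves sum (x_i - mu) / (1 + lam * (x_i - mu)) = 0.
    Returns +inf when mu lies outside the convex hull of the data.
    """
    d = x - mu
    if d.min() >= 0 or d.max() <= 0:
        return np.inf                     # no solution outside the hull
    # lam must keep every 1 + lam*d_i positive; bisect on that interval,
    # over which the estimating function is monotone decreasing in lam
    lo = -1.0 / d.max() + tol
    hi = -1.0 / d.min() - tol
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.sum(d / (1 + mid * d)) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2 * np.sum(np.log1p(lam * d))
```

The ratio vanishes at the sample mean and grows as μ moves away from it, mimicking a log-likelihood ratio without ever specifying the distribution of the data.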

The paper we wrote with Kerrie Mengersen and Pierre Pudlo thus examines the consequences of using an empirical likelihood in ABC contexts. Although we called the derived algorithm ABCel, it differs from genuine ABC algorithms in that it does not simulate pseudo-data. Hence the title of this post. (The title of the paper is “Approximate Bayesian computation via empirical likelihood“. It should be arXived by the time the post appears: “Your article is scheduled to be announced at Mon, 28 May 2012 00:00:00 GMT“.) We had indeed started looking at a simulated data version, but it was rather poor, and we thus opted for an importance sampling version where the parameters are simulated from an importance distribution (e.g., the prior) and then weighted by the empirical likelihood (times a regular importance factor if the importance distribution is not the prior). The above graph is an illustration in a toy example.
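
The importance-sampling version can be sketched as follows, in a toy normal-mean rendering of the idea (mine, not the paper's implementation): simulate θ from the prior and weight it by the empirical likelihood of the data under the moment constraint E[X] = θ.

```python
import numpy as np

def log_el(theta, x, tol=1e-12):
    """Log empirical likelihood of the data under the constraint E[X]=theta
    (up to the constant -n log n); -inf outside the convex hull of the data."""
    d = x - theta
    if d.min() >= 0 or d.max() <= 0:
        return -np.inf
    lo, hi = -1.0 / d.max() + tol, -1.0 / d.min() - tol
    for _ in range(100):                  # bisection for the Lagrange multiplier
        mid = 0.5 * (lo + hi)
        if np.sum(d / (1 + mid * d)) > 0:
            lo = mid
        else:
            hi = mid
    return -np.sum(np.log1p(0.5 * (lo + hi) * d))

def abcel_is(x, n_sim=5_000, prior_sd=10.0, seed=1):
    """ABCel by importance sampling: theta ~ prior, weight = EL(theta)."""
    rng = np.random.default_rng(seed)
    theta = prior_sd * rng.standard_normal(n_sim)   # prior N(0, prior_sd^2)
    logw = np.array([log_el(t, x) for t in theta])
    m = np.max(logw[np.isfinite(logw)])
    w = np.where(np.isfinite(logw), np.exp(logw - m), 0.0)  # stabilised weights
    return np.sum(w * theta) / np.sum(w)            # posterior mean estimate
```

Note that no pseudo-data is simulated anywhere in this loop: the empirical likelihood plays the role the distance to pseudo-data plays in genuine ABC.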

The difficulty with the method lies in connecting the parameters (of interest/of the assumed distribution) with moments of the (iid) data. While this operates rather straightforwardly for quantile distributions, it is less clear for dynamic models like ARCH and GARCH, where we have to reconstruct the underlying iid process. (ABCel clearly improves upon ABC for the GARCH(1,1) model but remains less informative than a regular MCMC analysis. Incidentally, this study led to my earlier post on the unreliable garch() function in the tseries package!) And it is even harder for population genetic models, where parameters like divergence dates, effective population sizes, mutation rates, &tc., cannot be expressed as moments of the distribution of the sample at a given locus. In particular, the datapoints are not iid. Pierre Pudlo then had the brilliant idea of resorting instead to a composite likelihood, approximating the intra-locus likelihood by a product of pairwise likelihoods over all pairs of genes in the sample at a given locus. Indeed, in Kingman’s coalescent theory, the pairwise likelihoods can be expressed in closed form, hence we can derive the pairwise composite scores. The comparison with optimal ABC outcomes shows an improvement brought by ABCel in the approximation, at an overall computing cost that is negligible relative to ABC (i.e., it takes minutes to produce the ABCel outcome, compared with hours for ABC).
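
The pairwise composite likelihood idea can be illustrated outside the coalescent setting (the closed-form coalescent pairwise likelihoods are beyond a short sketch): replace the full likelihood of a vector observation with the product, over all pairs of components, of the corresponding bivariate densities. The toy example below, entirely of my own making, estimates the common correlation of an equicorrelated Gaussian vector this way.

```python
import numpy as np
from itertools import combinations

def pairwise_composite_loglik(rho, data):
    """Sum, over observations and over pairs (i<j) of components, of the
    log bivariate normal density with unit variances and correlation rho."""
    total = 0.0
    for i, j in combinations(range(data.shape[1]), 2):
        u, v = data[:, i], data[:, j]
        total += np.sum(-np.log(2 * np.pi) - 0.5 * np.log(1 - rho**2)
                        - (u**2 - 2 * rho * u * v + v**2) / (2 * (1 - rho**2)))
    return total

# toy data: 500 draws of a 5-dimensional equicorrelated Gaussian, rho = 0.5
rng = np.random.default_rng(3)
z0 = rng.standard_normal((500, 1))                 # shared factor
data = np.sqrt(0.5) * z0 + np.sqrt(0.5) * rng.standard_normal((500, 5))

# maximise the composite log-likelihood over a grid of correlations
grid = np.linspace(0.05, 0.9, 200)
rho_hat = grid[np.argmax([pairwise_composite_loglik(r, data) for r in grid])]
```

The composite likelihood is not a genuine likelihood, but its maximiser is consistent under mild conditions, which is what makes it usable as a plug-in within the EL estimating equations.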

We are now looking for extensions and improvements of ABCel, both at the methodological and at the genetic levels, and we would of course welcome any comment at this stage. The paper has been submitted to PNAS, as we hope it should appeal to the ABC community at large, i.e. beyond statisticians…