## ABC+EL=no D(ata)

Posted in Books, pictures, R, Statistics, University life on May 28, 2012 by xi'an

It took us a loooong while [for various and uninteresting reasons] but we finally completed a paper on ABC using empirical likelihood (EL), which started with me listening to Brunero Liseo’s tutorial at O’Bayes-2011 in Shanghai… Brunero mentioned empirical likelihood as a semi-parametric technique without much of a Bayesian connection, and this got me thinking of a possible recycling within ABC. I won’t get into the details of empirical likelihood, referring to Art Owen’s book “Empirical Likelihood” for a comprehensive entry. The core idea of empirical likelihood is to use a maximum entropy discrete distribution supported by the data and constrained by estimating equations related to the parameters of interest/of the model. As such, it is a non-parametric approach in the sense that the distribution of the data does not need to be specified, only some of its characteristics. (Econometricians have been quite busy developing this kind of approach over the years, see e.g. Gouriéroux and Monfort’s Simulation-Based Econometric Methods.) However, this empirical likelihood technique can also be seen as a convergent approximation to the likelihood and hence exploited in cases when the exact likelihood cannot be derived, for instance as a substitute for the exact likelihood in Bayes’ formula. Here is a comparison of a true normal-normal posterior with a sample of 10³ points simulated using the empirical likelihood based on the moment constraint.
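For readers who want the core idea in symbols, here is a sketch for a scalar constraint. The notation is an assumption made for this illustration and not lifted from Owen’s book: $h(x,\theta)$ stands for the estimating function, e.g. $h(x,\theta)=x-\theta$ when the parameter is a mean.

$$
L_{el}(\theta\mid x_1,\ldots,x_n)=\max_{p_1,\ldots,p_n}\ \prod_{i=1}^n p_i
\quad\text{subject to}\quad p_i\ge 0,\quad \sum_{i=1}^n p_i=1,\quad \sum_{i=1}^n p_i\,h(x_i,\theta)=0 .
$$

When $0$ lies strictly inside the range of the $h(x_i,\theta)$’s, a Lagrange multiplier argument gives the maximiser

$$
p_i=\frac{1}{n\,[1+\lambda\,h(x_i,\theta)]},\qquad\text{where }\lambda\text{ solves}\quad \sum_{i=1}^n \frac{h(x_i,\theta)}{1+\lambda\,h(x_i,\theta)}=0 ,
$$

and the empirical likelihood is set to zero otherwise, since no set of weights can then satisfy the constraint.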

The paper we wrote with Kerrie Mengersen and Pierre Pudlo thus examines the consequences of using an empirical likelihood in ABC contexts. Although we called the derived algorithm ABCel, it differs from genuine ABC algorithms in that it does not simulate pseudo-data. Hence the title of this post. (The title of the paper is “Approximate Bayesian computation via empirical likelihood”. It should be arXived by the time the post appears: “Your article is scheduled to be announced at Mon, 28 May 2012 00:00:00 GMT”.) We had indeed started looking at a simulated data version, but it was rather poor, and we thus opted for an importance sampling version where the parameters are simulated from an importance distribution (e.g., the prior) and then weighted by the empirical likelihood (times a regular importance factor if the importance distribution is not the prior). The above graph is an illustration in a toy example.
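The importance sampling version can be sketched in a few lines of Python for the toy normal mean problem. This is a minimal sketch under stated assumptions, not the code behind the paper: the names `log_el_mean` and `abcel_sample`, the N(0, `prior_sd`²) prior, the single moment constraint on the mean, and the bisection solver for the Lagrange multiplier are all choices made for illustration.

```python
import math
import random


def log_el_mean(x, theta, iters=60):
    """Log empirical likelihood ratio for the constraint sum_i p_i (x_i - theta) = 0.

    Returns -inf when theta is not strictly inside the range of the data,
    since no weights can then satisfy the constraint.
    """
    n = len(x)
    z = [xi - theta for xi in x]
    z_min, z_max = min(z), max(z)
    if z_min >= 0.0 or z_max <= 0.0:
        return -math.inf
    # The weights are p_i = 1 / (n (1 + lam z_i)); p_i <= 1 bounds lam,
    # and keeps 1 + lam z_i >= 1/n > 0 everywhere on the bracket.
    lo = (1.0 / n - 1.0) / z_max
    hi = (1.0 / n - 1.0) / z_min
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # g is strictly decreasing in lam, so bisection finds its unique root.
        g = sum(zi / (1.0 + mid * zi) for zi in z)
        if g > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    # sum_i log(n p_i) = -sum_i log(1 + lam z_i)
    return -sum(math.log1p(lam * zi) for zi in z)


def abcel_sample(x, n_draws, prior_sd=10.0, seed=0):
    """Draw theta from an assumed N(0, prior_sd^2) prior, weight by the EL."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, prior_sd) for _ in range(n_draws)]
    log_w = [log_el_mean(x, t) for t in thetas]
    top = max(log_w)
    if top == -math.inf:
        raise ValueError("no prior draw fell inside the range of the data")
    w = [math.exp(lw - top) for lw in log_w]  # exp(-inf) is 0.0
    total = sum(w)
    return thetas, [wi / total for wi in w]
```

The output is a weighted sample: posterior expectations are approximated by weighted averages, e.g. `sum(t * w for t, w in zip(thetas, weights))` for the posterior mean. Since the importance distribution is the prior here, no extra importance factor appears; with another proposal, each weight would be multiplied by the ratio of prior to proposal densities.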

The difficulty with the method is in connecting the parameters (of interest/of the assumed distribution) with moments of the (iid) data. While this operates rather straightforwardly for quantile distributions, it is less clear for dynamic models like ARCH and GARCH, where we have to reconstruct the underlying iid process. (There, ABCel clearly improves upon ABC for the GARCH(1,1) model but remains less informative than a regular MCMC analysis. Incidentally, this study led to my earlier post on the unreliable garch() function in the tseries package!) And it is even harder for population genetic models, where parameters like divergence dates, effective population sizes, mutation rates, etc., cannot be expressed as moments of the distribution of the sample at a given locus. In particular, the datapoints are not iid. Pierre Pudlo then had the brilliant idea to resort instead to a composite likelihood, approximating the intra-locus likelihood by a product of pairwise likelihoods over all pairs of genes in the sample at a given locus. Indeed, in Kingman’s coalescent theory, the pairwise likelihoods can be expressed in closed form, hence we can derive the pairwise composite scores. The comparison with optimal ABC outcomes shows an improvement brought by ABCel in the approximation, at an overall computing cost that is negligible against ABC (i.e., it takes minutes to produce the ABCel outcome, compared with hours for ABC).
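In symbols, and only as a sketch in illustrative notation (the indices and the symbols $\ell_2$, $\ell_{comp}$ and $s_k$ are assumptions made here, not the paper’s), write $x_{k,1},\ldots,x_{k,n}$ for the genes sampled at locus $k$ and $\ell_2$ for the closed-form pairwise likelihood. The pairwise composite log-likelihood and its score at that locus are then

$$
\log \ell_{comp}(x_k\mid\theta)=\sum_{i<j}\log \ell_2(x_{k,i},x_{k,j}\mid\theta),
\qquad
s_k(\theta)=\nabla_\theta \log \ell_{comp}(x_k\mid\theta).
$$

The natural use of these scores is as the estimating function in the empirical likelihood, with one term per locus, i.e. a constraint of the form $\sum_k p_k\,s_k(\theta)=0$ in place of the moment constraint of the iid case.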

We are now looking for extensions and improvements of ABCel, both at the methodological and at the genetic levels, and we would of course welcome any comment at this stage. The paper has been submitted to PNAS, as we hope it should appeal to the ABC community at large, i.e. beyond statisticians…

## “Dry Red Wine”

Posted in Travel, Wines on June 20, 2011 by xi'an

A “dry red wine” (whose exact name I do not know) I would certainly not recommend. The year (1994) is very unlikely to be related to the age of the beverage and the “Produce of France” at the bottom is at best connected to the use of French oak barrels…

## 上海, 天际线 [Shanghai skyline 2]

Posted in pictures, Travel on June 16, 2011 by xi'an

## Snapshots from 上海

Posted in pictures, Travel on June 15, 2011 by xi'an

Before joining the O’Bayes 2011 conference, Linda kindly took me on a quick morning tour of the Bund, the historical colonial district of Shanghai. This was very nice, bringing back memories of my friend José de Sam Lazaro telling me about his childhood there in the French concession, and the views of the Huangpu River in a mist [that did not lift for the whole day] were terrific. (The photo below not only gives a hazy idea of the Pudong district, but also incorporates a well-hidden statue of the former mayor of Shanghai in typical Maoist attire, as well as a few of the security cameras that seem to be everywhere.) However, I felt a bit sorry (and not only from a tourist’s point of view) that there was no visible remnant of an older Shanghai (I mean, older than the colonial buildings on the Bund), which seemed to have been entirely razed to make room for new tall buildings in a rather haphazard fashion… The whole city is brimming with construction work, from high-rise buildings in the centre to the many housing complexes I saw from the highway.

## 上海, 天际线 [Shanghai skyline 1]

Posted in pictures, Travel on June 14, 2011 by xi'an