Archive for the Wines Category

likelihood-free model choice

Posted in Books, pictures, Statistics, University life, Wines with tags , , , , , , , on March 27, 2015 by xi'an

Jean-Michel Marin, Pierre Pudlo and I just arXived a short review on ABC model choice, first version of a chapter for the incoming Handbook of Approximate Bayesian computation edited by Scott Sisson, Yannan Fan, and Mark Beaumont. Except for a new analysis of a Human evolution scenario, this survey mostly argues for the proposal made in our recent paper on the use of random forests and [also argues] about the lack of reliable approximations to posterior probabilities. (Paper that was rejected by PNAS and that is about to be resubmitted. Hopefully with a more positive outcome.) The conclusion of the survey is  that

The presumably most pessimistic conclusion of this study is that the connections between (i) the true posterior probability of a model, (ii) the ABC version of this probability, and (iii) the random forest version of the above, are at best very loose. This leaves open queries for acceptable approximations of (i), since the posterior predictive error is instead an error assessment for the ABC RF model choice procedure. While a Bayesian quantity that can be computed at little extra cost, it does not necessarily compete with the posterior probability of a model.

reflecting my hope that we can eventually come up with a proper approximation to the “true” posterior probability…

light and widely applicable MCMC: approximate Bayesian inference for large datasets

Posted in Books, Statistics, University life, Wines with tags , , , , , , , , , , on March 24, 2015 by xi'an

Florian Maire (whose thesis was discussed in this post), Nial Friel, and Pierre Alquier (all in Dublin at some point) have arXived today a paper with the above title, aimed at quickly analysing large datasets. As reviewed in the early pages of the paper, this proposal follows a growing number of techniques advanced in the past years, like pseudo-marginals, Russian roulette, unbiased likelihood estimators. firefly Monte Carlo, adaptive subsampling, sub-likelihoods, telescoping debiased likelihood version, and even our very own delayed acceptance algorithm. (Which is incorrectly described as restricted to iid data, by the way!)

The lightweight approach is based on an ABC idea of working through a summary statistic that plays the role of a pseudo-sufficient statistic. The main theoretical result in the paper is indeed that, when subsampling in an exponential family, subsamples preserving the sufficient statistics (modulo a rescaling) are optimal in terms of distance to the true posterior. Subsamples are thus weighted in terms of the (transformed) difference between the full data statistic and the subsample statistic, assuming they are both normalised to be comparable. I am quite (positively) intrigued by this idea in that it allows to somewhat compare inference based on two different samples. The weights of the subsets are then used in a pseudo-posterior that treats the subset as an auxiliary variable (and the weight as a substitute to the “missing” likelihood). This may sound a wee bit convoluted (!) but the algorithm description is not yet complete: simulating jointly from this pseudo-target is impossible because of the huge number of possible subsets. The authors thus suggest to run an MCMC scheme targeting this joint distribution, with a proposed move on the set of subsets and a proposed move on the parameter set conditional on whether or not the proposed subset has been accepted.

From an ABC perspective, the difficulty in calibrating the tolerance ε sounds more accute than usual, as the size of the subset comes as an additional computing parameter. Bootstrapping options seem impossible to implement in a large size setting.

An MCMC issue with this proposal is that designing the move across the subset space is both paramount for its convergence properties and lacking in geometric intuition. Indeed, two subsets with similar summary statistics may be very far apart… Funny enough, in the representation of the joint Markov chain, the parameter subchain is secondary if crucial to avoid intractable normalising constants. It is also unclear for me from reading the paper maybe too quickly whether or not the separate moves when switching and when not switching subsets retain the proper balance condition for the pseudo-joint to still be the stationary distribution. The stationarity for the subset Markov chain is straightforward by design, but it is not so for the parameter. In case of switched subset, simulating from the true full conditional given the subset would work, but not simulated  by a fixed number L of MCMC steps.

The lightweight technology therein shows its muscles on an handwritten digit recognition example where it beats regular MCMC by a factor of 10 to 20, using only 100 datapoints instead of the 10⁴ original datapoints. While very nice and realistic, this example may be misleading in that 100 digit realisations may be enough to find a tolerable approximation to the true MAP. I was also intrigued by the processing of the probit example, until I realised the authors had integrated the covariate out and inferred about the mean of that covariate, which means it is not a genuine probit model.

Nero d’Avolla

Posted in Wines with tags , on March 20, 2015 by xi'an

avola

The synoptic problem and statistics [book review]

Posted in Books, R, Statistics, University life, Wines with tags , , , , , , , , , , , , on March 20, 2015 by xi'an

A book that came to me for review in CHANCE and that came completely unannounced is Andris Abakuks’ The Synoptic Problem and Statistics.  “Unannounced” in that I had not heard so far of the synoptic problem. This problem is one of ordering and connecting the gospels in the New Testament, more precisely the “synoptic” gospels attributed to Mark, Matthew and Luke, since the fourth canonical gospel of John is considered by experts to be posterior to those three. By considering overlaps between those texts, some statistical inference can be conducted and the book covers (some of?) those statistical analyses for different orderings of ancestry in authorship. My overall reaction after a quick perusal of the book over breakfast (sharing bread and fish, of course!) was to wonder why there was no mention made of a more global if potentially impossible approach via a phylogeny tree considering the three (or more) gospels as current observations and tracing their unknown ancestry back just as in population genetics. Not because ABC could then be brought into the picture. Rather because it sounds to me (and to my complete lack of expertise in this field!) more realistic to postulate that those gospels were not written by a single person. Or at a single period in time. But rather that they evolve like genetic mutations across copies and transmission until they got a sort of official status.

“Given the notorious intractability of the synoptic problem and the number of different models that are still being advocated, none of them without its deficiencies in explaining the relationships between the synoptic gospels, it should not be surprising that we are unable to come up with more definitive conclusions.” (p.181)

The book by Abakuks goes instead through several modelling directions, from logistic regression using variable length Markov chains [to predict agreement between two of the three texts by regressing on earlier agreement] to hidden Markov models [representing, e.g., Matthew’s use of Mark], to various independence tests on contingency tables, sometimes bringing into the model an extra source denoted by Q. Including some R code for hidden Markov models. Once again, from my outsider viewpoint, this fragmented approach to the problem sounds problematic and inconclusive. And rather verbose in extensive discussions of descriptive statistics. Not that I was expecting a sudden Monty Python-like ray of light and booming voice to disclose the truth! Or that I crave for more p-values (some may be found hiding within the book). But I still wonder about the phylogeny… Especially since phylogenies are used in text authentication as pointed out to me by Robin Ryder for Chauncer’s Canterbury Tales.

the vim cheat sheet

Posted in Kids, Linux, R, University life, Wines with tags , , , on March 18, 2015 by xi'an

at The X

Posted in pictures, Running, Travel, Wines with tags , , , , , , , on March 15, 2015 by xi'an

TheX

Domaine de Mortiès [in the New York Times]

Posted in Mountains, Travel, Wines with tags , , , , , , , , , , , , , on March 7, 2015 by xi'an

IMG_0245

“I’m not sure how we found Domaine de Mortiès, an organic winery at the foothills of Pic St. Loup, but it was the kind of unplanned, delightful discovery our previous trips to Montpellier never allowed.”

Last year,  I had the opportunity to visit and sample (!) from Domaine de Mortiès, an organic Pic Saint-Loup vineyard and winemaker. I have not yet opened the bottle of Jamais Content I bought then. Today I spotted in The New York Times a travel article on A visit to the in-laws in Montpellier that takes the author to Domaine de Mortiès, Pic Saint-Loup, Saint-Guilhem-du-Désert and other nice places, away from the overcrowded centre of town and the rather bland beach-town of Carnon, where she usually stays when visiting. And where we almost finished our Bayesian Essentials with R! To quote from the article, “Montpellier, France’s eighth-largest city, is blessed with a Mediterranean sun and a beautiful, walkable historic centre, a tourist destination in its own right, but because it is my husband’s home city, a trip there never felt like a vacation to me.” And when the author mentions the owner of Domaine de Mortiès, she states that “Mme. Moustiés looked about as enthused as a teenager working the checkout at Rite Aid”, which is not how I remember her from last year. Anyway, it is fun to see that visitors from New York City can unexpectedly come upon this excellent vineyard!

Follow

Get every new post delivered to your Inbox.

Join 794 other followers