Archive for regularisation

ISBA 2016 [#5]

Posted in Mountains, pictures, Running, Statistics, Travel on June 18, 2016 by xi'an

[View from above Forte Village, Santa Margherita di Pula, Sardinia, June 17, 2016]

On Thursday, I started the day with a rather masochistic run to the nearby hills, not only because of the early hour but also because, by following rabbit trails that were not intended for my size, I ended up scratched all over by thorns and brambles! At least it came with neat views of the coast around Pula. From there, it was all downhill [joke]. The first morning talk I attended was by Paul Fearnhead, about efficient change point estimation (an NP-hard problem, or close to it). The method relies on dynamic programming [which reminded me of one of my earliest Pascal codes, about optimising a dam debit]; a sketch of the basic recursion follows below. From my spectator's perspective, I wonder[ed] about easier models, from Lasso optimisation to spline modelling followed by testing equality between bits. Later that morning, James Scott delivered the first Bayarri Lecture, created in honour of our friend Susie, who passed away between the previous ISBA meeting and this one. James gave an impressive coverage of regularisation through three complex models, with the [hopefully not degraded by my translation] message that we should [as Bayesians] focus on the important parts of those models and use non-Bayesian tools like regularisation. I can understand the practical constraints for doing so, but optimisation leads us away from a Bayesian handling of inference problems, by removing the assessment of uncertainty…
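For the curious, here is a minimal sketch of the classical optimal partitioning recursion for change point detection, with a squared-error segment cost and a BIC-like penalty. This is only the basic O(n²) dynamic programme, not Fearnhead's efficient algorithm, and every name and constant in it is my own illustrative choice.

```python
import numpy as np

def optimal_partitioning(y, beta):
    """Exact change point detection by dynamic programming:
    F[t] = min over s < t of F[s] + cost(y[s:t]) + beta,
    with cost() the within-segment sum of squared deviations."""
    n = len(y)
    S = np.cumsum(np.r_[0.0, y])          # prefix sums
    S2 = np.cumsum(np.r_[0.0, y ** 2])    # prefix sums of squares

    def cost(s, t):
        # sum of squared deviations of y[s:t] around its mean, in O(1)
        return S2[t] - S2[s] - (S[t] - S[s]) ** 2 / (t - s)

    F = np.full(n + 1, np.inf)
    F[0] = -beta                          # so a single segment costs cost(0, n)
    last = np.zeros(n + 1, dtype=int)     # argmin over s, kept for backtracking
    for t in range(1, n + 1):
        for s in range(t):
            c = F[s] + cost(s, t) + beta
            if c < F[t]:
                F[t], last[t] = c, s
    # backtrack the optimal segmentation into a list of change points
    cps, t = [], n
    while last[t] > 0:
        t = last[t]
        cps.append(t)
    return sorted(cps)

rng = np.random.default_rng(1)
y = np.r_[rng.normal(0, 1, 100), rng.normal(3, 1, 100)]
print(optimal_partitioning(y, beta=3 * np.log(len(y))))   # close to [100]
```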

Later in the afternoon, I took part in the Bayesian foundations session, discussing the shortcomings of the Bayes factor and suggesting the use of mixtures instead. With rebuttals from [friends in] the audience!

This session also included a talk by Victor Peña and Jim Berger analysing and answering the recent criticisms of the Likelihood principle. I am not sure this answer will convince the critics, but I won't comment further as I now see the debate as resulting from a vague notion of inference in Birnbaum's expression of the principle. Jan Hannig gave another foundation talk introducing fiducial distributions (a.k.a. Fisher's Bayesian mimicry) but failing to provide a foundational argument for replacing Bayesian modelling. (Obviously, I am definitely prejudiced in this regard.)

The last session of the day was sponsored by BayesComp and saw talks by Natesh Pillai, Pierre Jacob, and Eric Xing. Natesh talked about his paper on accelerated MCMC recently published in JASA, which surprisingly did not get discussed here but would definitely deserve to be! Hopefully to be corrected within a few days, once I recover from conference burnout!!! Pierre Jacob presented a work we are currently completing with Chris Holmes and Lawrence Murray on modularisation, inspired by the cut problem (as exposed by Plummer at MCMski IV in Chamonix); a toy sketch of the cut construction follows below. And Eric Xing spoke about embarrassingly parallel solutions, discussed a while ago here.
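Since the cut problem may be unfamiliar, here is a toy two-module sketch of the cut distribution, with all modelling choices and names mine, purely for illustration: the first module's posterior on θ₁ is propagated into the second module, while the second dataset is prevented from feeding back on θ₁.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy two-module setting (all choices illustrative):
#   module 1:  y1_i ~ N(theta1, 1),           theta1 ~ N(0, 1)
#   module 2:  y2_i ~ N(theta1 + theta2, 1),  theta2 ~ N(0, 1)
y1 = rng.normal(1.0, 1.0, size=50)        # data informing theta1
y2 = rng.normal(1.5, 1.0, size=50)        # data informing theta1 + theta2
n1, n2 = len(y1), len(y2)

# Stage 1 (the "cut"): posterior of theta1 given y1 ONLY, by conjugacy;
# in the full posterior, y2 would also inform theta1.
v1 = 1.0 / (1.0 + n1)
theta1 = rng.normal(v1 * n1 * y1.mean(), np.sqrt(v1), size=10_000)

# Stage 2: for each theta1 draw, sample theta2 from p(theta2 | theta1, y2)
v2 = 1.0 / (1.0 + n2)
theta2 = rng.normal(v2 * n2 * (y2.mean() - theta1), np.sqrt(v2))

print(theta1.mean(), theta2.mean())       # roughly 1.0 and 0.5
```

In this conjugate toy both stages are exact; in realistic modules stage 2 requires a fresh simulation run for each θ₁ draw, which is where the computational difficulty exposed by Plummer arises.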

kernel approximate Bayesian computation for population genetic inferences

Posted in Statistics, University life on May 22, 2012 by xi'an

A new posting about ABC on arXiv by Shigeki Nakagome, Kenji Fukumizu, and Shuhei Mano, entitled kernel approximate Bayesian computation for population genetic inferences, argues for an improvement brought by the use of a reproducing kernel Hilbert space (RKHS) perspective in ABC methodology, when compared with more standard ABC relying on a rather arbitrary choice of summary statistics and metric. However, I feel that the paper does not substantially defend this point, relying only on a simulation experiment comparing mean square errors. In particular, the claim of consistency is unsubstantiated, as is the counterpoint that “conventional ABC did not have consistency” (page 14) [and several papers, including the just published Read Paper by Fearnhead and Prangle, claim the opposite]. Furthermore, a considerable amount of space in the paper is taken up by the description of existing ABC algorithms, while the complete version of the new kernel ABC-RKHS algorithm is missing. In particular, the coverage of kernel Bayes is too sketchy to be comprehensible [at least to me] without additional study. Actually, I do not get the notion of kernel Bayes' rule, which seems defined only in terms of expectations

\mathbb{E}[f(\theta)|s]=\sum_i w_i f(\theta_i),

where the weights are obtained through the ridge-like matrix inversion

w_i=\sum_j (\mathbf{G}_S + n\epsilon_n \mathbf{I}_n)^{-1}_{ij}\,k(s_j,s)

where the parameters θ_i are generated from the prior, the summaries s_i are generated from the sampling distribution, and the matrix G_S is made of the k(s_i, s_j)'s. The surrounding Hilbert space presentation does not seem particularly relevant, especially in population genetics… I am also under the impression that the choice of the kernel function k(·,·) is as important as the choice of the metric in regular ABC, although this is not discussed in the paper, since it implies [among other things] the choice of a metric. The implementation uses a Gaussian kernel and a Euclidean metric, which involves assumptions on the homogeneous nature of the components of the summary statistics or of the data. Similarly, the “regularization” parameter ε_n needs to be calibrated and the paper is unclear about this, apparently picking the parameter that “showed the smallest MSEs” (page 10), which cannot be called a calibration. (There is a rather unimportant proposition about concentration of information on page 6, whose proof relies on two densities being ordered, see top of page 7.)
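To make the estimator concrete, here is a minimal sketch of the above kernel Bayes' rule in a conjugate toy model where the true posterior expectation is known. The model, the bandwidth, the sample size, and the choice ε_n = 0.01 are my own illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy conjugate model (my own choices, not from the paper):
#   theta ~ N(0, 1),  s | theta ~ N(theta, 1),
# so that E[theta | s_obs] = s_obs / 2 exactly.
rng = np.random.default_rng(0)

def gram(a, b, h=1.0):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 h^2))."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * h ** 2))

n = 500
theta = rng.normal(0.0, 1.0, size=n)    # theta_i drawn from the prior
s = rng.normal(theta, 1.0)              # matching simulated summaries s_i

G = gram(s, s)                          # G_S, made of the k(s_i, s_j)'s
eps_n = 0.01                            # regularisation parameter (uncalibrated guess)
s_obs = 1.5                             # observed summary statistic

# weights w = (G_S + n * eps_n * I_n)^{-1} k(., s_obs)
w = np.linalg.solve(G + n * eps_n * np.eye(n),
                    gram(s, np.array([s_obs]))[:, 0])

# kernel estimate of E[f(theta) | s_obs], with f the identity
print(np.sum(w * theta), "vs exact", s_obs / 2)
```

Note that, unlike standard ABC weights, the w_i need not be positive nor sum to one.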

Bayesian variable selection [off again]

Posted in Statistics, University life on November 16, 2011 by xi'an

As indicated a few weeks ago, we [Gilles Celeux, Mohammed El Anbari, Jean-Michel Marin, and myself] have received very encouraging reviews from Bayesian Analysis about our comparative study of Bayesian and non-Bayesian variable selection procedures (“Regularization in regression: comparing Bayesian and frequentist methods in a poorly informative situation”). We have just rearXived and resubmitted it with additional material and hope this is the last round. (I must acknowledge a limited involvement at this final stage of the paper. Had I had more time available, I would have liked to remove the numerous tables and turn them into graphs…)

Bayesian variable selection [back again]

Posted in Statistics, University life on September 16, 2011 by xi'an

We (Gilles Celeux, Mohammed El Anbari, Jean-Michel Marin, and myself) resubmitted our comparative study of Bayesian and non-Bayesian variable selection procedures (“Regularization in regression: comparing Bayesian and frequentist methods in a poorly informative situation”) to Bayesian Analysis last July and got the reviews back yesterday. They are quite positive, the main comment being that the code used to run the comparison should be put on line. This is very easy to comply with, so we will most likely resubmit very soon. Great news!

Bayesian variable selection redux

Posted in Statistics, University life on July 11, 2011 by xi'an

After a rather long interlude, and just in time for the six-month deadline!, we (Gilles Celeux, Mohammed El Anbari, Jean-Michel Marin, and myself) have resubmitted (and rearXived) our comparative study of Bayesian and non-Bayesian variable selection procedures to Bayesian Analysis. That it took us so long is due to a combination of good and bad reasons: besides being far apart, between Morocco, Paris, and Montpellier, and running too many projects at once with Jean-Michel (including the Bayesian Core revision, which has not moved much since last summer!), we came to realise that my earlier strong stance that invariance on the intercept did not matter was wrong, and that the (kind) reviewers were correct about the asymptotic impact of the scale of the intercept on the variable selection. So we first had to reconvene and think about it, before running another large round of simulations. We hope the picture is now clearer.

Terug van Eindhoven [Yes III impressions]

Posted in Statistics, Travel on October 8, 2009 by xi'an

First, Peter Grünwald had to cancel his lectures at Yes III due to a severe flu, which was unfortunate both for him (!) and for the participants in the workshop. Indeed, I was quite interested in hearing about the/his latest developments on minimum description length priors… The lectures by Laurie Davies and Niels Hjort did take place, however, and were quite informative from my perspective: Laurie Davies gave a very general lecture on the notions of approximation and regularisation in Statistics, with a lot of good questions about the nature of “truth” and “model”, which was quite appropriate for this meeting. There was also a kind of ABC flavour in his talk (which made a sort of connection with mine), in that models were generally tested by simulating virtual datasets and checking their adequacy against the observed data. Maybe a bit too ad hoc and frequentist, as well as fundamentally dependent on the measure of adequacy (in a Vapnik-Chervonenkis sense), but still very interesting. (Of course, a Bayesian answer would also incorporate the consequence of a rejection by looking at the action under the alternative/rejection…) The second half of his lectures was about non-parametric regression, a topic whose coverage I always find incomplete as to why and where the assumptions are made. But I think these lectures must have had a lasting impact on the young statisticians attending the workshop.

Niels Hjort first talked about the “quiet scandal of Statistics”, a nice phrase coined by Leo Breiman, which to some extent replies to the previous lectures, in that he complained about the lack of accounting for the randomness/bias involved in selecting a model before working with it as if it were the “truth”. Another very interesting part of the lectures dealt with his focussed information criterion (FIC), which adds to the menagerie of information criteria but also has an interesting link with the pre-test and shrinkage literature of the 1970s and 1980s. Selecting a model according to its estimated performance in terms of a common loss function is certainly of interest, even though incorporating everything within a single Bayesian framework would certainly be more coherent. Niels also included a fairly exciting data analysis about the authorship of the novel “Quiet Flows the Don”, which he attributed to the Nobel Prize winner Sholokhov (solely on the basis of the length of the sentences). Most of his lecture covered material related to his recent book Model Selection and Model Averaging, co-authored with Gerda Claeskens.

My only criticism of the meeting is that, despite the relatively small audience, there was little interaction and discussion during the talks (which makes sense for my own talk, as hardly anyone besides Niels Hjort was interested in computational Bayes!). The questions during the talks, and the debates as well, mostly came from the three senior lecturers. This certainly occurs in other young statisticians' meetings, but I think the audience should be encouraged to participate, to debate, and to criticise, because this is part of the job of being a researcher. Having, for instance, registered discussants would help.

Another personal regret is to have missed the opportunity to attend a concert by Jordi Savall, who was playing Marais' Leçons de Ténèbres in Eindhoven on Tuesday night…