Archive for Luminy

Overfitting Bayesian mixture models with an unknown number of components

Posted in Statistics on March 4, 2015 by xi'an

During my Czech vacations, Zoé van Havre, Nicole White, Judith Rousseau, and Kerrie Mengersen posted on arXiv a paper on overfitting mixture models to estimate the number of components. This is directly related to Judith and Kerrie’s 2011 paper and to Zoé’s PhD topic. The paper also returns to the vexing (?) issue of label switching! I very much like the paper, and not only because the authors are good friends, but also because it brings a solution to an approach I briefly attempted with Marie-Anne Gruet in the early 1990’s, just before finding out about the reversible jump MCMC algorithm of Peter Green at a workshop in Luminy and deciding we were not going to “beat the competition”! Hence not publishing the output of our over-fitted Gibbs samplers that were nicely emptying extra components… It also brings a rebuke to a later assertion of mine at an ICMS workshop on mixtures, where I defended the notion that over-fitted mixtures could not be detected, a notion that was severely disputed by David MacKay…

What is so fantastic in Rousseau and Mengersen (2011) is that a simple constraint on the Dirichlet prior on the mixture weights suffices to guarantee that, asymptotically, superfluous components will empty out and signal they are truly superfluous! The authors here combine the over-fitted mixture with a tempering strategy, which seems somewhat redundant (the number of extra components acting as a sort of temperature) but eliminates the need for fragile RJMCMC steps. Label switching is obviously even more of an issue with a larger number of components, and identifying empty components seems to require a lack of label switching for some components to remain empty!
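To see this emptying-out effect on a toy example, here is a minimal sketch (my own illustration, not the authors’ code, with made-up settings): a Gibbs sampler for an over-fitted Gaussian mixture with known unit variances, K=5 fitted components for two true ones, and a sparse Dirichlet(0.01,…,0.01) prior on the weights. The ordered posterior weights of the superfluous components should collapse towards zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: two well-separated Gaussian components
n = 200
z_true = rng.integers(0, 2, n)
y = np.where(z_true == 0, rng.normal(-3, 1, n), rng.normal(3, 1, n))

K, alpha, n_iter = 5, 0.01, 2000   # over-fitted K, sparse Dirichlet prior
mu = rng.normal(0, 3, K)           # initial component means
w = np.full(K, 1.0 / K)
w_trace = []

for it in range(n_iter):
    # sample allocations given weights and means (known unit variance)
    logp = np.log(w) - 0.5 * (y[:, None] - mu[None, :]) ** 2
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    z = (p.cumsum(axis=1) > rng.random((n, 1))).argmax(axis=1)
    counts = np.bincount(z, minlength=K)
    # conjugate updates: Dirichlet weights, normal means with N(0, 10^2) prior
    w = rng.dirichlet(alpha + counts)
    for k in range(K):
        v = 1.0 / (counts[k] + 1.0 / 100.0)
        mu[k] = rng.normal(v * y[z == k].sum(), np.sqrt(v))
    if it >= n_iter // 2:
        w_trace.append(np.sort(w)[::-1])   # ordered weights, largest first

mean_w = np.mean(w_trace, axis=0)  # posterior mean of the ordered weights
print(mean_w)                      # three trailing entries near zero
```

Sorting the weights at each iteration sidesteps label switching for this simple diagnostic: only the ordered weights are monitored, not component-specific quantities.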

When reading through the paper, I came upon the condition that only the priors of the weights are allowed to vary between temperatures. Distinguishing the weights from the other parameters does make perfect sense, as some representations of a mixture work without those weights. Still, I feel a bit uncertain about the fixed-prior constraint, even though I can see the rationale in not allowing complete freedom in picking those priors. More fundamentally, I am less and less happy with independent identical or exchangeable priors on the components.

Our own recent experience with almost-zero-weight mixtures (with Judith, Kaniav, and Kerrie) suggests not relying solely on a Gibbs sampler there, as it exhibits poor mixing, and even poorer label switching. The current paper does not seem to meet the same difficulties, maybe thanks to (prior) tempering.

The paper proposes a strategy called Zswitch to resolve label switching, which amounts to identifying a MAP for each possible number of components and a subsequent relabelling, even though I do not entirely understand how the permutation is constructed. I wonder in particular about the cost of the relabelling.
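For illustration, here is a generic relabelling sketch in the same spirit (not the paper’s actual Zswitch, whose permutation construction I do not reproduce): each posterior draw of the component means is permuted to match a reference draw, e.g. the MAP, by solving a small assignment problem.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def relabel(draws, reference):
    """Permute the component labels of each posterior draw of means so
    that the draw is as close as possible to a reference (e.g. MAP) draw."""
    out = np.empty_like(draws)
    for t, mu in enumerate(draws):
        # cost[i, j]: squared distance between draw component i and reference j
        cost = (mu[:, None] - reference[None, :]) ** 2
        rows, cols = linear_sum_assignment(cost)   # optimal matching
        perm = np.empty_like(cols)
        perm[cols] = rows                          # component matched to slot j
        out[t] = mu[perm]
    return out

# two label-switched draws around true means (-3, 0, 3)
draws = np.array([[3.1, -2.9, 0.2], [-3.0, 0.1, 2.8]])
ref = np.array([-3.0, 0.0, 3.0])
print(relabel(draws, ref))   # first draw reordered to [-2.9, 0.2, 3.1]
```

The assignment solve is O(K³) per draw, which hints at why the cost of relabelling matters once K is inflated by over-fitting.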

packed off!!!

Posted in Books, pictures, R, Statistics on February 9, 2013 by xi'an

La Défense, Paris, Feb. 04, 2013

Deliverance!!! We have at last completed our book! Bayesian Essentials with R is off my desk! In a final nitty-gritty day of compiling and recompiling the R package bayess and the LaTeX file, we reached versions that were on par with our expectations. The package has been submitted to CRAN (it went back and forth a few times, with requests to lower the computing time in the examples: each example should take less than 10s, then 5s…), then accepted by CRAN, incl. a Windows version, and the book has been sent to Springer-Verlag. This truly is a deliverance for me, as this book project has been on my work horizon almost constantly for more than two years, led to exciting times in Luminy, Carnon and Berlin, has taken a heavy toll on my collaborations and research activities, and was slowly turning into an unsavoury chore! I am thus delighted Jean-Michel and I managed to close the door before any disastrous consequence could develop for either the book or our friendship. Bayesian Essentials with R is certainly an improvement over Bayesian Core, primarily by providing direct access to the R code. We dearly hope it will attract a wider readership by reducing the mathematical requirements (even though some parts are still too involved for most undergraduates) and we will keep testing it with our own students in Montpellier and Paris over the coming months. In the meanwhile, I just enjoy this feeling of renewed freedom!!!

books versus papers [for PhD students]

Posted in Books, Kids, Statistics, University life on July 7, 2012 by xi'an

Before I run out of time, here is my answer to the ISBA Bulletin Students’ corner question of the term: “In terms of publications and from your own experience, what are the pros and cons of books vs journal articles?”

While I started on my first book during my postdoctoral years at Purdue and Cornell [a basic probability book made out of class notes written with Arup Bose, which died against the breakers of some referees’ criticisms], my overall opinion is that books are never valued by hiring and promotion committees for what they are worth! It is a universal constant I have met in the US, the UK and France alike that books do not help much with promotion or hiring, at least at an early stage of one’s career. Later on, books become a more acknowledged part of senior academics’ vitae. So, unless one has a PhD thesis that is ready to be turned into a readable book without any impact on one’s publication list, and even if one has enough material and a broad enough message at one’s disposal, my advice is to go solely and persistently for journal articles. Besides the above-mentioned attitude of recruiting and promotion committees, I believe this has several positive aspects: it forces the young researcher to maintain his/her focus on specialised topics in which she/he can achieve rapid prominence, rather than having to spend [quality research] time on laying out background material and compiling references. It provides an evaluation by peers of the quality of her/his work, while reviews of books are generally on the light side. It is the starting point for building a network of collaborations, as few people are interested in writing books with strangers (knowing it is already quite a hardship with close friends!). It is also the entry ticket to workshops and international conferences, whereas a new book very rarely attracts invitations.

Writing a book is of course exciting and somewhat more deeply rewarding, but it is awfully time-consuming and requires a level of organisation young faculty members rarely possess when starting a teaching job at a new university (with possibly family changes as well!). I was quite lucky when writing The Bayesian Choice and Monte Carlo Statistical Methods to be mostly on leave from teaching, as it would otherwise have been impossible! That we are not making sufficient progress on our revision of Bayesian Core, started two years ago, is proof enough that even with tight planning, great ideas, enthusiasm, sales prospects, and available material, completing a book may run into trouble for mere organisational issues…

Carnon [and Core]

Posted in Books, Mountains, pictures, Statistics, Travel, University life on June 14, 2012 by xi'an

I am now in Carnon, near Montpellier, for a few days, to work on the completion (!) of Bayesian Core, started two years ago not that far away, in Luminy… The small beach town sits right on the Mediterranean Sea, on a spit (or lido) that carries a canal between the Lez river and the sea. A quiet enough place, far from interruptions of all sorts! Although we are really not that far from completion, various commitments here and there have kept Jean-Michel and myself from finishing it over the past months. I am thus looking forward to those two and a half days of hard work (and not even a break to go climbing in the back country!).

ABC-MCMC for parallel tempering

Posted in Mountains, pictures, Statistics, Travel, University life on February 9, 2012 by xi'an

In this paper a new algorithm combining population-based MCMC methods with ABC requirements is proposed, using an analogy with the Parallel Tempering algorithm (Geyer, 1991).

Another of those arXiv papers that had sat on my to-read pile for way too long: Likelihood-free parallel tempering by Meïli Baragatti, Agnès Grimaud, and Denys Pommeret, from Luminy, Marseilles. The paper mentions our population Monte Carlo (PMC) algorithm (Beaumont et al., 2009) and other ABC-SMC algorithms, but opts instead for an ABC-MCMC basis. The purpose is to build a parallel tempering method. Tolerances and temperatures evolve simultaneously. I however fail to see where the tempering occurs in the algorithm (page 7): there is a set of temperatures T1,…,TN, but they do not appear within the algorithm. My first idea of a tempering mechanism in a likelihood-free setting was to replicate our SAME algorithm (Doucet, Godsill, and Robert, 2004) by creating Tj copies of the [pseudo-]observations to mimic the likelihood taken to the power Tj. But this is annealing, not tempering, and I cannot think of the opposite of copies of the data. Unless of course a power of the likelihood can be simulated (and even then, what would the equivalent be for the data…?). Maybe a natural solution would be to operate some kind of data attrition, e.g. by subsampling the original vector of observations.

Discussing the issue with Jean-Michel Marin, during a visit to Montpellier today, I realised that the true tempering came from the tolerances εi, while the temperatures Tj were there to calibrate the proposal distributions. And that the major innovation contained in the thesis (if not so clearly in the paper) was to boost exchanges between different tolerances, improving upon the regular ABC-MCMC sampler by an equi-energy move.
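Under this reading, a toy sketch of tolerance-laddered ABC-MCMC with exchange moves might look as follows (my own reconstruction with made-up settings, not the authors’ algorithm): each chain runs ABC-MCMC at its own tolerance εi, and adjacent chains swap states whenever the state from the looser chain also meets the tighter tolerance, so that good states percolate down the ladder.

```python
import numpy as np

rng = np.random.default_rng(1)

y_obs = rng.normal(2.0, 1.0, 100)     # data with true mean 2
s_obs = y_obs.mean()                  # observed summary statistic

def distance(theta):
    """Simulate pseudo-data and compare summary statistics."""
    return abs(rng.normal(theta, 1.0, 100).mean() - s_obs)

eps = np.array([1.0, 0.5, 0.25, 0.1])          # tolerance ladder, loose to tight
theta = np.zeros(len(eps))                     # one chain per tolerance
dist = np.array([distance(t) for t in theta])
keep = []

for it in range(5000):
    for i in range(len(eps)):                  # ABC-MCMC move within each chain
        prop = theta[i] + rng.normal(0, 0.5)
        d = distance(prop)
        if d < eps[i]:                         # flat prior: accept if within tolerance
            theta[i], dist[i] = prop, d
    i = rng.integers(0, len(eps) - 1)          # exchange move between adjacent chains
    if dist[i] < eps[i + 1]:                   # looser state also meets tighter tolerance
        theta[i], theta[i + 1] = theta[i + 1], theta[i]
        dist[i], dist[i + 1] = dist[i + 1], dist[i]
    if it > 2000:
        keep.append(theta[-1])                 # retain the tightest (smallest ε) chain

print(np.mean(keep))   # ABC posterior mean for the tightest chain
```

With hard-threshold ABC kernels the exchange acceptance reduces to an indicator, since the state from the tighter chain automatically satisfies the looser tolerance, which is what makes the move resemble an equi-energy jump.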

le théorème de l’engambi

Posted in Books, Statistics on May 20, 2011 by xi'an

When I climbed in Luminy last year, one of the routes was called le théorème de l’engambi. Looking on the internet, I found this was the title of a book written by a local author, Maurice Gouiran. The other evening, at the airport, the book was on sale in the bookstore, so I bought it and read it on the plane back to Paris. It is a local crime novel with highly local characters (to the point that I do not understand all they say), local places like l’Estaque, the OM football club, La Gineste, Luminy, and what is apparently the most appealing theorem in novels, Fermat’s last theorem! (Engambi means messy affair in the local dialect.) Overall the book is more pleasant to read for the local flavour than for the crime enquiry per se, especially because it involves scenes that take place in CIRM itself (including the restaurant and the terrace outside under the old oaks!). There is of course no indication of the nature of the three-page proof produced by the first corpse of the book, but the description of the mathematical community is rather accurate, overall. The author mentions in an afterword that he is aware of Wiles’ proof, but believes (as a poet) in an alternative proof that Fermat had really found. (This book is not to be confused with Guedj’s parrot theorem, which is a novelesque story of mathematics, even though it ends up on the same premise that a parrot could recite Fermat’s proof…)

CoRe in CiRM [end]

Posted in Books, Kids, Mountains, pictures, R, Running, Statistics, Travel, University life on July 18, 2010 by xi'an

Back home after those two weeks in CiRM for our “research in pair” invitation to work on the new edition of Bayesian Core, I am very grateful for the support we received from CiRM and, through it, from SMF and CNRS. Being “locked” away in such a remote place brought a considerable increase in concentration and decrease in stress levels. Although I was planning for more, we made substantial advances on five chapters of the book (out of nine), including a completely new chapter (Chapter 8) on hierarchical models and a thorough rewriting of the normal chapter (Chapter 2), which, along with Chapter 1 (largely inspired by Chapter 1 of Introducing Monte Carlo Methods with R, itself inspired by the first edition of Bayesian Core!), is nearly done. Chapter 9 on image processing is also quite close to completion, with just the result of a batch simulation running on the Linux server in Dauphine to include in the ABC section. The only remaining major change is the elimination of reversible jump from the mixture chapter (to be replaced with Chib’s approximation) and from the time-series chapter (to be simplified into a birth-and-death process). Going back to the CiRM environment, I think we were lucky to come during the vacation season, as there is hardly anyone on campus, which means no cars and no noise. The (good) feeling of remoteness is not as extreme as in Oberwolfach, but it is truly a quality environment. Besides, being able to work 24/7 in the math library is a major plus, as we could go and grab any reference we needed to check. (Presumably, CiRM is lacking in terms of statistics books compared with Oberwolfach, but it still provided most of the references we were looking for.) At last, the freedom to walk right out of the Centre into the national park for a run, a climb or even a swim (in Morgiou, rather than Sugiton) makes working there very tantalising indeed! I thus dearly hope I can enjoy this opportunity again in the near future…

