Archive for Bayesian Analysis

David Blei smile in Paris (seminar)

Posted in Statistics, Travel, University life on October 30, 2013 by xi'an

Nicolas Chopin just reminded me of a seminar given by David Blei in Paris tomorrow (at 4pm, SMILE seminar, INRIA, 23 avenue d’Italie, 5th floor, orange room) on Stochastic Variational Inference and Scalable Topic Models, a machine learning seminar that I will alas miss, being busy giving mine at CMU. Here is the abstract:

Probabilistic topic modeling provides a suite of tools for analyzing
large collections of electronic documents.  With a collection as
input, topic modeling algorithms uncover its underlying themes and
decompose its documents according to those themes.  We can use topic
models to explore the thematic structure of a large collection of
documents or to solve a variety of prediction problems about text.

Topic models are based on hierarchical mixed-membership models,
statistical models where each document expresses a set of components
(called topics) with individual per-document proportions. The
computational problem is to condition on a collection of observed
documents and estimate the posterior distribution of the topics and
per-document proportions. In modern data sets, this amounts to
posterior inference with billions of latent variables.

How can we cope with such data?  In this talk I will describe
stochastic variational inference, a general algorithm for
approximating posterior distributions that are conditioned on massive
data sets.  Stochastic inference is easily applied to a large class of
hierarchical models, including time-series models, factor models, and
Bayesian nonparametric models.  I will demonstrate its application to
topic models fit with millions of articles.  Stochastic inference
opens the door to scalable Bayesian computation for modern data
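
To make the idea in the abstract a bit more concrete, here is a minimal Python sketch of the stochastic update on a toy conjugate model (unit-variance normal data with a N(0, 1/prior_precision) prior on the mean), not on the topic models of the talk. There are no per-document latent variables here, so the sketch only shows the global step: subsample a minibatch, form the natural parameters the posterior would have if the minibatch were replicated to the full data size, and blend them in with a decreasing step size rho_t = (t + tau)^(-kappa). The function name, the default values and the choice of model are my own assumptions for illustration.

```python
import random


def svi_gaussian_mean(data, prior_precision=1.0, batch_size=10,
                      n_steps=2000, tau=1.0, kappa=0.7, seed=0):
    """Toy stochastic variational inference for the mean of N(mu, 1) data.

    The Gaussian approximation is stored through its natural parameters
    (eta1, eta2) = (precision * mean, precision). Hypothetical helper,
    written for illustration only.
    """
    rng = random.Random(seed)
    n = len(data)
    # Start from the prior N(0, 1 / prior_precision).
    eta1, eta2 = 0.0, prior_precision
    for t in range(1, n_steps + 1):
        # Subsample a minibatch uniformly with replacement.
        batch = [data[rng.randrange(n)] for _ in range(batch_size)]
        # Intermediate estimate: the posterior natural parameters obtained
        # if the minibatch were replicated n / batch_size times.
        eta1_hat = (n / batch_size) * sum(batch)
        eta2_hat = prior_precision + n
        # Robbins-Monro step size, decreasing for kappa in (0.5, 1].
        rho = (t + tau) ** (-kappa)
        eta1 = (1.0 - rho) * eta1 + rho * eta1_hat
        eta2 = (1.0 - rho) * eta2 + rho * eta2_hat
    # Return the mean and variance of the Gaussian approximation.
    return eta1 / eta2, 1.0 / eta2
```

In this conjugate case the exact posterior is available (precision prior_precision + n, mean sum(data) / (prior_precision + n)), so the output can be compared against it directly; each step only touches batch_size observations, which is the whole point when n is in the millions.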

Statistics for spatio-temporal data [book review]

Posted in Books, Statistics, University life on October 14, 2013 by xi'an

Here is the new reference book about spatial and spatio-temporal statistical modelling! Noel Cressie wrote the earlier classic Statistics for Spatial Data in 1993 and he has now co-authored with Christopher Wikle (a plenary speaker at ISBA 2014 in Cancún) the new bible on the topic. And with a very nice cover of a Guatemalan lienzo about the Spanish conquest. (Disclaimer: as I am a good friend of Noel, do not expect this review to remain unbiased!)

“…we state the obvious, that political boundaries cannot hold back a one-meter rise in sea level; our environment is ultimately a global resource and its stewardship is an international responsibility.” (p.11)

The book is a sum (in the French/Latin meaning of somme/summa when applied to books; I am not sure this explanation makes any sense!) and, like its predecessor, it covers an enormous range of topics and methods. So do not expect textbook coverage of most notions, and be prepared to read further articles referenced in the text. One of the many differences with the earlier book is that MCMC appears from the start as a stepping stone that is necessary to handle

“…there are model-selection criteria that could be invoked (e.g., AIC, BIC, DIC, etc.), which concentrate on the twin pillars of predictability and parsimony. But they do not address the third pillar, namely scientific interpretability (i.e., knowledge).” (p.33)

The first chapter of the book is actually a preface motivating the topics covered by the book, which may be confusing on a first read, especially for a graduate student, as there is no math formula and no model introduced at this stage. Anyway, this is not really a book made for a linear read. It is quite witty (with too many quotes to report here!) and often funny (I learned for instance that Einstein’s quote “Everything should be made as simple as possible, but not simpler” was a paraphrase of an earlier lecture, invented by the Reader’s Digest!).

“Thus, we believe that it is not helpful to try to classify probability distributions that determine the statistical models, as subjective or objective. Better questions to ask are about the sensitivity of inferences to model choices and whether such choices make sense scientifically.” (p.32)

The overall tone of the book is mostly Bayesian, in a non-conflictual conditional probability way, insisting on hierarchical (Bayesian) model building. Incidentally, it uses the same bracket notation for generic distributions (densities) as in Gelfand and Smith (JASA, 1990), i.e. [X|Y] and [X|Z,y][Z|y,θ], a notation that did not get much of a fan club. (I actually do not know where it stemmed from.) The second chapter contains an illustration of the search for the USS Scorpion using a Bayesian model (including priors built from experts’ opinions), an example that is also covered [without the maths!] in Sharon McGrayne’s The Theory That Would Not Die.

The book is too rich and my time is too tight (!) to cover each chapter in detail. (For instance, I am not so happy with the temporal chapter in that it moves away from the Bayesian perspective without much of a justification.) Suffice it to say then that it appears like an updated and improved version of its predecessor, with 45 pages of references, some of them quite recent. If I were to teach from this book at Master's level, it would take the whole academic year and then some, assuming enough mathematical culture from the student audience.

As an addendum, I noticed several negative reviews on Amazon due to the poor quality of the printing, but the copy I received from John Wiley was quite fine, with the many colour graphs well-rendered. Maybe an earlier printing or a different printing agreement?

from Jakob Bernoulli to Hong Kong

Posted in Books, Statistics, Travel, University life on August 24, 2013 by xi'an

Here are my slides (or at least the current version thereof) for my talk in Hong Kong at the 2013 (59th ISI) World Statistics Congress. (I stopped embedding my slideshare links in the posts as they freeze my browser. I wonder if anyone else experiences the same behaviour.)

This talk will feature in the History I: Jacob Bernoulli’s “Ars Conjectandi” and the emergence of probability invited paper session organised by Adam Jakubowski. While my own research connection with Bernoulli is at most tenuous, besides using the Law of Large Numbers and Bernoulli rv’s…, I [of course!] borrowed from earlier slides on our vanilla Rao-Blackwellisation paper (if only because of the Bernoulli factory connection!) and asked Mark Girolami for his Warwick slides on the Russian roulette (another Bernoulli factory connection!), before recycling my Budapest slides on ABC. The other talks in the session are by Edith Dudley Sylla on Ars Conjectandi and by Krzys Burdzy on his book The Search for Certainty, a book that I critically reviewed in Bayesian Analysis. This will be the first time I meet Krzys in person and I am looking forward to the opportunity!

proper likelihoods for Bayesian analysis

Posted in Books, Statistics, University life on April 11, 2013 by xi'an

While in Montpellier yesterday (where I also had the opportunity of tasting an excellent local wine!), I had a look at the 1992 Biometrika paper by Monahan and Boos on “Proper likelihoods for Bayesian analysis”. This is a paper I had missed and that was pointed out to me during the discussions in Padova. The main point of this short paper is to decide when a method based on an approximate likelihood function is truly (or properly) Bayes. Just the very question a bystander would ask of ABC methods, wouldn’t it?! The validation proposed by Monahan and Boos is one of calibration of credible sets, just as in the recent arXiv paper of Dennis Prangle, Michael Blum, G. Popovic and Scott Sisson I reviewed three months ago. The idea is indeed to check by simulation that the true posterior coverage of an α-level set equals the nominal coverage α. In other words, the predictive based on the likelihood approximation should be uniformly distributed and this leads to a goodness-of-fit test based on simulations. As in our ABC model choice paper, Proper likelihoods for Bayesian analysis notices that Bayesian inference drawn upon an insufficient statistic is proper and valid, simply less accurate than the Bayesian inference drawn upon the whole dataset. The paper also states a conjecture:

An [approximate] likelihood L is a coverage proper Bayesian likelihood if and only if L has the form L(y|θ) = c(s) g(s|θ) where s=S(y) is a statistic with density g(s|θ) and c(s) some function depending on s alone.

a conjecture that sounds incorrect in that noisy ABC is also well-calibrated. (I am not 100% sure of this argument, though.) An interesting section covers the case of pivotal densities as substitute likelihoods and the confusion created by the double meaning of the parameter θ. The last section is also connected with ABC in that Monahan and Boos reflect on the use of large sample approximations, such as normal distributions for estimates of θ, which are a special kind of statistic, but do not report formal results on the asymptotic validation of such approximations. All in all, a fairly interesting paper!

Reading this highly interesting paper also made me realise that the criticism I had made in my review of Prangle et al. about the difficulty for this calibration method to address the issue of summary statistics was incorrect: when using the true likelihood function, the use of an arbitrary summary statistic is validated by this method and is thus proper.
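
As a rough illustration of the calibration check discussed above, here is a minimal Python sketch on a toy normal model of my own choosing (θ ~ N(0,1) and observations y_i ~ N(θ,1)), not an example taken from Monahan and Boos. For each replication, θ is drawn from the prior, data from the model, and the posterior cdf built from the candidate likelihood is evaluated at the simulated θ; under a coverage proper likelihood these values should look uniform on (0,1). The candidate likelihood only uses the mean of the first n_used observations (an insufficient statistic) and lets one plug in a possibly wrong observation variance assumed_var; all function and argument names are hypothetical.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def coverage_pvalues(n_rep, n_obs, n_used, assumed_var, seed=0):
    """Posterior cdf values at the true parameter, one per replication."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_rep):
        theta = rng.gauss(0.0, 1.0)                 # draw from the N(0,1) prior
        y = [rng.gauss(theta, 1.0) for _ in range(n_obs)]
        s = sum(y[:n_used]) / n_used                # insufficient summary statistic
        # Posterior under the candidate likelihood s | theta ~ N(theta, assumed_var / n_used).
        prec = 1.0 + n_used / assumed_var
        post_mean = (n_used / assumed_var) * s / prec
        post_sd = math.sqrt(1.0 / prec)
        out.append(normal_cdf((theta - post_mean) / post_sd))
    return out


def ks_distance_to_uniform(values):
    """Kolmogorov-Smirnov distance between the sample and Uniform(0,1)."""
    xs = sorted(values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


if __name__ == "__main__":
    proper = coverage_pvalues(2000, n_obs=20, n_used=5, assumed_var=1.0)
    improper = coverage_pvalues(2000, n_obs=20, n_used=5, assumed_var=0.25)
    print("true density of s:  ", ks_distance_to_uniform(proper))
    print("overconfident model:", ks_distance_to_uniform(improper))
```

With assumed_var=1.0 the candidate is the true density of the summary statistic, so the distance to uniformity should only reflect Monte Carlo noise even though most of the data is discarded, while an overconfident assumed_var=0.25 produces credible sets that are too narrow and a clearly non-uniform sample.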

Bayesian non-parametrics

Posted in Statistics on April 8, 2013 by xi'an

Here is a short discussion I wrote yesterday with Judith Rousseau of a paper by Peter Müller and Riten Mitra to appear in Bayesian Analysis.

“We congratulate the authors for this very pleasant overview of the type of problems that are currently tackled by Bayesian nonparametric inference and for demonstrating how prolific this field has become. We do share the authors’ viewpoint that many Bayesian nonparametric models allow for more flexible modelling than parametric models and thus capture finer details of the data. BNP can be a good alternative to complex parametric models in the sense that the computations are not necessarily more difficult in Bayesian nonparametric models. However, we would like to temper the enthusiasm of the authors since, although we believe that Bayesian nonparametrics has proved extremely useful and interesting, we think they oversell the “nonparametric side of the Force”! Our main point is that, by definition, Bayesian nonparametrics is based on prior probabilities that live on infinite dimensional spaces and thus are never completely swamped by the data. It is therefore crucial to understand which (or why!) aspects of the model are strongly influenced by the prior and how.

As an illustration, when looking at Example 1 with the censored zeroth cell, our reaction is that this is a problem with no proper solution, because it is lacking too much information. In other words, unless some parametric structure of the model is known, in which case the zeroth cell is related with the other cells, we see no way to draw inference about the size of this cell. The outcome produced by the authors is therefore unconvincing to us in that it seems to only reflect upon the prior modelling (α,G*) and not upon the information contained in the data. Now, this prior modelling may be to some extent justified based on side information about the medical phenomenon under study; however, its impact on the resulting inference is palpable.

Recently (and even less recently) a few theoretical results have pointed out this very issue. E.g., Diaconis and Freedman (1986) showed that some priors could surprisingly lead to inconsistent posteriors, even though it was later shown that many priors lead to consistent posteriors and often even to optimal asymptotic frequentist estimators, see for instance van der Vaart and van Zanten (2009) and Kruijer et al. (2010). The worry about Bayesian nonparametrics truly appeared when considering (1) asymptotic frequentist properties of semi-parametric procedures; and (2) interpretation of inferential aspects of Bayesian nonparametric procedures. It was shown in various instances that some nonparametric priors which behaved very nicely for the estimation of the whole parameter could have disturbingly suboptimal behaviour for some specific functionals of interest, see for instance Arbel et al. (2013) and Rivoirard and Rousseau (2012). We do not claim here that asymptotics is the answer to everything; however, bad asymptotic behaviour shows that something is going wrong, and this helps in understanding the impact of the prior. These disturbing results illustrate that in these infinite dimensional models the impact of the prior modelling is difficult to evaluate and that, although the prior looks very flexible, it can in fact be highly informative and/or restrictive for some aspects of the parameter. It would thus be wrong to conclude that every aspect of the parameter is well-recovered because some are. This has long been a well-known fact for Bayesian parametric models, leading to extensive research on reference and other types of objective priors. It is even more crucial in the nonparametric world. No (nonparametric) prior can be suited for every inferential aspect and it is important to understand which aspects of the parameter are well-recovered and which ones are not.

We also concur with the authors that Dirichlet mixture priors provide natural clustering mechanisms, but one may question the “natural” label as the resulting clustering is quite unstructured, growing in the number of clusters as the number of observations increases and not incorporating any prior constraint on the “definition” of a cluster, except the one implicit and well-hidden behind the non-parametric prior. In short, it is delicate to assess what is eventually estimated by these clustering methods.

These remarks are not to be taken as criticisms of the overall Bayesian nonparametric approach, quite the contrary. We simply emphasize (or recall) that there is no such thing as a free lunch and that we need to post the price to pay for potential customers. In these models, this is far from easy and just as far from being completed.”


  • Arbel, J., Gayraud, G., and Rousseau, J. (2013). Bayesian adaptive optimal estimation using a sieve prior. Scandinavian Journal of Statistics, to appear.

  • Diaconis, P. and Freedman, D. (1986). On the consistency of Bayes estimates. Ann. Statist., 14:1-26.

  • Kruijer, W., Rousseau, J., and van der Vaart, A. (2010). Adaptive Bayesian density estimation with location-scale mixtures. Electron. J. Stat., 4:1225-1257.

  • Rivoirard, V. and Rousseau, J. (2012). On the Bernstein-von Mises theorem for linear functionals of the density. Ann. Statist., 40:1489-1523.

  • van der Vaart, A. and van Zanten, J. H. (2009). Adaptive Bayesian estimation using a Gaussian random field with inverse Gamma bandwidth. Ann. Statist., 37:2655-2675.
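
The remark in the discussion about the number of clusters growing with the number of observations can be illustrated by a small Python sketch of the Chinese restaurant process, the random partition induced by a Dirichlet process prior. This is only the prior clustering mechanism with a concentration parameter alpha, under my own naming, and not the model of the paper under discussion: observation i+1 opens a new cluster with probability alpha/(alpha+i), so the prior expected number of clusters is the sum of these probabilities, which grows like alpha log(n).

```python
import random


def crp_num_clusters(n, alpha, rng):
    """Simulate a Chinese restaurant process and return the number of clusters."""
    counts = []  # current cluster sizes
    for i in range(n):
        # Total weight is alpha (new cluster) plus i (existing members).
        u = rng.random() * (alpha + i)
        if u < alpha:
            counts.append(1)          # open a new cluster
        else:
            u -= alpha
            for k, c in enumerate(counts):
                if u < c:
                    counts[k] += 1    # join cluster k with prob. c / (alpha + i)
                    break
                u -= c
            else:
                counts[-1] += 1       # guard against floating-point round-off
    return len(counts)


def expected_num_clusters(n, alpha):
    """Prior expected number of clusters after n observations."""
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100, 1000):
        sims = [crp_num_clusters(n, 1.0, rng) for _ in range(200)]
        print(n, sum(sims) / len(sims), expected_num_clusters(n, 1.0))
```

Nothing in this mechanism encodes what a cluster is supposed to mean, which is the point made in the discussion: the number of clusters keeps increasing with n whatever the data.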

