## proper likelihoods for Bayesian analysis

Posted in Books, Statistics, University life on April 11, 2013 by xi'an

While in Montpellier yesterday (where I also had the opportunity of tasting an excellent local wine!), I had a look at the 1992 Biometrika paper by Monahan and Boos on “Proper likelihoods for Bayesian analysis”. This is a paper I had missed and that was pointed out to me during the discussions in Padova. The main point of this short paper is to decide when a method based on an approximate likelihood function is truly (or properly) Bayes. Just the very question a bystander would ask of ABC methods, wouldn’t it?! The validation proposed by Monahan and Boos is one of calibration of credible sets, just as in the recent arXiv paper of Dennis Prangle, Michael Blum, G. Popovic and Scott Sisson I reviewed three months ago. The idea is indeed to check by simulation that the true posterior coverage of an α-level set equals the nominal coverage α. In other words, the posterior cdf based on the likelihood approximation, evaluated at the parameter value that generated the data, should be uniformly distributed and this leads to a goodness-of-fit test based on simulations. As in our ABC model choice paper, Monahan and Boos notice that Bayesian inference drawn upon an insufficient statistic is proper and valid, simply less accurate than the Bayesian inference drawn upon the whole dataset. The paper also states a conjecture:

An [approximate] likelihood L is a coverage proper Bayesian likelihood if and only if L has the form L(y|θ) = c(s) g(s|θ) where s=S(y) is a statistic with density g(s|θ) and c(s) some function depending on s alone.

a conjecture that sounds incorrect in that noisy ABC is also well-calibrated. (I am not 100% sure of this argument, though.) An interesting section covers the case of pivotal densities as substitute likelihoods and the confusion created by the double meaning of the parameter θ. The last section is also connected with ABC in that Monahan and Boos reflect on the use of large sample approximations, like normal distributions for estimates of θ, which are a special kind of statistic, but do not report formal results on the asymptotic validation of such approximations. All in all, a fairly interesting paper!

Reading this highly interesting paper also made me realise that the criticism I had made in my review of Prangle et al. about the difficulty for this calibration method to address the issue of summary statistics was incorrect: when using the true likelihood function, the use of an arbitrary summary statistic is validated by this method and is thus proper.
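To make the calibration check concrete, here is a toy sketch in Python. It is not the procedure or the examples of Monahan and Boos, only an illustration under assumptions of my choosing: a N(0,1) prior on θ, n observations from N(θ,1), and three candidate posteriors (full data, first observation only, and a likelihood with a wrongly stated variance). The function names, the number of replications, and the Kolmogorov-Smirnov distance used as a summary of uniformity are all hypothetical choices.

```python
import random
from statistics import NormalDist

def ks_uniform(us):
    """Kolmogorov-Smirnov distance between the sample `us` and Uniform(0, 1)."""
    xs = sorted(us)
    m = len(xs)
    return max(max((i + 1) / m - x, x - i / m) for i, x in enumerate(xs))

def coverage_check(posterior, n_rep=4000, n_obs=10, seed=1):
    """Simulate (theta, y) from prior and model, then evaluate the candidate
    posterior cdf at the generating theta; return the KS distance to uniform."""
    rng = random.Random(seed)
    us = []
    for _ in range(n_rep):
        theta = rng.gauss(0.0, 1.0)                          # theta ~ N(0, 1) prior
        y = [rng.gauss(theta, 1.0) for _ in range(n_obs)]    # y_i ~ N(theta, 1)
        mean, sd = posterior(y)
        us.append(NormalDist(mean, sd).cdf(theta))
    return ks_uniform(us)

def full_data(y):
    """Exact conjugate posterior based on the whole sample."""
    n = len(y)
    return sum(y) / (n + 1), (1.0 / (n + 1)) ** 0.5

def first_obs_only(y):
    """Exact posterior given the insufficient statistic s = y_1."""
    return y[0] / 2.0, 0.5 ** 0.5

def wrong_variance(y):
    """Posterior from a likelihood that pretends the variance is 0.25."""
    n = len(y)
    precision = 1.0 + 4.0 * n
    return 4.0 * sum(y) / precision, precision ** -0.5

for name, post in [("full data", full_data),
                   ("first observation only", first_obs_only),
                   ("wrong variance", wrong_variance)]:
    print(f"{name}: KS distance to uniform = {coverage_check(post):.3f}")
```

In this toy setting the first two posteriors should produce a small distance (they are genuine posteriors, one of them based on less information), while the third one, being overconfident, should not.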

## Bayesian non-parametrics

Posted in Statistics on April 8, 2013 by xi'an

Here is a short discussion I wrote yesterday with Judith Rousseau of a paper by Peter Müller and Riten Mitra to appear in Bayesian Analysis.

“We congratulate the authors for this very pleasant overview of the type of problems that are currently tackled by Bayesian nonparametric inference and for demonstrating how prolific this field has become. We do share the authors’ viewpoint that many Bayesian nonparametric models allow for more flexible modelling than parametric models and thus capture finer details of the data. BNP can be a good alternative to complex parametric models in the sense that the computations are not necessarily more difficult in Bayesian nonparametric models. However, we would like to temper the enthusiasm of the authors since, although we believe that Bayesian nonparametrics has proved extremely useful and interesting, we think they oversell the “nonparametric side of the Force”! Our main point is that, by definition, Bayesian nonparametrics is based on prior probabilities that live on infinite dimensional spaces and thus are never completely swamped by the data. It is therefore crucial to understand which (or why!) aspects of the model are strongly influenced by the prior and how.

As an illustration, when looking at Example 1 with the censored zeroth cell, our reaction is that this is a problem with no proper solution, because it is lacking too much information. In other words, unless some parametric structure of the model is known, in which case the zeroth cell is related with the other cells, we see no way to infer about the size of this cell. The outcome produced by the authors is therefore unconvincing to us in that it seems to only reflect upon the prior modelling (α,G*) and not upon the information contained in the data. Now, this prior modelling may be to some extent justified based on side information about the medical phenomenon under study; however, its impact on the resulting inference is palpable.

Recently (and even less recently) a few theoretical results have pointed out this very issue. E.g., Diaconis and Freedman (1986) showed that some priors could surprisingly lead to inconsistent posteriors, even though it was later shown that many priors lead to consistent posteriors and often even to optimal asymptotic frequentist estimators, see for instance van der Vaart and van Zanten (2009) and Kruijer et al. (2010). The worry about Bayesian nonparametrics truly appeared when considering (1) asymptotic frequentist properties of semi-parametric procedures; and (2) interpretation of inferential aspects of Bayesian nonparametric procedures. It was shown in various instances that some nonparametric priors which behaved very nicely for the estimation of the whole parameter could have disturbingly suboptimal behaviour for some specific functionals of interest, see for instance Arbel et al. (2013) and Rivoirard and Rousseau (2012). We do not claim here that asymptotics is the answer to everything; however, bad asymptotic behaviour shows that something wrong is going on and this helps in understanding the impact of the prior. These disturbing bad results are an illustration that in these infinite dimensional models the impact of the prior modelling is difficult to evaluate and that, although the prior looks very flexible, it can in fact be highly informative and/or restrictive for some aspects of the parameter. It would thus be wrong to conclude that every aspect of the parameter is well-recovered because some are. This has been a well-known fact for Bayesian parametric models, leading to extensive research on reference and other types of objective priors. It is even more crucial in the nonparametric world. No (nonparametric) prior can be suited for every inferential aspect and it is important to understand which aspects of the parameter are well-recovered and which ones are not.

We also concur with the authors that Dirichlet mixture priors provide natural clustering mechanisms, but one may question the “natural” label as the resulting clustering is quite unstructured, growing in the number of clusters as the number of observations increases and not incorporating any prior constraint on the “definition” of a cluster, except the one implicit and well-hidden behind the non-parametric prior. In short, it is delicate to assess what is eventually estimated by these clustering methods.

These remarks are not to be taken as criticisms of the overall Bayesian nonparametric approach, quite the contrary. We simply emphasize (or recall) that there is no such thing as a free lunch and that we need to post the price to pay for potential customers. In these models, this is far from easy and just as far from being completed.”
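As a side illustration of the clustering remark in the discussion above, the growth in the number of clusters can be seen by simulating the Chinese restaurant process that underlies Dirichlet process partitions. This is only a sketch under assumed settings (concentration α = 1, the sample sizes, and the function names are hypothetical choices, not taken from Müller and Mitra): observation i+1 opens a new cluster with probability α/(α+i), so the expected number of clusters after n observations is the sum of these probabilities, which grows like α log n.

```python
import random

def crp_cluster_sizes(n, alpha, rng):
    """Simulate cluster sizes for n observations from a Chinese restaurant
    process with concentration parameter alpha."""
    sizes = []
    for i in range(n):
        # Observation i+1 opens a new cluster with probability alpha / (alpha + i)
        if rng.random() < alpha / (alpha + i):
            sizes.append(1)
        else:
            # Otherwise it joins an existing cluster, chosen proportionally to size
            k = rng.choices(range(len(sizes)), weights=sizes)[0]
            sizes[k] += 1
    return sizes

def expected_clusters(n, alpha):
    """Exact prior expectation of the number of clusters after n observations."""
    return sum(alpha / (alpha + i) for i in range(n))

rng = random.Random(2013)
alpha = 1.0
reps = 200
for n in (10, 100, 1000):
    avg = sum(len(crp_cluster_sizes(n, alpha, rng)) for _ in range(reps)) / reps
    print(f"n = {n}: simulated mean = {avg:.2f}, "
          f"expected = {expected_clusters(n, alpha):.2f}")
```

The point of the sketch is simply that the number of clusters keeps increasing with n under the prior, whatever the data-generating mechanism.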

References

• Arbel, J., Gayraud, G., and Rousseau, J. (2013). Bayesian adaptive optimal estimation using a sieve prior. Scandinavian Journal of Statistics, to appear.

• Diaconis, P. and Freedman, D. (1986). On the consistency of Bayes estimates. Ann. Statist., 14:1-26.

• Kruijer, W., Rousseau, J., and van der Vaart, A. (2010). Adaptive Bayesian density estimation with location-scale mixtures. Electron. J. Stat., 4:1225-1257.

• Rivoirard, V. and Rousseau, J. (2012). On the Bernstein Von Mises theorem for linear functionals of the density. Ann. Statist., 40:1489-1523.

• van der Vaart, A. and van Zanten, J. H. (2009). Adaptive Bayesian estimation using a Gaussian random field with inverse Gamma bandwidth. Ann. Statist., 37:2655-2675.

## Do we need…yes we do (with some delay)!

Posted in Books, Statistics, University life on April 4, 2013 by xi'an

## ISBA Autumn sales

Posted in Statistics, University life on October 4, 2012 by xi'an

Some (legitimate) advertising from ISBA (and make sure you join BayesComp in the process!):

Today ISBA reached 1000 Members!

Please encourage your colleagues to take advantage of ISBA’s Annual Member Promotion and save on 2013 membership rates! All new ISBA, ISBA Section and now Chapter memberships are extended until Dec. 31, 2013, while current members can save \$10 by renewing ISBA dues now. Member dues are modest: \$15 for student or reduced rate memberships for developing countries or \$50 for regular memberships. ISBA also offers 25% savings on annual IMS dues. ISBA has specialized sections in Bayesian Computation, Bayesian Nonparametrics, Biostat & Pharma, Finance and Business, Industrial Statistics, Objective Bayes, and coming soon Environmental Statistics! The new Junior ISBA section for students and new researchers that are within 5 years of completing a degree provides networking and additional benefits for new researchers! Join before October 15th to be eligible to participate in the ISBA and ISBA Section Elections!

ISBA provides a focal point for those interested in Bayesian analysis and its applications through the organization of conferences, workshops, short courses, and (soon!) webinars and ISBA publications, including the open access electronic journal Bayesian Analysis, the ISBA Bulletin and the ISBA website. The ISBA Bayes News forums provide up-to-date information to a community of 2000+ statisticians and scientists interested in Bayesian statistics.

We have an exciting array of events lined up for 2013 as part of the International Year of Statistics and the 250th Anniversary of Bayes’ Theorem. ISBA members receive discounts on registration for ISBA sponsored or endorsed events, and ISBA and SBSS Webinars! Junior researchers are eligible for the ISBA Lifetime Members Junior Researcher Award and other travel awards. For more info on ISBA benefits and other ISBA activities such as events, awards and prizes, we encourage you to visit our membership page or log in.

We encourage you to visit the ISBA website to see what has been happening and help spread the word!

## estimating a constant

Posted in Books, Statistics on October 3, 2012 by xi'an

Paulo (a.k.a., Zen) posted a comment in StackExchange on Larry Wasserman’s paradox about Bayesians and likelihoodists (or likelihood-wallahs, to quote Basu!) being unable to solve the problem of estimating the normalising constant c of the sample density, f, known up to a constant

$f(x) = c g(x)$

(Example 11.10, page 188, of All of Statistics)

My own comment is that, with all due respect to Larry!, I do not see much appeal in this example, esp. as a potential criticism of Bayesians and likelihood-wallahs…. The constant c is known, being equal to

$1/\int_\mathcal{X} g(x)\text{d}x$

If c is the only “unknown” in the picture, given a sample x1,…,xn, then there is no statistical issue whatsoever about the “problem” and I do not agree with the postulate that there exist estimators of c. Nor priors on c (other than the Dirac mass on the above value). This is not in the least a statistical problem but rather a numerical issue. That the sample x1,…,xn can be (re)used through a (frequentist) density estimate to provide a numerical approximation of c

$\hat c = \hat f(x_0) \big/ g(x_0)$

is a mere curiosity. Not a criticism of alternative statistical approaches: e.g., I could also use a Bayesian density estimate…
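For the curious, the density-estimate approximation above can be sketched in a few lines of Python. Everything specific here is an assumption made for the illustration and not part of Larry's example: g(x) = exp(-x²/2) (so that c = 1/√(2π) is available for comparison), a Gaussian kernel density estimate with a rule-of-thumb bandwidth, the evaluation point x₀ = 0, and the function names.

```python
import math
import random

def g(x):
    """Unnormalised density: a standard normal without its constant."""
    return math.exp(-0.5 * x * x)

def kde_at(x0, sample, h):
    """Gaussian kernel density estimate of f at the point x0, bandwidth h."""
    n = len(sample)
    total = sum(math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in sample)
    return total / (n * h * math.sqrt(2.0 * math.pi))

rng = random.Random(42)
n = 2000
sample = [rng.gauss(0.0, 1.0) for _ in range(n)]   # x_1, ..., x_n from f = c g

# Rule-of-thumb bandwidth (an assumption; any consistent choice would do)
mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
h = 1.06 * sd * n ** (-0.2)

x0 = 0.0
c_hat = kde_at(x0, sample, h) / g(x0)              # c_hat = f_hat(x0) / g(x0)
c_true = 1.0 / math.sqrt(2.0 * math.pi)
print(f"c_hat = {c_hat:.4f}, c = {c_true:.4f}")
```

The approximation depends on both the bandwidth and the choice of x₀, and it cannot be improved beyond what the n observations allow.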

Furthermore, the estimate provided by the sample x1,…,xn is not of particular interest since its precision is imposed by the sample size n (and converging at non-parametric rates, which is not a particularly relevant issue!), while I could use importance sampling (or even numerical integration) if I were truly interested in c. I however find the discussion interesting for several reasons:

1. it somehow relates to the infamous harmonic mean estimator issue, often discussed on the ’Og!;
2. it brings more light on the paradoxical differences between statistics and Monte Carlo methods, in that statistics is usually constrained by the sample while Monte Carlo methods have more freedom in generating samples (up to some budget limits). It does not make sense to speak of estimators in Monte Carlo methods because there is no parameter in the picture, only “unknown” constants. Both fields rely on samples and probability theory, and share many features, but there is nothing like a “best unbiased estimator” in Monte Carlo integration, see the case of the “optimal importance function” leading to a zero variance;
3. in connection with the previous point, the fascinating Bernoulli factory problem is not a statistical problem because it requires an infinite sequence of Bernoullis to operate;
4. the discussion induced Chris Sims to contribute to StackExchange!
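To illustrate the contrast drawn in the second point, here is a minimal Python sketch of the purely numerical routes to c, again under the assumed toy choice g(x) = exp(-x²/2) (not part of the original example). It computes 1/c by importance sampling with a Cauchy proposal and by a trapezoidal rule, and then shows that the “optimal importance function”, proportional to g, returns constant weights and hence a zero variance, at the price of requiring c in the first place. The proposal, the simulation budget m, the integration grid, and all names are hypothetical choices.

```python
import math
import random

def g(x):
    """Unnormalised density, with c = 1 / sqrt(2 * pi) in this toy case."""
    return math.exp(-0.5 * x * x)

def cauchy_pdf(x):
    return 1.0 / (math.pi * (1.0 + x * x))

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

c_true = 1.0 / math.sqrt(2.0 * math.pi)
rng = random.Random(2012)

# Importance sampling: 1/c is the integral of g, estimated by averaging g/q
# over draws from the proposal q; the budget m is chosen freely
m = 20000
cauchy_draws = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(m)]
weights = [g(x) / cauchy_pdf(x) for x in cauchy_draws]
c_is = 1.0 / (sum(weights) / m)

# Numerical integration: trapezoidal rule on [-10, 10]
lo, hi, k = -10.0, 10.0, 2000
step = (hi - lo) / k
grid = [lo + i * step for i in range(k + 1)]
integral = step * (sum(g(x) for x in grid) - 0.5 * (g(lo) + g(hi)))
c_quad = 1.0 / integral

# "Optimal" importance function q = c g: every weight equals 1/c (zero
# variance), but evaluating q already requires knowing c
normal_draws = [rng.gauss(0.0, 1.0) for _ in range(1000)]
opt_weights = [g(x) / normal_pdf(x) for x in normal_draws]
spread = max(opt_weights) - min(opt_weights)

print(f"importance sampling: {c_is:.4f}, trapezoid: {c_quad:.6f}, "
      f"true: {c_true:.6f}, spread of optimal weights: {spread:.2e}")
```

Unlike the density-estimate approximation, the precision here is set by the budget m or by the grid, not by the size of an observed sample.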