## nested sampling: any prior anytime?!

Posted in Books, pictures, Statistics, Travel on March 26, 2021 by xi'an

A recent arXival by Justin Alsing and Will Handley on “nested sampling with any prior you like” caught my attention. If only because I was under the impression that some priors would not agree with nested sampling. Especially those putting positive weight on some fixed levels of the likelihood function, as well as improper priors.

“…nested sampling has largely only been practical for a somewhat restrictive class of priors, which have a readily available representation as a transform from the unit hyper-cube.”

Reading the paper, it seems that the whole point is to demonstrate that “any proper prior may be transformed onto the unit hypercube via a bijective transformation.” Which seems rather straightforward if the transform is not otherwise constrained: use a logit transform in every direction. The paper instead heads in the rather fashionable direction of normalising flows as density representations. (Which suddenly reminded me of the PhD dissertation of Rob Cornish at Oxford, which I examined last year, even though nested was not used there in the same sense.) The purpose, appearing later in the paper, or in fine, is to express a random variable simulated from the prior as the (generative) transform of a Uniform variate, f(U). Resuscitating the simulation from an arbitrary distribution from first principles.
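Writing a prior draw as f(U) does not require a normalising flow when the prior is known in closed form: inverting the marginal CDF of the first component, then the conditional CDF of the next one given the first (a Rosenblatt-type transform), maps the unit square bijectively onto the support of the prior. The following is a minimal sketch under stated assumptions, not the flow-based construction of the paper: the bivariate Normal prior and its means, scales and correlation are hypothetical values chosen for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def prior_transform(u, mu=(0.0, 1.0), sd=(1.0, 2.0), rho=0.5):
    """Map u in the open unit square (0, 1)^2 to a draw from a (hypothetical)
    bivariate Normal prior by inverting the marginal CDF of x1, then the
    conditional CDF of x2 given x1 (a Rosenblatt-type transform)."""
    u1, u2 = u
    x1 = NormalDist(mu[0], sd[0]).inv_cdf(u1)
    # x2 | x1 is Normal with a shifted mean and a reduced standard deviation
    cond_mean = mu[1] + rho * sd[1] / sd[0] * (x1 - mu[0])
    cond_sd = sd[1] * math.sqrt(1.0 - rho ** 2)
    x2 = NormalDist(cond_mean, cond_sd).inv_cdf(u2)
    return x1, x2


# Usage: push Uniform draws through f and check the prior moments.
rng = random.Random(42)
draws = [prior_transform((rng.random(), rng.random())) for _ in range(20_000)]
m1 = fmean(x for x, _ in draws)
m2 = fmean(y for _, y in draws)
cov = fmean((x - m1) * (y - m2) for x, y in draws)
s1 = math.sqrt(fmean((x - m1) ** 2 for x, _ in draws))
s2 = math.sqrt(fmean((y - m2) ** 2 for _, y in draws))
corr = cov / (s1 * s2)
```

The same recipe extends to any proper prior with computable conditional quantile functions, which is why the existence of such a bijection is not the difficult part.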

“One particularly common scenario where this arises is when one wants to use the (sampled) posterior from one experiment as the prior for another”

But I remained uncertain about the need for this representation when implementing nested sampling, as I do not see how it helps in bypassing the hurdle of simulating from the prior constrained by increasing levels of the likelihood function. It would be helpful to construct normalising flows adapted to the truncated priors, but I did not see anything related to this version in the paper.
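To make the hurdle concrete, here is a minimal sketch of vanilla nested sampling in which each new live point is obtained by plain rejection from the prior, simulated through a unit-hypercube transform, until its likelihood exceeds the current threshold. This is the naive baseline, not the method of the paper; the toy Normal likelihood, the Uniform prior on [-5,5]², the helper names and the crude deterministic shrinkage X_k = exp(-k/N) are all assumptions for illustration.

```python
import math
import random


def logaddexp(a, b):
    """log(exp(a) + exp(b)) computed stably."""
    m = max(a, b)
    if m == -math.inf:
        return -math.inf
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def nested_sampling(loglike, prior_transform, ndim, nlive=100, tol=1e-2, seed=1):
    """Vanilla nested sampling with rejection from the prior for the
    constrained draws. Returns (log evidence, number of prior draws)."""
    rng = random.Random(seed)

    def draw():  # one prior draw through the unit hypercube, as a log-likelihood
        return loglike(prior_transform([rng.random() for _ in range(ndim)]))

    live = [draw() for _ in range(nlive)]  # log-likelihoods of the live points
    ndraws = nlive
    log_z, log_x = -math.inf, 0.0
    shrink = math.log1p(-math.exp(-1.0 / nlive))  # log(1 - exp(-1/nlive))
    while True:
        worst = min(range(nlive), key=live.__getitem__)
        logl_min = live[worst]
        # weight w_k = X_{k-1} - X_k with the approximation X_k = exp(-k / nlive)
        log_z = logaddexp(log_z, logl_min + log_x + shrink)
        log_x -= 1.0 / nlive
        # the hurdle: simulate from the prior restricted to {L > L_min}
        while True:
            logl = draw()
            ndraws += 1
            if logl > logl_min:
                break
        live[worst] = logl
        # stop once the live points can add at most a fraction tol of Z
        if max(live) + log_x < log_z + math.log(tol):
            break
    # share the remaining prior volume evenly among the final live points
    m = max(live)
    log_mean_l = m + math.log(sum(math.exp(l - m) for l in live) / nlive)
    return logaddexp(log_z, log_mean_l + log_x), ndraws


def loglike(theta):  # toy standard bivariate Normal likelihood
    return -0.5 * sum(t * t for t in theta) - math.log(2 * math.pi)


def uniform_box(u):  # toy Uniform prior on [-5, 5]^2 via the hypercube
    return [10.0 * ui - 5.0 for ui in u]


log_z, ndraws = nested_sampling(loglike, uniform_box, ndim=2)
# analytic log evidence is about log(1/100), up to negligible edge effects
```

Having the prior in hypercube form makes each proposal cheap, but the acceptance rate of the rejection step is roughly the remaining prior volume X_k, so the number of prior draws grows exponentially with the iterations.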

The cosmological application therein deals with the incorporation of recent measurements in the study of the ΛCDM cosmological model, that is, more recent than the CMB Planck dataset we played with 15 years ago. (Time flies, even in an expanding Universe!) Namely, the Baryon Oscillation Spectroscopic Survey and the SH0ES collaboration.

## freedom prior

Posted in Books, Kids, Statistics on December 9, 2020 by xi'an

Another X validated question on which I spent more time than expected, because of the somewhat unusual parameterisation used in BDA for the inverse χ² distribution. The interest behind the question is in the induced distribution on the parameter associated with the degrees of freedom ν of the t-distribution (a question that coincided with my latest modifications to my undergraduate mathematical statistics exam, involving a t sample). Whichever prior is chosen on ν, the posterior involves a nasty term

$\pi(\nu)\,\frac{(\nu/2)^{n\nu/2}\,\sigma^{n\nu}}{\Gamma(\nu/2)^n}\,(v_1\cdots v_n)^{-\nu/2-1}\exp\Big\{-\nu\sigma^2\sum_{i=1}^n 1\big/2v_i\Big\}$

as the Gamma function there is quickly explosive (as can be checked via Stirling’s formula). Unless the prior π(ν) cancels this term, which is rather fishy as the prior would then depend on the sample size n. Even though the whole posterior is well-defined (and hence non-explosive). Rather than seeking a special prior π(ν) for computational purposes, I would thus favour a modelling restricted to integer-valued ν’s, as there is not much motivation in inferring about non-integer degrees of freedom.
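The integer-valued alternative is easy to compute once everything stays on the log scale. This is a minimal sketch, assuming the v_1, ..., v_n are i.i.d. scaled inverse χ²(ν, σ²) in the BDA parameterisation, with density (ν/2)^{ν/2} σ^ν v^{-ν/2-1} exp(-νσ²/2v) / Γ(ν/2), and a flat prior on {1, ..., ν_max}; the values of v, σ² and ν_max are made up for illustration. Using `math.lgamma` avoids ever forming Γ(ν/2)^n, which overflows a double for moderate ν.

```python
import math


def log_nu_term(nu, v, sigma2, log_prior=lambda nu: 0.0):
    """Log of the unnormalised posterior term in nu, assuming v_1..v_n are
    i.i.d. scaled inverse chi-squared(nu, sigma2) as parameterised in BDA."""
    n = len(v)
    return (log_prior(nu)
            + 0.5 * n * nu * math.log(nu / 2.0)      # (nu/2)^{n nu/2}
            + 0.5 * n * nu * math.log(sigma2)        # sigma^{n nu}
            - n * math.lgamma(nu / 2.0)              # Gamma(nu/2)^{-n}, in logs
            - (nu / 2.0 + 1.0) * sum(math.log(x) for x in v)
            - 0.5 * nu * sigma2 * sum(1.0 / x for x in v))


def integer_nu_posterior(v, sigma2, nu_max=400):
    """Posterior over nu in {1, ..., nu_max} under a flat prior, normalised
    on the log scale (subtracting the maximum before exponentiating)."""
    logs = [log_nu_term(nu, v, sigma2) for nu in range(1, nu_max + 1)]
    m = max(logs)
    weights = [math.exp(l - m) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]


v = [0.8, 1.3, 0.5, 2.1, 1.0]  # hypothetical variances
post = integer_nu_posterior(v, sigma2=1.0)
mode = 1 + max(range(len(post)), key=post.__getitem__)
```

Restricting ν to a finite integer grid turns the normalisation into a plain sum, with no special prior needed to tame the Gamma term.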

## Bayesian non-parametrics

Posted in Statistics on April 8, 2013 by xi'an

Here is a short discussion I wrote yesterday with Judith Rousseau of a paper by Peter Müller and Riten Mitra to appear in Bayesian Analysis.

“We congratulate the authors for this very pleasant overview of the type of problems that are currently tackled by Bayesian nonparametric inference and for demonstrating how prolific this field has become. We do share the authors' viewpoint that many Bayesian nonparametric models allow for more flexible modelling than parametric models and thus capture finer details of the data. BNP can be a good alternative to complex parametric models in the sense that the computations are not necessarily more difficult in Bayesian nonparametric models. However, we would like to temper the enthusiasm of the authors since, although we believe that Bayesian nonparametrics has proved extremely useful and interesting, we think they oversell the “nonparametric side of the Force”! Our main point is that, by definition, Bayesian nonparametrics is based on prior probabilities that live on infinite dimensional spaces and thus are never completely swamped by the data. It is therefore crucial to understand which (or why!) aspects of the model are strongly influenced by the prior and how.

As an illustration, when looking at Example 1 with the censored zeroth cell, our reaction is that this is a problem with no proper solution, because it lacks too much information. In other words, unless some parametric structure of the model is known, in which case the zeroth cell is related to the other cells, we see no way to infer about the size of this cell. The outcome produced by the authors is therefore unconvincing to us in that it seems to reflect only the prior modelling (α,G*) and not the information contained in the data. Now, this prior modelling may be to some extent justified based on side information about the medical phenomenon under study, however its impact on the resulting inference is palpable.

Recently (and even less recently) a few theoretical results have pointed out this very issue. E.g., Diaconis and Freedman (1986) showed that some priors could surprisingly lead to inconsistent posteriors, even though it was later shown that many priors lead to consistent posteriors and often even to optimal asymptotic frequentist estimators, see for instance van der Vaart and van Zanten (2009) and Kruijer et al. (2010). The worry about Bayesian nonparametrics truly appeared when considering (1) asymptotic frequentist properties of semi-parametric procedures; and (2) interpretation of inferential aspects of Bayesian nonparametric procedures. It was shown in various instances that some nonparametric priors which behaved very nicely for the estimation of the whole parameter could have disturbingly suboptimal behaviour for some specific functionals of interest, see for instance Arbel et al. (2013) and Rivoirard and Rousseau (2012). We do not claim here that asymptotics is the answer to everything; however, bad asymptotic behaviour shows that something is going wrong, and this helps in understanding the impact of the prior. These disturbing results are an illustration that in these infinite dimensional models the impact of the prior modelling is difficult to evaluate and that, although the prior looks very flexible, it can in fact be highly informative and/or restrictive for some aspects of the parameter. It would thus be wrong to conclude that every aspect of the parameter is well-recovered because some are. This has long been a well-known fact for parametric Bayesian models, leading to extensive research on reference and other types of objective priors. It is even more crucial in the nonparametric world. No (nonparametric) prior can be suited for every inferential aspect and it is important to understand which aspects of the parameter are well-recovered and which ones are not.

We also concur with the authors that Dirichlet mixture priors provide natural clustering mechanisms, but one may question the “natural” label as the resulting clustering is quite unstructured, growing in the number of clusters as the number of observations increases and not incorporating any prior constraint on the “definition” of a cluster, except the one implicit and well-hidden behind the non-parametric prior. In short, it is delicate to assess what is eventually estimated by these clustering methods.

These remarks are not to be taken as criticisms of the overall Bayesian nonparametric approach, quite the contrary. We simply emphasize (or recall) that there is no such thing as a free lunch and that we need to post the price to pay for potential customers. In these models, this is far from easy and just as far from being completed.”
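The point that Dirichlet mixture clustering keeps adding clusters as observations accumulate can already be seen on the prior side, through the Chinese restaurant process representation of the Dirichlet process partition: the expected number of clusters among n observations is the sum of α/(α+i) over i = 0, ..., n-1, growing like α log n. A minimal sketch, assuming a concentration α = 1 chosen for illustration; it describes the prior partition only, not a fitted mixture.

```python
import random


def expected_clusters(alpha, n):
    """Exact prior expectation of the number of clusters among n draws
    from a Dirichlet process with concentration alpha."""
    return sum(alpha / (alpha + i) for i in range(n))


def crp_counts(alpha, n, seed=0):
    """Simulate cluster sizes from the Chinese restaurant process."""
    rng = random.Random(seed)
    counts = []
    for i in range(n):
        r = rng.random() * (alpha + i)
        if r < alpha:        # new cluster with probability alpha / (alpha + i)
            counts.append(1)
            continue
        r -= alpha           # else join cluster k w.p. counts[k] / (alpha + i)
        for k, c in enumerate(counts):
            if r < c:
                counts[k] += 1
                break
            r -= c
        else:
            counts[-1] += 1  # guard against floating-point round-off
    return counts


for n in (100, 1_000, 10_000):
    print(n, round(expected_clusters(1.0, n), 2), len(crp_counts(1.0, n)))
```

Nothing in this mechanism refers to what a cluster should look like: the growth in the number of clusters is dictated by α and n alone.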

References

• Arbel, J., Gayraud, G., and Rousseau, J. (2013). Bayesian adaptive optimal estimation using a sieve prior. Scand. J. Stat., to appear.

• Diaconis, P. and Freedman, D. (1986). On the consistency of Bayes estimates. Ann. Statist., 14:1-26.

• Kruijer, W., Rousseau, J., and van der Vaart, A. (2010). Adaptive Bayesian density estimation with location-scale mixtures. Electron. J. Stat., 4:1225-1257.

• Rivoirard, V. and Rousseau, J. (2012). On the Bernstein-von Mises theorem for linear functionals of the density. Ann. Statist., 40:1489-1523.

• van der Vaart, A. and van Zanten, J. H. (2009). Adaptive Bayesian estimation using a Gaussian random field with inverse Gamma bandwidth. Ann. Statist., 37:2655-2675.