Archive for dominating measure

O’Bayes 19/3

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , on July 2, 2019 by xi'an

Nancy Reid gave the first talk of the [Canada] day, in an impressive comparison of all approaches in statistics that involve a distribution of sorts on the parameter, connected with the presentation she gave at BFF4 in Harvard two years ago, including safe Bayes options this time. This was related to several (most?) of the talks at the conference, given the level of worry (!) about the choice of a prior distribution. But the main assessment of the methods still seemed to be centred on a frequentist notion of calibration, meaning that epistemic interpretations of probabilities and hence most of Bayesian answers were disqualified from the start.

In connection with Nancy’s focus, Peter Hoff’s talk also concentrated on frequency valid confidence intervals in (linear) hierarchical models. Using prior information or structure to build better and shrinkage-like confidence intervals at a given confidence level. But not in the decision-theoretic way adopted by George Casella, Bill Strawderman and others in the 1980’s. And also making me wonder at the relevance of contemplating a fixed coverage as a natural goal. Above, a side result shown by Peter that I did not know and which may prove useful for Monte Carlo simulation.

Jaeyong Lee worked on a complex model for banded matrices that starts with a regular Wishart prior on the unrestricted space of matrices, computes the posterior and then projects this distribution onto the constrained subspace. (There is a rather consequent literature on this subject, including works by David Dunson in the past decade of which I was unaware.) This is a smart demarginalisation idea but I wonder a wee bit at the notion as the constrained space has measure zero for the larger model. This could explain for the resulting posterior not being a true posterior for the constrained model in the sense that there is no prior over the constrained space that could return such a posterior. Another form of marginalisation paradox. The crux of the paper is however about constructing a functional form of minimaxity. In his discussion of the paper, Guido Consonni provided a representation of the post-processed posterior (P³) that involves the Dickey-Savage ratio, sort of, making me more convinced of the connection.

As a lighter aside, one item of local information I should definitely have broadcasted more loudly and long enough in advance to the conference participants is that the University of Warwick is not located in ye olde town of Warwick, where there is no university, but on the outskirts of the city of Coventry, but not to be confused with the University of Coventry. Located in Coventry.

 

posterior distribution missing the MLE

Posted in Books, Kids, pictures, Statistics with tags , , , , , , , on April 25, 2019 by xi'an

An X validated question as to why the MLE is not necessarily (well) covered by a posterior distribution. Even for a flat prior… Which in restrospect highlights the fact that the MLE (and the MAP) are invasive species in a Bayesian ecosystem. Since they do not account for the dominating measure. And hence do not fare well under reparameterisation. (As a very much to the side comment, I also managed to write an almost identical and simultaneous answer to the first answer to the question.)

dominating measure

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , on March 21, 2019 by xi'an

Yet another question on X validated reminded me of a discussion I had once  with Jay Kadane when visiting Carnegie Mellon in Pittsburgh. Namely the fundamentally ill-posed nature of conjugate priors. Indeed, when considering the definition of a conjugate family as being a parameterised family Þ of distributions over the parameter space Θ stable under transform to the posterior distribution, this property is completely dependent (if there is such a notion as completely dependent!) on the dominating measure adopted on the parameter space Θ. Adopted is the word as there is no default, reference, natural, &tc. measure that promotes one specific measure on Θ as being the dominating measure. This is a well-known difficulty that also sticks out in most “objective Bayes” problems, as well as with maximum entropy priors. This means for instance that, while the Gamma distributions constitute a conjugate family for a Poisson likelihood, so do the truncated Gamma distributions. And so do the distributions which density (against a Lebesgue measure over an arbitrary subset of (0,∞)) is the product of a Gamma density by an arbitrary function of θ. I readily acknowledge that the standard conjugate priors as introduced in every Bayesian textbook are standard because they facilitate (to a certain extent) posterior computations. But, just like there exist an infinity of MaxEnt priors associated with an infinity of dominating measures, there exist an infinity of conjugate families, once more associated with an infinity of dominating measures. And the fundamental reason is that the sampling model (which induces the shape of the conjugate family) does not provide a measure on the parameter space Θ.

risk-adverse Bayes estimators

Posted in Books, pictures, Statistics with tags , , , , , , , , , , on January 28, 2019 by xi'an

An interesting paper came out on arXiv in early December, written by Michael Brand from Monash. It is about risk-adverse Bayes estimators, which are defined as avoiding the use of loss functions (although why avoiding loss functions is not made very clear in the paper). Close to MAP estimates, they bypass the dependence of said MAPs on parameterisation by maximising instead π(θ|x)/√I(θ), which is invariant by reparameterisation if not by a change of dominating measure. This form of MAP estimate is called the Wallace-Freeman (1987) estimator [of which I never heard].

The formal definition of a risk-adverse estimator is still based on a loss function in order to produce a proper version of the probability to be “wrong” in a continuous environment. The difference between estimator and true value θ, as expressed by the loss, is enlarged by a scale factor k pushed to infinity. Meaning that differences not in the immediate neighbourhood of zero are not relevant. In the case of a countable parameter space, this is essentially producing the MAP estimator. In the continuous case, for “well-defined” and “well-behaved” loss functions and estimators and density, including an invariance to parameterisation as in my own intrinsic losses of old!, which the author calls likelihood-based loss function,  mentioning f-divergences, the resulting estimator(s) is a Wallace-Freeman estimator (of which there may be several). I did not get very deep into the study of the convergence proof, which seems to borrow more from real analysis à la Rudin than from functional analysis or measure theory, but keep returning to the apparent dependence of the notion on the dominating measure, which bothers me.

questioning the Bayesian choice

Posted in Books, Kids, Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , on November 13, 2018 by xi'an

When I woke up (very early!) in Oaxaca, on Remembrance Day, I noticed a long question about The Bayesian Choice contents on X validated. The originator of the question (OQ) was puzzled about several statements in the book on maximum entropy  methods, from the nature of the moment constraints, outside standard moments, to the existence of a maximum entropy prior (as exemplified by quantiles), to the deeper issue of the ultimate arbitrariness of “the” maximum entropy prior since it is also determined by the choice of the dominating measure. (Never neglect the dominating measure!) A more challenging concept, hopefully making it home with the OQ.