Archive for confidence distribution

structure and uncertainty, Bristol, Sept. 26

Posted in Books, pictures, R, Running, Statistics, Travel, University life, Wines on September 27, 2012 by xi'an

Another day full of interesting and challenging talks (challenging in the sense that they generated new questions for me) at the SuSTain workshop. After another (dry and fast) run around the Downs, Leo Held started the talks with one of my favourite topics, namely the theory of g-priors in generalized linear models. He did bring a new perspective on the subject, introducing the notion of a testing Bayes factor based on the residual statistic produced by a classical (maximum likelihood) analysis, connected with earlier works of Valen Johnson. While I did not truly get the motivation for switching from the original data to this less informative quantity, I find that this perspective opens new questions for dealing with settings where the true data is replaced with one or several classical statistics. With possible strong connections to ABC, of course. Incidentally, Leo managed to produce a napkin with Peter Green’s intro to MCMC dating back to their first meeting in 1994: a feat I certainly could not reproduce (as I also met both Peter and Leo for the first time in 1994, at CIRM)… Then Richard Everitt presented his recent JCGS paper on Bayesian inference on latent Markov random fields, centred on the issue that simulating the latent MRF involves an MCMC step that is not exact (as in our earlier ABC paper for Ising models with Aude Grelaud). I already discussed this paper in an earlier blog post and the only additional question that comes to my mind is whether or not a comparison with the auxiliary variable approach of Møller et al. (2006) would make sense.

In the intermission, I had a great conversation with Oliver Ratmann about his talk of yesterday, on the surprising feature that some models produce as “data” some sample from a pseudo-posterior. Opening once again new vistas! The following talks were more on the mathematical side, with James Cussens focussing on the use of integer programming for Bayesian variable selection, then Éric Moulines presenting a recent work with a PhD student of his on PAC-Bayesian bounds and the superiority of combining experts. Including a CRAN package. Éric concluded his talk with the funny occurrence of Peter’s photograph on Éric’s own Microsoft Research profile page, due to Éric posting our joint photograph at the top of Pic du Midi d’Ossau in 2005… (He concluded with a picture of the mountain that was the exact mirror image of mine yesterday!)

The afternoon was equally superb, with Gareth Roberts covering fifteen years of scaling MCMC algorithms, from the mythical 0.234 figure to the optimal temperature decrease in simulated annealing, and John Kent playing the outlier with an EM algorithm (however including a formal prior distribution) and raising the challenge as to why Bayesians never had to constrain the posterior expectation, which prompted me to infer that (a) the prior distribution should include all constraints and (b) the posterior expectation was not the “right” tool in non-convex parameter spaces. Natalia Bochkina presented a recent work, joint with Peter Green, on connecting image analysis with Bayesian asymptotics, reminding me of my early attempts at reading Ibragimov and Has’minskii in the 1990’s. Then a second work with Vladimir Spokoiny on Bayesian asymptotics with misspecified models, introducing a new notion of effective dimension. The last talk of the day was by Nils Hjort about his coming book on “Credibility, confidence and likelihood” (not yet advertised by CUP), which sounds like an attempt at resuscitating Fisher by deriving distributions in the parameter space from frequentist confidence intervals. I already discussed this notion in an earlier blog post, so I am fairly skeptical about it, but the talk was representative of Nils’ highly entertaining and thought-provoking style! Esp. as he sprinkled the talk with examples where MLE (and some default Bayes estimators) did not work. And reanalysed one of Chris Sims’ examples presented during his Nobel Prize talk…

Confidence distributions

Posted in Books, Statistics, Travel, University life on June 11, 2012 by xi'an

I was asked by the International Statistical Review editor, Marc Hallin, for a discussion of the paper “Confidence distribution, the frequentist distribution estimator of a parameter: a review” by Min-ge Xie and Kesar Singh, both from Rutgers University. Although the paper is not available on-line, similar and recent reviews and articles can be found, in a 2007 IMS Monograph and a 2012 JASA paper both with Bill Strawderman, as well as a chapter in the recent Festschrift for Bill Strawderman. The notion of confidence distribution is quite similar to the one of fiducial distribution, introduced by R.A. Fisher, and they both share in my opinion the same drawback, namely that they aim at a distribution over the parameter space without specifying (at least explicitly) a prior distribution. Furthermore, the way the confidence distribution is defined perpetuates the on-going confusion between confidence and credible intervals, in that the cdf on the parameter θ is derived via the inversion of a confidence upper bound (or, equivalently, of a p-value…). Even though this inversion properly defines a cdf on the parameter space, there is no particular validity in the derivation. Either the confidence distribution corresponds to a genuine posterior distribution, in which case I think the only possible interpretation is a Bayesian one. Or the confidence distribution does not correspond to a genuine posterior distribution, because no prior can lead to this distribution, in which case there is a probabilistic impossibility in using this distribution. As a result, my discussion (now posted on arXiv) is rather negative about the benefits of this notion of confidence distribution.
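To make the inversion concrete, here is a minimal sketch in the simplest textbook setting, which is an assumption chosen for illustration and not an example taken from the review: a normal sample of size n with known σ and observed mean x̄. The cdf on θ is obtained as the p-value of the one-sided test of "mean ≤ θ" against "mean > θ", the upper confidence bound is recovered by solving H(θ) = level numerically, and the result is compared with the posterior cdf under a flat prior, which is the case where the confidence distribution coincides with a genuine (if improper-prior) posterior. All function names and the numerical values are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def confidence_cdf(theta, xbar, sigma, n):
    """H(theta): p-value of H0: mean <= theta vs H1: mean > theta,
    i.e. P_theta(sample mean >= observed xbar)."""
    z = (xbar - theta) * math.sqrt(n) / sigma
    return 1.0 - norm_cdf(z)


def flat_prior_posterior_cdf(theta, xbar, sigma, n):
    """Posterior cdf of theta under a flat prior: theta | x ~ N(xbar, sigma^2/n)."""
    return norm_cdf((theta - xbar) * math.sqrt(n) / sigma)


def upper_confidence_bound(level, xbar, sigma, n, tol=1e-10):
    """Solve H(theta) = level by bisection (H is increasing in theta)."""
    se = sigma / math.sqrt(n)
    lo, hi = xbar - 10.0 * se, xbar + 10.0 * se
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if confidence_cdf(mid, xbar, sigma, n) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    xbar, sigma, n = 1.3, 2.0, 25  # hypothetical data summary
    for theta in (0.5, 1.3, 2.0):
        print(theta,
              confidence_cdf(theta, xbar, sigma, n),
              flat_prior_posterior_cdf(theta, xbar, sigma, n))
    print(upper_confidence_bound(0.975, xbar, sigma, n))
```

In this location model the two cdfs agree at every θ, so the confidence distribution has a Bayesian reading; the sketch says nothing about settings where no prior reproduces the confidence distribution.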

One entry in the review, albeit peripheral, attracted my attention. The authors mention a tech’ report where they exhibit a paradoxical behaviour of a Bayesian procedure: given a (skewed) prior on a pair (p0,p1), and a binomial likelihood, the posterior distribution on p1-p0 has its main mass in the tails of both the prior and the likelihood (“the marginal posterior of d = p1-p0 is more extreme than its prior and data evidence!”). The information provided in the paper is rather sparse on the genuine experiment and looking at two possible priors exhibited nothing of the kind… I went to the authors’ webpages and found a more precise explanation on Min-ge Xie’s page:

Although the contour plot of the posterior distribution sits between those of the prior distribution and the likelihood function, its projected peak is more extreme than the other two. Further examination suggests that this phenomenon is genuine in binomial clinical trials and it would not go away even if we adopt other (skewed) priors (for example, the independent beta priors used in Joseph et al. (1997)). In fact, as long as the center of a posterior distribution is not on the line joining the two centers of the joint prior and likelihood function (as it is often the case with skewed distributions), there exists a direction along which the marginal posterior fails to fall between the prior and likelihood function of the same parameter.

and a link to another paper. Reading through the paper (and in particular Section 4), it appears that the above “paradoxical” picture is the result of the projections of the joint distributions represented in this second picture. By projection, I presume the authors mean integrating out the second component, e.g. p1+p0. This indeed provides the marginal prior of p1-p0, the marginal posterior of p1-p0, but…not the marginal likelihood of p1-p0! This entity is not defined, once again because there is no reference measure on the parameter space which could justify integrating out some parameters in the likelihood. (Overall, I do not think the “paradox” is overwhelming: the joint posterior distribution does precisely the merging of prior and data information we would expect and it is not like the marginal posterior is located in zones with zero prior probability and zero (profile) likelihood. I am also always wary of arguments based on modes, since those are highly dependent on parameterisation.)
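The dependence of modes on the parameterisation can be checked on a toy case. The sketch below assumes a Beta(3, 9) distribution on a probability p, a hypothetical choice unrelated to the priors or posteriors of the binomial example above, and locates the mode by grid search twice: once for the density of p, once for the density of η = logit(p) (which picks up the Jacobian p(1−p)), mapping the second mode back to the p scale.

```python
import math


def mode_on_probability_scale(a, b, grid=10_000):
    """Grid-search the mode of the Beta(a, b) density in p itself."""
    best_p, best_val = None, -math.inf
    for i in range(1, grid):
        p = i / grid
        # log-density of p, up to an additive constant
        val = (a - 1) * math.log(p) + (b - 1) * math.log(1 - p)
        if val > best_val:
            best_p, best_val = p, val
    return best_p


def mode_on_logit_scale(a, b, lo=-10.0, hi=10.0, grid=20_000):
    """Grid-search the mode of the density of eta = logit(p), then map back to p.

    The Jacobian dp/d(eta) = p(1 - p) turns the log-density into
    a*log(p) + b*log(1 - p), up to an additive constant.
    """
    best_eta, best_val = None, -math.inf
    for i in range(grid + 1):
        eta = lo + (hi - lo) * i / grid
        log_p = -math.log1p(math.exp(-eta))   # log p
        log_q = -math.log1p(math.exp(eta))    # log (1 - p)
        val = a * log_p + b * log_q
        if val > best_val:
            best_eta, best_val = eta, val
    return 1.0 / (1.0 + math.exp(-best_eta))


if __name__ == "__main__":
    a, b = 3, 9  # hypothetical Beta parameters
    print(mode_on_probability_scale(a, b), (a - 1) / (a + b - 2))
    print(mode_on_logit_scale(a, b), a / (a + b))
```

The closed forms printed alongside are (a−1)/(a+b−2) on the p scale and a/(a+b) on the logit scale, so the same distribution "peaks" at two different values of p depending on how it is parameterised.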

Most unfortunately, when searching for more information on the authors’ webpages, I came upon the sad news that Professor Singh had passed away three weeks ago, at the age of 56. (Professor Xie wrote a touching eulogy of his friend and co-author.) I had only met briefly with Professor Singh during my visit to Rutgers two months ago, but he sounded like an academic who would have enjoyed the kind of debate sketched in my discussion. To the much more important loss to family, friends and faculty represented by Professor Singh’s demise, I thus add the loss of the intellectual challenge of crossing arguments with him. And I look forward to discussing the issues with the first author of the paper, Professor Xie.
