## Confidence distributions

**I** was asked by the International Statistical Review editor, Marc Hallin, for a discussion of the paper “Confidence distribution, the frequentist distribution estimator of a parameter — a review” by Min-ge Xie and Kesar Singh, both from Rutgers University. Although the paper is not available on-line, similar and recent reviews and articles can be found in a 2007 IMS Monograph and a 2012 JASA paper, both with Bill Strawderman, as well as in a chapter in the recent Festschrift for Bill Strawderman. The notion of confidence distribution is quite similar to the one of fiducial distribution, introduced by R.A. Fisher, and they both share in my opinion the same drawback, namely that they aim at a distribution over the parameter space without specifying (at least explicitly) a prior distribution. Furthermore, the way the confidence distribution is defined perpetuates the on-going confusion between confidence and credible intervals, in that the cdf on the parameter *θ* is derived via the inversion of a confidence upper bound (or, equivalently, of a *p*-value…). Even though this inversion properly defines a cdf on the parameter space, there is no particular validity in the derivation. Either the confidence distribution corresponds to a genuine posterior distribution, in which case I think the only possible interpretation is a Bayesian one, or it does not correspond to a genuine posterior distribution, because no prior can lead to this distribution, in which case there is a probabilistic impossibility in using this distribution. As a result, my discussion (now posted on arXiv) is rather negative about the benefits of this notion of confidence distribution.
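To make the inversion concrete, here is a minimal sketch in the textbook case of a normal mean with known variance. This is an assumption chosen for illustration, not an example taken from the Xie and Singh review, and the function names are hypothetical. With n observations of mean x̄ and known σ, the level-α upper confidence bound is x̄ + z_α σ/√n, and inverting it in α gives the function H(θ) = Φ(√n(θ − x̄)/σ), which is also the one-sided *p*-value for testing that the mean is at most θ. In this particular case H happens to coincide with the posterior cdf under a flat (improper) prior.

```python
import math

def normal_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def confidence_cdf(theta, xbar, sigma, n):
    """H(theta): the confidence level at which theta is the upper bound.

    Assumes a normal sample of size n with known standard deviation sigma.
    """
    return normal_cdf(math.sqrt(n) * (theta - xbar) / sigma)

def upper_bound(alpha, xbar, sigma, n):
    """Level-alpha upper confidence bound, recovered by inverting H."""
    se = sigma / math.sqrt(n)
    lo, hi = xbar - 50.0 * se, xbar + 50.0 * se  # bracket wide enough for any usual alpha
    for _ in range(200):  # plain bisection on the monotone function H
        mid = 0.5 * (lo + hi)
        if confidence_cdf(mid, xbar, sigma, n) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    xbar, sigma, n = 1.0, 2.0, 25  # hypothetical summary statistics
    for theta in (0.5, 1.0, 1.5):
        print(theta, confidence_cdf(theta, xbar, sigma, n))
    print("95% upper bound:", upper_bound(0.95, xbar, sigma, n))
```

The sketch only shows that the inversion yields a proper cdf in θ; it does not settle whether that cdf can be read as a probability statement about θ.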

**O**ne entry in the review, albeit peripheral, attracted my attention. The authors mention a tech’ report where they exhibit a paradoxical behaviour of a Bayesian procedure: given a (skewed) prior on a pair (p_{0},p_{1}), and a binomial likelihood, the posterior distribution on p_{1}-p_{0} has its main mass in the tails of both the prior and the likelihood (“the marginal posterior of d = p_{1}-p_{0} is more extreme than its prior and data evidence!”). The information provided in the paper is rather sparse on the genuine experiment and looking at two possible priors exhibited nothing of the kind… I went to the authors’ webpages and found a more precise explanation on Min-ge Xie’s page:

Although the contour plot of the posterior distribution sits between those of the prior distribution and the likelihood function, its projected peak is more extreme than the other two. Further examination suggests that this phenomenon is genuine in binomial clinical trials and it would not go away even if we adopt other (skewed) priors (for example, the independent beta priors used in Joseph et al. (1997)). In fact, (as it is often the case with skewed distributions), there exists a direction along which the marginal posterior fails to fall between the prior and likelihood function of the same parameter as long as the center of a posterior distribution is not on the line joining the two centers of the joint prior and likelihood function.

and a link to another paper. Reading through the paper (and in particular Section 4), it appears that the above “paradoxical” picture is the result of the projections of the joint distributions represented in this second picture. By projection, I presume the authors mean integrating out the second component, e.g. p_{1}+p_{0}. This indeed provides the marginal prior of p_{1}-p_{0}, the marginal posterior of p_{1}-p_{0}, but…not the marginal likelihood of p_{1}-p_{0}! This entity is not defined, once again because there is no reference measure on the parameter space which could justify integrating out some parameters in the likelihood. (Overall, I do not think the “paradox” is overwhelming: the joint posterior distribution does precisely the merging of prior and data information we would expect and it is not like the marginal posterior is located in zones with zero prior probability and zero (profile) likelihood. I am also always wary of arguments based on modes, since those are highly dependent on parameterisation.)
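The comparison discussed above can be sketched numerically. The code below assumes independent Beta priors and made-up trial counts (all values are hypothetical, not those of the authors' paper). It simulates the marginal prior and the marginal posterior of d = p_{1}-p_{0}, and, since a marginal likelihood of d is not defined, it falls back on the profile likelihood, which maximises over the nuisance parameter instead of integrating it out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data and priors, chosen only for illustration.
n0, x0 = 20, 4      # control arm: trials, successes
n1, x1 = 20, 9      # treatment arm: trials, successes
a0, b0 = 2.0, 8.0   # skewed Beta prior on p0
a1, b1 = 2.0, 8.0   # skewed Beta prior on p1

def draw_d(alpha0, beta0, alpha1, beta1, size=200_000):
    """Monte Carlo draws of d = p1 - p0 under independent Beta laws."""
    p0 = rng.beta(alpha0, beta0, size)
    p1 = rng.beta(alpha1, beta1, size)
    return p1 - p0

# Marginal prior and marginal posterior of d (Beta priors are conjugate).
prior_d = draw_d(a0, b0, a1, b1)
post_d = draw_d(a0 + x0, b0 + n0 - x0, a1 + x1, b1 + n1 - x1)

def profile_loglik(d, grid=np.linspace(1e-4, 1 - 1e-4, 2000)):
    """Profile log-likelihood of d: maximise over p0 with p1 = p0 + d."""
    p0 = grid
    p1 = grid + d
    ok = (p1 > 0) & (p1 < 1)          # keep only admissible pairs
    p0, p1 = p0[ok], p1[ok]
    ll = (x0 * np.log(p0) + (n0 - x0) * np.log1p(-p0)
          + x1 * np.log(p1) + (n1 - x1) * np.log1p(-p1))
    return ll.max()

d_grid = np.linspace(-0.9, 0.9, 181)
prof = np.array([profile_loglik(d) for d in d_grid])
d_hat = d_grid[prof.argmax()]         # maximiser of the profile likelihood

print("prior mean of d:     ", prior_d.mean())
print("posterior mean of d: ", post_d.mean())
print("profile maximiser:   ", d_hat)
```

Histograms of `prior_d` and `post_d` can then be set against `np.exp(prof - prof.max())`, keeping in mind that the profile curve is not a density in d and that the location of any mode changes under reparameterisation.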

**M**ost unfortunately, when searching for more information on the authors’ webpages, I came upon the sad news that Professor Singh had passed away three weeks ago, at the age of 56. (Professor Xie wrote a touching eulogy of his friend and co-author.) I had only met briefly with Professor Singh during my visit to Rutgers two months ago, but he sounded like an academic who would have enjoyed the kind of debate drafted by my discussion. To the much more important loss to family, friends and faculty represented by Professor Singh’s demise, I thus add the loss of the intellectual challenge of crossing arguments with him. And I look forward to discussing the issues with the first author of the paper, Professor Xie.


June 14, 2012 at 6:49 pm

Dear Professor,

I keep in mind the construction of credible intervals. For example, let t be the parameter and X the observed data, and suppose we try to find the 90% credible interval.

Case A: pi(t) is a flat prior (let it be 1).

0.9 = int_a^b { P(X|t) dt }.

Case B: pi(t) depends on t, i.e.

0.9 = int_c^d { P(X|t) pi(t) dt }.

In fact, in case B each value of P(X|t) is given the weight pi(t), i.e. we change the scale of the t axis and, correspondingly, we carry this change over to the upper and lower bounds of integration: a and b go to c and d in accordance with the function pi(t).

Of course, this numerical method is correct for solving most tasks, but we lose the probabilistic sense of this inference. How can the posterior distribution be used within the probabilistic paradigm?

Sergey Bityukov

June 14, 2012 at 10:09 pm

I still do not get it: the probability measure has been modified when using another prior, but it still remains a probability measure.

June 15, 2012 at 9:40 am

In the appendix of this reference you will find a paradox. Any prior except the uniform one breaks the conservation of probability.

http://xxx.lanl.gov/abs/physics/0403069

February 25, 2013 at 6:36 am

One comment: to conserve probability as a measure, you must redefine the confidence interval (see, for example, http://xxx.lanl.gov/abs/1209.6545) when a new prior is used.

June 11, 2012 at 6:01 pm

Dear Professor,

We have tried to construct several examples of confidence distributions that conserve probability. This points to the possibility of Monte Carlo constructions of confidence distributions. Confidence distributions are, in our opinion, a very useful notion for combining results. Here is the link to our paper in AIP Proceedings (MaxEnt’2010):

http://proceedings.aip.org/resource/2/apcpcs/1305/1/346_1

Sincerely yours,

Dr. Sergey Bityukov

Institute for High Energy Physics, Protvino, Moscow region, Russia

June 11, 2012 at 7:10 pm

Thank you for pointing out this reference. I just read through it and I am still unconvinced of the appeal of the approach, I am afraid! Indeed, there is no result in your paper that proceeds from the confidence distribution perspective: a confidence interval on the only parameter in the model, μ_{s}, could be derived another way, as it has been in your references [31,32,33]. In addition, I find that the construction of those densities on the parameters is fraught with measure-theoretic danger: in particular, your derivation (5) and the subsequent density on μ_{s} do not seem right. Indeed, in (5) the weights p_{0} and p_{1} depend on the parameters, which means that the weighted sum of the densities is not necessarily integrable to 1… (it does in the case ŝ=1!), but also that the subsequent density on μ_{s} depends on μ_{b}.

June 11, 2012 at 7:41 pm

Thank you for the quick answer and for your comments.

In (5) we suppose that mu_b is known (a constant). In principle, it is possible to use the distribution of mu_b (for example by Monte Carlo), but we did not consider this case.

June 13, 2012 at 6:21 pm

One more question in defense of the notion of confidence distribution.

The use of any prior in the Bayesian approach (except the uniform one, which is not a prior) is only a deformation of the abscissa (in the one-dimensional case), i.e., in fact, a reweighting of the probability. What is the meaning of a probability with a weight?

June 14, 2012 at 4:47 pm

I am afraid I do not get your question…