## prior sensitivity of the marginal likelihood

Posted in Books, pictures, Statistics, University life on June 27, 2022 by xi'an

Fernando Llorente and (Madrilene) coauthors have just arXived a paper on the safe use of prior densities for Bayesian model selection. Rather than blaming the Bayes factor, or excommunicating some improper priors, they consider in this survey solutions to design “objective” priors in model selection. (Writing this post made me realise I had forgotten to arXive a recent piece I wrote on the topic, based on short courses and blog pieces, for a forthcoming handbook on Bayesian advance(ment)s! Soon to be corrected.)

While intrinsically interested in the topic and hence in the study, I somewhat disagree with the perspective adopted by the authors. They for instance stick to the notion that a flat prior over the parameter space is appropriate as “the maximal expression of a non-informative prior” (despite depending on the parameterisation), over bounded sets at least, while advocating priors “with great scale parameter” otherwise. They also refer to Jeffreys (1939) priors, by which they mean estimation priors rather than testing priors, a distinction uncovered by Susie Bayarri and Gonzalo Garcia-Donato. Considering asymptotic consistency, they state that “in the asymptotic regime, Bayesian model selection is more sensitive to the sample size D than to the prior specifications”, which I find both imprecise and confusing, as my feeling is that the prior specification remains overly influential as the sample size increases. (In my view, consistency is a minimalist requirement, rather than “comforting”.) The argument therein that a flat prior is informative for model choice stems from the fact that the marginal likelihood goes to zero as the support of the prior goes to infinity, which may have been an earlier argument of Jeffreys’ (1939), but does not carry much weight as the property is shared by many other priors (as remarked later). Somehow, the penalisation aspect of the marginal is not exploited more deeply in the paper. In the “objective” Bayes section, they adhere to the (convenient but weakly supported) choice of a common prior on the nuisance parameters (shared by different models). Their main argument is to develop (heretic!) “data-based priors”, from Aitkin’s (1991, not cited) double use of the data (or setting the likelihood to the power two), all the way to the intrinsic and fractional Bayes factors of Tony O’Hagan (1995), Jim Berger and Luis Pericchi (1996), and to the expected posterior priors of Pérez and Berger (2002) on which I worked with Juan Cano and Diego Salmerón.
(While the presentation is made against a flat prior, nothing prevents the use of another reference, improper, prior.) A short section also mentions the X-validation approach(es) of Aki Vehtari and co-authors.
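To make the vanishing-marginal argument concrete, here is a minimal sketch under assumptions of my own choosing (not taken from the paper): a single observation x ~ N(θ, 1), a point null θ = 0, and two priors under the alternative, a uniform on (-a, a) and a normal N(0, τ²). The observation value and the scales are hypothetical. Both marginals shrink as the prior spreads out, which is the sense in which the property is shared by other priors.

```python
import math

def normal_pdf(z, var=1.0):
    """Density of N(0, var) at z."""
    return math.exp(-0.5 * z * z / var) / math.sqrt(2.0 * math.pi * var)

def normal_cdf(z):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def marginal_flat(x, a):
    """Marginal of x ~ N(theta, 1) when theta ~ U(-a, a)."""
    return (normal_cdf(a - x) - normal_cdf(-a - x)) / (2.0 * a)

def marginal_normal(x, tau):
    """Marginal of x ~ N(theta, 1) when theta ~ N(0, tau^2), i.e. N(0, 1 + tau^2)."""
    return normal_pdf(x, var=1.0 + tau * tau)

x = 1.5  # hypothetical observation
for scale in (5.0, 50.0, 500.0, 5000.0):
    m_flat = marginal_flat(x, scale)
    m_norm = marginal_normal(x, scale)
    # Bayes factor of the point null theta = 0 against each alternative
    print(f"scale={scale:7.0f}  m_flat={m_flat:.2e}  m_normal={m_norm:.2e}  "
          f"B01_flat={normal_pdf(x) / m_flat:.1f}  B01_normal={normal_pdf(x) / m_norm:.1f}")
```

In this toy setting the Bayes factor in favour of the null grows roughly linearly with the prior scale, whichever of the two priors is used.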

## confidence in confidence

Posted in Statistics, University life on June 8, 2022 by xi'an

[This is a ghost post that I wrote eons ago and which got lost in the meanwhile.]

Following the false confidence paper, Céline Cunen, Nils Hjort & Tore Schweder wrote a short paper in the same Proceedings A defending confidence distributions, and blaming the phenomenon on Bayesian tools, which “might have unfortunate frequentist properties”. Which comes as no surprise since Tore Schweder and Nils Hjort wrote a book promoting confidence distributions for statistical inference.

“…there will never be any false confidence, and we can trust the obtained confidence!”

Their re-analysis of Balch et al (2019) is that using a flat prior on the location (of a satellite) leads to a non-central chi-squared distribution as the posterior on the squared distance δ² (between two satellites). Which incidentally happens to be a case pointed out by Jeffreys (1939) against the use of the flat prior, as δ² has a constant bias of d (the dimension of the space) plus the non-centrality parameter. The re-analysis also offers a neat contrast between the posterior, with non-central chi-squared cdf with two degrees of freedom

$F(\delta)=\Gamma_2(\delta^2/\sigma^2;||y||^2/\sigma^2)$

and the confidence “cumulative distribution”

$C(\delta)=1-\Gamma_2(||y||^2/\sigma^2;\delta^2/\sigma^2)$

Cunen et al (2020) argue that the frequentist properties of the confidence distribution 1-C(R), where R is the impact distance, are robust to an increasing σ when the true value is also R. Which does not seem to demonstrate much. A second illustration of F and C when the distance δ varies and both σ and ||y||² are fixed is even more puzzling when the authors criticize the Bayesian credible interval for missing the “true” value of δ, as I find the statement meaningless for a fixed value of ||y||²… Looking forward to the third round, i.e., a rebuttal by Balch et al (2019)!
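For readers who want to look at the two curves side by side, here is a rough sketch that evaluates F and C as written above, with Γ₂(x; λ) taken as the cdf of a non-central chi-squared with two degrees of freedom and non-centrality λ. The cdf is computed from its Poisson-mixture representation using only the standard library; the helper names, the truncation at 200 terms, and the values of ||y|| and σ are my own assumptions, suitable for moderate non-centrality only.

```python
import math

def ncx2_cdf_2df(x, lam, terms=200):
    """Cdf at x of a non-central chi-squared with 2 df and non-centrality lam.

    Poisson(lam/2) mixture of central chi-squared cdfs with 2 + 2j df.
    Truncated series: adequate for moderate lam, not for very large values.
    """
    if x <= 0.0:
        return 0.0
    half_x, half_lam = x / 2.0, lam / 2.0
    pois = math.exp(-half_lam)   # Poisson weight for j = 0
    term = math.exp(-half_x)     # exp(-x/2) (x/2)^i / i! for i = 0
    tail = term                  # P(chi2 with 2(j+1) df > x), here j = 0
    total = 0.0
    for j in range(terms):
        total += pois * (1.0 - tail)
        pois *= half_lam / (j + 1)
        term *= half_x / (j + 1)
        tail += term
    return total

def posterior_cdf(delta, y_norm, sigma):
    """F(delta): flat-prior posterior cdf of the distance delta."""
    return ncx2_cdf_2df(delta**2 / sigma**2, y_norm**2 / sigma**2)

def confidence_cdf(delta, y_norm, sigma):
    """C(delta): confidence 'cumulative distribution' of delta."""
    return 1.0 - ncx2_cdf_2df(y_norm**2 / sigma**2, delta**2 / sigma**2)

y_norm, sigma = 1.0, 1.0  # hypothetical observed distance and noise scale
for delta in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(f"delta={delta:.1f}  F={posterior_cdf(delta, y_norm, sigma):.4f}  "
          f"C={confidence_cdf(delta, y_norm, sigma):.4f}")
```

One feature visible at δ = 0 is that C starts at exp(-||y||²/2σ²) rather than at zero, that is, the confidence distribution carries a point mass at δ = 0, while F starts at zero.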

Posted in Books, Statistics, University life on May 15, 2015 by xi'an

[Here is a reply by Pierre Druilhet to my comments on his paper.]

There are several goals in the paper, the last one being the most important one.

The first one is to insist that considering θ as a parameter is not appropriate. We are in complete agreement on that point, but I prefer considering l(θ) as the parameter rather than N, mainly because it is much simpler. Knowing N, the law of l(θ) is given by the law of a random walk with 0 as a reflecting boundary (Jaynes, in his book, explores this link). So for a given prior on N, we can derive a prior on l(θ). Since the random process that generates N is completely unknown, except that N is probably large, the true law of l(θ) is completely unknown, so we may consider l(θ) as the parameter.

The second one is to state explicitly that a flat prior on θ implies an exponentially increasing prior on l(θ). As an anecdote, Stone, in 1972, warned against this kind of prior for Gaussian models. Another interesting anecdote is that he cited the novel by Abbott, “Flatland: A Romance of Many Dimensions”, which describes a world where the dimension is changed. This is exactly the case in the Flatland paradox (FP), since θ has to be seen in two dimensions rather than in one dimension.
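The exponential growth can be checked by brute force. The sketch below assumes the usual Flatland setting, which the reply does not spell out: θ is a reduced word over four letters (two generators and their inverses, written here as a, b, A, B), and l(θ) is its length. Under a flat prior on θ, the mass given to l(θ) = n is proportional to the number of reduced words of length n.

```python
# Each letter maps to its inverse; a reduced word never has a letter
# immediately followed by its inverse (such pairs annihilate).
INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}

def reduced_words(n):
    """List all reduced words of length n over the four letters."""
    words = [""]
    for _ in range(n):
        words = [w + c for w in words for c in INVERSE
                 if not w or INVERSE[w[-1]] != c]
    return words

# Unnormalised flat-prior mass on l(theta) = n is the number of such words.
for n in range(7):
    print(n, len(reduced_words(n)))
```

For n ≥ 1 the count is 4·3^(n-1), so the induced (improper) prior on l(θ) grows geometrically with rate 3.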

The third one is to make a distinction between randomness of the parameter and prior distribution, each one having its own rule. This point is extensively discussed in Section 2.3.
– In the intuitive reasoning, the probability of no annihilation involves the true joint distribution on (θ, x) and therefore the true unknown distribution of θ.
– In the Bayesian reasoning, the posterior probability of no annihilation is derived from the prior distribution, which is improper. The underlying idea is that a prior distribution does not obey probability rules but belongs to a projective space of measures. This is especially true if the prior does not represent accurate knowledge. In that case, there is no discontinuity between proper and improper priors and therefore the impropriety of the distribution is not a key point. In that context, the joint and marginal distributions are irrelevant, not because the prior is improper, but because it is a prior and not a true law. If the prior were the true probability law of θ, then the flat distribution could not be considered as a limit of probability distributions.

For most applications, the distinction between prior and probability law is not necessary and even pedantic, but it may appear essential in some situations. For example, in the Jeffreys-Lindley paradox, we may note that the construction of the prior is not compatible with the projective space structure.