## prior sensitivity of the marginal likelihood

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , , on June 27, 2022 by xi'an

Fernando Llorente and (Madrilene) coauthors have just arXived a paper on the safe use of prior densities for Bayesian model selection. Rather than blaming the Bayes factor, or excommunicating some improper priors, they consider in this survey solutions to design “objective” priors in model selection. (Writing this post made me realised I had forgotten to arXive a recent piece I wrote on the topic, based on short courses and blog pieces, for an incoming handbook on Bayesian advance(ment)s! Soon to be corrected.)

While intrinsically interested in the topic and hence with the study, I somewhat disagree with the perspective adopted by the authors. They for instance stick to the notion that a flat prior over the parameter space is appropriate as “the maximal expression of a non-informative prior” (despite depending on the parameterisation). Over bounded sets at least, while advocating priors “with great scale parameter” otherwise. They also refer to Jeffreys (1939) priors, by which they mean estimation priors rather than testing priors. As uncovered by Susie Bayarri and Gonzalo Garcia-Donato. Considering asymptotic consistency, they state that “in the asymptotic regime, Bayesian model selection is more sensitive to the sample size D than to the prior specifications”, which I find both imprecise and confusing,  as my feeling is that the prior specification remains overly influential as the sample size increases. (In my view, consistency is a minimalist requirement, rather than “comforting”.) The argument therein that a flat prior is informative for model choice stems from the fact that the marginal likelihood goes to zero as the support of the prior goes to infinity, which may have been an earlier argument of Jeffreys’ (1939), but does not carry much weight as the property is shared by many other priors (as remarked later). Somehow, the penalisation aspect of the marginal is not exploited more deeply in the paper. In the “objective” Bayes section, they adhere to the (convenient but weakly supported) choice of a common prior on the nuisance parameters (shared by different models). Their main argument is to develop (heretic!) “data-based priors”, from Aitkin (1991, not cited) double use of the data (or setting the likelihood to the power two), all the way to the intrinsic and fractional Bayes factors of Tony O’Hagan (1995), Jim Berger and Luis Pericchi (1996), and to the expected posterior priors of Pérez and Berger (2002) on which I worked with Juan Cano and Diego Salmeròn. (While the presentation is made against a flat prior, nothing prevents the use of another reference, improper, prior.) A short section also mentions the X-validation approach(es) of Aki Vehtari and co-authors.

## [de]quarantined by slideshare

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , , , , , on January 11, 2021 by xi'an

## are there a frequentist and a Bayesian likelihoods?

Posted in Statistics with tags , , , , , , , , , , on June 7, 2018 by xi'an

A question that came up on X validated and led me to spot rather poor entries in Wikipedia about both the likelihood function and Bayes’ Theorem. Where unnecessary and confusing distinctions are made between the frequentist and Bayesian versions of these notions. I have already discussed the later (Bayes’ theorem) a fair amount here. The discussion about the likelihood is quite bemusing, in that the likelihood function is the … function of the parameter equal to the density indexed by this parameter at the observed value.

“What we can find from a sample is the likelihood of any particular value of r, if we define the likelihood as a quantity proportional to the probability that, from a population having the particular value of r, a sample having the observed value of r, should be obtained.” R.A. Fisher, On the “probable error’’ of a coefficient of correlation deduced from a small sample. Metron 1, 1921, p.24

By mentioning an informal side to likelihood (rather than to likelihood function), and then stating that the likelihood is not a probability in the frequentist version but a probability in the Bayesian version, the W page makes a complete and unnecessary mess. Whoever is ready to rewrite this introduction is more than welcome! (Which reminded me of an earlier question also on X validated asking why a common reference measure was needed to define a likelihood function.)

This also led me to read a recent paper by Alexander Etz, whom I met at E.J. Wagenmakers‘ lab in Amsterdam a few years ago. Following Fisher, as Jeffreys complained about

“..likelihood, a convenient term introduced by Professor R.A. Fisher, though in his usage it is sometimes multiplied by a constant factor. This is the probability of the observations given the original information and the hypothesis under discussion.” H. Jeffreys, Theory of Probability, 1939, p.28

Alexander defines the likelihood up to a constant, which causes extra-confusion, for free!, as there is no foundational reason to introduce this degree of freedom rather than imposing an exact equality with the density of the data (albeit with an arbitrary choice of dominating measure, never neglect the dominating measure!). The paper also repeats the message that the likelihood is not a probability (density, missing in the paper). And provides intuitions about maximum likelihood, likelihood ratio and Wald tests. But does not venture into a separate definition of the likelihood, being satisfied with the fundamental notion to be plugged into the magical formula

posteriorprior×likelihood

## JASP, a really really fresh way to do stats

Posted in Statistics with tags , , , , , , on February 1, 2018 by xi'an

## Bayesian workers, unite!

Posted in Books, Kids, pictures, Statistics, University life with tags , , , , , , , on January 19, 2018 by xi'an

This afternoon, Alexander Ly is defending his PhD thesis at the University of Amsterdam. While I cannot attend the event, I want to celebrate the event and a remarkable thesis around the Bayes factor [even though we disagree on its role!] and the Jeffreys’s Amazing Statistics Program (!), otherwise known as JASP. Plus commend the coolest thesis cover I ever saw, made by the JASP graphical designer Viktor Beekman and representing Harold Jeffreys leading empirical science workers in the best tradition of socialist realism! Alexander wrote a post on the JASP blog to describe the thesis, the cover, and his plans for the future. Congratulations!