**A** great Bayesian Analysis webinar this afternoon, with well-balanced presentations by Steve MacEachern and John Lewis, and original discussions by Bertrand Clarke and Fabrizio Ruggeri, which attracted 122 participants. I particularly enjoyed Bertrand’s points that likelihoods were more general than models [made in 6 different wordings!] and that this paper was closer to the M-open perspective. I think I eventually got the reason why the approach could be seen as an ABC with ε=0, since the simulated y’s all reproduce the observed statistic, but the presentation does not bring a strong argument in favour of the restricted likelihood approach, when considering the methodological and computational effort. The discussion also made me wonder whether tools like VAEs could be used towards approximating the distribution of T(y) conditional on the parameter θ. This is also an opportunity to thank my friend Michele Guindani for his hard work as Editor of Bayesian Analysis and in particular for keeping the discussion tradition thriving!
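The ε=0 reading can be made concrete in a toy case where the statistic T(y) is discrete, so that exact matching occurs with positive probability. A minimal sketch (the Binomial setup and all numbers are my own illustration, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# observed data: n Bernoulli(θ₀) draws, summarised by T(y) = Σ y_i
n, theta0 = 50, 0.3
y_obs = rng.binomial(1, theta0, size=n)
t_obs = y_obs.sum()

def abc_eps0(t_obs, n, n_sims=100_000):
    # ABC rejection with ε = 0: keep θ only when the simulated sample
    # reproduces the observed statistic exactly (feasible here because
    # T takes finitely many values)
    theta = rng.uniform(0, 1, size=n_sims)   # Uniform(0,1) prior draws
    t_sim = rng.binomial(n, theta)           # T(y) | θ is Binomial(n, θ)
    return theta[t_sim == t_obs]             # exact match, i.e. ε = 0

sample = abc_eps0(t_obs, n)
# with ε = 0 the accepted θ's are exact draws from π(θ | T(y) = t_obs),
# here a Beta(t_obs + 1, n - t_obs + 1) distribution
print(len(sample), sample.mean())
```

The accepted parameters target the posterior given the statistic alone, which is exactly the restricted-likelihood posterior in this toy case.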

## Archive for likelihood function

## why is the likelihood not a pdf?

Posted in Books, pictures, Statistics, University life with tags Bayesian statistics, dominating measure, history of statistics, integration, invariance, inverse probability, likelihood function, Ronald Fisher, uniform prior on January 4, 2021 by xi'an

**T**he return of an old debate on X validated: *can the likelihood be a pdf?!* Even though there exist cases where a [version of the] likelihood function shows such a symmetry between the sufficient statistic and the parameter, as e.g. in the Normal mean model, that they are exchangeable w.r.t. the same measure, the question is somewhat meaningless for a number of reasons that we can all link to Ronald Fisher:

- when defining the likelihood function, Fisher (in his 1912 undergraduate memoir!) warns against integrating it w.r.t. the parameter: *“the integration with respect to m is illegitimate and has no definite meaning with respect to inverse probability”*. The likelihood *“is a relative probability only, suitable to compare point with point, but incapable of being interpreted as a probability distribution over a region, or of giving any estimate of absolute probability.”* And again in 1922: *“[the likelihood] is not a differential element, and is incapable of being integrated: it is assigned to a particular point of the range of variation, not to a particular element of it”*.
- He introduced the term “likelihood” specifically to avoid the confusion: *“I perceive that the word probability is wrongly used in such a connection: probability is a ratio of frequencies, and about the frequencies of such values we can know nothing whatever (…) I suggest that we may speak without confusion of the likelihood of one value of p being thrice the likelihood of another (…) likelihood is not here used loosely as a synonym of probability, but simply to express the relative frequencies with which such values of the hypothetical quantity p would in fact yield the observed sample”*.
- Another point he makes repeatedly (both in 1912 and 1922) is the lack of invariance of the probability measure obtained by attaching a dθ to the likelihood function L(θ) and normalising it into a density: while the likelihood *“is entirely unchanged by any [one-to-one] transformation”*, this definition of a probability distribution is not. Fisher actually distanced himself from a Bayesian “uniform prior” throughout the 1920’s.

which sums up as the urge to never neglect the dominating measure!
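Fisher’s invariance point can be checked numerically. A minimal sketch with a Binomial likelihood (my own illustration, not from Fisher): likelihood ratios are untouched by a one-to-one reparameterisation, but normalising L(θ)dθ into a “density” depends on the scale chosen, i.e. on the dominating measure.

```python
import numpy as np

# Binomial likelihood for x = 7 successes in n = 10 trials
n, x = 10, 7
theta = np.linspace(1e-6, 1 - 1e-6, 100_001)
L = theta**x * (1 - theta)**(n - x)

# the likelihood ratio between two parameter values is invariant
# under any one-to-one transformation of θ
ratio = (0.6**x * 0.4**(n - x)) / (0.3**x * 0.7**(n - x))

# normalising L(θ)dθ on the θ scale gives one "probability" of θ > 1/2 ...
mask = theta > 0.5
p_theta = L[mask].sum() / L.sum()

# ... while normalising on the log-odds scale φ = log(θ/(1-θ)),
# where dφ = dθ/(θ(1-θ)), gives a different one
w = 1.0 / (theta * (1 - theta))
p_phi = (L * w)[mask].sum() / (L * w).sum()

print(ratio, p_theta, p_phi)  # same ratios, different "probabilities"
```

The two normalisations correspond to uniform priors on θ and on the log-odds respectively, hence to Beta(8,4) and Beta(7,3) “posteriors”: the event θ > 1/2 receives a different mass under each, even though the likelihood function itself is unchanged.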

## Mea Culpa

Posted in Statistics with tags Bayesian Analysis, Bayesian foundations, book review, E.T. Jaynes, improper posteriors, improper prior, information, John Skilling, likelihood function, marginalisation paradoxes, prior information, probability theory on April 10, 2020 by xi'an

*[A quote from Jaynes about improper priors that I had missed in his book, Probability Theory.]*

For many years, the present writer was caught in this error just as badly as anybody else, because Bayesian calculations with improper priors continued to give just the reasonable and clearly correct results that common sense demanded. So warnings about improper priors went unheeded; just that psychological phenomenon. Finally, it was the marginalization paradox that forced recognition that we had only been lucky in our choice of problems. If we wish to consider an improper prior, the only correct way of doing it is to approach it as a well-defined limit of a sequence of proper priors. If the correct limiting procedure should yield an improper posterior pdf for some parameter α, then probability theory is telling us that the prior information and data are too meager to permit any inferences about α. Then the only remedy is to seek more data or more prior information; probability theory does not guarantee in advance that it will lead us to a useful answer to every conceivable question. Generally, the posterior pdf is better behaved than the prior because of the extra information in the likelihood function, and the correct limiting procedure yields a useful posterior pdf that is analytically simpler than any from a proper prior. The most universally useful results of Bayesian analysis obtained in the past are of this type, because they tended to be rather simple problems, in which the data were indeed so much more informative than the prior information that an improper prior gave a reasonable approximation – good enough for all practical purposes – to the strictly correct results (the two results agreed typically to six or more significant figures).

In the future, however, we cannot expect this to continue because the field is turning to more complex problems in which the prior information is essential and the solution is found by computer. In these cases it would be quite wrong to think of passing to an improper prior. That would lead usually to computer crashes; and, even if a crash is avoided, the conclusions would still be, almost always, quantitatively wrong. But, since likelihood functions are bounded, the analytical solution with proper priors is always guaranteed to converge properly to finite results; therefore it is always possible to write a computer program in such a way (avoid underflow, etc.) that it cannot crash when given proper priors. So, even if the criticisms of improper priors on grounds of marginalization were unjustified, it remains true that in the future we shall be concerned necessarily with proper priors.
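Jaynes’ prescription, approaching the improper prior as a well-defined limit of proper priors, can be illustrated in the Normal mean model. A minimal sketch (the observation and prior scales are illustrative values of mine, not from the book):

```python
# y ~ N(θ, 1) with a proper N(0, τ²) prior on θ: the conjugate posterior
# is N(τ²y/(τ²+1), τ²/(τ²+1)), and letting τ² → ∞ recovers the
# flat-(improper-)prior posterior N(y, 1) as a well-defined limit
y = 1.7  # a single observation (illustrative value)

def posterior_mean_var(y, tau2):
    # conjugate Normal-Normal update with unit observation variance
    var = tau2 / (tau2 + 1.0)
    return var * y, var

for tau2 in [1.0, 10.0, 1e3, 1e6]:
    m, v = posterior_mean_var(y, tau2)
    print(f"tau² = {tau2:>9}: posterior N({m:.4f}, {v:.4f})")
# the sequence of proper posteriors converges to N(y, 1) = N(1.7, 1)
```

Here the limit exists and is proper, which is exactly the benign situation Jaynes describes; the marginalisation paradoxes arise in settings where no such limit validates the formal improper-prior computation.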

## my likelihood is dominating my prior [not!]

Posted in Kids, Statistics with tags Bayesian inference, cross validated, likelihood function, Likelihood Principle, magnitude, scaling on August 29, 2019 by xi'an

**A**n interesting misconception read on X validated today, with a confusion between the absolute value of the likelihood function and its variability. Which I have trouble explaining except possibly by an extrapolation from the discrete case and a confusion between the probability density of the data [scaled as a probability] and the likelihood function [scale-less]. I also had trouble convincing the originator of the question of the irrelevance of the scale of the likelihood *per se*, even when demonstrating that |𝚺| could vanish from the posterior with no consequence whatsoever. It is only when I thought of the case when the likelihood is constant in 𝜃 that I managed to make my case.
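The vanishing of the normalising constant can be checked on a grid. A minimal sketch in the scalar case (so the constant playing the role of |𝚺|^(-1/2) is the Gaussian 1/σ normalisation; all values are illustrative):

```python
import numpy as np

# any multiplicative constant in the likelihood cancels once the
# posterior ∝ prior × likelihood is renormalised
rng = np.random.default_rng(1)
y = rng.normal(2.0, 3.0, size=20)     # data with known σ = 3
theta = np.linspace(-5, 10, 2001)     # grid over the unknown mean

def grid_posterior(scale):
    # un-normalised Gaussian likelihood, deliberately multiplied by
    # an arbitrary constant `scale`
    loglik = -0.5 * ((y[:, None] - theta[None, :]) / 3.0) ** 2
    lik = scale * np.exp(loglik.sum(axis=0))
    prior = np.ones_like(theta)       # flat prior on the grid
    post = prior * lik
    return post / post.sum()

p1 = grid_posterior(1.0)
p2 = grid_posterior(1e-8)             # likelihood rescaled by 10⁻⁸
print(np.abs(p1 - p2).max())          # posteriors agree to rounding error
```

Whatever the magnitude of the likelihood, only its shape in 𝜃 matters, which is the point the constant-likelihood example makes in the extreme: a likelihood flat in 𝜃 returns the prior, at any scale.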

## are there frequentist and Bayesian likelihoods?

Posted in Statistics with tags Bayes factor, Bayes formula, cross validated, dominating measure, Harold Jeffreys, likelihood function, Metron, probability theory, R.A. Fisher, University of Amsterdam, wikipedia on June 7, 2018 by xi'an

**A** question that came up on X validated and led me to spot rather poor entries in Wikipedia about both the likelihood function and Bayes’ Theorem, where unnecessary and confusing distinctions are made between the frequentist and Bayesian versions of these notions. I have already discussed the latter (Bayes’ theorem) a fair amount here. The discussion about the likelihood is quite bemusing, in that the likelihood function is the … function of the parameter equal to the density indexed by this parameter at the observed value.

*“What we can find from a sample is the likelihood of any particular value of r, if we define the likelihood as a quantity proportional to the probability that, from a population having the particular value of r, a sample having the observed value of r, should be obtained.”* R.A. Fisher, On the “probable error” of a coefficient of correlation deduced from a small sample, Metron 1, 1921, p.24

By mentioning an informal side to likelihood (rather than to likelihood function), and then stating that the likelihood is not a probability in the frequentist version but a probability in the Bayesian version, the W page makes a complete and unnecessary mess. Whoever is ready to rewrite this introduction is more than welcome! (Which reminded me of an earlier question also on X validated asking why a common reference measure was needed to define a likelihood function.)

This also led me to read a recent paper by Alexander Etz, whom I met at E.J. Wagenmakers‘ lab in Amsterdam a few years ago. Etz follows Fisher, whose usage Jeffreys complained about:

*“…likelihood, a convenient term introduced by Professor R.A. Fisher, though in his usage it is sometimes multiplied by a constant factor. This is the probability of the observations given the original information and the hypothesis under discussion.”* H. Jeffreys, Theory of Probability, 1939, p.28

Alexander defines the likelihood up to a constant, which causes extra confusion, for free!, as there is no foundational reason to introduce this degree of freedom rather than imposing an exact equality with the density of the data (albeit with an arbitrary choice of dominating measure, never neglect the dominating measure!). The paper also repeats the message that the likelihood is not a probability (density, *missing in the paper*). And provides intuitions about maximum likelihood, likelihood ratio and Wald tests. But does not venture into a separate definition of the likelihood, being satisfied with the fundamental notion to be plugged into the magical formula

posterior ∝ prior × likelihood
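The magical formula also shows why Jeffreys’ “constant factor” is harmless in practice, even if it is foundationally gratuitous. A toy sketch (my own numbers): record the same Bernoulli data either as the full sequence or as the Binomial count x, two likelihoods that differ by the constant C(n, x) and yet yield the same posterior.

```python
from math import comb

# n Bernoulli trials with x successes, recorded two ways: the two
# likelihoods differ by the constant C(n, x), which cancels in
# posterior ∝ prior × likelihood once renormalised
n, x = 12, 9
grid = [i / 100 for i in range(1, 100)]        # grid of values of p
prior = [1 / len(grid)] * len(grid)            # uniform prior over the grid

def posterior(lik):
    unnorm = [pr * l for pr, l in zip(prior, lik)]
    z = sum(unnorm)                            # normalising constant
    return [u / z for u in unnorm]

lik_seq = [p**x * (1 - p)**(n - x) for p in grid]   # full-sequence likelihood
lik_bin = [comb(n, x) * l for l in lik_seq]         # Binomial-count likelihood
post_seq, post_bin = posterior(lik_seq), posterior(lik_bin)
print(max(abs(a - b) for a, b in zip(post_seq, post_bin)))  # ≈ 0
```

The choice of dominating measure (counting sequences versus counting totals) changes the likelihood by a constant, but not the posterior, which is the only sense in which the “up to a constant” convention is innocuous.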