Archive for improper prior

[more than] everything you always wanted to know about marginal likelihood

Posted in Books, Statistics, University life on February 10, 2022 by xi'an

Earlier this year, F. Llorente, L. Martino, D. Delgado, and J. Lopez-Santiago arXived an updated version of their massive survey on marginal likelihood computation. Which I can only warmly recommend to anyone interested in the matter! Or looking for a base camp to initiate a graduate project. They break the methods into four families (a small numerical sketch of the first family follows the list):

  1. Deterministic approximations (e.g., Laplace approximations)
  2. Methods based on density estimation (e.g., Chib’s method, aka the candidate’s formula)
  3. Importance sampling, including sequential Monte Carlo, with a subsection connecting with MCMC
  4. Vertical representations (mostly, nested sampling)
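For a sense of what the first family does in practice, here is a minimal sketch (my own toy illustration, not code from the survey) of the Laplace approximation on a conjugate Normal model, where the Gaussian posterior makes the approximation exact and hence checkable:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative model, not from the survey: y_i ~ N(theta, 1) for i = 1..50,
# with prior theta ~ N(0, 10).
rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=50)

def neg_log_post(t):
    """Negative unnormalised log posterior: -(log likelihood + log prior)."""
    theta = float(np.atleast_1d(t)[0])
    log_lik = -0.5 * np.sum((y - theta) ** 2) - 0.5 * y.size * np.log(2 * np.pi)
    log_prior = -0.5 * theta ** 2 / 10.0 - 0.5 * np.log(2 * np.pi * 10.0)
    return -(log_lik + log_prior)

opt = minimize(neg_log_post, x0=np.array([0.0]))  # posterior mode
mode = opt.x[0]

# Hessian (d = 1) of the negative log posterior at the mode, by central
# finite differences.
eps = 1e-4
hess = (neg_log_post(mode + eps) - 2.0 * neg_log_post(mode)
        + neg_log_post(mode - eps)) / eps ** 2

# Laplace approximation: Z ~ p(y|mode) p(mode) (2*pi)^(d/2) |H|^(-1/2)
log_evidence = -opt.fun + 0.5 * np.log(2.0 * np.pi) - 0.5 * np.log(hess)
print(f"Laplace log-evidence: {log_evidence:.4f}")
```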

Besides sheer computation, the survey also broaches issues like improper priors and alternatives to Bayes factors. The parts I would have covered in more detail are reversible jump MCMC and the long-lasting impact of Geyer's reverse logistic regression (and its noise-contrastive extension), even though the link with bridge sampling is briefly mentioned there. There is even a table reporting on the coverage of earlier surveys. Of course, the following postnote of the manuscript

The Christian Robert’s blog deserves a special mention, since Professor C. Robert has devoted several entries of his blog with very interesting comments regarding the marginal likelihood estimation and related topics.

does not in the least make me less objective! Some of the final recommendations:

  • use of Naive Monte Carlo [simulate from the prior] should always be considered [assuming a proper prior!] (see the sketch after this list)
  • a multiple-try method is a good choice within the MCMC schemes
  • the optimal umbrella sampling estimator is difficult and costly to implement, so its best performance may not be achieved in practice
  • adaptive importance sampling uses the posterior samples to build a suitable normalized proposal, so it benefits from localizing samples in regions of high posterior probability while preserving the properties of standard importance sampling
  • Chib’s method is a good alternative that provides very good performance [but is not always available]
  • the success [of nested sampling] in the literature is surprising.
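As a companion to the first recommendation, here is a minimal naive Monte Carlo sketch (again my own toy illustration, under the same assumed Normal model as above): simulate from the proper prior and average the likelihood, since the evidence is Z = E_prior[p(y | θ)].

```python
import numpy as np

# Same illustrative model as above: y_i ~ N(theta, 1), theta ~ N(0, 10).
rng = np.random.default_rng(1)
y = rng.normal(1.0, 1.0, size=50)

def log_lik(theta):
    return -0.5 * np.sum((y - theta) ** 2) - 0.5 * y.size * np.log(2 * np.pi)

n_sims = 100_000
thetas = rng.normal(0.0, np.sqrt(10.0), size=n_sims)  # simulate from the prior
log_liks = np.array([log_lik(t) for t in thetas])

# average on the log scale (log-sum-exp) to avoid underflow
m = log_liks.max()
log_evidence = m + np.log(np.exp(log_liks - m).mean())
print(f"naive Monte Carlo log-evidence: {log_evidence:.4f}")
```

The estimator is unbiased, but its variance explodes when the posterior is much more concentrated than the prior, which is presumably why the recommendation comes hedged.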

Mea Culpa

Posted in Statistics on April 10, 2020 by xi'an

[A quote from Jaynes about improper priors that I had missed in his book, Probability Theory.]

For many years, the present writer was caught in this error just as badly as anybody else, because Bayesian calculations with improper priors continued to give just the reasonable and clearly correct results that common sense demanded. So warnings about improper priors went unheeded; just that psychological phenomenon. Finally, it was the marginalization paradox that forced recognition that we had only been lucky in our choice of problems. If we wish to consider an improper prior, the only correct way of doing it is to approach it as a well-defined limit of a sequence of proper priors. If the correct limiting procedure should yield an improper posterior pdf for some parameter α, then probability theory is telling us that the prior information and data are too meager to permit any inferences about α. Then the only remedy is to seek more data or more prior information; probability theory does not guarantee in advance that it will lead us to a useful answer to every conceivable question.

Generally, the posterior pdf is better behaved than the prior because of the extra information in the likelihood function, and the correct limiting procedure yields a useful posterior pdf that is analytically simpler than any from a proper prior. The most universally useful results of Bayesian analysis obtained in the past are of this type, because they tended to be rather simple problems, in which the data were indeed so much more informative than the prior information that an improper prior gave a reasonable approximation – good enough for all practical purposes – to the strictly correct results (the two results agreed typically to six or more significant figures).

In the future, however, we cannot expect this to continue because the field is turning to more complex problems in which the prior information is essential and the solution is found by computer. In these cases it would be quite wrong to think of passing to an improper prior. That would lead usually to computer crashes; and, even if a crash is avoided, the conclusions would still be, almost always, quantitatively wrong. But, since likelihood functions are bounded, the analytical solution with proper priors is always guaranteed to converge properly to finite results; therefore it is always possible to write a computer program in such a way (avoid underflow, etc.) that it cannot crash when given proper priors. So, even if the criticisms of improper priors on grounds of marginalization were unjustified, it remains true that in the future we shall be concerned necessarily with proper priors.
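To make the "well-defined limit" prescription concrete, here is the textbook Normal-mean example (my addition, not part of Jaynes's text): start from a proper conjugate prior and let its variance grow.

```latex
x_1,\dots,x_n \sim \mathcal{N}(\theta,\sigma^2), \quad \theta \sim \mathcal{N}(0,\tau^2)
\;\Longrightarrow\;
\theta \mid x_{1:n} \sim \mathcal{N}\!\left(
  \frac{n\bar{x}/\sigma^2}{n/\sigma^2+1/\tau^2},\;
  \left(\frac{n}{\sigma^2}+\frac{1}{\tau^2}\right)^{-1}
\right)
\xrightarrow{\;\tau\to\infty\;}
\mathcal{N}\!\left(\bar{x},\,\frac{\sigma^2}{n}\right)
```

The limit is a proper posterior and coincides with the one returned by the flat improper prior π(θ)∝1, so Jaynes's criterion is satisfied here; the marginalization paradox he mentions shows that such limits need not behave this well in multiparameter problems.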

demystify Lindley’s paradox [or not]

Posted in Statistics on March 18, 2020 by xi'an

Another paper on Lindley’s paradox appeared on arXiv yesterday, by Guosheng Yin and Haolun Shi, interpreting posterior probabilities as p-values. The core of this resolution is to express a two-sided hypothesis as a combination of two one-sided hypotheses along opposite directions, then taking advantage of the near equivalence of posterior probabilities under some non-informative prior and p-values in the latter case, as already noted by George Casella and Roger Berger (1987), and presumably earlier. The point is that one-sided hypotheses are quite friendly to improper priors, since they only require a single prior distribution, rather than two when point nulls are under consideration. The p-value created by merging both one-sided hypotheses makes little sense to me, as it means testing that both θ≥0 and θ≤0, resulting in the proposal of a p-value that is twice the minimum of the one-sided p-values, maybe due to a Bonferroni correction, although the true value should be zero… I thus see little support for this approach to resolving Lindley’s paradox in that it bypasses the toxic nature of point-null hypotheses, which require a change of prior toward a mixture supporting one hypothesis and the other. Here the posterior of the point-null hypothesis is defined in exactly the same way the p-value is, hence making the outcome most favourable to the agreement, but not truly addressing the issue.
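For the record, a quick numerical sketch (mine, not the authors' code) of the near equivalence at work in the standard Normal-mean case, under a flat prior:

```python
import numpy as np
from scipy.stats import norm

# Illustrative setting: x ~ N(theta, 1) with a flat prior on theta, so the
# posterior is theta | x ~ N(x, 1).
x = 1.7  # an arbitrary observed value

# one-sided: the posterior probability P(theta <= 0 | x) under the flat prior...
post_prob = norm.cdf(0.0, loc=x, scale=1.0)
# ...equals the one-sided p-value P(X >= x | theta = 0)
p_one_sided = norm.sf(x)
print(post_prob, p_one_sided)  # both ~ 0.0446

# the merged two-sided proposal: twice the smaller one-sided p-value, which
# here is the classical two-sided p-value 2*Phi(-|x|)
p_two_sided = 2 * min(norm.sf(x), norm.cdf(x))
print(p_two_sided)  # ~ 0.0891
```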

a question from McGill about The Bayesian Choice

Posted in Books, pictures, Running, Statistics, Travel, University life on December 26, 2018 by xi'an

I received an email from a group of McGill students working on Bayesian statistics and using The Bayesian Choice (although the exercise pictured below is not in the book, the closest being exercise 1.53, inspired by Raiffa and Schlaifer, 1961, and exercise 5.10, as mentioned in the email):

There was a question for which some of us cannot seem to decide what the correct answer is. Here are the issues:

Some people believe that the answer to both is ½, while others believe it is 1. The reasoning for ½ is that, since the Beta is a continuous distribution, we could never have θ exactly equal to ½. Thus, regardless of α, the probability that θ=½ is 0 in that case, hence the answer is ½. I found a related Stack Exchange question that seems to indicate this as well.

The other side is that, by the Markov property and the mean of Beta(α,α), as α goes to infinity, we will approach ½ with probability 1, and hence the limit as α goes to infinity for both (a) and (b) is 1. I think this could also make sense in another context, if you use the Bayes factor representation. This is similar, I believe, to questions 5.10 and 5.11 in The Bayesian Choice.

As it happens, the answer is ½ in the first case (a), because π(H⁰) is ½ regardless of α, and 1 in the second case (b), because the evidence against H⁰ goes to zero as α goes to zero (watch out!), along with the mass of the prior on any compact subset of (0,1), since the normalizing constant Γ(2α)/Γ(α)² goes to zero. (The limit does not correspond to a proper prior and hence is somewhat meaningless.) However, when α goes to infinity, the evidence against H⁰ goes to infinity and the posterior probability of ½ goes to zero, despite the prior under the alternative being more and more concentrated around ½!
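A numerical look at the vanishing-mass argument (a sketch of the limiting behaviour only, not a solution to the pictured exercise): as α goes to zero, the Beta(α,α) normalizing constant Γ(2α)/Γ(α)² vanishes and the prior mass escapes any compact subset of (0,1) towards the endpoints.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import beta

# As alpha -> 0, Gamma(2a)/Gamma(a)^2 -> 0 (it behaves like a/2), and with it
# the Beta(a, a) mass on any compact of (0,1), e.g. [0.1, 0.9].
for a in [1.0, 0.1, 0.01, 0.001]:
    const = np.exp(gammaln(2 * a) - 2 * gammaln(a))   # Gamma(2a)/Gamma(a)^2
    mass = beta.cdf(0.9, a, a) - beta.cdf(0.1, a, a)  # prior mass on [0.1, 0.9]
    print(f"alpha={a:6.3f}  Gamma(2a)/Gamma(a)^2={const:.5f}  mass={mass:.5f}")
```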

back to the Bayesian Choice

Posted in Books, Kids, Statistics, University life on October 17, 2018 by xi'an

Surprisingly (or not?!), I received two requests about some exercises from The Bayesian Choice, one from a group of students from McGill having difficulties solving the above, wondering about the properness of the posterior (but missing the integration of x), to whom I sent back this correction. And another one from the Czech Republic about a difficulty with the term “evaluation” by which I meant (pardon my French!) estimation.
