In case you are in Paris tomorrow and free, there will be an AppliBUGS day focussing on the contributions of our friend Jean-Louis Foulley. (And a regular contributor to the ‘Og!) The meeting takes place in the ampitheatre on second floor of ENGREF-Montparnasse (19 av du Maine, 75015 Paris, Métro Montparnasse Bienvenüe). I will give a part of the O’Bayes tutorial on alternatives to the Bayes factor.
Archive for Bayes factor
Jean-Christophe Mourrat recently arXived a paper “P-value tests and publication bias as causes for high rate of non-reproducible scientific results?”, intended as a rebuttal of Val Johnson’s PNAS paper. The arguments therein are not particularly compelling. (Just as ours’ may sound so to the author.)
“We do not discuss the validity of this [Bayesian] hypothesis here, but we explain in the supplementary material that if taken seriously, it leads to incoherent results, and should thus be avoided for practical purposes.”
The refutation is primarily argued as a rejection of the whole Bayesian perspective. (Although we argue Johnson’ perspective is not that Bayesian…) But the argument within the paper is much simpler: if the probability of rejection under the null is at most 5%, then the overall proportion of false positives is also at most 5% and not 20% as argued in Johnson…! Just as simple as this. Unfortunately, the author mixes conditional and unconditional, frequentist and Bayesian probability models. As well as conditioning upon the data and conditioning upon the rejection region… Read at your own risk. Continue reading
Perrakis, Ntzoufras, and Tsionas just arXived a paper on marginal likelihood (evidence) approximation (with the above title). The idea behind the paper is to base importance sampling for the evidence on simulations from the product of the (block) marginal posterior distributions. Those simulations can be directly derived from an MCMC output by randomly permuting the components. The only critical issue is to find good approximations to the marginal posterior densities. This is handled in the paper either by normal approximations or by Rao-Blackwell estimates. the latter being rather costly since one importance weight involves B.L computations, where B is the number of blocks and L the number of samples used in the Rao-Blackwell estimates. The time factor does not seem to be included in the comparison studies run by the authors, although it would seem necessary when comparing scenarii.
After a standard regression example (that did not include Chib’s solution in the comparison), the paper considers 2- and 3-component mixtures. The discussion centres around label switching (of course) and the deficiencies of Chib’s solution against the current method and Neal’s reference. The study does not include averaging Chib’s solution over permutations as in Berkoff et al. (2003) and Marin et al. (2005), an approach that does eliminate the bias. Especially for a small number of components. Instead, the authors stick to the log(k!) correction, despite it being known for being quite unreliable (depending on the amount of overlap between modes). The final example is Diggle et al. (1995) longitudinal Poisson regression with random effects on epileptic patients. The appeal of this model is the unavailability of the integrated likelihood which implies either estimating it by Rao-Blackwellisation or including the 58 latent variables in the analysis. (There is no comparison with other methods.)
A strange title, if any! (The whetstone is a natural hard stone used for sharpening steel instruments, like knifes or sickles and scythes, I remember my grand-fathers handling one when cutting hay and weeds. Alum is hydrated potassium aluminium sulphate and is used as a blood coagulant. Both items are naturally related with shaving and razors, if not with Occam!) The whole title of the paper published by Guido Consonni, Jon Forster and Luca La Rocca in Statistical Science is “The whetstone and the alum block: balanced objective Bayesian comparison of nested models for discrete data“. The paper builds on the notions introduced in the last Valencia meeting by Guido and Luca (and discussed by Judith Rousseau and myself).
Beyond the pun (that forced me to look for “alum stone” on Wikipedia!, and may be lost on some other non-native readers), the point in the title is to build a prior distribution aimed at the comparison of two models such that those models are more sharply distinguished: Occam’s razor would thus cut better when the smaller model is true (hence the whetstone) and less when it is not (hence the alum block)… The solution proposed by the authors is to replace the reference prior on the larger model, π1, with a moment prior à la Johnson and Rossell (2010, JRSS B) and then to turn this moment prior into an intrinsic prior à la Pérez and Berger (2002, Biometrika), making it an “intrinsic moment”. The first transform turns π1 into a non-local prior, with the aim of correcting for the imbalanced convergence rates of the Bayes factor under the null and under the alternative (this is the whetstone). The second transform accumulates more mass in the vicinity of the null model (this is the alum block). (While I like the overall perspective on intrinsic priors, the introduction is a wee confusing about them, e.g. when it mentions fictive observations instead of predictives.)
Being a referee for this paper, I read it in detail (and also because this is one of my favourite research topics!) Further, we already engaged into a fruitful discussion with Guido since the last Valencia meeting and the current paper incorporates some of our comments (and replies to others). I find the proposal of the authors clever and interesting, but not completely Bayesian. Overall, the paper provides a clearly novel methodology that calls for further studies…
Aris Spanos just published a paper entitled “Who should be afraid of the Jeffreys-Lindley paradox?” in the journal Philosophy of Science. This piece is a continuation of the debate about frequentist versus llikelihoodist versus Bayesian (should it be Bayesianist?! or Laplacist?!) testing approaches, exposed in Mayo and Spanos’ Error and Inference, and discussed in several posts of the ‘Og. I started reading the paper in conjunction with a paper I am currently writing for a special volume in honour of Dennis Lindley, paper that I will discuss later on the ‘Og…
“…the postdata severity evaluation (…) addresses the key problem with Fisherian p-values in the sense that the severity evaluation provides the “magnitude” of the warranted discrepancy from the null by taking into account the generic capacity of the test (that includes n) in question as it relates to the observed data”(p.88)
First, the antagonistic style of the paper is reminding me of Spanos’ previous works in that it relies on repeated value judgements (such as “Bayesian charge”, “blatant misinterpretation”, “Bayesian allegations that have undermined the credibility of frequentist statistics”, “both approaches are far from immune to fallacious interpretations”, “only crude rules of thumbs”, &tc.) and rhetorical sleights of hand. (See, e.g., “In contrast, the severity account ensures learning from data by employing trustworthy evidence (…), the reliability of evidence being calibrated in terms of the relevant error probabilities” [my stress].) Connectedly, Spanos often resorts to an unusual [at least for statisticians] vocabulary that amounts to newspeak. Here are some illustrations: “summoning the generic capacity of the test”, ‘substantively significant”, “custom tailoring the generic capacity of the test”, “the fallacy of acceptance”, “the relevance of the generic capacity of the particular test”, yes the term “generic capacity” is occurring there with a truly high frequency. Continue reading