Archive for Bayesian tests

posterior predictive distributions of Bayes factors

Posted in Books, Kids, Statistics with tags , , , on October 8, 2014 by xi'an

Once a Bayes factor B(y)  is computed, one needs to assess its strength. As repeated many times here, Jeffreys’ scale has no validation whatsoever, it is simply a division of the (1,∞) range into regions of convenience. Following earlier proposals in the literature (Box, 1980; García-Donato and Chen, 2005; Geweke and Amisano, 2008), an evaluation of this strength within the issue at stake, i.e. the comparison of two models, can be based on the predictive distribution. While most authors (like García-Donato and Chen) consider the prior predictive, I think using the posterior predictive distribution is more relevant since

  1. it exploits the information contained in the data y, thus concentrates on a region of relevance in the parameter space(s), which is especially interesting in weakly informative settings (even though we should abstain from testing in those cases, dixit Andrew);
  2. it reproduces the behaviour of the Bayes factor B(x) for values x of the observation similar to the original observation y;
  3. it does not hide issues of indeterminacy linked with improper priors: the Bayes factor B(x) remains indeterminate, even with a well-defined predictive;
  4. it does not separate between errors of type I and errors of type II but instead uses the natural summary provided by the Bayesian analysis, namely the predictive distribution π(x|y);
  5. as long as the evaluation is not used to reach a decision, there is no issue of “using the data twice”, we are simply producing an estimator of the posterior loss, for instance the (posterior) probability of selecting the wrong model. The Bayes factor B(x) is thus functionally  independent of y, while x is probabilistically dependent on y.

Note that, even though probabilities of errors of type I and errors of type II can be computed, they fail to account for the posterior probabilities of both models. (This is the delicate issue with the solution of García-Donato and Chen.) Another nice feature is that the predictive distribution of the Bayes factor can be computed even in complex settings where ABC needs to be used.

Cancun, ISBA 2014 [½ day #2]

Posted in pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , on July 19, 2014 by xi'an


Half-day #2 indeed at ISBA 2014, as the Wednesday afternoon kept to the Valencia tradition of free time, and potential cultural excursions, so there were only talks in the morning. And still the core poster session at (late) night. In which my student Kaniav Kamari presented a poster on a current project we are running with Kerrie Mengersen and Judith Rousseau on the replacement of the standard Bayesian testing setting with a mixture representation. Being half-asleep by the time the session started, I did not stay long enough to collect data on the reactions to this proposal, but the paper should be arXived pretty soon. And Kate Lee gave a poster on our importance sampler for evidence approximation in mixtures (soon to be revised!). There was also an interesting poster about reparameterisation towards higher efficiency of MCMC algorithms, intersecting with my long-going interest in the matter, although I cannot find a mention of it in the abstracts. And I had a nice talk with Eduardo Gutierrez-Pena about infering on credible intervals through loss functions. There were also a couple of appealing posters on g-priors. Except I was sleepwalking by the time I spotted them… (My conference sleeping pattern does not work that well for ISBA meetings! Thankfully, both next editions will be in Europe.)

Great talk by Steve McEachern that linked to our ABC work on Bayesian model choice with insufficient statistics, arguing towards robustification of Bayesian inference by only using summary statistics. Despite this being “against the hubris of Bayes”… Obviously, the talk just gave a flavour of Steve’s perspective on that topic and I hope I can read more to see how we agree (or not!) on this notion of using insufficient summaries to conduct inference rather than trying to model “the whole world”, given the mistrust we must preserve about models and likelihoods. And another great talk by Ioanna Manolopoulou on another of my pet topics, capture-recapture, although she phrased it as a partly identified model (as in Kline’s talk yesterday). This related with capture-recapture in that when estimating a capture-recapture model with covariates, sampling and inference are biased as well. I appreciated particularly the use of BART to analyse the bias in the modelling. And the talk provided a nice counterpoint to the rather pessimistic approach of Kline’s.

Terrific plenary sessions as well, from Wilke’s spatio-temporal models (in the spirit of his superb book with Noel Cressie) to Igor Prunster’s great entry on Gibbs process priors. With the highly significant conclusion that those processes are best suited for (in the sense that they are only consistent for) discrete support distributions. Alternatives are to be used for continuous support distributions, the special case of a Dirichlet prior constituting a sort of unique counter-example. Quite an inspiring talk (even though I had a few micro-naps throughout it!).

I shared my afternoon free time between discussing the next O’Bayes meeting (2015 is getting very close!) with friends from the Objective Bayes section, getting a quick look at the Museo Maya de Cancún (terrific building!), and getting some work done (thanks to the lack of wireless…)

a refutation of Johnson’s PNAS paper

Posted in Books, Statistics, University life with tags , , , , , , , on February 11, 2014 by xi'an

Jean-Christophe Mourrat recently arXived a paper “P-value tests and publication bias as causes for high rate of non-reproducible scientific results?”, intended as a rebuttal of Val Johnson’s PNAS paper. The arguments therein are not particularly compelling. (Just as ours’ may sound so to the author.)

“We do not discuss the validity of this [Bayesian] hypothesis here, but we explain in the supplementary material that if taken seriously, it leads to incoherent results, and should thus be avoided for practical purposes.”

The refutation is primarily argued as a rejection of the whole Bayesian perspective. (Although we argue Johnson’ perspective is not that Bayesian…) But the argument within the paper is much simpler: if the probability of rejection under the null is at most 5%, then the overall proportion of false positives is also at most 5% and not 20% as argued in Johnson…! Just as simple as this. Unfortunately, the author mixes conditional and unconditional, frequentist and Bayesian probability models. As well as conditioning upon the data and conditioning upon the rejection region… Read at your own risk. Continue reading

Statistical evidence for revised standards

Posted in Statistics, University life with tags , , , , , , , , , on December 30, 2013 by xi'an

In yet another permutation of the original title (!), Andrew Gelman posted the answer Val Johnson sent him after our (submitted)  letter to PNAS. As Val did not send me a copy (although Andrew did!), I will not reproduce it here and I rather refer the interested readers to Andrews’ blog… In addition to Andrew’s (sensible) points, here are a few idle (post-X’mas and pre-skiing) reflections:

  • “evidence against a false null hypothesis accrues exponentially fast” makes me wonder in which metric this exponential rate (in γ?) occurs;
  • that “most decision-theoretic analyses of the optimal threshold to use for declaring a significant finding would lead to evidence thresholds that are substantially greater than 5 (and probably also greater 25)” is difficult to accept as an argument since there is no trace of a decision-theoretic argument in the whole paper;
  • Val rejects our minimaxity argument on the basis that “[UMPBTs] do not involve minimization of maximum loss” but the prior that corresponds to those tests is minimising the integrated probability of not rejecting at threshold level γ, a loss function integrated against parameter and observation, a Bayes risk in other words… Point masses or spike priors are clearly characteristics of minimax priors. Furthermore, the additional argument that “in most applications, however, a unique loss function/prior distribution combination does not exist” has been used by many to refute the Bayesian perspective and makes me wonder what are the arguments left in using a (pseudo-)Bayesian approach;
  • the next paragraph is pure tautology: the fact that “no other test, based on either a subjectively or objectively specified alternative hypothesis, is as likely to produce a Bayes factor that exceeds the specified evidence threshold” is a paraphrase of the definition of UMPBTs, not an argument. I do not see we should solely “worry about false negatives”, since minimising those should lead to a point mass on the null (or, more seriously, should not lead to the minimax-like selection of the prior under the alternative).

Revised evidence for statistical standards

Posted in Kids, Statistics, University life with tags , , , , , , , , on December 19, 2013 by xi'an

valizWe just submitted a letter to PNAS with Andrew Gelman last week, in reaction to Val Johnson’s recent paper “Revised standards for statistical evidence”, essentially summing up our earlier comments within 500 words. Actually, we wrote one draft each! In particular, Andrew came up with the (neat) rhetorical idea of alternative Ronald Fishers living in parallel universes who had each set a different significance reference level and for whom alternative Val Johnsons would rise and propose a modification of the corresponding Fisher’s level. For which I made the above graph, left out of the letter and its 500 words. It relates “the old z” and “the new z”, meaning the boundaries of the rejection zones when, for each golden dot, the “old z” is the previous “new z” and “the new z” is Johnson’s transform. We even figured out that Val’s transform was bringing the significance down by a factor of 10 in a large range of values. As an aside, we also wondered why most of the supplementary material was spent on deriving UMPBTs for specific (formal) problems when the goal of the paper sounded much more global…

As I am aware we are not the only ones to have submitted a letter about Johnson’s proposal, I am quite curious at the reception we will get from the editor! (Although I have to point out that all of my earlier submissions of letters to to PNAS got accepted.)

on alternative perspectives and solutions on Bayesian tests

Posted in Statistics, Travel, University life with tags , , , , , , , on December 16, 2013 by xi'an

Here are the slides of my tutorial at O’ Bayes 2013 today, a pot-pourri of various, recent and less recent, criticisms (with, albeit less than usual, a certain proportion of recycled slides):

Shravan’s comments on “Valen in Le Monde” [guest post]

Posted in Books, Statistics, University life with tags , , , , , , , on November 22, 2013 by xi'an

[Those are comments sent yesterday by Shravan Vasishth in connection with my post. Since they are rather lengthy, I made them into a post. Shravan is also the author of The foundations of Statistics and we got in touch through my review of the book . I may address some of his points later, but, for now, I find the perspective of a psycholinguist quite interesting to hear.]

Christian, Is the problem for you that the p-value, however low, is only going to tell you the probability of your data (roughly speaking) assuming the null is true, it’s not going to tell you anything about the probability of the alternative hypothesis, which is the real hypothesis of interest.

However, limiting the discussion to (Bayesian) hierarchical models (linear mixed models), which is the type of model people often fit in repeated measures studies in psychology (or at least in psycholinguistics), as long as the problem is about figuring out P(θ>0) or P(θ>0), the decision (to act as if θ>0) is going to be the same regardless of whether one uses p-values or a fully Bayesian approach. This is because the likelihood is going to dominate in the Bayesian model.

Andrew has objected to this line of reasoning by saying that making a decision like θ>0 is not a reasonable one in the first place. That is true in some cases, where the result of one experiment never replicates because of study effects or whatever. But there are a lot of effects which are robust and replicable, and where it makes sense to ask these types of questions.

One central issue for me is: in situations like these, using a low p-value to make such a decision is going to yield pretty similar outcomes compared to doing inference using the posterior distribution. The machinery needed to do a fully Bayesian analysis is very intimidating; you need to know a lot, and you need to do a lot more coding and checking than when you fit an lmer type of model.

It took me 1.5 to 2 years of hard work (=evenings spent not reading novels) to get to the point that I knew roughly what I was doing when fitting Bayesian models. I don’t blame anyone for not wanting to put their life on hold to get to such a point. I find the Bayesian method attractive because it actually answers the question I really asked, namely is θ>0 or θ<0? This is really great, I don’t have beat around the bush any more! (there; I just used an exclamation mark). But for the researcher unwilling (or more likely: unable) to invest the time into the maths and probability theory and the world of BUGS, the distance between a heuristic like a low p-value and the more sensible Bayesian approach is not that large.


Get every new post delivered to your Inbox.

Join 670 other followers