A new approach to Bayesian hypothesis testing

Posted in Books, Statistics on September 8, 2016 by xi'an

“The main purpose of this paper is to develop a new Bayesian hypothesis testing approach for the point null hypothesis testing (…) based on the Bayesian deviance and constructed in a decision theoretical framework. It can be regarded as the Bayesian version of the likelihood ratio test.”

This paper got published in the Journal of Econometrics two years ago, but I only read it a few days ago, when Kerrie Mengersen pointed it out to me. Here is an interesting criticism of Bayes factors.

“In the meantime, unfortunately, Bayes factors also suffers from several theoretical and practical difficulties. First, when improper prior distributions are used, Bayes factors contains undefined constants and takes arbitrary values (…) Second, when a proper but vague prior distribution with a large spread is used to represent prior ignorance, Bayes factors tends to favour the null hypothesis. The problem may persist even when the sample size is large (…) Third, the calculation of Bayes factors generally requires the evaluation of marginal likelihoods. In many models, the marginal likelihoods may be difficult to compute.”

I completely agree with these points, which are part of a longer list in our testing-by-mixture-estimation paper. The authors also rightly blame the rigidity of the 0-1 loss function behind the derivation of the Bayes factor. An alternative decision-theoretic approach, based on the Kullback-Leibler divergence, was proposed by José Bernardo and Raúl Rueda in a 2002 paper, evaluating the average divergence between the null and the full model under the full, with the slight drawback that any nuisance parameter has the same prior under both hypotheses. (Which makes me think of the Savage-Dickey paradox, since everything here seems to take place under the alternative.) And the larger drawback of requiring a lower bound for rejecting the null. (Although it could be calibrated under the null prior predictive.)

This paper suggests using instead the difference of the Bayesian deviances, which is the expected log-ratio integrated against the posterior. (With the possible embarrassment of the quantity having no prior expectation, since the ratio depends on the data. But after all the evidence or marginal likelihood faces the same “criticism”.) So it is a sort of Bayes factor on the logarithms, with a strong similarity with Bernardo & Rueda's solution, since they are equal in expectation under the marginal. As in Dawid et al.'s recent paper, the logarithm removes the issue with the normalising constant and with the Lindley-Jeffreys paradox. The approach then needs to be calibrated in order to define a decision bound for rejecting the null. The asymptotic distribution of the criterion is χ²(p)−p, where p is the dimension of the parameter to be tested, but this sounds like falling back on frequentist tests. And the deadly 0.05 bounds. I would rather favour a calibration of the criterion using prior or posterior predictives under both models…
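For the record, calibrating such a bound is a one-liner; here is a minimal Python sketch, assuming only the asymptotic χ²(p)−p null distribution quoted above (the helper name is mine):

```python
from scipy.stats import chi2

def deviance_bound(p, alpha=0.05):
    """Decision bound for the Bayesian deviance difference, using its
    asymptotic chi-square(p) - p distribution under the point null:
    reject the null when the criterion exceeds this threshold."""
    return chi2.ppf(1 - alpha, df=p) - p

# e.g. testing a one-dimensional parameter at level 0.05
print(deviance_bound(1))   # roughly 3.84 - 1 = 2.84
```

Which of course only reinforces the frequentist flavour of the calibration, since this is nothing but a recentred chi-square quantile.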

Measuring statistical evidence using relative belief [book review]

Posted in Books, Statistics, University life on July 22, 2015 by xi'an

“It is necessary to be vigilant to ensure that attempts to be mathematically general do not lead us to introduce absurdities into discussions of inference.” (p.8)

This new book by Michael Evans (Toronto) summarises his views on statistical evidence (expanded in a large number of papers), which are a quite unique mix of Bayesian principles and less-Bayesian methodologies. I am quite glad I could receive a version of the book before it was published by CRC Press, thanks to Rob Carver (and Keith O'Rourke for warning me about it). [Warning: this is a rather long review and post, so readers may choose to opt out now!]

“The Bayes factor does not behave appropriately as a measure of belief, but it does behave appropriately as a measure of evidence.” (p.87)

Posted in Books, Statistics, University life on May 15, 2015 by xi'an

[Here is a reply by Pierre Druilhet to my comments on his paper.]

There are several goals in the paper, the last being the most important one.

The first one is to insist that considering θ as a parameter is not appropriate. We are in complete agreement on that point, but I prefer considering l(θ) as the parameter rather than N, mainly because it is much simpler. Knowing N, the law of l(θ) is given by the law of a random walk with 0 as a reflecting boundary (a link Jaynes explores in his book). So, for a given prior on N, we can derive a prior on l(θ). Since the random process that generates N is completely unknown, except that N is probably large, the true law of l(θ) is completely unknown as well, so we may as well consider l(θ) directly.

The second one is to state explicitly that a flat prior on θ implies an exponentially increasing prior on l(θ). As an anecdote, Stone warned in 1972 against this kind of prior for Gaussian models. Another interesting anecdote is that he cited Abbott's novel “Flatland: A Romance of Many Dimensions”, which describes a world where the dimension changes. This is exactly the case in the FP, since θ has to be seen in two dimensions rather than one.
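To see the exponential growth concretely, here is a small Python enumeration, assuming the usual free-group representation of Stone's flatland example, with two generators a, b and their inverses (written A and B below; the helper names are mine):

```python
LETTERS = {"a": "A", "A": "a", "b": "B", "B": "b"}  # letter -> its inverse

def reduced_words(length):
    """All reduced words of a given length over {a, a^-1, b, b^-1},
    i.e. words where no letter is immediately followed by its inverse."""
    words = [""]
    for _ in range(length):
        words = [w + c for w in words for c in LETTERS
                 if not (w and LETTERS[w[-1]] == c)]
    return words

# a flat prior over reduced words gives each length l a mass proportional
# to the number of such words, namely 4 * 3^(l-1): exponential in l
for l in range(1, 6):
    print(l, len(reduced_words(l)))
```

So the flat prior on θ concentrates almost all its mass on the largest values of l(θ), which is the crux of the paradox.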

The third one is to make a distinction between randomness of the parameter and prior distribution, each one having its own rule. This point is extensively discussed in Section 2.3.
– In the intuitive reasoning, the probability of no annihilation involves the true joint distribution on (θ, x) and therefore the true unknown distribution of θ.
– In the Bayesian reasoning, the posterior probability of no annihilation is derived from the prior distribution, which is improper. The underlying idea is that a prior distribution does not obey probability rules but belongs to a projective space of measures. This is especially true if the prior does not represent accurate knowledge. In that case, there is no discontinuity between proper and improper priors, and therefore the impropriety of the distribution is not a key point. In that context, the joint and marginal distributions are irrelevant, not because the prior is improper, but because it is a prior and not a true law. If the prior were the true probability law of θ, then the flat distribution could not be considered as a limit of probability distributions.

For most applications, the distinction between prior and probability law is not necessary and even pedantic, but it may appear essential in some situations. For example, in the Jeffreys-Lindley paradox, we may note that the construction of the prior is not compatible with the projective space structure.

improper priors, incorporated

Posted in Books, Statistics, University life on January 11, 2012 by xi'an

“If a statistical procedure is to be judged by a criterion such as a conventional loss function (…) we should not expect optimal results from a probabilistic theory that demands multiple observations and multiple parameters.” P. McCullagh & H. Han

Peter McCullagh and Han Han have just published in the Annals of Statistics a paper on Bayes' theorem for improper mixtures. This is a fascinating piece of work, even though some parts do elude me… The authors indeed propose a framework, based on Kingman's Poisson point processes, that allows one to include (countable) improper priors in a coherent probabilistic framework. This framework requires the definition of a test set A in the sampling space, the observations then being the events Y ∩ A, Y being an infinite random set when the prior is infinite. It is therefore complicated to perceive this representation in a genuine Bayesian framework, i.e. for a single observation corresponding to a single parameter value. In that sense it seems closer to the original empirical Bayes, à la Robbins.

“An improper mixture is designed for a generic class of problems, not necessarily related to one another scientifically, but all having the same mathematical structure.” P. McCullagh & H. Han

The paper thus misses, in my opinion, a clear link with the design of improper priors. And it does not offer a resolution of the improper-prior Bayes factor conundrum. However, it provides a perfectly valid environment for working with improper priors. For instance, the final section on the marginalisation “paradoxes” is illuminating in this respect, as it does not demand using a limit of proper priors.

Is Bayes posterior [quick and] dirty?!

Posted in Books, Statistics, University life on April 28, 2011 by xi'an

I have been asked to discuss the forthcoming Statistical Science paper by Don Fraser, Is Bayes posterior quick and dirty confidence?. The title was intriguing, if clearly provocative, so I read through the whole paper… (The following is a draft of my discussion.)

The central point in Don's paper seems to be a demonstration that Bayes confidence sets are not valid because they do not provide the proper frequentist coverage. While I appreciate the effort made therein of evaluating Bayesian bounds in a frequentist light, and while the paper does shed new insight on this evaluation, its main point seems to be a radical reexamination of the relevance of the whole Bayesian approach to confidence regions. The outcome is rather surprising in that the disagreement between Bayesian and frequentist perspectives is usually quite limited [in contrast with tests], the coverage statements agreeing to orders between $n^{-1/2}$ and $n^{-1}$, following older results by Welch and Peers (1963).
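As an aside, this near-coincidence of coverages is easy to check by simulation; here is my own toy Python version, for a normal mean under a flat prior, a case where the credible interval actually coincides with the classical confidence interval and the frequentist coverage is exact up to Monte Carlo noise:

```python
import math
import random
import statistics

def coverage(n=10, mu=0.0, sigma=1.0, reps=20000, level=0.95, seed=1):
    """Frequentist coverage of the Bayesian credible interval for a normal
    mean under a flat prior, where it coincides with the usual confidence
    interval xbar +/- z * sigma / sqrt(n)."""
    random.seed(seed)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
        hits += (xbar - half <= mu <= xbar + half)
    return hits / reps

print(coverage())   # close to the nominal 0.95
```

The interesting discrepancies of course occur away from such location models, where the matching only holds to the orders mentioned above.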

Thesis defense in València

Posted in Statistics, Travel, University life, Wines on February 25, 2011 by xi'an

On Monday, I took part in the jury of the PhD thesis of Anabel Forte Deltel, in the department of statistics of the Universitat de València. The topic of the thesis was variable selection in Gaussian linear models using an objective Bayes approach. Completely on my own research agenda! I had already discussed the thesis with Anabel in Zürich, where she presented a poster and gave me a copy, so I could concentrate on the fundamentals of her approach during the defence. Her approach extends Liang et al.'s (2008, JASA) hyper-g prior in a complete analysis of the conditions set by Jeffreys in his book for constructing such priors. She is therefore able to motivate a precise value for most hyperparameters (although some choices were mainly based on computational reasons, opposing the 2F1 and Appell's F1 hypergeometric functions). She also defends the use of an improper prior by an invariance argument that leads to the standard Jeffreys prior on location-scale. (This is where I prefer the approach in Bayesian Core, which does not discriminate between a subset of the covariates including the intercept and the other covariates. Even though it is not invariant under location-scale transforms.) After the defence, Jim Berger pointed out to me that the modelling allowed for the subset to be empty, which would then cancel my above objection! In conclusion, this thesis could well set a reference prior (if not in José Bernardo's sense of the term!) for Bayesian linear model analysis in the coming years.
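For the curious, the hyper-g marginal can also be handled by brute-force quadrature, bypassing the 2F1 function altogether; here is my own Python sketch, assuming the standard g-prior Bayes factor formula of Liang et al. (2008) and the hyper-g density π(g) = (a−2)/2 (1+g)^(−a/2):

```python
import math
from scipy.integrate import quad

def hyper_g_bayes_factor(n, p, R2, a=3.0):
    """Bayes factor of a p-covariate Gaussian linear model with R-squared R2
    against the null model, under the hyper-g prior, by quadrature over g.
    Computations are done on the log scale to avoid overflow."""
    def integrand(g):
        log_bf_g = ((n - 1 - p) / 2 * math.log1p(g)
                    - (n - 1) / 2 * math.log1p(g * (1 - R2)))
        log_prior = math.log((a - 2) / 2) - a / 2 * math.log1p(g)
        return math.exp(log_bf_g + log_prior)
    val, _ = quad(integrand, 0, math.inf)
    return val

# with R2 = 0 the integral collapses to (a-2)/(p+a-2), below one as it should
print(hyper_g_bayes_factor(n=50, p=2, R2=0.0))   # 1/3 for a = 3, p = 2
print(hyper_g_bayes_factor(n=50, p=2, R2=0.5))   # well above one
```

Obviously the closed-form 2F1 representation is preferable for large-scale variable selection, where this integral must be evaluated for every model in the search.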

Bayes vs. SAS

Posted in Books, R, Statistics on May 7, 2010 by xi'an

Glancing perchance at the back of my Amstat News, I was intrigued by the SAS advertisement

Bayesian Methods

• Specify Bayesian analysis for ANOVA, logistic regression, Poisson regression, accelerated failure time models and Cox regression through the GENMOD, LIFEREG and PHREG procedures.
• Analyze a wider variety of models with the MCMC procedure, a general purpose Bayesian analysis procedure.

and so decided to take a look at those items on the SAS website. (Some entries date back to 2006, so I am not claiming novelty in this post, just my reading through the manual!)

Even though I have not looked at a SAS program since 1984, when I was learning principal component and discriminant analysis by programming SAS procedures on punched cards, the MCMC part seems rather manageable (if you can manage SAS at all!), looking very much like a second BUGS to my bystander eyes, even to the point of including ARS algorithms! The models are defined in a BUGS manner, with priors on the side (and this includes improper priors, despite a confusing first example that mixes very large variances with vague priors for the linear model!). The basic scheme is a random-walk proposal with adaptive scale or covariance matrix. (The adaptivity on the covariance matrix is slightly confusing in that, as described, it does not seem to meet the requirements of Roberts and Rosenthal for guaranteed convergence.) Gibbs sampling is not directly covered, although some examples are in essence using Gibbs samplers. Convergence is assessed via ca. 1995 methods à la Cowles and Carlin, including the rather unreliable Raftery and Lewis indicator, but so does Introducing Monte Carlo Methods with R, which takes advantage of the R coda package. I have not tested (!) any of the features in the MCMC procedure but, judging from a quick skim through the 283-page manual, everything looks reasonable enough. I wonder if anyone has ever tested a SAS program against its BUGS counterpart for efficiency comparison.
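For illustration, here is a toy Python version of such an adaptive random-walk scheme (my own sketch, not SAS's actual implementation), where the decaying adaptation step is one simple way of meeting the diminishing-adaptation requirement of Roberts and Rosenthal:

```python
import math
import random

def adaptive_rwm(logpi, x0=0.0, iters=20000, target=0.44, seed=42):
    """Random-walk Metropolis with an adaptively tuned proposal scale:
    the log-scale is nudged toward a 0.44 acceptance rate (the usual
    one-dimensional target) with a step size decaying in t, so the
    adaptation vanishes asymptotically."""
    random.seed(seed)
    x, log_s = x0, 0.0
    chain = []
    for t in range(1, iters + 1):
        prop = x + math.exp(log_s) * random.gauss(0, 1)
        accept = math.log(random.random()) < logpi(prop) - logpi(x)
        if accept:
            x = prop
        # Robbins-Monro update of the proposal scale, decaying step t^(-0.6)
        log_s += t ** -0.6 * ((1.0 if accept else 0.0) - target)
        chain.append(x)
    return chain

# standard normal target
chain = adaptive_rwm(lambda x: -0.5 * x * x)
print(sum(chain) / len(chain))   # near zero
```

The covariance-matrix version would update a full proposal covariance from the chain history, which is where the conditions for valid adaptation become more delicate.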

The Bayesian aspects are rather traditional as well, except for the testing issue. Indeed, from what I have read, SAS does not engage in testing and remains within estimation bounds, offering only HPD regions for variable selection without producing a genuine Bayesian model choice tool. I understand the issues with handling improper priors versus computing Bayes factors, as well as some delicate computational requirements, but this is a truly important chunk missing from the package. (Of course, the package contains a DIC (deviance information criterion) capability, which may be seen as a substitute, but I have reservations about the relevance of DIC outside generalised linear models. Same difficulty with the posterior predictive.) As usual with SAS, the documentation is huge (I still remember the shelves upon shelves of documentation volumes in my 1984 card-punching room!) and full of options and examples. Nothing to complain about. Except maybe the list of disadvantages in using Bayesian analysis:

• It does not tell you how to select a prior. There is no correct way to choose a prior. Bayesian inferences require skills to translate prior beliefs into a mathematically formulated prior. If you do not proceed with caution, you can generate misleading results.
• It can produce posterior distributions that are heavily influenced by the priors. From a practical point of view, it might sometimes be difficult to convince subject matter experts who do not agree with the validity of the chosen prior.
• It often comes with a high computational cost, especially in models with a large number of parameters.

which does not say much… Since the MCMC procedure allows for any degree of hierarchical modelling, it is always possible to check the impact of a given prior by letting its hyperparameters go random. I have found that most practitioners are happy with the formalisation of their prior beliefs into mathematical densities, rather than adamant about a specific prior. As for computation, this is not a major issue.
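To make the point concrete, here is a toy Python sketch (a conjugate normal model with known variance, all helper names mine) of checking the impact of a prior by letting its variance go random over a grid, with hyperprior weights proportional to the marginal likelihood:

```python
import math

def posterior_mean(xbar, n, sigma2, mu0, tau2):
    """Posterior mean of a normal mean with known variance sigma2,
    under a N(mu0, tau2) prior: a precision-weighted average."""
    w = (n / sigma2) / (n / sigma2 + 1 / tau2)
    return w * xbar + (1 - w) * mu0

def averaged_posterior_mean(xbar, n, sigma2, mu0, tau2_grid):
    """Let the prior variance go random over a discrete grid: each tau2 is
    weighted by the marginal likelihood of xbar ~ N(mu0, tau2 + sigma2/n),
    under a uniform hyperprior on the grid."""
    def marg(tau2):
        v = tau2 + sigma2 / n
        return math.exp(-0.5 * (xbar - mu0) ** 2 / v) / math.sqrt(v)
    weights = [marg(t) for t in tau2_grid]
    total = sum(weights)
    return sum(w / total * posterior_mean(xbar, n, sigma2, mu0, t)
               for w, t in zip(weights, tau2_grid))

xbar, n, sigma2, mu0 = 2.0, 10, 1.0, 0.0
for tau2 in (0.1, 1.0, 10.0):
    print(tau2, posterior_mean(xbar, n, sigma2, mu0, tau2))
print("averaged:", averaged_posterior_mean(xbar, n, sigma2, mu0, (0.1, 1.0, 10.0)))
```

The hyperprior-averaged answer sides with the prior variances that are compatible with the data, which is exactly the kind of robustness check the hierarchical extension buys for free.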