## Bayesian parameter estimation versus model comparison

**J**ohn Kruschke [of puppies’ fame!] wrote a paper in Perspectives on Psychological Science a few years ago on the comparison between two Bayesian approaches to null hypotheses, of which I became aware through an X validated question that seemed to confuse Bayesian parameter estimation with Bayesian hypothesis testing.

“Regardless of the decision rule, however, the primary attraction of using parameter estimation to assess null values is that an explicit posterior distribution reveals the relative credibility of all the parameter values.” (p.302)

After reading this paper, I realised that Kruschke meant something completely different, namely that a Bayesian approach to null hypothesis testing could operate from the posterior on the corresponding parameter, rather than engaging in formal Bayesian model comparison (null versus the rest of the World). The notion is to check whether or not the null value stands within the 95% [why 95?] HPD region [modulo a buffer zone], which offers the pluses of avoiding both a Dirac mass at the null value and a long-term impact of the prior tails on the decision, with the minus of replacing the null with a tolerance region around the null and of calibrating the rejection level. This opposition is thus a Bayesian counterpart of running tests on point null hypotheses either by Neyman-Pearson procedures or by confidence intervals. Note that in problems with nuisance parameters this solution requires a determination of the 95% HPD region associated with the marginal on the parameter of interest, which may prove a challenge.
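The decision rule described above can be sketched in a few lines. This is only an illustration under assumptions that are not taken from the paper: a normal sample with unknown mean and variance, the prior π(μ, σ²) ∝ 1/σ², a buffer zone of (-0.1, 0.1) around the null value μ = 0, and the hypothetical helper names `hpd_interval` and `rope_decision`. The nuisance parameter σ² is handled by simulating from the joint posterior and keeping only the μ draws.

```python
import numpy as np

def hpd_interval(samples, mass=0.95):
    """Shortest interval holding `mass` of the draws (assumes a unimodal marginal)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))             # number of draws inside the interval
    widths = x[k - 1:] - x[: n - k + 1]    # width of each window of k consecutive draws
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

def rope_decision(hpd, rope):
    """Compare an HPD interval with a buffer zone (tolerance region) around the null."""
    lo, hi = hpd
    r_lo, r_hi = rope
    if hi < r_lo or lo > r_hi:
        return "reject null"       # HPD entirely outside the buffer zone
    if lo >= r_lo and hi <= r_hi:
        return "accept null"       # HPD entirely inside the buffer zone
    return "undecided"

# Simulated data (an assumption for illustration): true mean 1, true sd 1.
rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=100)
n, ybar, s2 = y.size, y.mean(), y.var(ddof=1)

# Joint posterior draws under the prior 1/sigma^2:
# sigma^2 | y ~ (n-1) s^2 / chi^2_{n-1},  mu | sigma^2, y ~ N(ybar, sigma^2 / n).
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=20000)
mu = rng.normal(ybar, np.sqrt(sigma2 / n))

# Marginalising over sigma^2 amounts to ignoring the sigma2 draws.
hpd = hpd_interval(mu, mass=0.95)
decision = rope_decision(hpd, rope=(-0.1, 0.1))
print(hpd, decision)
```

Both the 95% level and the width of the buffer zone are free choices here, which is exactly the calibration issue raised above.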

“…the measure provides a natural penalty for vague priors that allow a broad range of parameter values, because a vague prior dilutes credibility across a broad range of parameter values, and therefore the weighted average is also attenuated.” (p. 306)

While I agree with most of the critical assessment of Bayesian model comparison, including Kruschke’s version of Occam’s razor [and Lindley’s paradox] above, I do not understand how Bayesian model comparison fails to return a full posterior on both the model indices [for model comparison] and the model parameters [for estimation]. To state that it does not because the Bayes factor only depends on marginal likelihoods (p.307) sounds unfair, if only because most numerical techniques to approximate Bayes factors rely on preliminary simulations of the posterior. The point that the Bayes factor strongly depends on the modelling of the alternative model is well-taken, although the selection of the null in the “estimation” approach also depends on this alternative modelling. Which is an issue if one ends up accepting the null value and running a Bayesian analysis based on this null value.
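The penalty for vague priors quoted above [and the Lindley paradox behind it] can be illustrated numerically. The sketch below rests on assumptions of its own, not on the paper: a normal mean with known variance σ² = 1, a point null θ = 0, and an alternative prior θ ~ N(0, τ²), so that both marginal likelihoods of the sample mean are available in closed form. The function names are hypothetical.

```python
import math

def normal_pdf(x, var):
    """Density at x of a centred normal with variance `var`."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def bf01(ybar, n, sigma=1.0, tau=1.0):
    """Bayes factor of H0: theta = 0 against H1: theta ~ N(0, tau^2).

    Under H0 the sample mean is N(0, sigma^2/n); under H1 its marginal
    is N(0, sigma^2/n + tau^2), the prior-weighted average likelihood.
    """
    se2 = sigma**2 / n
    return normal_pdf(ybar, se2) / normal_pdf(ybar, se2 + tau**2)

# Same data summary throughout: the sample mean is 2.5 standard errors from zero.
ybar, n = 0.25, 100
taus = (0.5, 1.0, 10.0, 100.0)
factors = [bf01(ybar, n, tau=t) for t in taus]
for t, b in zip(taus, factors):
    print(f"tau = {t:6.1f}   BF01 = {b:.3f}")
```

With the data held fixed, increasing τ dilutes the marginal likelihood under the alternative and pushes the Bayes factor towards the null, whereas an interval on θ derived from the vaguest of these priors barely moves.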

“The two Bayesian approaches to assessing null values can be unified in a single hierarchical model.” (p.308)

Incidentally, the paper briefly considers a unified modelling that can be interpreted as a mixture across both models, but this mixture representation completely differs from ours [where we also advocate estimation to replace testing] since the mixture is at the *likelihood × prior* level, as in O’Neill and Kypraios.

December 22, 2016 at 3:23 pm

These ideas are clarified with recent workshop diagrams and article in this new blog post: http://doingbayesiandataanalysis.blogspot.com/2016/12/bayesian-assessment-of-null-values.html.

December 23, 2016 at 4:22 am

Thank you, John.

December 21, 2016 at 5:11 pm

I just discovered Xi’an’s ‘Og post yesterday, and have attempted to clarify in this new blog post: http://doingbayesiandataanalysis.blogspot.com/2016/12/bayesian-assessment-of-null-values.html

December 6, 2016 at 12:08 pm

Dear readers of Xi’an’s og:

Once again in this blog, I would like to ask that current efforts acknowledge previous work by Marcelo Lauretto on testing separate hypotheses via mixture models:

M. Lauretto, S.R. Faria, C.A.B. Pereira, J.M. Stern (2007). The Problem of Separate Hypotheses via Mixtures Models. AIP Conference Proceedings, 954, 268-275.

This approach is an evolution of Marcelo Lauretto’s earlier work on FBST for mixture model selection:

Marcelo S. Lauretto and Julio M. Stern. FBST for Mixture Model Selection. AIP Conference Proceedings, 803, 121-128.

Instead of Bayes factors, these works used the FBST, the Full Bayesian Significance Test, an alternative that is a lot easier to implement, and also has far better theoretical properties than Bayes factors; see for example:

J.M. Stern and C.A.B. Pereira (2014). Bayesian epistemic values: Focus on surprise, measure probability! Logic Journal of the IGPL, 22, 2, 236-254.

W. Borges, J.M. Stern (2007). The Rules of Logic Composition for the Bayesian Epistemic e-Values. Logic Journal of the IGPL, 15, 5-6, 401-420.

C.A.B. Pereira, J.M. Stern, S. Wechsler (2008). Can a Significance Test be Genuinely Bayesian? Bayesian Analysis, 3, 79-100.

December 6, 2016 at 7:48 pm

This approach has been discussed in several posts on the ‘Og.

December 6, 2016 at 9:24 pm

Yes, Xi’an, it has been discussed, and I am grateful.

However, when connections with similar approaches are discussed, our results are often ignored in subsequent papers, which nevertheless acknowledge other alternative frameworks that are perhaps not so closely related (hard to work south of the equator).

December 5, 2016 at 6:27 pm

I think that what he’s describing is the most common form of Bayesian “hypothesis testing” (i.e. “Does the 95% credible interval cover zero?”).

I don’t understand your point about computing the marginal HPD in the presence of “nuisance parameters” – isn’t this essentially trivial postprocessing for MCMC? Or am I missing something?

Similarly, if a Bayesian model is calibrated (via prior choice or through matching priors) so that, under data generated from H0, the 95% credible interval contains the null value 95% of the time, then surely the resulting hypothesis test is valid. It may not be powerful, but it would be a valid Neyman-Pearson test.

This isn’t “Bayesian hypothesis testing” so much as “N-P hypothesis testing from Bayesian output”, but I’m not sure it’s less sensible than Bayesian hypothesis testing (at least BHT with point nulls, or 0-1 loss functions). It’s also infinitely more convenient computationally, and doesn’t require an alternative model but instead only requires a (sensible) null model for calibration. Alternative models only need to be posited when considering the sensitivity of the test (or the type-2 error rate, or the power, or whatever you’re going to call it).
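The calibration argument in this comment can be checked by simulation. The sketch below is an illustration under stated assumptions, none of which come from the thread: observations y_i ~ N(θ, 1), a conjugate prior θ ~ N(0, τ²), an equal-tailed 95% credible interval, and a hypothetical function name `coverage_under_null`. It estimates how often the interval contains the null value θ = 0 when the data are generated under H0.

```python
import numpy as np

def coverage_under_null(n=30, tau=10.0, n_rep=20000, seed=1):
    """Frequency with which the 95% credible interval covers theta = 0 under H0."""
    rng = np.random.default_rng(seed)
    # Sampling distribution of the sample mean when theta = 0 and sigma = 1.
    ybar = rng.normal(0.0, 1.0 / np.sqrt(n), size=n_rep)
    # Conjugate normal update: posterior precision = n + 1 / tau^2.
    post_var = 1.0 / (n + 1.0 / tau**2)
    post_mean = post_var * n * ybar
    half_width = 1.959964 * np.sqrt(post_var)
    # The interval covers zero when |posterior mean| <= half-width.
    return float(np.mean(np.abs(post_mean) <= half_width))

vague = coverage_under_null(tau=10.0)   # nearly flat prior
tight = coverage_under_null(tau=0.1)    # prior concentrated on the null value
print(vague, tight)
```

Under the nearly flat prior the coverage sits close to the nominal 95%, so rejecting when the interval excludes zero behaves like a level-5% test. A prior concentrated on the null makes the interval cover zero far more often, giving a conservative test, which is why the prior has to be chosen with the calibration in mind.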

December 5, 2016 at 10:36 pm

Thanks, Dan. Now I may be missing something: if I want to compute the marginal HPD for a single parameter, don’t I need to integrate out the nuisance parameters? This makes a big difference with the joint HPD. I can think of computational bypasses, but nothing instantaneous!

As for the global perspective you propose, there is so little flavour in the output that one hardly remembers the Bayesian cook!

December 5, 2016 at 11:54 pm

You know my feelings on hypothesis testing. I think there’s a necessarily Bayesian flavour to decision analysis, but I can’t get there for hypothesis testing. Most of my reading on this is that there’s no difference between for strong evidence in either direction and complicated evidence near the NP boundary. If you’re not making a Bayesian soup, you shouldn’t be surprised when you end up with a frequentist soufflé.

Surely you marginalise out the usual way: just ignore the nuisance parameters. If you have 2 parameters and only the first is of interest, then the marginal for the interesting parameter is the histogram of the first component of the Markov chain, and the corresponding HPD comes from that.
