Archive for John Kruschke

Bayesian parameter estimation versus model comparison

Posted in Books, pictures, Statistics with tags , , , , , , on December 5, 2016 by xi'an

John Kruschke [of puppies’ fame!] wrote a paper in Perspectives in Psychological Science a few years ago on the comparison between two Bayesian approaches to null hypotheses. Of which I became aware through a X validated question that seemed to confuse Bayesian parameter estimation with Bayesian hypothesis testing.

“Regardless of the decision rule, however, the primary attraction of using parameter estimation to assess null values is that the an explicit posterior distribution reveals the relative credibility of all the parameter values.” (p.302)

After reading this paper, I realised that Kruschke meant something completely different, namely that a Bayesian approach to null hypothesis testing could operate from the posterior on the corresponding parameter, rather than to engage into formal Bayesian model comparison (null versus the rest of the World). The notion is to check whether or not the null value stands within the 95% [why 95?] HPD region [modulo a buffer zone], which offers the pluses of avoiding a Dirac mass at the null value and a long-term impact of the prior tails on the decision, with the minus of replacing the null with a tolerance region around the null and calibrating the rejection level. This opposition is thus a Bayesian counterpart of running tests on point null hypotheses either by Neyman-Pearson procedures or by confidence intervals. Note that in problems with nuisance parameters this solution requires a determination of the 95% HPD region associated with the marginal on the parameter of interest, which may prove a challenge.

“…the measure provides a natural penalty for vague priors that allow a broad range of parameter values, because a vague prior dilutes credibility across a broad range of parameter values, and therefore the weighted average is also attenuated.” (p. 306)

While I agree with most of the critical assessment of Bayesian model comparison, including Kruschke’s version of Occam’s razor [and Lindley’s paradox] above, I do not understand how Bayesian model comparison fails to return a full posterior on both the model indices [for model comparison] and the model parameters [for estimation]. To state that it does not because the Bayes factor only depends on marginal likelihoods (p.307) sounds unfair if only because most numerical techniques to approximate the Bayes factors rely on preliminary simulations of the posterior. The point that the Bayes factor strongly depends on the modelling of the alternative model is well-taken, albeit the selection of the null in the “estimation” approach does depend as well on this alternative modelling. Which is an issue if one ends up accepting the null value and running a Bayesian analysis based on this null value.

“The two Bayesian approaches to assessing null values can be unified in a single hierarchical model.” (p.308)

Incidentally, the paper briefly considers a unified modelling that can be interpreted as a mixture across both models, but this mixture representation completely differs from ours [where we also advocate estimation to replace testing] since the mixture is at the likelihood x prior level, as in O’Neill and Kypriaos.

maximum likelihood on negative binomial

Posted in Books, Kids, Statistics, University life with tags , , , , , , , on October 7, 2015 by xi'an

fishermen poles, Carnon harbour, France, June 13, 2012Estimating both parameters of a negative binomial distribution NB(N,p) by maximum likelihood sounds like an obvious exercise. But it is not because some samples lead to degenerate solutions, namely p=0 and N=∞… This occurs when the mean of the sample is larger than its empirical variance, s²>x̄, not an impossible instance: I discovered this when reading a Cross Validated question asking about the action to take in such a case. A first remark of interest is that this only happens when the negative binomial distribution is defined in terms of failures (since else the number of successes is bounded). A major difference I never realised till now, for estimating N is not a straightforward exercise. A second remark is that a negative binomial NB(N,p) is a Poisson compound of an LSD variate with parameter p, the Poisson being with parameter η=-N log(1-p). And the LSD being a power distribution pk/k rather than a psychedelic drug. Since this is not an easy framework, Adamidis (1999) introduces an extra auxiliary variable that is a truncated exponential on (0,1) with parameter -log(1-p). A very neat trick that removes the nasty normalising constant on the LSD variate.

“Convergence was achieved in all cases, even when the starting values were poor and this emphasizes the numerical stability of the EM algorithm.” K. Adamidis

Adamidis then constructs an EM algorithm on the completed set of auxiliary variables with a closed form update on both parameters. Unfortunately, the algorithm only works when s²>x̄. Otherwise, it gets stuck at the boundary p=0 and N=∞. I was hoping for a replica of the mixture case where local maxima are more interesting than the degenerate global maximum… (Of course, there is always the alternative of using a Bayesian noninformative approach.)