Archive for HPD region

divide & reconquer

Posted in Books, Statistics, University life with tags , , , , , , , , , , on February 5, 2018 by xi'an

Qi Liu, Anindya Bhadra, and William Cleveland from Purdue have arXived a paper entitled Divide and Recombine for Large and Complex Data: Model Likelihood Functions using MCMC. Which is a variation on the earlier divide & … papers attempting at handling large datasets. The beginning is quite similar to these earlier papers in that the likelihood is split into sub-likelihoods, approximated from MCMC samples and recombined into an approximate full likelihood. As in for instance Scott et al. one approximation use for the subsample is to replace the likelihood with a Normal approximation, or a skew Normal generalisation, which remains  a limited choice for heavy tailed likelihoods. Producing a Normal and skew-Normal approximation for the whole [data] likelihood, respectively. If I understand correctly, these approximations are missing a normalising constant to bring them to scale with the true likelihood, which I do not completely understand as the likelihood only needs to be defined up to a [constant] constant for most purposes, including Bayesian ones. The  method of estimation of this constant proposed therein is called the contour probability algorithm and it consists in using a highest density region to compare a likelihood and its approximation. (Nothing to do with our adaptation of Gelfand and Dey (1994) based on HPDs, with Darren Wright. Nor with nested sampling.) Returning a form of qq-plot. This is rather exploratory, while hardly addressing the issue of the precision of such approximations and the resolution of conflicting proposals. And the comparison with all these other recent proposals for splitting likelihoods into manageable bits (proposals that are mentioned in the final section, including our recentering scheme with my student Changye Wu).

John Kruschke on Bayesian assessment of null values

Posted in Books, Kids, pictures, Statistics, University life with tags , , , , , , , , on February 28, 2017 by xi'an

John Kruschke pointed out to me a blog entry he wrote last December as a follow-up to my own entry on an earlier paper of his. Induced by an X validated entry. Just in case this sounds a wee bit too convoluted for unraveling the threads (!), the central notion there is to replace a point null hypothesis testing [of bad reputation, for many good reasons] with a check whether or not the null value stands within the 95% HPD region [modulo a buffer zone], which offers the pluses of avoiding a Dirac mass at the null value and a long-term impact of the prior tails on the decision, as well as the possibility of a no-decision, with the minuses of replacing the null with a tolerance region around the null and calibrating both the rejection level and the buffer zone. The December blog entry exposes this principle with graphical illustrations familiar to readers of Doing Bayesian Data Analysis.

As I do not want to fall into an infinite regress of mirror discussions, I will not proceed further than referring to my earlier post, which covers my reservations about the proposal. But interested readers may want to check the latest paper by Kruschke and Liddel on that perspective. (With the conclusion that “Bayesian estimation does everything the New Statistics desires, better”.) Available on PsyArXiv, an avatar of arXiv for psychology papers.

Bayesian parameter estimation versus model comparison

Posted in Books, pictures, Statistics with tags , , , , , , on December 5, 2016 by xi'an

John Kruschke [of puppies’ fame!] wrote a paper in Perspectives in Psychological Science a few years ago on the comparison between two Bayesian approaches to null hypotheses. Of which I became aware through a X validated question that seemed to confuse Bayesian parameter estimation with Bayesian hypothesis testing.

“Regardless of the decision rule, however, the primary attraction of using parameter estimation to assess null values is that the an explicit posterior distribution reveals the relative credibility of all the parameter values.” (p.302)

After reading this paper, I realised that Kruschke meant something completely different, namely that a Bayesian approach to null hypothesis testing could operate from the posterior on the corresponding parameter, rather than to engage into formal Bayesian model comparison (null versus the rest of the World). The notion is to check whether or not the null value stands within the 95% [why 95?] HPD region [modulo a buffer zone], which offers the pluses of avoiding a Dirac mass at the null value and a long-term impact of the prior tails on the decision, with the minus of replacing the null with a tolerance region around the null and calibrating the rejection level. This opposition is thus a Bayesian counterpart of running tests on point null hypotheses either by Neyman-Pearson procedures or by confidence intervals. Note that in problems with nuisance parameters this solution requires a determination of the 95% HPD region associated with the marginal on the parameter of interest, which may prove a challenge.

“…the measure provides a natural penalty for vague priors that allow a broad range of parameter values, because a vague prior dilutes credibility across a broad range of parameter values, and therefore the weighted average is also attenuated.” (p. 306)

While I agree with most of the critical assessment of Bayesian model comparison, including Kruschke’s version of Occam’s razor [and Lindley’s paradox] above, I do not understand how Bayesian model comparison fails to return a full posterior on both the model indices [for model comparison] and the model parameters [for estimation]. To state that it does not because the Bayes factor only depends on marginal likelihoods (p.307) sounds unfair if only because most numerical techniques to approximate the Bayes factors rely on preliminary simulations of the posterior. The point that the Bayes factor strongly depends on the modelling of the alternative model is well-taken, albeit the selection of the null in the “estimation” approach does depend as well on this alternative modelling. Which is an issue if one ends up accepting the null value and running a Bayesian analysis based on this null value.

“The two Bayesian approaches to assessing null values can be unified in a single hierarchical model.” (p.308)

Incidentally, the paper briefly considers a unified modelling that can be interpreted as a mixture across both models, but this mixture representation completely differs from ours [where we also advocate estimation to replace testing] since the mixture is at the likelihood x prior level, as in O’Neill and Kypriaos.

rediscovering the harmonic mean estimator

Posted in Kids, Statistics, University life with tags , , , , , , , on November 10, 2015 by xi'an

When looking at unanswered questions on X validated, I came across a question where the author wanted to approximate a normalising constant

N=\int g(x)\,\text{d}x\,,

while simulating from the associated density, g. While seemingly unaware of the (huge) literature in the area, he re-derived [a version of] the harmonic mean estimate by considering the [inverted importance sampling] identity

\int_\mathcal{X} \dfrac{\alpha(x)}{g(x)}p(x) \,\text{d}x=\int_\mathcal{X} \dfrac{\alpha(x)}{N} \,\text{d}x=\dfrac{1}{N}

when α is a probability density and by using for α the uniform over the whole range of the simulations from g. This choice of α obviously leads to an estimator with infinite variance when the support of g is unbounded, but the idea can be easily salvaged by using instead another uniform distribution, for instance on an highest density region, as we studied in our papers with Darren Wraith and Jean-Michel Marin. (Unfortunately, the originator of the question does not seem any longer interested in the problem.)

Tractable Fully Bayesian inference via convex optimization and optimal transport theory

Posted in Books, Statistics, University life with tags , , , , , , , , on October 6, 2015 by xi'an

IMG_0294“Recently, El Moselhy et al. proposed a method to construct a map that pushed forward the prior measure to the posterior measure, casting Bayesian inference as an optimal transport problem. Namely, the constructed map transforms a random variable distributed according to the prior into another random variable distributed according to the posterior. This approach is conceptually different from previous methods, including sampling and approximation methods.”

Yesterday, Kim et al. arXived a paper with the above title, linking transport theory with Bayesian inference. Rather strangely, they motivate the transport theory with Galton’s quincunx, when the apparatus is a discrete version of the inverse cdf transform… Of course, in higher dimensions, there is no longer a straightforward transform and the paper shows (or recalls) that there exists a unique solution with positive Jacobian for log-concave posteriors. For instance, log-concave priors and likelihoods. This solution remains however a virtual notion in practice and an approximation is constructed via a (finite) functional polynomial basis. And minimising an empirical version of the Kullback-Leibler distance.

I am somewhat uncertain as to how and why apply such a transform to simulations from the prior (which thus has to be proper). Producing simulations from the posterior certainly is a traditional way to approximate Bayesian inference and this is thus one approach to this simulation. However, the discussion of the advantage of this approach over, say, MCMC, is quite limited. There is no comparison with alternative simulation or non-simulation methods and the computing time for the transport function derivation. And on the impact of the dimension of the parameter space on the computing time. In connection with recent discussions on probabilistic numerics and super-optimal convergence rates, Given that it relies on simulations, I doubt optimal transport can do better than O(√n) rates. One side remark about deriving posterior credible regions from (HPD)  prior credible regions: there is no reason the resulting region is optimal in volume (HPD) given that the transform is non-linear.

Statistics slides (5)

Posted in Books, Kids, Statistics, University life with tags , , , , , on December 7, 2014 by xi'an

La Défense from Paris-Dauphine, Nov. 15, 2012Here is the fifth and last set of slides for my third year statistics course, trying to introduce Bayesian statistics in the most natural way and hence starting with… Rasmus’ socks and ABC!!! This is an interesting experiment as I have no idea how my students will react. Either they will see the point besides the anecdotal story or they’ll miss it (being quite unhappy so far about the lack of mathematical rigour in my course and exercises…). We only have two weeks left so I am afraid the concept will not have time to seep through!

beta HPD

Posted in Books, R, Statistics, Uncategorized, University life with tags , , , , , , , on October 17, 2013 by xi'an

While writing an introductory chapter on Bayesian analysis (in French), I came by the issue of computing an HPD region when the posterior distribution is a Beta B(α,β) distribution… There is no analytic solution and hence I resorted to numerical resolution (provided here for α=117.5, β=115.5):

f=function(p){

  # find the symmetric
  g=function(x){return(x-p*((1-p)/(1-x))^(115.5/117.5))}
  return(uniroot(g,c(.504,.99))$root)}

ff=function(alpha){

  # find the coverage
  g=function(x){return(x-p*((1-p)/(1-x))^(115.5/117.5))}
  return(uniroot(g,c(.011,.49))$root)}

and got the following return:

> ff(.95)
[1] 0.4504879
> f(ff(.95))
[1] 0.5580267

which was enough for my simple book illustration… Since (.450,558) is then the HPD region at credible level 0.95.