**A**re we in for a return of the harmonic mean estimator?! Allen Caldwell and co-authors arXived a new document that Allen also sent me, following a technique that offers similarities with our earlier approach with Darren Wraith, the difference being in the more careful and practical construct of the partition set and use of multiple hypercubes, which is the smart thing. I visited Allen’s group at the Max Planck Institut für Physik (Heisenberg) in München (Garching) in 2015 and we confronted our perspectives on harmonic means at that time. The approach followed in the paper starts from what I would call the canonical Gelfand and Dey (1995) representation with a uniform prior, namely that the integral of an arbitrary non-negative function [or unnormalised density] ƒ can be connected with the integral of the said function ƒ over a smaller set Δ with a finite measure measure [or volume]. And therefore to simulations from the density ƒ restricted to this set Δ. Which can be recycled by the harmonic mean identity towards producing an estimate of the integral of ƒ over the set Δ. When considering a partition, these integrals sum up to the integral of interest but this is not necessarily the only exploitation one can make of the fundamental identity. The most novel part stands in constructing an adaptive partition based on the sample, made of hypercubes obtained after whitening of the sample. Only keeping points with large enough density and sufficient separation to avoid overlap. (I am unsure a genuine partition is needed.) In order to avoid selection biases the original sample is separated into two groups, used independently. Integrals that stand too much away from the others are removed as well. This construction may sound a bit daunting in the number of steps it involves and in the poor adequation of a Normal to an hypercube or conversely, but it seems to shy away from the number one issue with the basic harmonic mean estimator, the almost certain infinite variance. Although it would be nice to be completely certain this doom is avoided. I still wonder at the degenerateness of the approximation of the integral with the dimension, as well as at other ways of exploiting this always fascinating [if fraught with dangers] representation. And comparing variances.

## Archive for Bayes factors

## a come-back of the harmonic mean estimator

Posted in Statistics with tags Alan Gelfand, Bayes factors, Bayesian computing, harmonic mean estimator, Max Planck Institute, München, Werner-Heisenberg-Institut on September 6, 2018 by xi'an## JASP, a really really fresh way to do stats

Posted in Statistics with tags Bayes factors, Bayesian inference, design, Harold Jeffreys, JASP, tee-shirt, University of Amsterdam on February 1, 2018 by xi'an## Bayesian spectacles

Posted in Books, pictures, Statistics, University life with tags Amsterdam, Bayes factors, Bayesian Spectacles, blogging, Holland, JASP, non-informative priors, objective Bayes, reference priors, UMPBTs, uniformly most powerful tests, University of Amsterdam on October 4, 2017 by xi'anE.J. Wagenmakers and his enthusiastic team of collaborators at University of Amsterdam and in the JASP software designing team have started a blog called Bayesian spectacles which I find a fantastic title. And not only because I wear glasses. Plus, they got their own illustrator, Viktor Beekman, which sounds like the epitome of sophistication! (Compared with resorting to vacation or cat pictures…)

In a most recent post they addressed the criticisms we made of the 72 author paper on p-values, one of the co-authors being E.J.! Andrew already re-addressed some of the address, but here is a disagreement he let me to chew on my own [and where the Abandoners are us!]:

Disagreement 2.The Abandoners’ critique the UMPBTs –the uniformly most powerful Bayesian tests– that features in the original paper. This is their right (see also the discussion of the 2013 Valen Johnson PNAS paper), but they ignore the fact that the original paper presented a series of other procedures that all point to the same conclusion: p-just-below-.05 results are evidentially weak. For instance, a cartoon on the JASP blog explains the Vovk-Sellke bound. A similar result is obtained using the upper bounds discussed in Berger & Sellke (1987) and Edwards, Lindman, & Savage (1963). We suspect that the Abandoners’ dislike of Bayes factors (and perhaps their upper bounds) is driven by a disdain for the point-null hypothesis. That is understandable, but the two critiques should not be mixed up. The first question is Given that we wish to test a point-null hypothesis, do the Bayes factor upper bounds demonstrate that the evidence is weak for p-just-below-.05 results? We believe they do, and in this series of blog posts we have provided concrete demonstrations.

Obviously, this reply calls for an examination of the entire BS blog series, but being short in time at the moment, let me point out that the upper lower bounds on the Bayes factors showing much more support for H⁰ than a p-value at 0.05 only occur in special circumstances. Even though I spend some time in my book discussing those bounds. Indeed, the [interesting] fact that the lower bounds are larger than the p-values does not hold in full generality. Moving to a two-dimensional normal with potentially zero mean is enough to see the order between lower bound and p-value reverse, as I found [quite] a while ago when trying to expand Berger and Sellker (1987, the same year as I was visiting Purdue where both had a position). I am not sure this feature has been much explored in the literature, I did not pursue it when I realised the gap was missing in larger dimensions… I must also point out I do not have the same repulsion for point nulls as Andrew! While considering whether a parameter, say a mean, is exactly zero [or three or whatever] sounds rather absurd when faced with the strata of uncertainty about models, data, procedures, &tc.—even in theoretical physics!—, comparing several [and all wrong!] models with or without some parameters for later use still makes sense. And my reluctance in using Bayes factors does not stem from an opposition to comparing models or from the procedure itself, which is quite appealing within a Bayesian framework [thus appealing *per se*!], but rather from the unfortunate impact of the prior [and its tail behaviour] on the quantity and on the delicate calibration of the thing. And on a lack of reference solution [to avoid the O and the N words!]. As exposed in the demise papers. (Which main version remains in a publishing limbo, the onslaught from the referees proving just too much for me!)

## priors without likelihoods are like sloths without…

Posted in Books, Statistics with tags Austin, Bayes factors, Bayesian Analysis, identifiability, improper priors, noninformative priors, O'Bayes17, Pierre Simon Laplace, posterior predictive, reference priors, sloth, The American Statistician, The University of Texas at Austin on September 11, 2017 by xi'an

“The idea of building priors that generate reasonable data may seem like an unusual idea…”

**A**ndrew, Dan, and Michael arXived a opinion piece last week entitled “The prior can generally only be understood in the context of the likelihood”. Which connects to the earlier Read Paper of Gelman and Hennig I discussed last year. I cannot state strong disagreement with the positions taken in this piece, actually, in that I do not think prior distributions ever occur as *a given* but are rather chosen as a reference measure to probabilise the parameter space and eventually prioritise regions over others. If anything I find myself even further on the prior agnosticism gradation. (Of course, this lack of disagreement applies to the likelihood understood as a function of both the data and the parameter, rather than of the parameter only, conditional on the data. Priors cannot be depending on the data without incurring disastrous consequences!)

“…it contradicts the conceptual principle that the prior distribution should convey only information that is available before the data have been collected.”

The first example is somewhat disappointing in that it revolves as so many Bayesian textbooks (since Laplace!) around the [sex ratio] Binomial probability parameter and concludes at the strong or long-lasting impact of the Uniform prior. I do not see much of a contradiction between the use of a Uniform prior and the collection of prior information, if only because there is not standardised way to transfer prior information into prior construction. And more fundamentally because a parameter rarely makes sense by itself, alone, without a model that relates it to potential data. As for instance in a regression model. More, following my epiphany of last semester, about the relativity of the prior, I see no damage in the prior being relevant, as I only attach a *relative* meaning to statements based on the posterior. Rather than trying to limit the impact of a prior, we should rather build assessment tools to measure this impact, for instance by prior predictive simulations. And this is where I come to quite agree with the authors.

“…non-identifiabilities, and near nonidentifiabilites, of complex models can lead to unexpected amounts of weight being given to certain aspects of the prior.”

Another rather straightforward remark is that non-identifiable models see the impact of a prior remain as the sample size grows. And I still see no issue with this fact in a relative approach. When the authors mention (p.7) that purely mathematical priors perform more poorly than weakly informative priors it is hard to see what they mean by this “performance”.

“…judge a prior by examining the data generating processes it favors and disfavors.”

Besides those points, I completely agree with them about the fundamental relevance of the prior as a generative process, only when the likelihood becomes available. And simulatable. (This point is found in many references, including our response to the American Statistician paper *Hidden dangers of specifying noninformative priors*, with Kaniav Kamary. With the same illustration on a logistic regression.) I also agree to their criticism of the marginal likelihood and Bayes factors as being so strongly impacted by the choice of a prior, if treated as absolute quantities. I also if more reluctantly and somewhat heretically see a point in using the posterior predictive for assessing whether a prior is relevant for the data at hand. At least at a conceptual level. I am however less certain about how to handle improper priors based on their recommendations. In conclusion, it would be great to see one [or more] of the authors at O-Bayes 2017 in Austin as I am sure it would stem nice discussions there! (And by the way I have no prior idea on how to conclude the comparison in the title!)

## estimation versus testing [again!]

Posted in Books, Statistics, University life with tags Bayes factors, Bayesian inference, Harold Jeffreys, hypothesis testing, parameter estimation, point null hypotheses, psychology, refereeing, review, spike-and-slab prior, unification on March 30, 2017 by xi'an**T**he following text is a review I wrote of the paper “Parameter estimation and Bayes factors”, written by J. Rouder, J. Haff, and J. Vandekerckhove. (As the journal to which it is submitted gave me the option to sign my review.)

The opposition between estimation and testing as a matter of prior modelling rather than inferential goals is quite unusual in the Bayesian literature. In particular, if one follows Bayesian decision theory as in Berger (1985) there is no such opposition, but rather the use of different loss functions for different inference purposes, while the Bayesian model remains single and unitarian.

Following Jeffreys (1939), it sounds more congenial to the Bayesian spirit to return the posterior probability of an hypothesis * H⁰* as an answer to the question whether this hypothesis holds or does not hold. This however proves impossible when the “null” hypothesis

*has prior mass equal to zero (or is not measurable under the prior). In such a case the mathematical answer is a probability of zero, which may not satisfy the experimenter who asked the question. More fundamentally, the said prior proves inadequate to answer the question and hence to incorporate the information contained in this very question. This is how Jeffreys (1939) justifies the move from the original (and deficient) prior to one that puts some weight on the null (hypothesis) space. It is often argued that the move is unnatural and that the null space does not make sense, but this only applies when believing very strongly in the model itself. When considering the issue from a modelling perspective, accepting the null*

**H⁰***means using a new model to represent the model and hence testing becomes a model choice problem, namely whether or not one should use a complex or simplified model to represent the generation of the data. This is somehow the “unification” advanced in the current paper, albeit it does appear originally in Jeffreys (1939) [and then numerous others] rather than the relatively recent Mitchell & Beauchamp (1988). Who may have launched the spike & slab denomination.*

**H⁰**I have trouble with the analogy drawn in the paper between the spike & slab estimate and the Stein effect. While the posterior mean derived from the spike & slab posterior is indeed a quantity drawn towards zero by the Dirac mass at zero, it is rarely the point in using a spike & slab prior, since this point estimate does not lead to a conclusion about the hypothesis: for one thing it is never exactly zero (if zero corresponds to the null). For another thing, the construction of the spike & slab prior is both artificial and dependent on the weights given to the spike and to the slab, respectively, to borrow expressions from the paper. This approach thus leads to model averaging rather than hypothesis testing or model choice and therefore fails to answer the (possibly absurd) question as to which model to choose. Or refuse to choose. But there are cases when a decision must be made, like continuing a clinical trial or putting a new product on the market. Or not.

In conclusion, the paper surprisingly bypasses the decision-making aspect of testing and hence ends up with a inconclusive setting, staying midstream between Bayes factors and credible intervals. And failing to provide a tool for decision making. The paper also fails to acknowledge the strong dependence of the Bayes factor on the tail behaviour of the prior(s), which cannot be [completely] corrected by a finite sample, hence its relativity and the unreasonableness of a fixed scale like Jeffreys’ (1939).

## round-table on Bayes[ian[ism]]

Posted in Books, pictures, Statistics, University life with tags Bayes factors, Bayesian Analysis, Bayesianism, Bureau international des poids et mesures, decision theory, evidence, France Culture, French book, game theory, Henri Poincaré, neurosciences, non-informative priors, relativity, subjective versus objective Bayes, Université Paris-La Sorbonne on March 7, 2017 by xi'an**I**n a [sort of] coincidence, shortly after writing my review on Le bayésianisme aujourd’hui, I got invited by the book editor, Isabelle Drouet, to take part in a round-table on Bayesianism in La Sorbonne. Which constituted the first seminar in the monthly series of the séminaire “Probabilités, Décision, Incertitude”. Invitation that I accepted and honoured by taking place in this public debate (if not dispute) on all [or most] things Bayes. Along with Paul Egré (CNRS, Institut Jean Nicod) and Pascal Pernot (CNRS, Laboratoire de chimie physique). And without a neuroscientist, who could not or would not attend.

While nothing earthshaking came out of the seminar, and certainly not from me!, it was interesting to hear of the perspectives of my philosophy+psychology and chemistry colleagues, the former explaining his path from classical to Bayesian testing—while mentioning trying to read the book Statistical rethinking I reviewed a few months ago—and the later the difficulty to teach both colleagues and students the need for an assessment of uncertainty in measurements. And alluding to GUM, developed by the Bureau International des Poids et Mesures I visited last year. I tried to present my relativity viewpoints on the [relative] nature of the prior, to avoid the usual morass of debates on the nature and subjectivity of the prior, tried to explain Bayesian posteriors via ABC, mentioned examples from The Theorem that Would not Die, yet untranslated into French, and expressed reserves about the glorious future of Bayesian statistics as we know it. This seminar was fairly enjoyable, with none of the stress induced by the constraints of a radio-show. Just too bad it did not attract a wider audience!