**A**s the next MCMski conference, now called Bayes Comp, is starting in Barcelona, Spain, March 26-29, I welcome all guest posts covering the conference, since I am not going to be there! Enjoy!

## Archive for blogging

## and here we go!

Posted in Books, Running, Statistics, University life with tags academic journals, Biometrika, blogging, editor, peer review, scientific editing on March 16, 2018 by xi'an

**O**n March 1, I started handling papers for Biometrika as deputy editor, along with Omiros Papaspiliopoulos. With on average one paper a day to handle, this means a change in my schedule and presumably fewer blog posts about recent papers and arXivals if I want to keep my daily morning runs!

## surg’Og interest from Serbia

Posted in Statistics with tags blogging, Og, Serbia, traffic, Wordpress on January 15, 2018 by xi'an

## Bayesian spectacles

Posted in Books, pictures, Statistics, University life with tags Amsterdam, Bayes factors, Bayesian Spectacles, blogging, Holland, JASP, non-informative priors, objective Bayes, reference priors, UMPBTs, uniformly most powerful tests, University of Amsterdam on October 4, 2017 by xi'an

**E**.J. Wagenmakers and his enthusiastic team of collaborators at the University of Amsterdam and on the JASP software design team have started a blog called Bayesian spectacles, which I find a fantastic title. And not only because I wear glasses. Plus, they got their own illustrator, Viktor Beekman, which sounds like the epitome of sophistication! (Compared with resorting to vacation or cat pictures…)

In a most recent post they addressed the criticisms we made of the 72 author paper on p-values, one of the co-authors being E.J.! Andrew already re-addressed some of the address, but here is a disagreement he left for me to chew on my own [and where the Abandoners are us!]:

Disagreement 2. The Abandoners critique the UMPBTs – the uniformly most powerful Bayesian tests – that feature in the original paper. This is their right (see also the discussion of the 2013 Valen Johnson PNAS paper), but they ignore the fact that the original paper presented a series of other procedures that all point to the same conclusion: p-just-below-.05 results are evidentially weak. For instance, a cartoon on the JASP blog explains the Vovk-Sellke bound. A similar result is obtained using the upper bounds discussed in Berger & Sellke (1987) and Edwards, Lindman, & Savage (1963). We suspect that the Abandoners’ dislike of Bayes factors (and perhaps their upper bounds) is driven by a disdain for the point-null hypothesis. That is understandable, but the two critiques should not be mixed up. The first question is: given that we wish to test a point-null hypothesis, do the Bayes factor upper bounds demonstrate that the evidence is weak for p-just-below-.05 results? We believe they do, and in this series of blog posts we have provided concrete demonstrations.

Obviously, this reply calls for an examination of the entire BS blog series, but being short on time at the moment, let me point out that the lower bounds on the Bayes factors showing much more support for H₀ than a p-value at 0.05 only occur in special circumstances, even though I spend some time in my book discussing those bounds. Indeed, the [interesting] fact that the lower bounds are larger than the p-values does not hold in full generality. Moving to a two-dimensional normal with potentially zero mean is enough to see the order between lower bound and p-value reverse, as I found [quite] a while ago when trying to extend Berger and Sellke (1987, the same year I was visiting Purdue, where both held positions). I am not sure this feature has been much explored in the literature; I did not pursue it once I realised the gap vanished in larger dimensions…

I must also point out I do not have the same repulsion for point nulls as Andrew! While considering whether a parameter, say a mean, is exactly zero [or three or whatever] sounds rather absurd when faced with the strata of uncertainty about models, data, procedures, &tc.—even in theoretical physics!—, comparing several [and all wrong!] models with or without some parameters for later use still makes sense. And my reluctance to use Bayes factors does not stem from an opposition to comparing models or from the procedure itself, which is quite appealing within a Bayesian framework [thus appealing *per se*!], but rather from the unfortunate impact of the prior [and its tail behaviour] on the quantity, from the delicate calibration of the thing, and from the lack of a reference solution [to avoid the O and the N words!]. As exposed in the demise papers. (Whose main version remains in publishing limbo, the onslaught from the referees proving just too much for me!)
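As a side note, the Vovk-Sellke bound invoked in the quoted disagreement is simple enough to check numerically. Here is a minimal sketch (the function name is my own, not taken from any of the papers):

```python
import math

def vovk_sellke_bound(p):
    """Vovk-Sellke upper bound on the odds against a point null implied
    by a p-value: VS(p) = -1 / (e * p * log p), valid for 0 < p < 1/e."""
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("the bound only applies for 0 < p < 1/e")
    return -1.0 / (math.e * p * math.log(p))

# a p-value of 0.05 caps the odds against the point null at about 2.46:1
print(round(vovk_sellke_bound(0.05), 2))
```

which illustrates the point made in the quote: a p-just-below-.05 result corresponds to at most weak evidence against the point null under this bound.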

## abandon all o(p) ye who enter here

Posted in Books, Statistics, University life with tags Andrew Gelman, Bayesian hypothesis testing, blogging, Dante Alighieri, Nature Methods, p-values, uniformly most powerful tests on September 28, 2017 by xi'an

**T**oday appeared on arXiv a joint paper by Blakeley McShane, David Gal, Andrew Gelman, Jennifer Tackett, and myself, towards the abandonment of significance tests, which is a response to the 72 author paper in Nature Methods that recently made the news (and comments on the ‘Og). Some of these comments have been incorporated into the paper, along with others more on the psychology testing side. From the irrelevance of point null hypotheses, to the numerous incentives for multiple comparisons, to the lack of sufficiency of the p-value itself, to the limited applicability of the uniformly most powerful Bayesian test principle…

“…each [proposal] is a purely statistical measure that fails to take a more holistic view of the evidence that includes the consideration of the traditionally neglected factors, that is, prior and related evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain.”

One may wonder about this list of grievances and its impact on statistical practice. The paper however suggests two alternatives, one being to investigate the potential impact of (neglected) factors rather than relying on thresholds. Another one, maybe less realistic, unless it is the very same, is to report the entirety of the data associated with the experiment. This makes the life of journal editors and grant evaluators harder, possibly much harder, but it indeed suggests a holistic and continuous approach to data analysis, rather than the masquerade of binary outputs. (Not surprisingly, posting this item of news on Andrew’s blog a few hours ago generated a large amount of discussion.)

## the end of the Series B’log…

Posted in Books, Statistics, University life with tags blogging, discussion paper, Journal of the Royal Statistical Society, Series B, Series B'log on September 22, 2017 by xi'an

**T**oday is the final day of Series B’log, as David Dunson, Piotr Fryzlewicz and myself have decided to stop the experiment, *faute de combattants* (for want of fighters, as we say in French). The authors nicely contributed long abstracts of their papers, for which I am grateful, but with a single exception, no one came out with comments or criticisms, and the idea of turning some Series B papers into discussion papers does not seem to appeal, at least in this format. Maybe the concept will be rekindled in another form in the near future, but for now we let it lie. So be it!

## a conceptual introduction to HMC [reply from the author]

Posted in Statistics with tags blogging, Hamiltonian Monte Carlo, HMC, London, MCMC, Monte Carlo methods, Monte Carlo Statistical Methods, reparameterisation, STAN on September 8, 2017 by xi'an

*[Here is the reply to my post from Michael Bétancourt, detailed enough to be promoted from comment to post!]*

As Dan notes this is meant as an introduction for those without a strong mathematical background, hence the focus on concepts rather than theorems! There’s plenty of maths deeper in the references. ;-)

I am not sure I get this sentence. Either it means that an expectation remains invariant under reparameterisation. Or something else and more profound that eludes me. In particular because Michael repeats later (p.25) that the canonical density does not depend on the parameterisation.

What I was trying to get at is that expectations, and really all of measure theory, are reparameterization invariant, but implementations of statistical algorithms that depend on parameterization-dependent representations, namely densities, are not. If your algorithm is sensitive to these parameterization dependencies then you end up with a tuning problem — which parameterization is best? — which makes it harder to utilize the algorithm in practice.

Exact implementations of HMC (i.e. without an integrator) are fully geometric and do not depend on any chosen parameterization, hence the canonical density and, more importantly, the Hamiltonian being invariant objects. That said, there are some choices to be made in that construction, and those choices often look like parameter dependencies. See below!

“Every choice of kinetic energy and integration time yields a new Hamiltonian transition that will interact differently with a given target distribution (…) when poorly-chosen, however, the performance can suffer dramatically.”

This is exactly where it’s easy to get confused with what’s invariant and what’s not!

The target density gives rise to a potential energy, and the chosen density over momenta gives rise to a kinetic energy. The two energies transform in opposite ways under a reparameterization so their sum, the Hamiltonian, is invariant.
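For what it is worth, the cancellation can be sketched in formulas, under the convention (my choice of presentation, not necessarily the paper's) that momenta transform as cotangent vectors; take the reparameterisation linear for simplicity, the nonlinear case working pointwise:

```latex
% reparameterisation q' = f(q) with Jacobian J = \partial f / \partial q:
\pi'(q') = \pi(q)\,\lvert\det J\rvert^{-1}
\;\Longrightarrow\;
U'(q') = U(q) + \log\lvert\det J\rvert .
% Momenta transform as p' = J^{-\top} p, so a Gaussian kinetic energy
% K(p) = \tfrac12 p^\top M^{-1} p + \tfrac12 \log\det(2\pi M)
% becomes one with mass matrix M' = J^{-\top} M J^{-1}, and
K'(p') = K(p) - \log\lvert\det J\rvert .
% The Jacobian terms cancel in the Hamiltonian:
H'(q', p') = U'(q') + K'(p') = U(q) + K(p) = H(q, p).
```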

Really there’s a fully invariant, measure-theoretic construction where you use the target measure directly and add a “cotangent disintegration”.

In practice, however, we often choose a default kinetic energy, i.e. a log density, based on the parameterization of the target parameter space, for example an “identity mass matrix” kinetic energy. In other words, the algorithm itself is invariant, but by selecting the algorithmic degrees of freedom based on the parameterization of the target parameter space we induce an implicit parameter dependence.
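To make that implicit dependence concrete, here is a minimal HMC sketch with the default identity mass matrix, so the kinetic energy K(p) = p·p/2 is written in whichever coordinates were chosen for q; all names and tuning values below are my own illustration, not Stan's implementation:

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, L):
    """One leapfrog trajectory with an identity mass matrix, i.e. the
    kinetic energy K(p) = p.p/2 -- a default tied to the coordinates of q."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * eps * grad_U(q)        # initial half step for momentum
    for _ in range(L - 1):
        q += eps * p                  # full step for position (M = I)
        p -= eps * grad_U(q)          # full step for momentum
    q += eps * p
    p -= 0.5 * eps * grad_U(q)        # final half step for momentum
    return q, -p                      # negate momentum for reversibility

def hmc_step(q, U, grad_U, eps=0.1, L=20, rng=np.random.default_rng()):
    """One HMC transition: resample momenta ~ N(0, I), integrate, then
    accept/reject on the Hamiltonian H(q, p) = U(q) + p.p/2."""
    p = rng.standard_normal(q.shape)
    q_new, p_new = leapfrog(q, p, grad_U, eps, L)
    dH = (U(q_new) + 0.5 * p_new @ p_new) - (U(q) + 0.5 * p @ p)
    return q_new if np.log(rng.uniform()) < -dH else q
```

Rescaling a single coordinate of q changes how well this default kinetic energy matches the target, even though exact, integrator-free HMC is parameterization invariant — which is the point of the paragraph above.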

This all gets more complicated when we introduce the adaptation we use in Stan, which sets the elements of the mass matrix to marginal variances, which means that the adapted algorithm is invariant to marginal transformations but not joint ones…

The explanation of the HMC move as a combination of uniform moves along isoclines of fixed energy level and of jumps between energy levels does not seem to translate into practical implementations, at least not as explained in the paper. Simulating directly the energy distribution for a complex target distribution does not seem more feasible than moving up likelihood levels in nested sampling.

Indeed, being able to simulate exactly from the energy distribution, which is equivalent to being able to quantify the density of states in statistical mechanics, is intractable for the same reason that marginal likelihoods are intractable. Which is a shame, because conditioned on those samples HMC could be made embarrassingly parallel!

Instead we draw correlated samples using momentum resamplings between each trajectory. As Dan noted, this provides some intuition about Stan (it reduces random walk behavior to one dimension) but also motivates some powerful energy-based diagnostics that immediately indicate when the momentum resampling is limiting performance and we need to improve it by, say, changing the kinetic energy. Or, per my previous comment, by keeping the kinetic energy the same but changing the parameterization of the target parameter space. :-)
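Those energy-based diagnostics can be estimated from the chain's energy trace alone; here is a sketch, assuming the E-BFMI estimator as I recall it from Michael's papers (the 0.3 threshold is a rule of thumb, not gospel):

```python
import numpy as np

def e_bfmi(energies):
    """Empirical energy Bayesian fraction of missing information: the
    variance of successive energy differences over the variance of the
    energies.  Small values (rule of thumb: below ~0.3) suggest the
    momentum resampling cannot traverse the energy distribution well."""
    e = np.asarray(energies, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum((e - e.mean()) ** 2))

# an i.i.d. energy trace gives a value near 2 under this estimator,
# while a slowly drifting trace gives a value near 0
```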

In the end I cannot but agree with the concluding statement that the geometry of the target distribution holds the key to devising more efficient Monte Carlo methods.

Yes! That’s all I really want statisticians to take away from the paper. :-)