Archive for Royal Statistical Society

unbiased Hamiltonian Monte Carlo with couplings

Posted in Books, Kids, Statistics, University life on October 25, 2019 by xi'an

In the June issue of Biometrika, which had been sitting for a few weeks on my desk under my teapot!, Jeremy Heng and Pierre Jacob published a paper on unbiased estimators for Hamiltonian Monte Carlo using couplings. (Disclaimer: I was not involved with the review or editing of this paper.) Which extends to HMC environments the earlier paper of Pierre Jacob, John O’Leary and Yves Atchadé, to be discussed soon at the Royal Statistical Society. The fundamentals are the same, namely that an unbiased estimator can be produced from a converging sequence of estimators and that it can be de facto computed if two Markov chains with the same marginal can be coupled. The issue with Hamiltonians is to figure out how to couple their dynamics. In the Gaussian case, it is relatively easy to see that two chains with the same initial momentum meet periodically. In general, there is contraction within a compact set (Lemma 1). The coupling extends to a time discretisation of the Hamiltonian flow by a leap-frog integrator, still using the same momentum. Which roughly amounts to using the same random numbers in both chains. When defining a relaxed meeting (!) where both chains are within δ of one another, the authors rely on a drift condition (8) that reminds me of the early days of MCMC convergence and seems to imply the existence of a small set “where the target distribution [density] is strongly log-concave”. And which makes me wonder if this small set could be used instead to create renewal events that would in turn ensure both stationarity and unbiasedness without the recourse to a second coupled chain. When compared on a Gaussian example with couplings on Metropolis-Hastings and MALA (Fig. 1), the coupled HMC sees hardly any impact of the dimension of the target (in the average coupling time), with a much lower value. However, I wonder at the relevance of the meeting time as an assessment of efficiency, in the sense that the coupling time is not a convergence time but reflects as well on the initial conditions. I acknowledge that this allows for an averaging over parallel implementations but I remain puzzled by the statement that this leads to “estimators that are consistent in the limit of the number of replicates, rather than in the usual limit of the number of Markov chain iterations”, since a particularly poor initial distribution could in principle lead to a mode of the target never being explored, or to the coupling time being, ever so rarely, too large for the computing abilities at hand.
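To make the shared-momentum idea concrete, here is a minimal Python sketch of my own (not the authors' code) for a standard Gaussian target: two leap-frog HMC chains started far apart use the same momentum and the same uniform in the accept-reject step, and the distance between them shrinks. The step size, number of leap-frog steps, dimension and starting points are arbitrary assumptions, and the sketch only shows contraction, not the exact meeting needed for the unbiased estimator.

```python
import math
import random


def leapfrog(x, p, eps, n_steps):
    """Leap-frog integrator for a standard Gaussian target, U(x) = |x|^2 / 2,
    whose gradient is simply x."""
    p = [pi - 0.5 * eps * xi for pi, xi in zip(p, x)]      # initial half step in momentum
    for step in range(n_steps):
        x = [xi + eps * pi for xi, pi in zip(x, p)]        # full step in position
        scale = eps if step < n_steps - 1 else 0.5 * eps    # final momentum update is a half step
        p = [pi - scale * xi for pi, xi in zip(p, x)]
    return x, p


def energy(x, p):
    """Hamiltonian: potential plus kinetic energy."""
    return 0.5 * (sum(v * v for v in x) + sum(v * v for v in p))


def coupled_hmc_step(x, y, rng, eps=0.1, n_steps=10):
    """One HMC step for each chain, driven by common random numbers."""
    p = [rng.gauss(0.0, 1.0) for _ in x]     # same momentum for both chains
    log_u = math.log(1.0 - rng.random())     # same uniform for both accept-reject steps
    out = []
    for z in (x, y):
        z_new, p_new = leapfrog(z, p, eps, n_steps)
        accept = log_u < energy(z, p) - energy(z_new, p_new)
        out.append(z_new if accept else z)
    return out[0], out[1]


def distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def run_coupled_chains(dim=5, n_iter=50, seed=1):
    """Distances between the two chains, from the start to the last iteration."""
    rng = random.Random(seed)
    x, y = [3.0] * dim, [-3.0] * dim         # deliberately distant starting points
    distances = [distance(x, y)]
    for _ in range(n_iter):
        x, y = coupled_hmc_step(x, y, rng)
        distances.append(distance(x, y))
    return distances


print(run_coupled_chains()[::10])            # distance every ten iterations
```

Each chain taken alone remains a valid HMC chain, since it sees a Gaussian momentum and a uniform variate; only the joint behaviour is altered by the coupling.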

a statistic with consequences

Posted in pictures, Statistics on July 18, 2019 by xi'an

In the latest Significance, there was a flyer with some member updates, an important one being that Sylvia Richardson had been elected the next president of the Royal Statistical Society. Congratulations to my friend Sylvia! Another item was that the publication of the 2018 RSS Statistic of the Year has led an Australian water company to switch from plastic to aluminum. Hmm, what about switching to nothing and supporting a use-your-own bottle approach? While it is correct that aluminum cans can be 100% made of recycled aluminum, this water company does not appear to make any concerted effort to ensure its cans are made of recycled aluminum or to increase the recycling rate for aluminum in Australia towards those of Brazil (92%) or Japan (86%). (Another shocking statistic that could have been added to the 90.5% non-recycled plastic waste [in the World?] is that a water bottle consumes the equivalent of one-fourth of its contents in oil to produce.) Another water company, in the US, still promotes water bottles as “one of the most effective and inert carbon capture & sequestration methods”..! There is no boundary for green-washing.

O’Bayes 19/2

Posted in Books, pictures, Running, Travel, University life on July 1, 2019 by xi'an

One talk on Day 2 of O’Bayes 2019 was by Ryan Martin on data dependent priors (or “priors”). Which I have already discussed in this blog. Including the notion of a Gibbs posterior about quantities that “are not always defined through a model” [which is debatable if one sees it as part of a semi-parametric model]. Gibbs posterior that is built through a pseudo-likelihood constructed from the empirical risk, which reminds me of Bissiri, Holmes and Walker. Although requiring a prior on this quantity that is not part of a model. And is not necessarily a true posterior and not necessarily with the same concentration rate as a true posterior. Constructing a data-dependent distribution on the parameter does not necessarily mean an interesting inference and, to keep up with the theme of the conference, has no automated claim to [more] “objectivity”.
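As a rough illustration of a Gibbs posterior built from an empirical risk, here is a small Python sketch of my own, not taken from the talk: the quantity of interest is a median, the loss is the absolute error, and the pseudo-posterior is proportional to exp(-ω n R_n(θ)) times the prior, evaluated on a grid. The data, the flat prior, the grid and the learning rate ω = 1 are all assumptions made for the example.

```python
import math


def empirical_risk(theta, data):
    """Average absolute-error loss R_n(theta); its minimiser is a sample median."""
    return sum(abs(x - theta) for x in data) / len(data)


def gibbs_posterior(data, grid, log_prior, omega=1.0):
    """Normalised grid weights proportional to exp(-omega * n * R_n(theta)) * prior(theta)."""
    n = len(data)
    log_w = [-omega * n * empirical_risk(t, data) + log_prior(t) for t in grid]
    m = max(log_w)                                   # subtract the max for numerical stability
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


data = [0.3, 1.1, 1.9, 2.0, 2.4, 3.7, 9.5]           # made-up sample, with one outlier
grid = [i / 100 for i in range(-200, 1201)]          # theta from -2 to 12 by steps of 0.01


def flat_log_prior(theta):
    return 0.0                                       # flat prior over the grid (assumption)


weights = gibbs_posterior(data, grid, flat_log_prior, omega=1.0)
mode = grid[weights.index(max(weights))]
print(mode)
```

No sampling model for the data enters the construction, which is the point, and the choice of ω drives the spread of the resulting distribution, hence the concerns about its concentration rate.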

And after calling a prior both Beauty and The Beast!, Erlis Ruli argued about a “bias-reduction” prior where the prior is the solution to a differential equation related to some cumulants, connected with an earlier work of David Firth (Warwick). An interesting conundrum is how to create an MCMC algorithm when the prior is that intractable, with possible help from PDMP techniques like the Zig-Zag sampler.

While Peter Orbanz’ talk was centred on a central limit theorem under group invariance, further penalised by being the last of the (sun) day, Peter did a magnificent job of presenting the result and motivating each term. It reminded me of the work Jim Bondar was doing in Ottawa in the 1980’s on Haar measures for Bayesian inference. Including the notion of amenability [a term due to von Neumann] I had not met since then. (Neither have I met Jim since the last summer I spent in Carleton.) The CLT and associated LLN are remarkable in that the average is not over observations but over shifts of the same observation under elements of a sub-group of transformations. I wondered as well at the potential connection with the Read Paper of Kong et al. in 2003 on the use of group averaging for Monte Carlo integration [connection apart from the fact that both discussants, Michael Evans and myself, are present at this conference].
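For the simplest group I can think of, the integer shifts acting on a stationary sequence, averaging over shifts of a single observation reduces to the classical ergodic average. The Python sketch below is my own toy illustration under that assumption (a Gaussian AR(1) path with arbitrary coefficient 0.5), not an example from the talk: one path is simulated and the function x ↦ x_0 x_1 is averaged over its shifts.

```python
import random


def simulate_ar1(n, phi, rng):
    """One stationary Gaussian AR(1) path x_0, ..., x_{n-1} with unit-variance innovations."""
    x = [rng.gauss(0.0, (1.0 - phi * phi) ** -0.5)]   # x_0 drawn from the stationary law
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def shift_average(x, f, width):
    """Average of f over all shifts of one path that fit; f reads a window of `width` entries."""
    n_shifts = len(x) - width + 1
    return sum(f(x[s:s + width]) for s in range(n_shifts)) / n_shifts


phi = 0.5
path = simulate_ar1(100_000, phi, random.Random(0))   # a single "observation"
estimate = shift_average(path, lambda w: w[0] * w[1], width=2)
target = phi / (1.0 - phi * phi)                      # E[x_0 x_1] under stationarity
print(estimate, target)
```

The average runs over transformations of one and the same path rather than over independent replicates, which is the feature I found remarkable in the result.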

RSS tribute

Posted in Statistics, University life on November 4, 2018 by xi'an

visual effects

Posted in Books, pictures, Statistics on November 2, 2018 by xi'an

As advertised and re-discussed by Dan Simpson on the Statistical Modeling, &tc. blog he shares with Andrew and a few others, the paper Visualization in Bayesian workflow he wrote with Jonah Gabry, Aki Vehtari, Michael Betancourt and Andrew Gelman was one of three discussed at the RSS conference in Cardiff, last month, as a Read Paper for Series A. I had stored the paper when it came out towards reading and discussing it, but as often this good intention led to no concrete ending. [Except concrete as in concrete shoes…] Hence a few notes rather than a discussion in Series A.

Exploratory data analysis goes beyond just plotting the data, which should sound reasonable to all modeling readers.

Fake data [not fake news!] can be almost [more!] as valuable as real data for building your model, oh yes!, this is the message I am always trying to convey to my first year students, when arguing about the connection between models and simulation, as well as a defense of ABC methods. And more globally of the very idea of statistical modelling. While indeed “Bayesian models with proper priors are generative models”, I am not a particular fan of using the prior predictive [or the evidence] to assess the prior as it may end up in a classification of more or less all but terrible priors, meaning that all give very little weight to neighbourhoods of high likelihood values. Still, in a discussion of a TAS paper by Seaman et al. on the role of prior, Kaniav Kamary and I produced prior assessments that were similar to the comparison illustrated in Figure 4. (And this makes me wonder which point we missed in this discussion, according to Dan.) Unhappy am I with the weakly informative prior illustration (and concept) as the amount of fudging and calibrating to move from the immensely vague choice of N(0,100) to the fairly tight choice of N(0,1) or N(1,1) is not provided. The paper reads like these priors were the obvious and first choice of the authors. I completely agree with the warning that “the utility of the prior predictive distribution to evaluate the model does not extend to utility in selecting between models”.
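As a toy version of what fake data from the prior can reveal, here is a Python sketch of my own, unrelated to the model used in the paper: an intercept-only logistic model where the intercept gets either a N(0,100) or a N(0,1) prior. I read the second argument as a standard deviation, which is an assumption, and the 0.01 margin used to call a probability degenerate is arbitrary as well.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prior_predictive_probs(prior_sd, n_draws, rng):
    """Success probabilities implied by an intercept-only logistic model
    with intercept ~ N(0, prior_sd^2)."""
    return [sigmoid(rng.gauss(0.0, prior_sd)) for _ in range(n_draws)]


def share_non_degenerate(probs, margin=0.01):
    """Fraction of prior draws with a success probability away from 0 and 1."""
    return sum(margin < p < 1.0 - margin for p in probs) / len(probs)


rng = random.Random(42)
vague = share_non_degenerate(prior_predictive_probs(100.0, 10_000, rng))
tight = share_non_degenerate(prior_predictive_probs(1.0, 10_000, rng))
print(vague, tight)
```

Under the vague prior almost all simulated probabilities sit at 0 or 1, so the fake datasets are nearly all-failure or all-success, while the tight prior keeps them in between. This says nothing about how one moves from one prior to the other, which is exactly my complaint.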

MCMC diagnostics, beyond trace plots, yes again, but this recommendation sounds a wee bit outdated. (As our 1998 reviewww!) Figure 5(b) links different parameters of the model with lines, which does not clearly relate to a better understanding of convergence. Figure 5(a) does not tell much either since the green (divergent) dots stand within the black dots, at least in the projected 2D plot (and how can one reach beyond 2D?) Feels like I need to rtfm..!

“Posterior predictive checks are vital for model evaluation”, to wit that I find Figure 6 much more to my liking and closer to my practice. There could have been a reference to Ratmann et al. for ABC where graphical measures of discrepancy were used in conjunction with ABC output as direct tools for model assessment and comparison. Essentially predicting a zero error with the ABC posterior predictive. And of course “posterior predictive checking makes use of the data twice, once for the fitting and once for the checking.” Which means one should either resort to loo solutions (as mentioned in the paper) or call for calibration of the double-use by re-simulating pseudo-datasets from the posterior predictive. I find the suggestion that “it is a good idea to choose statistics that are orthogonal to the model parameters” somewhat antiquated, in that this sounds like rephrasing the primeval call to ancillary statistics for model assessment (Kiefer, 1975), while pretty hard to implement in modern complex models.
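A bare-bones posterior predictive check can be sketched in a few lines of Python. The example below is mine, with made-up data and a conjugate Normal model (unit variance, N(0,10²) prior on the mean, both assumptions): pseudo-datasets are simulated from the posterior predictive and the statistic, here the maximum, is compared with its observed value. It uses the data twice, exactly as warned above, and does not include the calibration step.

```python
import math
import random


def posterior_mu(data, prior_sd=10.0):
    """Conjugate posterior mean and sd of mu when y_i ~ N(mu, 1) and mu ~ N(0, prior_sd^2)."""
    precision = 1.0 / prior_sd ** 2 + len(data)
    return sum(data) / precision, math.sqrt(1.0 / precision)


def posterior_predictive_pvalue(data, statistic, n_rep=2000, seed=0):
    """Share of replicated datasets whose statistic is at least the observed one."""
    rng = random.Random(seed)
    mean, sd = posterior_mu(data)
    observed = statistic(data)
    exceed = 0
    for _ in range(n_rep):
        mu = rng.gauss(mean, sd)                          # draw mu from the posterior
        replicate = [rng.gauss(mu, 1.0) for _ in data]    # pseudo-dataset of the same size
        exceed += statistic(replicate) >= observed
    return exceed / n_rep


# made-up sample with one gross outlier that a N(mu, 1) model cannot reproduce
data = [-1.2, -0.7, -0.5, -0.3, -0.1, 0.0, 0.2, 0.4, 0.6, 0.9, 1.3, 8.0]
p_max = posterior_predictive_pvalue(data, max)
print(p_max)
```

A p-value close to zero flags the outlier as incompatible with the fitted model. Calibrating for the double use would mean repeating the whole exercise on pseudo-datasets drawn from the posterior predictive and locating the observed p-value within the resulting distribution.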