**W**hile reading the IMS Bulletin (of March 2020), I found out that Canadian statistician Colin Blyth had died last summer. While we had never met in person, I remember his very distinctive and elegant handwriting in a few letters he sent me, including the above, which I have kept (along with a handwritten letter from Lucien Le Cam!). It contains suggestions about revising our Is Pitman nearness a reasonable criterion?, written with Gene Hwang and William Strawderman, which took three years to publish as it was deemed somewhat controversial. It eventually appeared in JASA with discussions from Malay Ghosh, John Keating and Pranab K. Sen, Shyamal Das Peddada, C. R. Rao, George Casella and Martin T. Wells, and Colin R. Blyth (with much stronger wording than in the above letter, like “What can be said but ‘It isn’t I, it’s *you* that are crazy’?”). While I had used some of his admissibility results, including the admissibility of the Normal sample average in dimension one, e.g., in my book, I had not realised at the time that Blyth was (a) the first student of Erich Lehmann, (b) the originator of [the name] Simpson’s paradox, (c) the scribe for Lehmann’s notes that would eventually lead to Testing Statistical Hypotheses and Theory of Point Estimation, later revised with George Casella, and (d) a keen bagpipe player and scholar.

## Archive for JASA

## Colin Blyth (1922-2019)

Posted in Books, pictures, Statistics, University life with tags bagpipes, C.R. Rao, calligraphy, Canada, Colin Blyth, decision theory, discussion paper, Erich Lehmann, IMS Bulletin, JASA, La Trobe University, Lucien Le Cam, Melbourne, obituary, Ontario, Pitman nearness, Simpson's paradox, transitivity on March 19, 2020 by xi'an

## non-reversibility in discrete spaces

Posted in Books, Statistics, University life with tags birth-and-death process, coordinate sampler, JASA, jump process, Markov chain, non-reversible diffusion, PDMP, Peskun ordering, reversibility, Zig-Zag on January 3, 2020 by xi'an

**F**ollowing a recent JASA paper by Giacomo Zanella (which I have not yet read but which is discussed on this blog), Sam Power and Jacob Goldman have recently arXived a paper on Accelerated sampling on discrete spaces with non-reversible Markov processes, where they use continuous-time, non-reversible algorithms à la PDMP, even though differential equations do not exist on discrete spaces. More specifically, they devise discrete versions of the coordinate sampler and of the Zig-Zag sampler, using Markov jump processes instead of differential equations, with detailed balance imposed on the jump rates rather than on the Markov kernel. The use of jump processes goes back at least to Peskun (1973) and is connected with MCMC algorithms in Matthew Stephens‘ 1999 PhD thesis. A neat thing about discrete settings is that the jump process can be implemented with no discretisation! However, as we noticed when working on birth-and-death processes with Olivier Cappé and Tobias Rydén, there is a potential for disastrous implementation if an infinite sequence of instantaneous moves (out of zero probability states) is proposed.

The authors make the further assumption(s) that the discrete space is endowed with a graphical structure, with a group G acting upon this graph via an involution that keeps the target (or a completion of the original target) invariant. In this framework, reversibility amounts to repeatedly using (group) generators þ of low order (as in Bayesian variable selection, binary spin systems, where þ.þ=id, and other permutation problems), since they bring the chain back to its starting point. Their first sampler is called a Tabu sampler because it avoids such behaviour, forcing the next step to use other generators þ in the generator set Þ thanks to a binary auxiliary variable that partitions Þ into forward versus backward moves. For generators of high order, the discrete coordinate and Zig-Zag samplers instead repeatedly apply the same generator (although it is unclear to me why this is beneficial, given that neither the graph nor the generators are necessarily linked with the target). The coordinate sampler is again much cheaper, since it only looks at one direction in the generator group.
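The idea of keeping the chain moving in one direction until rejection, rather than letting a reversible kernel undo its own moves, is easiest to see on a toy example. Below is a minimal sketch of the classic lifted (non-reversible) Metropolis chain on a finite lattice, not the Tabu sampler itself: a persistent direction variable plays a role analogous to the binary auxiliary variable above. The toy target weights are my own illustrative choice.

```python
import random

# Minimal sketch (not the authors' Tabu sampler): a lifted, non-reversible
# Metropolis chain on {0,...,K}. A persistent direction d in {-1,+1} acts
# like a binary auxiliary variable: the chain keeps moving in the same
# direction while moves are accepted, and flips d only upon rejection,
# suppressing the back-and-forth behaviour of plain reversible Metropolis.

def lifted_sampler(weights, n_iter, seed=0):
    rng = random.Random(seed)
    K = len(weights) - 1
    x, d = 0, 1
    counts = [0] * len(weights)
    for _ in range(n_iter):
        y = x + d
        if 0 <= y <= K and rng.random() < min(1.0, weights[y] / weights[x]):
            x = y            # accepted: keep the same direction
        else:
            d = -d           # rejected (or off the lattice): flip direction
        counts[x] += 1
    return counts

# toy unnormalised target on {0,...,9} (illustrative values)
weights = [1.0, 2.0, 4.0, 8.0, 6.0, 3.0, 2.0, 1.5, 1.0, 0.5]
counts = lifted_sampler(weights, n_iter=200_000)
Z = sum(weights)
est = [c / sum(counts) for c in counts]
print(max(abs(e - w / Z) for e, w in zip(est, weights)))
```

One can check by a skew-detailed-balance argument that the lifted chain leaves the target invariant, even though the chain itself is non-reversible: the empirical frequencies above converge to the normalised weights.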

The paper contains a range of comparisons with (only) Zanella’s sampler, some presenting heavy gains in terms of ESS, including one involving hundreds of sensors in a football stadium. As I am not particularly familiar with these examples, except for the Bayesian variable selection one, I found it rather hard to determine whether or not the compared samplers were indeed exploring the entirety of the (highly complex and high-dimensional) target. The collection of examples is however quite rich and supports the use of such non-reversible schemes. It may also be that the discrete nature of the target facilitates the theoretical study of their convergence properties.

## a hatchet job [book review]

Posted in Books, Statistics, University life with tags Bayes theorem, Bayesian statistics, betting, book review, Bruce Hill, Bruno de Finetti, JASA, John Hartigan, Likelihood Principle on July 20, 2019 by xi'an

**B**y happenstance, I came across a rather savage review of John Hartigan’s Bayes Theory (1984) written by Bruce Hill in JASA, including the following slivers:

“By and large this book is at its best in developing the mathematical consequences of the theory and at its worst when dealing with the underlying ideas and concepts, which seems unfortunate since Bayesian statistics is above all an attempt to deal realistically with the nature of uncertainty and decision making.” B. Hill, JASA, 1986, p.569

“Unfortunately, those who had hoped for a serious contribution to the question will be disappointed.” B. Hill, JASA, 1986, p.569

“If the primary concern is mathematical convenience, not content or meaning, then the enterprise is a very different matter from what most of us think of as Bayesian approach.” B. Hill, JASA, 1986, p.570

“Perhaps in a century or two statisticians and probabilists will reach a similar state of maturity.” B. Hill, JASA, 1986, p.570

“Perhaps this is a good place to mention that the notation in the book is formidable. Bayes’s theorem appears in a form that is almost unrecognizable. As elsewhere, the mathematical treatment is elegant, but none of the deeper issues about the meaning and interpretation of conditional probability is discussed.” B. Hill, JASA, 1986, p.570

“The reader will find many intriguing ideas, much that is outrageous, and even some surprises (the likelihood principle is not mentioned, and conditional inference is just barely mentioned).” B. Hill, JASA, 1986, p.571

“What is disappointing to me is that with a little more discipline and effort with regard to the ideas underlying Bayesian statistics, this book could have been a major contribution to the theory.” B. Hill, JASA, 1986, p.571

Another review by William Sudderth (1985, Bulletin of the American Mathematical Society) is much kinder to the book, except for the complaint that “the pace is brisk and sometimes hard to follow”.

## Bayesian inference with intractable normalizing functions

Posted in Books, Statistics with tags adaptive MCMC methods, American Statistical Association, auxiliary variable, benchmark, doubly intractable problems, importance sampling, Ising model, JASA, MCMC algorithms, noisy MCMC, normalising constant, Russian roulette on December 13, 2018 by xi'an

**I**n the latest September issue of JASA I received a few days ago, I spotted a review paper by Jaewoo Park & Murali Haran on intractable normalising constants Z(θ). There have been many proposals for solving this problem, as well as several surveys, some conferences, and even a book. The current survey focuses on MCMC solutions, from auxiliary variable approaches to likelihood approximation algorithms (albeit without ABC entries, even though the 2006 auxiliary variable solutions of Møller et al. and of Murray et al. do simulate pseudo-observations and hence…). This includes the MCMC approximations to auxiliary sampling proposed by Faming Liang and co-authors across several papers, and the paper Yves Atchadé, Nicolas Lartillot and I wrote ten years ago on an adaptive MCMC algorithm targeting Z(θ) and using stochastic approximation à la Wang-Landau. Park & Haran stress the relevance of using sufficient statistics in this approach towards fighting computational costs, which makes me wonder if an ABC version could be envisioned. The paper also includes pseudo-marginal techniques like Russian Roulette (once spelled Roullette) and noisy MCMC as proposed in Alquier et al. (2016). These methods are compared on three examples: (1) the Ising model, (2) a social network model, both on the Florentine business dataset used in our original paper and on a larger one where most methods prove too costly, and (3) an attraction-repulsion point process model. In conclusion, an interesting survey, taking care to spell out the calibration requirements and the theoretical validation, even if these of course depend on the chosen benchmarks.
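The point of the auxiliary variable solutions above is that simulating a pseudo-observation at the proposed parameter makes the intractable Z(θ)’s cancel in the Metropolis ratio. Here is a hedged sketch of the exchange algorithm of Murray et al. (2006) on a toy exponential-family model whose normalising constant is intractable *by decree*: the model, the N(0,1) prior, and the enumeration-based exact sampler (standing in for the perfect simulation the algorithm requires) are all illustrative choices of mine, not taken from the survey.

```python
import math
import random

# Sketch of the exchange algorithm on a toy model f(x|theta) ∝ exp(theta*x)
# for x in {0,...,10}, pretending Z(theta) is intractable. The auxiliary
# draw y ~ f(.|theta') is done by enumerating the small support here, as a
# stand-in for the perfect sampler the algorithm assumes.

X_SUPPORT = list(range(11))

def sample_exact(theta, rng):
    # stand-in for perfect simulation: enumerate the (small) support
    w = [math.exp(theta * x) for x in X_SUPPORT]
    return rng.choices(X_SUPPORT, weights=w)[0]

def log_prior(theta):
    return -0.5 * theta * theta  # illustrative N(0,1) prior

def exchange(x_obs, n_iter, step=0.5, seed=0):
    rng = random.Random(seed)
    theta, chain = 0.0, []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        y = sample_exact(prop, rng)   # auxiliary pseudo-observation
        # both Z(theta) and Z(prop) cancel in this ratio:
        log_a = (log_prior(prop) - log_prior(theta)
                 + (prop - theta) * x_obs + (theta - prop) * y)
        if math.log(rng.random()) < log_a:
            theta = prop
        chain.append(theta)
    return chain

chain = exchange(x_obs=7, n_iter=20_000)
print(sum(chain) / len(chain))   # posterior mean estimate
```

Since the support is small, the exact posterior can be computed on a grid and used to check that the chain targets the right distribution despite never evaluating Z(θ) in the acceptance ratio.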

## empirical Bayes, reference priors, entropy & EM

Posted in Mountains, Statistics, Travel, University life with tags arXiv, Darjeeling, EM algorithm, empirical Bayes, I.J. Good, JASA, Kullback-Leibler divergence, MLE, non-parametrics, penalty, reparameterisation, Robbins-Monro algorithm on January 9, 2017 by xi'an

**K**lebanov and co-authors from Berlin arXived this paper a few weeks ago and it took me a quiet evening in Darjeeling to read it. It starts from the premises that led Robbins to introduce empirical Bayes in 1956 (although that paper does not appear in the references), where repeated experiments with different parameters are run, except that it turns non-parametric in estimating the prior. To avoid resorting to the non-parametric MLE, which is the empirical distribution, it adds a smoothness penalty function to the picture. (**Warning:** I am not a big fan of non-parametric MLE!) The idea seems to go back to Good, who acknowledged that using the entropy as penalty lacks reparameterisation invariance. Hence the authors suggest instead to use as penalty function on the prior a joint relative entropy on both the parameter and the prior, which amounts to the average of the Kullback-Leibler divergence between the sampling distribution and the predictive based on the prior. This penalty is then independent of the parameterisation, and of the dominating measure. It is the only tangible connection with *reference priors* found in the paper.

The authors then introduce a non-parametric EM algorithm, where the unknown prior becomes the “parameter” and the M step means optimising an entropy in terms of this prior. With an infinite amount of data, the true prior (meaning the overall distribution of the genuine parameters in this repeated experiment framework) is a fixed point of the algorithm. However, it seems that the only way it can be implemented is via discretisation of the parameter space, which opens a whole Pandora box of issues, from the choice of discretisation size to dimensionality problems. It also weakens the motivation of the approach by regularisation arguments, since the final product remains an atomic distribution.
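To make the discretisation concrete, here is a minimal sketch of the non-parametric EM for a mixing distribution on a fixed grid, with the entropy penalty omitted (so this is the plain NPMLE EM, not the authors’ penalised version); the Gaussian noise model x_i ~ N(θ_i, 1), the grid, and the two-atom true prior are my own illustrative choices.

```python
import math
import random

# Sketch of the discretised non-parametric EM (penalty omitted): the
# unknown prior is an atomic distribution w over a fixed grid of parameter
# values, and each EM iteration replaces w_k by the average over the data
# of the posterior probability of grid point k.

def norm_pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def np_em(data, grid, n_iter=200):
    K = len(grid)
    w = [1.0 / K] * K                     # uniform starting prior
    lik = [[norm_pdf(x, t) for t in grid] for x in data]
    for _ in range(n_iter):
        new_w = [0.0] * K
        for row in lik:                   # E step: posterior weights per x_i
            denom = sum(wk * lk for wk, lk in zip(w, row))
            for k in range(K):
                new_w[k] += w[k] * row[k] / denom
        w = [v / len(data) for v in new_w]  # M step: average them
    return w

def log_marginal(data, grid, w):
    return sum(math.log(sum(wk * norm_pdf(x, t) for wk, t in zip(w, grid)))
               for x in data)

# repeated-experiment data: true parameters drawn from a two-atom prior
rng = random.Random(0)
thetas = [(-2.0 if rng.random() < 0.5 else 2.0) for _ in range(500)]
data = [rng.gauss(t, 1.0) for t in thetas]
grid = [i * 0.25 for i in range(-16, 17)]   # grid on [-4, 4]
w = np_em(data, grid)
# estimated prior mass near the true atoms -2 and +2
print(sum(wk for wk, t in zip(w, grid) if abs(abs(t) - 2.0) <= 0.5))
```

The fitted prior concentrates its mass near the two true atoms, illustrating both the fixed-point behaviour and the atomic nature of the output mentioned above.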

While the alternative of estimating the marginal density of the data by kernels and then aiming at the closest entropy prior is discussed, I find it surprising that the paper does not consider the rather natural option of setting a prior on the prior, e.g., via Dirichlet processes.