In the recent days, we have had a lively discussion among AEs of the Annals of Statistics, as to whether or not set up a policy regarding publications of documents that have already been published in a shortened (8 pages) version in a machine learning conference like NIPS. Or AISTATS. While I obviously cannot disclose details here, the debate is quite interesting and may bring the machine learning and statistics communities closer if resolved in a certain way. My own and personal opinion on that matter is that what matters most is what’s best for Annals of Statistics rather than the authors’ tenure or the different standards in the machine learning community. If the submitted paper is based on a brilliant and novel idea that can appeal to a sufficiently wide part of the readership and if the maths support of that idea is strong enough, we should publish the paper. Whether or not an eight-page preliminary version has been previously published in a conference proceeding like NIPS does not seem particularly relevant to me, as I find those short papers mostly unreadable and hence do not read them. Since Annals of Statistics runs an anti-plagiarism software that is most likely efficient, blatant cases of duplications could be avoided. Of course, this does not solve all issues and papers with similar contents can and will end up being published. However, this is also the case for statistics journals and statistics, in the sense that brilliant ideas sometimes end up being split between two or three major journals.
Archive for Annals of Statistics
While a random walk Metropolis-Hastings algorithm cannot be uniformly ergodic in a general setting (Mengersen and Tweedie, AoS, 1996), because it needs more energy to leave far away starting points, it can be geometrically ergodic depending on the target (and the proposal). In a recent Annals of Statistics paper, Leif Johnson and Charlie Geyer designed a trick to turn a random walk Metropolis-Hastings algorithm into a geometrically ergodic random walk Metropolis-Hastings algorithm by virtue of an isotropic transform (under the provision that the original target density has a moment generating function). This theoretical result is complemented by an R package called mcmc. (I have not tested it so far, having read the paper in the métro.) The examples included in the paper are however fairly academic and I wonder how the method performs in practice, on truly complex models, in particular because the change of variables relies on (a) an origin and (b) changing the curvature of space uniformly in all dimensions. Nonetheless, the idea is attractive and reminds me of a project of ours with Randal Douc, started thanks to the ‘Og and still under completion.
As Xiao-Li Meng accepted to review—and I am quite grateful he managed to fit this review in an already overflowing deanesque schedule!— our 2004 book Monte Carlo Statistical Methods as part of a special book review issue of CHANCE honouring the memory of George thru his books—thanks to Sam Behseta for suggesting this!—, he sent me the following email about one of our proofs—demonstrating how much efforts he had put into this review!—:
I however have a question about the proof of Lemma 7.3 on page 273. After the expression of E[h(x^(1)|x_0], the proof stated "and substitute Eh(x) for h(x_1)". I cannot think of any justification for this substitution, given the whole purpose is to show h(x) is a constant.
I put it on hold for a while and only looked at it in the (long) flight to Chicago. Lemma 7.3 in Monte Carlo Statistical Methods is the result that the Metropolis-Hastings algorithm is Harris recurrent (and not only recurrent). The proof is based on the characterisation of Harris recurrence as having only constants for harmonic functions, i.e. those satisfying the identity
The chain being recurrent, the above implies that harmonic functions are almost everywhere constant and the proof steps from almost everywhere to everywhere. The fact that the substitution above—and I also stumbled upon that very subtlety when re-reading the proof in my plane seat!—is valid is due to the fact that it occurs within an integral: despite sounding like using the result to prove the result, the argument is thus valid! Needless to say, we did not invent this (elegant) proof but took it from one of the early works on the theory of Metropolis-Hastings algorithms, presumably Luke Tierney’s foundational Annals paper work that we should have quoted…
As pointed out by Xiao-Li, the proof is also confusing for the use of two notations for the expectation (one of which is indexed by f and the other corresponding to the Markov transition) and for the change in the meaning of f, now the stationary density, when compared with Theorem 6.80.
“If a statistical procedure is to be judged by a criterion such as a conventional loss function (…) we should not expect optimal results from a probabilistic theory that demands multiple observations and multiple parameters.” P. McCullagh & H. Han
Peter McCullagh and Han Han have just published in the Annals of Statistics a paper on Bayes’ theorem for improper mixtures. This is a fascinating piece of work, even though some parts do elude me… The authors indeed propose a framework based on Kingman’s Poisson point processes that allow to include (countable) improper priors in a coherent probabilistic framework. This framework requires the definition of a test set A in the sampling space, the observations being then the events Y∩ A, Y being an infinite random set when the prior is infinite. It is therefore complicated to perceive this representation in a genuine Bayesian framework, i.e. for a single observation, corresponding to a single parameter value. In that sense it seems closer to the original empirical Bayes, à la Robbins.
“An improper mixture is designed for a generic class of problems, not necessarily related to one another scientifically, but all having the same mathematical structure.” P. McCullagh & H. Han
The paper thus misses in my opinion a clear link with the design of improper priors. And it does not offer a resolution of the improper prior Bayes factor conundrum. However, it provides a perfectly valid environment for working with improper priors. For instance, the final section on the marginalisation “paradoxes” is illuminating in this respect as it does not demand using a limit of proper priors.
I received this email last week from Ian Langmore, a postdoc in Columbia:
I’m looking for literature on a subject and can’t find it: I have a Metropolis sampler where the acceptance probability is evaluated with some error. This error is not simply error in evaluation of the target density. It occurs due to the method by which we approximate the acceptance probability.
This is a sensible question, albeit a wee vague… The closest item of work I can think of is the recent paper by Christophe Andrieu and Gareth Roberts, in the Annals of Statistics (2009) following an original proposal by Marc Beaumont. I think there is an early 1990’s paper by Gareth and Jeff Rosenthal where they consider the impact of some approximation effect like real number representation on the convergence but I cannot find it. Of course, the recent particle MCMC JRSS B discussion paper by Christophe, Arnaud Doucet and Roman Hollenstein is a way to bypass the problem. (In a sense ABC is a rudimentary answer as well.) And there must be many other papers on this topic I am not aware of….
The Vanilla Rao–Blackwellization of Metropolis–Hastings algorithms paper with Randal Douc is now published in Annals of Statistics (Volume 39, Number 1 (2011), pages 261-277) and available on-line via the project Euclid. We are currently working with Pierre Jacob on an extension of this idea towards parallelisation…
Following a kind invitation of Bala Rajaratnam during JSM 2010, I will give a special seminar in Stanford University this incoming Monday at 4:30, on my recent Annals paper with Randal Douc on “A vanilla Rao-Blackwellisation of Metropolis-Hastings algorithms“. Here are the slides