estimating the measure and hence the constant

Posted in pictures, Running, Statistics, University life with tags , , , , , , , on December 6, 2012 by xi'an

As mentioned on my post about the final day of the ICERM workshop, Xiao-Li Meng addresses this issue of “estimating the constant” in his talk. It is even his central theme. Here are his (2011) slides as he sent them to me (with permission to post them!):

He therefore points out in slide #5 why the likelihood cannot be expressed in terms of the normalising constant because this is not a free parameter. Right! His explanation for the approximation of the unknown constant is then to replace the known but intractable dominating measure—in the sense that it cannot compute the integral—with a discrete (or non-parametric) measure supported by the sample. Because the measure is defined up to a constant, this leads to sample weights being proportional to the inverse density. Of course, this representation of the problem is open to criticism: why focus only on measures supported by the sample? The fact that it is the MLE is used as an argument in Xiao-Li’s talk, but this can alternatively be seen as a drawback: I remember reviewing Dankmar Böhning’s Computer-Assisted Analysis of Mixtures and being horrified when discovering this feature! I am currently more agnostic since this appears as an alternative version of empirical likelihood. There are still questions about the measure estimation principle: for instance, when handling several samples from several distributions, why should they all contribute to a single estimate of μ rather than to a product of measures? (Maybe because their models are all dominated by the same measure μ.) Now, getting back to my earlier remark, and as a possible answer to Larry’s quesiton, there could well be a Bayesian version of the above, avoiding the rough empirical likelihood via Gaussian or Drichlet process prior modelling.

lemma 7.3

Posted in Statistics with tags , , , , , , , , , , , on November 14, 2012 by xi'an

As Xiao-Li Meng accepted to review—and I am quite grateful he managed to fit this review in an already overflowing deanesque schedule!— our 2004 book  Monte Carlo Statistical Methods as part of a special book review issue of CHANCE honouring the memory of George thru his books—thanks to Sam Behseta for suggesting this!—, he sent me the following email about one of our proofs—demonstrating how much efforts he had put into this review!—:

```I however have a question about the proof of Lemma 7.3
on page 273. After the expression of
E[h(x^(1)|x_0], the proof stated "and substitute
Eh(x) for h(x_1)".  I cannot think of any
justification for this substitution, given the whole
purpose is to show h(x) is a constant.```

I put it on hold for a while and only looked at it in the (long) flight to Chicago. Lemma 7.3 in Monte Carlo Statistical Methods is the result that the Metropolis-Hastings algorithm is Harris recurrent (and not only recurrent). The proof is based on the characterisation of Harris recurrence as having only constants for harmonic functions, i.e. those satisfying the identity

$h(x) = \mathbb{E}[h(X_t)|X_{t-1}=x]$

The chain being recurrent, the above implies that harmonic functions are almost everywhere constant and the proof steps from almost everywhere to everywhere. The fact that the substitution above—and I also stumbled upon that very subtlety when re-reading the proof in my plane seat!—is valid is due to the fact that it occurs within an integral: despite sounding like using the result to prove the result, the argument is thus valid! Needless to say, we did not invent this (elegant) proof but took it from one of the early works on the theory of Metropolis-Hastings algorithms, presumably Luke Tierney’s foundational Annals paper work that we should have quoted…

As pointed out by Xiao-Li, the proof is also confusing for the use of two notations for the expectation (one of which is indexed by f and the other corresponding to the Markov transition) and for the change in the meaning of f, now the stationary density, when compared with Theorem 6.80.

Xiao-Li Meng’s inception [in Paris]

Posted in Statistics, University life with tags , , , , , on July 27, 2011 by xi'an

Xiao-Li Meng will give a talk in Paris next September 1st, so I advertise it now, before my Parisian readers leave the city for their August retreat. Here is the abstract, explaining the above title:

Statistical Inception for the MCMC Dream: The kick is in the residual (augmentation)!

Xiao-Li Meng

Department of Statistics, Harvard University

The development of MCMC algorithms via data augmentation (DA) or equivalently auxiliary variables has some resemblance to the theme plot of the recent Hollywood hit Inception. We MCMC designers all share essentially the same “3S” dream, that is, to create algorithms that are simple, stable, and speedy. Within that grand dream, however, we have created a rather complex web of tools, with some of them producing very similar algorithms but for unclear reasons, or others that were thought to be of different origins but actually are layered when viewed from a suitable distance. These include conditional augmentation, marginal augmentation, PX-DA, partially non-centering parameterization, sandwiched algorithms, interweaving strategies, ASIS, etc. It turns out that there is a simple statistical insight that can unify essentially all these methods conceptually, and it also provides practical guidelines for their DA constructions. It is the simple concept of regression residuals, which are constructed to be orthogonal to the regression functions. All these methods in one form or another effectively build a residual augmentation. Given a DA distribution f(T, A), where T is our targeted variable (i.e., f(T) is our targeted distribution) and A is the augmented variable, there are two broad classes of residuals depending on whether we regress T on A or A on T. In this talk we will demonstrate how methods like conditional augmentation and partially non-centering parameterization build their residual augmentations by regressing A on T, whereas methods such as marginal augmentation and ASIS effectively use residual augmentations from regressing T on A. For either class, the attempted orthogonality helps to reduce the dependence among MCMC draws, and when the orthogonality leads to true independence as occurring in some special cases, we reach the dream of producing i.i.d. draws. (The talk is based on an upcoming discussion article, especially its rejoinder, Yu and Meng (2011, JCGS) )

The talk will take place at Institut Henri Poincaré, Thursday Sept. 1, at 15:00, as part of the Big’MC seminars.