Archive for Markov chains

reading classics (#6)

Posted in Statistics on December 21, 2012 by xi'an

Today my student Xiaolin Cheng presented the mythical 1990 JASA paper of Alan Gelfand and Adrian Smith, Sampling-based approaches to calculating marginal densities. The very one that started the MCMC revolution of the 1990s! Re-reading it through his eyes was quite enlightening, even though he stuck quite closely to the paper. (To the point of not running his own simulation, nor even reporting Gelfand and Smith’s, as shown by the slides below. This would have helped, I think…)

Indeed, those slides focus very much on the idea that such substitution samplers can provide parametric approximations to the marginal densities of the components of the simulated parameters. To the point of resorting to importance sampling as an alternative to the standard Rao-Blackwell estimate, a solution that did not survive long. (We briefly discussed this point during the seminar, as the importance function was itself based on a Rao-Blackwell estimate, with possible tail issues. Gelfand and Smith actually conclude that the Gibbs sampler is more efficient.) Maybe not so surprisingly, the approximation of the “other” marginal, namely the marginal likelihood, is not addressed, as it is much more involved (and would lead to the introduction of the infamous harmonic mean estimator a few years later! And to Chib’s (1995) estimator, which is very close in spirit to the Gibbs sampler). While Xiaolin never mentioned Markov chains in his talk, Gelfand and Smith only report that Gibbs sampling is a Markovian scheme, and refer to both Geman and Geman (1984) and Tanner and Wong (1987) for convergence issues, rather than directly invoking Markov arguments as in Tierney (1994) and others. A fact that I find quite interesting, a posteriori, as it highlights the strong impact Meyn and Tweedie would have, three years later.
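To make the Rao-Blackwell estimate of a marginal density concrete, here is a minimal sketch on a toy example that is not taken from Gelfand and Smith's paper: a two-stage Gibbs sampler for a standard bivariate normal with correlation rho. The function names, the value of rho, the burn-in length and the grid are all assumptions made for illustration. The marginal density of x is approximated by averaging the conditional densities p(x | y_t) over the simulated y_t, rather than by smoothing the simulated x_t.

```python
import numpy as np
from math import sqrt, pi


def gibbs_bivariate_normal(rho, n_iter, rng, burn_in=500):
    """Two-stage Gibbs sampler for a standard bivariate normal (toy target)."""
    sd = sqrt(1.0 - rho**2)
    x, y = 0.0, 0.0
    xs = np.empty(n_iter)
    ys = np.empty(n_iter)
    for t in range(burn_in + n_iter):
        x = rng.normal(rho * y, sd)  # x | y ~ N(rho * y, 1 - rho^2)
        y = rng.normal(rho * x, sd)  # y | x ~ N(rho * x, 1 - rho^2)
        if t >= burn_in:
            xs[t - burn_in] = x
            ys[t - burn_in] = y
    return xs, ys


def rao_blackwell_marginal(grid, ys, rho):
    """Average the conditional densities p(x | y_t) over the simulated y_t."""
    var = 1.0 - rho**2
    diff = grid[:, None] - rho * ys[None, :]
    cond = np.exp(-0.5 * diff**2 / var) / sqrt(2.0 * pi * var)
    return cond.mean(axis=1)


rng = np.random.default_rng(0)
rho = 0.6  # illustrative value
xs, ys = gibbs_bivariate_normal(rho, 5000, rng)
grid = np.linspace(-3.0, 3.0, 61)
estimate = rao_blackwell_marginal(grid, ys, rho)
# In this toy case the true marginal of x is N(0, 1), so the error can be checked.
truth = np.exp(-0.5 * grid**2) / sqrt(2.0 * pi)
max_abs_error = float(np.max(np.abs(estimate - truth)))
```

The estimate is a finite mixture of the conditional densities, which is what makes it "parametric" in the sense used above.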

lemma 7.3

Posted in Statistics on November 14, 2012 by xi'an

As Xiao-Li Meng agreed to review our 2004 book Monte Carlo Statistical Methods (and I am quite grateful he managed to fit this review into an already overflowing deanesque schedule!) as part of a special book review issue of CHANCE honouring the memory of George through his books (thanks to Sam Behseta for suggesting this!), he sent me the following email about one of our proofs, demonstrating how much effort he had put into this review:

I however have a question about the proof of Lemma 7.3 
on page 273. After the expression of
E[h(x^(1))|x_0], the proof stated "and substitute 
Eh(x) for h(x_1)".  I cannot think of any
justification for this substitution, given the whole 
purpose is to show h(x) is a constant.

I put it on hold for a while and only looked at it on the (long) flight to Chicago. Lemma 7.3 in Monte Carlo Statistical Methods is the result that the Metropolis-Hastings algorithm is Harris recurrent (and not only recurrent). The proof is based on the characterisation of Harris recurrence as having only constants for harmonic functions, i.e. those satisfying the identity

h(x) = \mathbb{E}[h(X_t)|X_{t-1}=x]

The chain being recurrent, the above implies that harmonic functions are almost everywhere constant, and the proof steps from almost everywhere to everywhere. The substitution above (and I also stumbled upon that very subtlety when re-reading the proof in my plane seat!) is valid because it occurs within an integral, which is unaffected by changes on a set of measure zero: despite sounding like using the result to prove the result, the argument is thus valid! Needless to say, we did not invent this (elegant) proof but took it from one of the early works on the theory of Metropolis-Hastings algorithms, presumably Luke Tierney’s foundational Annals paper, which we should have cited…
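The step from almost everywhere to everywhere can be sketched as follows, in generic Metropolis-Hastings notation that is an assumption of this sketch and may differ from the book's: q is the proposal density, \rho the acceptance probability, and r(x_0) the overall probability of accepting a move from x_0. The sketch also assumes that sets of measure zero under the stationary density f are ignored by the integral against q(x_1|x_0).

```latex
\begin{align*}
h(x_0) &= \mathbb{E}[h(X^{(1)}) \mid x_0]
        = \int \rho(x_0,x_1)\, q(x_1 \mid x_0)\, h(x_1)\, \mathrm{d}x_1
          + \{1 - r(x_0)\}\, h(x_0), \\
r(x_0) &= \int \rho(x_0,x_1)\, q(x_1 \mid x_0)\, \mathrm{d}x_1 . \\
\intertext{Since $h = \mathbb{E}_f[h]$ almost everywhere, $h(x_1)$ can be replaced
by $\mathbb{E}_f[h]$ inside the integral (but not in the last term):}
h(x_0) &= \mathbb{E}_f[h]\, r(x_0) + \{1 - r(x_0)\}\, h(x_0)
\quad\Longrightarrow\quad
r(x_0)\, \{ h(x_0) - \mathbb{E}_f[h] \} = 0 .
\end{align*}
```

Provided r(x_0) > 0 for every x_0, this gives h(x_0) = \mathbb{E}_f[h] for every x_0, not just almost every x_0.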

As pointed out by Xiao-Li, the proof is also confusing in its use of two notations for the expectation (one of which is indexed by f and the other corresponding to the Markov transition) and in the change in the meaning of f, now the stationary density, when compared with Theorem 6.80.

Who’s #1?

Posted in Books, Kids, Statistics, University life on May 2, 2012 by xi'an

First, apologies for this teaser of a title! This post is not about who is #1 in whatever category you can think of, from statisticians to climbs [the Eiger Nordwand, to be sure!], to runners (Gebrselassie?), to books… (My daughter simply said “c’est moi!” when she saw the cover of this book on my desk.) So this is in fact a book review of… a book with this catchy title that I received a month or so ago!

“We decided to forgo purely statistical methodology, which is probably a disappointment to the hardcore statisticians.” A.N. Langville & C.D. Meyer, Who’s #1? The Science of Rating and Ranking (page 225)

This book may be one of the most boring ones I have had to review so far! The reason for this disgruntled introduction to “Who’s #1? The Science of Rating and Ranking” by Langville and Meyer is that it has very little, if anything, to do with statistics and modelling. (And also that it is mostly about American football, a sport I am not even remotely interested in.) The purpose of the book is to present ways of building ratings and rankings within a population, based on pairwise numerical connections between some members of this population. The methods abound, at least eight being covered by the book, but they all suffer from the same drawback: they are connected to no grand truth, to no parameter from an underlying probabilistic model, to no loss function that would measure the impact of a “wrong” rating. (The closest it comes to this is when discussing spread betting in Chapter 9.) It is thus a collection of transformation rules, from matrices to ratings. I find this all the more disappointing in that there exists a branch of statistics called ranking and selection that specializes in this kind of problem, and that statistics in sports is quite an active branch of our profession, witness the numerous books by Jim Albert. (Not to mention Efron’s analysis of baseball data in the 1970s.)

“First suppose that in some absolutely perfect universe there is a perfect rating vector.” A.N. Langville & C.D. Meyer, Who’s #1? The Science of Rating and Ranking (page 117)

The style of the book is disconcerting at first, and then some, as it sounds written partly from Internet excerpts (at least for most of the pictures) and partly from local student dissertations… The mathematical level varies widely, in that the authors take pains to define what a matrix is (page 33), only to jump to the Perron-Frobenius theorem a few pages later (page 36). The book also mentions Laplace’s succession rule (only justified as a shrinkage towards the center, i.e. away from 0 and 1), the Sinkhorn-Knopp theorem, the traveling salesman problem, Arrow and Condorcet, relaxation and evolutionary optimization, and even Kendall’s and Spearman’s rank tests (Chapter 16), even though no statistical model is involved. (Nothing as terrible as the completely inappropriate use of Spearman’s rho coefficient in one of Belfiglio’s studies…)

“Since it is hard to say which ranking is better, our point here is simply that different methods can produce vastly different rankings.” A.N. Langville & C.D. Meyer, Who’s #1? The Science of Rating and Ranking (page 78)

I also find irritating the association of “science” with “rating”, because the techniques presented in this book are simply tricks to turn pairwise comparisons into a general ordering of a population, with nothing to do with uncovering ruling principles that explain the differences between the individuals. Since there is no validation of one ordering against another, we can see no rationality in proposing any of those, except to set a convention. The fascination of the authors for the Markov chain approach to the ranking problem is difficult to fathom, as the underlying structure is not dynamical (there is no ranking evolving across games in this book) and the Markov transition matrix is just constructed to derive a stationary distribution, inducing a particular “Markov” ranking.
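To illustrate how a transition matrix can be manufactured purely to obtain a stationary distribution, here is a minimal sketch of one possible "voting" construction, where each team votes for the teams that beat it. This is a hypothetical construction chosen for illustration, not necessarily the one used in the book: the function name, the uniform vote for undefeated teams, the damping value and the three-team record are all assumptions.

```python
import numpy as np


def markov_ratings(wins, damping=0.85, tol=1e-12, max_iter=1000):
    """Ratings as the stationary distribution of a 'loser votes for winner' chain.

    wins[i, j] is the number of times team i beat team j (hypothetical input).
    """
    wins = np.asarray(wins, dtype=float)
    n = wins.shape[0]
    losses = wins.T  # losses[i, j] = number of times team i lost to team j
    row_sums = losses.sum(axis=1, keepdims=True)
    # Undefeated teams have no one to vote for: let them vote uniformly.
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    votes = np.where(row_sums > 0, losses / safe_sums, 1.0 / n)
    # Mix with a uniform matrix so that the chain is irreducible (an assumption).
    P = damping * votes + (1.0 - damping) / n
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):  # power iteration for the stationary distribution
        new = pi @ P
        if np.abs(new - pi).sum() < tol:
            pi = new
            break
        pi = new
    return pi


# Toy record: team 0 beat teams 1 and 2, team 1 beat team 2.
wins = [[0, 1, 1],
        [0, 0, 1],
        [0, 0, 0]]
ratings = markov_ratings(wins)
ranking = np.argsort(-ratings)  # team indices, best first
```

Nothing in this construction evolves over time: the chain is only a device for turning the matrix of results into a vector of ratings, and changing the damping value or the voting rule gives a different convention.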

“The Elo rating system is the epitome of simple elegance.” A.N. Langville & C.D. Meyer, Who’s #1? The Science of Rating and Ranking (page 64)

An interesting input of the book is its description of the Elo ranking system used in chess, of which I did not know anything apart from its existence. Once again, there is a high degree of arbitrariness in the construction of the ranking, whose sole goal is to provide a convention upon which most people agree. A convention, mind, not a representation of truth! (This chapter contains a section on the movie The Social Network, where a character writes a logistic transform on a window, missing the exponent. This should remind Andrew of someone he often refers to in his blog!)
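For readers as unfamiliar with Elo as I was, here is a minimal sketch of the update in its usual chess convention: a base-10 logistic transform of the rating difference on a scale of 400, followed by a correction proportional to the gap between the actual and expected scores. The function names and the value K = 32 are assumptions for illustration, and the book's parametrisation may differ.

```python
def elo_expected(r_a, r_b):
    """Expected score of player A against player B (logistic in the rating gap)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, score_a, k=32.0):
    """Update both ratings after one game; score_a is 1 (win), 0.5 (draw) or 0 (loss)."""
    e_a = elo_expected(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


# Two equally rated players: the winner gains what the loser drops.
new_a, new_b = elo_update(1500.0, 1500.0, score_a=1.0)
```

The constants 400 and K are exactly the kind of convention discussed above: changing them changes the ratings, with no underlying truth to arbitrate between the choices.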

“Perhaps the largest lesson is not to put an undue amount of faith in anyone’s rating.” A.N. Langville & C.D. Meyer, Who’s #1? The Science of Rating and Ranking (page 125)

In conclusion, I see little point in suggesting reading this book, unless one is interested in matrix optimization problems and/or illustrations in American football… Or unless one wishes to write a statistics book on the topic!

the Wang-Landau algorithm reaches the flat histogram in finite time

Posted in R, Statistics, University life on October 20, 2011 by xi'an

Pierre Jacob and Robin Ryder (from Paris-Dauphine, CREST, and Statisfaction) have just arXived (and submitted to the Annals of Applied Probability) a neat result on the Wang-Landau algorithm. (This algorithm, which modifies the target in a sort of reweighted partitioned sampling to achieve faster convergence, has always been perplexing to me.) They show that some variations of the Wang-Landau algorithm meet the flat histogram criterion in finite time and, just as importantly, that other variations do not reach this criterion. The proof uses elegant Markov chain arguments and I hope the paper makes it through, as there are very few theoretical results on this algorithm. (Pierre also wrote a paper last week with Luke Bornn, Arnaud Doucet, and Pierre Del Moral, An Adaptive Interacting Wang-Landau Algorithm for Automatic Density Exploration, with an associated R package. Not yet on CRAN.)
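Since the algorithm is perplexing, a minimal sketch of one basic variant may help. It is not one of the specific variations analysed by Pierre and Robin: the toy target on a ring of 20 states, the two bins, the halving schedule, the tolerance and the minimum number of steps before checking flatness are all assumptions made for illustration. The chain targets the original density divided by a running penalty theta on the current bin, the penalty of the visited bin is increased at each step, and the increment is reduced each time the histogram of bin visits is flat enough.

```python
import numpy as np


def wang_landau(log_target, bin_of, n_states, n_bins, n_iter, rng,
                flat_tol=0.2, min_steps=100):
    """Basic Wang-Landau sketch on a ring of n_states states (toy setting)."""
    log_theta = np.zeros(n_bins)  # log penalties, one per bin
    counts = np.zeros(n_bins)     # bin visits since the last flat histogram
    log_f = 1.0                   # current increment for the log penalties
    x = 0
    n_flat = 0                    # number of times the criterion was met
    for _ in range(n_iter):
        # Symmetric random-walk proposal on the ring.
        step = 1 if rng.random() < 0.5 else -1
        y = (x + step) % n_states
        # Metropolis ratio for the penalised target pi(x) / theta(bin(x)).
        log_ratio = ((log_target(y) - log_theta[bin_of(y)])
                     - (log_target(x) - log_theta[bin_of(x)]))
        if np.log(rng.random()) < log_ratio:
            x = y
        b = bin_of(x)
        log_theta[b] += log_f     # penalise the bin just visited
        counts[b] += 1
        total = counts.sum()
        if total >= min_steps:
            freqs = counts / total
            # Flat histogram criterion: all frequencies close to 1 / n_bins.
            if np.max(np.abs(freqs - 1.0 / n_bins)) < flat_tol / n_bins:
                log_f *= 0.5
                counts[:] = 0
                n_flat += 1
    return log_theta, n_flat


rng = np.random.default_rng(1)
log_theta, n_flat = wang_landau(
    log_target=lambda s: 0.0 if s < 10 else float(np.log(0.01)),
    bin_of=lambda s: s // 10,
    n_states=20, n_bins=2, n_iter=20000, rng=rng)
```

In this toy run the first bin carries about a hundred times the mass of the second, so one would expect the penalties to push the chain into spending comparable time in both bins, with `n_flat` counting how often the criterion was reached within the allotted iterations.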

principles of uncertainty

Posted in Books, R, Statistics, University life on October 14, 2011 by xi'an

“Bayes Theorem is a simple consequence of the axioms of probability, and is therefore accepted by all as valid. However, some who challenge the use of personal probability reject certain applications of Bayes Theorem.” J. Kadane, p.44

Principles of uncertainty by Joseph (“Jay”) Kadane (Carnegie Mellon University, Pittsburgh) is a profound and mesmerising book on the foundations and principles of subjectivist or behaviouristic Bayesian analysis. Jay Kadane wrote Principles of uncertainty over a period of several years and, more or less in his own words, it represents the legacy he wants to leave for the future. The book starts with a large section on Jay’s definition of a probability model, with rigorous mathematical derivations all the way to Lebesgue measure (or more exactly the McShane-Stieltjes measure). This section contains many side derivations that pertain to mathematical analysis, in order to explain the subtleties of countably infinite and uncountable sets, and the distinction between finitely additive and countably additive (probability) measures. Unsurprisingly, the role of utility is emphasized in this book, which keeps stressing the personalistic entry to Bayesian statistics. Principles of uncertainty also contains a formal development on the validity of Markov chain Monte Carlo methods that is superb and missing in most equivalent textbooks. Overall, the book is a pleasure to read. And highly recommended for teaching, as it can be used at many different levels.
