## Special Issue of ACM TOMACS on Monte Carlo Methods in Statistics

Posted in Books, R, Statistics, University life on December 10, 2012 by xi'an

As posted here a long, long while ago, following a suggestion from the editor (and North America Cycling Champion!) Pierre L'Ecuyer (Université de Montréal), Arnaud Doucet (University of Oxford) and I acted as guest editors for a special issue of ACM TOMACS on Monte Carlo Methods in Statistics. (Coincidentally, I am attending a board meeting for TOMACS tonight in Berlin!) The issue is now ready for publication (next February, unless I am confused!) and is made of the following papers:

 * Massive parallelization of serial inference algorithms for a complex generalized linear model, by Marc A. Suchard, Ivan Zorych, Patrick Ryan, and David Madigan
 * Convergence of a Particle-based Approximation of the Block Online Expectation Maximization Algorithm, by Sylvain Le Corff and Gersende Fort
 * Efficient MCMC for Binomial Logit Models, by Agnes Fussl, Sylvia Frühwirth-Schnatter, and Rudolf Frühwirth
 * Adaptive Equi-Energy Sampler: Convergence and Illustration, by Amandine Schreck, Gersende Fort, and Eric Moulines
 * Particle algorithms for optimization on binary spaces, by Christian Schäfer
 * Posterior expectation of regularly paved random histograms, by Raazesh Sainudiin, Gloria Teng, Jennifer Harlow, and Dominic Lee
 * Small variance estimators for rare event probabilities, by Michel Broniatowski and Virgile Caron
 * Self-Avoiding Random Dynamics on Integer Complex Systems, by Firas Hamze, Ziyu Wang, and Nando de Freitas
 * Bayesian learning of noisy Markov decision processes, by Sumeetpal S. Singh, Nicolas Chopin, and Nick Whiteley

Here is the draft of the editorial that will appear at the beginning of this special issue. (All faults are mine, of course!)

## more typos in Monte Carlo statistical methods

Posted in Books, Statistics, University life on October 28, 2011 by xi'an

Jan Hanning kindly sent me this email about several difficulties with Chapter 3, Monte Carlo Integration, and Chapter 5, Monte Carlo Optimization, when teaching out of our book Monte Carlo Statistical Methods [my replies in italics between square brackets; apologies for the late reply and posting, as well as for the confusion thus created. Of course, the additional typos will soon be included in the typo lists on my book webpage]:

1. I seem to be unable to reproduce Table 3.3 on page 88 – especially the chi-square column does not look quite right. [No, those values definitely are not right: the true χ² quantiles should be 2.70, 3.84, and 6.63 at the levels 0.1, 0.05, and 0.01, respectively (see the short R check after this list). I actually fail to understand how we got this table that wrong...]
2. The second question I have is the choice of the U(0,1) distribution in Example 3.6. It feels to me that a choice of Beta(23.5,18.5) for p1 and Beta(36.5,5.5) for p2 might give a better representation based on the data we have. Any comments? [I am plainly uncertain about this... Yours is the choice based on the posterior Beta distributions associated with Jeffreys prior, hence making the best use of the data, and it is certainly "better" than the uniform. I wonder whether or not we should remove this example altogether... However, in my opinion, there is no proper choice for the distribution of the p_i's, because we are mixing there a likelihood-ratio solution with a Bayesian perspective on the predictive distribution of the likelihood ratio. If anything, this exposes the shortcomings of a classical approach, but it is likely to confuse the students! Anyway, this is a very interesting problem.]
3. My students discovered that Problem 5.19 has the following typos, copying from their e-mail: "x_x" should be "x_i" [sure!]. There are a few "( )"s missing here and there [yes!]. Most importantly, the likelihood/density seems incorrect: the normalizing constant should be the reciprocal of the one shown in the book [oh dear, indeed, the constant in the exponential density did not get to the denominator...]. As a result, all the formulas would differ except the ones in part (a) [they clearly need to be rewritten, sorry about this mess!].
4. I am unsure about the if and only if part of Theorem 5.15 [namely that the likelihood sequence is stationary if and only if the Q function in the E step has reached a stationary point]. It appears to me that a condition for the "if" part is missing [the "only if" part is a direct consequence of Jensen's inequality; see the sketch after this list]. Indeed, Theorem 1 of Dempster et al. (1977) has an extra condition [note that the original proof of convergence of EM has a flaw, as discussed here]. Am I missing something obvious? [maybe: it seems to me that, once Q reaches a fixed point, the likelihood L does not change... It is thus tautological, not a proof of convergence! But the theorem says a wee bit more, so this needs investigating. As Jan remarked, there is no symmetry in the Q function...]
5. Should there be an (n-m) in the last term of formula (5.17)? [yes, indeed! Multiply the last term by (n-m).]
6. Finally, I am a bit confused about the likelihood in Example 5.22 [which is a capture-recapture model]. Assume that $H_{ij}=k$ [meaning that animal i is in state k at time j]. Do you assume that you observe $X_{ijr}$ [which is the capture indicator for animal i at time j in zone r: it is equal to 1 for at most one r] as a Binomial B(n,p_r) even for r≠k? [no, we observe all $X_{ijr}$'s with r≠k equal to zero.] The nature of the problem seems to suggest that the answer is no [for other indices, $X_{ijr}$ is always zero, indeed]. If that is the case, I do not see where the power on top of (1-p_k) in the middle of page 185 comes from. [when the capture indices are zero, they do not contribute to the sum, which explains this condensed formula. Therefore, I do not think there is anything wrong with this over-parameterised representation of the missing variables.]
7. In Section 5.3.4, there seems to be a missing minus sign in the approximation formula for the variance [indeed, shame on us for missing the minus in the observed information matrix!]
8. I could not find the definition of $\mathbb{N}^*$ in Theorem 6.15. Is it all natural numbers or all integers? Maybe it would help to include it in Appendix B. [Surprising! This is the set of all positive integers; I thought this was standard mathematical notation...]
9. In Definition 6.27, you probably want to say covering of A and not X. [Yes, we were already thinking of the next theorem, most likely!]
10. In Proposition 6.33, it should be all x in A instead of all x in X. [Yes, again, as shown in the proof! Even though it also holds for all x in X.]
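
For the record, the χ² quantiles mentioned in item 1 are easily checked in R; the single degree of freedom below is my own assumption, since 2.70, 3.84, and 6.63 are the usual one-degree-of-freedom cut-offs, and not necessarily the exact setting of Table 3.3:

```r
# upper-tail chi-square quantiles at levels 0.1, 0.05, and 0.01,
# assuming one degree of freedom (standard likelihood-ratio cut-offs)
qchisq(c(0.1, 0.05, 0.01), df = 1, lower.tail = FALSE)
# [1] 2.705543 3.841459 6.634897
```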
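And to expand on the reply to item 4, here is a sketch of the usual argument behind the "only if" direction, written in the standard EM decomposition rather than in the exact notation of Theorem 5.15:

$\log L(\theta|\mathbf{x}) = Q(\theta|\theta_0,\mathbf{x}) - H(\theta|\theta_0,\mathbf{x})$

where $Q(\theta|\theta_0,\mathbf{x})=\mathbb{E}_{\theta_0}[\log L^c(\theta|\mathbf{x},\mathbf{Z})]$ is the E-step function, $H(\theta|\theta_0,\mathbf{x})=\mathbb{E}_{\theta_0}[\log k(\mathbf{Z}|\theta,\mathbf{x})]$, $k$ is the conditional density of the missing data, and both expectations are taken under the distribution of $\mathbf{Z}$ given $\mathbf{x}$ and $\theta_0$. Jensen's inequality gives $H(\theta|\theta_0,\mathbf{x})\le H(\theta_0|\theta_0,\mathbf{x})$ for every $\theta$, so any value of $\theta$ that does not decrease $Q$ cannot decrease the likelihood; in particular, once the $Q$ function has reached a fixed point, the likelihood sequence no longer moves.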

Thanks a ton to Jan and to his UNC students (and apologies for leading them astray with those typos!!!)

## Another history of MCMC

Posted in Books, Statistics, University life on April 20, 2011 by xi'an

In the most recent issue of Statistical Science, the special topic is "Celebrating the EM Algorithm's Quandunciacentennial". It contains an historical survey by Martin Tanner and Wing Wong on the emergence of MCMC Bayesian computation in the 1980s. This survey is more focused and more informative than our global history (also to appear in Statistical Science). In particular, it provides the authors' analysis as to why MCMC was delayed by ten years or so (or even more when considering that a Gibbs sampler as a simulation tool appears in both Hastings' (1970) and Besag's (1974) papers). They dismiss [our] concerns about computing power (I was running Monte Carlo simulations on my Apple IIe by 1986, and a single mean square error curve evaluation for a James-Stein type estimator would then take close to a weekend!) and Markov innumeracy, attributing the reluctance instead to a lack of confidence in the method. This perspective remains debatable as, apart from Tony O'Hagan, who was then fighting against Monte Carlo methods as being un-Bayesian (1987, JRSS D), I do not remember any negative attitude at the time about simulation, and the immediate spread of MCMC methods following Alan Gelfand's and Adrian Smith's presentations of their 1990 paper shows on the contrary that the Bayesian community was ready for the move.

Another interesting point made in this historical survey is that Metropolis' and other Markov chain methods were first presented outside the simulation sections of books like Hammersley and Handscomb (1964), Rubinstein (1981), and Ripley (1987), perpetuating the impression that such methods were mostly optimisation or niche-specific methods. This is also why Besag's earlier works (not mentioned in this survey) did not get wider recognition until later. Something I was not aware of is the appearance of iterative adaptive importance sampling (i.e. population Monte Carlo) in the Bayesian literature of the 1980s, with proposals from Herman van Dijk, Adrian Smith, and others. The appendix about Smith et al. (1985), the 1987 special issue of JRSS D, and the computational contents of Valencia 3 (which I sadly missed, being in the Army!) is also quite informative about the perception of computational Bayesian statistics at that time.

A missing connection in this survey is Gilles Celeux and Jean Diebolt's stochastic EM (or SEM). As early as 1981, with Michel Broniatowski, they proposed a simulated version of EM for mixtures, where the latent variable z was simulated from its conditional distribution rather than replaced with its expectation. So this was the first half of the Gibbs sampler for mixtures that we completed with Jean Diebolt about ten years later. (Also found in Gelman and King, 1990.) These authors did not get much recognition from the community, though, as they focused almost exclusively on mixtures, used simulation to produce randomness that would escape the attraction of local modes rather than to target the posterior distribution, and did not analyse the Markovian nature of their algorithm until later, with the simulated annealing EM algorithm.

## On-line EM

Posted in Statistics, University life on March 4, 2011 by xi'an

Just attended a local Big'MC seminar where Olivier Cappé gave us the ideas behind the online EM algorithm he developed with Eric Moulines. The method mixes the integrated EM technique we used in the population Monte Carlo paper with Robbins-Monro stochastic approximation, to end up with a convergent sequence with an optimal rate. The paper appeared in JRSS Series B in 2009, so I cannot say this was a complete surprise, all the less so because this is also the theme of the chapter Olivier wrote for the mixture book. (Soon to be ready!)
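
To give the flavour of the recursion (a toy sketch of my own, not the actual Cappé and Moulines implementation), here is a minimal online EM run in R for a two-component Gaussian mixture with known unit variances; the step-size exponent, the initial values, and the simulated data stream are all arbitrary choices for the illustration:

```r
# Toy online EM for a two-component N(mu_k, 1) mixture (weights w, means mu):
# one sufficient-statistic recursion per observation, then an explicit M-step.
set.seed(1)
y <- c(rnorm(5000, -1), rnorm(5000, 2))[sample(10000)]  # simulated data stream

w  <- c(0.5, 0.5)                          # initial weights
mu <- c(-0.5, 0.5)                         # initial means
s  <- c(0.5, 0.5, -0.25, 0.25)             # running stats: (E[z_1], E[z_2], E[z_1 y], E[z_2 y])

for (n in seq_along(y)) {
  gamma <- (n + 10)^(-0.6)                       # Robbins-Monro step size
  p <- w * dnorm(y[n], mu, 1)
  z <- p / sum(p)                                # E-step: posterior membership probabilities
  s <- (1 - gamma) * s + gamma * c(z, z * y[n])  # stochastic approximation of sufficient stats
  w  <- s[1:2]                                   # M-step: weights...
  mu <- s[3:4] / s[1:2]                          # ...and means, from the running statistics
}
round(c(w, mu), 3)  # should end up roughly at (0.5, 0.5, -1, 2), up to label switching
```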

## Typo in Example 5.18

Posted in Books, R, Statistics, University life on October 3, 2010 by xi'an

Edward Kao is engaged in a detailed parallel reading of Monte Carlo Statistical Methods and of Introducing Monte Carlo Methods with R. He has pointed out several typos in Example 5.18 of Monte Carlo Statistical Methods, which studies a missing-data phone plan model and its EM resolution. First, the customers in area i should be double-indexed, i.e.

$Z_{ij}\sim\mathcal{M}(1,(p_1,\ldots,p_5))$

which implies in turn that

$T_i=\sum_{j=1}^{n_i}Z_{ij}$.

Then the summary T should be defined as

$\mathbf{T}=(T_1,T_2,\ldots,T_n)$

and $W_5$ as

$W_5=\sum_{i=m+1}^nT_{i5},$

given that the first m areas have the fifth plan count missing.
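
For what it is worth, the corrected indexing can be spelled out in a few lines of R; the number of areas, the numbers of customers per area, and the plan probabilities below are arbitrary values chosen to illustrate the notation, not the actual figures of Example 5.18:

```r
set.seed(42)
n  <- 10                                # number of areas (arbitrary)
m  <- 3                                 # areas with the fifth plan count missing (arbitrary)
ni <- sample(20:40, n, replace = TRUE)  # customers per area (arbitrary)
p  <- c(0.3, 0.25, 0.2, 0.15, 0.1)      # plan probabilities (arbitrary)

# Z[[i]] is a 5 x n_i matrix of indicators: Z_ij ~ M(1, (p_1, ..., p_5)), one column per customer j
Z <- lapply(ni, function(k) rmultinom(k, size = 1, prob = p))

# T_i = sum_j Z_ij gives the plan counts for area i; Tmat stacks them into the n x 5 summary T
Tmat <- t(sapply(Z, rowSums))

# W_5 = sum_{i = m+1}^{n} T_{i5}, using only the areas where the fifth plan count is observed
W5 <- sum(Tmat[(m + 1):n, 5])
```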