Archive for EM algorithm

Nonlinear Time Series just appeared

Posted in Books, R, Statistics, University life on February 26, 2014 by xi'an

My friends Randal Douc and Éric Moulines just published this new time series book with David Stoffer. (David also wrote Time Series Analysis and its Applications with Robert Shumway a year ago.) The book reflects well on the research of Randal and Éric over the past decade, namely convergence results on Markov chains for validating both inference in nonlinear time series and algorithms applied to those objects. The latter include MCMC, pMCMC, sequential Monte Carlo, particle filters, and the EM algorithm. While I am too close to the authors to write a balanced review for CHANCE (the book is under review by another researcher, before you ask!), I think this is an important book that reflects the state of the art in the rigorous study of those models. Obviously, the mathematical rigour advocated by the authors makes Nonlinear Time Series a rather advanced book (despite the authors' reassuring statement that "nothing excessively deep is used"), better suited to PhD students and researchers than to beginning graduate students (and definitely not advised for self-study), but the availability of the R code (on the highly personal page of David Stoffer) helps balance the mathematical bent of the first and third parts. A great reference book!

finite mixture models [book review]

Posted in Books, Kids, Statistics, University life on February 17, 2014 by xi'an

Here is a review of Finite Mixture Models (2000) by Geoff McLachlan & David Peel that I wrote aeons ago (circa 1999), supposedly for JASA, which lost first the files and second the will to publish it. As I was working with my student today, I mentioned the book to her and decided to publish it here, if only because I think the book deserved a positive review, even after all those years! (Since then, Sylvia Frühwirth-Schnatter published Finite Mixture and Markov Switching Models (2004), which is closer to my perspective on the topic and that I would more naturally recommend.)

Mixture modeling, that is, the use of weighted sums of standard distributions as in

\sum_{i=1}^k p_i\, f(\mathbf{y};\boldsymbol{\theta}_i)\,,

is a widespread and increasingly popular technique for overcoming the rigidity of standard parametric distributions such as f(y;θ), while retaining a parametric nature, as exposed in the introduction of my JASA review of Böhning's (1998) book on non-parametric mixture estimation (Robert, 2000). This review pointed out that, while there are many books available on the topic of mixture estimation, the unsurpassed reference remained the book by Titterington, Smith and Makov (1985) [hereafter TSM]. I also suggested that a new edition of TSM would be quite timely, given the methodological and computational advances that took place in the intervening 15 years: while it remains unclear whether or not this new edition will ever take place, the book by McLachlan and Peel gives an enjoyable and fairly exhaustive update on the topic, incorporating the most recent advances on mixtures and some related models.
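Purely as an illustration (not taken from the book), the weighted-sum representation above translates directly into a few lines of R for, say, a three-component normal mixture; the weights and component parameters below are made up for the example.

## illustrative three-component normal mixture (made-up parameters)
p  <- c(0.3, 0.5, 0.2)          # mixture weights, summing to one
mu <- c(-2, 0, 3)               # component means
sg <- c(0.5, 1.0, 0.8)          # component standard deviations

## mixture density: sum_i p_i f(y; theta_i)
dmix <- function(y) sapply(y, function(yi) sum(p * dnorm(yi, mean = mu, sd = sg)))

## sampling: draw a component label, then draw from that component
rmix <- function(n) {
  z <- sample(1:3, n, replace = TRUE, prob = p)
  rnorm(n, mean = mu[z], sd = sg[z])
}

y <- rmix(1000)
hist(y, breaks = 40, freq = FALSE, main = "simulated mixture sample")
curve(dmix(x), add = TRUE, lwd = 2)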

Geoff McLachlan has been a major actor in the field for at least 25 years, through papers, software—the book concludes with a review of existing software—and books: McLachlan (1992), McLachlan and Basford (1988), and McLachlan and Krishnan (1997). I refer the reader to Lindsay (1989) for a review of the second book, which is a forerunner of, and has much in common with, the present book.

reading classics (#9,10)

Posted in Books, Kids, Statistics, University life on January 28, 2014 by xi'an

[La Défense from Paris-Dauphine, Nov. 15, 2012]

Today was the very last session of our Reading Classics Seminar for the academic year 2013-2014. We listened to two presentations: one on the Casella and Strawderman (1984) paper on the estimation of a bounded normal mean, and one on Hartigan and Wong's 1979 K-Means Clustering Algorithm paper in JRSS C. The first presentation did not go well, as my student had difficulties with the maths behind the paper. (As he did not come to ask me or others for help, it may well be that he put this talk together at the last minute, at a time busy with finals and project deliveries. He also failed to exploit those earlier presentations of the paper.) The innovative part of the talk was the presentation of several R simulations comparing the risk of the minimax Bayes estimator with that of the MLE, although the choice of simulating different samples of standard normals for different values of the parameter, and even for the two estimators, made the curves (unnecessarily) wiggly.
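For what it is worth, here is a minimal R sketch (mine, not the student's code) of how such a risk comparison can be run with common random numbers, so that the same normal samples are recycled across parameter values and across both estimators, which removes the spurious wiggles. It assumes the small-bound case where the minimax estimator is the Bayes estimator m·tanh(mx) under the two-point prior on ±m, and it takes the MLE to be the constrained one, i.e., x truncated to [-m,m].

## risk of the constrained MLE vs the two-point-prior Bayes estimator
## for a bounded normal mean: X ~ N(theta, 1), |theta| <= m (small-m case)
set.seed(42)
m      <- 1                            # bound on the mean
thetas <- seq(-m, m, length.out = 51)  # grid of parameter values
N      <- 1e4                          # Monte Carlo sample size
z      <- rnorm(N)                     # common standard normal draws, reused throughout

risk <- function(theta, estimator) {
  x <- theta + z                       # same z for every theta and every estimator
  mean((estimator(x) - theta)^2)
}

mle   <- function(x) pmax(pmin(x, m), -m)   # MLE constrained to [-m, m]
bayes <- function(x) m * tanh(m * x)        # Bayes estimator under the prior on {-m, m}

plot(thetas, sapply(thetas, risk, estimator = mle), type = "l",
     xlab = expression(theta), ylab = "quadratic risk")
lines(thetas, sapply(thetas, risk, estimator = bayes), lty = 2)
legend("top", legend = c("constrained MLE", "two-point Bayes"), lty = 1:2)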

By contrast, the second presentation was very well designed, with great Beamer slides, interactive features, and a software-oriented focus. My student Mouna Berrada started from the existing R function kmeans to explain the principles of the algorithm, recycling the interactive presentation of last year as well (with my permission) and creating a dynamic flowchart that was most helpful. She thus made the best of this very short paper, only (predictably) missing the question of the statistical model behind the procedure. During the discussion, I mused about why k-medians clustering was not more popular, as it offers higher robustness guarantees, albeit further away from a genuine statistical model, and why k-means clustering was not more systematically compared with mixture (EM) estimation.
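To make that last musing concrete, here is a minimal sketch (again mine, not part of the talk) that runs kmeans and an EM-fitted Gaussian mixture on the same simulated data and cross-tabulates the resulting partitions; it assumes the mclust package for the mixture fit, and the two-group data are made up for the illustration.

## a quick look at kmeans versus mixture (EM) estimation
library(mclust)                                   # provides Mclust, fitted by EM
set.seed(7)
x <- rbind(matrix(rnorm(200, 0, 1), ncol = 2),    # made-up two-group data
           matrix(rnorm(200, 3, 1), ncol = 2))

km <- kmeans(x, centers = 2)                      # hard assignments
em <- Mclust(x, G = 2)                            # Gaussian mixture, soft allocations

## cross-tabulate the two partitions (labels are only defined up to permutation)
table(kmeans = km$cluster, EM = em$classification)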

Here are the slides for the second talk

Special Issue of ACM TOMACS on Monte Carlo Methods in Statistics

Posted in Books, R, Statistics, University life on December 10, 2012 by xi'an

As posted here a long, long while ago, following a suggestion from the editor (and North America Cycling Champion!) Pierre Lécuyer (Université de Montréal), Arnaud Doucet (University of Oxford) and I acted as guest editors for a special issue of ACM TOMACS on Monte Carlo Methods in Statistics. (Coincidentally, I am attending a board meeting for TOMACS tonight in Berlin!) The issue is now ready for publication (next February, unless I am confused!) and consists of the following papers:

* Massive parallelization of serial inference algorithms for a complex generalized linear model
MARC A. SUCHARD, IVAN ZORYCH, PATRICK RYAN, DAVID MADIGAN
* Convergence of a Particle-based Approximation of the Block Online Expectation Maximization Algorithm
SYLVAIN LE CORFF and GERSENDE FORT
* Efficient MCMC for Binomial Logit Models
AGNES FUSSL, SYLVIA FRÜHWIRTH-SCHNATTER, RUDOLF FRÜHWIRTH
* Adaptive Equi-Energy Sampler: Convergence and Illustration
AMANDINE SCHRECK, GERSENDE FORT, and ERIC MOULINES
* Particle algorithms for optimization on binary spaces
CHRISTIAN SCHÄFER
* Posterior expectation of regularly paved random histograms
RAAZESH SAINUDIIN, GLORIA TENG, JENNIFER HARLOW, and DOMINIC LEE
* Small variance estimators for rare event probabilities
MICHEL BRONIATOWSKI and VIRGILE CARON
* Self-Avoiding Random Dynamics on Integer Complex Systems
FIRAS HAMZE, ZIYU WANG, and NANDO DE FREITAS
* Bayesian learning of noisy Markov decision processes
SUMEETPAL S. SINGH, NICOLAS CHOPIN, and NICK WHITELEY

Here is the draft of the editorial that will appear at the beginning of this special issue. (All faults are mine, of course!)

more typos in Monte Carlo statistical methods

Posted in Books, Statistics, University life on October 28, 2011 by xi'an

Jan Hanning kindly sent me this email about several difficulties with Chapter 3 (Monte Carlo Integration) and Chapter 5 (Monte Carlo Optimization) when teaching out of our book Monte Carlo Statistical Methods [my replies are in italics between square brackets; apologies for the late reply and posting, as well as for the confusion thus created. Of course, the additional typos will soon be included in the typo lists on my book webpage.]:

  1. I seem to be unable to reproduce Table 3.3 on page 88 – especially the chi-square column does not look quite right. [No, they definitely are not right: the true χ² quantiles should be 2.70, 3.84, and 6.63, at the levels 0.1, 0.05, and 0.01, respectively (see the quick numerical check after this list). I actually fail to understand how we got this table that wrong...]
  2. The second question I have is about the choice of the U(0,1) in Example 3.6. It feels to me that a choice of Beta(23.5,18.5) for p1 and Beta(36.5,5.5) for p2 might give a better representation based on the data we have. Any comments? [I am plainly uncertain about this... Yours is the choice based on the posterior Beta distributions of the coefficients associated with Jeffreys prior, hence making the best use of the data, and it is certainly "better" than the uniform. I wonder whether or not we should remove this example altogether... However, in my opinion, there is no proper choice for the distribution of the pi's, because we are mixing a likelihood-ratio solution with a Bayesian perspective on the predictive distribution of the likelihood ratio. If anything, this exposes the shortcomings of a classical approach, but it is likely to confuse the students! Anyway, this is a very interesting problem.]
  3. My students discovered that Problem 5.19 has the following typos, copying from their e-mail: "x_x" should be "x_i" [sure!]. There are a few "( )"s missing here and there [yes!]. Most importantly, the likelihood/density seems incorrect: the normalizing constant should be the reciprocal of the one shown in the book [oh dear, indeed, the constant in the exponential density did not make it to the denominator...]. As a result, all the formulas would differ except the ones in part (a). [they clearly need to be rewritten, sorry about this mess!]
  4. I am unsure about the if and only if part of Theorem 5.15 [namely that the likelihood sequence is stationary if and only if the Q function in the E step has reached a stationary point]. It appears to me that a condition for the "if" part is missing [the "only if" part is a direct consequence of Jensen's inequality]. Indeed, Theorem 1 of Dempster et al. (1977) has an extra condition [note that the original proof of convergence for EM has a flaw, as discussed here]. Am I missing something obvious? [maybe: it seems to me that, once Q reaches a fixed point, the likelihood L does not change... It is thus tautological, not a proof of convergence! But the theorem says a wee bit more, so this needs investigating. As Jan remarked, there is no symmetry in the Q function... A toy illustration of the monotone increase of the likelihood along EM iterations appears after this list.]
  5. Should there be a (n-m) in the last term of formula (5.17)? [yes, indeed: multiply the last term by (n-m)]
  6. Finally, I am a bit confused about the likelihood in Example 5.22 [which is a capture-recapture model]. Assume that Hij=k [meaning animal i is in state k at time j]. Do you assume that you observe Xijr [which is the capture indicator for animal i at time j in zone r: it is equal to 1 for at most one r] as a Binomial B(n,pr) even for r≠k? [no, we observe all Xijr's with r≠k equal to zero] The nature of the problem seems to suggest that the answer is no. [indeed, for other indices, Xijr is always zero] If that is the case, I do not see where the power on top of (1-pk) in the middle of page 185 comes from. [when the capture indicators are zero, they do not contribute to the sum, which explains this condensed formula. Therefore, I do not think there is anything wrong with this over-parameterised representation of the missing variables.]
  7. In Section 5.3.4, there seems to be a missing minus sign in the approximation formula for the variance [indeed, shame on us for missing the minus in the observed information matrix!]
  8. I could not find the definition of \mathbb{N}^* in Theorem 6.15. Is it all natural numbers or all integers? Maybe it would help to include it in Appendix B. [Surprising! This is the set of all positive integers; I thought this was standard mathematical notation...]
  9. In Definition 6.27, you probably want to say covering of A and not X. [Yes, we were already thinking of the next theorem, most likely!]
  10. In Proposition 6.33, it should be all x in A instead of all x in X. [Yes, again, as shown in the proof! Even though the result also holds for all x in X.]
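As referenced in items 1 and 4 above, here is a short R check that is not part of the original exchange: the first line verifies the χ² quantiles quoted in item 1, and the remainder runs a toy EM on a two-component normal mixture with known unit variances to illustrate the monotone increase of the observed likelihood discussed around Theorem 5.15; the simulated sample is made up for the illustration.

## item 1: chi-square(1) quantiles at (upper-tail) levels 0.1, 0.05, 0.01
qchisq(c(0.90, 0.95, 0.99), df = 1)   # cf. the quantiles quoted in the reply to item 1

## item 4: the observed likelihood never decreases along EM iterations
set.seed(3)
x <- c(rnorm(100, 0, 1), rnorm(100, 3, 1))   # made-up two-component sample

loglik <- function(p, mu) sum(log(p * dnorm(x, mu[1]) + (1 - p) * dnorm(x, mu[2])))

p <- 0.5; mu <- c(-1, 1); ll <- loglik(p, mu)
for (iter in 1:50) {
  num <- p * dnorm(x, mu[1])
  den <- num + (1 - p) * dnorm(x, mu[2])
  w   <- num / den                                                # E step: posterior allocations
  p   <- mean(w)                                                  # M step: weight update
  mu  <- c(sum(w * x) / sum(w), sum((1 - w) * x) / sum(1 - w))    # M step: mean updates
  ll  <- c(ll, loglik(p, mu))
}
all(diff(ll) >= -1e-12)   # TRUE: monotone (up to rounding), as Jensen's inequality implies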

Thanks a ton to Jan and to his UNC students (and apologies for leading them astray with those typos!!!)
