speed of R, C, &tc.

Posted in R, Running, Statistics, University life with tags , , , , , , , , , on February 3, 2012 by xi'an

My Paris colleague (and fellow-runner) Aurélien Garivier has produced an interesting comparison of 4 (or 6 if you consider scilab and octave as different from matlab) computer languages in terms of speed for producing the MLE in a hidden Markov model, using EM and the Baum-Welch algorithms. His conclusions are that

• matlab is a lot faster than R and python, especially when vectorization is important : this is why the difference is spectacular on filtering/smoothing, not so much on the creation of the sample;
• octave is a good matlab emulator, if no special attention is payed to execution speed…;
• scilab appears as a credible, efficient alternative to matlab;
• still, C is a lot faster; the inefficiency of matlab in loops is well-known, and clearly shown in the creation of the sample.

(In this implementation, R is “only” three times slower than matlab, so this is not so damning…) All the codes are available and you are free to make suggestions to improve the speed of of your favourite language!

Posted in Statistics, Travel, University life with tags , , , , , on April 13, 2011 by xi'an

Today I gave my first lecture in Universidad Autonoma Madrid. Apart from a shaky start due to my new computer not recognising the videoprojector, I covered EM for mixtures in the one and half hour of the course. I obviously finished the day with tapas in a nearby bar, vaguely watching Barcelona playing an improbable team to merge with the other patrons… In the second lecture, I hope to illustrate both EM and Gibbs on a simple mixture likelihood surface.

Correlated Poissons

Posted in Statistics with tags , , on March 2, 2011 by xi'an

A graduate student came to see me the other day with a bivariate Poisson distribution and a question about using EM in this framework. The problem boils down to adding one correlation parameter and an extra term in the likelihood

$(1-\rho)^{n_1}(1+\lambda\rho)^{n_2}(1+\mu\rho)^{n_3}(1-\lambda\mu\rho)^{n_4}\quad 0\le\rho\le\min(1,\frac{1}{\lambda\mu})$

Both terms involving sums are easy to deal with, using latent variables as in mixture models. The subtractions are trickier, as the negative parts cannot appear in a conditional distribution. Even though the problem can be handled by a direct numerical maximisation or by an almost standard Metropolis-within-Gibbs sampler, my suggestion regarding EM per se was to proceed by conditional EM, one parameter at a time. For instance, when considering $\rho$ conditional on both Poisson parameters, depending on whether $\lambda\mu>1$ or not, one can consider either

$(1-\theta/\lambda\mu)^{n_1}(1+\theta/\mu)^{n_2}(1+\theta/\lambda)^{n_3}(1-\theta)^{n_4}\quad0<\theta<1$

and turn

$(1-\theta/\lambda\mu) \text{ into } (1-\theta+\theta\{1-\frac{1}{\lambda\mu}\})$

thus producing a Beta-like target function in $\theta$ after completion, or turn

$(1-\lambda\mu\rho) \text{ into } (1-\rho+\{1-\lambda\mu\}\rho)$

to produce a Beta-like target function in $\rho$ after completion. In the end, this is a rather pedestrian exercise and I am still frustrated at missing the trick to handle the subtractions directly, however it was nonetheless a nice question!

Mixture Estimation and Applications

Posted in Books, Mountains, Statistics, University life with tags , , , , , , , , , , on November 24, 2010 by xi'an

We have now completed the edition of the book ﻿Mixture Estimation and Applications with Kerrie Mengersen and Mike Titterington, made of contributions from participants to  the ICMS workshop on mixtures that took place in Edinburgh last March. Here is the prospective table of contents: Continue reading

Hill climbing

Posted in Statistics, University life with tags , , , on October 7, 2010 by xi'an

Yesterday, I received the following email from Rob Taylor:

Dr. Robert, I’ve made an observation about a variation on the Gibbs sampler that hopefully would interest you enough to want to answer my question. I’ve noticed that if I want to simply estimate the mean of a unimodal posterior density (such as a multivariate Gaussian), I can modify the Gibbs sampler to just sample the MEAN of the full conditionals at each update and get convergence to the true posterior mean in many cases. In other words I’m only sampling the posterior mean instead of sampling the target posterior distribution (or something of that flavor). So my question is: Does modifying the Gibb’s sampler to sample only the mean of the full conditionals (instead of the sampling the distribution) have any supporting theory or prior art? Empirically it seems to work very well, but I don’t know if there’s an argument for why it works.

To which I replied:  What you are implementing is closer to the EM algorithm than to Gibbs sampling. By using the (conditional) mean (or, better, mode) in unimodal conditional posteriors you are using a local maximum in one direction corresponding to the conditioned parameter and by repeating this across all parameters the algorithm increases the corresponding value of the posterior in well-behaved models. So this is a special case of hill climbing algorithm. The theory behind it is however gradient-like rather than Gibbs-like, because by taking the mean at each step you remove the randomness of a Gibbs sampler step and hence its Markovian validation. Simulated annealing would be a stochastic version of this algorithm, using Markov simulation but progressively concentrating the conditional distributions around their mode.