## reaching transcendence for Gaussian mixtures

Posted in Books, R, Statistics with tags , , , , on September 3, 2015 by xi'an

“…likelihood inference is in a fundamental way more complicated than the classical method of moments.”

Carlos Amendola, Mathias Drton, and Bernd Sturmfels arXived a paper this Friday on “maximum likelihood estimates for Gaussian mixtures are transcendental”. By which they mean that trying to solve the five likelihood equations for a two-component Gaussian mixture does not lead to an algebraic function of the data. (When excluding the trivial global maxima spiking at any observation.) This is not highly surprising when considering two observations, 0 and x, from a mixture of N(0,1/2) and N(μ,1/2) because the likelihood equation

$(x-\mu)\exp\{\mu^2\}-x+\mu\exp\{-\mu(2x-\mu)\}=0$

involves both exponential and algebraic terms. While this is not directly impacting (statistical) inference, this result has the computational consequence that the number of critical points ‘and also the maximum number of local maxima, depends on the sample size and increases beyond any bound’, which means that EM faces increasing difficulties in finding a global finite maximum as the sample size increases…

## Bangalore workshop [ಬೆಂಗಳೂರು ಕಾರ್ಯಾಗಾರ] and new book

Posted in Books, pictures, R, Statistics, Travel, University life with tags , , , , , , , , , , , , on August 13, 2014 by xi'an

On the last day of the IFCAM workshop in Bangalore, Marc Lavielle from INRIA presented a talk on mixed effects where he illustrated his original computer language Monolix. And mentioned that his CRC Press book on Mixed Effects Models for the Population Approach was out! (Appropriately listed as out on a 14th of July on amazon!) He actually demonstrated the abilities of Monolix live and on diabets data provided by an earlier speaker from Kolkata, which was a perfect way to start initiating a collaboration! Nice cover (which is all I saw from the book at this stage!) that maybe will induce candidates to write a review for CHANCE. Estimation of those mixed effect models relies on stochastic EM algorithms developed by Marc Lavielle and Éric Moulines in the 90’s, as well as MCMC methods.

## speed of R, C, &tc.

Posted in R, Running, Statistics, University life with tags , , , , , , , , , on February 3, 2012 by xi'an

My Paris colleague (and fellow-runner) Aurélien Garivier has produced an interesting comparison of 4 (or 6 if you consider scilab and octave as different from matlab) computer languages in terms of speed for producing the MLE in a hidden Markov model, using EM and the Baum-Welch algorithms. His conclusions are that

• matlab is a lot faster than R and python, especially when vectorization is important : this is why the difference is spectacular on filtering/smoothing, not so much on the creation of the sample;
• octave is a good matlab emulator, if no special attention is payed to execution speed…;
• scilab appears as a credible, efficient alternative to matlab;
• still, C is a lot faster; the inefficiency of matlab in loops is well-known, and clearly shown in the creation of the sample.

(In this implementation, R is “only” three times slower than matlab, so this is not so damning…) All the codes are available and you are free to make suggestions to improve the speed of of your favourite language!

## Mixtures in Madrid (2)

Posted in Statistics, Travel, University life with tags , , , , , on April 13, 2011 by xi'an

Today I gave my first lecture in Universidad Autonoma Madrid. Apart from a shaky start due to my new computer not recognising the videoprojector, I covered EM for mixtures in the one and half hour of the course. I obviously finished the day with tapas in a nearby bar, vaguely watching Barcelona playing an improbable team to merge with the other patrons… In the second lecture, I hope to illustrate both EM and Gibbs on a simple mixture likelihood surface.

## Correlated Poissons

Posted in Statistics with tags , , on March 2, 2011 by xi'an

A graduate student came to see me the other day with a bivariate Poisson distribution and a question about using EM in this framework. The problem boils down to adding one correlation parameter and an extra term in the likelihood

$(1-\rho)^{n_1}(1+\lambda\rho)^{n_2}(1+\mu\rho)^{n_3}(1-\lambda\mu\rho)^{n_4}\quad 0\le\rho\le\min(1,\frac{1}{\lambda\mu})$

Both terms involving sums are easy to deal with, using latent variables as in mixture models. The subtractions are trickier, as the negative parts cannot appear in a conditional distribution. Even though the problem can be handled by a direct numerical maximisation or by an almost standard Metropolis-within-Gibbs sampler, my suggestion regarding EM per se was to proceed by conditional EM, one parameter at a time. For instance, when considering $\rho$ conditional on both Poisson parameters, depending on whether $\lambda\mu>1$ or not, one can consider either

$(1-\theta/\lambda\mu)^{n_1}(1+\theta/\mu)^{n_2}(1+\theta/\lambda)^{n_3}(1-\theta)^{n_4}\quad0<\theta<1$

and turn

$(1-\theta/\lambda\mu) \text{ into } (1-\theta+\theta\{1-\frac{1}{\lambda\mu}\})$

thus producing a Beta-like target function in $\theta$ after completion, or turn

$(1-\lambda\mu\rho) \text{ into } (1-\rho+\{1-\lambda\mu\}\rho)$

to produce a Beta-like target function in $\rho$ after completion. In the end, this is a rather pedestrian exercise and I am still frustrated at missing the trick to handle the subtractions directly, however it was nonetheless a nice question!

## Mixture Estimation and Applications

Posted in Books, Mountains, Statistics, University life with tags , , , , , , , , , , on November 24, 2010 by xi'an

We have now completed the edition of the book ﻿Mixture Estimation and Applications with Kerrie Mengersen and Mike Titterington, made of contributions from participants to  the ICMS workshop on mixtures that took place in Edinburgh last March. Here is the prospective table of contents: Continue reading

## Hill climbing

Posted in Statistics, University life with tags , , , on October 7, 2010 by xi'an

Yesterday, I received the following email from Rob Taylor:

Dr. Robert, I’ve made an observation about a variation on the Gibbs sampler that hopefully would interest you enough to want to answer my question. I’ve noticed that if I want to simply estimate the mean of a unimodal posterior density (such as a multivariate Gaussian), I can modify the Gibbs sampler to just sample the MEAN of the full conditionals at each update and get convergence to the true posterior mean in many cases. In other words I’m only sampling the posterior mean instead of sampling the target posterior distribution (or something of that flavor). So my question is: Does modifying the Gibb’s sampler to sample only the mean of the full conditionals (instead of the sampling the distribution) have any supporting theory or prior art? Empirically it seems to work very well, but I don’t know if there’s an argument for why it works.

To which I replied:  What you are implementing is closer to the EM algorithm than to Gibbs sampling. By using the (conditional) mean (or, better, mode) in unimodal conditional posteriors you are using a local maximum in one direction corresponding to the conditioned parameter and by repeating this across all parameters the algorithm increases the corresponding value of the posterior in well-behaved models. So this is a special case of hill climbing algorithm. The theory behind it is however gradient-like rather than Gibbs-like, because by taking the mean at each step you remove the randomness of a Gibbs sampler step and hence its Markovian validation. Simulated annealing would be a stochastic version of this algorithm, using Markov simulation but progressively concentrating the conditional distributions around their mode.