## reaching transcendence for Gaussian mixtures

Posted in Statistics, Books, R with tags , , , , on September 3, 2015 by xi'an

“…likelihood inference is in a fundamental way more complicated than the classical method of moments.”

Carlos Amendola, Mathias Drton, and Bernd Sturmfels arXived a paper this Friday on “maximum likelihood estimates for Gaussian mixtures are transcendental”. By which they mean that trying to solve the five likelihood equations for a two-component Gaussian mixture does not lead to an algebraic function of the data. (When excluding the trivial global maxima spiking at any observation.) This is not highly surprising when considering two observations, 0 and x, from a mixture of N(0,1/2) and N(μ,1/2) because the likelihood equation

$(x-\mu)\exp\{\mu^2\}-x+\mu\exp\{-\mu(2x-\mu)\}=0$

involves both exponential and algebraic terms. While this is not directly impacting (statistical) inference, this result has the computational consequence that the number of critical points ‘and also the maximum number of local maxima, depends on the sample size and increases beyond any bound’, which means that EM faces increasing difficulties in finding a global finite maximum as the sample size increases…

## Bayesian model averaging in astrophysics

Posted in Books, Statistics, University life with tags , , , , , , , , , , on July 29, 2015 by xi'an

[A 2013 post that somewhat got lost in a pile of postponed entries and referee’s reports…]

In this review paper, now published in Statistical Analysis and Data Mining 6, 3 (2013), David Parkinson and Andrew R. Liddle go over the (Bayesian) model selection and model averaging perspectives. Their argument in favour of model averaging is that model selection via Bayes factors may simply be too inconclusive to favour one model and only one model. While this is a correct perspective, this is about it for the theoretical background provided therein. The authors then move to the computational aspects and the first difficulty is their approximation (6) to the evidence

$P(D|M) = E \approx \frac{1}{n} \sum_{i=1}^n L(\theta_i)Pr(\theta_i)\, ,$

where they average the likelihood x prior terms over simulations from the posterior, which does not provide a valid (either unbiased or converging) approximation. They surprisingly fail to account for the huge statistical literature on evidence and Bayes factor approximation, incl. Chen, Shao and Ibrahim (2000). Which covers earlier developments like bridge sampling (Gelman and Meng, 1998).

As often the case in astrophysics, at least since 2007, the authors’ description of nested sampling drifts away from perceiving it as a regular Monte Carlo technique, with the same convergence speed n1/2 as other Monte Carlo techniques and the same dependence on dimension. It is certainly not the only simulation method where the produced “samples, as well as contributing to the evidence integral, can also be used as posterior samples.” The authors then move to “population Monte Carlo [which] is an adaptive form of importance sampling designed to give a good estimate of the evidence”, a particularly restrictive description of a generic adaptive importance sampling method (Cappé et al., 2004). The approximation of the evidence (9) based on PMC also seems invalid:

$E \approx \frac{1}{n} \sum_{i=1}^n \dfrac{L(\theta_i)}{q(\theta_i)}\, ,$

is missing the prior in the numerator. (The switch from θ in Section 3.1 to X in Section 3.4 is  confusing.) Further, the sentence “PMC gives an unbiased estimator of the evidence in a very small number of such iterations” is misleading in that PMC is unbiased at each iteration. Reversible jump is not described at all (the supposedly higher efficiency of this algorithm is far from guaranteed when facing a small number of models, which is the case here, since the moves between models are governed by a random walk and the acceptance probabilities can be quite low).

The second quite unrelated part of the paper covers published applications in astrophysics. Unrelated because the three different methods exposed in the first part are not compared on the same dataset. Model averaging is obviously based on a computational device that explores the posteriors of the different models under comparison (or, rather, averaging), however no recommendation is found in the paper as to efficiently implement the averaging or anything of the kind. In conclusion, I thus find this review somehow anticlimactic.

## Bayesian computation: a summary of the current state, and samples backwards and forwards

Posted in Books, Statistics, University life with tags , , , , , , , , on June 25, 2015 by xi'an

“The Statistics and Computing journal gratefully acknowledges the contributions for this special issue, celebrating 25 years of publication. In the past 25 years, the journal has published innovative, distinguished research by leading scholars and professionals. Papers have been read by thousands of researchers world-wide, demonstrating the global importance of this field. The Statistics and Computing journal looks forward to many more years of exciting research as the field continues to expand.” Mark Girolami, Editor in Chief for The Statistics and Computing journal

Our joint [Peter Green, Krzysztof Łatuszyński, Marcelo Pereyra, and myself] review [open access!] on the important features of Bayesian computation has already appeared in the special 25th anniversary issue of Statistics & Computing! Along with the following papers

which means very good company, indeed! And happy B’day to Statistics & Computing!

## postdoc in the Alps

Posted in Kids, Mountains, Statistics, Travel, University life with tags , , , , , , , , , on May 22, 2015 by xi'an

Post-doctoral Position in Spatial/Computational Statistics (Grenoble, France)

A post-doctoral position is available in Grenoble, France, to work on computational methods for spatial point process models. The candidate will work with Simon Barthelmé (GIPSA-lab, CNRS) and Jean-François Coeurjolly (Univ. Grenoble Alpes, Laboratory Jean Kuntzmann) on extending point process methodology to deal with large datasets involving multiple sources of variation. We will focus on eye movement data, a new and exciting application area for spatial statistics. The work will take place in the context of an interdisciplinary project on eye movement modelling involving psychologists, statisticians and applied mathematicians from three different institutes in Grenoble.

The ideal candidate has a background in spatial or computational statistics or machine learning. Knowledge of R (and in particular the package spatstat) and previous experience with point process models is a definite plus.

The duration of the contract is 12+6 months, starting 01.10.2015 at the earliest. Salary is according to standard CNRS scale (roughly EUR 2k/month).

Grenoble is the largest city in the French Alps, with a very strong science and technology cluster. It is a pleasant place to live, in an exceptional mountain environment.

## I am cold all over…

Posted in Books, Kids, Statistics, University life with tags , , , , , , , on October 29, 2014 by xi'an

An email from one of my Master students who sent his problem sheet (taken from Monte Carlo Statistical Methods) late:

Bonsoir Professeur
Je « suis » votre cours du mercredi dont le formalisme mathématique me fait froid partout
Avec beaucoup de difficulté je vous envoie mes exercices du premier chapitre de votre livre.

which translates as

Good evening Professor,
I “follow” your Wednesday class which mathematical formalism makes me cold all over. With much hardship, I send you the first batch of problems from your book.

I know that winter is coming, but, still, making students shudder from mathematical cold is not my primary goal when teaching Monte Carlo methods!

## future of computational statistics

Posted in Books, pictures, R, Statistics, University life with tags , , , , , , , , , , , , , , on September 29, 2014 by xi'an

I am currently preparing a survey paper on the present state of computational statistics, reflecting on the massive evolution of the field since my early Monte Carlo simulations on an Apple //e, which would take a few days to return a curve of approximate expected squared error losses… It seems to me that MCMC is attracting more attention nowadays than in the past decade, both because of methodological advances linked with better theoretical tools, as for instance in the handling of stochastic processes, and because of new forays in accelerated computing via parallel and cloud computing, The breadth and quality of talks at MCMski IV is testimony to this. A second trend that is not unrelated to the first one is the development of new and the rehabilitation of older techniques to handle complex models by approximations, witness ABC, Expectation-Propagation, variational Bayes, &tc. With a corollary being an healthy questioning of the models themselves. As illustrated for instance in Chris Holmes’ talk last week. While those simplifications are inevitable when faced with hardly imaginable levels of complexity, I still remain confident about the “inevitability” of turning statistics into an “optimize+penalize” tunnel vision…  A third characteristic is the emergence of new languages and meta-languages intended to handle complexity both of problems and of solutions towards a wider audience of users. STAN obviously comes to mind. And JAGS. But it may be that another scale of language is now required…

If you have any suggestion of novel directions in computational statistics or instead of dead ends, I would be most interested in hearing them! So please do comment or send emails to my gmail address bayesianstatistics

## Bayesian computational tools

Posted in R, Statistics, University life with tags , , , , on June 18, 2013 by xi'an

I just updated my short review on Bayesian computational tools I first wrote in April for the Annual Review of Statistics and Its Applications. The coverage is quite restricted, as I took advantage of two phantom papers I had started a while ago, one with Jean-Michel Marin, on hierarchical Bayes methods and on ABC. (As stressed in the first version, the paper handles missing data, not as a topic, but as a fact!) The running example is the Laplace vs. Gauss model choice problem, first considered in our ABC model choice paper. The referee of the paper was asking for a broader perspective, which makes perfect sense (except that I did not have the time to get that broad). And mentioned a potential missing acknowledgement of priority as Olli’s thesis was using a simple (instead of double) exponential vs. Gauss as its running example. Once again, a plain 25 pages introduction to the topic, not aiming at anything new. The exercise made me ponder whether or not I wanted to engage into it in a near future, with a pessimistic outcome!