Archive for variational Bayes methods

tenured research position with ABC skills!

Posted in R, Statistics, Travel, University life on February 2, 2012 by xi'an

I just received this announcement for the opening of a (tenured/civil servant) position in the national research institute in biostatistics, genetics, and agronomy, INRA:

Position opening with profile Approximate inference techniques in complex systems

Key activities and required skills: You will develop methodological research in the field of statistical inference for models used in environmental sciences. These inference techniques will account for the complex dependency structure due to the temporal, spatial and evolutionary organisation of the observations, for the heterogeneity of the data and for the existence of unobserved variables or incomplete data. Solid experience in statistical modelling of complex data (graphic models, multi-scale spatio-temporal data) and a strong orientation towards the applications in environment and biology would be appreciated. Skills in approximation techniques (variational inference, ABC techniques) will be welcome.

Contact person: Stéphane Robin (robin [chez] agroparistech [lepoint] fr)

Location:  Versailles-Grignon (Paris)

Deadline: February 25, 2012

Website: INRA offer

This should appeal to (some) readers of the blog, esp. since the offer has no nationality constraint.

reversible jump on HMMs

Posted in Books, Mountains, pictures, Statistics, Travel, University life on December 19, 2011 by xi'an

Here is an email I received a few weeks ago about a paper written more than a decade ago in Glasgow with Tobias Rydén and Mike Titterington:

Sorry to bother you. I am a PhD student in economics. Recently, I am very interested in your paper “Bayesian inference in hidden Markov models through the reversible jump Markov chain Monte Carlo method”. I would like to use your method in estimating some regime-switching economic model. Unfortunately, I am not exactly understand your paper. Hence, I am writing to ask for your help. My questions are:

  1. A split or merge move is determined at the same time or sequentially? If the moves are determined at that same time, then accepting a split move implies that we can not accept a merge move any more in the same sweep. If the moves are determined sequentially, it means that we can accept a split move first, then accept a merge move in the same sweep. [Answer: First interpretation is correct. Except that the type of move is first selected at random, then only the corresponding move is generated and potentially accepted.]
  2.  In the paper, you discuss how to generate new transition probabilities in a split move in details. However, you did not discuss (probably, I am wrong) how to generate probabilities in each new state (series Zt in your paper).  Could you please tell me how to generate the series Zt? [Answer: check eqn (3).]
  3. My economic model is a multiple series (a vector hidden Markov model), will you refer me to some other papers for the vector model? [Answer: If the observed series is multidimensional, the extension is formally straightforward, if potentially prone to slow mixing and low acceptance rates. If the hidden Markov chain is multidimensional, I have not seen a version of reversible jump in this setting. Maybe an extension of the variational methods described in Ghahramani and Jordan would help.]

to which I replied that the questions showed a deep lack of understanding of what reversible jump is and that the PhD student should first check the literature, for instance the great intro paper by Charlie Geyer in Handbook of Markov chain Monte Carlo and then the original papers by Green (1995) and Richardson and Green (1997).
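To make the answer to the first question concrete, here is a minimal sketch of a sweep where the move type is selected at random first and only that move is then proposed and possibly accepted. It is a toy, not the sampler of the paper: the chain lives on the number of states k alone, the `target` table of probabilities over k is an assumed input, and the parameter proposals and Jacobian of a genuine split/merge move are left out. The names `split_prob`, `rj_sweep` and `run_chain` are made up for the illustration.

```python
import random

def split_prob(k, k_max):
    """Probability of selecting a split (k -> k+1) in state k; merge otherwise.
    Assumes k_max >= 2."""
    if k == 1:
        return 1.0      # cannot merge below one state
    if k == k_max:
        return 0.0      # cannot split above k_max
    return 0.5

def rj_sweep(k, target, k_max, rng):
    """One sweep: select the move type at random, then propose only that move."""
    p_split = split_prob(k, k_max)
    if rng.random() < p_split:
        k_new, p_fwd = k + 1, p_split
        p_rev = 1.0 - split_prob(k_new, k_max)   # chance of merging back
    else:
        k_new, p_fwd = k - 1, 1.0 - p_split
        p_rev = split_prob(k_new, k_max)         # chance of splitting back
    # Metropolis-Hastings ratio, including the move-selection probabilities.
    ratio = (target[k_new] * p_rev) / (target[k] * p_fwd)
    return k_new if rng.random() < min(1.0, ratio) else k

def run_chain(target, k_max, n_sweeps, seed=0):
    """Run the toy chain and return the visit frequencies of each k."""
    rng = random.Random(seed)
    k = 1
    counts = {j: 0 for j in range(1, k_max + 1)}
    for _ in range(n_sweeps):
        k = rj_sweep(k, target, k_max, rng)
        counts[k] += 1
    return {j: c / n_sweeps for j, c in counts.items()}
```

Calling `run_chain({1: 0.2, 2: 0.5, 3: 0.3}, k_max=3, n_sweeps=200000)` should return frequencies close to the target table. Within one sweep a split and a merge are never both attempted.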

expectation-propagation and ABC

Posted in R, Statistics, University life on August 24, 2011 by xi'an

“It seems quite absurd to reject an EP-based approach, if the only alternative is an ABC approach based on summary statistics, which introduces a bias which seems both larger (according to our numerical examples) and more arbitrary, in the sense that in real-world applications one has little intuition and even less mathematical guidance on to why p(θ|s(y)) should be close to p(θ|y) for a given set of summary statistics s.”

Simon Barthelmé and Nicolas Chopin posted a recent arXiv paper on Expectation-Propagation for Summary-Less, Likelihood-Free Inference. They sell expectation-propagation as a quick-and-dirty version of ABC, avoiding the selection of summary statistics by using the constraint

||y_i-y^\star_i||\le \epsilon

on each component of the simulated pseudo-data vector y*, y being the actual data. Expectation-propagation is a variational technique [that Simon and Nicolas are quite fond of!] and it consists in replacing the target with the “closest” member of an exponential family, like the Gaussian family. The expectation-propagation approximation is found by including a single “observation” at a time, using the other approximations for the prior, and finding the best Gaussian in this pseudo-model. In addition, expectation-propagation provides an approximation of the evidence. In the “likelihood-free” setting (I do not like this term because we are dealing with a specific well-defined likelihood, we simply cannot compute it!), this means computing an empirical mean and an empirical variance, one observation at a time, under the above tolerance constraint.
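A rough sketch of the "one observation at a time" step may help. It is a deliberately simplified single-pass version (closer to assumed density filtering than to full expectation-propagation, which would revisit the sites and remove each site from the current approximation before updating it) and it is not the authors' algorithm. The scalar parameter, the function names and the toy Gaussian simulator are all assumptions made for the illustration.

```python
import numpy as np

def simulate_gaussian(theta, rng):
    """Hypothetical toy simulator: one pseudo-observation y* ~ N(theta, 1) per theta."""
    return theta + rng.normal(size=theta.shape)

def ep_abc_single_pass(y, eps, prior_mean, prior_var, simulate, n_sim, rng):
    """Include observations one at a time, each time refitting a Gaussian by
    matching the empirical mean and variance of the accepted parameter draws."""
    mean, var = prior_mean, prior_var
    for y_i in y:
        # Draw parameters from the current Gaussian approximation.
        theta = rng.normal(mean, np.sqrt(var), size=n_sim)
        # Simulate one pseudo-observation per draw and apply the tolerance.
        y_star = simulate(theta, rng)
        keep = np.abs(y_i - y_star) <= eps
        if keep.sum() < 2:
            continue  # too few acceptances: leave the approximation unchanged
        # Moment matching: empirical mean and variance of the accepted draws.
        mean = theta[keep].mean()
        var = theta[keep].var(ddof=1)
    return mean, var
```

With data simulated from N(1.5, 1) and a N(0, 100) prior, the returned mean should sit near the conjugate posterior mean, while the variance is somewhat inflated by the tolerance `eps`.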

Unless I am confused, the expectation-propagation approximation to the posterior distribution is a [sequentially updated] Gaussian distribution, which means that it will only be appropriate in cases where the posterior distribution is approximately Gaussian. Since the three examples processed in the paper are of this kind, e.g. the above reproduction, I wonder about the performance of the expectation-propagation method in less smooth cases, such as ridge-like or multimodal posteriors. The authors mention two limitations: “First, it [EP] assumes a Gaussian prior; and second, it relies on a particular factorisation of the likelihood, which makes it possible to simulate sequentially the datapoints”, but those seem negligible wrt my above comment. I thus remain unconvinced by the concluding sentence quoted above. (The current approach to ABC is to consider p(θ|s(y)) as a target per se, not as an approximation to p(θ|y).) Nonetheless, expectation-propagation constitutes a quick approximation method that can always be used as a reference against other approximations.
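The worry about multimodal posteriors can be illustrated with a few lines of arithmetic. The sketch below is not expectation-propagation itself: it simply moment-matches a single Gaussian to an assumed two-component mixture (matching mean and variance is what minimising KL(p||q) over Gaussians q amounts to) and compares the two densities between the modes. The mixture and the function names are made up for the illustration.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def mixture_pdf(x, weights, means, variances):
    """Density of a Gaussian mixture at x."""
    return sum(w * normal_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

def moment_match(weights, means, variances):
    """Mean and variance of the mixture, i.e. the moment-matched Gaussian."""
    mean = sum(w * m for w, m in zip(weights, means))
    second = sum(w * (v + m ** 2) for w, m, v in zip(weights, means, variances))
    return mean, second - mean ** 2

# Assumed bimodal target: equal mixture of N(-3, 1) and N(3, 1).
weights, means, variances = [0.5, 0.5], [-3.0, 3.0], [1.0, 1.0]
mean, var = moment_match(weights, means, variances)
ratio = normal_pdf(0.0, mean, var) / mixture_pdf(0.0, weights, means, variances)
```

Here the matched Gaussian is centred at 0 with variance 10, so it puts its mode exactly where the target has a trough; `ratio` measures by how much the Gaussian overstates the density there.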

Computational methods in Bayesian statistics

Posted in Statistics on March 24, 2010 by xi'an

This paper by Tua and Adami was first posted on arXiv last Wednesday, with a corrected version posted today. Despite its very generic title, its focus is quite restricted since it compares variational Bayes with nested sampling on two examples. The description of both methods is fairly standard, even though I find the part on the variational Bayes approximation slightly confusing, with a graph presenting “the Evidence” (should be the log-evidence) as the sum of a Kullback-Leibler divergence and a bound, while the log-evidence may be a negative number, which cancels the appeal of the decomposition… The paper concludes that the variational Bayes approximation is more efficient in terms of speed, which is not a major step forward when considering that this is the very reason for using this approximation! The authors use an “Occam factor” without providing a definition, although it sounds like the difference

\log m(x) - \log L(\hat\theta|x) where \hat\theta is the mle,

and it could be computed for both methods despite the authors’ claim (if I understand correctly what “the Likelihood” is). The sentence “when calculating the Evidence the higher Likelihood values are multiplied by smaller weights resulting in a lower Evidence value over all” shows a poor understanding of the nested sampling method, since using a large enough number of particles leads to a proper approximation of the evidence, as shown for instance in our paper with Nicolas Chopin. Maybe paradoxically, it is interesting to see via this paper how far (numerically) the lower bound provided by the variational Bayes approximation is from the evidence approximated by nested sampling, even though they appear to peak at the same value for the mixture problem in the specific experiment run by the authors.
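For readers who want to check the claim about nested sampling and the number of particles, here is a minimal sketch on a toy problem where the evidence is known in closed form. Everything in it is an assumption chosen for convenience, not taken from the paper under review: a uniform prior on (-a, a), a standard normal likelihood (so that sampling the prior under a likelihood constraint reduces to a uniform draw on a shorter interval), and the deterministic approximation X_i = exp(-i/N) for the prior mass.

```python
import math
import numpy as np

def exact_evidence(half_width):
    """Evidence for theta ~ U(-a, a) and likelihood L(theta) = N(theta; 0, 1)."""
    return math.erf(half_width / math.sqrt(2.0)) / (2.0 * half_width)

def nested_sampling_evidence(n_live, n_iter, half_width, rng):
    """Basic nested sampling with n_live particles on the toy problem above."""
    def lik(t):
        return np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)

    theta = rng.uniform(-half_width, half_width, size=n_live)
    like = lik(theta)
    z, x_prev = 0.0, 1.0
    for i in range(1, n_iter + 1):
        worst = np.argmin(like)              # particle with lowest likelihood
        x_i = math.exp(-i / n_live)          # approximate remaining prior mass
        z += like[worst] * (x_prev - x_i)    # weight = shell of prior mass
        x_prev = x_i
        # Redraw from the prior restricted to L > L_worst, i.e. |theta| < r.
        r = abs(theta[worst])
        theta[worst] = rng.uniform(-r, r)
        like[worst] = lik(theta[worst])
    # Contribution of the particles still alive at termination.
    z += x_prev * like.mean()
    return z
```

The highest likelihood values do receive the smallest weights, but that is by design: the weights are the prior masses of the corresponding likelihood shells. With, say, `n_live=1000` and `n_iter=8000`, the estimate should land close to `exact_evidence(5.0)`, and the error shrinks as the number of particles grows.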