Archive for variational Bayes methods

variational particle approximations

Posted in Mountains, pictures, Statistics, Travel, University life on February 28, 2014 by xi'an

In the plane to Montréal, today, I read this paper by Kulkarni, Saeedi and Gershman, which will be presented at AISTATS. The main idea is to create a mix between particle Monte Carlo and a kind of quasi-Monte Carlo technique (qMC is not mentioned in the paper), using variational inference (and coordinate ascent) to optimise the location and weight of the particles. It is however restricted to cases with finite support (as a product of N latent variables) as in an HMM with a finite state space. There is also something I do not get in the HMM case, which is that the variational approximation to the filtering is contracted sequentially. This means that at time t the K highest-weight current particles are selected while the past remains unchanged. Is this due to the Markovian nature of the hidden model? (Blame oxygen deprivation, altitude dizziness or travelling stress, then!) I also fail to understand how for filtering, “at each time step, the algorithm selects the K continuations (new variable assignments of the current particle set) that maximize the variational free energy.” Because the weight function to be optimised (eqn (11)) seems to freeze the whole past path of particles… I presume I will find an opportunity while in Reykjavik to discuss those issues with the authors.
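To fix ideas on the “K continuations” step, here is a minimal sketch of K-best filtering in a finite-state HMM, where I substitute raw unnormalised posterior weights for the authors’ variational free-energy criterion (the function name and this greedy criterion are my own simplification, not the paper’s algorithm):

```python
import numpy as np

def k_best_filter(obs, trans, emit, init, K):
    """Greedy K-best particle filtering in a finite-state HMM.

    At each step every current particle is extended to all S states
    (the 'continuations'); only the K highest-weight continuations are
    kept, so the past path of each surviving particle stays frozen.
    trans: S x S transition matrix, emit: S x O emission matrix,
    init: length-S initial distribution, obs: observation indices.
    """
    S = trans.shape[0]
    # initialise with all states, weighted by prior times likelihood
    particles = [(s,) for s in range(S)]
    weights = np.array([init[s] * emit[s, obs[0]] for s in range(S)])
    keep = np.argsort(weights)[::-1][:K]
    particles = [particles[i] for i in keep]
    weights = weights[keep]
    for y in obs[1:]:
        # enumerate all K*S continuations of the current particle set
        cont, w = [], []
        for p, wp in zip(particles, weights):
            for s in range(S):
                cont.append(p + (s,))
                w.append(wp * trans[p[-1], s] * emit[s, y])
        w = np.array(w)
        keep = np.argsort(w)[::-1][:K]  # retain the K best continuations
        particles = [cont[i] for i in keep]
        weights = w[keep]
    return particles, weights / weights.sum()
```

This makes the point of my question concrete: the selection only ever chooses among one-step extensions of the surviving paths, so the past is never revisited.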

David Blei smile in Paris (seminar)

Posted in Statistics, Travel, University life on October 30, 2013 by xi'an

Nicolas Chopin just reminded me of a seminar given by David Blei in Paris tomorrow (at 4pm, SMILE seminar, INRIA, 23 avenue d’Italie, 5th floor, orange room) on Stochastic Variational Inference and Scalable Topic Models, a machine learning seminar that I will alas miss, being busy giving mine at CMU. Here is the abstract:

Probabilistic topic modeling provides a suite of tools for analyzing
large collections of electronic documents.  With a collection as
input, topic modeling algorithms uncover its underlying themes and
decompose its documents according to those themes.  We can use topic
models to explore the thematic structure of a large collection of
documents or to solve a variety of prediction problems about text.

Topic models are based on hierarchical mixed-membership models,
statistical models where each document expresses a set of components
(called topics) with individual per-document proportions. The
computational problem is to condition on a collection of observed
documents and estimate the posterior distribution of the topics and
per-document proportions. In modern data sets, this amounts to
posterior inference with billions of latent variables.

How can we cope with such data?  In this talk I will describe
stochastic variational inference, a general algorithm for
approximating posterior distributions that are conditioned on massive
data sets.  Stochastic inference is easily applied to a large class of
hierarchical models, including time-series models, factor models, and
Bayesian nonparametric models.  I will demonstrate its application to
topic models fit with millions of articles.  Stochastic inference
opens the door to scalable Bayesian computation for modern data
analysis.
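To give a flavour of the stochastic variational inference recipe the abstract alludes to, here is a toy sketch on a conjugate Gaussian-mean model rather than a topic model: the global natural parameter is updated by a Robbins–Monro step towards an intermediate estimate computed as if a minibatch were the whole data set (model, minibatch size and step-size schedule are all my own arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
data = rng.normal(2.0, 1.0, size=N)      # x_i ~ N(mu, 1), true mu = 2
                                         # prior: mu ~ N(0, 1)

# q(mu) = N(m, v) in natural-parameter form: eta = (m/v, -1/(2v))
prior = np.array([0.0, -0.5])
eta = prior.copy()                       # start q at the prior

B, kappa, tau = 100, 0.7, 1.0
for t in range(500):
    batch = rng.choice(data, size=B, replace=False)
    # intermediate estimate: pretend the minibatch were the full data,
    # i.e. scale its sufficient statistics by N/B
    eta_hat = prior + (N / B) * np.array([batch.sum(), -0.5 * B])
    rho = (t + tau) ** (-kappa)              # decaying step size
    eta = (1 - rho) * eta + rho * eta_hat    # noisy natural-gradient step

v = -1.0 / (2.0 * eta[1])                    # back to mean/variance form
m = eta[0] * v
```

Because the model is conjugate, the stochastic updates converge to (a noisy version of) the exact posterior N(Σx/(N+1), 1/(N+1)), which is what makes the toy case checkable.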

adaptive Metropolis-Hastings sampling using reversible dependent mixture proposals

Posted in Statistics on May 23, 2013 by xi'an

In the plane to Birmingham, I was reading this recent arXived paper by Minh-Ngoc Tran, Michael K. Pitt, and Robert Kohn. The adaptive structure of their ACMH algorithm is based upon two parallel Markov chains, the former (called the trial chain) feeding the proposal densities of the latter (called the main chain), bypassing the more traditional diminishing adaptation conditions. (Even though convergence actually follows from a minorisation condition.) These proposals are mixtures of t distributions fitted by variational Bayes approximations. Furthermore, the proposals are (a) reversible and (b) mixing local [dependent] and global [independent] components. One nice aspect of the reversibility is that the proposals do not have to be evaluated at each step.

The convergence results in the paper indeed assume a uniform minorisation condition on all proposal densities: although this sounded restrictive at first (but allows for straightforward proofs), I realised this could be implemented by adding a specific component to the mixture as in Corollary 3. (I checked the proof to realise that the minorisation on the proposal extends to the minorisation on the Metropolis-Hastings transition kernel.) A reversible kernel is defined as satisfying the detailed balance condition, which means that a single Gibbs step is reversible even though the Gibbs sampler as a whole is not. If a reversible Markov kernel with stationary distribution ζ is used, the acceptance probability in the Metropolis-Hastings transition is

α(x,z) = min{1,π(z)ζ(x)/π(x)ζ(z)}

(a result I thought was already known). The sweet deal is that the transition kernel involves Dirac masses, but the acceptance probability bypasses the difficulty. The way mixtures of t distributions can be made reversible follows from the Pitt & Walker (2006) construction, with ζ a specific mixture of t distributions. This target is estimated by variational Bayes. The paper further bypasses my classical objection to the use of normal or t distributions, or mixtures thereof: this modelling assumes a sort of common Euclidean space for all components, which is (a) highly restrictive and (b) very inefficient in terms of acceptance rate. Instead, Tran et al. resort to Metropolis-within-Gibbs by constructing a partition of the components into subgroups.
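The acceptance probability above is easy to sketch: for any proposal kernel reversible with respect to ζ, detailed balance gives Q(z,x)/Q(x,z) = ζ(x)/ζ(z), so the usual Hastings ratio collapses to π(z)ζ(x)/π(x)ζ(z). Below a Gaussian AR(1) kernel (reversible for its stationary Gaussian ζ) mixed with independent draws from ζ stands in for the fitted t-mixture of the paper; the target, ζ and all tuning constants are placeholders of mine:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_pi(x):
    # toy bimodal target (unnormalised), standing in for pi
    return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

# zeta = N(0, s^2); the AR(1) kernel z = rho*x + sqrt(1-rho^2)*s*eps is
# reversible w.r.t. zeta, as is its mixture with an independent draw
# from zeta (the "global" component), so the same acceptance rule applies.
s, rho, beta = 4.0, 0.9, 0.2

def log_zeta(x):
    return -0.5 * (x / s) ** 2

def propose(x):
    if rng.random() < beta:                         # global / independent move
        return s * rng.standard_normal()
    return rho * x + np.sqrt(1 - rho**2) * s * rng.standard_normal()

x, chain = 0.0, []
for _ in range(20_000):
    z = propose(x)
    # alpha = min{1, pi(z) zeta(x) / (pi(x) zeta(z))}, on the log scale
    log_alpha = log_pi(z) + log_zeta(x) - log_pi(x) - log_zeta(z)
    if np.log(rng.random()) < log_alpha:
        x = z
    chain.append(x)
chain = np.array(chain)
```

Note that the proposal density Q itself never appears in the ratio, which is the computational appeal mentioned above when Q is awkward to evaluate.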

ASC 2012 (#1)

Posted in Statistics, Travel, University life on July 11, 2012 by xi'an

This morning I attended Alan Gelfand’s talk on directional data, i.e. on the torus (0,2π), and found his modeling via wrapped normals (i.e. normals wrapped onto the unit circle) quite interesting and raising lots of probabilistic questions. For instance, usual moments like mean and variance have no meaning in this space. The variance matrix of the underlying normal, as well as its mean, obviously matter. One thing I am wondering about is how restrictive the normal assumption is. Because of the wrapping, any random change to the scale of the normal vector does not impact this wrapped normal distribution but there are certainly features that are not covered by this family. For instance, I suspect it can only offer at most two modes over the range (0,2π). And that it cannot be explosive at any point.
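A small sketch of what wrapped-normal simulation and density evaluation look like, with the circular mean replacing the meaningless linear mean (function names and the truncation level kmax are my own choices):

```python
import numpy as np

def rwrapnorm(n, mu, sigma, rng=None):
    """Sample from a wrapped normal on (0, 2*pi): wrap mu + sigma*Z."""
    rng = rng or np.random.default_rng()
    return (mu + sigma * rng.standard_normal(n)) % (2 * np.pi)

def dwrapnorm(theta, mu, sigma, kmax=10):
    """Wrapped-normal density: sum the N(mu, sigma^2) density over
    2*pi shifts, truncating the infinite sum at |k| <= kmax."""
    k = np.arange(-kmax, kmax + 1)
    z = (np.asarray(theta)[..., None] - mu + 2 * np.pi * k) / sigma
    return np.exp(-0.5 * z**2).sum(axis=-1) / (sigma * np.sqrt(2 * np.pi))

def circ_mean(theta):
    """Circular mean: direction of the resultant of the unit vectors,
    the natural replacement for the linear mean on the circle."""
    return np.arctan2(np.sin(theta).mean(), np.cos(theta).mean()) % (2 * np.pi)
```

The wrapping in `rwrapnorm` makes the scale-invariance remark above tangible: rescaling before wrapping only changes sigma, never the support.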

The keynote lecture this afternoon was delivered by Roderick Little in a highly entertaining way, about calibrated Bayesian inference in official statistics. For instance, he mentioned the inferential “schizophrenia” in this field due to the opposition between design-based and model-based inferences. Although he did not define what he meant by “calibrated Bayesian” in the most explicit manner, he had this nice list of good reasons to be Bayesian (that came close to my own list at the end of the Bayesian Choice):

  1. conceptual simplicity (Bayes is prescriptive, frequentism is not), “having a model is an advantage!”
  2. avoiding ancillarity angst (Bayes conditions on everything)
  3. avoiding confidence cons (confidence is not probability)
  4. nails nuisance parameters (frequentists are either wrong or have a really hard time)
  5. escapes from asymptotia
  6. incorporates prior information (and when it is weak, weak priors work fine)
  7. Bayes is useful (25 of the top 30 cited are statisticians out of which … are Bayesians)
  8. Bayesians go to Valencia! [joke! Actually it should have been Bayesians go MCMskiing!]
  9. calibrated Bayes gets better frequentist answers

He however insisted that frequentists should be Bayesians and also that Bayesians should be frequentists, hence the calibration qualification.

After an interesting session on Bayesian statistics, with (adaptive or not) mixtures and variational Bayes tools, I actually joined the “young statistician dinner” (without any pretense of being a young statistician, obviously) and had interesting exchanges on a whole variety of topics, esp. as Kerrie Mengersen adopted (reinvented) my dinner table switch strategy (w/o my R simulated annealing code). Until jetlag caught up with me.

tenured research position with ABC skills!

Posted in R, Statistics, Travel, University life on February 2, 2012 by xi'an

I just received this announcement for the opening of a (tenured/civil servant) position in the national research institute in biostatistics, genetics, and agronomy, INRA:

Position opening with profile Approximate inference techniques in complex systems

Key activities and required skills: You will develop methodological research in the field of statistical inference for models used in environmental sciences. These inference techniques will account for the complex dependency structure due to the temporal, spatial and evolutionary organisation of the observations, for the heterogeneity of the data and for the existence of unobserved variables or incomplete data. Solid experience in statistical modelling of complex data (graphical models, multi-scale spatio-temporal data) and a strong orientation towards applications in environmental science and biology would be appreciated. Skills in approximation techniques (variational inference, ABC techniques) will be welcome.

Contact person: Stéphane Robin (robin [at] agroparistech [dot] fr)

Location:  Versailles-Grignon (Paris)

Deadline: February 25, 2012

Website: INRA offer

This should appeal to (some) readers of the blog, esp. since the offer has no nationality constraint.
