## Advances in scalable Bayesian computation [day #1]

Posted in Books, Mountains, pictures, R, Statistics, University life with tags , , , , , , , , , on March 4, 2014 by xi'an

This was the first day of our workshop Advances in Scalable Bayesian Computation and it sounded like the “main” theme was probabilistic programming, in tune with my book review posted this morning. Indeed, both Vikash Mansinghka and Frank Wood gave talks about this concept, Vikash detailing the specifics of a new programming language called Venture and Frank focussing on his state-space version of the above called Anglican. This is a version of the language Church, developed to handle probabilistic models and inference (hence the joke about Anglican, “a Church of England Venture’! But they could have also added that Frank Wood was also the name of a former archbishop of Melbourne..!) I alas had an involuntary doze during Vikash’s talk, which made it harder for me to assess the fundamentals of those ventures, of how they extended beyond a “mere” new software (and of why I would invest in learning a Lisp-based language!).

The other talks of Day #1 were of a more “classical” nature with Pierre Jacob explaining why non-negative unbiased estimators were impossible to provide in general, a paper I posted about a little while ago, and including an objective Bayes example that I found quite interesting. Then Sumeet Singh (no video) presented a joint work with Nicolas Chopin on the uniform ergodicity of the particle Gibbs sampler, a paper that I should have commented here (except that it appeared just prior to The Accident!), with a nice coupling proof. And Maria Lomeli gave us an introduction to the highly general Poisson-Kingman mixture models as random measures, which encompasses all of the previously studied non-parametric random measures, with an MCMC implementation that included a latent variable representation for the alpha-stable process behind the scene, representation that could be (and maybe is) also useful in parametric analyses of alpha-stable processes.

We also had an open discussion in the afternoon that ended up being quite exciting, with a few of us voicing out some problems or questions about existing methods and others making suggestions or contradictions. We are still a wee bit short of considering a collective paper on MCMC under constraints with coherent cross-validated variational Bayes and loss-based pseudo priors, with applications to basketball data” to appear by the end of the week!

Add to this two visits to the Sally Borden Recreation Centre for morning swimming and evening climbing, and it is no wonder I woke up a bit late this morning! Looking forward Day #2!

## the alive particle filter

Posted in Books, Statistics, University life with tags , , , , , , , , on February 14, 2014 by xi'an

As mentioned earlier on the ‘Og, this is a paper written by Ajay Jasra, Anthony Lee, Christopher Yau, and Xiaole Zhang that I missed when it got arXived (as I was also missing my thumb at the time…) The setting is a particle filtering one with a growing product of spaces and constraints on the moves between spaces. The motivating example is one of an ABC algorithm for an HMM where at each time, the simulated (pseudo-)observation is forced to remain within a given distance of the true observation. The (one?) problem with this implementation is that the particle filter may well die out by making only proposals that stand out of the above ball. Based on an idea of François Le Gland and Nadia Oudjane, the authors define the alive filter by imposing a fixed number of moves onto the subset, running a negative binomial number of proposals. By removing the very last accepted particle, they show that the negative binomial experiment allows for an unbiased estimator of the normalising constant(s). Most of the paper is dedicated to the theoretical study of this scheme, with results summarised as (p.2)

1. Time uniform Lp bounds for the particle filter estimates
2. A central limit theorem (CLT) for suitably normalized and centered particle filter estimates
3. An unbiased property of the particle filter estimates
4. The relative variance of the particle filter estimates, assuming N = O(n), is shown to grow linearly in n.

The assumptions necessary to reach those impressive properties are fairly heavy (or “exceptionally strong” in the words of the authors, p.5): the original and constrained transition kernels are both uniformly ergodic, with equivalent coverage of the constrained subsets for all possible values of the particle at the previous step. I also find the proposed implementation of the ABC filtering inadequate for approximating the posterior on the parameters of the (HMM) model. Expecting every realisation of the simulated times series to be close to the corresponding observed value is too hard a constraint. The above results are scaled in terms of the number N of accepted particles but it may well be that the number of generated particles and hence the overall computing time is much larger. In the examples completing the paper, the comparison is run with an earlier ABC sampler based on the very same stepwise proximity to the observed series so the impact of this choice is difficult to assess and, furthermore, the choice of the tolerances ε is difficult to calibrate: is 3, 6, or 12 a small or large value for ε? A last question that I heard from several sources during talks on that topic is why an ABC approach would be required in HMM settings where SMC applies. Given that the ABC reproduces a simulation on the pair latent variables x parameters, there does not seem to be a gain there…

## non-negative unbiased estimators

Posted in Books, Kids, Statistics, University life with tags , , , , , on October 3, 2013 by xi'an

Pierre Jacob and Alexandre Thiéry just arXived a highly pertinent paper on the most debated issue of non-negative unbiased estimators (of positive quantities). If you remember that earlier post of mine, I mentioned the issue in connection with the Russian roulette estimator(s) of Mark Girolami et al. And, as Pierre and Alexandre point out in the paper, there is also a clear and direct connection with the Bernoulli factory problem. And with our Vanilla Rao-Blackwellisation technique (sadly overlooked, once more!).

The first thing I learned from the paper is how to turn a converging sequence into an unbiased estimator. If (En) is this converging sequence, with limit μ, then

$\sum_{n=0}^N (E_n-E_{n-1}) / \mathbb{P}(N\ge n)$

is unbiased..! Amazing. Even though the choice of the distribution of N matters towards getting a finite variance estimator, this transform is simply amazing. (Of course, once one looks at it, one realises it is the “old” trick of turning a series into a sequence and vice-versa. Still…!) And then you can reuse it into getting an unbiased estimator for almost any transform of μ.

The second novel thing in the paper is the characterisation of impossible cases for non-negative unbiased estimators. For instance, if the original sequence has an unbounded support, there cannot be such an estimator. If the support is an half-line, the transform must be monotonous monotonic. If the support is a bounded interval (a,b), then the transform must be bounded from below by a polynomial bound

$\epsilon\,\min\{(x-a)^m,(b-x)^n\}$

(where the extra-parameters obviously relate to the transform). (In this later case, the authors also show how to derive a Bernoulli estimator from the original unbiased estimator.)

## IS² for Bayesian inference

Posted in Statistics, University life with tags , , , , on September 26, 2013 by xi'an

“…the method of Approximate Bayesian Computation (ABC) may be used to estimate unbiasedly an approximation to the likelihood.”

Minh-Ngoc Tran, Marcel Scharth, Michael Pitt and Robert Kohn arXived a paper on using an unbiased estimate of the likelihood in lieu of the genuine thing and still getting convergence to the right thing. While the spirit of the paper is in the same spirit as the fundamental paper of  Andrieu and Roberts (2009, AoS, somewhat surprisingly missing from the reference), comparing the asymptotic efficiency of using an estimate versus using the genuine likelihood, my attention was distracted by the above quote. This is the only sentence (besides the abstract) where ABC is mentioned and I was a bit confused: ABC is used to estimate an approximation to the likelihood, for sure, converging to

$\int_{d(x,x^\text{obs})\le\varepsilon} f(x|\theta)\,\text{d}\theta$

as the number of pseudo-datasets grows to infinity and it is unbiased on this sense, but this is not the reason for using ABC, as the ABC pseudo-likelihood above is the (by)product of the methodology rather than the genuine quantity of interest. Reading the sentence too fast gave me the feeling that ABC did produce an unbiased approximation to the genuine likelihood! Distracted I was, since this is not at all the point of the paper! However, I would be curious to see how it applies to ABC.

The core result is the convergence of an importance sampling estimator using a likelihood estimated by importance sampling (hence the IS², also inspired by SMC²),. The trick in the proof is to turn the computation of  the likelihood estimand into the production of an (unobserved or “implicitly generated”) auxiliary variable and then to rewrite the original estimator as a genuine importance estimator. (This seems to imply the derivation of an independent importance sampling estimator of the likelihood at each iteration, right?) Standard convergence results then follow, except that the asymptotic variance has an extra term. And except that the estimator of the likelihood does not have to converge, i.e. can keep a fixed number of terms and a positive variance. The second part of the paper establishes that using an estimate degrades the asymptotic variance.

## Alésia sunset

Posted in pictures, Running, Statistics, University life, Wines with tags , , , , , , , on July 12, 2013 by xi'an

Mark Girolami came on Monday for a short visit at CREST this week, to discuss further the Russian roulette with Nicolas and I (and evacuate some of my “worries”), exploit the potential links with vanilla Rao-Blackwellisation, and look at other directions of common interest. In the conversation, we spent a while pondering about the “sign problem”, namely the difficulty with signed unbiased estimates of positive normalising constants. Quickly bumping into the impossibility of simulating from a negative density. Not that we had high expectations of solving in a single afternoon an NP hard problem, and one of the major unsolved problems in the physics of many-particle systems… Although Mark had made the “mistake” of picking a Monday for his visit, reducing considerably the potential for wine bars and great restaurants in the area, we undertook to play Russian roulette with sea-shells, at a brasserie in the shadow of Alésia church, without any of us being hit by a bacterial bullet. (Mark then played the Parisian roulette by biking back to the north of Paris and his hotel, again managing to foil the automotive bullet!)