## Particle learning [rejoinder]

Posted in R, Statistics, University life on November 10, 2010 by xi'an

Following the posting on arXiv of the Statistical Science paper of Carvalho et al., and the publication by the same authors in Bayesian Analysis of Particle Learning for general mixtures, I noticed that Hedibert Lopes has posted on his website the rejoinder to the discussion of his Valencia 9 paper. Since the discussion involved several points made by members of the CREST statistics lab (and covered the mixture paper as much as the Valencia 9 paper), I was quite eager to read Hedie’s reply. Unsurprisingly, this rejoinder is unlikely to modify my reservations about particle learning. The following is a detailed examination of the arguments found in the rejoinder, but it requires a preliminary reading of the above papers as well as of our discussion.

## On particle learning

Posted in R, Statistics, University life on June 5, 2010 by xi'an

In connection with the Valencia 9 meeting that started yesterday, and with Hedie’s talk there, we have posted on arXiv a set of comments on particle learning. The arXiv paper contains several discussions but they mostly focus on the inevitable degeneracy that accompanies particle systems. When Lopes et al. state that $p(Z^t|y^t)$ is not of interest as the filtered, low dimensional $p(Z_t|y^t)$ is sufficient for inference at time t, they seem to imply that restricting the simulation focus to a low dimensional vector is a way to avoid the degeneracy inherent to all particle filters. The particle learning algorithm nonetheless relies on an approximation of $p(Z^t|y^t)$, and the fact that this approximation quickly degenerates as t increases means that it impacts the approximation of $p(Z_t|y^t)$. We show that, unless the size of the particle population increases exponentially with t, the sample of $Z_t$’s will not be distributed as an iid sample from $p(Z_t|y^t)$.
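
The degeneracy of the path approximation can be seen on a small scale with a minimal Python sketch. It assumes a toy AR(1)-plus-noise model and a plain bootstrap filter with multinomial resampling, which is neither the Poisson mixture nor the particle learning algorithm of Lopes et al.; the model settings and the function name `count_surviving_ancestors` are illustrative choices only. The function tracks how many distinct time-0 ancestors remain in the particle paths after each resampling step.

```python
import math
import random


def count_surviving_ancestors(n_particles=200, n_steps=100, seed=1):
    """Run a bootstrap filter on a toy AR(1)-plus-noise model (assumed example)
    and return, for each time t, the number of distinct time-0 ancestors
    still represented among the particle paths."""
    rng = random.Random(seed)
    phi, sigma_x, sigma_y = 0.9, 1.0, 1.0  # hypothetical model settings

    # Simulate a series of observations from the model itself.
    x, ys = 0.0, []
    for _ in range(n_steps):
        x = phi * x + rng.gauss(0.0, sigma_x)
        ys.append(x + rng.gauss(0.0, sigma_y))

    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    roots = list(range(n_particles))  # time-0 ancestor of each path
    survivors = []
    for y in ys:
        # Propagate each particle through the state equation.
        particles = [phi * p + rng.gauss(0.0, sigma_x) for p in particles]
        # Gaussian observation log-weights, stabilised by subtracting the max.
        log_w = [-0.5 * ((y - p) / sigma_y) ** 2 for p in particles]
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        # Multinomial resampling: each path inherits its parent's ancestor.
        idx = rng.choices(range(n_particles), weights=weights, k=n_particles)
        particles = [particles[i] for i in idx]
        roots = [roots[i] for i in idx]
        survivors.append(len(set(roots)))
    return survivors
```

Calling `count_surviving_ancestors()` and comparing the first and last entries of the returned list shows the collapse: the count can only decrease over time, since each resampling step keeps a subset of the current ancestors, so the early part of every path ends up shared by most of the population.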

The graph above is an illustration of the degeneracy in the setup of a Poisson mixture with five components and 10,000 observations. The boxplots represent the variation of the evidence approximations based on a particle learning sample and Lopes et al.’s approximation, on a particle learning sample and Chib’s (1995) approximation, and on an MCMC sample and Chib’s (1995) approximation, for 250 replications. The differences are therefore quite severe when considering this number of observations. (I put the R code on my website for anyone who wants to check if I programmed things wrong.) There is no clear solution to the degeneracy problem, in my opinion, because the increase in the size of the particle population needed to overcome degeneracy must be particularly high… We will be discussing that this morning.

On Monday, Paul Fearnhead and Benjamin Taylor reposted on arXiv a paper about adaptive SMC. Just as well, since I had missed the first posting on Friday. While the method has some similarities with our earlier work on population Monte Carlo methods with Olivier Cappé, Randal Douc, Arnaud Guillin and Jean-Michel Marin, there are quite novel and interesting features in this paper! First, the paper is firmly set within a sequential setup, as in Chopin (2002, Biometrika) and Del Moral, Doucet and Jasra (2006, JRSS B). This means considering a sequence of targets corresponding to likelihoods with increasing datasets. We mentioned this case as a possible implementation of population Monte Carlo but never truly experimented with it. Fearnhead and Taylor do set their method within this framework, using a sequence of populations (or particle systems) aimed at this moving sequence of targets. The second major difference is that, while they also use a mixture of transition kernels as their proposal (or importance function) and while they also aim at optimising the parameters of those transitions (parameters that I would like to dub cyberparameters to distinguish them from the parameters of the statistical model), they do not update those cyberparameters in a deterministic way, as we do. On the contrary, they build a non-parametric approximation $\pi_t(h)$ to the distribution of those cyberparameters and simulate from this approximation at each step of the sequential algorithm, using a weight $f(\theta^{(j)}_{t-1},\theta^{(j)}_t)$ that assesses the quality of the move from $\theta^{(j)}_{t-1}$ to $\theta^{(j)}_{t}$, based on the simulated $h^{(j)}_t$. I very much like this aspect of the paper, in that the cyberparameters are part of the dynamics of the stochastic algorithm, a point I have tried to implement since the (never published) controlled MCMC paper with Christophe Andrieu.
As we do in our paper now published in Statistics and Computing, they further establish that this method asymptotically improves the efficiency criterion at each step of the sequential procedure. The paper concludes with an extensive simulation study where Fearnhead and Taylor show that their implementation outperforms random walk with adaptive steps. (I am not very happy with their mixture example in that they resort to an ordering on the means…)
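
The idea of simulating the cyberparameters rather than updating them deterministically can be sketched in a few lines of Python. This is a loose illustration under strong assumptions, not Fearnhead and Taylor's algorithm: the target is a fixed N(0,1) instead of a moving sequence of posteriors, the only cyberparameter $h$ is a random walk scale, the weight $f$ is taken to be the squared jump actually achieved, and $\pi_t(h)$ is approximated by a weighted resampling of the current scales with a small multiplicative jitter. The function names are hypothetical.

```python
import math
import random


def log_target(theta):
    """Log-density (up to a constant) of a toy N(0, 1) target (assumption)."""
    return -0.5 * theta * theta


def adaptive_rw_population(n_particles=500, n_iters=30, seed=2):
    """Move a population with random walk Metropolis steps whose scales h are
    themselves resampled according to the quality of the moves they produced."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    # Start with scales spread over several orders of magnitude.
    scales = [10 ** rng.uniform(-2, 2) for _ in range(n_particles)]
    for _ in range(n_iters):
        new_thetas, weights = [], []
        for theta, h in zip(thetas, scales):
            prop = theta + rng.gauss(0.0, h)
            log_ratio = log_target(prop) - log_target(theta)
            # Metropolis accept/reject step.
            accepted = log_ratio >= 0 or rng.random() < math.exp(log_ratio)
            moved = prop if accepted else theta
            new_thetas.append(moved)
            # Assumed quality measure f: squared distance actually travelled.
            weights.append((moved - theta) ** 2)
        thetas = new_thetas
        if sum(weights) > 0:
            # Crude stand-in for pi_t(h): weighted resampling of the scales,
            # followed by a small log-normal jitter to keep some diversity.
            picked = rng.choices(scales, weights=weights, k=n_particles)
            scales = [h * math.exp(rng.gauss(0.0, 0.1)) for h in picked]
    return thetas, scales
```

Running `thetas, scales = adaptive_rw_population()` and inspecting the spread of `scales` shows the selection at work: very small scales produce negligible jumps and very large ones are almost always rejected, so both are weeded out by the resampling step.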