## Archive for systematic resampling

## multinomial resampling by Metropolis

Posted in Books, Statistics with tags Metropolis-Hastings algorithm, multinomial distribution, particle degeneracy, Raftery and Lewis' number of iterations, stratified resampling, systematic resampling on December 28, 2017 by xi'an

**A** few years ago, Lawrence Murray wrote a note on accelerating the resampling stage in particle filters by using a Metropolis step. And GPUs. The notion that Metropolis can be applied in this setting is at first puzzling, since exact multinomial sampling is available. And Metropolis requires convergence guarantees, which Lawrence covers by a Raftery and Lewis assessment: this assessment has severe limitations in general but may well be adequate for this very case, although possibly too conservative in the number of recommended Metropolis iterations. The gain brought by Metropolis is that it does not require summing up all the particle weights, and the gain is real, in that Metropolis beats all other approaches (time-wise) when the number of particles is not too large and the heterogeneity of the weights not too high. (I did not know of this note until Richard Everitt brought it to my attention.)

## resampling methods

Posted in Books, pictures, Running, Statistics, Travel, University life with tags Book, Clifton, hidden Markov models, Hilbert curve, iterated importance sampling, resampling, sequential Monte Carlo, stratified resampling, systematic resampling, Université Paris Dauphine, University of Bristol on December 6, 2017 by xi'an

**A** paper that was arXived [and that I missed!] last summer is a work on resampling by Mathieu Gerber, Nicolas Chopin (CREST), and Nick Whiteley. Resampling is used to sample from a weighted empirical distribution and to correct for very small weights in a weighted sample, which otherwise lead to degeneracy in sequential Monte Carlo (SMC). Since this step is based on random draws, it induces noise (while improving the estimation of the target), and reducing this noise is preferable, hence the appeal of replacing plain multinomial sampling with more advanced schemes. The initial motivation is sequential Monte Carlo, where resampling is rife and seemingly compulsory, but this also applies to importance sampling when considering several schemes at once. I remember discussing alternative schemes with Nicolas, then completing his PhD, as well as with Olivier Cappé, Randal Douc, and Eric Moulines at the time (circa 2004) we were working on the Hidden Markov book. And getting at the time a somewhat vague idea as to why systematic resampling failed to converge.

In this paper, Mathieu, Nicolas and Nick show that stratified sampling (where a uniform is generated on every interval of length 1/n) enjoys some form of consistency, while systematic sampling (where the “same” uniform is generated on every interval of length 1/n) does not necessarily enjoy this consistency. There actually exist cases where convergence does not occur. However, a residual version of systematic sampling (where each particle is first copied $\lfloor n w_i\rfloor$ times, $w_i$ being its normalised weight, and systematic sampling is applied to the residuals, i.e. the decimal parts of the n-enlarged weights) is itself consistent.
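The three schemes differ only in the uniforms fed to a common inverse-cdf lookup, which a short sketch makes visible. This is plain Python with hypothetical function names, not code from the paper; the residual variant follows the description above (deterministic copies, then systematic resampling of the fractional parts) and is one reading of it rather than the authors' implementation.

```python
import math
import random
from bisect import bisect_right
from itertools import accumulate


def inverse_cdf(weights, us):
    """Map each u in [0, 1) to the index i with F(i-1) <= u < F(i), F the weight cdf."""
    total = sum(weights)
    cdf = [c / total for c in accumulate(weights)]
    cdf[-1] = 1.0  # guard against round-off in the last cumulative sum
    # min(...) guards against a uniform rounded up to exactly 1.0
    return [min(bisect_right(cdf, u), len(cdf) - 1) for u in us]


def stratified(weights, n, rng):
    # one independent uniform on each interval [i/n, (i+1)/n)
    return inverse_cdf(weights, [(i + rng.random()) / n for i in range(n)])


def systematic(weights, n, rng):
    # the *same* uniform shift on every interval [i/n, (i+1)/n)
    u = rng.random()
    return inverse_cdf(weights, [(i + u) / n for i in range(n)])


def residual_systematic(weights, n, rng):
    # floor(n * w_i) deterministic copies, then systematic resampling of the residuals
    total = sum(weights)
    scaled = [n * w / total for w in weights]
    counts = [math.floor(s) for s in scaled]
    idx = [i for i, c in enumerate(counts) for _ in range(c)]
    remaining = n - len(idx)
    if remaining > 0:
        residuals = [s - c for s, c in zip(scaled, counts)]
        idx += systematic(residuals, remaining, rng)
    return idx


if __name__ == "__main__":
    rng = random.Random(1)
    for scheme in (stratified, systematic, residual_systematic):
        print(scheme.__name__, scheme([1, 2, 3, 4], 10, rng))
```

With weights proportional to 1, 2, 3, 4 and n = 10, every $n w_i$ is an integer, so `residual_systematic` returns the deterministic copies `[0, 1, 1, 2, 2, 2, 3, 3, 3, 3]` without drawing any uniform.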

The paper also studies the surprising feature uncovered by Kitagawa (1996) that stratified sampling applied to an ordered sample brings an error of O(1/n²) between the cdfs rather than the usual O(1/n). It took me a while to even understand the distinction between the original and the ordered versions (maybe because Nicolas used the empirical cdf during his SAD (Stochastic Algorithm Day!) talk, an ecdf that is the same for ordered and initial samples). And both systematic and deterministic sampling become consistent in this case. The result was shown in dimension one by Kitagawa (1996) but extends to larger dimensions via the magical trick of the Hilbert curve.
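In dimension one, "ordering" simply means sorting the particles by value before accumulating the weights. Here is a minimal sketch (plain Python, hypothetical names) of stratified resampling on the ordered sample, together with a helper computing the Kolmogorov distance between the weighted ecdf and the ecdf of the resample. In higher dimensions the sort would be replaced by a Hilbert-curve ordering, which is not shown here.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def stratified_indices(weights, n, rng):
    """Stratified resampling: one uniform per interval [i/n, (i+1)/n), inverse-cdf lookup."""
    total = sum(weights)
    cdf = [c / total for c in accumulate(weights)]
    cdf[-1] = 1.0  # guard against round-off
    return [min(bisect_right(cdf, (i + rng.random()) / n), len(cdf) - 1)
            for i in range(n)]


def ordered_stratified(xs, weights, n, rng):
    """Sort the 1D particles by value, then apply stratified resampling
    to the weights taken in that order (the 'ordered' version)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    sorted_weights = [weights[i] for i in order]
    return [xs[order[j]] for j in stratified_indices(sorted_weights, n, rng)]


def cdf_gap(xs, weights, resampled):
    """Kolmogorov distance between the weighted ecdf and the ecdf of the resample."""
    total = sum(weights)
    n = len(resampled)
    gap = 0.0
    for t in sorted(set(xs)):
        f_weighted = sum(w for x, w in zip(xs, weights) if x <= t) / total
        f_resampled = sum(1 for x in resampled if x <= t) / n
        gap = max(gap, abs(f_weighted - f_resampled))
    return gap


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(100)]
    ws = [rng.random() for _ in range(100)]
    print(cdf_gap(xs, ws, ordered_stratified(xs, ws, 100, rng)))
```

Since the stratified uniforms are increasing and the particles are sorted, the output comes out sorted. With distinct values, the gap returned by `cdf_gap` cannot exceed 1/n, because the number of stratified uniforms below any cumulative weight c is either ⌊nc⌋ or ⌊nc⌋+1.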

## SPA 2015 Oxford [my day #2]

Posted in pictures, Statistics, Travel, University life with tags British Rail, Keble College, Leamington Spa, Oxford, particle filters, pseudo-marginal MCMC, SPA 2015, systematic resampling, unbiased estimation, University of Oxford on July 17, 2015 by xi'an

**T**oday I [barely made it on a delayed train from Leamington Spa to Oxford as I] chaired my invited session at SPA 2015 on advanced MCMC methodology. The three speakers, Randal Douc, Mike Pitt and Matti Vihola, all gave talks related to the pseudo-marginal technique. For instance, Randal gave examples of guaranteed variance improvements by adding randomisation steps in the generation of the rv’s behind the unbiased estimation of the likelihood function. Mike Pitt presented the paper I discussed a little while ago about evaluating the computing performances of pseudo-marginal approximations, with a fairly compelling perspective [I may have missed from the paper] on approximating the distribution of the estimated log-likelihood by a normal. Which led me to ponder the ultimate version where the log-likelihood itself would get directly simulated in an MCMC algorithm, bypassing the preliminary simulation of the parameters. Sounds a bit too fantasy-like to be of any use… Matti Vihola also presented recent results with Christophe Andrieu on comparing pseudo-marginal approximations, based on convex ordering properties. They included a domination result on ABC-MCMC algorithms, as noted in a recent post. Which set me musing about the overall importance of unbiasedness in the global picture, where all we need are converging approximations, *in fine*.
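The pseudo-marginal idea behind the three talks can be illustrated on a toy problem: a Metropolis-Hastings chain that only sees an unbiased, noisy estimate of the likelihood and recycles the estimate attached to the current value. This sketch is not from any of the speakers: the Gaussian model, the lognormal noise (chosen so that the log of the estimate is exactly normal, echoing the normal approximation in Mike Pitt's talk) and all names are assumptions.

```python
import math
import random

Y_OBS = 1.0        # hypothetical single observation, y ~ N(theta, 1)
PRIOR_SD = 10.0    # hypothetical prior, theta ~ N(0, 10^2)


def log_prior(theta):
    return -0.5 * (theta / PRIOR_SD) ** 2


def log_lik(theta):
    return -0.5 * (Y_OBS - theta) ** 2 - 0.5 * math.log(2 * math.pi)


def log_lik_hat(theta, sigma, rng):
    """Log of an unbiased likelihood estimate L(theta) * exp(sigma*Z - sigma^2/2),
    unbiased because E[exp(sigma*Z - sigma^2/2)] = 1 for Z ~ N(0, 1)."""
    return log_lik(theta) + sigma * rng.gauss(0.0, 1.0) - 0.5 * sigma ** 2


def pseudo_marginal_mh(n_iter, sigma, step, rng, theta=0.0):
    """Random-walk pseudo-marginal Metropolis-Hastings on theta."""
    log_hat = log_lik_hat(theta, sigma, rng)
    chain = []
    for _ in range(n_iter):
        prop = theta + step * rng.gauss(0.0, 1.0)
        log_hat_prop = log_lik_hat(prop, sigma, rng)  # fresh estimate at the proposal only
        log_ratio = log_prior(prop) + log_hat_prop - log_prior(theta) - log_hat
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta, log_hat = prop, log_hat_prop  # accepted estimate is kept, never refreshed
        chain.append(theta)
    return chain


if __name__ == "__main__":
    chain = pseudo_marginal_mh(20000, sigma=1.0, step=1.0, rng=random.Random(42))
    print(sum(chain[2000:]) / len(chain[2000:]))
```

With `sigma = 0` this reduces to the exact Metropolis-Hastings chain. A positive `sigma` leaves the stationary distribution (here the N(100/101, 100/101) posterior) unchanged but makes the chain stickier, which is the efficiency question the talks were about.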

## discussions on Gerber and Chopin

Posted in Books, Kids, Statistics, University life with tags ABC, discussion paper, doubly intractable problems, Hilbert, Igor Prünster, Julyan Arbel, Mathieu Gerber, Nicolas Chopin, quasi-Monte Carlo methods, Read paper, Royal Statistical Society, Series B, systematic resampling, Turino, University of Warwick, Vapnik-Chervonenkis on May 29, 2015 by xi'an

**B**y a coincidence, I received my copy of JRSS Series B with the Read Paper by Mathieu Gerber and Nicolas Chopin on sequential quasi Monte Carlo just as I was preparing an arXival of a few discussions on the paper! Among the [numerous and diverse] discussions, a few were of particular interest to me *[I highlighted members of the University of Warwick and of Université Paris-Dauphine to suggest potential biases!]*:

- Mike Pitt (Warwick), Murray Pollock et al. (Warwick) and Finke et al. (Warwick) all suggested combining quasi Monte Carlo with pseudo-marginal Metropolis-Hastings, pMCMC (Pitt) and Rao-Blackwellisation (Finke et al.);
- Arnaud Doucet pointed out that John Skilling had used the Hilbert (ordering) curve in a 2004 paper;
- Chris Oates, Dan Simpson and Mark Girolami (Warwick) suggested combining quasi Monte Carlo with their functional control variate idea;
- Richard Everitt wondered about the dimension barrier of d=6 and about possible slice extensions;
- Zhijian He and Art Owen pointed out simple solutions to handle a random number of uniforms (for simulating each step in sequential Monte Carlo), namely to start with quasi Monte Carlo and end up with regular Monte Carlo, in a hybrid manner;
- Hans Künsch points out the connection with systematic resampling à la Carpenter, Clifford and Fearnhead (1999) and wonders about separating the impact of quasi Monte Carlo between resampling and propagating [which vaguely links to one of my comments];
- Pierre L’Écuyer points out a possible improvement over the Hilbert curve by a preliminary sorting;
- Frederik Lindsten and Sumeet Singh propose using ABC to extend the backward smoother to intractable cases [but still with a fixed number of uniforms to use at each step], as well as Mateu and Ryder (Paris-Dauphine) for a more general class of intractable models;
- Omiros Papaspiliopoulos wonders at the possibility of a quasi Markov chain with “low discrepancy paths”;
- Daniel Rudolf suggests linking the error rate of sequential quasi Monte Carlo with the bounds of Vapnik and Chervonenkis (1977).

The arXiv document also includes the discussions by Julyan Arbel and Igor Prünster (Turin) on the Bayesian nonparametric side of sqMC and by Robin Ryder (Dauphine) on the potential of sqMC for ABC.

## Quasi-Monte Carlo sampling

Posted in Books, Kids, Statistics, Travel, University life, Wines with tags CREST, forward-backward formula, JRSSB, London, MCMC, particle learning, quasi-Monte Carlo methods, Rao-Blackwellisation, Read paper, reproducing kernel Hilbert space, Royal Statistical Society, SMC, systematic resampling on December 10, 2014 by xi'an

“The QMC algorithm forces us to write any simulation as an explicit function of uniform samples.” (p.8)

**A**s posted a few days ago, Mathieu Gerber and Nicolas Chopin will read this afternoon a Paper to the Royal Statistical Society on their sequential quasi-Monte Carlo sampling paper. Here are some comments on the paper that are preliminaries to my written discussion (to be sent before the slightly awkward deadline of *Jan 2, 2015*).

Quasi-Monte Carlo methods are definitely *not* popular within the (mainstream) statistical community, despite regular attempts by respected researchers like Art Owen and Pierre L’Écuyer to induce more use of those methods. It is thus to be hoped that the current attempt will be more successful, as being Read to the Royal Statistical Society is a major step towards a wide diffusion. I am looking forward to the collection of discussions that will result from the coming afternoon (and bemoan once again having to miss it!).

“It is also the resampling step that makes the introduction of QMC into SMC sampling non-trivial.” (p.3)

At a mathematical level, the fact that randomised low discrepancy sequences produce both unbiased estimators *and* error rates of order

$$\mathrm{o}(N^{-1/2}),$$

i.e. faster than the Monte Carlo $N^{-1/2}$, means that randomised quasi-Monte Carlo methods should always be used instead of regular Monte Carlo methods! So why are they not *always* used?! The difficulty lies [I think] in expressing the Monte Carlo estimators in terms of a *deterministic* function of a *fixed* number of uniforms (and possibly of past simulated values). At least this is why I never attempted to cross the Rubicon into the quasi-Monte Carlo realm… And maybe also why the step *had to* appear in connection with particle filters, which can be seen as dynamic importance sampling methods and hence enjoy a local iid-ness that relates better to quasi-Monte Carlo integrators than single-chain MCMC algorithms. For instance, each resampling step in a particle filter consists of a repeated multinomial generation, hence should have been turned into quasi-Monte Carlo ages ago. (However, rather than the basic solution drafted in Table 2, lower variance solutions like systematic and residual sampling have been proposed in the particle literature and I wonder if any of these is a special form of quasi-Monte Carlo.) In the present setting, the authors move further and apply quasi-Monte Carlo to the particles themselves. However, they still assume the deterministic transform

$$x_t^n = \Gamma_t(x_{t-1}^n, u_t^n), \qquad u_t^n \in [0,1)^d,$$

which is the stumbling block I hit each time I contemplated quasi-Monte Carlo… So the fundamental difficulty with the whole proposal is that the generation from the Markov proposal

$$m_t(x_{t-1}^n, \mathrm{d}x_t)$$

has to be of the above form. Is the strength of this assumption discussed anywhere in the paper? All baseline distributions there are normal. And when it does not easily apply, what would the gain be in only using the second step (i.e., quasi-Monte Carlo-ing the multinomial simulation from the empirical cdf)? In a sequential setting with unknown parameters θ, the transform is modified each time θ is modified and I wonder at the impact on computing cost if the inverse cdf is not available analytically. And I presume simulating the θ’s cannot benefit from quasi-Monte Carlo improvements.
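To make the constraint concrete, here is a minimal sketch that assumes a Gaussian random-walk kernel for $m_t$ (my choice, not the paper's). The proposal is then a deterministic $\Gamma_t$ of a single uniform through the inverse normal cdf, so it can be fed either iid uniforms or a randomised low-discrepancy set. The randomisation shown is a random shift modulo 1 (Cranley-Patterson rotation) of a base-2 van der Corput sequence, not the Owen scrambling of the paper, and all names are hypothetical. When no inverse cdf is available in closed form, `gamma_rw` is exactly where the extra cost would appear.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, provides inv_cdf
_EPS = 1e-12                # keeps u strictly inside (0, 1), as inv_cdf requires


def gamma_rw(x_prev, u, scale=1.0):
    """Hypothetical Gamma_t: a draw from N(x_prev, scale^2) written as a
    deterministic function of one uniform u, via the inverse normal cdf."""
    u = min(max(u, _EPS), 1.0 - _EPS)
    return x_prev + scale * _STD_NORMAL.inv_cdf(u)


def van_der_corput(n, base=2):
    """First n points of the van der Corput sequence (radical inverse of 0..n-1)."""
    points = []
    for i in range(n):
        q, denom, k = 0.0, 1.0, i
        while k > 0:
            denom *= base
            k, digit = divmod(k, base)
            q += digit / denom
        points.append(q)
    return points


def shifted_vdc(n, rng):
    """Random shift modulo 1: each point is marginally uniform, hence unbiased estimates."""
    shift = rng.random()
    return [(p + shift) % 1.0 for p in van_der_corput(n)]


def propagate_mean(f, x_prev, us):
    """Estimate E[f(X_t) | x_prev] by averaging f(Gamma_t(x_prev, u)) over the uniforms."""
    return sum(f(gamma_rw(x_prev, u)) for u in us) / len(us)


if __name__ == "__main__":
    rng = random.Random(3)
    n = 1024
    square = lambda x: x * x  # E[X^2] = 2 when X ~ N(1, 1)
    print("iid :", propagate_mean(square, 1.0, [rng.random() for _ in range(n)]))
    print("rqmc:", propagate_mean(square, 1.0, shifted_vdc(n, rng)))
```

The resampling step can be written the same way, as the inverse of the empirical cdf applied to uniforms; the clamping by `_EPS` introduces a negligible bias in the extreme tails.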

The paper obviously cannot get into every detail, but I would also welcome indications on the cost of deriving the Hilbert curve, in particular in connection with the dimension d, as it has to separate all of the N particles, and on the stopping rule on m, which means only the order-m approximation $H_m$ of the curve is used.
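On the cost question, a planar sketch may help. The index of a grid cell along the order-$m$ approximation $H_m$ can be computed with $m$ rounds of bit operations; this is the classic iterative algorithm for the 2D Hilbert curve, not the $d$-dimensional code used in the paper. Sorting $N$ particles therefore costs $O(Nm)$ for the keys plus $O(N\log N)$ for the sort. The sketch assumes the particles have already been mapped into $[0,1)^2$. Particles falling into the same cell of the $2^m\times 2^m$ grid get the same key, which is where the need for $m$ to grow until all $N$ particles are separated shows up.

```python
def hilbert_index(m, x, y):
    """Position of grid cell (x, y), 0 <= x, y < 2^m, along the order-m Hilbert curve."""
    n = 1 << m
    d = 0
    s = n >> 1
    while s > 0:                      # one round per bit: m rounds in total
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # which of the four sub-squares, in curve order
        if ry == 0:                   # rotate/flip so the sub-curve has standard orientation
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_sort(points, m):
    """Sort points of [0,1)^2 along H_m by discretising each coordinate to m bits.
    Points sharing a cell tie and keep their input order (stable sort)."""
    n = 1 << m

    def key(p):
        x = min(int(p[0] * n), n - 1)
        y = min(int(p[1] * n), n - 1)
        return hilbert_index(m, x, y)

    return sorted(points, key=key)


if __name__ == "__main__":
    pts = [(0.9, 0.1), (0.1, 0.1), (0.1, 0.9), (0.9, 0.9)]
    print(hilbert_sort(pts, 1))
```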

Another question stands with the multiplicity of low discrepancy sequences and their impact on the overall convergence. If Art Owen’s (1997) nested scrambling leads to the best rate, as implied by Theorem 7, why should we ever consider another choice?

In connection with Lemma 1 and the sequential quasi-Monte Carlo approximation of the evidence, I wonder at any possible Rao-Blackwellisation using all proposed moves rather than only those accepted. I mean, from a quasi-Monte Carlo viewpoint, is Rao-Blackwellisation easier and is it of any significant interest?

What are the computing costs and gains for forward and backward sampling? They are not discussed there. I also fail to understand the trick at the end of Section 4.2.1, using SQMC on a single vector instead of (t+1) of them. Again assuming inverse cdfs are available? Any connection with the particle learning literature of Polson et al.?

Last questions: what is the (learning) effort for lazy me to move to SQMC? Any hope of stepping outside particle filtering?

## resampling and [GPU] parallelism

Posted in Statistics, University life with tags GPU, particle MCMC, Raftery and Lewis' number of iterations, random number generator, resampling, stratified resampling, systematic resampling on March 13, 2012 by xi'an

**I**n a recent note posted on arXiv, Lawrence Murray compares the implementation of resampling schemes for parallel systems like GPUs. Given a system of weighted particles, $(x_i,\omega_i)$, $i=1,\ldots,P$, there are several ways of drawing a sample according to those weights:

- regular *multinomial resampling*, where each point in the (new) sample is one of the $x_i$, selected with probability $\omega_i$, meaning there is a uniform generated for each point;
- *stratified resampling*, where the weights are added, divided into equal pieces and a uniform is sampled on each piece, which means that points with large weights are sampled at least once and those with small weights at most once;
- *systematic resampling*, which is the same as the above except that *the same* uniform is used for each piece;
- *Metropolis resampling*, where a Markov chain converges to the distribution $(\omega_1,\ldots,\omega_P)$ on $\{1,\ldots,P\}$.
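The first three schemes all go through the cumulative sum of the weights. A minimal sketch of the regular multinomial scheme (plain Python, hypothetical names) makes that dependency visible: the total and the prefix sum are the steps that need a collective operation (a reduction and a scan) across all particles on a parallel device, while each of the n lookups is independent.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def multinomial_resample(weights, n, rng):
    """Regular multinomial resampling: n iid uniforms, each mapped through the
    inverse of the weight cdf (one uniform per new point)."""
    total = sum(weights)                            # global reduction over all P weights
    cdf = [c / total for c in accumulate(weights)]  # prefix sum (a scan on a GPU)
    cdf[-1] = 1.0                                   # guard against round-off
    return [min(bisect_right(cdf, rng.random()), len(cdf) - 1) for _ in range(n)]


if __name__ == "__main__":
    print(multinomial_resample([0.1, 0.2, 0.3, 0.4], 10, random.Random(0)))
```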

**T**he first three resamplers are common in the particle system literature (incl. Nicolas Chopin’s PhD thesis), but difficult to adapt to GPUs (and I always feel uncomfortable with the fact that systematic resampling uses *a single uniform*!), while the last one is more unusual, but actually well-suited to a parallel implementation. While Lawrence Murray suggests using Raftery and Lewis’ (1992) assessment of the required number of Metropolis iterations to “achieve convergence”, I would instead suggest taking advantage of the toric nature of the space to run a random walk and wait for the equivalent of a complete cycle. In any case, this is a cool illustration of the new challenges posed by parallel implementations (like the development of proper random generators).
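Here is a minimal sketch of a Metropolis resampler in the spirit of Murray's note. It assumes uniform proposals over the particle indices, one chain per output slot started at its own index, and a fixed number of steps; these details and all names are my assumptions rather than his code. Only ratios of weights appear, so neither the total nor a prefix sum is ever computed, and each chain can run in its own thread. The fixed `n_steps` is where the Raftery and Lewis assessment, or the toric random walk suggested above, would come in; it is not tuned here, and too few steps leave the output biased towards the starting indices.

```python
import random


def metropolis_resample(weights, n_steps, rng):
    """For each of the P output slots, run n_steps of an independent Metropolis chain
    on {0, ..., P-1} with uniform proposals, targeting probabilities proportional to
    the (unnormalised) weights."""
    p = len(weights)
    out = []
    for k in range(p):            # embarrassingly parallel: one chain per slot
        j = k                     # start the chain at the slot's own particle
        for _ in range(n_steps):
            cand = rng.randrange(p)
            # accept with probability min(1, w_cand / w_j), written without division
            if rng.random() * weights[j] <= weights[cand]:
                j = cand
        out.append(j)
    return out


if __name__ == "__main__":
    w = [1.0] * 5 + [3.0] * 5
    print(metropolis_resample(w, 30, random.Random(0)))
```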