Archive for systematic resampling

multinomial resampling by Metropolis

Posted in Books, Statistics with tags , , , , , on December 28, 2017 by xi'an

A few years ago Lawrence Murray wrote a note on accelerating the resampling stage in particle filters by using a Metropolis step. And GPUs. The notion that Metropolis can be applied in this setting is at first puzzling since exact multinomial sampling is available. And Metropolis requires convergence guarantees. Which Lawrence covers by a Raftery and Lewis assessment, which has severe limitations in general but may well be adequate for this very case, although possibly too conservative in the number of recommended Metropolis iterations. The gain brought by Metropolis is that it does not require summing up all the particle weights, and as a result the gain is real in that Metropolis beats all other approaches (time-wise) when the number of particles is not too large and the heterogeneity of the weighs not too  high. (I did not know of this note until Richard Everitt brought it to my attention.)

resampling methods

Posted in Books, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , on December 6, 2017 by xi'an

A paper that was arXived [and that I missed!] last summer is a work on resampling by Mathieu Gerber, Nicolas Chopin (CREST), and Nick Whiteley. Resampling is used to sample from a weighted empirical distribution and to correct for very small weights in a weighted sample that otherwise lead to degeneracy in sequential Monte Carlo (SMC). Since this step is based on random draws, it induces noise (while improving the estimation of the target), reducing this noise is preferable, hence the appeal of replacing plain multinomial sampling with more advanced schemes. The initial motivation is for sequential Monte Carlo where resampling is rife and seemingly compulsory, but this also applies to importance sampling when considering several schemes at once. I remember discussing alternative schemes with Nicolas, then completing his PhD, as well as Olivier Cappé, Randal Douc, and Eric Moulines at the time (circa 2004) we were working on the Hidden Markov book. And getting then a somewhat vague idea as to why systematic resampling failed to converge.

In this paper, Mathieu, Nicolas and Nick show that stratified sampling (where a uniform is generated on every interval of length 1/n) enjoys some form of consistent, while systematic sampling (where the “same” uniform is generated on every interval of length 1/n) does not necessarily enjoy this consistency. There actually exists cases where convergence does not occur. However, a residual version of systematic sampling (where systematic sampling is applied to the residuals of the decimal parts of the n-enlarged weights) is itself consistent.

The paper also studies the surprising feature uncovered by Kitagawa (1996) that stratified sampling applied to an ordered sample brings an error of O(1/n²) between the cdf rather than the usual O(1/n). It took me a while to even understand the distinction between the original and the ordered version (maybe because Nicolas used the empirical cdf during his SAD (Stochastic Algorithm Day!) talk, ecdf that is the same for ordered and initial samples).  And both systematic and deterministic sampling become consistent in this case. The result was shown in dimension one by Kitagawa (1996) but extends to larger dimensions via the magical trick of the Hilbert curve.

parallel adaptive importance sampling

Posted in Statistics with tags , , , , , on August 30, 2016 by xi'an

Following Paul Russell’s talk at MCqMC 2016, I took a look at his recently arXived paper. In the plane to Sydney. The pseudo-code representation of the method is identical to our population Monte Carlo algorithm as is the suggestion to approximate the posterior by a mixture, but one novel aspect is to use Reich’s ensemble transportation at the resampling stage, in order to maximise the correlation between the original and the resampled versions of the particle systems. (As in our later versions of PMC, the authors also use as importance denominator the entire mixture rather than conditioning on the selected last-step particle.)

“The output of the resampling algorithm gives us a set of evenly weighted samples that we believe represents the target distribution well”

I disagree with this statement: Reweighting does not improve the quality of the posterior approximation, since it introduces more variability. If the original sample is found missing in its adequation to the target, so is the resampled one. Worse, by producing a sample with equal weights, this step may give a false impression of adequate representation…

Another unclear point in the pape relates to tuning the parameters of the mixture importance sampler. The paper discusses tuning these parameters during a burn-in stage, referring to “due to the constraints on adaptive MCMC algorithms”, which indeed is only pertinent for MCMC algorithms, since importance sampling can be constantly modified while remaining valid. This was a major point for advocating PMC. I am thus unsure what the authors mean by a burn-in period in such a context. Actually, I am also unsure on how they use effective sample size to select the new value of the importance parameter, e.g., the variance β in a random walk mixture: the effective sample size involves this variance implicitly through the realised sample hence changing β means changing the realised sample… This seems too costly to contemplate so I wonder at the way Figure 4.2 is produced.

“A popular approach for adaptive MCMC algorithms is to view the scaling parameter as a random variable which we can sample during the course of the MCMC iterations.”

While this is indeed an attractive notion [that I played with in the early days of adaptive MCMC, with the short-lived notion of cyber-parameters], I do not think it is of much help in optimising an MCMC algorithm, since the scaling parameter need be optimised, resulting into a time-inhomogeneous target. A more appropriate tool is thus stochastic optimisation à la Robbins-Monro, as exemplified in Andrieu and Moulines (2006). The paper however remains unclear as to how the scales are updated (see e.g. Section 4.2).

“Ideally, we would like to use a resampling algorithm which is not prohibitively costly for moderately or large sized ensembles, which preserves the mean of the samples, and which makes it much harder for the new samples to forget a significant region in the density.”

The paper also misses on the developments of the early 2000’s about more sophisticated resampling steps, especially Paul Fearnhead’s contributions (see also Nicolas Chopin’s thesis). There exist valid resampling methods that require a single uniform (0,1) to be drawn, rather than m. The proposed method has a flavour similar to systematic resampling, but I wonder at the validity of returning values that are averages of earlier simulations, since this modifies their distribution into ones with slimmer tails. (And it is parameterisation dependent.) Producing xi with probability pi is not the same as returning the average of the pixi‘s.

SPA 2015 Oxford [my day #2]

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , on July 17, 2015 by xi'an

KebleToday I [barely made it on a delayed train from Leaminton Spa to Oxford as I] chaired my invited session at SPA 2015 on advanced MCMC methodology. The three speakers, Randal Douc, Mike Pitt and Matti Vihola, all gave talks related to the pseudo-marginal technique. For instance, Randal gave examples of guaranteed variance improvements by adding randomisation steps in the generation of the rv’s behind the unbiased estimation of the likelihood function. Mike Pitt presented the paper I discussed a little while ago about evaluating the computing performances of pseudo-marginal approximations, with a fairly compelling perspective [I may have missed from the paper] on approximating the distribution on the approximation to the log-likelihood as a normal. Which led me to ponder at the ultimate version where the log-likelihood itself would get directly simulated in an MCMC algorithm bypassing the preliminary simulation of the parameters. Sounds a bit too fantasy-like to be of any use… Matti Vihola also presented recent results with Christophe Andrieu on comparing pseudo-marginal approximations, based on convex ordering properties. They included a domination result on ABC-MCM algorithms, as noted in a recent post. Which made me musing about the overall importance of unbiasedness in the global picture, where all we need are converging approximations, in fine.

discussions on Gerber and Chopin

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , , , , on May 29, 2015 by xi'an

As a coincidence, I received my copy of JRSS Series B with the Read Paper by Mathieu Gerber and Nicolas Chopin on sequential quasi Monte Carlo just as I was preparing an arXival of a few discussions on the paper! Among the [numerous and diverse] discussions, a few were of particular interest to me [I highlighted members of the University of Warwick and of Université Paris-Dauphine to suggest potential biases!]:

  1. Mike Pitt (Warwick), Murray Pollock et al.  (Warwick) and Finke et al. (Warwick) all suggested combining quasi Monte Carlo with pseudomarginal Metropolis-Hastings, pMCMC (Pitt) and Rao-Bklackwellisation (Finke et al.);
  2. Arnaud Doucet pointed out that John Skilling had used the Hilbert (ordering) curve in a 2004 paper;
  3. Chris Oates, Dan Simpson and Mark Girolami (Warwick) suggested combining quasi Monte Carlo with their functional control variate idea;
  4. Richard Everitt wondered about the dimension barrier of d=6 and about possible slice extensions;
  5. Zhijian He and Art Owen pointed out simple solutions to handle a random number of uniforms (for simulating each step in sequential Monte Carlo), namely to start with quasi Monte Carlo and end up with regular Monte Carlo, in an hybrid manner;
  6. Hans Künsch points out the connection with systematic resampling à la Carpenter, Clifford and Fearnhead (1999) and wonders about separating the impact of quasi Monte Carlo between resampling and propagating [which vaguely links to one of my comments];
  7. Pierre L’Ecuyer points out a possible improvement over the Hilbert curve by a preliminary sorting;
  8. Frederik Lindsten and Sumeet Singh propose using ABC to extend the backward smoother to intractable cases [but still with a fixed number of uniforms to use at each step], as well as Mateu and Ryder (Paris-Dauphine) for a more general class of intractable models;
  9. Omiros Papaspiliopoulos wonders at the possibility of a quasi Markov chain with “low discrepancy paths”;
  10. Daniel Rudolf suggest linking the error rate of sequential quasi Monte Carlo with the bounds of Vapnik and Ĉervonenkis (1977).

 The arXiv document also includes the discussions by Julyan Arbel and Igor Prünster (Turino) on the Bayesian nonparametric side of sqMC and by Robin Ryder (Dauphine) on the potential of sqMC for ABC.