multinomial resampling by Metropolis
A few years ago Lawrence Murray wrote a note on accelerating the resampling stage in particle filters by using a Metropolis step, and GPUs. The notion that Metropolis can be applied in this setting is at first puzzling, since exact multinomial sampling is available, and Metropolis requires convergence guarantees. Lawrence covers these by a Raftery and Lewis assessment, which has severe limitations in general but may well be adequate for this very case, although possibly too conservative in the number of recommended Metropolis iterations. The gain brought by Metropolis is that it does not require summing up all the particle weights, and as a result the gain is real: Metropolis beats all other approaches (time-wise) when the number of particles is not too large and the heterogeneity of the weights not too high. (I did not know of this note until Richard Everitt brought it to my attention.)
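For concreteness, here is a minimal sketch of the Metropolis resampler in Python/numpy rather than on a GPU (the names w, B, and metropolis_resample are mine, not from the note): each particle runs B Metropolis steps over the particle indices, targeting probabilities proportional to the weights, and only ratios of weights are ever evaluated, never their sum.

```python
import numpy as np

def metropolis_resample(w, B, rng):
    """Metropolis resampling sketch: for each particle, run B Metropolis
    steps over particle indices, targeting probabilities proportional to
    the (unnormalised) weights w. Only ratios of weights appear, so the
    weights are never summed or normalised."""
    N = len(w)
    ancestors = np.arange(N)                     # each chain starts at its own index
    for _ in range(B):
        proposals = rng.integers(0, N, size=N)   # uniform proposals over indices
        u = rng.random(N)
        # accept proposal j over current index k with probability min(1, w[j]/w[k])
        accept = u * w[ancestors] < w[proposals]
        ancestors = np.where(accept, proposals, ancestors)
    return ancestors                             # ancestors[i] is the particle copied into slot i

# usage: ancestors = metropolis_resample(w, B=32, rng=np.random.default_rng(0))
```

The vectorised loop runs N independent chains in lockstep, which is also why the scheme maps so naturally onto a GPU: one thread per particle, no inter-thread communication.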
December 28, 2017 at 11:09 pm
Thanks to my student, Ivis, for bringing it to my attention!
December 28, 2017 at 11:33 am
With Anthony Lee and Pierre Jacob, I extended this note into a paper a few years later, sharpening my naive analysis (courtesy of my coauthors) and adding more extensive empirical results: https://doi.org/10.1080/10618600.2015.1062015 (preprint: https://arxiv.org/abs/1301.4019).
The take-home message: the bias from numerical issues associated with multinomial resampling for large numbers of particles (from the sum of weights) can greatly exceed the finite-iteration bias of Metropolis resampling (which requires no sum of weights, and so has better numerical properties). If using hundreds of thousands of particles or more, and single precision, a resampling algorithm that relies on a sum of weights is probably not doing what's expected.
This is a common scenario on GPU: 10^5 or more particles (a GPU will happily host millions of threads) and single precision (on a GPU, it’s 2 to 16 times faster than double precision depending on the particular device).
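To illustrate the numerical issue, a small sketch (the weights here are hypothetical, drawn just for the example): standard multinomial resampling inverts the cumulative sum of the weights, and in single precision that running sum accumulates rounding error as the number of particles grows.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical weights for a million particles, stored in single precision
w = rng.exponential(size=10**6).astype(np.float32)

cdf32 = np.cumsum(w)                     # float32 running sum, as used to invert the CDF
cdf64 = np.cumsum(w.astype(np.float64))  # double-precision reference
rel_err = np.abs(cdf32 - cdf64) / cdf64
print(rel_err.max())                     # relative error in the float32 CDF, growing with N
```

The Metropolis resampler above never forms this cumulative sum, which is the source of its better numerical behaviour at large particle counts.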
Also, as you say, Metropolis resampling is often faster on GPU, simply because it better suits the architecture.