Sophie Donnet pointed out to me this arXived paper by Tianxi Li, Elizaveta Levina, and Ji Zhu, on a network resampling strategy for X validation, where I appear as a datapoint rather than as a [direct] citation! Which reminded me of the “where you are the hero” gamebooks with which my kids briefly played, before computer games took over. The model selection method is illustrated on a dataset made of X citations [reduced to 706 authors] among all papers published between 2003 and 2012 in the Annals of Statistics, Biometrika, JASA, and JRSS Series B. The outcome is a partition into 20 communities, which the authors labelled as they saw fit, based on the ten authors with the largest number of citations in each category. As it happens, I appear in the list, within the “mixed (causality + theory + Bayesian)” category (!), along with Jamie Robins, Paul Fearnhead, Gilles Blanchard, Zhiqiang Tan, Stijn Vansteelandt, Nancy Reid, Jae Kwang Kim, Tyler VanderWeele, and Scott Sisson, which is somewhat mind-boggling in that I am pretty sure I never cited six of these authors [although I find it hilarious that Jamie appears in the category, given that we almost got into a car crash together, at one of the Valencià meetings!].
Archive for resampling
resampling methods
Posted in Books, pictures, Running, Statistics, Travel, University life with tags Book, Clifton, hidden Markov models, Hilbert curve, iterated importance sampling, resampling, sequential Monte Carlo, stratified resampling, systematic resampling, Université Paris Dauphine, University of Bristol on December 6, 2017 by xi'an

A paper that was arXived [and that I missed!] last summer is a work on resampling by Mathieu Gerber, Nicolas Chopin (CREST), and Nick Whiteley. Resampling is used to sample from a weighted empirical distribution and to correct for very small weights in a weighted sample that otherwise lead to degeneracy in sequential Monte Carlo (SMC). Since this step is based on random draws, it induces extra noise (while improving the estimation of the target); reducing this noise is thus desirable, hence the appeal of replacing plain multinomial sampling with more advanced schemes. The initial motivation is sequential Monte Carlo, where resampling is rife and seemingly compulsory, but this also applies to importance sampling when considering several schemes at once. I remember discussing alternative schemes with Nicolas, who was then completing his PhD, as well as with Olivier Cappé, Randal Douc, and Éric Moulines, at the time (circa 2004) we were working on the Hidden Markov book. And then getting a somewhat vague idea as to why systematic resampling failed to converge.
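To make the role of this step concrete, here is a minimal sketch of a resampling step inside a generic SMC algorithm; the effective-sample-size trigger, the 0.5 threshold, and all names are illustrative choices of mine, not taken from the paper:

```python
import numpy as np

def ess(weights):
    """Effective sample size of normalised weights."""
    return 1.0 / np.sum(weights ** 2)

def resample_if_degenerate(particles, weights, rng, threshold=0.5):
    """Multinomial resampling step of a generic SMC algorithm (sketch).

    When the ESS falls below threshold * n, draw n indices according to
    the weights and reset the weights to 1/n; otherwise keep the
    weighted sample as is.
    """
    n = len(weights)
    if ess(weights) < threshold * n:
        idx = rng.choice(n, size=n, p=weights)
        return particles[idx], np.full(n, 1.0 / n)
    return particles, weights

rng = np.random.default_rng(0)
x = rng.normal(size=1000)      # particle locations (toy example)
w = rng.exponential(size=1000)
w /= w.sum()                   # normalised weights
x, w = resample_if_degenerate(x, w, rng)
```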
In this paper, Mathieu, Nicolas and Nick show that stratified sampling (where a uniform is generated on every interval of length 1/n) enjoys some form of consistency, while systematic sampling (where the “same” uniform is recycled on every interval of length 1/n) does not necessarily enjoy this consistency: there actually exist cases where convergence does not occur. However, a residual version of systematic sampling (where systematic sampling is applied to the residuals, i.e., the decimal parts of the n-fold inflated weights, after the integer parts have been allocated deterministically) is itself consistent. The three schemes are sketched below.
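As a concrete comparison of the three schemes just mentioned, here is a minimal sketch in Python; the code and function names are my own illustration, not the authors':

```python
import numpy as np

def stratified(weights, rng):
    """Stratified resampling: one uniform per stratum [(j-1)/n, j/n]."""
    n = len(weights)
    u = (np.arange(n) + rng.uniform(size=n)) / n
    # clip guards against floating-point round-off in the cumulated weights
    return np.minimum(np.searchsorted(np.cumsum(weights), u), n - 1)

def systematic(weights, rng):
    """Systematic resampling: a single uniform recycled across all strata."""
    n = len(weights)
    u = (np.arange(n) + rng.uniform()) / n
    return np.minimum(np.searchsorted(np.cumsum(weights), u), n - 1)

def residual_systematic(weights, rng):
    """Allocate floor(n*w_i) copies deterministically, then apply
    systematic resampling to the residual (decimal) parts."""
    n = len(weights)
    counts = np.floor(n * weights).astype(int)
    residuals = n * weights - counts
    m = n - counts.sum()                 # number of draws left to fill
    if m > 0:
        u = (np.arange(m) + rng.uniform()) / m
        extra = np.minimum(np.searchsorted(np.cumsum(residuals / m), u), n - 1)
        counts += np.bincount(extra, minlength=n)
    return np.repeat(np.arange(n), counts)
```

All three return an array of n indices into the original weighted sample; the differences in the noise they induce are precisely what the paper quantifies.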
The paper also studies the surprising feature uncovered by Kitagawa (1996) that stratified sampling applied to an ordered sample brings an error of O(1/n²) between the cdfs, rather than the usual O(1/n). It took me a while to even understand the distinction between the original and the ordered versions (maybe because Nicolas used the empirical cdf during his SAD (Stochastic Algorithm Day!) talk, an ecdf that is the same for the ordered and initial samples). Both systematic and deterministic sampling also become consistent in this case. The result was established in dimension one by Kitagawa (1996) but extends to larger dimensions via the magical trick of the Hilbert curve.
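As a toy check of this ordering effect (my own experiment, not one from the paper), one can compare the mean squared error between the resampled ecdf and the weighted cdf at a fixed point, with and without sorting the particles first; the sorted version should decay roughly like n⁻² instead of n⁻¹:

```python
import numpy as np

rng = np.random.default_rng(1)

def stratified(weights, rng):
    """Stratified resampling: one uniform per stratum [(j-1)/n, j/n]."""
    n = len(weights)
    u = (np.arange(n) + rng.uniform(size=n)) / n
    return np.minimum(np.searchsorted(np.cumsum(weights), u), n - 1)

def mse_at(t, n, sort, reps=2000):
    """MSE between resampled ecdf and weighted cdf at the point t."""
    err2 = 0.0
    for _ in range(reps):
        x = rng.normal(size=n)
        w = rng.exponential(size=n)
        w /= w.sum()
        if sort:                       # Kitagawa's trick: order by x first
            order = np.argsort(x)
            x, w = x[order], w[order]
        target = w[x <= t].sum()       # weighted cdf at t
        idx = stratified(w, rng)
        err2 += (np.mean(x[idx] <= t) - target) ** 2
    return err2 / reps

for n in (100, 400):
    print(n, mse_at(0.0, n, sort=False), mse_at(0.0, n, sort=True))
# unsorted MSE shrinks like 1/n, sorted like 1/n² (up to constants)
```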
Banff workshop [BIRS 12w5105 meeting #2]
Posted in Mountains, pictures, Statistics, Travel, University life with tags 12w5105, ABC, adaptive MCMC methods, Banff, Banff National Park, BIRS, MCMC, Mount Rundle, resampling, video on March 21, 2012 by xi'an

Today the program of 12w5105 was more on the theoretical side, with adaptive MCMC in the morning and ABC in the afternoon. Éric Moulines and Gersende Fort shared a talk on two papers, one on adaptive tempering and the other on equi-energy sampling; then Nando de Freitas spoke first about Gaussian process approximation for Bayesian optimisation, then about an adaptive Hamiltonian technique called Sardonics. And Jeff Rosenthal concluded the morning with a review of the results ensuring convergence for adaptive MCMC (with a delightful counter-example called Stairway to Heaven that reminded me of an ice climb in Utah!).

After my talk, where Scott Sisson made an interesting comment on the difficulty of extending our framework to a large collection of models (since the summary statistics then have to differ), François Perron discussed in highly interesting detail several approximation techniques for the Bayesian estimation of copulas, and Scott Sisson presented his recent arXiv paper where a rough estimate of the joint posterior is first obtained by regression-adjustment ABC, and estimates of each marginal posterior distribution are then separately obtained in a lower-dimensional analysis, all of this being connected with Bayes linear analysis. (I do not completely get the way summary statistics are selected for each marginal there, which seems to be done by hand. While I understand why using a lower-dimensional statistic helps in improving the approximation of the marginal posteriors and fights the curse of dimensionality, the fact that the joint posterior sample is based on different summary statistics for the different components makes for an interesting statistical puzzle. Maybe the copula approach presented by François in the previous talk could be used at the final stage.) The final talk, by Zhiqiang Tan, on the comparative performances of resampling and subsampling strategies, generated a very animated discussion. (All talks being recorded, mine is available as an mp4 video, but watch at your own peril!)
resampling and [GPU] parallelism
Posted in Statistics, University life with tags GPU, particle MCMC, Raftery and Lewis' number of iterations, random number generator, resampling, stratified resampling, systematic resampling on March 13, 2012 by xi'an

In a recent note posted on arXiv, Lawrence Murray compares the implementation of resampling schemes on parallel systems like GPUs. Given a system of weighted particles, (x_i, ω_i), there are several ways of drawing a sample according to those weights:
- regular multinomial resampling, where each point in the (new) sample is one of the x_i's, drawn with probability ω_i, meaning a uniform is generated for each point;
- stratified resampling, where the cumulated weights are divided into n equal pieces and a uniform is sampled on each piece, which means that points with large weights are sampled at least once and those with small weights at most once;
- systematic resampling, which is the same as the above except that a single uniform is recycled across all the pieces;
- Metropolis resampling, where a Markov chain converges to the distribution (ω_1, …, ω_P) on {1, …, P}.
The first three resamplers are common in the particle-system literature (incl. Nicolas Chopin's PhD thesis), but difficult to adapt to GPUs (and I always feel uncomfortable with the fact that systematic resampling relies on a single uniform!), while the last one is more unusual, but actually well suited to a parallel implementation, as sketched below. While Lawrence Murray suggests using Raftery and Lewis' (1992) assessment of the required number of Metropolis iterations to “achieve convergence”, I would instead suggest taking advantage of the toric nature of the space (as represented above) to run a random walk and wait for the equivalent of a complete cycle. In any case, this is a cool illustration of the new challenges posed by parallel implementations (like the development of proper parallel random number generators).
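Since the first three schemes are standard (and sketched in the resampling-methods post above), here is a minimal sketch of the Metropolis resampler alone. It assumes strictly positive weights; the number of iterations B, the function name, and the choice of starting each chain at its own index are illustrative choices of mine, not necessarily Murray's:

```python
import numpy as np

def metropolis_resample(weights, rng, B=30):
    """Metropolis resampling (sketch): for each of the P output slots,
    run B steps of a Metropolis chain on the index set {0, ..., P-1}
    whose stationary distribution is proportional to the (possibly
    unnormalised) weights, and keep the final index."""
    P = len(weights)
    idx = np.arange(P)                   # start each chain at its own slot
    for _ in range(B):
        prop = rng.integers(P, size=P)   # symmetric uniform proposal
        accept = rng.uniform(size=P) < weights[prop] / weights[idx]
        idx = np.where(accept, prop, idx)
    return idx

rng = np.random.default_rng(2)
w = rng.exponential(size=1024)           # unnormalised weights are fine here
new_indices = metropolis_resample(w, rng)
```

The appeal for GPUs is that each chain only ever evaluates weight ratios ω_k/ω_i, so no normalisation or cumulative sum (a collective operation on parallel hardware) is required; the bias is controlled by B, which is where Raftery and Lewis' diagnostic, or the complete-cycle alternative suggested above, comes in.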