Archive for stratified sampling

stratified MCMC

Posted in Books, pictures, Statistics with tags , , , , , , , , , , , , on December 3, 2020 by xi'an

When working last week with a student, we came across [the slides of a talk at ICERM by Brian van Koten about] a stratified MCMC method whose core idea is to solve a eigenvector equation z’=z’F associated with the masses of “partition” functions Ψ evaluated at the target. (The arXived paper is also available since 2017 but I did not check it in more details.)Although the “partition” functions need to overlap for the matrix not to be diagonal (actually the only case that does not work is when these functions are truly indicator functions). As in other forms of stratified sampling, the practical difficulty is in picking the functions Ψ so that the evaluation of the terms of the matrix F is not overly impacted by the Monte Carlo error. If spending too much time in estimating these terms, there is not a clear gain in switching to stratified sampling, which may be why it is not particularly developed in the MCMC literature….

As an interesting aside, the illustration in this talk comes from the Mexican stamp thickness data I also used in my earlier mixture papers, concerning the 1872 Hidalgo issue that was printed on different qualities of paper. This makes the number k of components somewhat uncertain, although k=3 is sometimes used as a default. Hence a parameter and simulation space of dimension 8, even though the method is used toward approximating the marginal posteriors on the weights λ¹ and λ².

stratified ABC [One World ABC webinar]

Posted in Books, Statistics, University life with tags , , , , , , , , on May 15, 2020 by xi'an

The third episode of the One World ABC seminar (Season 1!) was kindly delivered by Umberto Picchini on Stratified sampling and bootstrapping for ABC which I already if briefly discussed after BayesComp 2020. Which sounds like a million years ago… His introduction on the importance of estimating the likelihood using a kernel, while 600% justified wrt his talk, made the One World ABC seminar sounds almost like groundhog day!  The central argument is in the computational gain brought by simulating a single θ dependent [expensive] dataset followed by [cheaper] bootstrap replicates. Which turns de fact into bootstrapping the summary statistics.

If I understand correctly, the post-stratification approach of Art Owen (2013?, I cannot find the reference) corrects a misrepresentation of mine. Indeed, defining a partition with unknown probability weights seemed to me to annihilate the appeal of stratification, because the Bernoulli variance of the estimated probabilities brought back the same variability as the mother estimator. But with bootstrap, this requires only two simulations, one for the weights and one for the target. And further allows for a larger ABC tolerance in fine. Free lunch?!

The speaker in two weeks (21 May or Ascension Thursday!) is my friend and co-author Gael Martin from Monash University, who will speak on Focused Bayesian prediction, at quite a late time down under..!

delayed acceptance ABC-SMC

Posted in pictures, Statistics, Travel with tags , , , , , , , on December 11, 2017 by xi'an

Last summer, during my vacation on Skye,  Richard Everitt and Paulina Rowińska arXived a paper on delayed acceptance associated with ABC. ArXival that I missed, then! In order to decrease the number of simulations from the likelihood. As in our own delayed acceptance paper (without ABC), a cheap alternative generator is used to first reject the least likely parameters values, before possibly continuing to use a full generator. Also as lazy ABC. The first step of this ABC algorithm requires a cheap generator plus a primary tolerance ε¹ to compare the generation with the data or part of it. This may be followed by a second generation with a second tolerance level ε². The paper applies more specifically ABC-SMC as introduced in Sisson, Fan and Tanaka (2007) and reassessed in our subsequent 2009 Biometrika paper with Mark Beaumont, Jean-Marie Cornuet and Jean-Michel Marin. As well as in the ABC-SMC paper by Pierre Del Moral and Arnaud Doucet.

When looking at the version of the algorithm [Algorithm 2] based on two basic acceptance ABC steps, there are two features I find intriguing: (i) the primary step uses a cheap generator to reject early poor values of the parameter, followed by the second step involving a more expensive and exact generator, but I see no impact of the choice of this cheap generator in the acceptance probability; (ii) this is an SMC algorithm with imposed resampling at each iteration but there is no visible step for creating new weights after the resampling step. In the current presentation, it sounds like the weights do not change from the initial step, except for those turning to zero and the renormalisation transforms. Which makes the (unspecified) stratification of little interest if any. I must therefore miss a point in the implementation!

One puzzling sentence in the appendix is that the resampling algorithm used in the SMC step “ensures that every particle that is alive before resampling is represented in the resampled particles”, which reminds me of an argument [possibly a different one] made already in Sisson, Fan and Tanaka (2007) and that we could not validate in our subsequent paper. For resampling to be correct, a form of multinomial sampling must be implemented, even via variance reduction schemes like stratified or systematic sampling.

conditional sampling

Posted in R, Statistics with tags , , , , on September 5, 2016 by xi'an

An interesting question about stratified sampling came up on X validated last week, namely how to optimise a Monte Carlo estimate based on two subsequent simulations, one, X, from a marginal and one or several Y from the corresponding conditional given X, when the costs of producing those two simulations significantly differ. When looking at the variance of the Monte Carlo estimate, this variance can be minimised in the number of simulations from the marginal under a computing budget. However, when the costs of producing x, y given or (x,y) are about the same, it does not pay to replicate simulations from y given x or x given y, because this increases the randomness of the estimator and induces positive correlation between some terms in the sum. Assuming that the conditional variances are always computable, we could envision an extension to the question where for each new value of a simulated x (or y), a variable number of conditionally simulated y (or x) could be produced. Or even when those conditional variances are unknown but can be replaced with empirical versions.

The illustration comes from a bivariate normal model with correlation, for a rather arbitrary function , but the pattern remains the same, namely that iid simulations of the pair (X,Y invariably leads to a smaller variance of the estimator compared with simulation with a 1:10 (10 Y’s for one X) or 10:1 ratio between x’s and y’s. Depending on the function and the relative variances, the 1:10 or 10:1 schemes may have a similar variability.


targ=function(x,y){ log(x^2*y^4)+x^2*cos(x^2)/y*sin(y^2)}

for (i in 1:N){