Archive for sampling w/o replacement

sampling-importance-resampling is not equivalent to exact sampling [triste SIR]

Posted in Books, Kids, Statistics, University life with tags , , , , , , on December 16, 2019 by xi'an

Following an X validated question on the topic, I reassessed a previous impression I had that sampling-importance-resampling (SIR) is equivalent to direct sampling for a given sample size. (As suggested in the above fit between a N(2,½) target and a N(0,1) proposal.)  Indeed, when one produces a sample

x_1,\ldots,x_n \stackrel{\text{i.i.d.}}{\sim} g(x)

and resamples with replacement from this sample using the importance weights


the resulting sample


is neither “i.” nor “i.d.” since the resampling step involves a self-normalisation of the weights and hence a global bias in the evaluation of expectations. In particular, if the importance function g is a poor choice for the target f, meaning that the exploration of the whole support is imperfect, if possible (when both supports are equal), a given sample may well fail to reproduce the properties of an iid example ,as shown in the graph below where a Normal density is used for g while f is a Student t⁵ density:

Bayesian adaptive sampling

Posted in R, Statistics with tags , , , , , , , , on December 6, 2010 by xi'an

In the continuation of my earlier post on computing evidence, I read a very interesting paper by Merlise Clyde, Joyee Ghosh and Michael Littman, to appear in JCGS. It is called  Bayesian adaptive sampling for variable selection and model averaging. The sound idea at the basis of the paper is that, when one is doing variable selection (i.e. exploring a finite if large state space) in setups where the individual probabilities of the models are known (up to a constant), it is not necessary to return to models that have been already visited. Hence the move to sample models without replacement called BAS (for Bayesian adaptive sampling) in the paper. The first part discusses the way to sample without replacement a state space whose elements and probabilities are defined by a binary tree, i.e.

f(\mathbf{\gamma})=\prod_{j=1}^k \rho_{j|<j}^{\gamma_j}(1-\rho_{j|<j})^{1-\gamma_j}

(The connection with variable selection is that each level of the tree corresponds to the binary choice between including and excluding one of the variables. The tree thus has 2k endpoints/leaves for k potential variables in the model.) The cost in updating the probabilities is actually in O(k) if k is the number of levels, instead of 2k because most of the branches of the tree are unaffected by setting one final branch to probability zero. The second part deals with the adaptive and approximative issues.

Continue reading