## sampling-importance-resampling is not equivalent to exact sampling [triste SIR]

Posted in Books, Kids, Statistics, University life with tags , , , , , , on December 16, 2019 by xi'an Following an X validated question on the topic, I reassessed a previous impression I had that sampling-importance-resampling (SIR) is equivalent to direct sampling for a given sample size. (As suggested in the above fit between a N(2,½) target and a N(0,1) proposal.)  Indeed, when one produces a sample $x_1,\ldots,x_n \stackrel{\text{i.i.d.}}{\sim} g(x)$

and resamples with replacement from this sample using the importance weights $f(x_1)g(x_1)^{-1},\ldots,f(x_n)g(x_n)^{-1}$

the resulting sample $y_1,\ldots,y_n$

is neither “i.” nor “i.d.” since the resampling step involves a self-normalisation of the weights and hence a global bias in the evaluation of expectations. In particular, if the importance function g is a poor choice for the target f, meaning that the exploration of the whole support is imperfect, if possible (when both supports are equal), a given sample may well fail to reproduce the properties of an iid example ,as shown in the graph below where a Normal density is used for g while f is a Student t⁵ density:  In the continuation of my earlier post on computing evidence, I read a very interesting paper by Merlise Clyde, Joyee Ghosh and Michael Littman, to appear in JCGS. It is called  Bayesian adaptive sampling for variable selection and model averaging. The sound idea at the basis of the paper is that, when one is doing variable selection (i.e. exploring a finite if large state space) in setups where the individual probabilities of the models are known (up to a constant), it is not necessary to return to models that have been already visited. Hence the move to sample models without replacement called BAS (for Bayesian adaptive sampling) in the paper. The first part discusses the way to sample without replacement a state space whose elements and probabilities are defined by a binary tree, i.e. $f(\mathbf{\gamma})=\prod_{j=1}^k \rho_{j|