Archive for self-normalised importance sampling

sampling-importance-resampling is not equivalent to exact sampling [triste SIR]

Posted in Books, Kids, Statistics, University life on December 16, 2019 by xi'an

Following an X validated question on the topic, I reassessed a previous impression I had that sampling-importance-resampling (SIR) is equivalent to direct sampling for a given sample size. (As suggested by the above fit between a N(2,½) target and a N(0,1) proposal.) Indeed, when one produces a sample

x_1,\ldots,x_n \stackrel{\text{i.i.d.}}{\sim} g(x)

and resamples with replacement from this sample using the importance weights

f(x_1)g(x_1)^{-1},\ldots,f(x_n)g(x_n)^{-1}

the resulting sample

y_1,\ldots,y_n

is neither “i.” nor “i.d.”, since the resampling step involves a self-normalisation of the weights and hence a global bias in the evaluation of expectations. In particular, if the importance function g is a poor choice for the target f, in the sense that the exploration of the whole support of f is imperfect (even when both supports coincide), a given sample may well fail to reproduce the properties of an iid sample, as shown in the graph below, where a Normal density is used for g while f is a Student t density with 5 degrees of freedom.
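For concreteness, here is a minimal R sketch of that SIR experiment, with the sample size and the qqplot diagnostic being my own choices for illustration rather than those behind the original graph:

# sampling-importance-resampling: proposal g=N(0,1), target f=Student t with 5 df
n=1e4
x=rnorm(n)                         # x_1,...,x_n iid from g
w=dt(x,df=5)/dnorm(x)              # importance weights f(x_i)/g(x_i)
y=sample(x,n,replace=TRUE,prob=w)  # resampling step (sample() normalises prob internally)
qqplot(y,rt(n,df=5));abline(0,1)   # compare with an exact iid sample from f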

IS vs. self-normalised IS

Posted in Books, R, Statistics, University life on March 12, 2012 by xi'an

I was grading Master's projects this morning and came upon this graph:

which compares the variability of an importance-sampling estimator with that of its self-normalised alternative… This is an interesting case in that self-normalisation considerably degrades the quality of the approximation in this setting, while in other cases it may bring a clear improvement. (This reminded me of a recent email from David Einstein complaining about imprecisions in the importance sampling section of Monte Carlo Statistical Methods, including the fact that self-normalisation does not truly address the infinite variance issue. His criticism is appropriate: we should rewrite this section towards more precise statements…)
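For reference, the two estimators under comparison fit in a couple of lines of R, here with (as in the function below) a Cauchy target f, a Student t proposal g, and the integrand h(x)=√|x|; the sample size and degrees of freedom are my own picks for the sketch:

# plain IS versus self-normalised IS estimates of E_f[h(X)]
n=1e4;df=2.5
x=rt(n,df=df)                 # draws from the proposal g
w=dcauchy(x)/dt(x,df=df)      # importance weights f(x_i)/g(x_i)
h=sqrt(abs(x))                # integrand h(x)=sqrt(|x|)
mean(h*w)                     # plain IS estimate (unbiased)
sum(h*w)/sum(w)               # self-normalised IS estimate (ratio of weighted sums)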

Maybe this is to be expected. Here is a similar comparison for finite and infinite variance cases:

compar=function(df,N){
# 100 parallel replications of N draws from the proposal g=t(df)
y=matrix(rt(df=df,n=N*100),nrow=100)
# integrand times weight, h(x)=sqrt(|x|), target f=Cauchy
hw=sqrt(abs(y))*dcauchy(y)/dt(y,df=df)
# importance weights f(y)/g(y)
w=dcauchy(y)/dt(y,df=df)
# running plain IS estimates (cumulated sums over growing sample sizes)
tone=t(apply(hw,1,cumsum)/(1:N))
# running self-normalised IS estimates (ratios of cumulated sums)
wone=t(apply(hw,1,cumsum)/apply(w,1,cumsum))
# envelopes (min and max over the 100 replications) at each sample size
ttwo=apply(tone,2,max);tthree=apply(tone,2,min)
wtwo=apply(wone,2,max);wthree=apply(wone,2,min)
# empty plot covering both envelopes
plot(apply(tone,2,mean),type="n",xlab="sample size",ylab="estimate",
 ylim=range(tthree,ttwo,wthree,wtwo))
# draw the wider envelope (at the final sample size) first so the narrower one stays visible
if (diff(range(tone[,N]))<diff(range(wone[,N]))){
 polygon(c(1:N,N:1),c(wthree,rev(wtwo)),col="chocolate")
 polygon(c(1:N,N:1),c(tthree,rev(ttwo)),col="wheat")
}else{
 polygon(c(1:N,N:1),c(tthree,rev(ttwo)),col="chocolate")
 polygon(c(1:N,N:1),c(wthree,rev(wtwo)),col="wheat")}
}
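As a usage note, the two panels discussed below should be reproducible (up to Monte Carlo noise) along these lines, with the value of N being my own choice:

par(mfrow=c(1,2))
compar(df=.5,N=1000)    # finite variance case
compar(df=2.5,N=1000)   # infinite variance case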

The outcome is shown above, with an increased variability of the self-normalised estimator in the finite variance case (df=.5, left) and a (meaningful?) decrease in the infinite variance case (df=2.5, right).