## checking for finite variance of importance samplers

Posted in R, Statistics, Travel, University life on June 11, 2014 by xi'an

Over a welcome curry yesterday night in Edinburgh I read this 2008 paper by Koopman, Shephard and Creal, testing the assumptions behind importance sampling, whose purpose is to check on-line for (in)finite variance in an importance sampler, based on the empirical distribution of the importance weights. To this end, the authors use the upper tail of the weights and a limit theorem that provides the limiting distribution as a type of Pareto distribution

$\dfrac{1}{\beta}\left(1+\xi z/\beta \right)^{-1-1/\xi}$

over (0,∞). They then implement a series of asymptotic tests, like the likelihood ratio, Wald and score tests, to assess whether or not the shape parameter ξ of the Pareto distribution is below ½, the condition for the weights to have finite variance. While there is nothing wrong with this approach, which produces a statistically validated diagnosis, I still wonder at the added value from a practical perspective, as raw graphs of the estimation sequence itself should exhibit similar jumps and a similar lack of stabilisation as the ones seen in the various figures of the paper. Alternatively, a few repeated calls to the importance sampler should disclose the poor convergence properties of the sampler, as in the above graph, where the blue line indicates the true value of the integral.
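Out of curiosity, here is a minimal R sketch of the idea (not the authors' implementation, and with an illustrative Cauchy target and Normal proposal of my own choosing): fit the generalised Pareto density above to the exceedances of the largest importance weights by maximum likelihood, then compare with the fit obtained when ξ is fixed at ½, via a likelihood ratio statistic.

```r
# illustrative sketch: generalised Pareto fit to the upper tail of the
# importance weights, plus a likelihood ratio statistic for xi = 1/2
set.seed(1)
x <- rnorm(1e4)                          # Normal proposal (illustrative choice)
w <- dcauchy(x) / dnorm(x)               # weights for a Cauchy target
z <- sort(w, decreasing = TRUE)[1:500]
z <- z - min(z)                          # exceedances over the 500th largest weight

# negative log-likelihood of the density (1/beta)(1 + xi z/beta)^(-1-1/xi)
gpd.nll <- function(par, z) {
  beta <- exp(par[1]); xi <- exp(par[2]) # log-parametrisation keeps both positive
  length(z) * log(beta) + (1 + 1/xi) * sum(log1p(xi * z / beta))
}
free <- optim(c(0, 0), gpd.nll, z = z)   # unrestricted maximum likelihood fit
null <- optimize(function(lb) gpd.nll(c(lb, log(.5)), z), c(-20, 20))
lrt <- 2 * (null$objective - free$value) # large values reject xi = 1/2
```

The number of retained weights (500 out of 10⁴) is arbitrary here; in practice the threshold choice is itself a delicate tuning issue for such tail fits.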

## non-negative unbiased estimators

Posted in Books, Kids, Statistics, University life on October 3, 2013 by xi'an

Pierre Jacob and Alexandre Thiéry just arXived a highly pertinent paper on the most debated issue of non-negative unbiased estimators (of positive quantities). If you remember that earlier post of mine, I mentioned the issue in connection with the Russian roulette estimator(s) of Mark Girolami et al. And, as Pierre and Alexandre point out in the paper, there is also a clear and direct connection with the Bernoulli factory problem. And with our Vanilla Rao-Blackwellisation technique (sadly overlooked, once more!).

The first thing I learned from the paper is how to turn a converging sequence into an unbiased estimator. If (E_n) is this converging sequence, with limit μ, and N is an integer-valued random variable with unbounded support (with the convention E_{-1}=0), then

$\sum_{n=0}^N (E_n-E_{n-1}) / \mathbb{P}(N\ge n)$

is unbiased…! Amazing. Even though the choice of the distribution of N matters for getting a finite-variance estimator, this transform is simply amazing. (Of course, once one looks at it, one realises it is the “old” trick of turning a series into a sequence and vice versa. Still…!) And it can then be reused to get an unbiased estimator of almost any transform of μ.
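For what it is worth, here is a toy R illustration of the trick, entirely of my own construction (neither the sequence nor the truncation law is taken from the paper): a deterministic sequence E_n = μ − 2⁻ⁿ and a geometric truncation variable N.

```r
# toy check of the unbiasedness of the randomly truncated telescoping sum,
# with E_n = mu - 2^(-n) converging to mu and N geometrically distributed
set.seed(2)
mu <- 2
E <- function(n) mu - 2^(-n)
p <- 0.5                          # N ~ Geom(p) on {0,1,2,...}: P(N >= n) = (1-p)^n
unbiased <- function() {
  N <- rgeom(1, p)
  n <- 0:N
  En <- E(n)
  Eprev <- c(0, En[-length(En)])  # convention E_{-1} = 0
  sum((En - Eprev) / (1 - p)^n)   # weighted telescoping sum
}
est <- replicate(1e5, unbiased())
mean(est)                         # close to mu = 2
```

In this particular pairing the geometric tail of P(N≥n) exactly matches the geometric convergence of E_n, which keeps the variance of the estimator finite; a lighter-tailed N would not.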

The second novel thing in the paper is the characterisation of impossible cases for non-negative unbiased estimators. For instance, if the original sequence has unbounded support, there cannot be such an estimator. If the support is a half-line, the transform must be monotonic. If the support is a bounded interval (a,b), then the transform must be bounded from below by a polynomial bound

$\epsilon\,\min\{(x-a)^m,(b-x)^n\}$

(where the extra parameters obviously relate to the transform). (In this latter case, the authors also show how to derive a Bernoulli estimator from the original unbiased estimator.)

## IS vs. self-normalised IS

Posted in Books, R, Statistics, University life on March 12, 2012 by xi'an

I was grading my Master projects this morning and came upon this graph:

which compares the variability of an importance-sampling estimator versus its self-normalised alternative… This is an interesting case in that self-normalisation considerably degrades the quality of the approximation in that setting. In other cases, self-normalisation may bring a clear improvement. (This reminded me of a recent email from David Einstein complaining about imprecisions in the importance sampling section of Monte Carlo Statistical Methods, incl. the fact that self-normalisation does not truly address the infinite variance issue. His criticism is appropriate: we should rewrite this section towards more precise statements…)

Maybe this is to be expected. Here is a similar comparison for finite and infinite variance cases:

# compares raw and self-normalised importance sampling approximations of
# E[sqrt(|Y|)] under a Cauchy target, using a Student t proposal with df
# degrees of freedom, over 100 parallel runs of size N (N at least 100)
compar=function(df,N){
y=matrix(rt(df=df,n=N*100),nrow=100)        #100 samples of size N
h=sqrt(abs(y))*dcauchy(y)/dt(y,df=df)       #integrand times weight
w=dcauchy(y)/dt(y,df=df)                    #importance weights
tone=t(apply(h,1,cumsum)/(1:N))             #raw IS running averages
wone=t(apply(h,1,cumsum)/apply(w,1,cumsum)) #self-normalised running averages
ttwo=apply(tone,2,max)
wtwo=apply(wone,2,max)
three=apply(tone,2,min)
whree=apply(wone,2,min)
plot(apply(tone,2,mean),col="white",ylim=c(min(three),max(ttwo)))
if (diff(range(tone[,100]))<diff(range(wone[,100]))){
  polygon(c(1:N,N:1),c(whree,rev(wtwo)),col="chocolate")
  polygon(c(1:N,N:1),c(three,rev(ttwo)),col="wheat")
} else {
  polygon(c(1:N,N:1),c(three,rev(ttwo)),col="chocolate")
  polygon(c(1:N,N:1),c(whree,rev(wtwo)),col="wheat")}
}


The outcome is shown above, with an increased variability in the finite variance case (df=.5, left) and a (meaningful?) decrease in the infinite variance case (df=2.5, right).