**T**opi Paananen, Juho Piironen, Paul-Christian Bürkner and Aki Vehtari have recently arXived a work on constructing an adapted importance (sampling) distribution. The beginning is more a review than a new contribution, covering the earlier work by Vehtari, Gelman and Gabry (2017): estimating the Pareto rate for the importance weight distribution helps in assessing whether or not this distribution allows for a (necessary) second moment. In case it does not (seem to), the authors propose an affine transform of the importance distribution, using the earlier sample to match the first two moments of the target distribution. Or of the targeted function. This adaptation is controlled by the same Pareto rate technique, as in the above picture (from the paper). Anticipating a natural objection about the poor performance of the earlier samples, the paper suggests using robust estimators of these moments, for instance via Pareto smoothing. It also suggests using multiple importance sampling as a way to regularise and robustify the estimates. While I buy the argument of fitting the target moments to achieve a better fit of the importance sampling, I remain unclear as to why an affine transform would change the (poor) tail behaviour of the importance sampler. And hence as to why it would apply in full generality. An alternative could consist in finding appropriate Box-Cox transforms, although the difficulty would certainly increase with the dimension.
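To make the moment-matching step concrete, here is a minimal sketch in plain Python of one affine adaptation round. Everything in it is an assumption made for illustration and not the authors' code: the Gaussian target N(2, 0.7²), the N(0, 1) initial proposal, the helper names, and the use of the effective sample size as a crude stand-in for the Pareto rate diagnostic, which is not implemented here.

```python
import math
import random


def normal_logpdf(x, mean, sd):
    """Log-density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def normalise(log_weights):
    """Turn log-weights into weights summing to one (max-shifted for stability)."""
    top = max(log_weights)
    raw = [math.exp(lw - top) for lw in log_weights]
    total = sum(raw)
    return [r / total for r in raw]


def effective_sample_size(weights):
    """ESS = 1 / sum of squared normalised weights."""
    return 1.0 / sum(w * w for w in weights)


# Hypothetical toy setting: target N(2, 0.7^2), initial proposal N(0, 1).
def log_target(x):
    return normal_logpdf(x, 2.0, 0.7)


def log_proposal(x):
    return normal_logpdf(x, 0.0, 1.0)


rng = random.Random(1)
n = 5000
draws = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Step 1: ordinary importance weights for the initial proposal.
weights = normalise([log_target(x) - log_proposal(x) for x in draws])
ess_before = effective_sample_size(weights)

# Step 2: plain and importance-weighted first two moments of the draws.
mean_g = sum(draws) / n
sd_g = math.sqrt(sum((x - mean_g) ** 2 for x in draws) / n)
mean_w = sum(w * x for w, x in zip(weights, draws))
sd_w = math.sqrt(sum(w * (x - mean_w) ** 2 for w, x in zip(weights, draws)))

# Step 3: affine map T(x) = mean_w + scale * (x - mean_g) applied to every draw.
scale = sd_w / sd_g
moved = [mean_w + scale * (x - mean_g) for x in draws]

# Step 4: T(x) has density g(x) / scale, so the new log-weight is
# log f(T(x)) - log g(x) + log(scale).
new_weights = normalise(
    [log_target(y) - log_proposal(x) + math.log(scale) for x, y in zip(draws, moved)]
)
ess_after = effective_sample_size(new_weights)

print(f"ESS before: {ess_before:.0f}, after: {ess_after:.0f} (n = {n})")
```

In this toy case the proposal and the target share the same Gaussian tails once they are aligned, so the affine move is enough. It says nothing about the situation questioned in the post, where the proposal tails are too light whatever the location and scale.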

## Archive for finite variance

## improved importance sampling via iterated moment matching

Posted in Statistics with tags curse of dimensionality, finite variance, importance sampling, infinite variance estimators, Pareto smoothed importance sampling on August 1, 2019 by xi'an

## yet more questions about Monte Carlo Statistical Methods

Posted in Books, Statistics, University life with tags Brigham Young University, Cauchy-Schwarz inequality, finite variance, importance sampling, Monte Carlo Statistical Methods, Provo, simulation, textbook, typos, Utah, variance reduction on December 8, 2011 by xi'an

**A**s a coincidence, here is the third email I received this week about typos in *Monte Carlo Statistical Methods*, from Peng Yu this time. (Which suits me well in terms of posts as I am currently travelling to Provo, Utah!)

I’m reading the section on importance sampling. But there are a few cases in your book MCSM2 that are not clear to me.

On page 96: “Theorem 3.12 suggests looking for distributions g for which |h|f/g is almost constant with finite variance.”

What is the precise meaning of “almost constant”? If |h|f/g is almost constant, how come its variance is not finite?

“Almost constant” is not a well-defined property, I am afraid. By this sentence on page 96 we meant using densities g that made *|h|f/g* as little varying as possible while being manageable. Hence the insistence on the finite variance. Of course, the closer *|h|f/g* is to a constant function the more likely the variance is to be finite.
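A small numerical sketch may help picture what "almost constant" buys. The setting is a made-up illustration, not an example from the book: *f* is the Exp(1) density, *h(x) = x*, and three proposals *g* are compared. With *g* the Gamma(2, 1) density, *|h|f/g* is exactly constant; with *g = f* it varies but has finite variance; with the lighter-tailed Exp(3) proposal the mean is still 1 but the second moment ∫ x² eˣ/3 dx is infinite.

```python
import math
import random
import statistics

rng = random.Random(0)
n = 2000


def h(x):
    return x


def f(x):
    """Exp(1) density on x > 0."""
    return math.exp(-x)


def g_matched(x):
    """Gamma(shape=2, scale=1) density, proportional to |h| f."""
    return x * math.exp(-x)


def g_light(x):
    """Exp(rate=3) density, with lighter tails than f."""
    return 3.0 * math.exp(-3.0 * x)


# Each list holds the n terms h(x) f(x) / g(x) whose average estimates E_f[h] = 1.
plain_terms = [h(rng.expovariate(1.0)) for _ in range(n)]  # g = f
matched_terms = [
    h(x) * f(x) / g_matched(x)
    for x in (rng.gammavariate(2.0, 1.0) for _ in range(n))
]
light_terms = [
    h(x) * f(x) / g_light(x)
    for x in (rng.expovariate(3.0) for _ in range(n))
]

for name, terms in [
    ("g = f", plain_terms),
    ("g = Gamma(2,1)", matched_terms),
    ("g = Exp(3)", light_terms),
]:
    print(name, statistics.mean(terms), statistics.pstdev(terms))
```

The empirical standard deviation printed for the Exp(3) proposal is not to be trusted, since the quantity it estimates does not exist; it will typically look deceptively small and jump when a rare large draw occurs.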

“It is important to note that although the finite variance constraint is not necessary for the convergence of (3.8) and of (3.11), importance sampling performs quite poorly when (3.12) ….”

It is not obvious to me why when (3.12) importance sampling performs poorly. I might have overlooked some very simple facts. Would you please remind me why it is the case? From the previous discussion in the same section, it seems that h(x) is missing in (3.12). I think that (3.12) should be (please compare with the first equation in section 3.3.2)

The reason for preferring a finite variance of *f/g*, and hence for ruling out (3.12), is that we would like the importance function *g* to work well for most integrable functions *h*. Hence a requirement that the importance weight *f/g* itself behaves well. It guarantees some robustness across the *h*’s and also avoids checking for the finite variance (as in your displayed equation) for all functions *h* that are square-integrable against *g*, by virtue of the Cauchy-Schwarz inequality.
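As a hedged complement, here are two standard facts written out, neither copied from the book. The first assumes the stronger condition that *f/g* is bounded by a constant *M*, in which case every *h* with a finite second moment under *f* has a finite-variance importance sampling estimator. The second is a Gaussian toy case, chosen here for illustration, where the second moment of the weight is available in closed form.

```latex
% Assumption: f(x)/g(x) \le M for all x (stronger than a finite variance of f/g).
\mathbb{E}_g\!\left[\frac{h^2(X)\,f^2(X)}{g^2(X)}\right]
  = \int h^2(x)\,\frac{f(x)}{g(x)}\,f(x)\,\mathrm{d}x
  \le M\,\mathbb{E}_f\!\left[h^2(X)\right].

% Toy case: f is the N(0,1) density and g the N(0,\sigma^2) density.
\mathbb{E}_g\!\left[\frac{f^2(X)}{g^2(X)}\right]
  = \int \frac{f^2(x)}{g(x)}\,\mathrm{d}x
  = \begin{cases}
      \dfrac{\sigma^2}{\sqrt{2\sigma^2-1}} & \text{if } \sigma^2 > \tfrac12,\\[2ex]
      +\infty & \text{otherwise.}
    \end{cases}
```

In the toy case the weight is bounded (with *M* = σ) as soon as σ ≥ 1, has a finite second moment without being bounded for 1/2 < σ² < 1, and has an infinite second moment once the proposal tails are too light, σ² ≤ 1/2.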