## improved importance sampling via iterated moment matching

Posted in Statistics with tags , , , , on August 1, 2019 by xi'an

Topi Paananen, Juho Piironen, Paul-Christian Bürkner and Aki Vehtari have recently arXived a work on constructing an adapted importance (sampling) distribution. The beginning is more a review than a new contribution, covering the earlier work by Vehtari, Gelman  and Gabri (2017): estimating the Pareto rate for the importance weight distribution helps in assessing whether or not this distribution allows for a (necessary) second moment. In case it does not (seem to), the authors propose an affine transform of the importance distribution, using the earlier sample to match the first two moments of the distribution. Or of the targeted function. Adaptation that is controlled by the same Pareto rate technique, as in the above picture (from the paper). Predicting a natural objection as to the poor performances of the earlier samples, the paper suggests to use robust estimators of these moments, for instance via Pareto smoothing. It also suggests using multiple importance sampling as a way to regularise and robustify the estimates. While I buy the argument of fitting the target moments to achieve a better fit of the importance sampling, I remain unclear as to why an affine transform would change the (poor) tail behaviour of the importance sampler. Hence why it would apply in full generality. An alternative could consist in finding appropriate Box-Cox transforms, although the difficulty would certainly increase with the dimension.

## yet more questions about Monte Carlo Statistical Methods

Posted in Books, Statistics, University life with tags , , , , , , , , , , on December 8, 2011 by xi'an

As a coincidence, here is the third email I this week about typos in Monte Carlo Statistical Method, from Peng Yu this time. (Which suits me well in terms of posts as  I am currently travelling to Provo, Utah!)

I’m reading the section on importance sampling. But there are a few cases in your book MCSM2 that are not clear to me.

On page 96: “Theorem 3.12 suggests looking for distributions g for which |h|f/g is almost constant with finite variance.”

What is the precise meaning of “almost constant”? If |h|f/g is almost constant, how come its variance is not finite?

“Almost constant” is not a well-defined property, I am afraid. By this sentence on page 96 we meant using densities g that made |h|f/g as little varying as possible while being manageable. Hence the insistence on the finite variance. Of course, the closer |h|f/g is to a constant function the more likely the variance is to be finite.

“It is important to note that although the finite variance constraint is not necessary for the convergence of (3.8) and of (3.11), importance sampling performs quite poorly when (3.12) ….”

It is not obvious to me why when (3.12) importance sampling performs poorly. I might have overlooked some very simple facts. Would you please remind me why it is the case? From the previous discussion in the same section, it seems that h(x) is missing in (3.12). I think that (3.12) should be (please compare with the first equation in section 3.3.2)

$\int h^2(x) f^2(x) / g(x) \text{d}x = + \infty$

The preference for a finite variance of f/g and against (3.12) is that we would like the importance function g to work well for most integrable functions h. Hence a requirement that the importance weight f/g itself behaves well. It guarantees some robustness across the h‘s and also avoids checking for the finite variance (as in your displayed equation) for all functions h that are square-integrable against g, by virtue of the Cauchy-Schwarz inequality.