Archive for multiple importance methods

optimal approximations for importance sampling

Posted in Mountains, pictures, Statistics, Travel with tags , , , , , , , , , , , on August 17, 2018 by xi'an

“…building such a zero variance estimator is most of the times not practical…”

As I was checking [while looking at Tofino inlet from my rental window] on optimal importance functions following a question on X validated, I came across this arXived note by Pantaleoni and Heitz, where they suggest using weighted sums of step functions to reach minimum variance. However, the difficulty with probability densities that are step functions is that they necessarily have a compact support, which thus make them unsuitable for targeted integrands with non-compact support. And making the purpose of the note and the derivation of the optimal weights moot. It points out its connection with the reference paper of Veach and Guibas (1995) as well as He and Owen (2014), a follow-up to the other reference paper by Owen and Zhou (2000).

multiple importance sampling

Posted in Books, Statistics, University life with tags , , , , , , , , on November 20, 2015 by xi'an

“Within this unified context, it is possible to interpret that all the MIS algorithms draw samples from a equal-weighted mixture distribution obtained from the set of available proposal pdfs.”

In a very special (important?!) week for importance sampling!, Elvira et al. arXived a paper about generalized multiple importance sampling. The setting is the same as in earlier papers by Veach and Gibas (1995) or Owen and Zhou (2000) [and in our AMIS paper], namely a collection of importance functions and of simulations from those functions. However, there is no adaptivity for the construction of the importance functions and no Markov (MCMC) dependence on the generation of the simulations.

“One of the goals of this paper is to provide the practitioner with solid theoretical results about the superiority of some specific MIS schemes.”

One first part deals with the fact that a random point taken from the conjunction of those samples is distributed from the equiweighted mixture. Which was a fact I had much appreciated when reading Owen and Zhou (2000). From there, the authors discuss the various choices of importance weighting. Meaning the different degrees of Rao-Blackwellisation that can be applied to the sample. As we discovered in our population Monte Carlo research [which is well-referred within this paper], conditioning too much leads to useless adaptivity. Again a sort of epiphany for me, in that a whole family of importance functions could be used for the same target expectation and the very same simulated value: it all depends on the degree of conditioning employed for the construction of the importance function. To get around the annoying fact that self-normalised estimators are never unbiased, the authors borrow Liu’s (2000) notion of proper importance sampling estimators, where the ratio of the expectations is returning the right quantity. (Which amounts to recover the correct normalising constant(s), I believe.) They then introduce five (5!) different possible importance weights that all produce proper estimators. However, those weights correspond to different sampling schemes, so do not apply to the same sample. In other words, they are not recycling weights as in AMIS. And do not cover the adaptive cases where the weights and parameters of the different proposals change along iterations. Unsurprisingly, the smallest variance estimator is the one based on sampling without replacement and an importance weight made of the entire mixture. But this result does not apply for the self-normalised version, whose variance remains intractable.

I find this survey of existing and non-existing multiple importance methods quite relevant and a must-read for my students (and beyond!). My reservations (for reservations there must be!) are that the study stops short of pushing further the optimisation. Indeed, the available importance functions are not equivalent in terms of the target and hence weighting them equally is sub-efficient. The adaptive part of the paper broaches upon this issue but does not conclude.