## why do we need importance sampling?

Posted in Books, Kids, Statistics with tags , , , , on August 14, 2022 by xi'an

A rather common question about using importance sampling, posted on X validated: why is importance sampling helping in the event the function used in the expectation has restricted support, i.e., is equal to zero with positive probability? Which is a recommendation I make each time I teach about importance sampling, namely that estimating zero is rarely necessary! In my Saturday Night answer, I tried to give some intuition about the gain brought by a correct support for the importance function, carried in the ideal case when the truncated importance function remains available with its normalising constant. But it is unclear this set of explanations managed to reach the OP.

## important Markov chains

Posted in Books, Statistics, University life with tags , , , , , , , , , , , on July 21, 2022 by xi'an

With Charly Andral (PhD, Paris Dauphine), Randal Douc, and Hugo Marival (PhD, Telecom SudParis), we just arXived a paper on importance Markov chains that merges importance sampling and MCMC. An idea already mentioned in Hastings (1970) and even earlier in Fodsick (1963), and later exploited in Liu et al.  (2003) for instance. And somewhat dual of the vanilla Rao-Backwellisation paper Randal and I wrote a (long!) while ago. Given a target π with a dominating measure π⁰≥Mπ, using a Markov kernel to simulate from this dominating measure and subsampling by the importance weight ρ does produce a new Markov chain with the desired target measure as invariant distribution. However, the domination assumption is rather unrealistic and a generic approach can be implemented without it, by defining an extended Markov chain, with the addition of the number N of replicas as the supplementary term… And a transition kernel R(n|x) on N with expectation ρ, which is a minimal(ist) assumption for the validation of the algorithm.. While this initially defines a semi-Markov chain, an extended Markov representation is also feasible, by decreasing N one by one until reaching zero, and this is most helpful in deriving convergence properties for the resulting chain, including a CLT.  While the choice of the kernel R is free, the optimal choice is associated with residual sampling, where only the fractional part of ρ is estimated by a Bernoulli simulation.

## another first

Posted in Statistics with tags , , , , , , , , on July 1, 2022 by xi'an

A question related to the earlier post on the first importance sampling in print, about the fist Markov chain Monte Carlo in print. Again uncovered by Charly, a 1973 Chemical Physics paper by Patey and Valleau, the latter inventing umbrella sampling with Torrie at about the same time. (In a 1972 paper in the same journal with Card, Valleau uses Metropolis Monte Carlo. While Hastings, also at the University of Toronto uses Markov chain sampling.)

## of first importance

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , , on June 14, 2022 by xi'an

My PhD student Charly Andral came with the question of the birthdate of importance sampling. I was under the impression that it had been created at the same time as the plain Monte Carlo method, being essentially the same thing since

$\int_{\mathfrak X} h(x)f(x)\,\text dx = \int_{\mathfrak X} h(x)\frac{f(x)}{g(x)}g(x)\,\text dx$

hence due to von Neumann or Ulam, but he could not find a reference earlier than a 1949 proceeding publication by Hermann Kahn in a seminar on scientific computation run by IBM. Despite writing a series of Monte Carlo papers in the late 1940’s and 1950’s, Kahn is not well-known in these circles (although mentioned in Fishman’s book), while being popular to some extent for his theorisation of nuclear war escalation and deterence. (I wonder if the concept is developed in some of his earlier 1948 papers. In a 1951 paper with Goertzel, a footnote signals than the approach was called quota sampling in their earlier papers. Charly has actually traced the earliest proposal as being Kahn’s, in a 14 June 1949 RAND preprint, beating Goertzel’s Oak Ridge National Laboratory preprint on quota sampling and importance functions by five days.)

(As a further marginalia, Kahn wrote with T.E. Harris an earlier preprint on Monte Carlo methods in April 1949, the same Harris as in Harris recurrence.)

## rethinking the ESS published!

Posted in Statistics with tags , , , , , , , , on May 3, 2022 by xi'an

Our paper Rethinking the Effective Sample Size, with Victor Elvira (the driving force behind the paper!) and Luca Martino, has now been published in the International Statistical Review! As discussed earlier on this blog, we wanted to re-evaluate the pros and cons of the effective sample size (ESS), as a tool assessing the quality [or lack thereof] of a Monte Carlo approximation. It is particularly exploited in the specific context of importance sampling. Following a 1992 construction by Augustine Kong, his approximation has been widely used in the last 25 years, in part due to its simplicity as a practical rule of thumb. However, we show in this paper that the assumptions made in the derivation of this approximation make it difficult to consider it as a reasonable approximation of the ESS. Note that this reevaluation does not cover the use of ESS for Markov chain Monte Carlo algorithms, although there would also be much to tell about it..!