## on control variates

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , on May 27, 2023 by xi'an

A few months ago, I had to write a thesis evaluation of Rémi Leluc’s PhD, which contained several novel Monte Carlo proposals on control variates and importance techniques. For instance, Leluc et al. (Statistics and Computing, 2021) revisits the concept of control variables by adding a perspective of control variable selection using LASSO. This prior selection is relevant since control variables are not necessarily informative about the objective function being integrated and my experience is that the more variables the less reliable the improvement. The remarkable feature of the results is in obtaining explicit and non-asymptotic bounds.

The author obtains a concentration inequality on the error resulting from the use of control variables, under strict assumptions on the variables. The associated numerical experiment illustrates the difficulties of practically implementing these principles due to the number of parameters to calibrate. I found the example of a capture-recapture experiment on ducks (European Dipper) particularly interesting, not only because we had used it in our book but also because it highlights the dependence of estimates on the dominant measure.

Based on a NeurIPS 2022 poster presentation Chapter 3 is devoted to the use of control variables in sequential Monte Carlo, where a sequence of importance functions is constructed based on previous iterations to improve the approximation of the target distribution. Under relatively strong assumptions of importance functions dominating the target distribution (which could generally be achieved by using an increasing fraction of the data in a partial posterior distribution), of sub-Gaussian tails of an intractable distribution’s residual, a concentration inequality is established for the adaptive control variable estimator.

This chapter uses a different family of control variables, based on a Stein operator introduced in Mira et al. (2016). In the case where the target is a mixture in IRd, one of our benchmarks in Cappé et al. (2008), remarkable gains are obtained for relatively high dimensions. While the computational demands of these improvements are not mentioned, the comparison with an MCMC approach (NUTS) based on the same number of particles demonstrates a clear improvement in Bayesian estimation.

Chapter 4 corresponds to a very recent arXival and presents a very original approach to control variate correction by reproducing the interest rate law through an approximation using the closest neighbor (leave-one-out) method. It requires neither control function nor necessarily additional simulation, except for the evaluation of the integral, which is rather remarkable, forming a kind of parallel with the bootstrap. (Any other approximation of the distribution would also be acceptable if available at the same computational cost.) The thesis aims to establish the convergence of the method when integration is performed by a Voronoi tessellation, which leads to an optimal rate of order n-1-2/d for quadratic error (under conditions of integrand regularity). In the alternative where the integral must be evaluated by Monte Carlo, this optimality disappears, unless a massive amount of simulations are used. Numerical illustrations cover SDEs and a Bayesian hierarchical modeling already used in Oates et al. (2017), with massive gain in both cases.

## back to a correction of the harmonic mean estimator

Posted in Books, Statistics with tags , , , , , on May 11, 2023 by xi'an

In a 2009 JCGS paper, Peter Lenk proposed a bias correction of the harmonic mean estimator, which is somewhat surprising given that the estimator usually has no variance and hence that its consistency is purely formal, since no speed of convergence can be taken for granted. In particular, the conjugate Normal model serving as a motivation leads to an infinite variance. The author is however blaming the poor behaviour of the harmonic mean estimator on the overly concentrated support of the posterior distribution, despite having no reservation about the original identity (with standard notations)

$m(x)^{-1} = \int \dfrac{\pi(\theta|x)}{f(x|\theta)}\,\text d \theta$

but suggesting the corrected

$m(x)^{-1} = \int_A \dfrac{\pi(\theta|x)}{f(x|\theta)}\,\text d \theta\big/ \Pi(A)$

although this is only true when A is within the support of the posterior. (In which case it connects with our own 2009 correction.) Opting for a set A corresponding to a “simulation support” of the posterior with a very vague meaning, if somewhat connected with the nested sampling starting set.

## why do we need importance sampling?

Posted in Books, Kids, Statistics with tags , , , , on August 14, 2022 by xi'an

A rather common question about using importance sampling, posted on X validated: why is importance sampling helping in the event the function used in the expectation has restricted support, i.e., is equal to zero with positive probability? Which is a recommendation I make each time I teach about importance sampling, namely that estimating zero is rarely necessary! In my Saturday Night answer, I tried to give some intuition about the gain brought by a correct support for the importance function, carried in the ideal case when the truncated importance function remains available with its normalising constant. But it is unclear this set of explanations managed to reach the OP.

## important Markov chains

Posted in Books, Statistics, University life with tags , , , , , , , , , , , on July 21, 2022 by xi'an

With Charly Andral (PhD, Paris Dauphine), Randal Douc, and Hugo Marival (PhD, Telecom SudParis), we just arXived a paper on importance Markov chains that merges importance sampling and MCMC. An idea already mentioned in Hastings (1970) and even earlier in Fodsick (1963), and later exploited in Liu et al.  (2003) for instance. And somewhat dual of the vanilla Rao-Backwellisation paper Randal and I wrote a (long!) while ago. Given a target π with a dominating measure π⁰≥Mπ, using a Markov kernel to simulate from this dominating measure and subsampling by the importance weight ρ does produce a new Markov chain with the desired target measure as invariant distribution. However, the domination assumption is rather unrealistic and a generic approach can be implemented without it, by defining an extended Markov chain, with the addition of the number N of replicas as the supplementary term… And a transition kernel R(n|x) on N with expectation ρ, which is a minimal(ist) assumption for the validation of the algorithm.. While this initially defines a semi-Markov chain, an extended Markov representation is also feasible, by decreasing N one by one until reaching zero, and this is most helpful in deriving convergence properties for the resulting chain, including a CLT.  While the choice of the kernel R is free, the optimal choice is associated with residual sampling, where only the fractional part of ρ is estimated by a Bernoulli simulation.

## another first

Posted in Statistics with tags , , , , , , , , on July 1, 2022 by xi'an

A question related to the earlier post on the first importance sampling in print, about the fist Markov chain Monte Carlo in print. Again uncovered by Charly, a 1973 Chemical Physics paper by Patey and Valleau, the latter inventing umbrella sampling with Torrie at about the same time. (In a 1972 paper in the same journal with Card, Valleau uses Metropolis Monte Carlo. While Hastings, also at the University of Toronto uses Markov chain sampling.)