on control variates

Posted in Books, Kids, Statistics, University life on May 27, 2023 by xi'an

A few months ago, I had to write a thesis evaluation of Rémi Leluc’s PhD, which contained several novel Monte Carlo proposals on control variates and importance sampling techniques. For instance, Leluc et al. (Statistics and Computing, 2021) revisits the concept of control variates by adding a step of control variate selection using LASSO. This prior selection is relevant since control variates are not necessarily informative about the objective function being integrated, and my experience is that the more variates, the less reliable the improvement. The remarkable feature of the results is in obtaining explicit and non-asymptotic bounds.
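As a minimal illustration of the basic mechanism (and not of the LASSO selection step studied in the paper), here is a Python sketch with a single control variate and a least-squares coefficient. The integrand exp(u), the control u - 1/2, and the name `control_variate_estimate` are all assumptions made for the example.

```python
import math
import random


def control_variate_estimate(f, h, sample):
    """Estimate E[f(X)] with one control variate h such that E[h(X)] = 0.

    Returns the corrected estimate, the plain Monte Carlo average,
    and the fitted least-squares coefficient beta.
    """
    fx = [f(x) for x in sample]
    hx = [h(x) for x in sample]
    n = len(sample)
    f_bar = sum(fx) / n
    h_bar = sum(hx) / n
    # Least-squares coefficient: empirical cov(f, h) / var(h).
    cov_fh = sum((a - f_bar) * (b - h_bar) for a, b in zip(fx, hx))
    var_h = sum((b - h_bar) ** 2 for b in hx)
    beta = cov_fh / var_h
    # Subtract the fitted multiple of the (zero-mean) control.
    return f_bar - beta * h_bar, f_bar, beta


# Hypothetical toy problem: integrate exp(u) over [0, 1], true value e - 1.
rng = random.Random(0)
sample = [rng.random() for _ in range(2000)]
cv, plain, beta = control_variate_estimate(math.exp, lambda u: u - 0.5, sample)
print(f"plain MC: {plain:.4f}  with control variate: {cv:.4f}  truth: {math.e - 1:.4f}")
```

With several candidate controls the coefficient becomes a regression vector, which is where a selection step such as the LASSO would enter.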

The author obtains a concentration inequality on the error resulting from the use of control variates, under strict assumptions on the variates. The associated numerical experiment illustrates the difficulties of practically implementing these principles due to the number of parameters to calibrate. I found the example of a capture-recapture experiment on ducks (European Dipper) particularly interesting, not only because we had used it in our book but also because it highlights the dependence of estimates on the dominating measure.

Based on a NeurIPS 2022 poster presentation, Chapter 3 is devoted to the use of control variates in sequential Monte Carlo, where a sequence of importance functions is constructed based on previous iterations to improve the approximation of the target distribution. Under relatively strong assumptions of importance functions dominating the target distribution (which could generally be achieved by using an increasing fraction of the data in a partial posterior distribution) and of sub-Gaussian tails of an intractable distribution’s residual, a concentration inequality is established for the adaptive control variate estimator.

This chapter uses a different family of control variates, based on a Stein operator introduced in Mira et al. (2016). In the case where the target is a mixture in $\mathbb{R}^d$, one of our benchmarks in Cappé et al. (2008), remarkable gains are obtained for relatively high dimensions. While the computational demands of these improvements are not mentioned, the comparison with an MCMC approach (NUTS) based on the same number of particles demonstrates a clear improvement in Bayesian estimation.
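To fix ideas on what a Stein-based control variate looks like, here is a toy Python sketch. It assumes the standard first-order (Langevin) Stein operator in one dimension, h(x) = φ'(x) + φ(x) (log p)'(x), which has zero mean under p for suitably regular φ; the thesis and Mira et al. may use a different form. The target N(0,1), the test function φ(x) = x, and the helper names are hypothetical choices for the example.

```python
import random


def stein_control(phi, dphi, score):
    """Build h(x) = phi'(x) + phi(x) * score(x), with E_p[h(X)] = 0."""
    return lambda x: dphi(x) + phi(x) * score(x)


def cv_estimate(fx, hx):
    """Control variate estimate of the mean of fx using zero-mean controls hx."""
    n = len(fx)
    f_bar = sum(fx) / n
    h_bar = sum(hx) / n
    cov_fh = sum((a - f_bar) * (b - h_bar) for a, b in zip(fx, hx))
    var_h = sum((b - h_bar) ** 2 for b in hx)
    return f_bar - (cov_fh / var_h) * h_bar


# Standard normal target: score(x) = d/dx log p(x) = -x.
# With phi(x) = x the control is h(x) = 1 - x**2.
h = stein_control(phi=lambda x: x, dphi=lambda x: 1.0, score=lambda x: -x)

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
fx = [x * x for x in xs]            # integrand f(x) = x**2, with E[f] = 1
plain = sum(fx) / len(fx)
corrected = cv_estimate(fx, [h(x) for x in xs])
print(f"plain MC: {plain:.4f}  Stein control variate: {corrected:.4f}")
```

In this deliberately favourable case the integrand is an affine function of the control, so the corrected estimate equals 1 up to rounding error whatever the sample. Realistic targets only achieve a partial reduction.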

Chapter 4 corresponds to a very recent arXival and presents a very original approach to control variate correction by reproducing the distribution of interest through an approximation using the nearest neighbor (leave-one-out) method. It requires neither control function nor necessarily additional simulation, except for the evaluation of the integral, which is rather remarkable, forming a kind of parallel with the bootstrap. (Any other approximation of the distribution would also be acceptable if available at the same computational cost.) The thesis aims to establish the convergence of the method when integration is performed by a Voronoi tessellation, which leads to an optimal rate of order $n^{-1-2/d}$ for quadratic error (under conditions of integrand regularity). In the alternative where the integral must be evaluated by Monte Carlo, this optimality disappears, unless a massive number of simulations is used. Numerical illustrations cover SDEs and a Bayesian hierarchical model already used in Oates et al. (2017), with massive gains in both cases.
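A one-dimensional Python sketch may help picture the leave-one-out construction. This is my own reading of the idea, under simplifying assumptions, rather than the estimator as implemented in the thesis: uniform sampling on [0, 1], where the Voronoi cells are intervals between midpoints and the integral of the nearest neighbor surrogate is therefore available exactly. All function names are hypothetical.

```python
import math
import random


def nn_integral(xs, ys):
    """Exact integral over [0, 1] of the 1-nearest-neighbor interpolant.

    In one dimension the Voronoi cells are the intervals bounded by
    midpoints between consecutive sorted points.
    """
    pairs = sorted(zip(xs, ys))
    total, left = 0.0, 0.0
    for k, (x, y) in enumerate(pairs):
        right = 1.0 if k == len(pairs) - 1 else (x + pairs[k + 1][0]) / 2
        total += y * (right - left)
        left = right
    return total


def nn_predict(xs, ys, x0):
    """Value of the 1-nearest-neighbor interpolant at x0."""
    k = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[k]


def control_neighbors(f, xs):
    """Average of f(X_i) minus a centred leave-one-out nearest neighbor control."""
    ys = [f(x) for x in xs]
    total = 0.0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        # The surrogate is built without X_i, so the centred control has zero mean.
        control = nn_predict(rest_x, rest_y, xs[i]) - nn_integral(rest_x, rest_y)
        total += ys[i] - control
    return total / len(xs)


# Hypothetical toy problem: integrate exp(x) over [0, 1], true value e - 1.
rng = random.Random(1)
xs = [rng.random() for _ in range(200)]
estimate = control_neighbors(math.exp, xs)
plain = sum(math.exp(x) for x in xs) / len(xs)
print(f"plain MC: {plain:.4f}  control neighbors: {estimate:.4f}  truth: {math.e - 1:.4f}")
```

The function is only evaluated at the simulated points, with no extra simulation. In higher dimensions the Voronoi cell volumes are no longer available in closed form, which is where the cost of evaluating the integral comes in.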

label switching by optimal transport: Wasserstein to the rescue

Posted in Books, Statistics, Travel on November 28, 2019 by xi'an

A new arXival by Pierre Monteiller et al. on resolving label switching by optimal transport. To appear in NeurIPS 2019, next month (where I will be, but extra muros, as I have not registered for the conference). Among other things, the paper was inspired by an answer of mine on X validated, presumably a première (and a dernière?!). Rather than picketing [in the likely unpleasant weather] on the pavement outside the conference centre, here are my raw reactions to the proposal made in the paper. (Usual disclaimer: I was not involved in the review of this paper.)

“Previous methods such as the invariant losses of Celeux et al. (2000) and pivot alignments of Marin et al. (2005) do not identify modes in a principled manner.”

Unprincipled, me?! We did not aim at identifying all modes but only one of them, since the posterior distribution is invariant under reparameterisation. Without any bad feeling (!), I still maintain my position that using a permutation invariant loss function is a most principled and Bayesian approach towards a proper resolution of the issue. Even though figuring out the resulting Bayes estimate may prove tricky.

The paper thus adopts a different approach, towards giving a manageable meaning to the average of the mixture distributions over all permutations, not in a linear Euclidean sense but thanks to a Wasserstein barycentre. Which indeed allows for an averaged mixture density, although a point-by-point estimate that does not require switching to occur at all was already proposed in earlier papers of ours. Including the Bayesian Core. As shown above. What was first unclear to me is how necessary the Wasserstein formalism proves to be in this context. In fact, the major difference with the above picture is that the estimated barycentre is a mixture with the same number of components. Computing time? Bayesian estimate?

Green’s approach to the problem via a point process representation [briefly mentioned on page 6] of the mixture itself, as for instance presented in our mixture analysis handbook, should have been considered. As well as issues about Bayes factors examined in Gelman et al. (2003) and our more recent work with Kate Jeong Eun Lee. Where the practical impossibility of considering all possible permutations is processed by importance sampling.

An idle thought that came to me while reading this paper (in Seoul) was that a more challenging problem would be to face a model invariant under the action of a group with only a subset of known elements of that group. Or simply too many elements in the group. In which case averaging over the orbit would become an issue.

mea culpa!

Posted in Books, Kids, R, Statistics, University life on October 9, 2017 by xi'an

An entry about our Bayesian Essentials book on X validated alerted me to a typo in the derivation of the Gaussian posterior..! When deriving the posterior (which was left as an exercise in the Bayesian Core), I just forgot the term expressing the divergence between the prior mean and the sample mean. Mea culpa!!!

a typo that went under the radar

Posted in Books, R, Statistics, University life on January 25, 2017 by xi'an

A chance occurrence on X validated: a question on an incomprehensible formula for Bayesian model choice, which, most unfortunately!, appeared in Bayesian Essentials with R! Eeech! It looks like one line in our LaTeX file got erased and the likelihood part in the denominator altogether vanished. Apologies to all readers confused by this nonsensical formula!

Bayesian Essentials with R [book review]

Posted in Books, R, Statistics, University life on July 28, 2016 by xi'an

[A review of Bayesian Essentials that appeared in Technometrics two weeks ago, with the first author being rechristened Jean-Michael!]

“Overall this book is a very helpful and useful introduction to Bayesian methods of data analysis. I found the use of R, the code in the book, and the companion R package, bayess, to be helpful to those who want to begin using Bayesian methods in data analysis. One topic that I would like to see added is the use of Bayesian methods in change point problems, a topic that we found useful in a recent article and which could be added to the time series chapter. Overall this is a solid book and well worth considering by its intended audience.”
David E. BOOTH
Kent State University