**L**ast Friday, Guanyang Wang arXived a paper on the use of multi-armed bandits (hence the reference to the three bandits) to handle intractable normalising constants. The bandit compares or mixes Møller et al. (2006) auxiliary variable solution with Murray et al. (2006) exchange algorithm. Which are both special cases of pseudo-marginal MCMC algorithms. In both cases, the auxiliary variables produce an unbiased estimator of the ratio of the constants. Rather than the ratio of two unbiased estimators as in the more standard pseudo-marginal MCMC. The current paper tries to compare the two approaches based on the variance of the ratio estimate, but cannot derive a general ordering. The multi-armed bandit algorithm exploits both estimators of the acceptance ratio to pick the one that is almost the largest, almost because there is a correction for validating the step by detailed balance. The bandit acceptance probability is the maximum [over the methods] of the minimum [over the time directions] of the original acceptance ratio. While this appears to be valid, note that the resulting algorithm implies four times as many auxiliary variates as the original ones, which makes me wonder at the gain when compared with a parallel implementation of these methods, coupled at random times. (The fundamental difficulty of simulating from likelihoods with an unknown normalising constant remains, see p.4.)

## Archive for normalising constant

## bandits for doubly intractable posteriors

Posted in Statistics with tags coupling, coupling from the past, detailed balance, doubly intractable posterior, Les Trois Brigands, multi-armed bandits, normalising constant, pseudo-marginal MCMC, ratio of integrals on April 17, 2019 by xi'an## Gibbs clashes with importance sampling

Posted in pictures, Statistics with tags Amsterdam, cross validated, Gibbs sampling, importance sampling, infinite variance estimators, normalising constant on April 11, 2019 by xi'an**I**n an X validated question, an interesting proposal was made: at each (component-wise) step of a Gibbs sampler, replace simulation from the exact full conditional with simulation from an alternate density and weight the resulting simulation with a term made of a product of (a) the previous weight (b) the ratio of the true conditional over the substitute for the new value and (c) the inverse ratio for the earlier value of the same component. Which does not work for several reasons:

- the reweighting is doomed by its very propagation in that it keeps multiplying ratios of expectation one, which means an almost sure chance of degenerating;
- the weights are computed for a previous value that has not been generated from the same proposal and is anyway already properly weighted;
- due to the change in dimension produced by Gibbs, the actual target is the full conditional, which involves an intractable normalising constant;
- there is no guarantee for the weights to have finite variance, esp. when the proposal has thinner tails than the target.

as can be readily checked by a quick simulation experiment. The funny thing is that a proper importance weight can be constructed when envisioning the sequence of Gibbs steps as a Metropolis proposal (in the dimension of the target).

## Bernoulli race particle filters

Posted in Books, pictures, Statistics, University life with tags auxiliary variable, Bernoulli distribution, Bernoulli factory, intractable constant, Jakob Bernoulli, Monte Carlo approximations, normalising constant, particle filters, University of Oxford on March 27, 2019 by xi'an**S**ebastian Schmon, Arnaud Doucet and George Deligiannidis have recently arXived an AISTATS paper with the above nice title. The motivation for the extension is facing intractable particle weights for state space models, as for instance in discretised diffusions. In most cases, actually, the weight associated with the optimal forward proposal involves an intractable integral which is the predictive of the current observed variate given the past hidden states. And in some cases, there exist unbiased and non-negative estimators of the targets, which can thus be substituted, *volens nolens*, to the original filter. As in many pseudo-marginal derivations, this new algorithm can be interpreted as targeting an augmented distribution that involves the auxiliary random variates behind the unbiased estimators of the particle weights. A worthwhile remark since it allows for the preservation of the original target as in (8) provided the auxiliary random variates are simulated from the right conditionals. (At least ideally as I have no clue when this is feasible.)

“if Bernoulli resampling is per-formed, the variance for any Monte Carlo estimate will be the same as if the true weights were known and one applies standard multinomial resampling.”

The Bernoulli race in the title stands for a version of the Bernoulli factory problem, where an intractable and bounded component of the weight can be turned into a probability, for which a Bernoulli draw is available, hence providing a Multinomial sampling with the intractable weights since replacing the exact probability with an estimate does not modify the Bernoulli distribution, amazingly so! Even with intractable normalising constants in particle filters. The practicality of the approach may however be restricted by the possibility of some intractable terms being very small and requiring many rejections for one acceptance, as the number of attempts is a compound geometric. The intractability may add to the time request the drawback of keeping this feature hidden as well. Or force some premature interruption in the settings of a parallel implementation.

## Bayesian inference with intractable normalizing functions

Posted in Books, Statistics with tags adaptive MCMC methods, American Statistical Association, auxiliary variable, benchmark, doubly intractable problems, importance sampling, Ising model, JASA, MCMC algorithms, noisy MCMC, normalising constant, Russian roulette on December 13, 2018 by xi'an**I**n the latest September issue of JASA I received a few days ago, I spotted a review paper by Jaewoo Park & Murali Haran on intractable normalising constants Z(θ). There have been many proposals for solving this problem as well as several surveys, some conferences and even a book. The current survey focus on MCMC solutions, from auxiliary variable approaches to likelihood approximation algorithms (albeit without ABC entries, even though the 2006 auxiliary variable solutions of Møller et al. et of Murray et al. do simulate pseudo-observations and hence…). This includes the MCMC approximations to auxiliary sampling proposed by Faming Liang and co-authors across several papers. And the paper Yves Atchadé, Nicolas Lartillot and I wrote ten years ago on an adaptive MCMC targeting Z(θ) and using stochastic approximation à la Wang-Landau. Park & Haran stress the relevance of using sufficient statistics in this approach towards fighting computational costs, which makes me wonder if an ABC version could be envisioned. The paper also includes pseudo-marginal techniques like Russian Roulette (once spelled Roullette) and noisy MCMC as proposed in Alquier et al. (2016). These methods are compared on three examples: (1) the Ising model, (2) a social network model, the Florentine business dataset used in our original paper, and a larger one where most methods prove too costly, and (3) an attraction-repulsion point process model. In conclusion, an interesting survey, taking care to spell out the calibration requirements and the theoretical validation, if of course depending on the chosen benchmarks.

## unbiased estimation of log-normalising constants

Posted in Statistics with tags Bayesian model choice, Cross Validation, estimating a constant, leave-one-out calibration, normalising constant, path sampling, sequential Monte Carlo, unbiased estimation on October 16, 2018 by xi'anMaxime Rischard, Pierre Jacob, and Natesh Pillai [warning: both of whom are co-authors and friends of mine!] have just arXived a paper on the use of path sampling (a.k.a., thermodynamic integration) for log-constant unbiased approximation and the resulting consequences on Bayesian model comparison by X validation. If the goal is the estimation of the log of a ratio of two constants, creating an artificial path between the corresponding distributions and looking at the derivative at any point of this path of the log-density produces an unbiased estimator. Meaning that random sampling along the path, corrected by the distribution of the sampling still produces an unbiased estimator. From there the authors derive an unbiased estimator for any X validation objective function, CV(V,T)=-log p(V|T), taking m observations T in and leaving n-m observations T out… The marginal conditional log density in the criterion is indeed estimated by an unbiased path sampler, using a powered conditional likelihood. And unbiased MCMC schemes à la Jacob et al. for simulating unbiased MCMC realisations of the intermediary targets on the path. Tuning it towards an approximately constant cost for all powers.

So in all objectivity and fairness (!!!), I am quite excited by this new proposal within my favourite area! Or rather two areas since it brings together the estimation of constants and an alternative to Bayes factors for Bayesian testing. (Although the paper does not broach upon the calibration of the X validation values.)

## controlled SMC

Posted in Books, pictures, Statistics, University life with tags BiPS, dynamic programming, hidden Markov models, importance sampling, normalising constant, sequential Monte Carlo on December 18, 2017 by xi'an**A**t the end of [last] August, Jeremy Heng, Adrian Bishop†, George Deligiannidis and Arnaud Doucet arXived a paper on controlled sequential Monte Carlo (SMC). That we read today at the BiPs reading group in Paris-Saclay, when I took these notes. The setting is classical SMC, but with a twist in that the proposals at each time iteration are modified by an importance function. (I was quite surprised to discover that this was completely new in that I was under the false impression that it had been tried ages ago!) This importance sampling setting can be interpreted as a change of measures on both the hidden Markov chain and on its observed version. So that the overall normalising constant remains the same. And then being in an importance sampling setting there exists an optimal choice for the importance functions. That results in a zero variance estimated normalising constant, unsurprisingly. And the optimal solution is actually the backward filter familiar to SMC users.

A large part of the paper actually concentrates on figuring out an implementable version of this optimal solution. Using dynamic programming. And projection of each local generator over a simple linear space with Gaussian kernels (aka Gaussian mixtures). Which becomes feasible through the particle systems generated at earlier iterations of said dynamic programming.

The paper is massive, both in terms of theoretical results and of the range of simulations, and we could not get through it within the 90 minutes Sylvain LeCorff spent on presenting it. I can only wonder at this stage how much Rao-Blackwellisation or AMIS could improve the performances of the algorithm. (A point I find quite amazing in Proposition 1 is that the normalising constant Z of the filtering distribution does not change along observations when using the optimal importance function, which translates into the estimates being nearly constant after a few iterations.)