Archive for normalising constant

revisiting the balance heuristic

Posted in Statistics on October 24, 2019 by xi'an

Last August, Felipe Medina-Aguayo (a former student at Warwick) and Richard Everitt (who has now joined Warwick) arXived a paper on multiple importance sampling (for normalising constants) that goes “exploring some improvements and variations of the balance heuristic via a novel extended-space representation of the estimator, leading to straightforward annealing schemes for variance reduction purposes”, with the interesting side remark that Rao-Blackwellisation may prove sub-optimal when there are many terms in the proposal family, in the sense that not every term in the mixture gets sampled. As already noticed by Victor Elvira and co-authors, getting rid of the components that are not used is an improvement that does not induce a bias. The paper also notices that the loss due to using sample sizes rather than expected sample sizes is of second order, relative to the variance of the estimators under comparison. It further relates to a completion or auxiliary perspective that reminds me of the approaches we adopted in the population Monte Carlo papers and in the vanilla Rao-Blackwellisation paper. But it somewhat diverges from this literature when entering a simulated annealing perspective, in that the importance distributions it considers are freely chosen as powers of a generic target. It is quite surprising that, despite the normalising weights being unknown, a simulated annealing approach produces an unbiased estimator of the initial normalising constant. Another surprise therein is that the extended target associated with their balance heuristic does not admit the right density as marginal but preserves the same normalising constant… (This paper will be presented at BayesComp 2020.)
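
For readers unfamiliar with the balance heuristic, here is a minimal Python sketch of the basic multiple importance sampling estimator of a normalising constant: each draw is weighted by the unnormalised target divided by the sample-size-weighted sum of all proposal densities. The Gaussian target, the three Gaussian proposals and the sample sizes are assumptions picked for illustration, not taken from the paper, and none of its extended-space or annealing refinements is reproduced.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def balance_heuristic_estimate(gamma, proposals, sizes, rng):
    """Estimate Z = integral of gamma, with n_k draws from each Gaussian proposal.

    proposals: list of (mean, sd) pairs (illustrative choice of family)
    sizes:     list of sample sizes n_k, one per proposal
    """
    total = 0.0
    for (mu, sigma), n in zip(proposals, sizes):
        for _ in range(n):
            x = rng.gauss(mu, sigma)
            # Balance heuristic denominator: sum_j n_j q_j(x) over ALL proposals
            denom = sum(n_j * normal_pdf(x, m_j, s_j)
                        for (m_j, s_j), n_j in zip(proposals, sizes))
            total += gamma(x) / denom
    return total

def gamma(x):
    # Unnormalised standard normal density, so that Z = sqrt(2 * pi)
    return math.exp(-0.5 * x * x)

rng = random.Random(1)
proposals = [(-1.0, 1.5), (1.0, 1.5), (0.0, 3.0)]
z_hat = balance_heuristic_estimate(gamma, proposals, [500, 500, 500], rng)
print(z_hat, math.sqrt(2.0 * math.pi))
```

The estimate is unbiased for any fixed allocation of sample sizes, which is the baseline the paper's variations are measured against.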

likelihood-free inference by ratio estimation

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life on September 9, 2019 by xi'an

“This approach for posterior estimation with generative models mirrors the approach of Gutmann and Hyvärinen (2012) for the estimation of unnormalised models. The main difference is that here we classify between two simulated data sets while Gutmann and Hyvärinen (2012) classified between the observed data and simulated reference data.”

A 2018 arXiv posting by Owen Thomas et al. (including my colleague at Warwick, Rito Dutta, CoI warning!) about estimating the likelihood (and the posterior) when it is intractable. Likelihood-free but not ABC, since the ratio likelihood to marginal is estimated in a non- or semi-parametric (and biased) way. Following Geyer’s 1994 fabulous estimate of an unknown normalising constant via logistic regression, the current paper, which I read in preparation for my discussion in the ABC optimal design in Salzburg, uses probabilistic classification and an exponential family representation of the ratio. Opposing data from the density and data from the marginal, assuming both can be readily produced. The logistic regression minimizing the asymptotic classification error is the logistic transform of the log-ratio. For a finite (double) sample, this minimization thus leads to an empirical version of the ratio. Or to a smooth version if the log-ratio is represented as a convex combination of summary statistics, turning the approximation into an exponential family, which is a clever way to come full circle towards ABC notions. And synthetic likelihood. Although with a difference in estimating the exponential family parameters β(θ) by minimizing the classification error, parameters that are indeed conditional on the parameter θ. Actually the paper introduces a further penalisation or regularisation term on those parameters β(θ), which could have been processed by Bayesian Lasso instead. This step is essentially driving the selection of the summaries, except that it is for each value of the parameter θ, at the expense of an X-validation step. This is quite an original approach, as far as I can tell, but I wonder at the link with more standard density estimation methods, in particular in terms of the precision of the resulting estimate (and the speed of convergence with the sample size, if convergence there is).
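
To make the classification idea concrete, here is a hedged Python sketch that fits a plain (unpenalised) logistic regression opposing draws from p(x|θ) to draws from the marginal, with the log-ratio modelled as β·ψ(x). Everything specific is an assumption for illustration: the toy model x|θ ~ N(θ,1) with prior θ ~ N(0,1), the summaries ψ(x) = (1, x, x²), and the Newton-Raphson fit, which stands in for the penalised fit of the paper. In this toy case the true log-ratio at θ = 1 is quadratic in x, so the fitted β can be compared with its exact value.

```python
import math
import random

def features(x):
    # Assumed summary statistics psi(x) = (1, x, x^2)
    return (1.0, x, x * x)

def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def fit_logistic(xs, ys, iters=20):
    """Newton-Raphson maximisation of the logistic log-likelihood (no penalty)."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0] * 3
        hess = [[0.0] * 3 for _ in range(3)]
        for x, y in zip(xs, ys):
            f = features(x)
            eta = sum(b * v for b, v in zip(beta, f))
            eta = max(-30.0, min(30.0, eta))  # guard against overflow in exp
            p = 1.0 / (1.0 + math.exp(-eta))
            for i in range(3):
                grad[i] += (y - p) * f[i]
                for j in range(3):
                    hess[i][j] += p * (1.0 - p) * f[i] * f[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

rng = random.Random(2)
theta, n = 1.0, 2000
from_conditional = [rng.gauss(theta, 1.0) for _ in range(n)]            # x ~ p(x | theta)
from_marginal = [rng.gauss(rng.gauss(0.0, 1.0), 1.0) for _ in range(n)]  # x ~ p(x)
beta = fit_logistic(from_conditional + from_marginal, [1.0] * n + [0.0] * n)
# Exact log-ratio at theta = 1: -x^2/4 + x - 1/2 + log(2)/2
exact = (-0.5 + 0.5 * math.log(2.0), 1.0, -0.25)
print(beta, exact)
```

With equal sample sizes in both classes, the fitted linear predictor β·ψ(x) is directly an estimate of log p(x|θ) − log p(x).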

bandits for doubly intractable posteriors

Posted in Statistics on April 17, 2019 by xi'an

Last Friday, Guanyang Wang arXived a paper on the use of multi-armed bandits (hence the reference to the three bandits) to handle intractable normalising constants. The bandit compares or mixes Møller et al. (2006) auxiliary variable solution with Murray et al. (2006) exchange algorithm. Which are both special cases of pseudo-marginal MCMC algorithms. In both cases, the auxiliary variables produce an unbiased estimator of the ratio of the constants. Rather than the ratio of two unbiased estimators as in the more standard pseudo-marginal MCMC. The current paper tries to compare the two approaches based on the variance of the ratio estimate, but cannot derive a general ordering. The multi-armed bandit algorithm exploits both estimators of the acceptance ratio to pick the one that is almost the largest, almost because there is a correction for validating the step by detailed balance. The bandit acceptance probability is the maximum [over the methods] of the minimum [over the time directions] of the original acceptance ratio. While this appears to be valid, note that the resulting algorithm implies four times as many auxiliary variates as the original ones, which makes me wonder at the gain when compared with a parallel implementation of these methods, coupled at random times. (The fundamental difficulty of simulating from likelihoods with an unknown normalising constant remains, see p.4.)
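
As a reminder of how one of the two arms operates, here is a minimal Python sketch of the exchange algorithm alone (not of the bandit combination in the paper). The toy model is an assumption made for illustration: f(y|θ) ∝ exp(−θy) on y > 0, with a Gamma(a,b) prior and a Gaussian random walk proposal. The constant Z(θ) = 1/θ never enters the acceptance ratio, since the auxiliary sample drawn at the proposed value yields an unbiased estimate of the ratio of constants. In this toy case the posterior is Gamma(a+n, b+Σy), which allows a check.

```python
import math
import random

def exchange_sampler(y, n_iter, step, rng, a=1.0, b=1.0):
    """Exchange algorithm for q(y|theta) = exp(-theta * y), y > 0 (toy example)."""
    n, s = len(y), sum(y)
    theta, chain = 1.0, []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        if prop > 0.0:  # proposals outside the prior support are rejected
            # Auxiliary data set simulated from the model at the proposed value
            s_aux = sum(rng.expovariate(prop) for _ in range(n))
            # Gamma(a, b) prior ratio on the log scale
            log_acc = (a - 1.0) * (math.log(prop) - math.log(theta)) - b * (prop - theta)
            # Unnormalised likelihood ratio for the observed data
            log_acc += -(prop - theta) * s
            # q(aux|theta) / q(aux|prop): unbiased estimate of Z(theta) / Z(prop)
            log_acc += (prop - theta) * s_aux
            if math.log(1.0 - rng.random()) < log_acc:
                theta = prop
        chain.append(theta)
    return chain

rng = random.Random(3)
y = [rng.expovariate(2.0) for _ in range(20)]
chain = exchange_sampler(y, 20000, 0.5, rng)
kept = chain[2000:]                                # drop a burn-in period
posterior_mean = sum(kept) / len(kept)
exact_mean = (1.0 + len(y)) / (1.0 + sum(y))       # mean of Gamma(a + n, b + sum(y))
print(posterior_mean, exact_mean)
```

Note that the sketch still needs exact simulation from the model at the proposed value, which is precisely the fundamental difficulty mentioned above.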

Gibbs clashes with importance sampling

Posted in pictures, Statistics on April 11, 2019 by xi'an

In an X validated question, an interesting proposal was made: at each (component-wise) step of a Gibbs sampler, replace simulation from the exact full conditional with simulation from an alternate density and weight the resulting simulation with a term made of a product of (a) the previous weight, (b) the ratio of the true conditional over the substitute for the new value, and (c) the inverse ratio for the earlier value of the same component. Which does not work for several reasons:

  1. the reweighting is doomed by its very propagation in that it keeps multiplying ratios of expectation one, which means an almost sure chance of degenerating;
  2. the weights are computed for a previous value that has not been generated from the same proposal and is anyway already properly weighted;
  3. due to the change in dimension produced by Gibbs, the actual target is the full conditional, which involves an intractable normalising constant;
  4. there is no guarantee for the weights to have finite variance, especially when the proposal has thinner tails than the target.

All of which can be readily checked by a quick simulation experiment. The funny thing is that a proper importance weight can be constructed when envisioning the sequence of Gibbs steps as a Metropolis proposal (in the dimension of the target).
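
The first reason can be illustrated with a short Python experiment. This is a toy stand-in rather than the Gibbs scheme of the question: the assumptions are a N(0,1) substitute g and a N(0.5,1) "true" density f, with the running weight being the product of the ratios f(x)/g(x) over successive draws from g. Each ratio has expectation one, but the expected log-ratio is −0.125, hence the log-weight drifts to −∞.

```python
import math
import random

def log_weight_path(n_steps, shift, rng):
    """Cumulative log-weight for x_t ~ g = N(0, 1) and f = N(shift, 1)."""
    log_w, path = 0.0, []
    for _ in range(n_steps):
        x = rng.gauss(0.0, 1.0)
        # log f(x) - log g(x) = shift * x - shift^2 / 2, with E_g[f/g] = 1
        log_w += shift * x - 0.5 * shift * shift
        path.append(log_w)
    return path

rng = random.Random(4)
path = log_weight_path(4000, 0.5, rng)
print(path[99], path[999], path[-1])  # the log-weight keeps decreasing on average
```

After 4000 steps the expected log-weight is −500, so the weight is numerically zero even though its expectation remains equal to one.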

Bernoulli race particle filters

Posted in Books, pictures, Statistics, University life on March 27, 2019 by xi'an

Sebastian Schmon, Arnaud Doucet and George Deligiannidis have recently arXived an AISTATS paper with the above nice title. The motivation for the extension is facing intractable particle weights for state space models, as for instance in discretised diffusions. In most cases, actually, the weight associated with the optimal forward proposal involves an intractable integral which is the predictive of the current observed variate given the past hidden states. And in some cases, there exist unbiased and non-negative estimators of the targets, which can thus be substituted, volens nolens, within the original filter. As in many pseudo-marginal derivations, this new algorithm can be interpreted as targeting an augmented distribution that involves the auxiliary random variates behind the unbiased estimators of the particle weights. A worthwhile remark since it allows for the preservation of the original target as in (8) provided the auxiliary random variates are simulated from the right conditionals. (At least ideally as I have no clue when this is feasible.)

“if Bernoulli resampling is performed, the variance for any Monte Carlo estimate will be the same as if the true weights were known and one applies standard multinomial resampling.”

The Bernoulli race in the title stands for a version of the Bernoulli factory problem, where an intractable and bounded component of the weight can be turned into a probability, for which a Bernoulli draw is available, hence providing Multinomial sampling with the intractable weights, since replacing the exact probability with an estimate does not modify the Bernoulli distribution, amazingly so! Even with intractable normalising constants in particle filters. The practicality of the approach may however be restricted by the possibility of some intractable terms being very small and requiring many rejections for one acceptance, as the number of attempts is a compound geometric. On top of the time requirement, the intractability brings the drawback of keeping this feature hidden as well. Or may force some premature interruption in the settings of a parallel implementation.
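
The mechanism can be sketched in a few lines of Python, under assumptions of my own choosing: weights of the form c_i p_i with known constants c_i and probabilities p_i that are only accessible through unbiased estimates in [0,1], here simulated as Uniform(0, 2p_i) draws. The helper names `make_coin` and `bernoulli_race` are hypothetical, and the particle filter itself is not reproduced.

```python
import random

def make_coin(p):
    """Coin with success probability p, only using an unbiased estimate of p.

    Toy assumption: the estimate is Uniform(0, 2p), which requires p <= 0.5.
    """
    def coin(rng):
        p_hat = rng.uniform(0.0, 2.0 * p)   # unbiased, valued in [0, 1]
        return rng.random() < p_hat         # marginally a Bernoulli(p) draw
    return coin

def bernoulli_race(consts, coins, rng):
    """Return index i with probability proportional to consts[i] * p_i."""
    indices = range(len(consts))
    while True:
        i = rng.choices(indices, weights=consts)[0]  # propose from the known part
        if coins[i](rng):                            # accept with the intractable part
            return i

rng = random.Random(5)
consts = [1.0, 2.0, 1.0]
probs = [0.4, 0.1, 0.3]                              # hidden from the sampler
coins = [make_coin(p) for p in probs]
n_draws = 20000
counts = [0, 0, 0]
for _ in range(n_draws):
    counts[bernoulli_race(consts, coins, rng)] += 1
freqs = [c / n_draws for c in counts]
target = [4.0 / 9.0, 2.0 / 9.0, 3.0 / 9.0]           # c_i p_i, normalised
print(freqs, target)
```

The number of proposals before one acceptance is geometric with success probability Σ c_i p_i / Σ c_i, which is where small intractable terms hurt.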

Bayesian inference with intractable normalizing functions

Posted in Books, Statistics on December 13, 2018 by xi'an

In the latest September issue of JASA I received a few days ago, I spotted a review paper by Jaewoo Park & Murali Haran on intractable normalising constants Z(θ). There have been many proposals for solving this problem as well as several surveys, some conferences and even a book. The current survey focuses on MCMC solutions, from auxiliary variable approaches to likelihood approximation algorithms (albeit without ABC entries, even though the 2006 auxiliary variable solutions of Møller et al. and of Murray et al. do simulate pseudo-observations and hence…). This includes the MCMC approximations to auxiliary sampling proposed by Faming Liang and co-authors across several papers. And the paper Yves Atchadé, Nicolas Lartillot and I wrote ten years ago on an adaptive MCMC targeting Z(θ) and using stochastic approximation à la Wang-Landau. Park & Haran stress the relevance of using sufficient statistics in this approach towards fighting computational costs, which makes me wonder if an ABC version could be envisioned. The paper also includes pseudo-marginal techniques like Russian Roulette (once spelled Roullette) and noisy MCMC as proposed in Alquier et al. (2016). These methods are compared on three examples: (1) the Ising model, (2) a social network model, the Florentine business dataset used in our original paper, and a larger one where most methods prove too costly, and (3) an attraction-repulsion point process model. In conclusion, an interesting survey, taking care to spell out the calibration requirements and the theoretical validation, if of course depending on the chosen benchmarks.

unbiased estimation of log-normalising constants

Posted in Statistics on October 16, 2018 by xi'an

Maxime Rischard, Pierre Jacob, and Natesh Pillai [warning: both of whom are co-authors and friends of mine!] have just arXived a paper on the use of path sampling (a.k.a. thermodynamic integration) for log-constant unbiased approximation and the resulting consequences on Bayesian model comparison by X validation. If the goal is the estimation of the log of a ratio of two constants, creating an artificial path between the corresponding distributions and looking at the derivative at any point of this path of the log-density produces an unbiased estimator. Meaning that random sampling along the path, corrected by the distribution of the sampling, still produces an unbiased estimator. From there the authors derive an unbiased estimator for any X validation objective function, CV(V,T)=-log p(V|T), taking m observations T in and leaving n-m observations V out… The marginal conditional log density in the criterion is indeed estimated by an unbiased path sampler, using a powered conditional likelihood. And unbiased MCMC schemes à la Jacob et al. for simulating unbiased MCMC realisations of the intermediary targets on the path. Tuning it towards an approximately constant cost for all powers.
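
The path sampling identity can be checked on a toy case with the following Python sketch. All specifics are assumptions for illustration: a geometric path between the unnormalised densities γ₀(x) = exp(−x²/2) and γ₁(x) = exp(−x²/8), a uniform draw of the position λ on the path, and exact Gaussian simulation from each intermediate target, which replaces the unbiased MCMC schemes of the paper. The quantity estimated is log(Z₁/Z₀) = log 2.

```python
import math
import random

def path_sampling_draw(rng):
    """One unbiased draw for log(Z1 / Z0) along a geometric path (toy example)."""
    lam = rng.random()                            # lambda ~ Uniform(0, 1), density 1
    # gamma_lam = gamma_0^(1 - lam) * gamma_1^lam is Gaussian with this precision
    precision = (1.0 - lam) * 1.0 + lam * 0.25
    x = rng.gauss(0.0, 1.0 / math.sqrt(precision))
    # d/dlam log gamma_lam(x) = log gamma_1(x) - log gamma_0(x) = (3/8) x^2
    return 0.5 * x * x * (1.0 - 0.25)

rng = random.Random(6)
n_draws = 20000
estimate = sum(path_sampling_draw(rng) for _ in range(n_draws)) / n_draws
print(estimate, math.log(2.0))
```

Each draw is unbiased for the log-ratio itself, with no logarithm of an estimated constant involved, which is the point of the approach.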

So in all objectivity and fairness (!!!), I am quite excited by this new proposal within my favourite area! Or rather two areas since it brings together the estimation of constants and an alternative to Bayes factors for Bayesian testing. (Although the paper does not broach the calibration of the X validation values.)