Archive for normalising constant

Bayesian inference with intractable normalizing functions

Posted in Books, Statistics on December 13, 2018 by xi'an

In the latest September issue of JASA, which I received a few days ago, I spotted a review paper by Jaewoo Park & Murali Haran on intractable normalising constants Z(θ). There have been many proposals for solving this problem as well as several surveys, some conferences and even a book. The current survey focuses on MCMC solutions, from auxiliary variable approaches to likelihood approximation algorithms (albeit without ABC entries, even though the 2006 auxiliary variable solutions of Møller et al. and of Murray et al. do simulate pseudo-observations and hence…). This includes the MCMC approximations to auxiliary sampling proposed by Faming Liang and co-authors across several papers. And the paper Yves Atchadé, Nicolas Lartillot and I wrote ten years ago on an adaptive MCMC targeting Z(θ) and using stochastic approximation à la Wang-Landau. Park & Haran stress the relevance of using sufficient statistics in this approach towards fighting computational costs, which makes me wonder if an ABC version could be envisioned. The paper also includes pseudo-marginal techniques like Russian Roulette (once spelled Roullette) and noisy MCMC as proposed in Alquier et al. (2016). These methods are compared on three examples: (1) the Ising model, (2) a social network model, with the Florentine business dataset used in our original paper and a larger one where most methods prove too costly, and (3) an attraction-repulsion point process model. In conclusion, an interesting survey, taking care to spell out the calibration requirements and the theoretical validation, even if of course depending on the chosen benchmarks.
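To make the remark about pseudo-observations concrete, here is a minimal sketch of the exchange algorithm of Murray et al. (2006) on a toy model of my own choosing, not one taken from the survey. The data are exponential with rate θ, the unnormalised likelihood is exp(-θ Σy), and Z(θ) is pretended unknown. The prior is an assumed Exp(1), and the function name `exchange_sampler` and its `step` parameter are hypothetical. Since the auxiliary sample is simulated from the model at the proposed value, Z(θ) cancels from the acceptance ratio.

```python
import math
import random

def exchange_sampler(data, n_iter, step=0.5, seed=0):
    """Exchange algorithm on a toy exponential model (illustrative assumptions).

    Unnormalised likelihood: exp(-theta * sum(y)); prior: Exp(1) on theta.
    """
    rng = random.Random(seed)
    n, s_obs = len(data), sum(data)
    theta, chain = 1.0, []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)  # symmetric random walk proposal
        if prop > 0.0:  # the prior puts no mass on non-positive values
            # pseudo-observations simulated from the model at the proposed value
            s_aux = sum(rng.expovariate(prop) for _ in range(n))
            # log of prior ratio x unnormalised likelihood ratio x swapped
            # auxiliary ratio; no normalising constant appears anywhere
            log_ratio = (prop - theta) * (s_aux - s_obs - 1.0)
            if math.log(1.0 - rng.random()) < log_ratio:
                theta = prop
        chain.append(theta)
    return chain
```

In this toy case the exact posterior is a Gamma(n+1, 1+Σy) distribution, which gives a check on the output of the chain. The same simulated pseudo-data is what an ABC version would compare with the observations through summary statistics.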

unbiased estimation of log-normalising constants

Posted in Statistics on October 16, 2018 by xi'an

Maxime Rischard, Pierre Jacob, and Natesh Pillai [warning: both of whom are co-authors and friends of mine!] have just arXived a paper on the use of path sampling (a.k.a. thermodynamic integration) for the unbiased approximation of log-constants and the resulting consequences for Bayesian model comparison by cross-validation. If the goal is the estimation of the log of a ratio of two constants, creating an artificial path between the corresponding distributions and looking at the derivative of the log-density at any point of this path produces an unbiased estimator. Meaning that random sampling along the path, corrected by the distribution of the sampling, still produces an unbiased estimator. From there the authors derive an unbiased estimator for any cross-validation objective function, CV(V,T)=-log p(V|T), taking m observations T in and leaving n-m observations V out… The marginal conditional log density in the criterion is indeed estimated by an unbiased path sampler, using a powered conditional likelihood. And unbiased MCMC schemes à la Jacob et al. for simulating unbiased MCMC realisations of the intermediary targets on the path. Tuning it towards an approximately constant cost for all powers.
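As a minimal sketch of the path sampling identity, and not of the authors' implementation, take two unnormalised Gaussian densities q0(x)=exp(-x²/2) and q1(x)=exp(-x²/2σ²), so that log(Z1/Z0)=log σ. Along the geometric path q0^(1-t) q1^t, the derivative in t of the log-density is log q1(x) - log q0(x). Drawing t uniformly on (0,1) and then x from the intermediate target gives an unbiased estimate. Exact Gaussian draws stand here for the unbiased MCMC step of the paper, and the function names are hypothetical.

```python
import math
import random

def path_sampling_draw(sigma, rng):
    """One unbiased draw of log(Z1/Z0) for the toy Gaussian pair (assumption)."""
    t = rng.random()  # uniform temperature along the path
    # the geometric bridge q0^(1-t) * q1^t is Gaussian with this precision
    precision = (1.0 - t) + t / sigma ** 2
    x = rng.gauss(0.0, 1.0 / math.sqrt(precision))  # exact draw from the bridge
    # derivative in t of the log unnormalised density: log q1(x) - log q0(x)
    return -0.5 * x * x * (1.0 / sigma ** 2 - 1.0)

def estimate_log_ratio(sigma, n_draws, seed=0):
    """Average of independent unbiased draws."""
    rng = random.Random(seed)
    return sum(path_sampling_draw(sigma, rng) for _ in range(n_draws)) / n_draws
```

With a non-uniform distribution on t, each draw would be divided by the density of t, which is the correction by the distribution of the sampling mentioned above.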

So in all objectivity and fairness (!!!), I am quite excited by this new proposal within my favourite area! Or rather two areas since it brings together the estimation of constants and an alternative to Bayes factors for Bayesian testing. (Although the paper does not touch upon the calibration of the cross-validation values.)

controlled SMC

Posted in Books, pictures, Statistics, University life on December 18, 2017 by xi'an

At the end of [last] August, Jeremy Heng, Adrian Bishop†, George Deligiannidis and Arnaud Doucet arXived a paper on controlled sequential Monte Carlo (SMC). That we read today at the BiPs reading group in Paris-Saclay, when I took these notes. The setting is classical SMC, but with a twist in that the proposals at each time iteration are modified by an importance function. (I was quite surprised to discover that this was completely new in that I was under the false impression that it had been tried ages ago!) This importance sampling setting can be interpreted as a change of measures on both the hidden Markov chain and on its observed version. So that the overall normalising constant remains the same. And then being in an importance sampling setting there exists an optimal choice for the importance functions. That results in a zero variance estimated normalising constant, unsurprisingly. And the optimal solution is actually the backward filter familiar to SMC users.
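The zero variance property can be seen in a one-step caricature, which is only a static importance sampling illustration and not the controlled SMC algorithm of the paper. Assume a N(0,1) prior on the hidden state and an unnormalised Gaussian potential g(x)=exp(-(y-x)²/2). Twisting the prior by g itself yields a N(y/2,1/2) proposal, and every importance weight then equals Z=exp(-y²/4)/√2. All function names below are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a N(mean, var) distribution at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def potential(x, y):
    """Unnormalised Gaussian potential g(x) = exp(-(y - x)^2 / 2)."""
    return math.exp(-0.5 * (y - x) ** 2)

def plain_weights(y, n, rng):
    """Weights when simulating from the N(0,1) prior: w = g(x)."""
    return [potential(rng.gauss(0.0, 1.0), y) for _ in range(n)]

def twisted_weights(y, n, rng):
    """Weights when the proposal is the prior twisted by g, i.e. N(y/2, 1/2)."""
    out = []
    for _ in range(n):
        x = rng.gauss(0.5 * y, math.sqrt(0.5))
        # prior density times potential, divided by the twisted proposal density
        out.append(normal_pdf(x, 0.0, 1.0) * potential(x, y)
                   / normal_pdf(x, 0.5 * y, 0.5))
    return out
```

Both sets of weights average to the same constant, in line with the change of measure leaving Z untouched, but only the twisted ones are constant up to rounding error.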

A large part of the paper actually concentrates on figuring out an implementable version of this optimal solution. Using dynamic programming. And projection of each local generator over a simple linear space with Gaussian kernels (aka Gaussian mixtures). Which becomes feasible through the particle systems generated at earlier iterations of said dynamic programming.

The paper is massive, both in terms of theoretical results and of the range of simulations, and we could not get through it within the 90 minutes Sylvain Le Corff spent on presenting it. I can only wonder at this stage how much Rao-Blackwellisation or AMIS could improve the performance of the algorithm. (A point I find quite amazing in Proposition 1 is that the normalising constant Z of the filtering distribution does not change along observations when using the optimal importance function, which translates into the estimates being nearly constant after a few iterations.)

probabilities larger than one…

Posted in Statistics on November 9, 2017 by xi'an

slice sampling for Dirichlet mixture process

Posted in Books, Statistics, University life on June 21, 2017 by xi'an

When working with my PhD student Changye in Dauphine this morning I realised that slice sampling also applies to discrete support distributions and could even be of use in such settings. That it works is (now) straightforward in that the missing variable representation behind the slice sampler also applies to densities defined with respect to a discrete measure. That this is useful transpires from the short paper of Stephen Walker (2007) where we saw this, as Stephen relies on the slice sampler to sample from the Dirichlet mixture model by eliminating the tail problem associated with this distribution. (This paper appeared in Communications in Statistics and it is through Pati & Dunson (2014) taking advantage of this trick that Changye found out about its very existence. I may have known about it in an earlier life, but I had clearly forgotten everything!)
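As a minimal sketch of slice sampling on a discrete support, take a Poisson target with unnormalised mass λ^k/k!, a choice of mine for illustration and not an example from Walker's paper. Given the current value k, draw u uniformly below the mass at k, then draw the new value uniformly over the integers whose mass exceeds u. The target is unimodal, so this set is an interval found by stepping out from k. Function names are hypothetical.

```python
import math
import random

def poisson_unnorm(k, lam):
    """Unnormalised Poisson mass lam^k / k!, computed on the log scale."""
    return math.exp(k * math.log(lam) - math.lgamma(k + 1))

def discrete_slice_step(k, lam, rng):
    """One slice sampling move on the non-negative integers."""
    u = rng.random() * poisson_unnorm(k, lam)  # auxiliary uniform under the mass at k
    lo = k
    while lo > 0 and poisson_unnorm(lo - 1, lam) > u:  # step out to the left
        lo -= 1
    hi = k
    while poisson_unnorm(hi + 1, lam) > u:  # step out to the right
        hi += 1
    return rng.randint(lo, hi)  # uniform over the slice {j : mass(j) > u}

def run_chain(lam, n_iter, seed=0):
    """Run the slice sampler from k = 0 and return all visited values."""
    rng = random.Random(seed)
    k, chain = 0, []
    for _ in range(n_iter):
        k = discrete_slice_step(k, lam, rng)
        chain.append(k)
    return chain
```

The stepping-out shortcut relies on unimodality; for a multimodal mass function the slice would have to be enumerated differently.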

While the prior distribution (of the weights) of the Dirichlet mixture process is easy to generate via the stick breaking representation, the posterior distribution is trickier as the weights are multiplied by the values of the sampling distribution (likelihood) at the corresponding parameter values and they cannot be normalised. Introducing a uniform to replace all weights in the mixture with an indicator that the uniform is less than those weights corresponds to a (latent variable) completion [or a demarginalisation as we called this trick in Monte Carlo Statistical Methods]. As elaborated in the paper, the Gibbs steps corresponding to this completion are easy to implement, involving only a finite number of components. Meaning the allocation to a component of the mixture can be operated rather efficiently. Or not, when considering that the weights in the Dirichlet mixture are not monotone, hence that a large number of them may need to be computed before picking the next index in the mixture when the uniform draw happens to be quite small.
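The allocation step described above can be sketched as follows, under assumptions of my own: unit variance Gaussian components, a N(0, prior_sd²) prior on the component means, and sticks for not yet visited components drawn from their Beta(1, α) prior. The function name and arguments are hypothetical, and this is only one step of the Gibbs sampler, not the full algorithm of the paper.

```python
import math
import random

def slice_allocation_step(y, k, weights, means, alpha, rng, prior_sd=3.0):
    """Reallocate observation y, currently in component k (illustrative sketch).

    The lists weights and means are extended in place when more sticks
    are needed. Returns the new component index and the uniform u.
    """
    u = rng.random() * weights[k]  # uniform below the current weight
    remaining = 1.0 - sum(weights)  # stick mass not yet broken
    # keep breaking sticks until no later weight can exceed u
    while remaining > u:
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        means.append(rng.gauss(0.0, prior_sd))  # new mean from the assumed prior
        remaining *= 1.0 - v
    # finite set of admissible components: the weight is replaced by an indicator
    candidates = [j for j, w in enumerate(weights) if w > u]
    log_lik = [-0.5 * (y - means[j]) ** 2 for j in candidates]
    top = max(log_lik)
    probs = [math.exp(l - top) for l in log_lik]  # stabilised likelihood values
    new_k = rng.choices(candidates, weights=probs, k=1)[0]
    return new_k, u
```

The while loop is where the cost shows up: since the weights are not monotone, a small u forces many sticks to be broken before the candidate set is known.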

Posted in Books, Kids, pictures, Running, Statistics, Travel, University life on April 12, 2017 by xi'an

The reason for my short visit to Berlin last week was an OxWaSP (Oxford and Warwick Statistics Program) workshop hosted by Amazon Berlin with talks between statistics and machine learning, plus posters from our second year students. While the workshop was quite intense, I enjoyed very much the atmosphere and the variety of talks there. (Just sorry that I left too early to enjoy the social programme at a local brewery, Brauhaus Lemke, and the natural history museum. But still managed nice runs east and west!) One thing I found most interesting (if obvious in retrospect) was the different focus of academic and production talks, where the latter do not aim at full generality or at a guaranteed improvement over the existing, provided the new methodology provides a gain in efficiency over the existing.

This connected nicely with me reading several Nature articles on quantum computing during that trip, where researchers from Google predict commercial products appearing in the coming five years, even though the technology is far from perfect and the outcome qubit error-prone. Among the examples they provided, quantum simulation (not meaning what I consider to be simulation!), quantum optimisation (as a way to overcome multimodality), and quantum sampling (targeting given probability distributions). I find the inclusion of the last one puzzling in that simulation (in that sense) shows very little tolerance for errors, especially systematic bias. It may be that specific quantum architectures can be designed for specific probability distributions, just like some are already conceived for optimisation. (It may even be the case that quantum solutions are (just next to) available for intractable constants as in Ising or Potts models!)

Russian roulette still rolling

Posted in Statistics on March 22, 2017 by xi'an

Colin Wei and Iain Murray arXived a new version of their paper on doubly-intractable distributions, which is to be presented at AISTATS. It builds upon the Russian roulette estimator of Lyne et al. (2015), which itself exploits the debiasing technique of McLeish et al. (2011) [found earlier in the physics literature as in Carter and Cashwell, 1975, according to the current paper]. Such an unbiased estimator of the inverse of the normalising constant can be used for pseudo-marginal MCMC, except that the estimator is sometimes negative and has to be so, as proved by Pierre Jacob and co-authors. As I discussed in my post on the Russian roulette estimator, replacing the negative estimate with its absolute value does not seem right because a negative value indicates that the quantity is close to zero, hence replacing it with zero would sound more appropriate. Wei and Murray start from the property that, while the expectation of the importance weight is equal to the normalising constant, the average of the inverse of the importance weight converges to the inverse of the normalising constant for an MCMC chain. This however sounds like a harmonic mean estimate because the property would also stand for any substitute to the importance density, as it only requires the density to integrate to one… As noted in the paper, the variance of the resulting Roulette estimator “will be high” or even infinite. Following Glynn et al. (2014), the authors build a coupled version of that solution, whose key feature is to cut the higher order terms in the debiasing estimator. This does not guarantee finite variance or positivity of the estimate, though. In order to decrease the variance (assuming it is finite), backward coupling is introduced, with a Rao-Blackwellisation step using our 1996 Biometrika derivation. Which happens to be of lower cost than the standard Rao-Blackwellisation in that special case, O(N) versus O(N²), N being the stopping rule used in the debiasing estimator.
Under the assumption that the inverse importance weight has finite expectation [wrt the importance density], the resulting backward-coupling Russian roulette estimator can be proven to be unbiased, as it enjoys a finite expectation. (As in the generalised harmonic mean case, the constraint imposes thinner tails on the importance function, which then hampers the convergence of the MCMC chain.) No mention is made of achieving finite variance for those estimators, which again is a serious concern due to the similarity with harmonic means…
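For readers unfamiliar with the debiasing trick, here is a minimal sketch of a Russian roulette estimator of 1/Z in its basic geometric series form, not the backward-coupling version of Wei and Murray. It uses the expansion 1/Z = (1/c) Σ (1 - Z/c)^n, valid when |1 - Z/c| < 1, replaces each power by a product of independent unbiased estimates of Z, stops after each term with probability 1 - q, and divides the n-th term by its survival probability q^n. The names `roulette_inverse` and `z_hat`, the constant c and the continuation probability q are all hypothetical choices.

```python
import random

def roulette_inverse(z_hat, c, q, rng):
    """One randomly truncated series estimate of 1/Z (illustrative sketch).

    z_hat(rng) must return an independent unbiased estimate of Z at each call.
    """
    total, term, survival = 0.0, 1.0, 1.0
    while True:
        total += term / survival  # n-th term divided by P(N >= n) = q^n
        if rng.random() >= q:  # stop with probability 1 - q
            return total / c
        term *= 1.0 - z_hat(rng) / c  # fresh unbiased estimate for the next power
        survival *= q

def noisy_z(rng):
    """Toy unbiased estimate of Z = 2, uniform on (1, 3) (assumption)."""
    return rng.uniform(1.0, 3.0)
```

In this toy case every factor 1 - z_hat/c stays positive, so the estimate is positive as well. Once a draw of `z_hat` can exceed c the factors change sign and the running sum can turn negative, which is the sign issue discussed above.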