Archive for bridge sampling

insufficient Gibbs sampling bridges as well!

Posted in Books, Kids, pictures, R, Statistics, University life on March 3, 2024 by xi’an

Antoine Luciano, Robin Ryder and I posted a revised version of our insufficient Gibbs sampler on arXiv last week (along with three other revisions or new deposits of mine!), following comments and suggestions from referees. Thanks to this revision, we realised that the evidence based on an (insufficient) statistic could also be approximated by a Monte Carlo estimate attached to the completed sample simulated by the insufficient sampler. Better, a bridge sampling estimator can be used under the same conditions as when the full data is available! In this new version, we thus revisited toy examples first explored in some of my ABC papers on testing (with insufficient statistics), as illustrated by both graphs in this post.
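As a quick refresher on the bridge sampling ingredient, here is a minimal sketch of the iterative (optimal) bridge sampling estimator of Meng and Wong (1996) for a ratio of normalising constants, applied to a toy pair of unnormalised Gaussian densities; this is a generic illustration, not the estimator implemented in the paper, and all numerical choices are mine.

```r
## Minimal sketch of the iterative (optimal) bridge sampling estimator of the
## ratio r = Z1/Z2 of two normalising constants, in the spirit of Meng & Wong
## (1996); toy unnormalised Gaussian targets, not the setting of the paper.
set.seed(1)
q1 <- function(x) exp(-(x - 1)^2 / 2)        # unnormalised N(1,1), Z1 = sqrt(2*pi)
q2 <- function(x) exp(-(x + 1)^2 / 8)        # unnormalised N(-1,4), Z2 = sqrt(8*pi)
x1 <- rnorm(1e4, mean = 1, sd = 1)           # draws from target 1
x2 <- rnorm(1e4, mean = -1, sd = 2)          # draws from target 2
n1 <- length(x1); n2 <- length(x2)
s1 <- n1 / (n1 + n2); s2 <- n2 / (n1 + n2)
l1 <- q1(x1) / q2(x1)                        # density ratios at both samples
l2 <- q1(x2) / q2(x2)
r <- 1                                       # initial guess for Z1/Z2
for (t in 1:50) {                            # fixed-point iteration on the optimal bridge
  num <- mean(l2 / (s1 * l2 + s2 * r))
  den <- mean(1  / (s1 * l1 + s2 * r))
  r <- num / den
}
c(estimate = r, truth = sqrt(2 * pi) / sqrt(8 * pi))
```

In practice a handful of fixed-point iterations suffices for the estimate to stabilise.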

telescope on evidence for graphical models

Posted in Books, Statistics, University life on February 29, 2024 by xi’an

A recent paper by Anindya Bhadra, Ksheera Sagar, Sayantan Banerjee (whom I met during Rito’s seminar, since he was also visiting Ismael in Paris, and who mentioned this work), and Jyotishka Datta addresses the computation of the evidence for graphical models. Obtaining an approximation of the evidence attached to a model and a prior on the covariance matrix Ω is a challenge they manage to address in a particularly clever manner.

“the conditional posterior density [of the last column of the covariance matrix] can be evaluated as a product of normal and gamma densities under suitable priors (…) We resolve this [difficulty with the integrated likelihood] by evaluating the required densities in one row or column at a time, and proceeding backwards starting from the p-th row, with appropriate adjustments to Ω_{p×p} at each step via Schur complement.”

Using a telescoping trick, the authors exploit the fact that the decomposition

\log f(y_{1:p})=\log f(y_p|y_{1:p-1},\theta_p)+\log f(y_{1:p-1}|\theta_p)+\log f(\theta_p)-\log f(\theta_p|y_{1:p})

involves a problematic second term that can be ignored thanks to successive cancellations, as shown by Figure 1. The other terms are manageable for some classes of priors on Ω, like a Wishart. This allows them to call for Chib’s (two-block) method, which requires two independent MCMC runs. An unfortunate aspect of the approach, however, is that its computational complexity is of order O(Mp), where M is the number of MCMC samples, since the telescoping trick involves calling Chib’s approach for each of the p columns of Ω. While the numerical outcomes compare with those of nested sampling, annealed importance sampling, and even harmonic mean estimates (!), the computing time usually exceeds that of these other methods, especially harmonic mean estimates. For the specific G-Wishart case, the solution proposed by Atay-Kayis and Massam (2005) proves far superior. Since the main purpose of computing the evidence is in deriving Bayes factors, I wonder at possible gains in recycling simulations between models, even though this would seem to call for bridge sampling, not considered in the paper.
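For readers unfamiliar with the building block being telescoped, here is a minimal sketch of Chib’s (1995) identity combined with a two-block Gibbs sampler, written for a toy Normal model with semi-conjugate priors rather than for the covariance setting of the paper; hyperparameter values and helper names are mine.

```r
## Minimal sketch of Chib's (1995) evidence identity with a two-block Gibbs
## sampler: y_i ~ N(mu, sig2), mu ~ N(m0, v0), sig2 ~ InvGamma(a0, b0).
## Toy model only, not the graphical-model setting of the paper.
set.seed(2)
y <- rnorm(50, mean = 2, sd = 1.5); n <- length(y)
m0 <- 0; v0 <- 10; a0 <- 2; b0 <- 2                   # hypothetical hyperparameters
ldinvgamma <- function(x, a, b) a * log(b) - lgamma(a) - (a + 1) * log(x) - b / x

G <- 5e3; mu <- mean(y); sig2 <- var(y)
draws <- matrix(NA, G, 2, dimnames = list(NULL, c("mu", "sig2")))
for (g in 1:G) {                                      # two-block Gibbs sampler
  vn <- 1 / (1 / v0 + n / sig2)
  mn <- vn * (m0 / v0 + sum(y) / sig2)
  mu <- rnorm(1, mn, sqrt(vn))
  sig2 <- 1 / rgamma(1, shape = a0 + n / 2, rate = b0 + sum((y - mu)^2) / 2)
  draws[g, ] <- c(mu, sig2)
}
mu_s <- mean(draws[, "mu"]); sig2_s <- mean(draws[, "sig2"])  # high-density point

## Rao-Blackwellised estimate of the posterior ordinate pi(mu* | y)
vn_g <- 1 / (1 / v0 + n / draws[, "sig2"])
mn_g <- vn_g * (m0 / v0 + sum(y) / draws[, "sig2"])
log_post_mu <- log(mean(dnorm(mu_s, mn_g, sqrt(vn_g))))
## pi(sig2* | mu*, y) is available in closed form, hence no reduced run here
log_post_sig2 <- ldinvgamma(sig2_s, a0 + n / 2, b0 + sum((y - mu_s)^2) / 2)

## Chib's identity: log m(y) = log f(y|theta*) + log pi(theta*) - log pi(theta*|y)
log_evidence <- sum(dnorm(y, mu_s, sqrt(sig2_s), log = TRUE)) +
  dnorm(mu_s, m0, sqrt(v0), log = TRUE) + ldinvgamma(sig2_s, a0, b0) -
  log_post_mu - log_post_sig2
log_evidence
```

In the telescoping scheme above, a version of this computation has to be repeated for each of the p columns, which is where the O(Mp) cost comes from.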

evidence estimation in finite and infinite mixture models

Posted in Books, Statistics, University life on May 20, 2022 by xi’an

Adrien Hairault (PhD student at Dauphine), Judith and I just arXived a new paper on evidence estimation for mixtures. This may sound like a well-trodden path that I have repeatedly explored in the past, but methinks that estimating the model evidence doth remain a notoriously difficult task for large-sample or many-component finite mixtures, and even more so for “infinite” mixture models corresponding to a Dirichlet process. The different Monte Carlo techniques advocated in the past, like Chib’s (1995) method, SMC, or bridge sampling, exhibit a wide range of performances, not least in terms of computing time… One novel (?) approach in the paper is to write Chib’s (1995) identity for partitions rather than parameters, as it bypasses the label switching issue (which we already noted in Hurn et al., 2000); another is to exploit Geyer’s (1991-1994) reverse logistic regression technique in the more challenging Dirichlet mixture setting; and yet another is a sequential importance sampling solution à la Kong et al. (1994), as also noticed by Carvalho et al. (2010). [We did not cover nested sampling as it quickly becomes onerous.]
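As a crude illustration of why writing Chib’s identity for partitions dodges label switching, the sketch below maps allocation vectors to a canonical labelling (order of first appearance) and estimates the posterior probability of a reference partition by its frequency along the simulation output, which is the denominator ingredient of such an identity; the allocations here are mock draws rather than output from an actual mixture sampler, and all names are mine.

```r
## Canonical relabelling of allocation vectors: two labellings of the same
## partition get the same key, so label switching becomes immaterial when
## estimating pi(partition | y) by an MCMC frequency. Mock draws only.
set.seed(3)
canon <- function(z) match(z, unique(z))     # relabel by order of first appearance
G <- 1e4; n <- 6
alloc <- matrix(sample(1:2, G * n, replace = TRUE), G, n)   # mock allocation output
keys <- apply(alloc, 1, function(z) paste(canon(z), collapse = "-"))
z_star <- c(1, 1, 2, 2, 1, 2)                # reference allocation (e.g., a MAP guess)
key_star <- paste(canon(z_star), collapse = "-")
p_hat <- mean(keys == key_star)              # frequency estimate of pi(partition* | y)
p_hat
```

With genuine sampler output and a high-posterior reference partition, this frequency would stand for the denominator in a partition-based version of the identity.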

Applications are numerous. In particular, testing for the number of components in a finite mixture model, or testing the fit of a finite mixture model for a given dataset, has long been, and still is, an issue of much interest and diverging opinions, albeit one still missing a fully satisfactory resolution. Using a Bayes factor to find the right number of components K in a finite mixture model is known to provide a consistent procedure. We furthermore establish the consistency of the Bayes factor when comparing a parametric family of finite mixtures against the nonparametric ‘strongly identifiable’ Dirichlet Process Mixture (DPM) model.

taking advantage of the constant

Posted in Books, Kids, pictures, R, Statistics, University life on May 19, 2022 by xi’an

A question from X validated had enough appeal for me to procrastinate over it for ½ an hour: what difference does it make [for simulation purposes] that a target density is properly normalised? In the continuous case, I do not see much to exploit from this knowledge, apart from the value potentially leading to a control variate (in a Gelfand and Dey, 1994, spirit) and possibly to a stopping rule (by checking that the portion of the space visited so far has mass close to one, but this is more delicate than it sounds).
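To make the control variate remark concrete, here is a minimal sketch under importance sampling: when the target f is properly normalised, the importance weights w = f/g have known expectation one, so that w - 1 can serve as a zero-mean control variate; the toy target, proposal, and integrand are mine and this is only loosely in the Gelfand and Dey spirit.

```r
## Exploiting a normalised target f: under importance sampling from g, the
## weights w = f/g have expectation 1 (exactly, since f integrates to one),
## so (w - 1) is a zero-mean control variate for estimating E_f[h(X)].
set.seed(4)
f <- function(x) dnorm(x, 0, 1)        # normalised target
g <- function(x) dnorm(x, 0, 2)        # proposal
h <- function(x) x^2                   # integrand, with E_f[h] = 1
x <- rnorm(1e4, 0, 2)
w <- f(x) / g(x)
plain <- mean(h(x) * w)                          # plain importance sampling estimate
beta  <- cov(h(x) * w, w) / var(w)               # estimated optimal coefficient
cv    <- plain - beta * (mean(w) - 1)            # control-variate corrected estimate
c(plain = plain, controlled = cv, truth = 1)
```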

In a (possibly infinite) countable setting, it seems to me one gain (?) is that approximating expectations by Monte Carlo no longer requires iid simulations, in the sense that, once visited, atoms need not be visited again. Self-avoiding random walks and their generalisations thus appear as a natural substitute for MC(MC) methods in this setting, provided finding unexplored atoms proves manageable. For instance, a stopping rule is always available, namely that the cumulated weight of the visited fraction of the space is close enough to one. The above picture shows a toy example on a 500 x 500 grid with 0.1% of the mass remaining at the almost invisible white dots. (In my experiment, neighbours for the random exploration were chosen at random over the grid, as I assumed no global information was available about the distribution over the grid of either the mass function or the function whose expectation was sought.)
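Here is a bare-bones version of the experiment, on a smaller grid and with a blind (uniform) self-avoiding exploration plus the cumulated-mass stopping rule; the grid size, target, and integrand are mine and differ from the 500 x 500 setup behind the picture.

```r
## Self-avoiding exploration of a normalised pmf on a finite grid: atoms are
## visited at most once, in random order, and the run stops as soon as the
## accumulated mass of visited atoms exceeds 1 - eps.
set.seed(5)
side <- 100
xg <- seq(-3, 3, length.out = side)
pmf <- outer(dnorm(xg), dnorm(xg)); pmf <- pmf / sum(pmf)    # normalised target on the grid
h <- function(i, j) xg[i]^2 + xg[j]^2                        # function whose expectation is sought
eps <- 1e-3
order_visit <- sample(side^2)              # blind random exploration, no atom visited twice
mass <- 0; est <- 0; visited <- 0
for (k in order_visit) {
  i <- (k - 1) %% side + 1; j <- (k - 1) %/% side + 1
  mass <- mass + pmf[i, j]
  est  <- est  + h(i, j) * pmf[i, j]
  visited <- visited + 1
  if (mass >= 1 - eps) break               # stopping rule: visited mass close enough to one
}
c(partial_sum = est, visited_fraction = visited / side^2, mass = mass)
```

The partial sum underestimates the exact expectation by at most the maximum of h over the unvisited atoms times the remaining mass, which is what makes the stopping rule usable when h is bounded.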

[more than] everything you always wanted to know about marginal likelihood

Posted in Books, Statistics, University life on February 10, 2022 by xi’an

Earlier this year, F. Llorente, L. Martino, D. Delgado, and J. Lopez-Santiago have arXived an updated version of their massive survey on marginal likelihood computation. Which I can only warmly recommend to anyone interested in the matter! Or looking for a base camp to initiate a graduate project. They break the methods into four families:

  1. Deterministic approximations (e.g., Laplace approximations; a toy sketch follows this list)
  2. Methods based on density estimation (e.g., Chib’s method, aka the candidate’s formula)
  3. Importance sampling, including sequential Monte Carlo, with a subsection connecting with MCMC
  4. Vertical representations (mostly, nested sampling)
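As a toy illustration of the first family, here is a Laplace approximation of the evidence for a Beta-Binomial model, where the exact answer is available for comparison; the data and prior values are mine.

```r
## Laplace approximation of the evidence: log m(y) is approximated by the log
## joint at the posterior mode plus a Gaussian volume correction, compared
## here with the exact Beta-Binomial marginal likelihood.
set.seed(6)
n <- 30; y <- 12; a <- 1; b <- 1                     # Binomial data, Beta(a,b) prior
log_joint <- function(p) dbinom(y, n, p, log = TRUE) + dbeta(p, a, b, log = TRUE)
opt <- optim(0.5, function(p) -log_joint(p), method = "L-BFGS-B",
             lower = 1e-6, upper = 1 - 1e-6, hessian = TRUE)
d <- 1                                               # dimension of the parameter
laplace <- -opt$value + (d / 2) * log(2 * pi) - 0.5 * log(det(opt$hessian))
exact <- lchoose(n, y) + lbeta(y + a, n - y + b) - lbeta(a, b)
c(laplace = laplace, exact = exact)
```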

Besides sheer computation, the survey also broaches issues like improper priors and alternatives to Bayes factors. The parts I would have covered in more detail are reversible jump MCMC and the long-lasting impact of Geyer’s reverse logistic regression (with its noise contrastive extension), even though the link with bridge sampling is briefly mentioned there. There is even a table reporting on the coverage of earlier surveys. Of course, the following postnote of the manuscript

The Christian Robert’s blog deserves a special mention, since Professor C. Robert has devoted several entries of his blog with very interesting comments regarding the marginal likelihood estimation and related topics.

does not in the least make me less objective! Some of the final recommendations:

  • use of Naive Monte Carlo [simulate from the prior] should always be considered [assuming a proper prior! see the sketch after this list]
  • a multiple-try method is a good choice within the MCMC schemes
  • optimal umbrella sampling estimator is difficult and costly to implement, so its best performance may not be achieved in practice
  • adaptive importance sampling uses the posterior samples to build a suitable normalized proposal, so it benefits from localizing samples in regions of high posterior probability while preserving the properties of standard importance sampling
  • Chib’s method is a good alternative that provides very good performance [but is not always available]
  • the success [of nested sampling] in the literature is surprising.
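Regarding the first recommendation, here is a minimal sketch of the naive Monte Carlo estimate of the evidence, which simply averages the likelihood over draws from the (proper) prior, on the same Beta-Binomial toy model as in the Laplace sketch above; all values are mine.

```r
## Naive Monte Carlo estimate of the evidence: m(y) = E_prior[f(y | theta)],
## approximated by averaging the likelihood over prior simulations.
set.seed(7)
n <- 30; y <- 12; a <- 1; b <- 1                     # same Beta-Binomial toy model
N <- 1e5
theta <- rbeta(N, a, b)                              # simulations from the (proper) prior
naive <- log(mean(dbinom(y, n, theta)))              # log of the naive estimate
exact <- lchoose(n, y) + lbeta(y + a, n - y + b) - lbeta(a, b)
c(naive = naive, exact = exact)
```

As the bracketed caveats indicate, this only makes sense for a proper prior, and it degrades quickly when the prior and the likelihood barely overlap.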