Charlie Geyer | Xi'an's Og

Archive for Charlie Geyer

evidence estimation in finite and infinite mixture models

Posted in Books, Statistics, University life with tags arXiv, Bayes factor, Bayesian model evaluation, bridge sampling, candidate's formula, Charlie Geyer, Chib's approximation, Dirichlet process mixture, nested sampling, noise contrasting estimation, sequential Monte Carlo, SIS, SMC, Université Paris Dauphine on May 20, 2022 by xi'an

Adrien Hairault (PhD student at Dauphine), Judith and I just arXived a new paper on evidence estimation for mixtures. This may sound like a well-trodden path that I have repeatedly explored in the past, but methinks that estimating the model evidence doth remain a notoriously difficult task for large-sample or many-component finite mixtures and even more for “infinite” mixture models corresponding to a Dirichlet process. The different Monte Carlo techniques advocated in the past, like Chib’s (1995) method, SMC, or bridge sampling, exhibit a range of performances, in terms of computing time… One novel (?) approach in the paper is to write Chib’s (1995) identity for partitions rather than parameters, as it bypasses the label switching issue (as we already noted in Hurn et al., 2000), another one is to exploit Geyer’s (1991-1994) reverse logistic regression technique in the more challenging Dirichlet mixture setting, and yet another one is a sequential importance sampling solution à la Kong et al. (1994), as also noticed by Carvalho et al. (2010). [We did not cover nested sampling as it quickly becomes onerous.]
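
For readers who want the identity spelled out, here is a sketch of Chib's (1995) identity in its usual parameter form and in a partition form. The notation is an assumption made for illustration and not taken from the paper: m(x) is the evidence, θ the parameter, C a partition of the observations into clusters, and p(x | C) the likelihood with the component parameters integrated out.

```latex
m(x) = \frac{f(x \mid \theta)\,\pi(\theta)}{\pi(\theta \mid x)} \quad \text{for any } \theta,
\qquad
m(x) = \frac{p(x \mid \mathcal{C})\,p(\mathcal{C})}{p(\mathcal{C} \mid x)} \quad \text{for any partition } \mathcal{C}.
```

Both versions are direct consequences of Bayes' theorem. In each case the denominator is not available in closed form and has to be estimated, typically from MCMC output. A partition does not change when the component labels are permuted, which is why the second version is unaffected by label switching.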

Applications are numerous. In particular, testing for the number of components in a finite mixture model or against the fit of a finite mixture model for a given dataset has long been and still is an issue of much interest and diverging opinions, albeit yet missing a fully satisfactory resolution. Using a Bayes factor to find the right number of components K in a finite mixture model is known to provide a consistent procedure. We furthermore establish there the consistency of the Bayes factor when comparing a parametric family of finite mixtures against the nonparametric ‘strongly identifiable’ Dirichlet Process Mixture (DPM) model.

accronyms [CDT lectures]

Posted in Books, Statistics with tags ABC, Bayesian synthetic likelihood, CDT, Charlie Geyer, consistency, GANs, noise contrasting estimation, normalising flow, reverse logistic, short course, slides, University of Warwick, VAEs, variational autoencoders, Wasserstein distance, WGANs on May 16, 2022 by xi'an

This week, I gave a short and introductory course in Warwick for the CDT (PhD) students on my perceived connections between reverse logistic regression à la Geyer and GANs, among other things. The first attempt was cancelled in 2020 due to the pandemic, the second one in 2021 was on-line and thus offered little possibility for interaction. Preparing for this third attempt made me read more papers on some statistical analyses of GANs and WGANs, which was more satisfactory [for me] even though I could not get into the technical details…

[more than] everything you always wanted to know about marginal likelihood

Posted in Books, Statistics, University life with tags adaptive importance sampling, arXiv, Bayes factor, bridge sampling, candidate's formula, Charlie Geyer, Chib's approximation, CRiSM, improper prior, Julian Besag, Laplace approximation, Madrid, nested sampling, noise contrasting estimation, path sampling, reversible jump MCMC, sequential Monte Carlo, surveys, umbrella sampling, Universidad Carlos III de Madrid, University of Warwick, Warwickshire on February 10, 2022 by xi'an

Earlier this year, F. Llorente, L. Martino, D. Delgado, and J. Lopez-Santiago arXived an updated version of their massive survey on marginal likelihood computation. Which I can only warmly recommend to anyone interested in the matter! Or looking for a base camp to initiate a graduate project. They break the methods into four families:

Deterministic approximations (e.g., Laplace approximations)
Methods based on density estimation (e.g., Chib’s method, aka the candidate’s formula)
Importance sampling, including sequential Monte Carlo, with a subsection connecting with MCMC
Vertical representations (mostly, nested sampling)

Besides sheer computation, the survey also touches upon issues like improper priors and alternatives to Bayes factors. The parts I would have covered in more detail are reversible jump MCMC and the long-lasting impact of Geyer’s reverse logistic regression (with the noise-contrastive extension), even though the link with bridge sampling is briefly mentioned there. There is even a table reporting on the coverage of earlier surveys. Of course, the following postnote of the manuscript

The Christian Robert’s blog deserves a special mention , since Professor C. Robert has devoted several entries of his blog with very interesting comments regarding the marginal likelihood estimation and related topics.

does not in the least make me less objective! Some of the final recommendations:

use of Naive Monte Carlo [simulate from the prior] should be always considered [assuming a proper prior!]
a multiple-try method is a good choice within the MCMC schemes
optimal umbrella sampling estimator is difficult and costly to implement , so its best performance may not be achieved in practice
adaptive importance sampling uses the posterior samples to build a suitable normalized proposal, so it benefits from localizing samples in regions of high posterior probability while preserving the properties of standard importance sampling
Chib’s method is a good alternative, that provide very good performances [but is not always available]
the success [of nested sampling] in the literature is surprising.
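
The first recommendation in this list, naive Monte Carlo, amounts to averaging the likelihood over draws from a proper prior. Here is a minimal sketch on a toy model where the exact evidence is known. The model (Gaussian observations with unit variance and a N(0, τ²) prior on the mean), the function names, and the sample sizes are all assumptions chosen for illustration and do not come from the survey.

```python
import math
import random


def log_likelihood(theta, data):
    # Log-likelihood of i.i.d. N(theta, 1) observations.
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2 for x in data)


def naive_mc_log_evidence(data, tau, n_draws, rng):
    # Average the likelihood over draws from the (proper) N(0, tau^2) prior,
    # working on the log scale with a log-sum-exp shift for numerical stability.
    log_terms = [log_likelihood(rng.gauss(0.0, tau), data) for _ in range(n_draws)]
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms) / n_draws)


def exact_log_evidence(data, tau):
    # Closed form: marginally, the data are N(0, I + tau^2 * 11').
    n = len(data)
    s = sum(data)
    ss = sum(x * x for x in data)
    quad = ss - tau ** 2 / (1 + n * tau ** 2) * s ** 2
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(1 + n * tau ** 2) - 0.5 * quad


rng = random.Random(1)
data = [rng.gauss(1.0, 1.0) for _ in range(5)]
estimate = naive_mc_log_evidence(data, tau=2.0, n_draws=50_000, rng=rng)
exact = exact_log_evidence(data, tau=2.0)
print(f"naive Monte Carlo: {estimate:.3f}, exact: {exact:.3f}")
```

With only five observations the posterior is not much narrower than the prior, so a good share of the prior draws contribute to the average. The estimator degrades quickly as the sample size grows and the posterior concentrates.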

ABC by classification

Posted in pictures, Statistics, Travel, University life with tags ABC, Bayesian GANs, Biometrika, BIRS-CMO, Casa Matemática Oaxaca, Charlie Geyer, generalised Bayes estimators, neural network, Oaxaca, technical report, Université Paris Dauphine on December 21, 2021 by xi'an

As a(nother) coincidence, yesterday, we had a reading group discussion at Paris Dauphine a few days after Veronika Rockova presented the paper in person in Oaxaca. The idea in ABC by classification that she co-authored with Yuexi Wang and Tetsuya Kaji is to use the empirical Kullback-Leibler divergence as a substitute for the intractable likelihood at the parameter value θ, in the generalised Bayes setting of Bissiri et al. Since this quantity is not available, it is estimated as well, by a classification method that somehow relates to Geyer’s 1994 reverse logistic proposal, using the (ABC) pseudo-data generated from the model associated with θ. The convergence of the algorithm obviously depends on the choice of the discriminator used in practice. The paper also makes a connection with GANs as a potential alternative for the generalised Bayes representation. It mostly focuses on the frequentist validation of the ABC posterior, in the sense of exhibiting a posterior concentration rate in n, the sample size, while requiring performances of the discriminators that may prove hard to check in practice. This expands our 2018 result to this setting, with the tolerance decreasing more slowly than the Kullback-Leibler estimation error.
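
To make the classification step concrete, here is a minimal sketch of how a discriminator trained to separate observed data from pseudo-data yields a Kullback-Leibler estimate. With equal sample sizes the fitted logit approximates the log density ratio, and averaging it over the observed sample approximates the divergence. Everything specific here is an assumption for illustration and not the paper's setup: the Gaussian toy model, the linear logit (which happens to be exact in this toy case), the Newton-Raphson fit, and the function names.

```python
import math
import random


def fit_logistic(xs, labels, n_iter=25):
    # Newton-Raphson for a logistic regression with logit a + b * x.
    a = b = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, labels):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            r = y - p          # gradient contribution
            w = p * (1.0 - p)  # curvature contribution
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def estimate_kl(observed, pseudo):
    # Label observed data 1 and pseudo-data 0. With equal sample sizes the
    # fitted logit estimates log p_obs(x) / p_theta(x); its average over the
    # observed sample estimates KL(p_obs || p_theta).
    xs = observed + pseudo
    labels = [1] * len(observed) + [0] * len(pseudo)
    a, b = fit_logistic(xs, labels)
    return sum(a + b * x for x in observed) / len(observed)


rng = random.Random(0)
theta = 1.0
observed = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # data from N(0, 1)
pseudo = [rng.gauss(theta, 1.0) for _ in range(2000)]   # pseudo-data from N(theta, 1)
kl_hat = estimate_kl(observed, pseudo)
print(f"estimated KL: {kl_hat:.3f}, exact: {theta ** 2 / 2:.3f}")
```

In this toy case the exact divergence is θ²/2, which gives a direct check on the estimate. A realistic discriminator (for instance a neural network) would replace `fit_logistic`, at the cost of the guarantees discussed in the post.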

Besides the shared appreciation that working with the Kullback-Leibler divergence was a nice and under-appreciated direction, one point that came out of our discussion is that using the (estimated) Kullback-Leibler divergence as a form of distance (attached with a tolerance) is less prone to variability (or more robust) than using directly (and without tolerance) the estimate as a substitute for the intractable likelihood, if we interpreted the discrepancy in Figure 3 properly. Another item was about the discriminator function itself: while a machine learning methodology such as neural networks could be used, albeit with unclear theoretical guarantees, it was unclear to us whether or not a new discriminator needed to be constructed for each value of the parameter θ. Even when the simulations are run by a deterministic transform.

21w5107 [½day 3]

Posted in pictures, Statistics, Travel, University life with tags ABC, all models are wrong, Bayesian GANs, Brier score, Casa Matemática Oaxaca, Charlie Geyer, classification, confounders, empirical Bayes methods, Hyvärinen score, Laplace approximation, Mexico, Oaxaca, objective Bayes, One World ABC Seminar, robust Bayesian methods on December 2, 2021 by xi'an

Day [or half-day] three started without firecrackers and with David Rossell (formerly Warwick) presenting an empirical Bayes approach to generalised linear model choice with a high degree of confounding, using approximate Laplace approximations. With considerable improvements in the experimental RMSE. Making me feel sorry there was no apparent fully (and objective?) Bayesian alternative! (Two more papers on my reading list that I should have read way earlier!) Then Veronika Rockova discussed her work on approximate Metropolis-Hastings by classification. (With only a slight overlap with her One World ABC seminar.) Making me once more think of Geyer’s n°564 technical report, namely the estimation of a marginal likelihood by a logistic discrimination representation. Her ABC resolution replaces the tolerance step by an exponential of minus the estimated Kullback-Leibler divergence between the data density and the density associated with the current value of the parameter. (I wonder if there is a residual multiplicative constant there… Presumably not. Great idea!) The classification step needs to be run at every iteration, which could be sped up by subsampling.
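
The logistic discrimination representation can be illustrated with a two-density sketch of reverse logistic regression: pool samples from two distributions known up to their normalising constants, then pick the log ratio of constants that best "predicts" which sample each point came from. The two unnormalised Gaussian densities, the equal sample sizes, the bisection solver, and the function names are assumptions made for this illustration, not Geyer's original setting.

```python
import math
import random


def log_q1(x):
    # Unnormalised N(0, 1) density, so Z1 = sqrt(2 * pi).
    return -0.5 * x * x


def log_q2(x):
    # Unnormalised N(0, 4) density, so Z2 = 2 * sqrt(2 * pi).
    return -x * x / 8.0


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def reverse_logistic_log_ratio(sample1, sample2, lo=-20.0, hi=20.0, n_iter=60):
    # Estimate c = log(Z1 / Z2) from equally sized samples from q1/Z1 and q2/Z2.
    # With equal sizes, P(x came from sample 1) = sigmoid(log q1 - log q2 - c).
    # The logistic score equation in c reads
    #   sum over pooled points of sigmoid(d_i - c) = len(sample1),
    # and its left-hand side decreases in c, so bisection finds the root.
    assert len(sample1) == len(sample2)
    diffs = [log_q1(x) - log_q2(x) for x in sample1 + sample2]
    target = len(sample1)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        expected = sum(sigmoid(d - mid) for d in diffs)
        if expected > target:
            lo = mid  # c is too small
        else:
            hi = mid  # c is too large
    return 0.5 * (lo + hi)


rng = random.Random(42)
sample1 = [rng.gauss(0.0, 1.0) for _ in range(4000)]
sample2 = [rng.gauss(0.0, 2.0) for _ in range(4000)]
log_ratio_hat = reverse_logistic_log_ratio(sample1, sample2)
print(f"estimated log(Z1/Z2): {log_ratio_hat:.3f}, exact: {-math.log(2):.3f}")
```

Here the exact answer is log(Z1/Z2) = -log 2, so the output can be checked directly. In an evidence computation, one of the two densities would be the unnormalised posterior and the other a normalised reference distribution.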

On the always fascinating theme of loss based posteriors, à la Bissiri et al., Jack Jewson (formerly Warwick) presented his work on generalised Bayesian and improper models (from Birmingham!). Using data to decide between model and loss, which sounds highly unorthodox! The first difficulty is that losses are unscaled. Or even not integrable after an exponential transform. Hence the notion of improper models. As in the case of the robust Tukey’s loss, which is bounded by an arbitrary κ. I immediately wondered if the fact that the pseudo-likelihood does not integrate is important beyond the (obvious) absence of a normalising constant. And the fact that this is not a generative model. And the answer came a few slides later with the use of the Hyvärinen score. Rather than the likelihood score. Which can itself be turned into an H-posterior, very cool indeed! Although I wonder at the feasibility of finding an [objective] prior on κ.
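
For reference, a sketch of the univariate Hyvärinen score and the associated H-posterior, in notation assumed here for illustration (the talk may have used a different scaling convention): f is a possibly unnormalised density for an observation y, f_θ its parametrised version, and π the prior.

```latex
H(y; f) = 2\,\frac{\partial^2}{\partial y^2}\log f(y)
        + \left(\frac{\partial}{\partial y}\log f(y)\right)^{2},
\qquad
\pi^{H}(\theta \mid y_{1:n}) \propto \pi(\theta)\,
  \exp\Big\{-\sum_{i=1}^{n} H(y_i; f_\theta)\Big\}.
```

Since H only involves derivatives of log f with respect to y, replacing f by c·f for any constant c > 0 leaves the score unchanged. This is why it remains usable when the pseudo-likelihood does not integrate.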

Rajesh Ranganath completed the morning session with a talk on [the difficulty of] connecting Bayesian models and complex prediction models. Using instead a game theoretic approach with Brier scores under censoring. While there was a connection with Veronika’s use of a discriminator as a likelihood approximation, I had trouble catching the overall message…
