Archive for noise contrasting estimation

evidence estimation in finite and infinite mixture models

Posted in Books, Statistics, University life on May 20, 2022 by xi'an

Adrien Hairault (PhD student at Dauphine), Judith and I just arXived a new paper on evidence estimation for mixtures. This may sound like a well-trodden path that I have repeatedly explored in the past, but methinks that estimating the model evidence doth remain a notoriously difficult task for large sample or many component finite mixtures and even more for “infinite” mixture models corresponding to a Dirichlet process. The different Monte Carlo techniques advocated in the past, like Chib’s (1995) method, SMC, or bridge sampling, exhibit a range of performances, in terms of computing time… One novel (?) approach in the paper is to write Chib’s (1995) identity for partitions rather than parameters, as it bypasses the label switching issue (as we already noted in Hurn et al., 2000), another one is to exploit Geyer’s (1991-1994) reverse logistic regression technique in the more challenging Dirichlet mixture setting, and yet another one is a sequential importance sampling solution à la Kong et al. (1994), as also noticed by Carvalho et al. (2010). [We did not cover nested sampling as it quickly becomes onerous.]
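To make Chib’s identity concrete, here is a minimal sketch on a toy conjugate Normal model, where the posterior density is available in closed form. This is the plain parameter version of the identity, log m(x) = log f(x|θ*) + log π(θ*) − log π(θ*|x), and not the partition version developed in the paper. The model, the prior settings and the function names are assumptions made purely for illustration.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log-density of a Normal(mean, sd^2) distribution at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def chib_log_evidence(data, theta_star, sigma=1.0, mu0=0.0, tau0=2.0):
    """Chib's identity for x_i ~ N(theta, sigma^2) with prior theta ~ N(mu0, tau0^2).

    The posterior is Normal by conjugacy, so the identity is exact here
    and must return the same value whatever theta_star is plugged in.
    """
    n = len(data)
    post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
    post_mean = post_var * (mu0 / tau0**2 + sum(data) / sigma**2)
    log_lik = sum(log_normal_pdf(x, theta_star, sigma) for x in data)
    log_prior = log_normal_pdf(theta_star, mu0, tau0)
    log_post = log_normal_pdf(theta_star, post_mean, math.sqrt(post_var))
    return log_lik + log_prior - log_post


rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(20)]

# Two arbitrary plug-in values: the log-evidence should not depend on them.
log_m_a = chib_log_evidence(data, theta_star=0.3)
log_m_b = chib_log_evidence(data, theta_star=1.5)
print(log_m_a, log_m_b)
```

In a mixture model the posterior ordinate is not available in closed form and has to be estimated from MCMC output, which is where label switching interferes.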

Applications are numerous. In particular, testing for the number of components in a finite mixture model or against the fit of a finite mixture model for a given dataset has long been and still is an issue of much interest and diverging opinions, albeit yet missing a fully satisfactory resolution. Using a Bayes factor to find the right number of components K in a finite mixture model is known to provide a consistent procedure. We furthermore establish there the consistency of the Bayes factor when comparing a parametric family of finite mixtures against the nonparametric ‘strongly identifiable’ Dirichlet Process Mixture (DPM) model.

acronyms [CDT lectures]

Posted in Books, Statistics on May 16, 2022 by xi'an

This week, I gave a short and introductory course in Warwick for the CDT (PhD) students on my perceived connections between reverse logistic regression à la Geyer and GANs, among other things. The first attempt was cancelled in 2020 due to the pandemic, the second one in 2021 was on-line and thus offered few opportunities for interaction. Preparing for this third attempt made me read more papers on some statistical analyses of GANs and WGANs, which was more satisfactory [for me] even though I could not get into the technical details…

[more than] everything you always wanted to know about marginal likelihood

Posted in Books, Statistics, University life on February 10, 2022 by xi'an

Earlier this year, F. Llorente, L. Martino, D. Delgado, and J. Lopez-Santiago arXived an updated version of their massive survey on marginal likelihood computation. Which I can only warmly recommend to anyone interested in the matter! Or looking for a base camp to initiate a graduate project. They break the methods into four families:

  1. Deterministic approximations (e.g., Laplace approximations)
  2. Methods based on density estimation (e.g., Chib’s method, aka the candidate’s formula)
  3. Importance sampling, including sequential Monte Carlo, with a subsection connecting with MCMC
  4. Vertical representations (mostly, nested sampling)

Besides sheer computation, the survey also touches upon issues like improper priors and alternatives to Bayes factors. The parts I would have covered in more detail are reversible jump MCMC and the long-lasting impact of Geyer’s reverse logistic regression (with the noise contrastive extension), even though the link with bridge sampling is briefly mentioned there. There is even a table reporting on the coverage of earlier surveys. Of course, the following postnote of the manuscript

The Christian Robert’s blog deserves a special mention , since Professor C. Robert has devoted several entries of his blog with very interesting comments regarding the marginal likelihood estimation and related topics.

does not in the least make me less objective! Some of the final recommendations:

  • use of Naive Monte Carlo [simulate from the prior] should be always considered [assuming a proper prior!]
  • a multiple-try method is a good choice within the MCMC schemes
  • optimal umbrella sampling estimator is difficult and costly to implement , so its best performance may not be achieved in practice
  • adaptive importance sampling uses the posterior samples to build a suitable normalized proposal, so it benefits from localizing samples in regions of high posterior probability while preserving the properties of standard importance sampling
  • Chib’s method is a good alternative, that provide very good performances [but is not always available]
  • the success [of nested sampling] in the literature is surprising.
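As an illustration of the first recommendation, here is a minimal sketch of the naive Monte Carlo estimator, which averages the likelihood over draws from a (proper) prior. The toy conjugate Normal model, the prior settings and the function names are assumptions chosen so that the exact log-evidence is available for comparison; none of them come from the survey.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log-density of a Normal(mean, sd^2) distribution at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def naive_mc_log_evidence(data, n_draws, rng, sigma=1.0, mu0=0.0, tau0=2.0):
    """Log of the average likelihood over n_draws simulations from the prior."""
    log_liks = []
    for _ in range(n_draws):
        theta = rng.gauss(mu0, tau0)  # simulate from the prior
        log_liks.append(sum(log_normal_pdf(x, theta, sigma) for x in data))
    return log_sum_exp(log_liks) - math.log(n_draws)


def exact_log_evidence(data, sigma=1.0, mu0=0.0, tau0=2.0):
    """Exact value via the candidate's formula, evaluated at the posterior mean."""
    n = len(data)
    post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
    post_mean = post_var * (mu0 / tau0**2 + sum(data) / sigma**2)
    log_lik = sum(log_normal_pdf(x, post_mean, sigma) for x in data)
    log_prior = log_normal_pdf(post_mean, mu0, tau0)
    log_post = log_normal_pdf(post_mean, post_mean, math.sqrt(post_var))
    return log_lik + log_prior - log_post


rng = random.Random(1)
data = [rng.gauss(1.0, 1.0) for _ in range(10)]
estimate = naive_mc_log_evidence(data, 20000, rng)
exact = exact_log_evidence(data)
print(estimate, exact)
```

The estimator behaves well here because the prior is only moderately more diffuse than the posterior; with a vague prior or a large sample most prior draws carry a negligible likelihood and the variance deteriorates quickly.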

improving bridge samplers by GANs

Posted in Books, pictures, Statistics on July 20, 2021 by xi'an

Hanwen Xing from Oxford recently posted a paper on arXiv about using GANs to improve the overlap between the densities in bridge sampling. Bringing out new connections with noise contrastive estimation. The idea is to optimise a transform of one of the densities h() to bring it closer to the other density k(), using for instance normalising flows. (The call to transforms for bridge is not new, dating at least to Voter in 1985, the year I was starting my PhD!) Furthermore, using an f-divergence as a measure of functional distance allows for a reasonably straightforward update of the transform. That can be reformulated as a GAN target, which is somewhat natural in that the transform aims at confusing simulation from the transform of h and from k. This is quite an interesting proposal, even though calculating the optimal transform is time-consuming and subject to the curse of dimensionality. I also wonder whether or not iterating the optimisation, one density after the other, would bring further improvement.
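For readers unfamiliar with the baseline being improved, here is a minimal sketch of plain bridge sampling with the geometric bridge, without any transform or GAN step. It relies on the identity Z1/Z2 = E_2[√(q1/q2)] / E_1[√(q2/q1)], where E_i is the expectation under the normalised version of q_i. The two unnormalised Gaussian densities and all names are assumptions picked so that the true ratio of normalising constants is known (1/1.5).

```python
import math
import random


def log_q1(x):
    """Unnormalised N(0, 1) log-density, with Z1 = sqrt(2*pi)."""
    return -0.5 * x * x


def log_q2(x):
    """Unnormalised N(1, 1.5^2) log-density, with Z2 = 1.5 * sqrt(2*pi)."""
    return -0.5 * ((x - 1.0) / 1.5) ** 2


def geometric_bridge_ratio(n_draws, rng):
    """Estimate Z1/Z2 with the geometric bridge alpha = 1 / sqrt(q1 * q2)."""
    draws1 = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]  # exact draws from q1/Z1
    draws2 = [rng.gauss(1.0, 1.5) for _ in range(n_draws)]  # exact draws from q2/Z2
    numerator = sum(math.exp(0.5 * (log_q1(x) - log_q2(x))) for x in draws2) / n_draws
    denominator = sum(math.exp(0.5 * (log_q2(x) - log_q1(x))) for x in draws1) / n_draws
    return numerator / denominator


rng = random.Random(2)
ratio_estimate = geometric_bridge_ratio(20000, rng)
true_ratio = 1.0 / 1.5
print(ratio_estimate, true_ratio)
```

The precision of such an estimator hinges on the overlap between the two densities, which is the quantity the transform is meant to increase.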

NCE, VAEs, GANs & even ABC…

Posted in Statistics on May 14, 2021 by xi'an

As I was preparing my (new) lectures for a PhD short course “at” Warwick (meaning on Teams!), I read a few surveys and other papers on all these acronyms. These included the massive Gutmann and Hyvärinen 2012 NCE JMLR paper, Goodfellow’s NIPS 2016 tutorial on GANs, and Kingma and Welling 2019 introduction to VAEs. Which I found a wee bit on the light side, maybe missing the fundamentals of the notion… As well as the pretty helpful 2019 survey on normalising flows by Papamakarios et al., although missing on the (statistical) density estimation side. And also a nice (2017) survey of GANs by Shakir Mohamed and Balaji Lakshminarayanan with a somewhat statistical spirit, even though convergence issues are again not covered. But misspecification is there. And the many connections between ABC and GANs, if definitely missing on the uncertainty aspects. While Deep Learning by Goodfellow, Bengio and Courville addresses both the normalising constant (or partition function) and GANs, it was somehow not deep enough (!) to use for the course, offering only a few pages on NCE, VAEs and GANs. (And also missing on the statistical references addressing the issue, incl. [or excl.] Geyer, 1994.) Overall, the infinite variations offered on GANs leave me uncertain about their statistical relevance, as it is unclear how good the regularisation therein is for handling overfitting and consistent estimation. (And if I spot another decomposition of the Kullback-Leibler divergence, I may start crying…)
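To illustrate how NCE (and reverse logistic regression) turns the estimation of a normalising constant into a classification problem, here is a minimal sketch in which the only unknown is the constant c = −log Z. A logistic classifier with offset log q(x) + c − log g(x) separates draws from the target from draws from a noise density g, and c is fitted by maximum likelihood. The target, the noise distribution, the equal sample sizes, the bisection solver and all names are assumptions for illustration and do not reproduce any of the cited papers.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def log_q(x):
    """Unnormalised target log-density: N(0, 1) without its constant."""
    return -0.5 * x * x


def log_noise(x, sd=1.5):
    """Fully normalised noise log-density: N(0, sd^2)."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * (x / sd) ** 2


def nce_log_constant(data, noise):
    """Estimate log Z by logistic regression between data and noise draws."""
    u = [log_q(x) - log_noise(x) for x in data]
    v = [log_q(x) - log_noise(x) for x in noise]

    def score(c):
        # Derivative in c of the classification log-likelihood (decreasing in c).
        return sum(1.0 - sigmoid(t + c) for t in u) - sum(sigmoid(t + c) for t in v)

    lo, hi = -20.0, 20.0
    for _ in range(60):  # bisection on the monotone score function
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return -0.5 * (lo + hi)  # log Z = -c


rng = random.Random(3)
data = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # draws from the target
noise = [rng.gauss(0.0, 1.5) for _ in range(5000)]  # draws from the noise
log_z_estimate = nce_log_constant(data, noise)
log_z_true = 0.5 * math.log(2 * math.pi)
print(log_z_estimate, log_z_true)
```

With a single free constant the objective is concave in c, which is why a plain bisection on its derivative suffices here.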
