Archive for San Antonio

approximation of Bayes Factors via mixing

Posted in Books, Statistics, University life with tags , , , , , , , , , , , on December 21, 2020 by xi'an

A [new version of a] paper by Chenguang Dai and Jun S. Liu got my attention when it appeared on arXiv yesterday. Due to its title which reminded me of a solution to the normalising constant approximation that we proposed in the 2010 nested sampling evaluation paper we wrote with Nicolas. Recovering bridge sampling—mentioned by Dai and Liu as an alternative to their approach rather than an early version—by a type of Charlie Geyer (1990-1994) trick. (The attached slides are taken from my MCMC graduate course, with a section on the approximation of Bayesian normalising constants I first wrote for a short course at Jim Berger’s 70th anniversary conference, in San Antonio.)

A difference with the current paper is that the authors “form a mixture distribution with an adjustable mixing parameter tuned through the Wang-Landau algorithm.” While we chose it by hand to achieve sampling from both components. The weight is updated by a simple (binary) Wang-Landau version, where the partition is determined by which component is simulated, ie by the mixture indicator auxiliary variable. Towards using both components on an even basis (à la Wang-Landau) and stabilising the resulting evaluation of the normalising constant. More generally, the strategy applies to a sequence of surrogate densities, which are chosen by variational approximations in the paper.

marginal likelihoods from MCMC

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , on April 26, 2017 by xi'an

A new arXiv entry on ways to approximate marginal likelihoods based on MCMC output, by astronomers (apparently). With an application to the 2015 Planck satellite analysis of cosmic microwave background radiation data, which reminded me of our joint work with the cosmologists of the Paris Institut d’Astrophysique ten years ago. In the literature review, the authors miss several surveys on the approximation of those marginals, including our San Antonio chapter, on Bayes factors approximations, but mention our ABC survey somewhat inappropriately since it is not advocating the use of ABC for such a purpose. (They mention as well variational Bayes approximations, INLA, powered likelihoods, if not nested sampling.)

The proposal of this paper is to identify the marginal m [actually denoted a there] as the normalising constant of an unnormalised posterior density. And to do so the authors estimate the posterior by a non-parametric approach, namely a k-nearest-neighbour estimate. With the additional twist of producing a sort of Bayesian posterior on the constant m. [And the unusual notion of number density, used for the unnormalised posterior.] The Bayesian estimation of m relies on a Poisson sampling assumption on the k-nearest neighbour distribution. (Sort of, since k is actually fixed, not random.)

If the above sounds confusing and imprecise it is because I am myself rather mystified by the whole approach and find it difficult to see the point in this alternative. The Bayesian numerics does not seem to have other purposes than producing a MAP estimate. And using a non-parametric density estimate opens a Pandora box of difficulties, the most obvious one being the curse of dimension(ality). This reminded me of the commented paper of Delyon and Portier where they achieve super-efficient convergence when using a kernel estimator, but with a considerable cost and a similar sensitivity to dimension.

R/Rmetrics in Paris [alas!]

Posted in Mountains, pictures, R, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , on June 30, 2014 by xi'an

Bernard1Today I gave a talk on Bayesian model choice in a fabulous 13th Century former monastery in the Latin Quarter of Paris… It is the Collège des Bernardins, close to Jussieu and Collège de France, unbelievably hidden to the point I was not aware of its existence despite having studied and worked in Jussieu since 1982… I mixed my earlier San Antonio survey on importance sampling approximations to Bayes factors with an entry to our most recent work on ABC with random forests. This was the first talk of the 8th R/Rmetrics workshop taking place in Paris this year. (Rmetrics is aiming at aggregating R packages with econometrics and finance applications.) And I had a full hour and a half to deliver my lecture to the workshop audience. Nice place, nice people, new faces and topics (and even andouille de Vire for lunch!): why should I complain with an alas in the title?!Bernard2What happened is that the R/Rmetrics meetings have been till this year organised in Meielisalp, Switzerland. Which stands on top of Thuner See and… just next to the most famous peaks of the Bernese Alps! And that I had been invited last year but could not make it… Meaning I lost a genuine opportunity to climb one of my five dream routes, the Mittelegi ridge of the Eiger. As the future R/Rmetrics meetings will not take place there.

A lunch discussion at the workshop led me to experiment the compiler library in R, library that I was unaware of. The impact on the running time is obvious: recycling the fowler function from the last Le Monde puzzle,

> bowler=cmpfun(fowler)
> N=20;n=10;system.time(fowler(pred=N))
   user  system elapsed 
 52.647   0.076  56.332 
> N=20;n=10;system.time(bowler(pred=N))
   user  system elapsed 
 51.631   0.004  51.768 
> N=20;n=15;system.time(bowler(pred=N))
   user  system elapsed 
 51.924   0.024  52.429 
> N=20;n=15;system.time(fowler(pred=N))
   user  system elapsed 
 52.919   0.200  61.960 

shows a ten- to twenty-fold gain in system time, if not in elapsed time (re-alas!).

a remarkably simple and accurate method for computing the Bayes factor &tc.

Posted in Statistics with tags , , , , , , , , on February 13, 2013 by xi'an

This recent arXiv posting by Martin Weinberg and co-authors was pointed out to me by friends because of its title! It indeed sounded a bit inflated. And also reminded me of old style papers where the title was somehow the abstract. Like An Essay towards Solving a Problem in the Doctrine of Chances… So I had a look at it on my way to Gainesville. The paper starts from the earlier paper by Weinberg (2012) in Bayesian Analysis where he uses an HPD region to determine the Bayes factor by a safe harmonic mean estimator (an idea we already advocated earlier with Jean-Michel Marin in the San Antonio volume and with Darren Wraith in the MaxEnt volume). An extra idea is to try to optimise [against the variance of the resulting evidence] the region over which the integration is performed: “choose a domain that results in the most accurate integral with the smallest number of samples” (p.3). The authors proceed by volume peeling, using some quadrature formula for the posterior coverage of the region, either by Riemann or Lebesgue approximations (p.5). I was fairly lost at this stage and the third proposal based on adaptively managing hyperrectangles (p.7) went completely over my head! The sentence “the results are clearly worse with O() errors, but are still remarkably better for high dimensionality”(p.11) did not make sense either… The method may thus be remarkably simple, but the paper is not written in a way that conveys this impression!

Publication from the frontier

Posted in Books, Statistics, Travel with tags , , , on September 29, 2010 by xi'an

In conjunction with the conference in San Antonio last March, I have received the book Frontiers of Statistical Decision Making and Bayesian Analysis: In Honor of James O. Berger edited by Ming-Hui Chen (University of Connecticut), Dipak K. Dey (University of Connecticut), Peter Müller (University of Texas M. D. Anderson Cancer Center), Dongchu Sun (University of Missouri- Columbia) and Keying Ye (University of Texas at San Antonio), who, incidentally, were are PhD students of Jim Berger at the time I visited Purdue University. The book has been edited in depth and so it reads very well, with contributions regrouped by chapters. Here is the table of contents:

  1. Introduction.
  2. Objective Bayesian inference with applications.
  3. Bayesian decision based estimation and predictive inference.
  4. Bayesian model selection and hypothesis tests.
  5. Bayesian computer models.
  6. Bayesian nonparametrics and semi-parametrics.
  7. Bayesian case influence and frequentist interface.
  8. Bayesian clinical trials.
  9. Bayesian methods for genomics, molecular, and systems biology.
  10. Bayesian data mining and machine learning.
  11. Bayesian inference in political and social sciences, finance, and marketing.
  12. Bayesian categorical data analysis.
  13. Bayesian geophysical, spatial, and temporal statistics.
  14. Posterior simulation and Monte Carlo methods.

whose final chapter (the only one missing Bayesian from the title!) contains our contribution with Jean-Michel Marin.