Archive for San Antonio

marginal likelihoods from MCMC

Posted in Books, pictures, Statistics, University life on April 26, 2017 by xi'an

A new arXiv entry on ways to approximate marginal likelihoods based on MCMC output, by astronomers (apparently). With an application to the 2015 Planck satellite analysis of cosmic microwave background radiation data, which reminded me of our joint work with the cosmologists of the Paris Institut d’Astrophysique ten years ago. In their literature review, the authors miss several surveys on the approximation of those marginals, including our San Antonio chapter on Bayes factor approximations, but mention our ABC survey somewhat inappropriately, since it does not advocate the use of ABC for such a purpose. (They also mention variational Bayes approximations, INLA, and powered likelihoods, if not nested sampling.)

The proposal of this paper is to identify the marginal m [actually denoted a there] as the normalising constant of an unnormalised posterior density. And to do so the authors estimate the posterior by a non-parametric approach, namely a k-nearest-neighbour estimate. With the additional twist of producing a sort of Bayesian posterior on the constant m. [And the unusual notion of number density, used for the unnormalised posterior.] The Bayesian estimation of m relies on a Poisson sampling assumption on the k-nearest neighbour distribution. (Sort of, since k is actually fixed, not random.)
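To fix ideas, here is a minimal sketch of the generic identity underlying this kind of approach, not of the authors' actual estimator nor of their Poisson-based treatment of m: since m = q(θ)/π(θ|y) at any point θ, a k-nearest-neighbour estimate of the posterior density at a chosen point θ0, built from the MCMC draws, turns the unnormalised posterior value q(θ0) into an estimate of m. The function knn_evidence, the evaluation point theta0, and the choice k=20 below are placeholders of mine, not the paper's.

# minimal sketch (not the paper's implementation): estimate the marginal
# likelihood m as the normalising constant of an unnormalised posterior q,
# via a k-nearest-neighbour estimate of the posterior density at a point
# theta0, based on MCMC draws stored as the rows of the matrix draws
knn_evidence=function(draws,logq,theta0,k=20){
  N=nrow(draws); d=ncol(draws)
  dsts=sqrt(rowSums(sweep(draws,2,theta0)^2)) # distances from theta0 to the draws
  rk=sort(dsts)[k]                            # distance to the k-th nearest neighbour
  logVd=(d/2)*log(pi)-lgamma(d/2+1)           # log volume of the unit d-ball
  logpi=log(k)-log(N)-logVd-d*log(rk)         # kNN estimate of log pi(theta0|y)
  logq(theta0)-logpi                          # log m = log q(theta0) - log pi(theta0|y)
}

# toy check with a known answer: q(theta)=exp(-theta'theta/2) in dimension 2,
# so the posterior is N(0,I_2) and m=2*pi
set.seed(1)
draws=matrix(rnorm(2e4),ncol=2)
logq=function(theta) -sum(theta^2)/2
exp(knn_evidence(draws,logq,theta0=c(0,0)))   # close to 2*pi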

If this description sounds confusing and imprecise, it is because I am myself rather mystified by the whole approach and find it difficult to see the point of this alternative. The Bayesian numerics do not seem to serve any purpose other than producing a MAP estimate. And using a non-parametric density estimate opens a Pandora's box of difficulties, the most obvious one being the curse of dimension(ality). This reminded me of the paper by Delyon and Portier discussed earlier on this blog, where they achieve super-efficient convergence when using a kernel estimator, but at a considerable cost and with a similar sensitivity to the dimension.
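As a crude illustration of this dimension issue (a toy experiment of mine, not one taken from the paper), running the same kNN recipe on a standard normal “posterior”, for which the evidence is known to equal (2π)^{d/2}, shows the estimate drifting away from the truth as the dimension d grows, with the number of draws and k held fixed:

# toy illustration (mine, not the paper's) of the curse of dimensionality for
# the kNN evidence estimate: the ratio m_hat/m drifts away from one as d grows
set.seed(1)
N=1e4; k=20
for (d in c(2,5,10,20)){
  draws=matrix(rnorm(N*d),ncol=d)
  dsts=sqrt(rowSums(draws^2))              # distances to theta0=0
  rk=sort(dsts)[k]
  logVd=(d/2)*log(pi)-lgamma(d/2+1)
  logpi=log(k)-log(N)-logVd-d*log(rk)      # kNN estimate of log pi(0|y)
  logm=0-logpi                             # log q(0)=0 for q=exp(-theta'theta/2)
  cat("d =",d," m_hat/m =",round(exp(logm-(d/2)*log(2*pi)),2),"\n")
}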

R/Rmetrics in Paris [alas!]

Posted in Mountains, pictures, R, Statistics, Travel, University life on June 30, 2014 by xi'an

Today I gave a talk on Bayesian model choice in a fabulous 13th Century former monastery in the Latin Quarter of Paris… It is the Collège des Bernardins, close to Jussieu and to the Collège de France, and so unbelievably hidden that I was not aware of its existence despite having studied and worked in Jussieu since 1982… I mixed my earlier San Antonio survey on importance sampling approximations to Bayes factors with an introduction to our most recent work on ABC with random forests. This was the first talk of the 8th R/Rmetrics workshop taking place in Paris this year. (Rmetrics aims at aggregating R packages with econometrics and finance applications.) And I had a full hour and a half to deliver my lecture to the workshop audience. Nice place, nice people, new faces and topics (and even andouille de Vire for lunch!): why should I complain with an alas in the title?!

What happened is that the R/Rmetrics meetings had until this year been organised in Meielisalp, Switzerland, which stands above the Thuner See and… right next to the most famous peaks of the Bernese Alps! And I had been invited last year but could not make it… meaning I lost a genuine opportunity to climb one of my five dream routes, the Mittelegi ridge of the Eiger, as future R/Rmetrics meetings will not take place there.

A lunch discussion at the workshop led me to experiment with the compiler library in R, a library I was unaware of. The impact on the running time is easy to check: recycling the fowler function from the last Le Monde puzzle,

> library(compiler)
> bowler=cmpfun(fowler)
> N=20;n=10;system.time(fowler(pred=N))
   user  system elapsed 
 52.647   0.076  56.332 
> N=20;n=10;system.time(bowler(pred=N))
   user  system elapsed 
 51.631   0.004  51.768 
> N=20;n=15;system.time(bowler(pred=N))
   user  system elapsed 
 51.924   0.024  52.429 
> N=20;n=15;system.time(fowler(pred=N))
   user  system elapsed 
 52.919   0.200  61.960 

shows a ten- to twenty-fold gain in system time, if not in elapsed time (re-alas!).

a remarkably simple and accurate method for computing the Bayes factor &tc.

Posted in Statistics on February 13, 2013 by xi'an

This recent arXiv posting by Martin Weinberg and co-authors was pointed out to me by friends because of its title! It indeed sounded a bit inflated. And it also reminded me of old-style papers where the title was somehow the abstract, like An Essay towards Solving a Problem in the Doctrine of Chances… So I had a look at it on my way to Gainesville. The paper starts from the earlier paper by Weinberg (2012) in Bayesian Analysis where he uses an HPD region to determine the Bayes factor by a safe harmonic mean estimator (an idea we already advocated with Jean-Michel Marin in the San Antonio volume and with Darren Wraith in the MaxEnt volume). An extra idea is to try to optimise the region over which the integration is performed [against the variance of the resulting evidence]: “choose a domain that results in the most accurate integral with the smallest number of samples” (p.3). The authors proceed by volume peeling, using some quadrature formula for the posterior coverage of the region, either by Riemann or Lebesgue approximations (p.5). I was fairly lost at this stage and the third proposal, based on adaptively managing hyperrectangles (p.7), went completely over my head! The sentence “the results are clearly worse with O() errors, but are still remarkably better for high dimensionality” (p.11) did not make sense either… The method may thus be remarkably simple, but the paper is not written in a way that conveys this impression!
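For completeness, here is a minimal sketch of the HPD-constrained harmonic mean identity the paper starts from, namely that 1/m is the posterior expectation of φ(θ)/q(θ) for any density φ supported within the posterior support, a uniform φ over an (approximate) HPD region keeping the ratio bounded, unlike the raw harmonic mean. The ellipsoidal stand-in for the HPD region and all names below are mine, not Weinberg's:

# minimal sketch of an HPD-constrained harmonic mean estimator of the evidence
# m, based on posterior draws (rows of draws) and the unnormalised log-posterior
# logq; phi is taken uniform over an ellipsoid containing a fraction alpha of
# the draws, as a crude stand-in for an HPD region
hpd_harmonic=function(draws,logq,alpha=0.2){
  N=nrow(draws); d=ncol(draws)
  lq=apply(draws,1,logq)
  mu=colMeans(draws); S=cov(draws)
  maha=mahalanobis(draws,mu,S)
  r2=unname(quantile(maha,alpha))          # calibrates the ellipsoid radius
  inside=maha<=r2
  logvol=(d/2)*log(pi)-lgamma(d/2+1)+(d/2)*log(r2)+0.5*log(det(S))
  # 1/m_hat = (1/N) sum_i I(theta_i in A) / ( vol(A) q(theta_i) )
  -log(mean(inside*exp(-logvol-lq)))       # returns log m_hat
}

# toy check: q(theta)=exp(-theta'theta/2) in dimension 2, so m=2*pi
set.seed(1)
draws=matrix(rnorm(2e4),ncol=2)
logq=function(theta) -sum(theta^2)/2
exp(hpd_harmonic(draws,logq))              # close to 2*pi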

Publication from the frontier

Posted in Books, Statistics, Travel on September 29, 2010 by xi'an

In conjunction with the conference in San Antonio last March, I have received the book Frontiers of Statistical Decision Making and Bayesian Analysis: In Honor of James O. Berger, edited by Ming-Hui Chen (University of Connecticut), Dipak K. Dey (University of Connecticut), Peter Müller (University of Texas M. D. Anderson Cancer Center), Dongchu Sun (University of Missouri-Columbia) and Keying Ye (University of Texas at San Antonio), who, incidentally, were all PhD students of Jim Berger at the time I visited Purdue University. The book has been edited in depth and so reads very well, with contributions grouped by chapter. Here is the table of contents:

  1. Introduction.
  2. Objective Bayesian inference with applications.
  3. Bayesian decision based estimation and predictive inference.
  4. Bayesian model selection and hypothesis tests.
  5. Bayesian computer models.
  6. Bayesian nonparametrics and semi-parametrics.
  7. Bayesian case influence and frequentist interface.
  8. Bayesian clinical trials.
  9. Bayesian methods for genomics, molecular, and systems biology.
  10. Bayesian data mining and machine learning.
  11. Bayesian inference in political and social sciences, finance, and marketing.
  12. Bayesian categorical data analysis.
  13. Bayesian geophysical, spatial, and temporal statistics.
  14. Posterior simulation and Monte Carlo methods.

whose final chapter (the only one missing Bayesian from the title!) contains our contribution with Jean-Michel Marin.

Savage-Dickey to be revised

Posted in Statistics on April 6, 2010 by xi'an

Last night, I received the very nice news from the Electronic Journal of Statistics that our paper on the Savage-Dickey paradox was in for revision. One referee recommended acceptance as is and the second referee asked for more details and examples in order to broaden the audience. All reports are very nice and consistent with the response I got from my talk in San Antonio. Maybe giving the talk there helped with the positive decision!

[wet] impressions from the frontier

Posted in Running, Travel on March 21, 2010 by xi'an

This morning, before going running, I took a look out of my hotel window and noticed the road below was wet. Since there were warnings of a thunderstorm, I checked the current forecast on a website—taking advantage of an opening in the local networks!—and saw that the local conditions were a “light drizzle”. When I went out, it was indeed hardly drizzling, and so I went on my “usual” round following the riverwalk for about two miles. At the turning point—a passage under a bridge with huge fish sculptures hanging from it—rain started to fall rather heavily and on the way back I soon found myself in the midst of a thunderstorm! I had to stop under a bridge to wait for the rain to abate and the storm front to move away. After a few minutes, the strength of the rain went down, but by then the riverwalk had turned into a river and its stairs into cascades, the river being below street level. At some point, the flow of water falling from the street was so strong that I had to turn back to cross to the other side… This was thus an interesting experience, teaching me to pay more attention to storm warnings in the future.

[more] impressions from the frontier

Posted in Statistics, Travel, University life on March 20, 2010 by xi'an

Apart from my (theoretical) Bayes factor session, I attended a non-parametric session and two other Bayes factor sessions at Frontiers of Statistical Decision Making and Bayesian Analysis today! (I forgot to mention the talk by Kenneth Rice yesterday, which related to my early 1990s work with George Casella and Juinn Hwang on integrating estimation loss functions into testing losses.) I very much liked Ed George’s extension of the g-prior, as well as Elias Moreno’s prior modelling of clustering models (along with Guido Consonni’s and Adrian Raftery’s talks in the same session). As commented by many conference participants, one major difficulty was figuring out which of the three parallel sessions to choose… The plenary sessions were superb as well, with Steve Fienberg describing an analysis of a highly (highly) complex model on aging, and Persi Diaconis giving us the flavour of his latest work with David Aldous on quasi-exchangeability, with extracts from de Finetti.
