Archive for Bayesian statistics

workshop a Padova

Posted in pictures, R, Running, Statistics, Travel, University life on March 22, 2013 by xi'an

Needless to say, it is with great pleasure that I am back in beautiful Padova for the workshop Recent Advances in statistical inference: theory and case studies, organised by Laura Ventura and Walter Racugno, especially considering that this is one of the last places where I met with George Casella, in June 2010. We will have plenty of opportunities to remember him with so many of his friends here. (Tomorrow we will run around Prato della Valle in his memory.)

The workshop is of a “traditional Bayesian facture”, that is, one I enjoy very much: long talks with predetermined discussants and discussion from the floor. This makes for fewer talks (although we had eight today!) but also for more exciting sessions if the talks are broad and innovative. This was the case today (not including my talk of course) and I enjoyed the sessions a lot.

Jim Berger gave the first talk on “global” objective priors, starting from the desiderata to build a “general” reference prior when one does not want to separate parameters of interest from nuisance parameters and when one already has marginal reference priors on those parameters. This setting was actually addressed in Berger and Sun (AoS, 2008) and Jim presented some of the solutions therein: while I could not really see a strong incentive in using an arithmetic average of those, because it does not make much sense with improper priors, I definitely liked the notion of geometric averages, which evacuate the problem of the normalising constants. (There are open questions as well, about whether one improper prior could dwarf another one in the geometric average. Tail-wise for instance. Gauri Datta mentioned in his discussion that the geometric average is a specific Kullback-Leibler optimum.)
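A minimal Python sketch of why the geometric average sidesteps the normalising constants while the arithmetic average does not. This is a toy illustration and is not taken from Berger and Sun: the two improper priors on a scale parameter, 1/σ and 1/σ², the arbitrary constants c_a and c_b, and all function names are assumptions made for the example. Since an improper prior is only defined up to a constant, what matters is the ratio of its values at two points.

```python
import math

def prior_a(sigma):
    # Improper prior 1/sigma, defined only up to a multiplicative constant.
    return 1.0 / sigma

def prior_b(sigma):
    # Improper prior 1/sigma^2, also defined only up to a constant.
    return 1.0 / sigma ** 2

def arithmetic_average(sigma, c_a, c_b):
    # The arbitrary constants c_a and c_b act as relative weights here.
    return 0.5 * (c_a * prior_a(sigma) + c_b * prior_b(sigma))

def geometric_average(sigma, c_a, c_b):
    # The constants factor out as sqrt(c_a * c_b), a single overall constant.
    return math.sqrt(c_a * prior_a(sigma) * c_b * prior_b(sigma))

def shape_ratio(average, c_a, c_b, s1=0.5, s2=4.0):
    # Ratio of the averaged prior at two points: the only feature that
    # matters for an improper prior, since overall constants cancel.
    return average(s1, c_a, c_b) / average(s2, c_a, c_b)

for constants in [(1.0, 1.0), (1.0, 100.0)]:
    print(constants,
          shape_ratio(arithmetic_average, *constants),
          shape_ratio(geometric_average, *constants))
```

Changing the constants changes the shape of the arithmetic average but leaves the shape of the geometric average untouched, which is the sense in which the latter evacuates the problem.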

In his discussion of Tom Severini’s paper on integrated likelihood (which really stands at the margin of Bayesian inference), Brunero Liseo proposed a new use of ABC to approximate the likelihood function (while regular ABC relies on an approximation of the likelihood), a bit à la Chib. I cannot tell about the precision of this approximation but this is rather exciting!

Laura Ventura presented four of her current papers on the use of higher-order asymptotics in approximating (Bayesian) posteriors, following the JASA 2012 paper by Ventura, Cabras and Racugno. (The same issue featured a paper by Gill and Casella, coincidentally.) She showed the improvement brought by moving from first order (normal) to third order (non-normal). This is in a sense at the antipode of ABC, and I would like to see the requirements on the likelihood functions to be able to come up with a manageable Laplace approximation. She also mentioned a resolution of the Jeffreys-Lindley paradox via the Pereira et al. (2008) evidence, which computes a sort of Bayesian p-value by assessing the posterior probability of the posterior density being lower than its value at the null. I had missed or forgotten about this idea, but I wonder about some caveats like the impact of parameterisation, the connection with the testing problem, the calibration of the quantity, the extension to non-nested models, etc. (Note that Ventura et al. developed an R package called hoa, for higher-order asymptotics.)
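The evidence quantity described above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the posterior is taken to be a normal distribution with known mean and standard deviation, the null is a point value theta0, and the function names are hypothetical. It is not the procedure of Pereira et al. or of the talk, just the definition "posterior probability that the posterior density is no larger than its value at the null" evaluated by Monte Carlo, with the closed form available in this normal case as a check.

```python
import math
import random

def normal_pdf(x, mean, sd):
    # Density of a normal distribution, used here as the assumed posterior.
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def evidence_mc(theta0, mean, sd, n_draws=100_000, seed=0):
    # Monte Carlo estimate of P(posterior density <= density at theta0).
    rng = random.Random(seed)
    threshold = normal_pdf(theta0, mean, sd)
    hits = 0
    for _ in range(n_draws):
        draw = rng.gauss(mean, sd)
        if normal_pdf(draw, mean, sd) <= threshold:
            hits += 1
    return hits / n_draws

def evidence_exact(theta0, mean, sd):
    # For a normal posterior the event is |theta - mean| >= |theta0 - mean|,
    # whose probability is 2 * (1 - Phi(z)) = 1 - erf(z / sqrt(2)).
    z = abs(theta0 - mean) / sd
    return 1.0 - math.erf(z / math.sqrt(2.0))

print(evidence_mc(0.0, 1.5, 1.0), evidence_exact(0.0, 1.5, 1.0))
```

Because the density of a transformed parameter picks up a Jacobian, the same computation in another parameterisation need not return the same value, which is one of the caveats raised above.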

David Dunson presented some very recent work on compressed sensing that boiled down for me to the idea of massively projecting (huge vectors of) regressors into convex combinations of much smaller dimension, using random matrices for the projections. This point was somewhat unclear to me, and to the first discussant Michael Wiper as well, who stressed that a completely random selection of those matrices could produce “mostly rubbish”, unless a learning mechanism was instated. The second discussant, Peter Müller, made the same point about this completely random search in a space of huge dimension, while suggesting that considering the survival frequency of covariates could help towards the efficiency of the method.

Biometrika, volume 100

Posted in Books, Statistics, University life on March 5, 2013 by xi'an

I had been privileged to have a look at a preliminary version of the now-published retrospective written by Mike Titterington on the first 100 volumes of Biometrika (more exactly, “from volume 28 onwards”, as the title states). Mike was the dedicated editor of Biometrika for many years and edited a nice book for the 100th anniversary of the journal. He started from the 100 most highly cited papers within the journal to build a coherent chronological coverage. From a Bayesian perspective, this retrospective starts with Maurice Kendall trying to reconcile frequentists and non-frequentists in 1949, while having a hard time with fiducial statistics. Then Dennis Lindley makes it to the top 100 in 1957 with the Lindley-Jeffreys paradox. From 1958 till 1961, Darroch is quoted several times for his (fine) formalisation of the capture-recapture experiments we were to study much later (Biometrika, 1992) with Ed George… In the 1960s, Bayesian papers became more visible, including Don Fraser (1961) and Arthur Dempster’s Dempster-Shafer theory of evidence, as well as George Box and co-authors (1965, 1968) and Arnold Zellner (1964). Keith Hastings’ 1970 paper stands as the fifth most highly cited paper, even though it was ignored for almost two decades. The number of Bayesian papers kept increasing, including Binder’s (1978) cluster estimation, Efron and Morris’ (1972) James-Stein estimators, and Efron and Thisted’s (1978) terrific evaluation of Shakespeare’s vocabulary. From then on, the number of Bayesian papers gets too large to cover in its entirety. The 1980s saw papers by Julian Besag (1977, 1989, 1989 with Peter Clifford, which was yet another precursor of MCMC) and Luke Tierney’s work (1989) on Laplace approximation. Carter and Kohn’s (1994) MCMC algorithm on state space models made it to the top 40, while Peter Green’s (1995) reversible jump algorithm came close to Hastings’ (1970) record, being the 8th most highly cited paper.
Since the more recent papers do not make it to the top 100 list, Mike Titterington’s coverage gets more exhaustive as the years draw near, with an almost complete coverage for the final years. Overall, a fascinating journey through the years and the reasons why Biometrika is such a great journal and constantly so.

rise of the B word

Posted in Statistics on February 26, 2013 by xi'an

[Figure: comparison of the uses of the words Bayesian, maximum likelihood, and frequentist, using Google Ngram]

While preparing a book chapter, I checked on Google Ngram viewer the comparative uses of the words Bayesian (blue), maximum likelihood (red) and frequentist (yellow), producing the above (screen-copy quality, I am afraid!). It shows an increase of the use of the B word from the early 80s and not the sudden rise in the 90s I was expecting. The inclusion of “frequentist” is definitely in the joking mode, as this is not a qualification used by frequentists to describe their methods. In other words (!), “frequentist” does not occur very often in frequentist papers (and not as often as in Bayesian papers!)…

the BUGS Book [guest post]

Posted in Books, R, Statistics on February 25, 2013 by xi'an

(My colleague Jean-Louis Foulley, now at I3M, Montpellier, kindly agreed to write a review on the BUGS book for CHANCE. Here is the review, en avant-première! Watch out, it is fairly long and exhaustive! References will be available in the published version. The additions of book covers with BUGS in the title and of the corresponding Amazon links are mine!)

If ever a book has been eagerly awaited in the world of statistics, it is surely this one. Many people have been expecting it for more than 20 years, ever since the WinBUGS software came into use. Therefore, the tens of thousands of users of WinBUGS are indebted to the leading team of the BUGS project (D Lunn, C Jackson, N Best, A Thomas and D Spiegelhalter) for having eventually succeeded in finalizing the writing of this book and for making sure that the long-held expectations are not dashed.

As is well explained in the Preface, the BUGS project initiated at Cambridge was a very ambitious one, at the forefront of the MCMC movement that revolutionized the development of Bayesian statistics in the early 90’s after the pioneering publication of Gelfand and Smith on Gibbs sampling.

This book comes out after several textbooks have already been published in the area of computational Bayesian statistics using BUGS and/or R (Gelman and Hill, 2007; Marin and Robert, 2007; Ntzoufras, 2009; Congdon, 2003, 2005, 2006, 2010; Kéry, 2010; Kéry and Schaub, 2011 and others). It is neither a theoretical book on foundations of Bayesian statistics (e.g. Bernardo and Smith, 1994; Robert, 2001) nor an academic textbook on Bayesian inference (Gelman et al., 2004; Carlin and Louis, 2008). Instead, it reflects very well the aims and spirit of the BUGS project and is meant to be a manual “for anyone who would like to apply Bayesian methods to real-world problems”.

In spite of its appearance, the book is not elementary. On the contrary, it addresses most of the critical issues faced by statisticians who want to apply Bayesian statistics in a clever and autonomous manner. Although very dense, its typically fluid British style of exposition, based on real examples and simple arguments, helps the reader to digest without too much pain such ingredients as regression and hierarchical models, model checking and comparison, and all kinds of more sophisticated modelling approaches (spatial, mixture, time series, nonlinear with differential equations, nonparametric, etc.).

The book consists of twelve chapters and three appendices specifically devoted to BUGS (A: syntax; B: functions and C: distributions) which are very helpful for practitioners. The book is illustrated with numerous examples. The exercises are well presented and explained, and the corresponding code is made available on a web site.

about randomness (im Hamburg)

Posted in Statistics, Travel, University life on February 20, 2013 by xi'an

[Figure: exhibit in DESY campus, Hamburg, Germany, Feb. 19, 2013]

True randomness was the topic of the “Random numbers; fifty years later” talk in DESY by Frederick James from CERN. I had discussed a while ago a puzzling book related to this topic. This talk went along a rather different route, focussing on random generators. James put forward the claim that there are computer-based physical generators that are truly random. (He also asserted that statisticians do not understand randomness because they do not know quantum mechanics.) He distinguished those from pseudo-random generators: “nobody understood why they were (almost) random”, “IBM did not know how to generate random numbers”… But he then spent the whole talk discussing those pseudo-random generators. Among other pieces of trivia, James mentioned that George Marsaglia was the one who exhibited the hyperplane features of congruential generators, and that Knuth achieved no successful definition of what randomness is in his otherwise wonderful books! James thus introduced Kolmogorov’s mixing (not Kolmogorov’s complexity, mind you!) as advocated by Soviet physicists to underlie randomness, although this did not produce anything useful for RNGs in the 60s. He then moved to the famous paper by Ferrenberg, Landau and Wong (1992) that I remember reading more or less at the time, in connection with the phase transition critical slowing down phenomena in Ising model simulations, and connecting with the Wang-Landau algorithm of flipping many sites at once (which exhibited long-term dependences in the generators). Most interestingly, a central character in this story is Martin Lüscher, based in DESY, who expressed the standard generator of the time, RCARRY, as one studied by those Soviet mathematicians,

X' = AX

showing that it enjoyed Kolmogorov mixing, but with a very poor Lyapunov coefficient. I partly lost track there, as RCARRY was not perfect, and on how this Kolmogorov mixing would relate to long-term dependencies. One explanation by James was that this property is only asymptotic. (I would even say statistical!) Also interestingly, the 1994 paper by Lüscher provides the number of steps necessary to attain complete mixing, namely 15 steps, which thus works as a cutoff point. (I wonder why a 15-step RCARRY is slower, since A^15 can be computed at once… It may be due to the fact that A is sparse while A^15 is not.) James mentioned that Marsaglia’s Diehard battery of tests is now obsolete and superseded by Pierre L’Ecuyer’s TestU01.
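The sparsity conjecture in the parenthesis can at least be checked on a toy. The sketch below is an assumption-laden stand-in and not Lüscher's actual matrix: it takes the companion matrix of a plain two-lag recurrence s_n = s_{n-10} - s_{n-24} over the integers, ignoring the carry bit and the modular arithmetic of RCARRY, and simply counts non-zero entries in A and in its 15th power. The lags and all function names are chosen for the illustration.

```python
def companion_matrix(order=24, short_lag=10):
    # Companion matrix of the toy recurrence s_n = s_{n-short_lag} - s_{n-order},
    # acting on the state (s_{n-1}, ..., s_{n-order}).
    a = [[0] * order for _ in range(order)]
    a[0][short_lag - 1] = 1   # coefficient of s_{n-short_lag}
    a[0][order - 1] = -1      # coefficient of s_{n-order}
    for i in range(1, order):
        a[i][i - 1] = 1       # shift the rest of the state down by one slot
    return a

def matmul(x, y):
    # Plain integer matrix product for square matrices of equal size.
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(a, power):
    # Repeated multiplication, starting from the identity matrix.
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = matmul(result, a)
    return result

def nonzeros(a):
    # Number of non-zero entries, a crude measure of sparsity.
    return sum(1 for row in a for value in row if value != 0)

A = companion_matrix()
A15 = matpow(A, 15)
print(nonzeros(A), nonzeros(A15))
```

In this toy the power has more non-zero entries than A itself, so applying it costs more per step than one sparse update. Whether this accounts for the speed of the real generator is not something the sketch can settle.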

In conclusion, I did very much like this presentation from an insider, but still do not feel it makes a contribution to the debate on randomness, as it stayed put on pseudorandom generators. To keep the connection with von Neumann, they all produce wrong answers from a randomness point of view, if not from a statistical one. (A final quote from the talk: “Among statisticians and number theorists who are supposed to be specialists, they do not know about Kolmogorov mixing.”) [Discussing with Fred James at the reception after the talk was obviously extremely pleasant, as he happened to know a lot of my Bayesian acquaintances!]
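As a footnote to the hyperplane features of congruential generators mentioned earlier in this post, here is a small Python check. The generator used, RANDU (multiplier 65539, modulus 2^31, once distributed by IBM), is not named in the talk summary and is picked here only as a classic assumed example. Since 65539² ≡ 6·65539 − 9 (mod 2^31), every three successive outputs satisfy 9x_k − 6x_{k+1} + x_{k+2} ≡ 0 (mod 2^31), so all triples fall on a small number of parallel planes.

```python
MODULUS = 2 ** 31
MULTIPLIER = 65539  # equal to 2**16 + 3

def randu(seed, n):
    # Multiplicative congruential generator x_{k+1} = 65539 * x_k mod 2**31.
    # The seed is assumed to be odd.
    values = []
    x = seed
    for _ in range(n):
        x = (MULTIPLIER * x) % MODULUS
        values.append(x)
    return values

def planes_hit(values):
    # Collect the integers k such that 9*x_i - 6*x_{i+1} + x_{i+2} = k * 2**31.
    # Each distinct k is one plane in the cube of successive triples.
    ks = set()
    for a, b, c in zip(values, values[1:], values[2:]):
        combo = 9 * a - 6 * b + c
        assert combo % MODULUS == 0  # the linear relation holds exactly
        ks.add(combo // MODULUS)
    return ks

print(len(planes_hit(randu(seed=1, n=10_000))))
```

Because each output lies strictly between 0 and 2^31, the combination can only take 15 distinct multiples of 2^31, so the triples cannot fill the cube, however long the sequence is.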
