## workshop a Padova

Posted in pictures, R, Running, Statistics, Travel, University life on March 22, 2013 by xi'an

Needless to say, it is with great pleasure that I am back in beautiful Padova for the workshop Recent Advances in Statistical Inference: Theory and Case Studies, organised by Laura Ventura and Walter Racugno. Especially when considering that this is one of the last places where I met George Casella, in June 2010. We thus have plenty of opportunities to remember him with so many of his friends here. (Tomorrow we will run around Prato della Valle in his memory.)

The workshop is of a “traditional Bayesian facture”, I mean one I enjoy very much: long talks with predetermined discussants and discussion from the floor. This makes for fewer talks (although we had eight today!) but also for more exciting sessions when the talks are broad and innovative. This was the case today (not including my talk, of course) and I enjoyed the sessions a lot.

Jim Berger gave the first talk on “global” objective priors, starting from the desiderata to build a “general” reference prior when one does not want to separate parameters of interest from nuisance parameters and when one already has marginal reference priors on those parameters. This setting was actually addressed in Berger and Sun (AoS, 2008) and Jim presented some of the solutions therein: while I could not really see a strong incentive in using an arithmetic average of those, because it does not make much sense with improper priors, I definitely liked the notion of geometric averages, which evacuate the problem of the normalising constants. (There are open questions as well, about whether one improper prior could dwarf another one in the geometric average. Tail-wise for instance. Gauri Datta mentioned in his discussion that the geometric average is a specific Kullback-Leibler optimum.)
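To see why the geometric average evacuates the normalising constants (my own toy rendering, not Jim's formulation): if the marginal reference priors are improper, each is only defined up to an arbitrary constant, i.e. as $c_1\pi_1$ and $c_2\pi_2$, and the geometric average absorbs those constants into a single global factor,

```latex
\left\{c_1\pi_1(\theta)\,c_2\pi_2(\theta)\right\}^{1/2}
  = (c_1 c_2)^{1/2}\,\left\{\pi_1(\theta)\,\pi_2(\theta)\right\}^{1/2}
  \propto \left\{\pi_1(\theta)\,\pi_2(\theta)\right\}^{1/2}
```

while the arithmetic average $\{c_1\pi_1(\theta)+c_2\pi_2(\theta)\}/2$ genuinely depends on the arbitrary ratio $c_1/c_2$.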

In his discussion of Tom Severini’s paper on integrated likelihood (which really stands at the margin of Bayesian inference), Brunero Liseo proposed a new use of ABC to approximate the likelihood function (while regular ABC relies on an approximation of the likelihood), a bit à la Chib. I cannot tell about the precision of this approximation but this is rather exciting!
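For readers unfamiliar with the generic idea of turning ABC output into a likelihood estimate, here is a minimal sketch (my own toy version, definitely not Liseo's construction): the likelihood at θ is estimated by the proportion of pseudo-datasets simulated under θ that fall within a window ε of the observation, rescaled by the window width.

```python
import numpy as np

rng = np.random.default_rng(0)

def abc_likelihood(theta, y_obs, simulate, n_sims=10_000, eps=0.1):
    """Crude ABC estimate of the likelihood L(theta) = p(y_obs | theta):
    the proportion of simulated datasets falling within eps of the
    observation, rescaled by the window width 2*eps (a kernel estimate)."""
    sims = np.array([simulate(theta) for _ in range(n_sims)])
    return np.mean(np.abs(sims - y_obs) < eps) / (2 * eps)

# Toy example: a single N(theta, 1) observation, summarised by itself,
# so the exact likelihood is available for comparison.
simulate = lambda theta: rng.normal(theta, 1.0)
y_obs = 0.5
approx = abc_likelihood(0.0, y_obs, simulate)
exact = np.exp(-0.5 * y_obs**2) / np.sqrt(2 * np.pi)
```

In this toy normal setting the ABC estimate at θ=0 lands close to the exact value φ(0.5); the precision obviously degrades as ε and the Monte Carlo effort vary, which is precisely the question I raised about the approach.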

Laura Ventura presented four of her current papers on the use of high order asymptotics in approximating (Bayesian) posteriors, following the JASA 2012 paper by Ventura, Cabras and Racugno. (The same issue featured a paper by Gill and Casella, coincidentally.) She showed the improvement brought by moving from first order (normal) to third order (non-normal). This is in a sense at the antipodes of ABC; e.g., I’d like to see the requirements on the likelihood function needed to come up with a manageable Laplace approximation. She also mentioned a resolution of the Jeffreys-Lindley paradox via the Pereira et al. (2008) evidence, which computes a sort of Bayesian p-value by assessing the posterior probability of the posterior density being lower than its value at the null. I had missed or forgotten about this idea, but I wonder at some caveats like the impact of parameterisation, the connection with the testing problem, the calibration of the quantity, the extension to non-nested models, &tc. (Note that Ventura et al. developed an R package called hoa, for higher-order asymptotics.)
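For the record, here is a toy Monte Carlo rendering of this evidence (my own sketch, using a normal posterior for which the answer is available in closed form): sample from the posterior and record how often the posterior density falls below its value at the null point.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def evidence(post_sample, post_pdf, theta0):
    """Monte Carlo version of the Pereira et al. (2008) 'evidence':
    the posterior probability that the posterior density falls below
    its value at the null point theta0."""
    return np.mean(post_pdf(post_sample) < post_pdf(theta0))

# Toy example: posterior is N(m, s^2), null H0: theta = 0.
m, s = 1.0, 0.5
post_pdf = lambda t: np.exp(-0.5 * ((t - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
sample = rng.normal(m, s, size=200_000)
ev = evidence(sample, post_pdf, 0.0)

# For a normal posterior this equals 2 * (1 - Phi(|theta0 - m| / s)).
closed_form = 2 * (1 - 0.5 * (1 + math.erf(abs(0.0 - m) / s / math.sqrt(2))))
```

The closed-form check makes the parameterisation caveat transparent: a monotone reparameterisation changes the posterior density, hence the tangent set, hence the evidence.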

David Dunson presented some very recent work on compressed sensing that summed up for me into the idea of massively projecting (huge vectors of) regressors into much smaller dimension convex combinations, using random matrices for the projections. This point was somehow unclear to me. And to the first discussant Michael Wiper as well, who stressed that a completely random selection of those matrices could produce “mostly rubbish”, unless a learning mechanism was put in place. The second discussant, Peter Müller, made the same point about this completely random search in a huge-dimensional space, while suggesting that considering the survival frequency of covariates could help towards the efficiency of the method.
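As a toy illustration of the projection step itself (my own sketch, not Dunson's algorithm): a random Gaussian matrix maps p regressors down to m ≪ p linear combinations while roughly preserving the geometry of the design, in the Johnson-Lindenstrauss spirit.

```python
import numpy as np

rng = np.random.default_rng(2)

n, p, m = 100, 5000, 200                      # n observations, p regressors, m << p
X = rng.normal(size=(n, p))                   # huge design matrix
Phi = rng.normal(size=(m, p)) / np.sqrt(m)    # random projection matrix
X_low = X @ Phi.T                             # compressed design, n x m

# Johnson-Lindenstrauss flavour: squared row norms are roughly preserved,
# so distances between observations survive the massive dimension drop.
ratio = np.linalg.norm(X_low, axis=1) ** 2 / np.linalg.norm(X, axis=1) ** 2
```

The discussants' objection is visible here too: the projection preserves geometry on average, but a single completely random Φ carries no information about which covariates matter, hence the call for a learning mechanism.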

## why noninformative priors?

Posted in Books, Statistics, University life on May 9, 2012 by xi'an

Answering a question around this theme on StackExchange, I wrote the following reply:

The debate about non-informative priors has been going on for ages, at least since the end of the 19th century with criticisms by Bertrand and de Morgan about the lack of invariance of Laplace’s uniform priors (the same criticism reported by Stéphane Laurent in the above comments). This lack of invariance sounded like a death stroke for the Bayesian approach and, while some Bayesians were desperately trying to cling to specific distributions, using less-than-formal arguments, others had a wider vision of a larger picture where priors could be used in situations where there was hardly any prior information, beyond the shape of the likelihood itself. (This was even before Abraham Wald established his admissibility and complete class results about Bayes procedures. And at about the same time as E.J.G. Pitman gave an “objective” derivation of the best invariant estimator as a Bayes estimator against the corresponding Haar measure…)

This vision is best represented by Jeffreys’ distributions, where the information matrix of the sampling model, $I(\theta)$, is turned into a prior distribution

$\pi(\theta) \propto |I(\theta)|^{1/2}$

which is most often improper, i.e. does not integrate to a finite value. The label “non-informative” associated with Jeffreys’ priors is rather unfortunate, as they represent an input from the statistician, hence are informative about something! Similarly, “objective” has an authoritative weight I dislike… I thus prefer the label “reference prior”, used for instance by José Bernardo.
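To make the construction concrete with the standard Bernoulli example (a textbook case, not specific to the discussion above): for a single Bernoulli(θ) observation, $I(\theta)=1/\theta(1-\theta)$, so Jeffreys’ prior is proportional to $\theta^{-1/2}(1-\theta)^{-1/2}$, i.e. a (here proper!) Beta(1/2,1/2) distribution. A quick Monte Carlo check of the information:

```python
import numpy as np

rng = np.random.default_rng(3)

def fisher_info_bernoulli(theta, n_draws=1_000_000):
    """Monte Carlo estimate of I(theta) = E[(d/dtheta log p(x|theta))^2]
    for a single Bernoulli(theta) observation: the score is 1/theta
    when x = 1 and -1/(1 - theta) when x = 0."""
    x = rng.random(n_draws) < theta
    score = np.where(x, 1 / theta, -1 / (1 - theta))
    return np.mean(score**2)

theta = 0.3
approx = fisher_info_bernoulli(theta)
exact = 1 / (theta * (1 - theta))
# Jeffreys' prior: pi(theta) ∝ I(theta)^{1/2} ∝ theta^{-1/2} (1-theta)^{-1/2},
# a Beta(1/2, 1/2); for a normal mean, by contrast, I is constant and the
# prior is flat, hence improper.
```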

Those priors indeed give a reference against which one can compute either the reference estimator/test/prediction or one’s own estimator/test/prediction using a different prior motivated by subjective and objective items of information. To answer directly the question, “why not use only informative priors?”, there is actually no answer. A prior distribution is a choice made by the statistician, neither a state of Nature nor a hidden variable. In other words, there is no “best prior” that one “should use”. It is in the nature of statistical inference that there is no “best answer”.

Hence my defence of the noninformative/reference choice! It provides the same range of inferential tools as other priors, but gives answers that are only inspired by the shape of the likelihood function, rather than induced by some opinion about the range of the unknown parameters.

## another Le Monde column

Posted in Books, Statistics, University life on February 16, 2012 by xi'an

Another column in Le Monde (Sciences) had most unjustly escaped my attention: it mentioned Thomas Bayes on the very front page and I missed it till my most recent breakfast! This article was written by a neuroscientist columnist reporting on current research led by Tali Sharot, UCL, on prediction mechanisms (if not on her book). The argument is not only that the brain operates in a Bayesian fashion, updating predictions based on current observations (as exposed at Bayes 250), but also that the updating is not “objective”! While this may sound as if the neuroscientists have entered the debate between objective and subjective Bayesians, the study actually reports a bias toward optimism, when comparing predictions with “objective statistics”. The article concludes on the psychological advantages of this optimism bias. Not so much about Bayesian statistics, then, even though having almost everyone (subconsciously) working with his/her optimistic prior sounds rather cool!

## join ISBA

Posted in University life on October 22, 2011 by xi'an

News from ISBA: good time to join for new members! (There is a section on Bayesian non-parametrics and another one on Objective Bayesian methodology. Feel free to propose new sections, like… Bayesian computing.)

ISBA elections are underway and, as part of the Bayesian community, we hope that you will participate! We are updating the electoral lists nightly, so if you added a membership after the 15th of October you will have the opportunity to vote.

We are running a new member promotion: all new members who join ISBA now will have their membership extended by an extra year (except for Lifetime memberships, which never expire)! For example, a 1-year Student membership will expire December 31, 2012, rather than December 31, 2011. Member dues are modest: $15 for student or reduced-rate memberships and $35 for regular memberships. This promotion also applies to all new section memberships in the Objective Bayes or Bayesian Nonparametrics sections! Section dues are $5 annually and are available with 1-3 year options to synchronize with ISBA dues; Section Lifetime memberships are available for $75. You must be a section member to vote in the section elections.

## principles of uncertainty

Posted in Books, R, Statistics, University life on October 14, 2011 by xi'an

“Bayes Theorem is a simple consequence of the axioms of probability, and is therefore accepted by all as valid. However, some who challenge the use of personal probability reject certain applications of Bayes Theorem.“  J. Kadane, p.44

Principles of uncertainty by Joseph (“Jay”) Kadane (Carnegie Mellon University, Pittsburgh) is a profound and mesmerising book on the foundations and principles of subjectivist or behaviouristic Bayesian analysis. Jay Kadane wrote Principles of uncertainty over a period of several years and, more or less in his own words, it represents the legacy he wants to leave for the future. The book starts with a large section on Jay’s definition of a probability model, with rigorous mathematical derivations all the way to Lebesgue measure (or more exactly the McShane-Stieltjes measure). This section contains many side derivations that pertain to mathematical analysis, in order to explain the subtleties of infinite countable and uncountable sets, and the distinction between finitely additive and countably additive (probability) measures. Unsurprisingly, the role of utility is emphasized in this book that keeps stressing the personalistic entry to Bayesian statistics. Principles of uncertainty also contains a formal development on the validity of Markov chain Monte Carlo methods that is superb and missing in most equivalent textbooks. Overall, the book is a pleasure to read. And highly recommended for teaching as it can be used at many different levels.
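Since the book covers the validity of MCMC, here is the kind of minimal random-walk Metropolis sampler such validity arguments underwrite (a generic toy sketch of mine, not taken from the book): note that the acceptance step only requires the target density up to a normalising constant.

```python
import numpy as np

rng = np.random.default_rng(4)

def rw_metropolis(log_target, n_iter=50_000, x0=0.0, scale=2.4):
    """Minimal random-walk Metropolis sampler: propose a Gaussian step,
    accept with probability min(1, target(prop)/target(current)); the
    normalising constant of the target cancels in the ratio."""
    xs = np.empty(n_iter)
    x, lp = x0, log_target(x0)
    for i in range(n_iter):
        prop = x + scale * rng.normal()
        lp_prop = log_target(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept/reject step
            x, lp = prop, lp_prop
        xs[i] = x
    return xs

# Toy target: a standard normal, via its log density up to a constant.
chain = rw_metropolis(lambda t: -0.5 * t * t)
```

The chain's empirical mean and standard deviation settle near 0 and 1, which is exactly the ergodic behaviour the book's formal development justifies.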