Archive for Prato della Valle
This morning, in a brisk sunrise over Sant’ Antonio basilica, we (a dozen participants to the workshop) had a memorial run around Prato della Valle, doing a few loops around this unique piazza in rememberance of this hardcore runner among hardcore runners, George Casella. The light was exactly the same as the last time I had see George running around the structure and we shared a good time reminiscing about him. Thanks to all who managed to made it there despite the early hour (and the nice dinner at Café Pedrocchi the night before).
Needless to say, it is with great pleasure I am back in beautiful Padova for the workshop Recent Advances in statistical inference: theory and case studies, organised by Laura Ventura and Walter Racugno. Esp. when considering this is one of the last places I met with George Casella, in June 2010. As we have plenty of opportunities to remember him with so many of his friends here. (Tomorrow we will run around Prato della Valle in his memory.)
The workshop is of a “traditional Bayesian facture”, I mean one I enjoy very much: long talks with predetermined discussants and discussion from the floor. This makes for less talks (although we had eight today!) but also for more exciting sessions if the talks are broad and innovative. This was the case today (not including my talk of course) and I enjoyed the sessions a lot.
Jim Berger gave the first talk on “global” objective priors, starting from the desiderata to build a “general” reference prior when one does not want to separate parameters of interest from nuisance parameters and when one already has marginal reference priors on those parameters. This setting was actually addressed in Berger and Sun (AoS, 2008) and Jim presented some of the solutions therein: while I could not really see a strong incentive in using an arithmetic average of those, because it does not make much sense with improper priors, I definitely liked the notion of geometric averages, which evacuate the problem of the normalising constants. (There are open questions as well, about whether one improper prior could dwarf another one in the geometric average. Tail-wise for instance. Gauri Datta mentioned in his discussion that the geometric average is a specific Kullback-Leibler optimum.)
In his discussion of Tom Severini’s paper on integrated likelihood (which really stands at the margin of Bayesian inference), Brunero Liseo proposed a new use of ABC to approximate the likelihood function (while regular ABC relies on an approximation of the likelihood), a bit à la Chib. I cannot tell about the precision of this approximation but this is rather exciting!
Laura Ventura presented four of her current papers on the use of high order asymptotics in approximating (Bayesian) posteriors, following the JASA 2012 paper by Ventura, Cabras and Racugno. (The same issue featured a paper by Gill and Casella, coincidentally.) She showed the improvement brought by moving from first order (normal) to third order (non-normal). This is in a sense at the antipode of ABC, e.g. I’d like to see the requirements on the likelihood functions to be able to come up with a manageable Laplace approximation. She also mentioned a resolution of the Jeffreys-Lindley paradox via the Pereira et al. (2008) evidence, which computes a sort of Bayesian p-value by assessing the posterior probability of the posterior density being lower than its value at the null. I had missed or forgotten about this idea, but I wonder at some caveats like the impact of parameterisation, the connection with the testing problem, the calibration of the quantity, the extension to non-nested models, &tc. (Note that Ventura et al. developed an R package called hoa, for higher-order asymptotics.)
David Dunson presented some very recent work on compressed sensing that summed up for me into the idea of massively projecting (huge vectors of) regressors into much smaller dimension convex combinations, using random matrices for the projections. This point was somehow unclear to me. And to the first discussant Michael Wiper as well, who stressed that a completely random selection of those matrices could produce “mostly rubbish”, unless a learning mechanism was instated. The second discussant, Peter Müller, made the same point about this completely random search in a huge dimension space, while considering the survival frequency of covariates could help towards the efficiency of the method.