Archive for O’Bayes 2019

O’Bayes 19/4

Posted in Books, pictures, Running, Statistics, Travel, University life on July 4, 2019 by xi'an

Last talks of the conference! With Rui Paulo (along with Gonzalo Garcia-Donato) considering the special case of factors when doing variable selection. Which is an interesting question that I had never considered, as at best I would remove all levels or keep them all. Except that there may be misspecification in the factors, as for instance when several levels have the same impact. With Michael Evans discussing a paper that he wrote for the conference! Following his own approach to statistical evidence. And including his reluctance to cover infinity (calling on Gauß for backup!) or continuity, and his call to falsify a Bayesian model by checking whether it can be contradicted by the data. His assumption that checking the prior is separable from checking the [sampling] model is debatable. (With another mention made of the Savage-Dickey ratio.)

And with Dimitris Fouskakis giving a wide-ranging assessment [which Mark Steel (Warwick) called a PEP talk!] of power-expected-posterior priors, used with reference (and usually improper) priors. Which in retrospect would have better suited the beginning of the conference, as it provided background to several of the talks. Raising a question (from my perspective) on using the maximum likelihood estimator as a pseudo-sufficient statistic when this MLE is computed for the base (simplest) model. Maybe an ABC-induced bias in this question, as it would not work for ABC model choice.

Overall, I think the scientific outcomes of the conference were quite positive: a wide range of topics and perspectives, a reasonable and diverse attendance, especially when considering the heavy load of related conferences in the surrounding weeks (the “June fatigue”!), animated poster sessions. I am obviously not the one to assess the organisation of the conference! Things I forgot to do in this regard: organise transportation from Oxford to Warwick University, provide an attached room for in-pair research, insist on sustainability despite the imposed catering solution, facilitate sharing joint transportation to and from the Warwick campus, mention that tap water was potable, and… wear long pants when running in nettles.

O’Bayes 19/3.5

Posted in Books, pictures, Travel, University life on July 3, 2019 by xi'an


Among the posters at the second poster session yesterday night, one by Judith ter Schure visually stood out by following the #betterposter design suggested by Mike Morrison a few months ago. Design on which I have ambivalent feelings. On the one hand, reducing the material on a poster is generally a good idea, as posters tend to be saturated and hard to read, especially in crowded conditions. Having the main idea or theorem immediately visible should indeed be a requirement, both to get the point across at once and to start from the result when explaining the advances in the corresponding work. But if this format becomes the standard, it will become harder to stand out! More fundamentally, this proposal may fall into the same abyss as powerpoint presentations, which is that insisting on making the contents simpler and sparser may reach the no-return point of no content [which was not the case of the above poster, let me hasten to state!]. Mathematical statistics posters may be automatically classified as too complicated for this #betterposter challenge as containing maths formulas! Or too many Greek letters, as someone complained after one of my talks. And treating maths formulas as detail makes them even smaller than usual, which sounds like the opposite of the intended effect. (The issue is discussed on the betterposter blog, for a variety of opinions, mostly at odds with mine.)

O’Bayes 19/3

Posted in Books, pictures, Statistics, Travel, University life on July 2, 2019 by xi'an

Nancy Reid gave the first talk of the [Canada] day, in an impressive comparison of all approaches in statistics that involve a distribution of sorts on the parameter, connected with the presentation she gave at BFF4 in Harvard two years ago, including safe Bayes options this time. This was related to several (most?) of the talks at the conference, given the level of worry (!) about the choice of a prior distribution. But the main assessment of the methods still seemed to be centred on a frequentist notion of calibration, meaning that epistemic interpretations of probabilities and hence most of Bayesian answers were disqualified from the start.

In connection with Nancy’s focus, Peter Hoff’s talk also concentrated on frequency-valid confidence intervals in (linear) hierarchical models. Using prior information or structure to build better, shrinkage-like confidence intervals at a given confidence level. But not in the decision-theoretic way adopted by George Casella, Bill Strawderman and others in the 1980s. And also making me wonder at the relevance of contemplating a fixed coverage as a natural goal. Above, a side result shown by Peter that I did not know and which may prove useful for Monte Carlo simulation.

Jaeyong Lee worked on a complex model for banded matrices that starts with a regular Wishart prior on the unrestricted space of matrices, computes the posterior, and then projects this distribution onto the constrained subspace. (There is a rather substantial literature on this subject, including works by David Dunson in the past decade of which I was unaware.) This is a smart demarginalisation idea, but I wonder a wee bit at the notion, as the constrained space has measure zero for the larger model. This could explain the resulting posterior not being a true posterior for the constrained model, in the sense that there is no prior over the constrained space that could return such a posterior. Another form of marginalisation paradox. The crux of the paper is however about constructing a functional form of minimaxity. In his discussion of the paper, Guido Consonni provided a representation of the post-processed posterior (P³) that involves the Savage-Dickey ratio, sort of, making me more convinced of the connection.
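The sample-then-project mechanics are simple enough to sketch. The following toy illustration, which is my own minimal rendering of the post-processed posterior idea and not Lee's actual construction, uses a conjugate inverse-Wishart posterior for a covariance matrix and a hypothetical `band_project` helper that zeroes every entry outside the band:

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)

def band_project(sigma, k):
    # project onto the banded subspace: zero out entries more than
    # k positions away from the diagonal
    p = sigma.shape[0]
    idx = np.arange(p)
    mask = np.abs(idx[:, None] - idx[None, :]) <= k
    return np.where(mask, sigma, 0.0)

p, n, k = 5, 200, 1
X = rng.standard_normal((n, p))           # toy Gaussian data
S = X.T @ X                               # scatter matrix

# unconstrained conjugate posterior for the covariance matrix
post = invwishart(df=p + 2 + n, scale=np.eye(p) + S)

# post-processed posterior: draw from the unconstrained posterior,
# then push each draw through the projection
draws = [band_project(post.rvs(random_state=rng), k) for _ in range(100)]
post_mean = np.mean(draws, axis=0)
```

Note that, as discussed above, the resulting draws need not correspond to any genuine posterior on the banded subspace; the projection is applied after the fact.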

As a lighter aside, one item of local information I should definitely have broadcasted more loudly and long enough in advance to the conference participants is that the University of Warwick is not located in ye olde town of Warwick, where there is no university, but on the outskirts of the city of Coventry, but not to be confused with the University of Coventry. Located in Coventry.


O’Bayes 19/2

Posted in Books, pictures, Running, Travel, University life on July 1, 2019 by xi'an

One talk on Day 2 of O’Bayes 2019 was by Ryan Martin on data-dependent priors (or “priors”). Which I have already discussed in this blog. Including the notion of a Gibbs posterior about quantities that “are not always defined through a model” [which is debatable if one sees it as part of a semi-parametric model]. Gibbs posterior that is built through a pseudo-likelihood constructed from the empirical risk, which reminds me of Bissiri, Holmes and Walker. Although requiring a prior on this quantity that is not part of a model. And is not necessarily a true posterior, and not necessarily with the same concentration rate as a true posterior. Constructing a data-dependent distribution on the parameter does not necessarily mean an interesting inference and, to keep up with the theme of the conference, has no automatic claim to [more] “objectivity”.
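For readers unfamiliar with the construction, here is an entirely toy sketch of a Gibbs posterior for the median, where the pseudo-likelihood exp{−λ n Rₙ(θ)} is built from the empirical absolute-loss risk and explored by random-walk Metropolis. The learning rate λ, the flat prior, and the tuning constants are arbitrary choices of mine, not anything from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100) + 2.0        # toy data, true median near 2

def risk(theta):
    # empirical risk for the median: mean absolute loss
    return np.mean(np.abs(x - theta))

def log_gibbs(theta, lam=1.0):
    # log Gibbs "posterior": -lam * n * R_n(theta) plus a flat prior,
    # i.e. a pseudo-likelihood with no sampling model behind it
    return -lam * len(x) * risk(theta)

# random-walk Metropolis on the Gibbs posterior
theta, chain = 0.0, []
for _ in range(5000):
    prop = theta + 0.3 * rng.standard_normal()
    if np.log(rng.uniform()) < log_gibbs(prop) - log_gibbs(theta):
        theta = prop
    chain.append(theta)

estimate = np.mean(chain[1000:])          # concentrates near the sample median
```

The output is a perfectly usable data-dependent distribution on θ, but, as stressed above, nothing guarantees it behaves like a true posterior, and its concentration depends on the arbitrary λ.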

And after calling a prior both Beauty and The Beast!, Erlis Ruli argued for a “bias-reduction” prior, where the prior is the solution to a differential equation related to some cumulants, connected with earlier work of David Firth (Warwick). An interesting conundrum is how to create an MCMC algorithm when the prior is that intractable, with possible help from PDMP techniques like the Zig-Zag sampler.

Although Peter Orbanz’ talk, centred on a central limit theorem under group invariance, was further penalised by being the last of the (sun) day, Peter did a magnificent job of presenting the result and motivating each term. It reminded me of the work Jim Bondar was doing in Ottawa in the 1980s on Haar measures for Bayesian inference. Including the notion of amenability [a term due to von Neumann] that I had not met since then. (Neither have I met Jim since the last summer I spent in Carleton.) The CLT and associated LLN are remarkable in that the average is not over observations but over shifts of the same observation under elements of a sub-group of transformations. I wondered as well at the potential connection with the Read Paper of Kong et al. in 2003 on the use of group averaging for Monte Carlo integration [connection apart from the fact that both discussants, Michael Evans and myself, are present at this conference].
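The flavour of averaging over group transformations of a draw, rather than over fresh draws, can be conveyed in a couple of lines. This is only my toy rendering of the orbit-averaging idea, not the estimator of Kong et al., using the two-element sign-flip group under which the standard normal is invariant:

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: (x > 1.0).astype(float)   # indicator of {x > 1}, E[f] = P(Z > 1)
x = rng.standard_normal(10_000)

plain = f(x).mean()                     # crude Monte Carlo estimate

# orbit-averaged estimator: average f over the orbit {x, -x} of the
# sign-flip group G = {+1, -1}; since N(0,1) is G-invariant the
# estimator remains unbiased, and Rao-Blackwell-type arguments show
# its variance cannot increase
averaged = (0.5 * (f(x) + f(-x))).mean()
```

With a richer (e.g. continuous) group, the inner average over orbits becomes an integral, which is where the invariance structure does the real work.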

O’Bayes 19/1 [snapshots]

Posted in Books, pictures, Statistics, University life on June 30, 2019 by xi'an

Although the tutorials of O’Bayes 2019 yesterday were poorly attended, despite being great entries into objective Bayesian model choice, recent advances in MCMC methodology, and the multiple layers of BART, a situation for which I have to blame myself for sticking the beginning of O’Bayes too closely to the end of BNP, as only the most dedicated could manage the commute from Oxford to Coventry to reach Warwick in time, the first day of talks was well attended, despite weekend commitments, conference fatigue, and perfect summer weather! Here are some snapshots from my bench (and apologies for not covering better the more theoretical talks I had trouble following, due to an early and intense morning swimming lesson! Like Steve Walker’s utility-based derivation of priors that generalise maximum-entropy priors. But being entirely independent from the model does not sound to me like such a desirable feature… And Natalia Bochkina’s Bernstein-von Mises theorem for a location-scale semi-parametric model, including a clever construct of a mixture of two Dirichlet priors to achieve proper convergence.)

Jim Berger started the day with a talk on imprecise probabilities, involving the society for imprecise probability, which I discovered while reading Keynes’ book, with a neat resolution of the Jeffreys-Lindley paradox when re-expressing the null as an imprecise null, as the posterior of the null no longer converges to one, with a limit depending on the prior modelling if involving a prior on the bias as well. Chris discussed the talk, mentioning a recent work with Edwin Fong on reinterpreting the marginal likelihood as exhaustive X validation, summing over all possible subsets of the data [using the log marginal predictive].

Håvard Rue did a follow-up talk from his Valencia O’Bayes 2015 talk on PC-priors. With a pretty hilarious introduction on his difficulties with constructing priors and counselling students about their Bayesian modelling. With a list of principles and desiderata to define a reference prior. However, I somewhat disagree with his argument that the Kullback-Leibler distance from the simpler (base) model cannot be scaled, as it is essentially a log-likelihood. And it feels like multivariate parameters need some sort of separability to define distance(s) to the base model, since the distance somewhat summarises the whole departure from the simpler model. (Håvard also joined my achievement of putting an ostrich in a slide!) In his discussion, Robin Ryder made a very pragmatic recap of the difficulties with constructing priors. And pointed out a natural link with ABC (which brings us back to Don Rubin’s motivation for introducing the algorithm as a formal thought experiment).

Sara Wade gave the final talk of the day, about her work on Bayesian cluster analysis. Whose discussion in Bayesian Analysis I alas missed. Cluster estimation, as mentioned frequently on this blog, is a rather frustrating challenge despite the simple formulation of the problem. (And I will not mention Larry’s tequila analogy!) The current approach is based on loss functions directly addressing the clustering aspect, integrating out the parameters. Which produces the interesting notion of neighbourhoods of partitions and hence credible balls in the space of partitions. It still remains unclear to me that cluster estimation is at all achievable, since the partition space explodes with the sample size and hence makes the most probable cluster more and more unlikely in that space. Somewhat paradoxically, the paper concludes that estimating the cluster produces a more reliable estimator of the number of clusters than looking at the marginal distribution on this number. In her discussion, Clara Grazian also pointed out the ambivalent use of clustering, where the intended meaning somehow diverges from the meaning induced by the mixture model.