Archive for objective Bayes

O’Bayes 19/1 [snapshots]

Posted in Books, pictures, Statistics, University life on June 30, 2019 by xi'an

Yesterday’s O’Bayes 2019 tutorials were poorly attended, despite being great entries into objective Bayesian model choice, recent advances in MCMC methodology, and the multiple layers of BART. I have to blame myself for this, having stuck the beginning of O’Bayes too close to the end of BNP, since only the most dedicated could manage the commute from Oxford to Coventry to reach Warwick in time. The first day of talks, however, was well attended, despite weekend commitments, conference fatigue, and perfect summer weather! Here are some snapshots from my bench (with apologies for not covering better the more theoretical talks, which I had trouble following due to an early and intense morning swimming lesson! Like Steve Walker’s utility-based derivation of priors that generalise maximum entropy priors. But being entirely independent from the model does not sound to me like such a desirable feature… And Natalia Bochkina’s Bernstein-von Mises theorem for a location-scale semi-parametric model, including a clever construct of a mixture of two Dirichlet priors to achieve proper convergence.)

Jim Berger started the day with a talk on imprecise probabilities, involving the society for imprecise probability, which I discovered while reading Keynes’ book. The talk came with a neat resolution of the Jeffreys-Lindley paradox: when the null is re-expressed as an imprecise null, the posterior probability of the null no longer converges to one, its limit depending on the prior modelling, if a prior on the bias is involved as well. Chris discussed the talk, mentioning a recent work with Edwin Fong on reinterpreting the marginal likelihood as exhaustive cross-validation, summing over all possible subsets of the data [using the log marginal predictive].

Håvard Rue gave a follow-up to his València O’Bayes 2015 talk on PC-priors. With a pretty hilarious introduction on his difficulties with constructing priors and counseling students about their Bayesian modelling. With a list of principles and desiderata to define a reference prior. However, I somewhat disagree with his argument that the Kullback-Leibler distance from the simpler (base) model cannot be scaled, as it is essentially a log-likelihood. And it feels like multivariate parameters need some sort of separability to define distance(s) to the base model, since the distance somewhat summarises the whole departure from the simpler model. (Håvard also joined my achievement of putting an ostrich in a slide!) In his discussion, Robin Ryder made a very pragmatic recap of the difficulties with constructing priors, pointing out a natural link with ABC (which brings us back to Don Rubin’s motivation for introducing the algorithm as a formal thought experiment).

Sara Wade gave the final talk of the day, about her work on Bayesian cluster analysis, whose discussion in Bayesian Analysis I alas missed. Cluster estimation, as mentioned frequently on this blog, is a rather frustrating challenge despite the simple formulation of the problem. (And I will not mention Larry’s tequila analogy!) The current approach is based on loss functions directly addressing the clustering aspect, integrating out the parameters. Which produces the interesting notion of neighbourhoods of partitions and hence credible balls in the space of partitions. It still remains unclear to me that cluster estimation is at all achievable, since the partition space explodes with the sample size and hence makes the most probable clustering more and more unlikely in that space. Somewhat paradoxically, the paper concludes that estimating the clustering produces a more reliable estimator of the number of clusters than looking at the marginal distribution of this number. In her discussion, Clara Grazian also pointed out the ambivalent use of clustering, where the intended meaning somehow diverges from the meaning induced by the mixture model.
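
To put a number on how fast the partition space explodes, here is a minimal sketch that counts the partitions of n observations through the Bell numbers, computed with the Bell triangle. The function name bell_numbers is purely illustrative and the snippet has nothing to do with the code behind the paper.

```python
def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max], where B_n counts the partitions of n items."""
    row = [1]          # current row of the Bell triangle
    bells = [1]        # B_0 = 1 (the empty set has a single partition)
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row,
        # and each further entry adds the entry above-left in the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


if __name__ == "__main__":
    for n, b in enumerate(bell_numbers(20)):
        print(n, b)
```

With only ten observations there are already more than a hundred thousand partitions, so under a posterior that is not extremely concentrated the single most probable partition can only carry a tiny mass.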

O’Bayes 2019 has now started!

Posted in pictures, Running, Statistics, Travel, University life on June 28, 2019 by xi'an

The O’Bayes 2019 conference at Warwick University has now started, with about 100 participants meeting over four days (plus one of tutorials) in the Zeeman maths building of the University. Quite a change of location and weather when compared with the previous one in Austin. As an organiser, I hope all goes well at the practical level and want to thank the other persons who helped me towards this goal, first and foremost Paula Matthews, who solved web and lodging and planning issues all over these past months, as well as Mark Steel and Cristiano Villa. As a member of the scientific committee, I am looking forward to the talks and discussions over the coming four days, again hoping all speakers and discussants show up and are not hindered by travel or visa issues…

O’Bayes 2019 conference program

Posted in Kids, pictures, Statistics, Travel, University life on May 13, 2019 by xi'an

The full and definitive program of the O’Bayes 2019 conference in Warwick is now online, including discussants for all papers. And the three [and free] tutorials on Friday afternoon, 28 June, on model selection (M. Barbieri), MCMC recent advances (G.O. Roberts) and BART (E.I. George). Registration remains open at the reduced rate and poster submissions from all conference participants can still be sent to me.

O’Bayes 2019: speakers, discussants, posters!

Posted in pictures, Statistics, University life on January 2, 2019 by xi'an

The program for the next O’Bayes conference in Warwick, 28 June-02 July, 2019, is now set. Speakers and discussants have been contacted by the scientific committee and accepted our invitation! As usual, there will be poster sessions on the nights of 29 and 30 June and the call is open for poster submissions, until January 31, to be sent to me as a one page pdf document containing either the poster itself or title, abstract and references. (Using my email address at either Dauphine or Warwick is fine. Or bayesianstatistics on gmail.)

noninformative Bayesian prior with a finite support

Posted in Statistics, University life on December 4, 2018 by xi'an

A few days ago, Pierre Jacob pointed me to a PNAS paper published earlier this year on a form of noninformative Bayesian analysis by Henry Mattingly and coauthors. They consider a prior that “maximizes the mutual information between parameters and predictions”, which sounds very much like José Bernardo’s notion of reference priors. With the rather strange twist of having the prior depend on the data size m even though they work under an iid assumption. Here information is defined as the difference between the entropy of the prior and the conditional entropy, which is not precisely defined in the paper but looks like the expected [in the data x] Kullback-Leibler divergence between prior and posterior. (I have general issues with the paper in that I often find it hard to read, for lack of precision and of definition of the main notions.)
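
For the record, the reading of the information criterion given above corresponds to the standard identity for mutual information, spelled out here in notation that is an assumption and not taken from the paper: π is the prior, f(x|θ) the sampling density of the whole sample x of size m, and p(x) the resulting marginal.

```latex
I(\theta; x) \;=\; H(\theta) - H(\theta \mid x)
\;=\; \int p(x) \int \pi(\theta \mid x)\,
      \log \frac{\pi(\theta \mid x)}{\pi(\theta)}\,
      \mathrm{d}\theta\, \mathrm{d}x
\;=\; \mathbb{E}_{x}\!\left[ \mathrm{KL}\{\pi(\cdot \mid x) \,\|\, \pi\} \right],
\qquad
p(x) = \int f(x \mid \theta)\, \pi(\theta)\, \mathrm{d}\theta .
```

Since x stands for the full sample, the criterion and hence its maximiser depend on m, which is where the dependence of the prior on the sample size comes from.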

One highly specific (and puzzling to me) feature of the proposed priors is that they are supported by a finite number of atoms, which reminds me very much of the (minimax) least favourable priors over compact parameter spaces, as for instance in the iconic paper by Casella and Strawderman (1984). For the same mathematical reason that non-constant analytic functions must have separated maxima. This is conducted under the assumption and restriction of a compact parameter space, which must in most cases be chosen somewhat arbitrarily and not without consequences. I can somehow relate to the notion that a finite support prior translates the limited precision in the estimation brought by a finite sample. In other words, given a sample size of m, there is a maximal precision one can hope for, producing further decimals being silly. Still, the fact that the support of the prior is fixed a priori, completely independently of the data, is both unavoidable (for the prior to be prior!) and very dependent on the choice of the compact set. I would certainly prefer to see a maximal degree of precision expressed a posteriori, meaning that the support would then depend on the data. And handling finite support posteriors is rather awkward in that many notions like confidence intervals do not make much sense in that setup. (Similarly, one could argue that Bayesian non-parametric procedures lead to estimates with a finite number of support points but these are determined based on the data, not a priori.)

Interestingly, the derivation of the “optimal” prior is operated by iterations where the next prior is the renormalised version of the current prior times the exponentiated Kullback-Leibler divergence, which is “guaranteed to converge to the global maximum” for a discretised parameter space. The authors acknowledge that the resolution is poorly suited to multidimensional settings and hence to complex models, and indeed the paper only covers a few toy examples of moderate and even humble dimensions.
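
The iteration described above can be sketched as follows, under several assumptions of mine: a toy Binomial(m, θ) model, a regular grid over [0,1] as the discretised parameter space, and the Kullback-Leibler divergence taken between the sampling distribution at each grid point and the current marginal, which is the usual Blahut-Arimoto form of such an update. All function names are illustrative and this is not the authors' code.

```python
from math import comb, exp, log


def theta_grid(size):
    """Regular grid of `size` points over [0, 1] (assumed discretisation)."""
    return [i / (size - 1) for i in range(size)]


def binomial_likelihood(m, theta):
    """Probabilities of x = 0..m successes in m Bernoulli(theta) trials."""
    return [comb(m, x) * theta**x * (1 - theta) ** (m - x) for x in range(m + 1)]


def marginal(prior, lik):
    """Prior predictive p(x) = sum_k prior[k] * p(x | theta_k)."""
    return [sum(w * row[x] for w, row in zip(prior, lik)) for x in range(len(lik[0]))]


def kl_to_marginal(row, marg):
    """KL divergence between p(. | theta_k) and the marginal p(.)."""
    return sum(p * log(p / q) for p, q in zip(row, marg) if p > 0)


def mutual_information(prior, lik):
    """Mutual information between theta and x under the given prior."""
    marg = marginal(prior, lik)
    return sum(w * kl_to_marginal(row, marg) for w, row in zip(prior, lik) if w > 0)


def optimal_prior(lik, n_iter=500):
    """Iterate: next prior = renormalised (current prior * exp(KL))."""
    k = len(lik)
    prior = [1.0 / k] * k  # start from the uniform prior on the grid
    for _ in range(n_iter):
        marg = marginal(prior, lik)
        weights = [
            w * exp(kl_to_marginal(row, marg)) if w > 0 else 0.0
            for w, row in zip(prior, lik)
        ]
        total = sum(weights)
        prior = [w / total for w in weights]
    return prior


if __name__ == "__main__":
    m = 5
    grid = theta_grid(21)
    lik = [binomial_likelihood(m, t) for t in grid]
    prior = optimal_prior(lik)
    for t, w in zip(grid, prior):
        print(f"theta={t:.2f}  weight={w:.4f}")
    print("mutual information:", mutual_information(prior, lik))
```

Printing the weights for a few values of m is an easy way to check whether the mass indeed concentrates on a handful of grid points, and how the selected points move with m. The cost of each sweep grows with the product of the grid size and the number of possible outcomes, which gives a feeling for why the approach is poorly suited to multidimensional settings.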

Another difficulty with the paper is the absence of temporal consistency: since the prior depends on the sample size, the posterior for n i.i.d. observations is no longer the prior for the (n+1)th observation.

“Because it weights the irrelevant parameter volume, the Jeffreys prior has strong dependence on microscopic effects invisible to experiment”

I simply do not understand the above sentence, which apparently counts as a criticism of Jeffreys (1939). And would appreciate anyone enlightening me! The paper goes on to compare priors through Bayes factors, which ignores the main difficulty with an automated solution such as Jeffreys priors, namely its inability to handle infinite parameter spaces, where it is almost invariably improper.