Archive for prior assessment

O’Bayes 19/1 [snapshots]

Posted in Books, pictures, Statistics, University life on June 30, 2019 by xi'an

Although yesterday’s tutorials of O’Bayes 2019 were poorly attended, despite being great entries into objective Bayesian model choice, recent advances in MCMC methodology, and the multiple layers of BART, the first day of talks was well attended, despite weekend commitments, conference fatigue, and perfect summer weather! For the poor attendance I have to blame myself, for sticking the beginning of O’Bayes too closely to the end of BNP, as only the most dedicated could manage the commute from Oxford to Coventry to reach Warwick in time. Here are some snapshots from my bench (with apologies for not better covering the more theoretical talks, which I had trouble following due to an early and intense morning swimming lesson! Like Steve Walker’s utility-based derivation of priors that generalise maximum entropy priors. But being entirely independent from the model does not sound to me like such a desirable feature… And Natalia Bochkina’s Bernstein-von Mises theorem for a location-scale semi-parametric model, including a clever construct of a mixture of two Dirichlet priors to achieve proper convergence.)

Jim Berger started the day with a talk on imprecise probabilities, involving the society for imprecise probability, which I discovered while reading Keynes’ book. It came with a neat resolution of the Jeffreys-Lindley paradox: when re-expressing the null as an imprecise null, the posterior probability of the null no longer converges to one, its limit depending on the prior modelling, if involving a prior on the bias as well. Chris discussed the talk and mentioned a recent work with Edwin Fong on reinterpreting the marginal likelihood as exhaustive cross-validation, summing over all possible subsets of the data [using the log marginal predictive].

Håvard Rue gave a follow-up to his Valencià O’Bayes 2015 talk on PC-priors, with a pretty hilarious introduction on his difficulties with constructing priors and counselling students about their Bayesian modelling, and with a list of principles and desiderata to define a reference prior. However, I somewhat disagree with his argument that the Kullback-Leibler distance from the simpler (base) model cannot be scaled, as it is essentially a log-likelihood. And it feels like multivariate parameters need some sort of separability to define distance(s) to the base model, since the distance somewhat summarises the whole departure from the simpler model. (Håvard also joined my achievement of putting an ostrich in a slide!) In his discussion, Robin Ryder made a very pragmatic recap of the difficulties with constructing priors, and pointed out a natural link with ABC (which brings us back to Don Rubin’s motivation for introducing the algorithm as a formal thought experiment).
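
To make the exhaustive cross-validation reading of the marginal likelihood mentioned in Chris’s discussion more concrete, here is a minimal numerical sketch. It assumes the identity takes the form: log marginal likelihood = sum over p = 1, …, n of the leave-p-out score, where each score averages the log posterior predictive of the held-out points over all training subsets of size n-p. The conjugate normal model, the data, and all function names are illustrative choices, not taken from the talk.

```python
import itertools
import math


def log_norm_pdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_predictive(y_new, train, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Log posterior predictive of y_new given train, for the (assumed) model
    y_i ~ N(theta, noise_var) with theta ~ N(prior_mean, prior_var)."""
    precision = 1.0 / prior_var + len(train) / noise_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + sum(train) / noise_var)
    return log_norm_pdf(y_new, post_mean, post_var + noise_var)


def log_marginal(y):
    """Chain rule: sum of one-step-ahead log predictives in the given order."""
    return sum(log_predictive(y[i], y[:i]) for i in range(len(y)))


def exhaustive_cv(y):
    """Sum over p of the leave-p-out score, each averaged over all subsets."""
    n = len(y)
    total = 0.0
    for p in range(1, n + 1):  # p = number of held-out points
        scores = []
        for train_idx in itertools.combinations(range(n), n - p):
            train = [y[i] for i in train_idx]
            held = [y[i] for i in range(n) if i not in train_idx]
            # average log predictive of each held-out point given the training set
            scores.append(sum(log_predictive(h, train) for h in held) / p)
        total += sum(scores) / len(scores)
    return total


y = [0.3, -1.2, 0.8, 2.1, -0.4]  # made-up data
print(log_marginal(y), exhaustive_cv(y))
```

The two printed numbers should agree up to rounding, since averaging the chain-rule decomposition over all orderings of an exchangeable sample yields exactly this sum of leave-p-out scores. The number of subsets grows exponentially with n, so this is only feasible for toy samples.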

Sara Wade gave the final talk of the day, about her work on Bayesian cluster analysis, the discussion of which in Bayesian Analysis I alas missed. Cluster estimation, as mentioned frequently on this blog, is a rather frustrating challenge despite the simple formulation of the problem. (And I will not mention Larry’s tequila analogy!) The current approach is based on loss functions directly addressing the clustering aspect, integrating out the parameters. Which produces the interesting notion of neighbourhoods of partitions and hence credible balls in the space of partitions. It still remains unclear to me that cluster estimation is at all achievable, since the partition space explodes with the sample size and hence makes the most probable cluster more and more unlikely in that space. Somewhat paradoxically, the paper concludes that estimating the cluster produces a more reliable estimator of the number of clusters than looking at the marginal distribution of this number. In her discussion, Clara Grazian also pointed out the ambivalent use of clustering, where the intended meaning somehow diverges from the meaning induced by the mixture model.

probabilistic numerics and uncertainty in computations

Posted in Books, pictures, Statistics, University life on June 10, 2015 by xi'an

“We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations.” (p.1)

Philipp Hennig, Michael Osborne and Mark Girolami (Warwick) posted on arXiv a paper to appear in Proceedings of the Royal Society A that relates to the probabilistic numerics workshop they organised in Warwick with Chris Oates two months ago. The paper is both a survey and a tribune about the related questions the authors find of most interest. The overall perspective follows Persi Diaconis’ call for a principled Bayesian approach to numerical problems. One interesting argument made from the start of the paper is that numerical methods can be seen as inferential rules, in that a numerical approximation of a deterministic quantity like an integral can be interpreted as an estimate, even as a Bayes estimate if a prior is used on the space of integrals. I am always uncertain about this perspective, as for instance illustrated in the post about the missing constant in Larry Wasserman’s paradox. The approximation may look formally the same as an estimate, but there is a design aspect that is almost always attached to numerical approximations and rarely analysed as such. Not to mention the somewhat philosophical issue that the integral itself is a constant with no uncertainty (while a statistical model should always entertain the notion that a model can be mis-specified). The distinction explains why there is a zero variance importance sampling estimator, while there is no uniformly zero variance estimator in most parametric models. At a possibly deeper level, the debate that still pervades the use of Bayesian inference to solve statistical problems would most likely resurface in numerics, in that the significance of a probability statement surrounding a mathematical quantity can only be epistemic and relate to the knowledge (or lack thereof) about this quantity rather than to the quantity itself.
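
The zero variance importance sampling estimator mentioned above can be checked on a toy case. The sketch below takes an integrand of my own choosing, f(x) = x² exp(-x) on (0, ∞), whose integral is Γ(3) = 2, and compares two proposals: the optimal one, proportional to f (a Gamma(3, 1) density), and an Exp(1) proposal. Function and variable names are illustrative.

```python
import math
import random
import statistics


def f(x):
    """Integrand; its integral over (0, inf) is Gamma(3) = 2."""
    return x * x * math.exp(-x)


def gamma3_pdf(x):
    """Gamma(shape=3, scale=1) density, i.e. f normalised by its integral."""
    return 0.5 * x * x * math.exp(-x)


def exp_pdf(x):
    """Exp(1) density."""
    return math.exp(-x)


def importance_weights(sampler, pdf, n, rng):
    """Draw n points from the proposal and return the ratios f(x) / pdf(x)."""
    return [f(x) / pdf(x) for x in (sampler(rng) for _ in range(n))]


rng = random.Random(0)
n = 10_000
# Optimal proposal: every weight equals the integral itself.
w_opt = importance_weights(lambda r: r.gammavariate(3.0, 1.0), gamma3_pdf, n, rng)
# Sub-optimal proposal: weights are x**2, with a strictly positive variance.
w_exp = importance_weights(lambda r: r.expovariate(1.0), exp_pdf, n, rng)

print("optimal proposal:", statistics.fmean(w_opt), statistics.pvariance(w_opt))
print("Exp(1) proposal: ", statistics.fmean(w_exp), statistics.pvariance(w_exp))
```

Under the optimal proposal each weight is the constant 2, hence the zero variance, but writing down that proposal requires the normalising constant of f, which is the very integral being approximated. This is the design aspect that has no counterpart in a statistical model where the data are imposed.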

“(…) formulating quadrature as probabilistic regression precisely captures a trade-off between prior assumptions inherent in a computation and the computational effort required in that computation to achieve a certain precision. Computational rules arising from a strongly constrained hypothesis class can perform much better than less restrictive rules if the prior assumptions are valid.” (p.7)
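
The quadrature-as-regression idea in this quote can be sketched as follows, under assumptions that are mine rather than the paper’s: a zero-mean Gaussian process prior on the integrand with a squared-exponential kernel of length-scale `ell`, a uniform measure on [0, 1], and the test integrand sin(πx). The posterior on the integral is then Gaussian, with a mean that is a weighted sum of function evaluations and a variance that depends only on the nodes and the kernel.

```python
import math


def rbf(x, t, ell):
    """Squared-exponential kernel (an assumed prior covariance)."""
    return math.exp(-((x - t) ** 2) / (2.0 * ell ** 2))


def kernel_mean(x, ell):
    """Closed form of the integral of rbf(x, t, ell) over t in [0, 1]."""
    a = 1.0 / (math.sqrt(2.0) * ell)
    return ell * math.sqrt(math.pi / 2.0) * (math.erf(a * (1.0 - x)) + math.erf(a * x))


def kernel_double_mean(ell):
    """Closed form of the double integral of the kernel over [0, 1]^2."""
    a = 1.0 / (math.sqrt(2.0) * ell)
    inner = math.erf(a) + (math.exp(-a * a) - 1.0) / (a * math.sqrt(math.pi))
    return 2.0 * ell * math.sqrt(math.pi / 2.0) * inner


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def bayes_quadrature(f, nodes, ell=0.25, jitter=1e-10):
    """Posterior mean and variance of the integral of f over [0, 1]."""
    K = [[rbf(xi, xj, ell) + (jitter if i == j else 0.0)
          for j, xj in enumerate(nodes)] for i, xi in enumerate(nodes)]
    z = [kernel_mean(x, ell) for x in nodes]
    w = solve(K, z)  # quadrature weights K^{-1} z
    mean = sum(wi * f(xi) for wi, xi in zip(w, nodes))
    var = kernel_double_mean(ell) - sum(wi * zi for wi, zi in zip(w, z))
    return mean, var


def integrand(x):
    return math.sin(math.pi * x)  # true integral over [0, 1] is 2 / pi


mean4, var4 = bayes_quadrature(integrand, [i / 3 for i in range(4)])
mean7, var7 = bayes_quadrature(integrand, [i / 6 for i in range(7)])
print(mean4, var4)
print(mean7, var7, 2 / math.pi)
```

Note that the posterior variance never involves the function values, only the nodes and the kernel, so the reported uncertainty is entirely driven by the prior assumptions, which is where the worry about the posterior reflecting the prior rather than the “data” bites.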

Another general worry [repeating myself] about setting a prior in those functional spaces is that the posterior may then mostly reflect the choice of the prior rather than the information contained in the “data”. The above quote mentions prior assumptions that seem hard to build from prior opinion about the functional of interest. And even less about the function itself. Coming back from a gathering of “objective Bayesians”, it seems equally hard to agree upon a reference prior. However, since I like the alternative notion of using decision theory in conjunction with probabilistic numerics, it seems hard to object to the use of priors, given the “invariance” of prior × loss… But I would like to understand better how it is possible to check prior assumptions (p.7) without using the data. Or maybe it does not matter so much in this setting? Unlikely, as indicated in the remarks about the bias resulting from the active design (p.13).

A last issue I find related to the exploratory side of the paper is the “big world versus small worlds” debate, namely whether we can use the Bayesian approach to solve a sequence of small problems rather than trying to solve the big problem all at once. Which forces us to model the entirety of unknowns. And almost certainly fail. (This was the point of the Robbins-Wasserman counterexample.) Adopting a sequence of solutions may be construed as incoherent in that the prior distribution is adapted to the problem rather than encompassing all problems. Although this would not shock the proponents of reference priors.

O’Bayes 2015 [day #3]

Posted in Statistics, Travel, University life, Wines on June 5, 2015 by xi'an

The third day of the meeting was a good illustration of the diversity of the themes [says a member of the scientific committee!], from “traditional” O’Bayes talks on reference priors by the father of all reference priors (!), José Bernardo, re-examinations of expected posterior priors, on properties of Bayes factors, or on new versions of the Lindley-Jeffreys paradox, to the radically different approach of Simpson et al. presented by Håvard Rue. I was obviously most interested in expected posterior priors!, with the new notion brought in by Dimitris Fouskakis, Ioannis Ntzoufras and David Draper of a lower impact of the minimal sample on the resulting prior by the trick of a lower (than one) power of the likelihood. Since this change seemed to go beyond the “minimal” in minimal sample size, I am somewhat puzzled that this can be achieved, but the normal example shows this is indeed possible. The next difficulty is then in calibrating this power, as I do not see any intuitive justification for a specific power. The central talk of the day was in my opinion Håvard’s, as it challenged most tenets of the Objective Bayes approach, presented in a most eager tone, even though it did not generate particularly heated comments from the audience. I have already discussed here an earlier version of this paper and I keep on thinking this proposal for PC priors is a major breakthrough in the way we envision priors and their derivation. I was thus sorry to hear the paper had not been selected as a Read Paper by the Royal Statistical Society, as it would have nicely suited an open discussion, but I hope it will find another outlet that allows for a discussion! As an aside, Håvard discussed the case of a Student’s t degree of freedom as particularly challenging for prior construction, albeit I would have analysed the problem using instead a model choice perspective (on an unusually continuous space of models).

As this conference day had a free evening, I took the tram with friends to the town beach and we had a fantastic [if hurried] dinner in a small bodega [away from the uninspiring beach front] called Casa Montaña, a place decorated with huge barrels, offering amazing tapas and wines, a perfect finale to my Spanish trip. Too bad we had to vacate the dining room for the next batch of customers…