## Archive for University of Warwick

## a generalized representation of Bayesian inference

**J**eremias Knoblauch, Jack Jewson and Theodoros Damoulas, all affiliated with Warwick (hence a potentially biased reading!), arXived a paper on loss-based Bayesian inference that Jack discussed with me on my last visit to Warwick. I was somewhat scared by the 61 pages, of which the first 8 are in NeurIPS style. The authors argue for a decision-theoretic approach to Bayesian inference that involves a loss over distributions and a divergence from the prior. For instance, when using the log-score as the loss and the Kullback-Leibler divergence, the regular posterior emerges, as shown by Arnold Zellner. Variational inference also falls under this umbrella. The argument for this generalization is that any form of loss can be used and still returns a distribution that is used to assess uncertainty about the parameter (of interest). Among the axioms they produce to justify the derivation of the optimal procedure, including cases where the posterior is restricted to a certain class, one [Axiom 4] generalizes the likelihood principle. Given the freedom brought by this general framework, plenty of fringe Bayes methods like standard variational Bayes can be seen as solutions to such a decision problem. Others, like EP, cannot. Of interest to me is the potential of this formal framework to encompass misspecification and likelihood-free settings, as well as to assess priors, which is always a fishy issue. (The authors also mention the capacity to design related, specific Bayesian deep networks, of which I know nothing.) My obvious reaction is one of facing an abundance of wealth (!) but encompassing approximate Bayesian solutions within a Bayesian framework remains an exciting prospect.
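To make the log-score case concrete, here is a minimal sketch, not taken from the paper, on a discretised parameter: on a grid, the distribution minimising the expected loss plus the Kullback-Leibler divergence from the prior is q(θ) ∝ π(θ) exp{−loss(θ)}, and with the log-score as the loss it coincides with the usual posterior. The normal model, the prior, the grid and the function names `gibbs_posterior` and `objective` are all assumptions made for illustration.

```python
import numpy as np


def gibbs_posterior(prior, loss):
    """Minimiser over q of E_q[loss] + KL(q || prior) on a grid, namely
    q(theta) proportional to prior(theta) * exp(-loss(theta))."""
    log_w = np.log(prior) - loss
    w = np.exp(log_w - log_w.max())  # subtract the max before exponentiating
    return w / w.sum()


def objective(q, prior, loss):
    """Expected loss under q plus the Kullback-Leibler divergence KL(q || prior)."""
    return float(np.sum(q * loss) + np.sum(q * np.log(q / prior)))


rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=20)       # hypothetical data from a N(theta, 1) model
theta = np.linspace(-3.0, 5.0, 801)     # grid over the parameter
prior = np.exp(-0.5 * theta**2 / 4.0)   # N(0, 4) prior, normalised on the grid
prior /= prior.sum()

# Log-score loss: minus the log-likelihood of the whole sample at each grid value
log_lik = (-0.5 * (x[:, None] - theta[None, :]) ** 2).sum(axis=0) - 0.5 * x.size * np.log(2 * np.pi)
q_star = gibbs_posterior(prior, -log_lik)

# Usual Bayes posterior on the same grid, straight from Bayes' theorem
bayes = prior * np.exp(log_lik - log_lik.max())
bayes /= bayes.sum()
print(np.max(np.abs(q_star - bayes)))   # difference at rounding-error level
```

Replacing `-log_lik` with another loss (a robustified one, say) gives the corresponding loss-based posterior. The divergence stays Kullback-Leibler in this sketch, whereas the framework also allows other divergences and restricted classes of distributions.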

Posted in Books with tags approximate Bayesian inference, Bayesian decision theory, Bayesian robustness, Kullback-Leibler divergence, Likelihood Principle, University of Warwick, variational inference on July 5, 2019 by xi'an

## O’Bayes 19/4

Posted in Books, pictures, Running, Statistics, Travel, University life with tags Bayesian model choice, Carl Friedrich Gauss, Coventry, Dickey-Savage ratio, factor analysis, improper priors, large p small n, nettles, O'Bayes 2019, Power-Expected-Posterior Priors, sent to Coventry, University of Warwick on July 4, 2019 by xi'an

Last talks of the conference! With Rui Paulo (along with Gonzalo Garcia-Donato) considering the special case of factors when doing variable selection. Which is an interesting question that I had never considered, as I would at best remove all levels or keep them all. Except that there may be misspecification in the factors, as for instance when several levels have the same impact.

With Michael Evans discussing a paper that he wrote for the conference! Following his own approach to statistical evidence. And including his reluctance to cover infinity (calling on Gauß for backup!) or continuity, and his call to falsify a Bayesian model by checking whether it can be contradicted by the data. His assumption that checking the prior is separable from checking the [sampling] model is debatable. (With another mention made of the Savage-Dickey ratio.)
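Since the Savage-Dickey ratio keeps popping up, here is a minimal reminder of what it computes, under assumptions of my own choosing: normal observations with unit variance, a point null θ = 0, and a N(0, τ²) prior under the alternative, with no nuisance parameter so that no prior compatibility condition is needed. The Bayes factor in favour of the null is then the ratio of the posterior to the prior density of θ at 0, which the sketch checks against the ratio of marginal densities of the sufficient statistic x̄. The function names are hypothetical.

```python
import numpy as np


def normal_pdf(y, mean, var):
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def savage_dickey_bf01(x, tau2):
    """B01 for H0: theta = 0 against H1: theta ~ N(0, tau2), when x_i ~ N(theta, 1):
    posterior density of theta at 0 under H1 divided by its prior density at 0."""
    post_var = 1.0 / (len(x) + 1.0 / tau2)
    post_mean = post_var * np.sum(x)
    return normal_pdf(0.0, post_mean, post_var) / normal_pdf(0.0, 0.0, tau2)


def direct_bf01(x, tau2):
    """Same Bayes factor as a ratio of marginal densities of the sufficient statistic xbar."""
    n, xbar = len(x), np.mean(x)
    return normal_pdf(xbar, 0.0, 1.0 / n) / normal_pdf(xbar, 0.0, 1.0 / n + tau2)


rng = np.random.default_rng(5)
x = rng.normal(0.3, 1.0, size=30)   # hypothetical data
print(savage_dickey_bf01(x, 2.0), direct_bf01(x, 2.0))
```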

And with Dimitris Fouskakis giving a wide-ranging assessment [which Mark Steel (Warwick) called a PEP talk!] of power-expected-posterior priors, used with reference (and usually improper) priors. Which in retrospect would have better suited the beginning of the conference, as it provided a background to several of the talks. Raising a question (from my perspective) on using the maximum likelihood estimator as a pseudo-sufficient statistic when this MLE is computed for the base (simplest) model. Maybe an ABC-induced bias in this question, as it would not work for ABC model choice.

Overall, I think the scientific outcomes of the conference were quite positive: a wide range of topics and perspectives, a reasonable and diverse attendance, especially when considering the heavy load of related conferences in the surrounding weeks (the “June fatigue”!), animated poster sessions. I am obviously not the one to assess the organisation of the conference! Things I forgot to do in this regard: organise transportation from Oxford to Warwick University, provide an attached room for in-pair research, insist on sustainability despite the imposed catering solution, facilitate sharing joint transportation to and from the Warwick campus, mention that tap water was potable, and… wear long pants when running in nettles.

## O’Bayes 19/3.5

Posted in Books, pictures, Travel, University life with tags #betterposter, Beamer, O'Bayes 2019, poster session, University of Warwick on July 3, 2019 by xi'an

**A**mong the posters at the second poster session last night, one by Judith ter Schure visually stood out by following the #betterposter design suggested by Mike Morrison a few months ago. A design about which I have ambivalent feelings. On the one hand, reducing the material on a poster is generally a good idea, as posters tend to be saturated and hard to read, especially in crowded conditions. Having the main idea or theorem immediately visible should indeed be a requirement, both for getting the point at once and for starting from the result when explaining the advances of the corresponding work. But if this format becomes the standard, it will become harder to stand out! More fundamentally, this proposal may fall into the same abyss as powerpoint presentations, namely that insisting on making the contents simpler and sparser may reach the point of no return of no content [which was not the case for the above poster, let me hasten to state!]. Mathematical statistics posters may be automatically classified as too complicated for this #betterposter challenge for containing maths formulas! Or too many Greek letters, as someone complained after one of my talks. And treating maths formulas as detail makes them even smaller than usual, which sounds like the opposite of the intended effect. (The issue is discussed on the betterposter blog, for a variety of opinions, mostly at odds with mine.)

## O’Bayes 19/3

Posted in Books, pictures, Statistics, Travel, University life with tags #betterposter, Beamer, BFF4, Coventry, Dickey-Savage ratio, dominating measure, frequency properties, minimaxity, O'Bayes 2019, poster session, SafeBayes, sent to Coventry, town of Warwick, uniform distribution, University of Coventry, University of Warwick on July 2, 2019 by xi'an

**N**ancy Reid gave the first talk of the [Canada] day, in an impressive comparison of all approaches in statistics that involve a distribution of sorts on the parameter, connected with the presentation she gave at BFF4 at Harvard two years ago, including safe Bayes options this time. This was related to several (most?) of the talks at the conference, given the level of worry (!) about the choice of a prior distribution. But the main assessment of the methods still seemed to be centred on a frequentist notion of calibration, meaning that epistemic interpretations of probabilities, and hence most Bayesian answers, were disqualified from the start.

In connection with Nancy’s focus, Peter Hoff’s talk also concentrated on frequency valid confidence intervals in (linear) hierarchical models. Using prior information or structure to build better and shrinkage-like confidence intervals at a given confidence level. But not in the decision-theoretic way adopted by George Casella, Bill Strawderman and others in the 1980’s. And also making me wonder at the relevance of contemplating a fixed coverage as a natural goal. Above, a side result shown by Peter that I did not know and which may prove useful for Monte Carlo simulation.

Jaeyong Lee worked on a complex model for banded matrices that starts with a regular Wishart prior on the unrestricted space of matrices, computes the posterior and then projects this distribution onto the constrained subspace. (There is a rather sizeable literature on this subject, including works by David Dunson in the past decade, of which I was unaware.) This is a smart demarginalisation idea, but I wonder a wee bit at the notion, as the constrained space has measure zero for the larger model. This could explain why the resulting posterior is not a true posterior for the constrained model, in the sense that there is no prior over the constrained space that could return such a posterior. Another form of marginalisation paradox. The crux of the paper is however about constructing a functional form of minimaxity. In his discussion of the paper, Guido Consonni provided a representation of the post-processed posterior (P³) that involves the Dickey-Savage ratio, sort of, making me more convinced of the connection.
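A toy version of the post-processing idea, as I understand it, where every specific is an assumption: zero-mean normal data, an inverse-Wishart prior on the covariance (which amounts to a Wishart prior on its inverse), and a plain banding map that zeroes the entries beyond bandwidth k. The sketch simulates from the unconstrained conjugate posterior and pushes each draw through the banding map. Banding a positive definite matrix need not return a positive definite matrix, a point the actual method may handle differently, and no prior on the constrained space is ever involved, which is the source of the measure-zero qualm. Function names and hyperparameters are hypothetical.

```python
import numpy as np


def sample_inv_wishart(df, scale, rng):
    """One draw from an inverse-Wishart IW(df, scale), integer df >= dimension,
    obtained by inverting a Wishart(df, scale^{-1}) draw built from df Gaussian vectors."""
    chol = np.linalg.cholesky(np.linalg.inv(scale))
    z = rng.standard_normal((df, scale.shape[0])) @ chol.T  # rows ~ N(0, scale^{-1})
    return np.linalg.inv(z.T @ z)


def band(matrix, k):
    """Banding map: keep entries with |i - j| <= k and set all others to zero."""
    i, j = np.indices(matrix.shape)
    return np.where(np.abs(i - j) <= k, matrix, 0.0)


rng = np.random.default_rng(1)
p, n, k = 5, 50, 1                      # dimension, sample size, bandwidth (hypothetical)
x = rng.standard_normal((n, p))         # hypothetical zero-mean data
nu0, psi0 = p + 2, np.eye(p)            # hypothetical prior hyperparameters
nu_n, psi_n = nu0 + n, psi0 + x.T @ x   # conjugate update of the unconstrained posterior

# Post-processing: push each unconstrained posterior draw through the banding map
draws = np.array([band(sample_inv_wishart(nu_n, psi_n, rng), k) for _ in range(500)])
post_mean = draws.mean(axis=0)
print(np.round(post_mean, 2))
```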

As a lighter aside, one item of local information I should definitely have broadcast more loudly, and long enough in advance, to the conference participants is that the University of Warwick is not located in ye olde town of Warwick, where there is no university, but on the outskirts of the city of Coventry, while not to be confused with the University of Coventry. Located in Coventry.

## O’Bayes 19/2

Posted in Books, pictures, Running, Travel, University life with tags amenability, Bayesian inference, bronze sculpture, Carleton University, ergodicity, Gibbs posterior, group invariance, John von Neumann, Kenilworth, likelihood-free methods, O'Bayes 2019, Ottawa, Read paper, Royal Statistical Society, summer of British conferences, sunrise, University of Warwick, Zig-Zag on July 1, 2019 by xi'an

**O**ne talk on Day 2 of O’Bayes 2019 was by Ryan Martin on data-dependent priors (or “priors”). Which I have already discussed in this blog. Including the notion of a Gibbs posterior about quantities that “are not always defined through a model” [which is debatable if one sees it as part of a semi-parametric model]. A Gibbs posterior that is built through a pseudo-likelihood constructed from the empirical risk, which reminds me of Bissiri, Holmes and Walker. Although requiring a prior on this quantity that is not part of a model. And is not necessarily a true posterior, nor necessarily with the same concentration rate as a true posterior. Constructing a data-dependent distribution on the parameter does not necessarily mean an interesting inference and, to keep up with the theme of the conference, has no automated claim to [more] “objectivity”.
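For concreteness, a minimal sketch of a Gibbs posterior for a quantity defined without a model, here the median through the absolute loss, where the pseudo-likelihood is exp{−ω n R_n(θ)} with R_n the empirical risk. The learning rate ω = 1, the flat prior, the grid and the simulated Cauchy data are all hypothetical choices, and the calibration of ω, which drives the concentration rate, is left aside.

```python
import numpy as np


def gibbs_posterior(theta, data, loss, log_prior, omega=1.0):
    """Gibbs posterior on a grid: prior times exp(-omega * n * empirical risk), normalised."""
    risk = np.array([np.mean(loss(data, t)) for t in theta])
    log_w = log_prior - omega * data.size * risk
    w = np.exp(log_w - log_w.max())
    return w / w.sum()


def absolute_loss(x, t):
    """Loss whose empirical risk is minimised at the sample median."""
    return np.abs(x - t)


rng = np.random.default_rng(2)
data = 3.0 + rng.standard_cauchy(200)   # hypothetical heavy-tailed sample, median 3
theta = np.linspace(0.0, 6.0, 1201)     # grid for the median
log_prior = np.zeros_like(theta)        # flat prior over the grid
post = gibbs_posterior(theta, data, absolute_loss, log_prior, omega=1.0)
post_mean = float(np.sum(theta * post))
print(post_mean, np.median(data))
```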

And after calling a prior both Beauty and The Beast!, Erlis Ruli discussed a “bias-reduction” prior where the prior is the solution to a differential equation related to some cumulants, connected with an earlier work of David Firth (Warwick). An interesting conundrum is how to create an MCMC algorithm when the prior is that intractable, with possible help from PDMP techniques like the Zig-Zag sampler.
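As an illustration of the Zig-Zag sampler only, here is a minimal one-dimensional sketch on a standard normal target, a case where event times can be simulated exactly by inverting the integrated switching rate max(0, v x + t) along the current trajectory. This is a toy under stated assumptions: an intractable prior like the bias-reduction one would instead require bounds on the rate and thinning, which are not shown, and the function name is hypothetical.

```python
import math
import random


def zigzag_standard_normal(n_events, seed=0, x=0.0, v=1.0):
    """1-D Zig-Zag process targeting N(0, 1), i.e. potential U(x) = x^2 / 2.

    Event times are simulated exactly by inverting the integrated switching
    rate; returns time averages of x and x^2 along the piecewise-linear path."""
    rng = random.Random(seed)
    total_time = int_x = int_x2 = 0.0
    for _ in range(n_events):
        a = v * x                               # rate along the path is max(0, a + t)
        e = rng.expovariate(1.0)
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        # exact integrals of x(t) = x + v t and of x(t)^2 over [0, tau]
        int_x += x * tau + v * tau ** 2 / 2.0
        int_x2 += x * x * tau + x * v * tau ** 2 + tau ** 3 / 3.0
        total_time += tau
        x += v * tau                            # deterministic linear move...
        v = -v                                  # ...then flip the velocity at the event
    return int_x / total_time, int_x2 / total_time


m1, m2 = zigzag_standard_normal(100_000, seed=3)
print(m1, m2)  # time averages, to be compared with E[x] = 0 and E[x^2] = 1
```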

While Peter Orbanz’ talk was centred on a central limit theorem under group invariance, and further penalised by being the last of the (sun)day, Peter did a magnificent job of presenting the result and motivating each term. It reminded me of the work Jim Bondar was doing in Ottawa in the 1980’s on Haar measures for Bayesian inference, including the notion of *amenability* [a term due to von Neumann] that I had not met since then. (Neither have I met Jim since the last summer I spent at Carleton.) The CLT and associated LLN are remarkable in that the average is not over observations but over shifts of the same observation under elements of a sub-group of transformations. I also wondered about the potential connection with the Read Paper of Kong et al. in 2003 on the use of group averaging for Monte Carlo integration [connection apart from the fact that both discussants, Michael Evans and myself, are present at this conference].
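To illustrate only the generic principle behind group averaging in Monte Carlo integration (and neither Peter’s CLT nor the exact estimator of Kong et al.), a minimal sketch: when the target is invariant under a group, averaging the integrand over the orbit of each simulated point keeps the expectation unchanged and can reduce the variance. The sign-flip group, the N(0, 1) target and the exponential integrand are my own choices, which turns this into a plain antithetic scheme; the function name is hypothetical.

```python
import numpy as np


def orbit_average(f, x, group):
    """Average of f over the orbit {g(x) : g in group}, for each sample point in x."""
    return np.mean([f(g(x)) for g in group], axis=0)


# Sign-flip group {identity, negation}, under which N(0, 1) is invariant
group = [lambda x: x, lambda x: -x]

rng = np.random.default_rng(4)
x = rng.standard_normal(100_000)
plain = np.exp(x)                           # crude estimator of E[exp(X)] = exp(1/2)
averaged = orbit_average(np.exp, x, group)  # equals cosh(x): same mean, smaller variance
print(plain.mean(), averaged.mean(), np.exp(0.5))
print(plain.var(), averaged.var())
```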

## O’Bayes 19/1 [snapshots]

Posted in Books, pictures, Statistics, University life with tags ABC algorithm, Bayesian decision theory, clustering, Galaxy, imprecise probabilities, indoor swimming, Jeffreys-Lindley paradox, loss functions, maximum entropy, mixtures of distributions, O'Bayes 2019, objective Bayes, PC priors, prior assessment, score function, Society for Imprecise Probability, University of Warwick, Valencia meeting on June 30, 2019 by xi'an

**Y**esterday’s O’Bayes 2019 tutorials were poorly attended, despite being great entries into objective Bayesian model choice, recent advances in MCMC methodology, and the multiple layers of BART. I have to blame myself for this, for sticking the beginning of O’Bayes too closely to the end of BNP, as only the most dedicated could manage the commute from Oxford to Coventry to reach Warwick in time. The first day of talks was however well attended, despite weekend commitments, conference fatigue, and perfect summer weather! Here are some snapshots from my bench (and apologies for not covering better the more theoretical talks I had trouble following, due to an early and intense morning swimming lesson! Like Steve Walker’s utility-based derivation of priors that generalise maximum entropy priors. But being entirely independent from the model does not sound to me like such a desirable feature… And Natalia Bochkina’s Bernstein-von Mises theorem for a location-scale semi-parametric model, including a clever construct of a mixture of two Dirichlet priors to achieve proper convergence.)

Jim Berger started the day with a talk on imprecise probabilities, involving the Society for Imprecise Probability, which I discovered while reading Keynes’ book. The talk included a neat resolution of the Jeffreys-Lindley paradox by re-expressing the null as an imprecise null: the posterior probability of the null no longer converges to one, its limit depending on the prior modelling, if involving a prior on the bias as well. Chris discussed the talk and mentioned a recent work with Edwin Fong reinterpreting the marginal likelihood as exhaustive cross-validation, summing over all possible subsets of the data [using log marginal predictive].

Håvard Rue gave a follow-up to his Valencia O’Bayes 2015 talk on PC-priors. With a pretty hilarious introduction on his difficulties with constructing priors and counseling students about their Bayesian modelling. With a list of principles and desiderata to define a reference prior. However, I somewhat disagree with his argument that the Kullback-Leibler distance from the simpler (base) model cannot be scaled, as it is essentially a log-likelihood. And it feels like multivariate parameters need some sort of separability to define distance(s) to the base model, since the distance somewhat summarises the whole departure from the simpler model. (Håvard also joined my achievement of putting an ostrich in a slide!) In his discussion, Robin Ryder made a very pragmatic recap on the difficulties with constructing priors. And pointed out a natural link with ABC (which brings us back to Don Rubin’s motivation for introducing the algorithm as a formal thought experiment).
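Going back to the discussion of Jim Berger’s talk, the identity underlying the cross-validation reading of the marginal likelihood is the chain rule: log p(x_1, ..., x_n) is the sum of one-step-ahead log posterior predictive densities, whatever the ordering of the data. A minimal check in a conjugate normal model, where the model, the prior variance and the function names are assumptions, and where Fong and Holmes’ full leave-p-out result is not reproduced:

```python
import numpy as np


def log_marginal_direct(x, tau2):
    """log p(x) when x_i ~ N(theta, 1) and theta ~ N(0, tau2), i.e. x ~ N(0, I + tau2 * 11')."""
    n = x.size
    cov = np.eye(n) + tau2 * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    return float(-0.5 * (x @ np.linalg.solve(cov, x) + logdet + n * np.log(2 * np.pi)))


def log_marginal_sequential(x, tau2):
    """Same quantity as a sum of one-step-ahead log posterior predictive densities."""
    mean, var = 0.0, tau2                  # current distribution of theta, starting at the prior
    total = 0.0
    for xi in x:
        pred_var = 1.0 + var               # predictive of the next point: N(mean, 1 + var)
        total += -0.5 * (np.log(2 * np.pi * pred_var) + (xi - mean) ** 2 / pred_var)
        new_var = 1.0 / (1.0 / var + 1.0)  # conjugate update after observing xi
        mean = new_var * (mean / var + xi)
        var = new_var
    return float(total)


rng = np.random.default_rng(6)
x = rng.normal(0.5, 1.0, size=15)          # hypothetical data
print(log_marginal_direct(x, 2.0), log_marginal_sequential(x, 2.0),
      log_marginal_sequential(x[::-1], 2.0))
```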

Sara Wade gave the final talk of the day, about her work on Bayesian cluster analysis, whose discussion in Bayesian Analysis I alas missed. Cluster estimation, as mentioned frequently on this blog, is a rather frustrating challenge despite the simple formulation of the problem. (And I will not mention Larry’s tequila analogy!) The current approach is based on loss functions directly addressing the clustering aspect, integrating out the parameters. Which produces the interesting notion of neighbourhoods of partitions and hence credible balls in the space of partitions. It still remains unclear to me whether cluster estimation is at all achievable, since the partition space explodes with the sample size and hence makes the most probable partition more and more unlikely in that space. Somewhat paradoxically, the paper concludes that estimating the clustering produces a more reliable estimator of the number of clusters than looking at the marginal distribution of this number. In her discussion, Clara Grazian also pointed out the ambivalent use of clustering, where the intended meaning somehow diverges from the meaning induced by the mixture model.
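To put numbers on two points above, a minimal sketch in plain Python: the variation of information, one loss between partitions used in this line of work, computed from two label vectors (it is invariant to relabelling), and the Bell numbers counting the partitions of n items, which show how fast the partition space explodes. Function names are hypothetical and no posterior simulation is included.

```python
from collections import Counter
from math import log


def variation_of_information(a, b):
    """Variation of information H(A|B) + H(B|A) between two partitions given as label lists."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    vi = 0.0
    for (i, j), n_ij in Counter(zip(a, b)).items():
        p_ij = n_ij / n
        vi -= p_ij * (log(p_ij / (count_a[i] / n)) + log(p_ij / (count_b[j] / n)))
    return vi


def bell_numbers(n_max):
    """Bell numbers B_0, ..., B_{n_max} (number of partitions of n items) via the Bell triangle."""
    bells, row = [1], [1]
    for _ in range(n_max):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


print(variation_of_information([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabelled
print(bell_numbers(10))
```

For instance, ten items already allow more than 10^5 partitions, so that any single partition, even the most probable one, receives a small posterior probability for moderate sample sizes.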

## O’Bayes 2019 has now started!

Posted in pictures, Running, Statistics, Travel, University life with tags Coventry, Great-Britain, mathematics department, O'Bayes 2019, objective Bayes, subjective versus objective Bayes, summer of British conferences, University of Warwick, Warwickshire, Zeeman building on June 28, 2019 by xi'an

**T**he O’Bayes 2019 conference at Warwick University has now started, with about 100 participants meeting over four days (plus one of tutorials) in the Zeeman maths building of the University. Quite a change of location and weather when compared with the previous one in Austin. As an organiser, I hope all goes well at the practical level and want to thank the other persons who helped me towards this goal, first and foremost Paula Matthews, who solved web, lodging and planning issues all over these past months, as well as Mark Steel and Cristiano Villa. As a member of the scientific committee, I am looking forward to the talks and discussions over the coming four days, again hoping all speakers and discussants show up and are not hindered by travel or visa issues…