Archive for objective Bayes

21w5107 [½day 4]

Posted in Statistics with tags , , , , , , , , , , , , , , on December 3, 2021 by xi'an

Final ½ day of the 21w5107 workshop for me, as our initial plans were to stop today due to the small number of participants on site. And I had booked plane tickets early, too early. I will thus sadly miss the four afternoon talks, mea culpa! However I did attend Noiritt Chandra’s talk on Bayesian factor analysis. Which has always been a bit of a mystery to me in the sense that the number q of factors need be specified, which is a prior input one rarely controls. Here the goal is to estimate a covariance matrix with a sparse representation. And q is estimated by empirical likelihood ahead of the estimation of the matrix. The focus was on minimaxity and MCMC implementation rather than objective Bayes per se! Then, Daniele Durante spoke about analytical posteriors for probit models using unified skew-Normal priors (following a 2019 Biometrika paper). Including marginal posteriors and marginal likelihood. And for various extensions like dynamic probit models. Opening other computational issues such as simulating high dimensional truncated Normal distributions. (Potential use of delayed acceptance there?) This second talk was also drifting away from objective Bayes! In the first half of his talk, Filippo Ascolani introduced us to trees of random probability measures, each mother node being the distribution of the atoms of the children nodes. (Interestingly, Kingman is both connected to (coalescent) trees and to completely random measures.) My naïve first impression was that the distributions would get more and more degenerate as the number of levels in the tree would increase, however I am unsure this is correct as Filippo mentioned getting observations on all nodes. The talk also made me wonder at how this could be related Radford Neal’s Dirichlet trees. (Which I discovered at my first ICMS workshop about 20 years ago.) Yang Ni concluded the morning with a talk on causality that provided (to me) a very smooth (re)introduction to Bayesian causal graphs.

Even more than last time, I enormously enjoyed the workshop, its location, the fantastic staff at the hotel, and the reconnection with dear friends!, just regretting we could not be a few more. I appreciate the efforts made by on-line participants to stay connected and intervene (thanks, Ed!), but the quality of interactions is sadly of another magnitude when spending all our time together. Hopefully there will be a next time and hopefully we’ll then be back to larger size (and hopefully the location will remain the same). Hasta luego, Oaxaca!

21w5107 [½day 3]

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , on December 2, 2021 by xi'an

Day [or half-day] three started without firecrackers and with David Rossell (formerly Warwick) presenting an empirical Bayes approach to generalised linear model choice with a high degree of confounding, using approximate Laplace approximations. With considerable improvements in the experimental RMSE. Making feeling sorry there was no apparent fully (and objective?) Bayesian alternative! (Two more papers on my reading list that I should have read way earlier!) Then Veronika Rockova discussed her work on approximate Metropolis-Hastings by classification. (With only a slight overlap with her One World ABC seminar.) Making me once more think of Geyer’s n⁰564 technical report, namely the estimation of a marginal likelihood by a logistic discrimination representation. Her ABC resolution replaces the tolerance step by an exponential of minus the estimated Kullback-Leibler divergence between the data density and the density associated with the current value of the parameter. (I wonder if there is a residual multiplicative constant there… Presumably not. Great idea!) The classification step need be run at every iteration, which could be sped up by subsampling.

On the always fascinating theme of loss based posteriors, à la Bissiri et al., Jack Jewson (formerly Warwick) exposed his work generalised Bayesian and improper models (from Birmingham!). Using data to decide between model and loss, which sounds highly unorthodox! First difficulty is that losses are unscaled. Or even not integrable after an exponential transform. Hence the notion of improper models. As in the case of robust Tukey’s loss, which is bounded by an arbitrary κ. Immediately I wonder if the fact that the pseudo-likelihood does not integrate is important beyond the (obvious) absence of a normalising constant. And the fact that this is not a generative model. And the answer came a few slides later with the use of the Hyvärinen score. Rather than the likelihood score. Which can itself be turned into a H-posterior, very cool indeed! Although I wonder at the feasibility of finding an [objective] prior on κ.

Rajesh Ranganath completed the morning session with a talk on [the difficulty of] connecting Bayesian models and complex prediction models. Using instead a game theoretic approach with Brier scores under censoring. While there was a connection with Veronika’s use of a discriminator as a likelihood approximation, I had trouble catching the overall message…

two ISBA meetings in 2022

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , on May 11, 2021 by xi'an

As in 2019, both the O’Bayes and BNP conferences will occur the same year, if not back-to-back as in 2019 (when they were in neighbouring Oxford and Warwick, respectively). To quote from the current Chair of the OBayes section of ISBA, my friend Gonzalo,

“the next International Workshop on Objective Bayes Methodology (O-Bayes, OBayes, O’Bayes, Ohhh Bayes,..) is scheduled for September 2022 from 7th (Wed) to 10th (Sat) and will be hosted by University of California, Santa Cruz. This will be the 14th meeting of one of the longest-running and preeminent meetings in Bayesian statistics (the 1st was in USA 1996; the last one in 2019 in UK). In this conference, we will be celebrating the 70th birthday of Luis Pericchi an extraordinary person who has been very influential in the successful development of OBayesian ideas.”

thus seeing the O’Bayes meeting taking place in North America in early Fall, followed by BNP 13 in South America a month later (and thus Spring!), quoting from Alessandra Guglielmi:

“Due to the ongoing COVID-19 pandemic, the 13th World Meeting on Bayesian Nonparametrics (BNP Workshop) is postponed until 2022. The meeting is currently scheduled to take place in Puerto Varas, Chile, October 24-28, 2022.”

El lago Llanquihue, con el volcán Osorno al fondo Yuri de Mesquita Bar / Getty Images/iStockphoto

 

a case for Bayesian deep learnin

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , on September 30, 2020 by xi'an

Andrew Wilson wrote a piece about Bayesian deep learning last winter. Which I just read. It starts with the (posterior) predictive distribution being the core of Bayesian model evaluation or of model (epistemic) uncertainty.

“On the other hand, a flat prior may have a major effect on marginalization.”

Interesting sentence, as, from my viewpoint, using a flat prior is a no-no when running model evaluation since the marginal likelihood (or evidence) is no longer a probability density. (Check Lindley-Jeffreys’ paradox in this tribune.) The author then goes for an argument in favour of a Bayesian approach to deep neural networks for the reason that data cannot be informative on every parameter in the network, which should then be integrated out wrt a prior. He also draws a parallel between deep ensemble learning, where random initialisations produce different fits, with posterior distributions, although the equivalent to the prior distribution in an optimisation exercise is somewhat vague.

“…we do not need samples from a posterior, or even a faithful approximation to the posterior. We need to evaluate the posterior in places that will make the greatest contributions to the [posterior predictive].”

The paper also contains an interesting point distinguishing between priors over parameters and priors over functions, ony the later mattering for prediction. Which must be structured enough to compensate for the lack of data information about most aspects of the functions. The paper further discusses uninformative priors (over the parameters) in the O’Bayes sense as a default way to select priors. It is however unclear to me how this discussion accounts for the problems met in high dimensions by standard uninformative solutions. More aggressively penalising priors may be needed, as those found in high dimension variable selection. As in e.g. the 10⁷ dimensional space mentioned in the paper. Interesting read all in all!

O’Bayes 19/1 [snapshots]

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , , , , , , on June 30, 2019 by xi'an

Although the tutorials of O’Bayes 2019 of yesterday were poorly attended, albeit them being great entries into objective Bayesian model choice, recent advances in MCMC methodology, and the multiple layers of BART, for which I have to blame myself for sticking the beginning of O’Bayes too closely to the end of BNP as only the most dedicated could achieve the commuting from Oxford to Coventry to reach Warwick in time, the first day of talks were well attended, despite weekend commitments, conference fatigue, and perfect summer weather! Here are some snapshots from my bench (and apologies for not covering better the more theoretical talks I had trouble to follow, due to an early and intense morning swimming lesson! Like Steve Walker’s utility based derivation of priors that generalise maximum entropy priors. But being entirely independent from the model does not sound to me like such a desirable feature… And Natalia Bochkina’s Bernstein-von Mises theorem for a location scale semi-parametric model, including a clever construct of a mixture of two Dirichlet priors to achieve proper convergence.)

Jim Berger started the day with a talk on imprecise probabilities, involving the society for imprecise probability, which I discovered while reading Keynes’ book, with a neat resolution of the Jeffreys-Lindley paradox, when re-expressing the null as an imprecise null, with the posterior of the null no longer converging to one, with a limit depending on the prior modelling, if involving a prior on the bias as well, with Chris discussing the talk and mentioning a recent work with Edwin Fong on reinterpreting marginal likelihood as exhaustive X validation, summing over all possible subsets of the data [using log marginal predictive].Håvard Rue did a follow-up talk from his Valencià O’Bayes 2015 talk on PC-priors. With a pretty hilarious introduction on his difficulties with constructing priors and counseling students about their Bayesian modelling. With a list of principles and desiderata to define a reference prior. However, I somewhat disagree with his argument that the Kullback-Leibler distance from the simpler (base) model cannot be scaled, as it is essentially a log-likelihood. And it feels like multivariate parameters need some sort of separability to define distance(s) to the base model since the distance somewhat summarises the whole departure from the simpler model. (Håvard also joined my achievement of putting an ostrich in a slide!) In his discussion, Robin Ryder made a very pragmatic recap on the difficulties with constructing priors. And pointing out a natural link with ABC (which brings us back to Don Rubin’s motivation for introducing the algorithm as a formal thought experiment).

Sara Wade gave the final talk on the day about her work on Bayesian cluster analysis. Which discussion in Bayesian Analysis I alas missed. Cluster estimation, as mentioned frequently on this blog, is a rather frustrating challenge despite the simple formulation of the problem. (And I will not mention Larry’s tequila analogy!) The current approach is based on loss functions directly addressing the clustering aspect, integrating out the parameters. Which produces the interesting notion of neighbourhoods of partitions and hence credible balls in the space of partitions. It still remains unclear to me that cluster estimation is at all achievable, since the partition space explodes with the sample size and hence makes the most probable cluster more and more unlikely in that space. Somewhat paradoxically, the paper concludes that estimating the cluster produces a more reliable estimator on the number of clusters than looking at the marginal distribution on this number. In her discussion, Clara Grazian also pointed the ambivalent use of clustering, where the intended meaning somehow diverges from the meaning induced by the mixture model.