Archive for prior selection

O’Bayes 2019 conference program

Posted in Kids, pictures, Statistics, Travel, University life on May 13, 2019 by xi'an

The full and definitive program of the O’Bayes 2019 conference in Warwick is now online, including discussants for all papers and the three [and free] tutorials on Friday afternoon, 28 June, on model selection (M. Barbieri), recent advances in MCMC (G.O. Roberts), and BART (E.I. George). Registration remains open at the reduced rate, and poster submissions can still be sent to me by any conference participant.

leave Bayes factors where they once belonged

Posted in Statistics on February 19, 2019 by xi'an

In the past weeks I have received and read several papers (and X validated entries) where the Bayes factor is used to compare priors. This does not look right to me, not on the basis of my general dislike of Bayes factors!, but simply because it seems to clash with the (my?) concept of Bayesian model choice, and also because data should not play a role in that situation. The objections range from the data being used to select a prior, hence being used at least twice to run the inference, to resorting to a single parameter value (namely the one behind the data) to decide between two distributions, to having no asymptotic justification, to eventually favouring the prior concentrated on the maximum likelihood estimator. And more. But I fear that this reticence to test for prior adequacy also extends to the prior predictive, or Box’s p-value, namely the probability under this prior predictive to observe something “more extreme” than the current observation, to quote from David Spiegelhalter.
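A minimal sketch of the last objection, under assumptions that are not in the post: a normal sample with known standard deviation `sigma`, summarised by its mean `xbar`, and conjugate priors θ ~ N(m, τ²). In this toy setting the Bayes factor between two priors reduces to the ratio of the prior predictive densities of `xbar`, and it keeps growing as the prior is centred at the maximum likelihood estimator and shrunk towards a point mass. The function names and the numbers are purely illustrative, and the two-sided tail used for Box's p-value is one possible reading of "more extreme".

```python
import math

def log_marginal_xbar(xbar, n, sigma, m, tau):
    """Log prior predictive density of the sample mean when theta ~ N(m, tau^2).

    Marginally, xbar ~ N(m, tau^2 + sigma^2 / n).
    """
    var = tau ** 2 + sigma ** 2 / n
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (xbar - m) ** 2 / var

def log_bayes_factor(xbar, n, sigma, prior_a, prior_b):
    """Log Bayes factor of prior_a against prior_b, each given as (m, tau)."""
    (m_a, tau_a), (m_b, tau_b) = prior_a, prior_b
    return (log_marginal_xbar(xbar, n, sigma, m_a, tau_a)
            - log_marginal_xbar(xbar, n, sigma, m_b, tau_b))

def box_p_value(xbar, n, sigma, m, tau):
    """Two-sided prior predictive tail probability P(|Xbar - m| >= |xbar - m|)."""
    sd = math.sqrt(tau ** 2 + sigma ** 2 / n)
    z = abs(xbar - m) / sd
    return math.erfc(z / math.sqrt(2))

# Hypothetical data summary and reference prior (assumed values).
xbar, n, sigma = 1.3, 20, 1.0
reference = (0.0, 1.0)

# Priors centred at the MLE (here xbar) with shrinking scale tau.
taus = (1.0, 0.1, 0.01)
log_bfs = [log_bayes_factor(xbar, n, sigma, (xbar, tau), reference) for tau in taus]

for tau, lbf in zip(taus, log_bfs):
    print(f"tau = {tau}: log Bayes factor against the reference prior = {lbf:.3f}")
print("Box p-value under the reference prior:", box_p_value(xbar, n, sigma, *reference))
```

The log Bayes factor increases monotonically as `tau` decreases, so a selection of priors by Bayes factor ends up rewarding the prior that has already looked at the data.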

JSM 2018 [#4½]

Posted in Statistics, University life on August 10, 2018 by xi'an

As I wrote my previous blog entry on JSM2018 before the sessions, I did not have the chance to comment on our mixture session, which I found most interesting!, with new entries on the topic and a great discussion by Bettina Grün. Including the important call for linking weights with the other parameters, as both groups being independent does not make sense when the number of components is uncertain. (Incidentally our paper with Kaniav Kamary and Kate Lee does create a dependence.) The talk by Deborah Kunkel was about anchored mixture estimation, a joint work with Mario Peruggia, another arXival that I had missed.

The notion of anchoring found in this paper is to allocate specific observations to specific components. These observations are thus anchored to these components. Among other things, this modification of the sampling model implies a removal of the unidentifiability problem. Hence, formally, of the label-switching (or lack thereof) issue. (Although, as Peter Green repeatedly mentioned, visualising the parameter space as a point process eliminates the issue.) This idea is somewhat connected with the constraint Jean Diebolt and I imposed in our 1990 mixture paper, namely that no component would have fewer than two observations allocated to it, but imposing which ones are which of course drastically reduces the complexity of the model. Another (related) aspect of anchoring is that the observations that are anchored to the components act as parts of the prior model, modifying the initial priors (which can then become improper as in our 1990 paper). The difficulty of the anchoring approach is to find observations to anchor in an unsupervised setting. The paper proceeds by optimising the allocations, which somewhat turns the prior into a data-dependent prior since all observations are used to set the anchors and then used again for the standard Bayesian processing. In that respect, I would rather follow the sequential procedure developed by Nicolas Chopin and Florian Pelgrin, where the number of components grows by steps with the number of observations.
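As a rough illustration of what anchoring means in practice, here is a sketch of a single allocation step of a Gibbs sampler for a Gaussian mixture in which some labels are held fixed. This is not the procedure of the Kunkel and Peruggia paper (in particular it does not optimise the choice of anchors); the common known `sigma`, the function name `sample_allocations`, and the `anchors` dictionary are all assumptions made for the example.

```python
import math
import random

def sample_allocations(data, weights, means, sigma, anchors, rng):
    """Draw one allocation vector given the current mixture parameters.

    anchors maps an observation index to the component it is anchored to;
    those labels are never resampled. All other labels are drawn from their
    conditional distribution, proportional to weight times normal density.
    """
    allocations = []
    for i, x in enumerate(data):
        if i in anchors:
            allocations.append(anchors[i])  # anchored: label is fixed
            continue
        log_p = [math.log(w) - 0.5 * ((x - mu) / sigma) ** 2
                 for w, mu in zip(weights, means)]
        top = max(log_p)  # subtract the maximum for numerical stability
        probs = [math.exp(lp - top) for lp in log_p]
        allocations.append(rng.choices(range(len(means)), weights=probs)[0])
    return allocations

# Hypothetical data and parameter values (assumed for the illustration).
rng = random.Random(0)
data = [-2.1, -1.8, -2.4, 0.1, 1.9, 2.2, 2.5]
anchors = {0: 0, 5: 1}  # observation 0 tied to component 0, observation 5 to component 1
z = sample_allocations(data, [0.5, 0.5], [-2.0, 2.0], 1.0, anchors, rng)
print(z)
```

Because the anchored labels never move, permuting the component indices no longer leaves the model unchanged, which is how the label-switching symmetry gets broken in this sketch.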


JSM 2018 [#4]

Posted in Mountains, Statistics, Travel, University life on August 3, 2018 by xi'an

A last ½ day of sessions at JSM2018 in an almost deserted conference centre, with a first session put together by Mario Peruggia and a second on Advances in Bayesian Nonparametric Modeling and Computation for Complex Data. Here are the slides of my talk this morning in the Bayesian mixture estimation session.

which I updated last night (Slideshare most absurdly does not let you update versions!)

Since I missed the COPSS Award ceremony for a barbecue with friends on Locarno Beach, I only discovered this morning that the winner this year is Richard Samworth, from Cambridge University, who eminently deserves this recognition, if only because of his contributions to journal editing, as I can attest from my years with JRSS B. Congrats to him as well as to Bin Yu and Susan Murphy for their E.L. Scott and R.A. Fisher Awards! I also found out from an email to JSM participants that the next edition is in Denver, Colorado, which I visited only once, in 1993, on a trip to Fort Collins to visit Kerrie Mengersen and Richard Tweedie. Given the proximity to the Rockies, I am thinking of submitting an invited session on ABC issues, which were not particularly well covered by this edition of JSM. (Feel free to contact me if you are interested in joining the session.)

objectivity in prior distributions for the multinomial model

Posted in Statistics, University life on March 17, 2016 by xi'an

Today, Danilo Alvares, visiting from the Universitat de València, gave a talk at CREST about choosing a prior for the multinomial distribution, comparing different Dirichlet priors. In a sense this is a hopeless task, first because there is no reason to pick a particular prior unless one picks a very specific and a-Bayesian criterion to discriminate between priors, second because the multinomial is a weird distribution, hardly a distribution at all in that it results from grouping observations into classes, often based on the observations themselves. A construction that should be included within the choice of the prior maybe? But there lurks a danger of ending up with a data-dependent prior. My other remark about this problem is that, among the token priors, Perks’ prior using 1/k as its hyper-parameter [where k is the number of categories] is rather difficult to justify compared with 1/k² or 1/k³, except for aggregation consistency to some extent. And Laplace’s prior gets highly concentrated as the number of categories grows.
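The last two remarks can be checked numerically with a small sketch, under assumptions of mine rather than of the talk: symmetric Dirichlet(α, …, α) priors on k categories, with α = 1/k (Perks), α = 1/2 (Jeffreys), and α = 1 (Laplace). Concentration is measured here by the expected squared distance between the probability vector and the uniform vector, which equals (k − 1) / (k (kα + 1)) by the standard Dirichlet variance formula. Aggregation is illustrated by merging categories in pairs, which turns Dirichlet(α, …, α) on k categories into Dirichlet(2α, …, 2α) on k/2 categories. The function names are hypothetical.

```python
def prior_weight(name, k):
    """Common hyper-parameter alpha of a symmetric Dirichlet prior on k categories."""
    if name == "perks":
        return 1.0 / k
    if name == "jeffreys":
        return 0.5
    if name == "laplace":
        return 1.0
    raise ValueError(f"unknown prior: {name}")

def expected_sq_distance_from_uniform(alpha, k):
    """E||theta - (1/k, ..., 1/k)||^2 under Dirichlet(alpha, ..., alpha).

    Each coordinate has variance (k - 1) / (k^2 (k alpha + 1)); summing over
    the k coordinates gives (k - 1) / (k (k alpha + 1)).
    """
    return (k - 1) / (k * (k * alpha + 1))

def posterior_mean(counts, alpha):
    """Posterior mean of the cell probabilities given multinomial counts."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]

for k in (2, 10, 100, 1000):
    spread = {name: expected_sq_distance_from_uniform(prior_weight(name, k), k)
              for name in ("perks", "jeffreys", "laplace")}
    print(k, {name: round(value, 4) for name, value in spread.items()})

# Aggregation check: merging pairs of categories doubles the hyper-parameter.
k = 10
print("Perks merged:", 2 * prior_weight("perks", k), "vs Perks on k/2:", prior_weight("perks", k // 2))
print("1/k^2 merged:", 2 / k ** 2, "vs 1/(k/2)^2:", 1 / (k // 2) ** 2)
```

Under Laplace's prior the expected squared distance to the uniform vector vanishes at rate 1/k, while under Perks' prior it tends to 1/2. The pairwise merge maps Perks' prior on k categories onto Perks' prior on k/2 categories, which the 1/k² choice does not reproduce.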