Archive for default prior

off to Vancouver

Posted in Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , on July 29, 2018 by xi'an

I am off today to Vancouver for JSM2018, eight years after I visited the West Coast for another JSM! And a contender for the Summer of British Conferences, since it is in British Columbia.

And again looking forward the city, (some of) the meeting, and getting together with long-time-no-see friends. Followed by a fortnight of vacations on Vancouver Island where ‘Og posting may get sparse…

I hope I can take advantage of the ten hours in the plane from Paris to write my talk from scratch about priors for mixtures of distributions. Based on our papers with Clara Grazian and with Kaniav Kamary and Kate Lee. Still having some leeway since my talk is on Thursday morning, on the last day of the meeting…

likelihood-free Bayesian inference on the minimum clinically important difference

Posted in Books, Statistics, University life with tags , , , , , on January 20, 2015 by xi'an

Last week, Likelihood-free Bayesian inference on the minimum clinically important difference was arXived by Nick Syring and Ryan Martin and I read it over the weekend, slowly coming to the realisation that their [meaning of] “likelihood free” was not my [meaning of] “likelihood free”, namely that it has nothing to do with ABC! The idea therein is to create a likelihood out of a loss function, in the spirit of Bassiri, Holmes and Walker, the loss being inspired here by a clinical trial concept, the minimum clinically important difference, defined as

\theta^* = \min_\theta\mathbb{P}(Y\ne\text{sign}(X-\theta))

which defines a loss function per se when considering the empirical version. In clinical trials, Y is a binary outcome and X a vector of explanatory variables. This model-free concept avoids setting a joint distribution  on the pair (X,Y), since creating a distribution on a large vector of covariates is always an issue. As a marginalia, the authors actually mention our MCMC book in connection with a logistic regression (Example 7.11) and for a while I thought we had mentioned MCID therein, realising later it was a standard description of MCMC for logistic models.

The central and interesting part of the paper is obviously defining the likelihood-free posterior as

\pi_n(\theta) \propto \exp\{-n L_n(\theta) \}\pi(\theta)

The authors manage to obtain the rate necessary for the estimation to be asymptotically consistent, which seems [to me] to mean that a better representation of the likelihood-free posterior should be

\pi_n(\theta) \propto \exp\{-n^{-2/5} L_n(\theta) \}\pi(\theta)

(even though this rescaling does not appear verbatim in the paper). This is quite an interesting application of the concept developed by Bissiri, Holmes and Walker, even though it also illustrates the difficulty of defining a specific prior, given that the minimised target above can be transformed by an arbitrary increasing function. And the mathematical difficulty in finding a rate.

penalising model component complexity

Posted in Books, Mountains, pictures, Statistics, University life with tags , , , , , , , , , , on April 1, 2014 by xi'an

“Prior selection is the fundamental issue in Bayesian statistics. Priors are the Bayesian’s greatest tool, but they are also the greatest point for criticism: the arbitrariness of prior selection procedures and the lack of realistic sensitivity analysis (…) are a serious argument against current Bayesian practice.” (p.23)

A paper that I first read and annotated in the very early hours of the morning in Banff, when temperatures were down in the mid minus 20’s now appeared on arXiv, “Penalising model component complexity: A principled, practical approach to constructing priors” by Thiago Martins, Dan Simpson, Andrea Riebler, Håvard Rue, and Sigrunn Sørbye. It is a highly timely and pertinent paper on the selection of default priors! Which shows that the field of “objective” Bayes is still full of open problems and significant advances and makes a great argument for the future president [that I am] of the O’Bayes section of ISBA to encourage young Bayesian researchers to consider this branch of the field.

“On the other end of the hunt for the holy grail, “objective” priors are data-dependent and are not uniformly accepted among Bayesians on philosophical grounds.” (p.2)

Apart from the above quote, as objective priors are not data-dependent! (this is presumably a typo, used instead of model-dependent), I like very much the introduction (appreciating the reference to the very recent Kamary (2014) that just got rejected by TAS for quoting my blog post way too much… and that we jointly resubmitted to Statistics and Computing). Maybe missing the alternative solution of going hierarchical as far as needed and ending up with default priors [at the top of the ladder]. And not discussing the difficulty in specifying the sensitivity of weakly informative priors.

“Most model components can be naturally regarded as a flexible version of a base model.” (p.3)

The starting point for the modelling is the base model. How easy is it to define this base model? Does it [always?] translate into a null hypothesis formulation? Is there an automated derivation? I assume this somewhat follows from the “block” idea that I do like but how generic is model construction by blocks?

      germany-relative-risk

“Occam’s razor is the principle of parsimony, for which simpler model formulations should be preferred until there is enough support for a more complex model.” (p.4)

I also like this idea of putting a prior on the distance from the base! Even more because it is parameterisation invariant (at least at the hyperparameter level). (This vaguely reminded me of a paper we wrote with George a while ago replacing tests with distance evaluations.) And because it gives a definitive meaning to Occam’s razor. However, unless the hyperparameter ξ is one-dimensional this does not define a prior on ξ per se. I equally like Eqn (2) as it shows how the base constraint takes one away from Jeffrey’s prior. Plus, if one takes the Kullback as an intrinsic loss function, this also sounds related to Holmes’s and Walker’s substitute loss pseudopriors, no? Now, eqn (2) does not sound right in the general case. Unless one implicitly takes a uniform prior on the Kullback sphere of radius d? There is a feeling of one-d-ness in the description of the paper (at least till page 6) and I wanted to see how it extends to models with many (≥2) hyperparameters. Until I reached Section 6 where the authors state exactly that! There is also a potential difficulty in that d(ξ) cannot be computed in a general setting. (Assuming that d(ξ) has a non-vanishing Jacobian as on page 19 sounds rather unrealistic.) Still about Section 6, handling reference priors on correlation matrices is a major endeavour, which should produce a steady flow of followers..!

“The current practice of prior specification is, to be honest, not in a good shape. While there has been a strong growth of Bayesian analysis in science, the research field of “practical prior specification” has been left behind.” (*p.23)

There are still quantities to specify and calibrate in the PC priors, which may actually be deemed a good thing by Bayesians (and some modellers). But overall I think this paper and its message constitute a terrific step for Bayesian statistics and I hope the paper can make it to a major journal.