Archive for improper prior

Thesis defense in València

Posted in Statistics, Travel, University life, Wines on February 25, 2011 by xi'an

On Monday, I took part in the jury of the PhD thesis of Anabel Forte Deltel, in the department of statistics of the Universitat de València. The topic of the thesis was variable selection in Gaussian linear models using an objective Bayes approach, completely on my own research agenda! I had already talked with Anabel in Zürich, where she presented a poster and gave me a copy of her thesis, so I could concentrate on the fundamentals of her approach during the defense. Her approach extends the hyper-g prior of Liang et al. (2008, JASA) through a complete analysis of the conditions set by Jeffreys in his book for constructing such priors. She is therefore able to motivate a precise value for most hyperparameters (although some choices were mainly made for computational reasons, opposing the 2F1 and Appell's F1 hypergeometric functions). She also defends the use of an improper prior via an invariance argument that leads to the standard Jeffreys prior on location-scale parameters. (This is where I prefer the approach in Bayesian Core, which does not discriminate between a subset of the covariates including the intercept and the other covariates, even though it is not invariant under location-scale transforms.) After the defense, Jim Berger pointed out to me that the modelling allowed for the subset to be empty, which then cancels my above objection! In conclusion, this thesis could well set a reference prior (if not in José Bernardo's sense of the term!) for Bayesian linear model analysis in the coming years.
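For reference, the hyper-g prior of Liang et al. completes Zellner's g-prior on the regression coefficients with the (proper) hyperprior

\pi(g) = \frac{a-2}{2}\,(1+g)^{-a/2}, \qquad g>0,\ a>2,

whose posterior quantities involve the Gaussian hypergeometric function 2F1, hence the computational trade-off with Appell's F1 mentioned above.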


Bayes vs. SAS

Posted in Books, R, Statistics on May 7, 2010 by xi'an

Glancing perchance at the back of my Amstat News, I was intrigued by the SAS advertisement

Bayesian Methods

  • Specify Bayesian analysis for ANOVA, logistic regression, Poisson regression, accelerated failure time models and Cox regression through the GENMOD, LIFEREG and PHREG procedures.
  • Analyze a wider variety of models with the MCMC procedure, a general purpose Bayesian analysis procedure.

and so decided to take a look at those items on the SAS website. (Some entries date back to 2006 so I am not claiming novelty in this post, just my reading through the manual!)

Even though I have not looked at a SAS program since 1984, when I was learning principal component and discriminant analysis by programming SAS procedures on punched cards, the MCMC part seems rather manageable (if you can manage SAS at all!), looking very much like a second BUGS to my bystander eyes, even to the point of including ARS algorithms! The models are defined in a BUGS manner, with priors on the side (and this includes improper priors, despite a confusing first example that mixes very large variances with vague priors for the linear model!). The basic scheme is a random walk proposal with an adaptive scale or covariance matrix. (The adaptation of the covariance matrix is slightly confusing in that, as described, it does not seem to implement the requirements of Roberts and Rosenthal for sure convergence.) Gibbs sampling is not directly covered, although some examples are in essence using Gibbs samplers. Convergence is assessed via ca. 1995 methods à la Cowles and Carlin, including the rather unreliable Raftery and Lewis indicator, but so does Introducing Monte Carlo Methods with R, which takes advantage of the R coda package. I have not tested (!) any of the features in the MCMC procedure but, judging from a quick skim through the 283-page manual, everything looks reasonable enough. I wonder if anyone has ever tested a SAS program against its BUGS counterpart for an efficiency comparison.
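To give a concrete (and hedged) idea of what such an adaptive random walk scheme looks like, here is a minimal sketch in R, not SAS: a one-dimensional Metropolis algorithm whose Gaussian proposal scale is tuned during burn-in towards the usual 0.44 acceptance rate and then frozen, so the post-burn-in chain is a genuine Markov chain. The target density, the acceptance target, and the tuning rule are my own illustrative choices, not the SAS defaults.

```r
# Minimal adaptive random-walk Metropolis sketch (illustration only):
# the target (a standard normal), the 0.44 acceptance target and the
# tuning constants are arbitrary choices, not SAS defaults.
logtarget <- function(x) dnorm(x, log = TRUE)

adaptRW <- function(niter = 1e4, burnin = 5e3, scale = 1) {
  x <- numeric(niter)
  acc <- 0
  for (t in 2:niter) {
    prop <- x[t - 1] + scale * rnorm(1)
    if (log(runif(1)) < logtarget(prop) - logtarget(x[t - 1])) {
      x[t] <- prop
      acc <- acc + 1
    } else x[t] <- x[t - 1]
    # diminishing adaptation, and only during burn-in, so that the
    # post-burn-in chain is an honest Markov chain
    if (t <= burnin) scale <- scale * exp((acc / t - 0.44) / sqrt(t))
  }
  list(chain = x[-(1:burnin)], scale = scale, rate = acc / niter)
}

out <- adaptRW()
# diagnostics à la coda, e.g. library(coda); geweke.diag(mcmc(out$chain))
```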

The Bayesian aspects are rather traditional as well, except for the testing issue. Indeed, from what I have read, SAS does not engage in testing and remains within estimation bounds, offering only HPD regions for variable selection without producing a genuine Bayesian model choice tool. I understand the issues with handling improper priors versus computing Bayes factors, as well as some delicate computational requirements, but this is a truly important chunk missing from the package. (Of course, the package contains a DIC (deviance information criterion) capability, which may be seen as a substitute, but I have reservations about the relevance of DIC outside generalised linear models. Same difficulty with the posterior predictive.) As usual with SAS, the documentation is huge (I still remember the shelves upon shelves of documentation volumes in my 1984 card-punching room!) and full of options and examples. Nothing to complain about there. Except maybe the list of disadvantages of using Bayesian analysis:

  • It does not tell you how to select a prior. There is no correct way to choose a prior. Bayesian inferences require skills to translate prior beliefs into a mathematically formulated prior. If you do not proceed with caution, you can generate misleading results.
  • It can produce posterior distributions that are heavily influenced by the priors. From a practical point of view, it might sometimes be difficult to convince subject matter experts who do not agree with the validity of the chosen prior.
  • It often comes with a high computational cost, especially in models with a large number of parameters.

which does not say much… Since the MCMC procedure allows for any degree of hierarchical modelling, it is always possible to check the impact of a given prior by letting its parameters go random, as sketched below. I have found that most practitioners are happy with the formalisation of their prior beliefs into mathematical densities, rather than adamant about a specific prior. As for computation, this is not a major issue.
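As a toy version of this prior-robustness check, in R rather than SAS and with arbitrary numbers throughout: compare the posterior mean of a normal mean under a fixed N(0, \tau^2) prior with the one obtained when \tau^2 itself goes random with an inverse-gamma hyperprior, via a two-step Gibbs sampler.

```r
# Toy prior-sensitivity check: data x_i ~ N(mu, 1), prior mu ~ N(0, tau2).
# All numbers below (data, hyperparameters a and b) are arbitrary choices.
set.seed(1)
x <- rnorm(20, mean = 2)
n <- length(x); xbar <- mean(x)

# fixed prior N(0, tau2 = 1): conjugate posterior in closed form
tau2 <- 1
postmean <- n * xbar / (n + 1 / tau2)

# hierarchical alternative: tau2 ~ IG(a, b), two-step Gibbs sampler
a <- b <- 1
niter <- 1e4
mu <- numeric(niter); mu[1] <- xbar
for (t in 2:niter) {
  tau2 <- 1 / rgamma(1, a + 0.5, b + mu[t - 1]^2 / 2)  # tau2 given mu
  v <- 1 / (n + 1 / tau2)
  mu[t] <- rnorm(1, v * n * xbar, sqrt(v))             # mu given tau2, x
}
c(fixed = postmean, hierarchical = mean(mu[-(1:1000)]))
```

If the two posterior means (or, better, the two posterior distributions) differ markedly, the fixed prior is driving the inference; if not, the practitioner's formalised beliefs are robust enough for the purpose at hand.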

Marginalisation paradoxes

Posted in Statistics on June 7, 2009 by xi'an

There was a poster by Timothy Wallstrom last night at the O-Bayes09 poster session about marginalisation paradoxes and we had a nice chat about this topic. Marginalisation paradoxes are fascinating and I always mention them in my Bayesian class, because I think they illustrate the limitations of how far one can interpret an improper prior. There is a substantial literature on how to "solve" marginalisation paradoxes, following Jaynes' comments on the foundational paper of Dawid, Stone and Zidek (Journal of the Royal Statistical Society, 1974), but (and this is where I disagree with Timothy) I do not think they need to be "solved", either by uncovering the group action on the problem (left Haar versus right Haar) or by using different proper prior sequences. For me, the core of the "paradox" is that writing an improper prior as

\pi(\theta,\zeta) = \pi_1(\theta) \pi_2(\zeta)

does not imply that \pi_2 is the marginal prior on \zeta when \pi_1 is improper. The interpretation of \pi_2 as such is what leads to the "paradox", but there is no mathematical difficulty in the issue (as the schematic below tries to convey). Starting with the joint improper prior \pi(\theta,\zeta) leads to an undefined posterior if we only consider the part of the observations that depends on \zeta, because \theta does not integrate out. Defining improper priors as limits of proper priors, as Jaynes and Timothy Wallstrom do, can also be attempted from a mathematical point of view, but (a) I do not think a global resolution is possible this way, in that not all Bayesian procedures associated with the improper prior can be constructed as limits of the corresponding procedures for the proper prior sequence (think, e.g., about testing), and (b) it amounts to giving a probabilistic meaning to improper priors and thus gets back to the over-interpretation danger mentioned above. Hence a very nice poster discussion!
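Schematically (my own rendering, not Wallstrom's): suppose the formal posterior of \zeta under the joint prior depends on the data x only through a statistic z whose sampling distribution depends on \zeta alone. The full computation returns

\pi(\zeta \mid x) \propto \pi_2(\zeta) \int f(x \mid \theta, \zeta)\, \pi_1(\theta)\, \text{d}\theta,

while a Bayesian observing z alone and reading \pi_2 as "the" prior on \zeta would report

\pi(\zeta \mid z) \propto \pi_2(\zeta)\, f(z \mid \zeta).

With a proper \pi_1 the two computations are coherent and agree; with an improper \pi_1 the left-hand side, even though it only depends on z, need not be expressible as \pi(\zeta)\, f(z \mid \zeta) for any prior \pi, whence the "paradox".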