Archive for pseudo-priors

are pseudo-priors required in Bayesian model selection?

Posted in Books, Kids, pictures, Statistics, University life on February 29, 2020 by xi'an

An interesting question from X validated about constructing pseudo-priors for Bayesian model selection. Namely, how useful are these beyond the implementation, at the level of the concept itself? The only case where I am aware of pseudo-priors being used is in Bayesian MCMC algorithms such as Carlin and Chib (1995), where these distributions complement the posterior distribution conditional on a single model (index) into a joint distribution across the parameters of all models. The trick of the construction is that the pseudo-priors can be essentially anything, including distributions depending on the data. And while they impact the ability of the resulting Markov chain to move between model spaces, they have no bearing on the resulting inference, either when choosing a model or when estimating the parameters of a chosen model. The concept of pseudo-priors was also central to the mis-interpretations found in Congdon (2006) and Scott (2002). Which we reanalysed with Jean-Michel Marin in Bayesian Analysis (2008) in terms of the distinction between model-based posteriors and joint pseudo-posteriors.
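To make the construction concrete, here is a minimal sketch of a Carlin-and-Chib-type Gibbs sampler on an entirely toy problem of my own (not taken from the papers): choosing between M₁, a zero-mean unit-variance Gaussian, and M₂, a free-mean Gaussian with a N(0,A) prior. The pseudo-prior for the mean when the chain sits in M₁ is taken as the data-dependent conditional posterior under M₂, the "ideal" choice for between-model moves, illustrating that pseudo-priors may depend on the data without biasing the inference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustration only): n unit-variance observations
n, A = 20, 4.0                      # A = prior variance of mu under M2
y = rng.normal(0.5, 1.0, size=n)
ybar = y.mean()

# Conjugate posterior of mu under M2: N(m, v)
v = 1.0 / (n + 1.0 / A)
m = v * n * ybar

def loglik(mu):
    return -0.5 * np.sum((y - mu) ** 2)   # up to a constant common to both models

def log_norm(x, mean, var):
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

# Pseudo-prior for mu while the chain sits in M1: here the (data-dependent)
# conditional posterior of mu under M2, the choice that maximises mixing
pseudo_mean, pseudo_var = m, v

T = 20_000
k, mu = 2, m
k_draws = np.empty(T, dtype=int)
for t in range(T):
    # 1. update mu given the model index
    if k == 2:
        mu = rng.normal(m, np.sqrt(v))                     # true conditional posterior
    else:
        mu = rng.normal(pseudo_mean, np.sqrt(pseudo_var))  # pseudo-prior draw
    # 2. update the model index given mu (equal prior model weights)
    lw2 = loglik(mu) + log_norm(mu, 0.0, A)                      # M2: mu is a real parameter
    lw1 = loglik(0.0) + log_norm(mu, pseudo_mean, pseudo_var)    # M1: mu linked in via pseudo-prior
    p2 = 1.0 / (1.0 + np.exp(lw1 - lw2))
    k = 2 if rng.uniform() < p2 else 1
    k_draws[t] = k

post_M2 = (k_draws == 2).mean()
# exact posterior probability of M2 in this conjugate toy problem, to check the chain
bf21 = np.exp(n**2 * ybar**2 * A / (2 * (1 + n * A))) / np.sqrt(1 + n * A)
print(post_M2, bf21 / (1 + bf21))
```

A worse pseudo-prior (say, a vague Gaussian) leaves the frequency of model 2 visits converging to the same value, only more slowly — which is the point of the post: pseudo-priors affect mixing, not inference.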

model selection and multiple testing

Posted in Books, pictures, Statistics, Travel, University life on October 23, 2015 by xi'an

Ritabrata Dutta, Malgorzata Bogdan and Jayanta Ghosh recently arXived a survey paper on model selection and multiple testing. Which provides a good opportunity to reflect upon traditional Bayesian approaches to model choice. And upon potential alternatives. On my way back from Madrid, where I got a bit distracted when flying over the South-West French coast, from Biarritz to Bordeaux. Spotting the lake of Hourtin, where I spent my military training month, 29 years ago!

“On the basis of comparison of AIC and BIC, we suggest tentatively that model selection rules should be used for the purpose for which they were introduced. If they are used for other problems, a fresh justification is desirable. In one case, justification may take the form of a consistency theorem, in the other some sort of oracle inequality. Both may be hard to prove. Then one should have substantial numerical assessment over many different examples.”

The authors quickly replace the Bayes factor with BIC, because the latter is typically consistent. In the comparison between AIC and BIC they mention the conundrum of defining a prior on a nested model from the prior on the nesting model, a problem that has not been properly solved in my opinion. The above quote, with its call for a large simulation study, reminded me of the paper by Arnold & Loeppky about running such studies through ecdfs. That I did not see as solving the issue. The authors also discuss DIC and the Lasso, without making much of a connection between those, or with the above. And then reach the parametric empirical Bayes approach to model selection exemplified by Ed George's and Don Foster's 2000 paper. Which achieves asymptotic optimality for posterior prediction loss (p.9). And which unifies a wide range of model selection approaches.

A second part of the survey considers the large p setting, where BIC is not a good approximation to the Bayes factor (when testing whether or not all mean entries are zero). And recalls that there are priors ensuring consistency for the Bayes factor in this very [restrictive] case. Then, in Section 4, the authors move to what they call "cross-validatory Bayes factors", also known as partial Bayes factors and pseudo-Bayes factors, where the data is split to (a) make the improper prior proper and (b) run the comparison or test on the remaining data. They also show the surprising result that, provided the fraction of the data used to proper-ise the prior does not converge to one, the X validated Bayes factor remains consistent [for the special case above]. The last part of the paper concentrates on multiple testing but is more tentative, conjecturing about convergence results and centring on the differences between full Bayes and empirical Bayes approaches. Then the plane landed in Paris and I stopped my reading, not feeling differently about the topic than when the plane had left Madrid.
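The splitting device behind partial Bayes factors can be sketched on a toy conjugate Gaussian example of my own (not the authors' setting): a flat improper prior on the mean makes the Bayes factor of free mean versus zero mean undefined, so a training fraction of the data is first used to proper-ise the prior, and the factor is then computed on the remaining observations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting (illustration only): M1 fixes the mean at 0, M2 leaves it free
# under an improper flat prior, so the Bayes factor is undefined and a
# training fraction of the data is used to proper-ise the prior first.
y = rng.normal(1.0, 1.0, size=100)
n_train = 10                       # fraction kept aside to proper-ise the prior
y_train, y_test = y[:n_train], y[n_train:]

# flat prior + training data  =>  proper "prior" N(mean of y_train, 1/n_train)
m0, A0 = y_train.mean(), 1.0 / n_train

def loglik(data, mu):
    # Gaussian log-likelihood with unit variance, constants included
    return np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (data - mu) ** 2)

def log_norm(x, mean, var):
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

# Marginal likelihood of y_test under M2 via the conjugate identity
# p(y) = p(y|mu) p(mu) / p(mu|y), evaluated at mu = 0
n_test = y_test.size
v = 1.0 / (n_test + 1.0 / A0)
m_post = v * (n_test * y_test.mean() + m0 / A0)
log_m2 = loglik(y_test, 0.0) + log_norm(0.0, m0, A0) - log_norm(0.0, m_post, v)
log_m1 = loglik(y_test, 0.0)

partial_log_bf21 = log_m2 - log_m1
print(partial_log_bf21)            # positive here: the test data favour a free mean
```

Averaging this quantity over many random splits gives (the log of) an intrinsic-Bayes-factor-type answer; the consistency result mentioned above is about what happens when n_train / n does not tend to one.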

Carlin and Chib (1995) for fixed dimension problems

Posted in Books, Kids, Statistics, University life on February 25, 2014 by xi'an

chantier de désamiantage [asbestos removal worksite], Université Pierre et Marie Curie, Paris (c) Bouchon/Le Figaro

Yesterday, I was part of a (public) thesis committee at the Université Pierre et Marie Curie, in downtown Paris. After a bit of a search for the defence room (as the campus is still undergoing a massive asbestos clean-up, 20 years after it started…!), I listened to Florian Maire delivering his talk on an array of work in computational statistics, ranging from the theoretical (Peskun ordering) to the methodological (Monte Carlo online EM) to the applied (unsupervised learning of class shapes via deformable templates). The implementation of the online EM algorithm involved the use of pseudo-priors à la Carlin and Chib (1995), even though the setting was a fixed-dimension one, in order to fight the difficulty of exploring the space of templates with a regular Gibbs sampler. (As usual, the design of the pseudo-priors was crucial to the success of the method.) The thesis also included a recent work with Randal Douc and Jimmy Olsson on ranking inhomogeneous Markov kernels of the type

$P\circ Q\circ P\circ Q\circ\cdots$

against alternatives with components (P’,Q’). The authors were able to characterise minimal conditions for a Peskun-ordering domination on the components to transfer to the combination. Quite an interesting piece of work for a PhD thesis!

Posterior model probabilities computed from model-specific Gibbs output [arXiv:1012.0073]

Posted in Books, Statistics on December 9, 2010 by xi'an

“Expressing RJMCMC as simple Gibbs sampling provides the key innovation of our formulation: it allows us to fit models one at a time using ordinary MCMC and then compute model weights or Bayes factors by post-processing the Monte Carlo output.”

Richard Barker (from the University of Otago, Dunedin, New Zealand) and William Link posted this new paper on arXiv. A point in their abstract attracted my attention, namely that they produce a "representation [that] allows [them] to fit models one at a time using ordinary MCMC and then compute model weights or Bayes factors by post-processing the Monte Carlo output". This is quite interesting in that most attempts at building Bayes factors approximations from separate chains running each on a separate model have led to erroneous solutions. It appears however that the paper builds upon a technique fully exposed in the book written by the authors.
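The general idea of recovering model weights by post-processing single-model simulation output can be sketched with a different, classical device than Barker and Link's pseudo-prior representation, namely Chib's marginal-likelihood identity. The toy Python example below is my own conjugate Gaussian setup, not the paper's: each model is fitted on its own (conjugate draws standing in for an ordinary MCMC run), the marginal likelihood is estimated from that output, and the estimates are normalised into model weights.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy comparison (illustration only): M1 fixes the mean at 0, M2 gives it a
# N(0, A) prior; unit-variance Gaussian data throughout.
n, A = 30, 4.0
y = rng.normal(0.8, 1.0, size=n)

def loglik(mu):
    return np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (y - mu) ** 2)

def log_norm(x, mean, var):
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

# --- fit M2 on its own: conjugate draws stand in for an ordinary MCMC run
v = 1.0 / (n + 1.0 / A)
m = v * n * y.mean()
draws = rng.normal(m, np.sqrt(v), size=10_000)

# --- post-process via Chib's identity
#     log p(y) = log p(y|mu*) + log p(mu*) - log p(mu*|y),
# the posterior ordinate being estimated from the draws by a crude Gaussian KDE
def kde_log_density(x, sample):
    h = sample.std() * sample.size ** (-0.2)        # Scott's rule bandwidth
    z = (x - sample) / h
    return np.log(np.exp(-0.5 * z * z).mean() / (h * np.sqrt(2.0 * np.pi)))

mu_star = draws.mean()                              # any high-density point works
log_m2 = loglik(mu_star) + log_norm(mu_star, 0.0, A) - kde_log_density(mu_star, draws)
log_m1 = loglik(0.0)                                # M1 has no free parameter

# model weights under equal prior model probabilities
w2 = 1.0 / (1.0 + np.exp(log_m1 - log_m2))
print(w2)
```

The point of the sketch is only that each model is handled by its own run and the weights come out of post-processing; doing this correctly in general — without the erroneous shortcuts alluded to above — is precisely what the paper's representation is about.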