## O’Bayes 19/4

Posted in Books, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , on July 4, 2019 by xi'an

Last talks of the conference! With Rui Paulo (along with Gonzalo Garcia-Donato) considering the special case of factors when doing variable selection. Which is an interesting question that I had never considered, as at best I would remove all leves or keeping them all. Except that there may be misspecification in the factors as for instance when several levels have the same impact.With Michael Evans discussing a paper that he wrote for the conference! Following his own approach to statistical evidence. And including his reluctance to cover infinity (calling on Gauß for backup!) or continuity, and his call to falsify a Bayesian model by checking it can be contradicted by the data. His assumption that checking for prior is separable from checking for [sampling] model is debatable. (With another mention made of the Savage-Dickey ratio.)

And with Dimitris Fouskakis giving a wide ranging assessment [which Mark Steel (Warwick) called a PEP talk!] of power-expected-posterior priors, used with reference (and usually improper) priors. Which in retrospect would have suited better the beginning of the conference as it provided a background to several of the talks. Raising a question (from my perspective) on using the maximum likelihood estimator as a pseudo-sufficient statistic when this MLE is computed for the base (simplest) model. Maybe an ABC induced bias in this question as it would not work for ABC model choice.

Overall, I think the scientific outcomes of the conference were quite positive: a wide range of topics and perspectives, a reasonable and diverse attendance, especially when considering the heavy load of related conferences in the surrounding weeks (the “June fatigue”!), animated poster sessions. I am obviously not the one to assess the organisation of the conference! Things I forgot to do in this regard: organise transportation from Oxford to Warwick University, provide an attached room for in-pair research, insist on sustainability despite the imposed catering solution, facilitate sharing joint transportation to and from the Warwick campus, mention that tap water was potable, and… wear long pants when running in nettles.

## O’Bayes 19/3

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , on July 2, 2019 by xi'an

Nancy Reid gave the first talk of the [Canada] day, in an impressive comparison of all approaches in statistics that involve a distribution of sorts on the parameter, connected with the presentation she gave at BFF4 in Harvard two years ago, including safe Bayes options this time. This was related to several (most?) of the talks at the conference, given the level of worry (!) about the choice of a prior distribution. But the main assessment of the methods still seemed to be centred on a frequentist notion of calibration, meaning that epistemic interpretations of probabilities and hence most of Bayesian answers were disqualified from the start.

In connection with Nancy’s focus, Peter Hoff’s talk also concentrated on frequency valid confidence intervals in (linear) hierarchical models. Using prior information or structure to build better and shrinkage-like confidence intervals at a given confidence level. But not in the decision-theoretic way adopted by George Casella, Bill Strawderman and others in the 1980’s. And also making me wonder at the relevance of contemplating a fixed coverage as a natural goal. Above, a side result shown by Peter that I did not know and which may prove useful for Monte Carlo simulation.

Jaeyong Lee worked on a complex model for banded matrices that starts with a regular Wishart prior on the unrestricted space of matrices, computes the posterior and then projects this distribution onto the constrained subspace. (There is a rather consequent literature on this subject, including works by David Dunson in the past decade of which I was unaware.) This is a smart demarginalisation idea but I wonder a wee bit at the notion as the constrained space has measure zero for the larger model. This could explain for the resulting posterior not being a true posterior for the constrained model in the sense that there is no prior over the constrained space that could return such a posterior. Another form of marginalisation paradox. The crux of the paper is however about constructing a functional form of minimaxity. In his discussion of the paper, Guido Consonni provided a representation of the post-processed posterior (P³) that involves the Dickey-Savage ratio, sort of, making me more convinced of the connection.

As a lighter aside, one item of local information I should definitely have broadcasted more loudly and long enough in advance to the conference participants is that the University of Warwick is not located in ye olde town of Warwick, where there is no university, but on the outskirts of the city of Coventry, but not to be confused with the University of Coventry. Located in Coventry.

## Astrostatistics school

Posted in Mountains, pictures, R, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , on October 17, 2017 by xi'an

What a wonderful week at the Astrostat [Indian] summer school in Autrans! The setting was superb, on the high Vercors plateau overlooking both Grenoble [north] and Valence [west], with the colours of the Fall at their brightest on the foliage of the forests rising on both sides of the valley and a perfect green on the fields at the centre, with sun all along, sharp mornings and warm afternoons worthy of a late Indian summer, too many running trails [turning into X country ski trails in the Winter] to contemplate for a single week [even with three hours of running over two days], many climbing sites on the numerous chalk cliffs all around [but a single afternoon for that, more later in another post!]. And of course a group of participants eager to learn about Bayesian methodology and computational algorithms, from diverse [astronomy, cosmology and more] backgrounds, trainings and countries. I was surprised at the dedication of the participants travelling all the way from Chile, Péru, and Hong Kong for the sole purpose of attending the school. David van Dyk gave the first part of the school on Bayesian concepts and MCMC methods, Roberto Trotta the second part on Bayesian model choice and hierarchical models, and myself a third part on, surprise, surprise!, approximate Bayesian computation. Plus practicals on R.

## Using MCMC output to efficiently estimate Bayes factors

Posted in Books, R, Statistics, University life with tags , , , , on May 19, 2016 by xi'an

As I was checking for software to answer a query on X validated about generic Bayes factor derivation, I came across an R software called BayesFactor, which only applies in regression settings and relies on the Savage-Dickey representation of the Bayes factor

$B_{01}=\dfrac{f(y|\theta^0)}{m(y)}=\dfrac{\pi(\theta^0|y)}{\pi(\theta^0)}$

when the null hypothesis writes as θ=θ⁰ (and possibly additional nuisance parameters with [roughly speaking] an independent prior). As we discussed in our paper with Jean-Michel Marin [which got ignored by large!], this representation of the Bayes factor is based on picking a very specific version of the prior, or more exactly of three prior densities. Assuming such versions are selected, I wonder at the performances of this approximation, given that it involves approximating the marginal posterior at θ⁰….

“To ensure that the Bayes factor we compute using the Savage–Dickey ratio is the the ratio of marginal densities that we intend, the condition (…) is easily met by models which specify priors in which the nuisance parameters are independent of the parameters of interest.” Morey et al. (2011)

First, when reading Morey at al. (2011), I realised (a wee bit late!) that Chib’s method is nothing but a version of the Savage-Dickey representation when the marginal posterior can be estimated in a parametric (Rao-Blackwellised) way. However, outside hierarchical models based on conjugate priors such parametric approximations are intractable and non-parametric versions must be invoked instead, which necessarily degrades the quality of the method. A degradation that escalates with the dimension of the parameter θ. In addition, I am somewhat perplexed by the use of a Rao-Blackwell argument in the setting of the Dickey-Savage representation. Indeed this representation assumes that

$\pi_1(\psi|\theta_0)=\pi_0(\psi) \ \ \text{or}\quad \pi_1(\theta_0,\psi)=\pi_1(\theta_0)\pi_0(\psi)$

which means that [the specific version of] the conditional density of θ⁰ given ψ should not depend on the nuisance parameter. But relying on a Rao-Blackwellisation leads to estimate the marginal posterior via full conditionals. Of course, θ given ψ and y may depend on ψ, but still… Morey at al. (2011) advocate the recourse to Chib’s formula as optimal but this obviously requires the full conditional to be available. They acknowledge this point as moot, since it is sufficient from their perspective to specify a conjugate prior. They consider this to be a slight modification of the model (p.377). However, I see the evaluation of an estimated density at a single (I repeat, single!) point as being the direst part of the method as it is clearly more sensitive to approximations that the evaluation of a whole integral, since the later incorporates an averaging effect by definition. Hence, even if this method was truly available for all models, I would be uncertain of its worth when compared with other methods, except the harmonic mean estimator of course!

On the side, Morey at al. (2011) study a simple one-sample t test where they use an improper prior on the nuisance parameter σ, under both models. While the Savage-Dickey representation is correct in this special case, I fail to see why the identity would apply in every case under an improper prior. In particular, independence does not make sense with improper priors. The authors also indicate the possible use of this Bayes factor approximation for encompassing models. At first, I thought this could be most useful in our testing by mixture framework where we define an encompassing model as a mixture. However, I quickly realised that using a Beta Be(a,a) prior on the weight α with a<1 leads to an infinite density value at both zero and one, hence cannot be compatible with a Savage-Dickey representation of the Bayes factor.

## Bayesian model averaging in astrophysics [guest post]

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , , on August 12, 2015 by xi'an

.[Following my posting of a misfiled 2013 blog, Ewan Cameron told me of the impact of this paper in starting his own blog and I asked him for a guest post, resulting in this analysis, much deeper than mine. No warning necessary this time!]

Back in February 2013 when “Bayesian Model Averaging in Astrophysics: A Review” by Parkinson & Liddle (hereafter PL13) first appeared on the arXiv I was a keen, young(ish) postdoc eager to get stuck into debates about anything and everything ‘astro-statistical’. And with its seemingly glaring flaws, PL13 was more grist to the mill. However, despite my best efforts on various forums I couldn’t get a decent fight started over the right way to do model averaging (BMA) in astronomy, so out of sheer frustration two months later I made my own soapbox to shout from at Another Astrostatistics Blog. Having seen PL13 reviewed recently here on Xian’s Og it feels like the right time to revisit the subject and reflect on where BMA in astronomy is today.

As pointed out to me back in 2013 by Tom Loredo, the act of Bayesian model averaging has been around much longer than its name; indeed an early astronomical example appears in Gregory & Loredo (1992) in which the posterior mean representation of an unknown signal is constructed for an astronomical “light-curve”, averaging over a set of constant and periodic candidate models. Nevertheless the wider popularisation of model averaging in astronomy has only recently taken place through a variety of applications in cosmology: e.g. Liddle, Mukherjee, Parkinson & Wang (2006) and Vardanyan, Trotta & Silk (2011).

In contrast to earlier studies like Gregory & Loredo (1992)—or the classic review on BMA by Hoeting et al. (1999)—in which the target of model averaging is typically either a utility function, a set of future observations, or a latent parameter of the observational process (e.g. the unknown “light-curve” shape) shared naturally by all competing models, the proposal of cosmological BMA studies is to produce a model-averaged version of the posterior for a given ‘shared’ parameter: a so-called “model-averaged PDF”. This proposal didn’t sit well with me back in 2013, and it still doesn’t sit well with me today. Philosophically: without a model a parameter has no meaning, so why should we want to seek meaning in the marginalised distribution of a parameter over an entire set of models? And, practically: to put it another way, without knowing the model ‘label’ to which a given mass of model-averaged parameter probability belongs there’s nothing much useful we can do with this ‘PDF’: nothing much we can say about the data we’ve just analysed and nothing much we can say about future experiments. Whereas the space of the observed data is shared automatically by all competing models, it seems to me to be somehow “un-Bayesian” to place the further restriction that the parameters of separate models share the same scale and topology. I say “un-Bayesian” since this mode of model averaging suggests a formulation of the parameter space + prior pairing stronger than the statement of one’s prior beliefs for the distribution of observable data given the model. But I would be happy to hear arguments from the other side in the comments box below … ! Continue reading