## differences between Bayes factors and normalised maximum likelihood

Posted in Books, Kids, Statistics, University life on November 19, 2014 by xi'an

A recent arXival by Heck, Wagenmakers and Morey, “Three Qualitative Differences Between Bayes Factors and Normalized Maximum Likelihood”, attracted my attention, as it provides an analysis of the differences between Bayesian analysis and Rissanen’s Optimal Estimation of Parameters, which I reviewed a while ago. As detailed in this review, I found it difficult to consider the normalised likelihood

$p(x|\hat\theta_x) \big/ \int_\mathcal{X} p(y|\hat\theta_y)\,\text{d}y$

as the relevant quantity. One reason is that this distribution does not make experimental sense: for instance, how can one simulate from it? [I mean, when considering only the original distribution.] Working with the simple binomial B(n,θ) model, the authors show that the quantity corresponding to the posterior probability may be constant for most of the data values, produces a different upper bound and hence a different penalty for model complexity, and may lead to a different conclusion for some observations. Which means that the apparent proximity between using a Jeffreys prior and Rissanen’s alternative does not go all the way. While it is a short note, focussed only on producing an illustration in the binomial case, I find it interesting that researchers investigate the Bayesian nature (vs. artifice!) of this approach…
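To make the normalised likelihood above concrete in the binomial B(n,θ) case, here is a minimal Python sketch of my own (not code from the note): with a discrete sample space, the integral over 𝒳 becomes a finite sum over y = 0,…,n, whose logarithm acts as the NML complexity penalty. For comparison, it also computes the prior predictive under the Jeffreys Beta(1/2,1/2) prior. The function names are hypothetical, chosen for illustration.

```python
import math


def log_comb(n: int, k: int) -> float:
    """log of the binomial coefficient C(n, k), via log-gamma to avoid overflow."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_max_likelihood(x: int, n: int) -> float:
    """log p(x | theta_hat_x) for B(n, theta), with the MLE theta_hat_x = x / n."""
    theta = x / n
    # convention 0 * log 0 = 0 covers the boundary cases x = 0 and x = n
    log_success = x * math.log(theta) if x > 0 else 0.0
    log_failure = (n - x) * math.log(1.0 - theta) if x < n else 0.0
    return log_comb(n, x) + log_success + log_failure


def nml_pmf(n: int) -> list[float]:
    """Normalised maximum likelihood over the sample space {0, ..., n}."""
    ml = [math.exp(log_max_likelihood(x, n)) for x in range(n + 1)]
    total = sum(ml)  # the integral over the sample space is a finite sum here
    return [m / total for m in ml]


def parametric_complexity(n: int) -> float:
    """log of the normalising sum, i.e. the NML penalty for model complexity."""
    return math.log(sum(math.exp(log_max_likelihood(x, n)) for x in range(n + 1)))


def jeffreys_marginal(x: int, n: int) -> float:
    """Prior predictive p(x) under the Jeffreys Beta(1/2, 1/2) prior on theta."""

    def log_beta(a: float, b: float) -> float:
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    return math.exp(log_comb(n, x) + log_beta(x + 0.5, n - x + 0.5) - log_beta(0.5, 0.5))


# compare both distributions on the sample space for n = 2
print([round(p, 4) for p in nml_pmf(2)])
print([round(jeffreys_marginal(x, 2), 4) for x in range(3)])
```

For n = 2, the two lines print [0.4, 0.2, 0.4] and [0.375, 0.25, 0.375]: close, but not identical, a toy instance of a proximity to the Jeffreys prior that does not go all the way.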

Posted in Mountains, Statistics, Travel, University life on April 21, 2014 by xi'an

As I was flying over Skye (with [maybe] a first if hazy perspective on the Cuillin ridge!) to Iceland, three long sets of replies to some of my posts appeared on the ‘Og:

Thanks to them for taking the time to answer my musings…

## Dan Simpson’s seminar at CREST

Posted in Kids, Mountains, Statistics, Travel, University life on April 18, 2014 by xi'an

Daniel Simpson gave a seminar at CREST yesterday on his recently arXived paper, “Penalising model component complexity: A principled, practical approach to constructing priors”, written with Thiago Martins, Andrea Riebler, Håvard Rue, and Sigrunn Sørbye. A paper he should also have presented in Banff last month, had he not lost his passport in København airport… I have already commented at length on this exciting paper (hopefully to become a discussion paper in a top journal!), so I am just pointing out two things that came to my mind during the energetic talk Dan delivered to our group. The first is that those penalised complexity (PC) priors of theirs rely on some choices in the ordering of the relevance, complexity, nuisance level, &tc. of the parameters, just like reference priors. While Dan already wrote a paper on Russian roulette, there is also a Russian doll principle at work behind (or within) PC priors. Each shell of the Russian doll corresponds to a further level of complexity, whose order needs to be decided by the modeller… Not very realistic in a hierarchical model with several types of parameters having only local meaning.

My second point is that the construction of those “politically correct” (PC) priors reflects another Russian doll structure, namely one of embedded models, and hence would and should lead to a natural multiple testing methodology. Except that Dan rejected this notion during his talk, being opposed to testing per se. (A good topic for one of my summer projects, if nothing more, then!)

## penalising model component complexity

Posted in Books, Mountains, pictures, Statistics, University life on April 1, 2014 by xi'an

“Prior selection is the fundamental issue in Bayesian statistics. Priors are the Bayesian’s greatest tool, but they are also the greatest point for criticism: the arbitrariness of prior selection procedures and the lack of realistic sensitivity analysis (…) are a serious argument against current Bayesian practice.” (p.23)

A paper that I first read and annotated in the very early hours of the morning in Banff, when temperatures were down in the mid minus 20s, has now appeared on arXiv: “Penalising model component complexity: A principled, practical approach to constructing priors” by Thiago Martins, Dan Simpson, Andrea Riebler, Håvard Rue, and Sigrunn Sørbye. It is a highly timely and pertinent paper on the selection of default priors! Which shows that the field of “objective” Bayes is still full of open problems and significant advances, and makes a great argument for the future president [that I am] of the O’Bayes section of ISBA to encourage young Bayesian researchers to consider this branch of the field.

“On the other end of the hunt for the holy grail, “objective” priors are data-dependent and are not uniformly accepted among Bayesians on philosophical grounds.” (p.2)

Apart from the above quote (objective priors are not data-dependent! this is presumably a typo for model-dependent), I very much like the introduction (appreciating the reference to the very recent Kamary (2014), which just got rejected by TAS for quoting my blog post way too much… and which we jointly resubmitted to Statistics and Computing). It may be missing the alternative solution of going hierarchical as far as needed and ending up with default priors [at the top of the ladder]. And it does not discuss the difficulty in specifying the sensitivity of weakly informative priors.

“Most model components can be naturally regarded as a flexible version of a base model.” (p.3)

The starting point for the modelling is the base model. How easy is it to define this base model? Does it [always?] translate into a null hypothesis formulation? Is there an automated derivation? I assume this somewhat follows from the “block” idea, which I do like, but how generic is model construction by blocks?

“Occam’s razor is the principle of parsimony, for which simpler model formulations should be preferred until there is enough support for a more complex model.” (p.4)

I also like this idea of putting a prior on the distance from the base! Even more so because it is parameterisation invariant (at least at the hyperparameter level). (This vaguely reminded me of a paper we wrote with George a while ago, replacing tests with distance evaluations.) And because it gives a definitive meaning to Occam’s razor. However, unless the hyperparameter ξ is one-dimensional, this does not define a prior on ξ per se. I equally like Eqn (2), as it shows how the base constraint takes one away from Jeffreys’ prior. Plus, if one takes the Kullback divergence as an intrinsic loss function, this also sounds related to Holmes’s and Walker’s substitute loss pseudopriors, no? Now, Eqn (2) does not sound right in the general case. Unless one implicitly takes a uniform prior on the Kullback sphere of radius d? There is a feeling of one-d-ness in the description of the paper (at least until page 6), and I wanted to see how it extends to models with many (≥2) hyperparameters. Until I reached Section 6, where the authors state exactly that! There is also a potential difficulty in that d(ξ) cannot be computed in a general setting. (Assuming that d(ξ) has a non-vanishing Jacobian, as on page 19, sounds rather unrealistic.) Still about Section 6, handling reference priors on correlation matrices is a major endeavour, which should produce a steady flow of followers!
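A minimal Python sketch of the distance-based construction, under my own assumptions: the exponential prior with rate λ on d(ξ) = √(2 KLD) is how I read the PC prior recipe, while the variance-inflation toy model N(0, 1+ξ) with base N(0, 1) at ξ = 0, the function names, and the calibration P(sd > 2) = 0.01 are hypothetical choices for illustration, not an example from the paper. The change of variables through |d′(ξ)| also makes the one-d-ness visible: it only turns the prior on d into a prior on ξ because ξ is a scalar.

```python
import math

# Hypothetical toy flexible model: N(0, 1 + xi), xi >= 0, base model N(0, 1) at xi = 0.


def kld(xi: float) -> float:
    """KLD(N(0, 1 + xi) || N(0, 1)): flexible model measured against the base."""
    return 0.5 * max(0.0, xi - math.log1p(xi))


def distance(xi: float) -> float:
    """Distance from the base model, d(xi) = sqrt(2 * KLD), increasing in xi."""
    return math.sqrt(2.0 * kld(xi))


def distance_derivative(xi: float) -> float:
    """Analytic d'(xi) = (xi / (1 + xi)) / (2 d(xi)), with limit 1/sqrt(2) at the base."""
    d = distance(xi)
    if d == 0.0:  # at (or numerically at) the base model, use the limit
        return 1.0 / math.sqrt(2.0)
    return (xi / (1.0 + xi)) / (2.0 * d)


def pc_prior_density(xi: float, lam: float) -> float:
    """Exponential(lam) prior on d, pushed back to xi by a change of variables."""
    return lam * math.exp(-lam * distance(xi)) * distance_derivative(xi)


def pc_prior_cdf(xi: float, lam: float) -> float:
    """P(Xi <= xi) = P(D <= d(xi)), since d is increasing in xi."""
    return 1.0 - math.exp(-lam * distance(xi))


def calibrate_rate(upper_sd: float, alpha: float) -> float:
    """Choose lam so that P(sd > upper_sd) = alpha, where sd = sqrt(1 + xi)."""
    if upper_sd <= 1.0:
        raise ValueError("upper_sd must exceed the base model sd of 1")
    xi_upper = upper_sd**2 - 1.0
    return -math.log(alpha) / distance(xi_upper)


lam = calibrate_rate(upper_sd=2.0, alpha=0.01)  # P(sd > 2) = 0.01
h = 1e-3
grid = [i * h for i in range(3001)]  # xi in [0, 3], i.e. sd in [1, 2]
dens = [pc_prior_density(x, lam) for x in grid]
# trapezoidal integral of the density, to be compared with the closed-form CDF
integral = sum(0.5 * (a + b) * h for a, b in zip(dens, dens[1:]))
print(round(integral, 4), round(pc_prior_cdf(3.0, lam), 4))
```

Both printed values should be 0.99, checking the change of variables numerically. The rate λ is one of the quantities that still needs to be calibrated, here through a tail probability on the more interpretable standard deviation scale.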

“The current practice of prior specification is, to be honest, not in a good shape. While there has been a strong growth of Bayesian analysis in science, the research field of “practical prior specification” has been left behind.” (p.23)

There are still quantities to specify and calibrate in the PC priors, which may actually be deemed a good thing by Bayesians (and some modellers). But overall I think this paper and its message constitute a terrific step for Bayesian statistics, and I hope the paper makes it into a major journal.