Rescaling data for evidence

Following a comment by Mark Fisher, I went back to the analysis of the benchmark dataset of galaxy radial velocities found in many papers on mixture estimation, such as Roeder (1992). (Radford Neal gave an impromptu talk at the 2001 Edinburgh meeting organised at ICMS by Mike Titterington and myself about the lack of astrophysical motivation for such a modelling, but it keeps being used as a benchmark. It is available as galaxies(MASS) in R.) As noted by Mark, there is a huge discrepancy between our numerical values for the marginal likelihood and those found in Chib (1995) or Neal (1998). The reason is that, in the review paper with Kate Lee, Jean-Michel Marin, and Kerrie Mengersen, we used a standardised version of the dataset. In retrospect, there is little to support this preliminary standardisation since, as also noted by Mark, the empirical Bayes nature of the analysis remains of the same kind as in the original paper. We were thus simply hiding the empirical part under the data carpet…

Nonetheless, there is an interesting theoretical question prompted by Mark’s comments, which he himself answered, namely how the marginal likelihoods under both approaches compare. The attached notes show that, with a coherent change of priors, the marginal likelihood of the rescaled data is the original marginal likelihood multiplied by a factor \alpha^n, where \alpha is the scaling constant and n is the sample size.

reparameterisation
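As a quick numerical check of this identity, here is a minimal sketch on a toy model rather than on the mixture itself: I assume x_i ~ N(mu, sigma^2) with sigma known and mu ~ N(m0, tau^2), for which the marginal likelihood is available in closed form. The function log_marginal, the data values, and the choice alpha = 4.5 are all illustrative assumptions, as is the convention that the rescaled data are y = x / alpha, with sigma, m0 and tau divided by alpha as well (the "coherent change of priors").

```python
import math

def log_marginal(x, sigma, m0, tau):
    """Log marginal likelihood of x_i ~ N(mu, sigma^2), with mu ~ N(m0, tau^2).

    Marginally, x is Gaussian with mean m0 and covariance
    sigma^2 * I + tau^2 * J, whose determinant and inverse are explicit.
    """
    n = len(x)
    resid = [xi - m0 for xi in x]
    s1 = sum(resid)
    s2 = sum(r * r for r in resid)
    v = sigma**2 + n * tau**2
    log_det = (n - 1) * math.log(sigma**2) + math.log(v)
    quad = (s2 - tau**2 * s1**2 / v) / sigma**2
    return -0.5 * (n * math.log(2 * math.pi) + log_det + quad)

# Made-up values for illustration only (not the galaxy dataset).
x = [9.2, 9.4, 9.5, 9.6, 9.8, 10.2, 16.1, 19.5]
alpha = 4.5  # hypothetical scaling constant

# Rescale the data AND every scale-dependent (hyper)parameter by alpha.
y = [xi / alpha for xi in x]
log_mx = log_marginal(x, sigma=1.0, m0=20.0, tau=10.0)
log_my = log_marginal(y, sigma=1.0 / alpha, m0=20.0 / alpha, tau=10.0 / alpha)

# On the log scale the factor alpha^n becomes n * log(alpha).
diff = log_my - log_mx
print(diff, len(x) * math.log(alpha))
```

Both printed numbers should agree up to rounding. If the prior is not rescaled along with the data, the two marginal likelihoods are no longer linked by this simple Jacobian factor.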

Another interesting question in Mark’s comments is about using those marginal likelihoods to compare priors, i.e., about deriving a formal Bayes factor from the marginal likelihoods computed with Sid Chib’s prior and with ours. I have objections to doing this, on both foundational and practical grounds. From a foundational point of view, the sampling model remains the same under both priors and there is no Bayesian meta-model that encompasses both priors: they are incompatible with one another and thus cannot be used simultaneously. From a practical perspective, since the priors have an empirical component, it is possible to cheat towards a maximal marginal likelihood by using a prior centered at the MLE and with a zero prior variance. (See also the controversy surrounding Murray Aitkin’s resolution of the improper prior difficulty by using the data twice.)
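The practical objection can be illustrated on a toy example. This is only a sketch under assumptions of my own choosing, not the mixture analysis: a Normal model with known variance, x_i ~ N(mu, 1), a N(m0, tau^2) prior on mu, and made-up data. The helper log_marginal and all numerical values are hypothetical. Centering the prior at the MLE (the sample mean) and shrinking tau towards zero pushes the marginal likelihood up to its upper bound, the maximised likelihood.

```python
import math

def log_marginal(x, sigma, m0, tau):
    """Closed-form log marginal of x_i ~ N(mu, sigma^2), with mu ~ N(m0, tau^2)."""
    n = len(x)
    resid = [xi - m0 for xi in x]
    s1 = sum(resid)
    s2 = sum(r * r for r in resid)
    v = sigma**2 + n * tau**2
    log_det = (n - 1) * math.log(sigma**2) + math.log(v)
    quad = (s2 - tau**2 * s1**2 / v) / sigma**2
    return -0.5 * (n * math.log(2 * math.pi) + log_det + quad)

# Made-up values for illustration only (not the galaxy dataset).
x = [9.2, 9.4, 9.5, 9.6, 9.8, 10.2, 16.1, 19.5]
xbar = sum(x) / len(x)  # MLE of mu in this toy model

# "Cheating" priors: centered at the MLE, with shrinking prior variance.
taus = [10.0, 1.0, 0.1, 0.0]
cheating = [log_marginal(x, 1.0, xbar, t) for t in taus]

# A prior chosen without looking at the data (arbitrary values).
honest = log_marginal(x, 1.0, 20.0, 10.0)

# Log-likelihood at the MLE: the bound no prior can beat.
max_loglik = sum(-0.5 * math.log(2 * math.pi) - 0.5 * (xi - xbar) ** 2
                 for xi in x)

for t, lm in zip(taus, cheating):
    print(f"tau = {t:5.1f}   log marginal = {lm:.4f}")
print(f"data-free prior: {honest:.4f}   max log-likelihood: {max_loglik:.4f}")
```

The log marginal likelihood increases as tau decreases and reaches the maximised log-likelihood at tau = 0, so a "Bayes factor" between priors would always favour the degenerate data-dependent prior.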
