Mark, thanks for the further comments.

Adjusting the reported marginal likelihood by the further factor does not appear to be exact: such a term does appear in the log-marginal likelihood, but the change of scale also affects the integral over the parameters. This is indeed the whole issue with determining the prior before or after standardisation. And I agree that both solutions are versions of empirical Bayes techniques… Neither is more or less empirical than the other. (I am using the R sd function, so I do indeed divide by (n-1).)

I am quite reluctant to use a Bayes factor to compare priors on the same model. I have seen this done in several papers, but it sounds neither right nor Bayesian to me… Especially when using empirical Bayes solutions: comparing marginal likelihoods always favours the prior most highly concentrated around the MLE (I think).

I now see in Figure 5 (in your paper with Lee, Marin, and Mengersen) that the estimated means cannot be associated with the original data. Moreover, your statement of the prior in Example 8 (where you treat the galaxy data) is centered around zero. So there are clues in the paper! But if you continue to report the numbers in Table 3, I hope you will be more explicit about the fact that you are fitting a transformation of the galaxy data. (See my next comment/question.)

I now realize that the huge factor comes from the Jacobian of the transformation. Let sd denote the standard deviation used to transform the data and let n denote the number of observations. Then the factor is sd^n. By the way, why not simply report this factor or, better yet, subtract n*log(sd) from your reported log marginal likelihoods to make them comparable to Chib’s (and to those of others who might fit the data)?
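A minimal sketch of where the sd^n factor comes from, checked for a single fixed Gaussian density rather than for a full mixture. The data values, the parameter values, and the helper names `normal_loglik` and `to_original_scale` are made up for illustration. The check holds for fixed parameters only; whether the same correction carries over exactly to a marginal likelihood depends on the prior being rescaled along with the data.

```python
import math

def normal_loglik(data, mu, sigma):
    """Log-likelihood of data under a single N(mu, sigma^2) density."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
        for x in data
    )

# Illustrative values only (not the galaxy data).
x = [9.2, 16.1, 19.5, 20.8, 22.4, 26.9, 33.0]
n = len(x)
m = sum(x) / n
s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))  # (n-1) divisor, as in R's sd

# Standardized data, and the same density expressed on the standardized scale.
y = [(v - m) / s for v in x]
mu, sigma = 21.0, 5.0  # arbitrary fixed parameters on the original scale
ll_x = normal_loglik(x, mu, sigma)
ll_y = normal_loglik(y, (mu - m) / s, sigma / s)

# The two log-likelihoods differ by the log-Jacobian n*log(s).
gap = ll_y - ll_x

def to_original_scale(log_value_standardized, n, sd):
    """Subtract n*log(sd) to move a log-density from the standardized scale back."""
    return log_value_standardized - n * math.log(sd)

print(gap, n * math.log(s))
print(to_original_scale(ll_y, n, s), ll_x)
```

Each printed pair should agree up to floating-point error.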

Please forgive my ignorance here, but regarding the “standardization”, I assume you have subtracted the empirical mean and divided by the empirical standard deviation. For the empirical standard deviation, do you divide by n or (n-1)? [I’m guessing you use (n-1).] It makes some difference since (81/82)^(82/2) is roughly .60.
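The arithmetic behind that .60 can be sketched as follows, assuming the factor in question is sd^n with n = 82: switching the divisor from (n-1) to n multiplies sd by sqrt((n-1)/n), and hence multiplies sd^n by ((n-1)/n)^(n/2). The variable names are illustrative.

```python
n = 82  # number of observations, as stated above

# Ratio of sd^n computed with divisor n to sd^n computed with divisor (n-1).
ratio = ((n - 1) / n) ** (n / 2)

print(round(ratio, 2))
```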

You say that by transforming the data, you avoid the use of “empirical Bayes priors”. I’m sure I don’t understand the import of what you say. Let me try to explain the source of my confusion. You have used the empirical mean and standard deviation to transform the data and then adopted a Gaussian prior for \mu_j with a mean of zero and a variance of 10 \sigma^2 (with a distribution for \sigma^2). Suppose instead you did not subtract the mean of the data (for example) but instead centered the prior on the empirical mean. (This is essentially what Chib does.) In what respect is that more or less an empirically-based prior? (Similarly, I believe it’s possible to make an adjustment to the variance of the prior.) What am I missing?

Your model and Chib’s model differ (in essence) because the priors are different. For his model with three components (the one in which both means and variances are free), the log of the marginal likelihood (as computed by Neal) is -226.791. For your model with three components, the log of the marginal likelihood is -103.36 - 82*log(sd) = -227.917 [assuming you divide by (n-1) to get the sd]. So the Bayes factor favors Chib’s model/prior by a factor of about 3 (assuming I’m doing things properly).
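For anyone who wants to check the arithmetic, here is a short sketch using only the figures quoted above. The variable names are illustrative, and the sd it prints is merely the value implied by those figures, not one computed from the data.

```python
import math

n = 82                       # number of observations
log_ml_chib = -226.791       # Chib's three-component model, as computed by Neal
log_ml_standardized = -103.36  # reported for the standardized data
log_ml_adjusted = -227.917   # after subtracting n*log(sd)

# Bayes factor in favor of Chib's model/prior.
bayes_factor = math.exp(log_ml_chib - log_ml_adjusted)

# Standard deviation implied by the size of the adjustment.
implied_sd = math.exp((log_ml_standardized - log_ml_adjusted) / n)

print(round(bayes_factor, 1), round(implied_sd, 2))
```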

–Mark

–Mark
