Is the Dickey-Savage ratio any valid?!
As mentioned in an earlier post on the Bayes factor, I have conceptual difficulties with the Dickey-Savage ratio… While the method is well-described in Chen, Shao and Ibrahim (2000, pages 164-165), let me recall here that the Dickey-Savage ratio provides a representation of the Bayes factor for testing an embedded model, $H_0:\theta=\theta_0$, with a nuisance parameter $\psi$, under the assumption that the conditional prior density of $\psi$ under the alternative when $\theta=\theta_0$, $\pi_1(\psi|\theta_0)$, is equal to the prior under the null hypothesis, $\pi_0(\psi)$. In this case, we have
$$B_{01}(x) = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}$$
with
$$\pi_1(\theta_0) = \int \pi_1(\theta_0,\psi)\,\text{d}\psi \quad\text{and}\quad \pi_1(\theta_0|x) = \int \pi_1(\theta_0,\psi|x)\,\text{d}\psi$$
the prior and posterior marginal densities of $\theta$ under the alternative.
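As a sanity check on this identity, here is a minimal Python sketch in a toy conjugate model chosen purely for illustration (it is not the setting of Chen, Shao and Ibrahim): the observations are N(θ,1), there is no nuisance parameter, the null is θ=θ0 and the prior under the alternative is N(θ0,τ²). The function names and arguments are hypothetical. The sketch assumes the usual continuous versions of the Gaussian densities; under that choice the posterior-to-prior density ratio at θ0 should coincide with the Bayes factor computed directly from the two marginal likelihoods.

```python
import math


def normal_pdf(x, mean, var):
    """Continuous version of the N(mean, var) density evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def savage_dickey_vs_direct(xbar, n, tau2, theta0=0.0):
    """Return (density ratio at theta0, directly computed Bayes factor B01).

    Assumed toy model: x_1..x_n iid N(theta, 1), so xbar ~ N(theta, 1/n).
    Null: theta = theta0. Alternative prior: theta ~ N(theta0, tau2).
    """
    # Conjugate posterior of theta under the alternative
    post_var = 1.0 / (n + 1.0 / tau2)
    post_mean = post_var * (n * xbar + theta0 / tau2)

    # Posterior density at theta0 divided by prior density at theta0
    density_ratio = normal_pdf(theta0, post_mean, post_var) / normal_pdf(theta0, theta0, tau2)

    # Direct Bayes factor: marginal of xbar under the null over marginal under the alternative
    direct = normal_pdf(xbar, theta0, 1.0 / n) / normal_pdf(xbar, theta0, tau2 + 1.0 / n)
    return density_ratio, direct


if __name__ == "__main__":
    ratio, direct = savage_dickey_vs_direct(xbar=0.3, n=20, tau2=2.0)
    print(ratio, direct)
```

The agreement between the two numbers only reflects the choice of continuous versions of both densities; nothing in the code prevents redefining either density at the single point θ0.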
What bothers me with this equality (whose functional proof is quite straightforward) is that it relies on a particular version of the conditional density, i.e. that the assumption above is meaningless from a measure theoretic perspective. Given the formal definition of conditional measures and densities, they are only defined up to a set of measure zero and the value of the conditional prior density $\pi_1(\psi|\theta)$ of the nuisance parameter $\psi$ when $\theta=\theta_0$ is thus arbitrary (since $\theta_0$ is a fixed value that is [or should be] set before observation). Furthermore, the choice of a version of the prior marginal density $\pi_1(\theta_0)$ does not impact the choice of the version of the posterior marginal density $\pi_1(\theta_0|x)$, which also is arbitrary, so there is no cancellation of arbitrary constants in the Dickey-Savage ratio representation. This Dickey-Savage representation is therefore dependent both in its assumption and in its expression on a specific version of the conditional density $\pi_1(\psi|\theta_0)$. Furthermore, when $\pi_1(\theta_0|x)$ is replaced with a Rao-Blackwell estimate,
$$\frac{1}{T}\sum_{t=1}^{T} \pi_1(\theta_0|x,\psi^{(t)}),$$
this estimate also depends on the choice of a collection of versions of the conditional densities... The answer to the title of this post is therefore that, no, the Dickey-Savage representation is not valid: it simply is meaningless from a mathematical viewpoint and thus this has nothing to do with simulation issues, hence the removal of the previous sentence!
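To make the Rao-Blackwell step concrete, here is a hedged Python sketch in an assumed toy model that is not taken from the post or from Chen, Shao and Ibrahim: observations are N(θ,σ²), the nuisance parameter is σ² with an inverse gamma prior IG(a,b) under both models, and θ has an independent N(θ0,τ²) prior under the alternative. All names and default values are hypothetical. A two-stage Gibbs sampler alternates between σ² and θ, and the posterior density at θ0 is estimated by averaging the Gaussian full conditional of θ evaluated at θ0.

```python
import numpy as np


def normal_pdf(x, mean, var):
    """Continuous version of the N(mean, var) density evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def rao_blackwell_savage_dickey(x, theta0=0.0, tau2=4.0, a=2.0, b=2.0,
                                n_iter=5000, burn_in=500, seed=0):
    """Rao-Blackwellised density-ratio estimate of B01 in the assumed toy model."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n, xsum = x.size, x.sum()
    theta = x.mean()  # starting value for the chain
    cond_dens = []
    for t in range(n_iter + burn_in):
        # sigma2 | theta, x ~ IG(a + n/2, b + sum((x - theta)^2) / 2)
        rate = b + 0.5 * np.sum((x - theta) ** 2)
        sigma2 = 1.0 / rng.gamma(a + 0.5 * n, 1.0 / rate)

        # theta | sigma2, x ~ N(post_mean, post_var)
        post_var = 1.0 / (n / sigma2 + 1.0 / tau2)
        post_mean = post_var * (xsum / sigma2 + theta0 / tau2)
        if t >= burn_in:
            # Full conditional density of theta at theta0, given the current sigma2
            cond_dens.append(normal_pdf(theta0, post_mean, post_var))
        theta = rng.normal(post_mean, np.sqrt(post_var))

    posterior_at_theta0 = np.mean(cond_dens)
    prior_at_theta0 = normal_pdf(theta0, theta0, tau2)
    return posterior_at_theta0 / prior_at_theta0


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(0.2, 1.0, size=30)
    print(rao_blackwell_savage_dickey(data))
```

Every term in the average uses the continuous Gaussian version of the full conditional at θ0, which is exactly the kind of choice of versions discussed above: the estimate is well-behaved only because one particular version is fixed once and for all.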
After a few more hours of thinking about this issue (in the plane to Finland), I came to realise I have a way to write down a generic valid Dickey-Savage-like ratio representation that only involves a pseudo-prior instead of imposing a meaningless constraint on the prior. Furthermore, this Dickey-Savage-like ratio representation can produce an approximation to the Bayes factor based on a corresponding single (new) sequence of simulations, independently from the Verdinelli-Wasserman extension. Indeed, all that is needed is a Monte Carlo or MCMC sampler on with the new (pseudo-posterior) target
which is usually feasible by a completion and a Gibbs sampling algorithm. (We are currently implementing the idea with Jean-Michel Marin and should have a preprint ready pretty soon with a probit example.) The approximation based on the Dickey-Savage-like ratio representation for the Bayes factor is then
where the ‘s are simulated in one step of a two-stage or three-stage Gibbs sampler and
is the full (completed) posterior derived from the pseudo-posterior
.
My conclusions are therefore that (a) the Dickey-Savage ratio representation does not make sense mathematically and (b) there exists a universally valid approximation technique to the Bayes factor that relies on simulating from a well-defined pseudo-posterior and a corresponding Dickey-Savage representation, and on using an appropriate Rao-Blackwellised estimate of a conditional density.
October 9, 2009 at 8:42 am
[…] the Savage-Dickey paradox Following several posts on this topic, we eventually managed to write down a short note with Jean-Michel Marin, which is now posted on […]
October 4, 2009 at 12:23 am
[…] tool for Bayesian model choice and have the opportunity to expose the new results on the Dickey-Savage ratio for the first time! I am looking forward this meeting, having a fond memory of a previous Young […]
September 25, 2009 at 7:04 pm
Well, am I wrong in saying that the D-S ratio and the MAP share the common feature that they do not depend only on the prior measure? Bayes factor, the posterior mean or HPD intervals clearly depend only on the prior measure, which is very good. On the other hand, the D-S ratio and the MAP depend on the prior density, and hence the issue of what is the reference measure?, etc. and some unnatural (?) requirements, like assuming the densities to be continuous.
I agree that what is even more puzzling in the D-S ratio is that it is of course equal to something that does only depend on the prior measure (the Bayes factor) but rewritten in a form which involves a ratio of densities! This is what I understood as your original concern.
Regarding my last comment, I did not expect the author of ‘The Bayesian choice’ to justify Bayes factors other than by the mere fact that they are Bayes factors! But my question was more to know whether this D-S ratio could be related to usual frequentist tests of equality when the sample size is large (in saying that it could perhaps be related to Wald test, I did miss however the fact that there was a nuisance parameter even under the null, so the situation is probably a bit more complicated here).
September 25, 2009 at 8:48 pm
Ok, dude, I now see your points. Actually, following your comments and a long Skype discussion with Jean-Michel from Helsingin, I modified rather seriously the description of my points about the D-S ratio representation, since they were obviously confusing… You have completely summarised my feelings about the D-S representation! Something absurd from a mathematical perspective, unless one imposes constraints that are unrelated to measure theory. I still fail to see the connection with Wald tests, because again this is a Bayes factor and you cannot look at it as a likelihood ratio. Maybe at best as a profile likelihood ratio?
September 25, 2009 at 2:59 pm
I believe that your criticism of the Dickey-Savage ratio is basically the same that is often raised against the MAP estimator: Both the MAP estimator and the D-S ratio are clearly meaningful ideas when the parameter is discrete, but for continuous parameters it is much less clear. Indeed, there are examples where the MAP estimator has unpleasant properties and I would expect some cases where the D-S ratio gives unexpected results as well.
This being said, the MAP estimator has a reassuring behavior when the sample size is large in that it is equivalent to the maximum likelihood estimator and hence consistent, etc. Do similar results exist for the D-S ratio? Unsurprisingly, I would expect this to be equivalent to the Wald test but has this been shown formally?
September 25, 2009 at 5:14 pm
Sorry, dude!, but I somehow disagree with your association of the MAP estimator with the Dickey-Savage ratio: the former is dependent on the dominating measure associated with the prior density and not so much on the version of this density (since removing a set of measure zero almost surely does not change the MAP estimate), while the Dickey-Savage ratio paradox is strictly related to the non-uniqueness of the density over a set of measure zero. Changing the dominating measure keeps the ratio the same.
Furthermore, I do not understand your second point. The Dickey-Savage ratio is the Bayes factor, hence it does not have any specific inferential property. It simply proposes a particular representation of the Bayes factor in terms of the prior/posterior under the alternative hypothesis. The Bayes factor being consistent, so is the Dickey-Savage ratio representation… As you rightly point out, the problem vanishes in discrete settings. To make the Dickey-Savage ratio work in continuous settings, I think requiring both the prior and posterior densities to be continuous should be sufficient if the parameter space is connected.
In a very few days, I hope this will become clearer with the note we are completing on this (the R programme for the example is written and working). The implementation of the modified Gibbs sampler is straightforward and the numerical results coincide with other approximations to the Bayes factor, like Chib’s and the harmonic ones.