## Bayes factors and martingales

**A** surprising paper came out in the last issue of *Statistical Science*, linking martingales and Bayes factors. In the historical part, the authors (Shafer, Shen, Vereshchagin and Vovk) recall that martingales were popularised by Martin-Löf, who is also influential in the theory of algorithmic randomness. A property of test martingales (i.e., martingales that are nonnegative with expectation one) is that
$$\mathbb{P}\Big(\sup_{n\ge 1} X_n \ge c\Big) \le 1/c \qquad\text{for every } c\ge 1,$$
which makes their sequential maxima *p*-values of sorts. I had never thought about likelihood ratios this way, but it is true that a (reciprocal) likelihood ratio
$$X_n = \prod_{i=1}^{n} \frac{q(x_i)}{p(x_i)}$$
is a martingale when the observations are distributed according to *p*. The authors define a Bayes factor (for P) as satisfying (Section 3.2)
$$B \ge 0 \quad\text{and}\quad \int B\,\mathrm{d}P \le 1,$$
which I find hard to relate to my understanding of Bayes factors because there is no prior nor parameter involved. I first thought there was a restriction to simple null hypotheses. However, there is a composite versus composite example (Section 8.5, the Binomial probability being less than or larger than 1/2). So P would then be the marginal likelihood. In this case the test martingale is
$$X_n = \frac{\int_{1/2}^{1} \theta^{s_n}(1-\theta)^{n-s_n}\,\mathrm{d}\theta}{\int_{0}^{1/2} \theta^{s_n}(1-\theta)^{n-s_n}\,\mathrm{d}\theta}, \qquad s_n=\sum_{i=1}^n x_i.$$
Simulating the martingale is straightforward; however, I do not recover the picture they obtain (Fig. 6):

```r
theta=.5    # probability of success in the simulated data
x=sample(0:1,10^4,rep=TRUE,prob=c(1-theta,theta))
s=cumsum(x)                          # running number of successes
ma=pbinom(s,1:10^4,.5,log.p=TRUE)-
  pbinom(s-1,1:10^4,.5,log.p=TRUE,lower.tail=FALSE)   # log martingale
plot(ma,type="l")
lines(cummin(ma),lty=2)                               # or cummax
lines(log(0.1)+0.9*cummin(ma),lty=2,col="steelblue")  # or cummax
```
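Incidentally, if the test martingale is read as the ratio of marginal likelihoods under uniform priors on the two halves of the unit interval (my reading, not necessarily the paper's notation), the classical identity between the incomplete beta function and binomial tail probabilities gives an exact expression that involves n+1 trials rather than n. A quick sketch to compare both versions (the function names are mine):

```r
# exact marginal likelihood ratio under uniform priors on (1/2,1) vs (0,1/2),
# for s successes out of n trials, by numerical integration
marg.ratio=function(s,n){
  integrate(function(t) t^s*(1-t)^(n-s),.5,1)$value/
  integrate(function(t) t^s*(1-t)^(n-s),0,.5)$value
}
# same ratio via the beta-binomial identity: it equals
# P(Bin(n+1,1/2) <= s) / P(Bin(n+1,1/2) >= s+1), i.e. n+1 trials, not n
cdf.ratio=function(s,n)
  pbinom(s,n+1,.5)/pbinom(s,n+1,.5,lower.tail=FALSE)
marg.ratio(3,10)  # agrees with cdf.ratio(3,10)
```

If this reading is correct, the pbinom calls in the code above (with n trials and overlapping tails) only approximate the marginal-likelihood ratio, which may contribute to the difference with Fig. 6.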

**W**hen theta is not 1/2, the sequence goes down almost linearly to -infinity.

But when theta is 1/2, I more often get a picture where the maximum and the minimum are reached within the first steps.

Obviously, I have not read the paper with the attention it deserved, so there may be features I missed that could be relevant for the Bayesian analysis of the behaviour of Bayes factors. At this stage, however, I fail to see the point of the “Puzzle for Bayesians” (Section 8.6): the conclusion that “it is legitimate to collect data until a point has been disproven but not legitimate to interpret this data as proof of an alternative hypothesis within the model” is not at odds with a Bayesian interpretation of the test outcome. When the Bayes factor favours a model, it means this model is the more likely of the two given the data, not that this model is true.
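As a quick numerical check of the maximal inequality quoted at the start, here is a sketch of my own (the Gaussian pair p=N(0,1), q=N(0.1,1) is an arbitrary choice): the running maximum of the reciprocal likelihood ratio should exceed a level c with frequency at most 1/c when the data are generated from p.

```r
set.seed(42)
c0=10                     # level c in the maximal inequality
hits=replicate(10^3,{
  x=rnorm(100)            # data truly from p = N(0,1)
  lr=cumsum(dnorm(x,.1,1,log=TRUE)-dnorm(x,0,1,log=TRUE))  # log X_n
  max(lr)>=log(c0)        # does sup_n X_n ever reach c0?
})
mean(hits)                # empirical frequency, bounded by 1/c0 = 0.1
```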

February 29, 2012 at 2:25 pm

Thanks a lot for this explanation, Paulo!

February 29, 2012 at 5:28 am

Dear Prof. Robert, this is just to share my current lack of understanding of this paper. In particular, I’d like to have a better understanding of what counts as a Bayes factor for them.

With the usual notation, we have a parameter $\theta$ and a vector of observations $x$. Let me try to make sense of their definition of a Bayes factor, at least when we have simple null and alternative hypotheses: $H_0:\theta=\theta_0$ and $H_1:\theta=\theta_1$.

Define $Q_i = P_{\theta_i}$, for $i=0,1$, and suppose that both are dominated by a $\sigma$-finite measure $\mu$, with Radon–Nikodym derivatives $q_i = \mathrm{d}Q_i/\mathrm{d}\mu$. Then, we have the usual result for the Bayes factor

$$B(x) = \frac{q_1(x)}{q_0(x)}.$$

Now, if we suppose that $Q_0$ and $Q_1$ are equivalent, using the chain rule for the Radon–Nikodym derivatives we have

$$B(x) = \frac{\mathrm{d}Q_1/\mathrm{d}\mu}{\mathrm{d}Q_0/\mathrm{d}\mu}(x) = \frac{\mathrm{d}Q_1}{\mathrm{d}Q_0}(x).$$

Since we have

$$\int \frac{\mathrm{d}Q_1}{\mathrm{d}Q_0}\,\mathrm{d}Q_0 = Q_1(\Omega) = 1,$$

the Bayes factor satisfies

$$\int B\,\mathrm{d}Q_0 = 1.$$

It seems that they take this property as their definition of a Bayes factor. Actually, they do more: when $Q_0$ and $Q_1$ are not equivalent, using the Lebesgue decomposition we can write

$$Q_1 = Q_1^{\mathrm{ac}} + Q_1^{\mathrm{s}},$$

where $Q_1^{\mathrm{ac}}$ is dominated by $Q_0$, and $Q_1^{\mathrm{s}}$ and $Q_0$ are mutually singular, so that

$$B(x) = \frac{\mathrm{d}Q_1^{\mathrm{ac}}}{\mathrm{d}Q_0}(x),$$

and then, since $Q_1^{\mathrm{ac}}(\Omega) \le 1$, we have

$$\int B\,\mathrm{d}Q_0 = Q_1^{\mathrm{ac}}(\Omega) \le 1.$$


Now, as it seems to me, they make an analogy with the former case and come to their definition of a Bayes factor as any $B \ge 0$ that satisfies

$$\int B\,\mathrm{d}Q_0 \le 1.$$

I can’t see clearly the advantage of this extended definition with the inequality sign. Also, I don’t understand how all of this extends to cases where we have composite hypotheses.
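For what it is worth, a toy instance of the strict inequality with non-equivalent measures (an illustration of mine, not from the paper): take $Q_0$ uniform on (0,1) and $Q_1$ uniform on (0,2). The absolutely continuous part of $Q_1$ with respect to $Q_0$ has density 1/2 on (0,1), so the integral comes out strictly below one:

```r
# Q0 = Uniform(0,1), Q1 = Uniform(0,2): dQ1ac/dQ0 = (1/2)/1 on (0,1)
B=function(x) dunif(x,0,2)/dunif(x,0,1)
int=integrate(function(x) B(x)*dunif(x,0,1),0,1)$value
int  # = Q1ac(Omega) = 1/2 <= 1, with strict inequality
```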