Bayes factors and martingales

A surprising paper came out in the last issue of Statistical Science, linking martingales and Bayes factors. In the historical part, the authors (Shafer, Shen, Vereshchagin and Vovk) recall that martingales were popularised by Martin-Löf, who is also influential in the theory of algorithmic randomness. A property of test martingales (i.e., non-negative martingales with expectation one) is that

\mathbb{P}(X^*_t \ge c) = \mathbb{P}(\sup_{s\le t}X_s \ge c) \le 1/c

which makes the reciprocals of their sequential maxima p-values of sorts. I had never thought about likelihood ratios this way, but it is true that a (reciprocal) likelihood ratio

\prod_{i=1}^n \dfrac{q(x_i)}{p(x_i)}

is a martingale when the observations are distributed from p, since each factor q(x_i)/p(x_i) has expectation one under p.
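As a quick sanity check, here is a minimal R sketch, not taken from the paper: the densities p = N(0,1) and q = N(1,1) are my own choice of illustration. The running product of the ratios q(x_i)/p(x_i) along N(0,1) observations is then a test martingale, so the frequency of simulated paths whose running maximum ever exceeds a level should stay below the reciprocal of that level.

n=10^3; M=10^3; lev=10                  # horizon, number of replications, level
sups=replicate(M,{
  x=rnorm(n)                            # observations from p = N(0,1)
  max(cumsum(dnorm(x,1,1,log=TRUE)-dnorm(x,log=TRUE)))  # log sup_t X_t
})
mean(sups>=log(lev))                    # empirical P(X* >= lev), bounded by 1/lev = 0.1

The authors define a Bayes factor (for P) as satisfying (Section 3.2)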

\int (1/B) \text{d}P \le 1

which I find hard to relate to my understanding of Bayes factors because there is neither prior nor parameter involved. I first thought there was a restriction to simple null hypotheses. However, there is a composite versus composite example (Section 8.5, the Binomial probability being less than or larger than 1/2). So P would then be the marginal likelihood. In this case the test martingale is

X_t = \dfrac{P(B_{t+1}\le S_t)}{P(B_{t+1}\ge S_t)}\,, \quad B_t \sim \mathcal{B}(t,1/2)\,,\, S_t\sim \mathcal{B}(t,\theta)\,.

Simulating the martingale is straightforward; however, I do not recover the picture they obtain (Fig. 6):

theta=1/2                             # success probability (take theta!=1/2 for the alternative)
x=sample(0:1,10^4,replace=TRUE,prob=c(1-theta,theta))
s=cumsum(x)                           # running success count S_t
# log test martingale: log P(B_t<=S_t) - log P(B_t>=S_t), with B_t ~ B(t,1/2)
ma=pbinom(s,1:10^4,.5,log.p=TRUE)-pbinom(s-1,1:10^4,.5,log.p=TRUE,lower.tail=FALSE)
plot(ma,type="l")
lines(cummin(ma),lty=2)               # running minimum (use cummax for the running maximum)
lines(log(0.1)+0.9*cummin(ma),lty=2,col="steelblue")

When theta is not 1/2, the sequence goes down almost linearly to -infinity, but when theta is 1/2, I more often get a picture where both the maximum and the minimum are reached within the first few steps.
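For comparison, here is a small wrapper (my own addition, not taken from the paper) that replays the above simulation for both situations on a single plot:

simart=function(theta,n=10^4){        # log test martingale for success probability theta
  s=cumsum(sample(0:1,n,replace=TRUE,prob=c(1-theta,theta)))
  pbinom(s,1:n,.5,log.p=TRUE)-pbinom(s-1,1:n,.5,log.p=TRUE,lower.tail=FALSE)
}
plot(simart(.45),type="l",col="steelblue",ylab="log martingale") # near-linear decrease
lines(simart(.5))                     # hovers around zero, extremes typically hit early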

Obviously, I have not read the paper with the attention it deserved, so there may be features I missed that could be relevant for the Bayesian analysis of the behaviour of Bayes factors. However, at this stage, I fail to see the point of the “Puzzle for Bayesians” (Section 8.6), since the conclusion that “it is legitimate to collect data until a point has been disproven but not legitimate to interpret this data as proof of an alternative hypothesis within the model” is not at odds with a Bayesian interpretation of the test outcome: when the Bayes factor favours a model, it means this model is the most likely of the two given the data, not that this model is true.

2 Responses to “Bayes factors and martingales”

  1. Thanks a lot for this explanation, Paulo!

  2. Paulo Marques Says:

Dear Prof. Robert, this is just to share my current lack of understanding of this paper. In particular, I’d like to have a better understanding of what counts as a Bayes factor for them.

With the usual notation, we have a parameter \Theta and a vector of observations X. Let me try to make sense of their definition of a Bayes factor, at least when we have simple null and alternative hypotheses: H_0:\Theta=\theta_0 and H_1:\Theta=\theta_1.

    Define Q_i(A)=P\left(X\in A\mid\Theta=\theta_i\right), for i=0,1, and suppose that both Q_i are dominated by a \sigma-finite measure \lambda, with Radon-Nikodym derivatives dQ_i/d\lambda=f_i. Then, we have the usual result for the Bayes factor

    \frac{P\left(\Theta=\theta_0\mid X=x\right)}{P\left(\Theta=\theta_1\mid X=x\right)} \Bigg/ \frac{P\left(\Theta=\theta_0\right)}{P\left(\Theta=\theta_1\right)} = \frac{f_0(x)}{f_1(x)} = B_{01}(x) \, .

    Now, if we suppose that Q_0 and Q_1 are equivalent, using the chain rule for the Radon-Nikodym derivatives we have

    B_{01}(x) = \frac{f_0(x)}{f_1(x)} = \frac{dQ_0}{d\lambda}(x) \Bigg/ \frac{dQ_1}{d\lambda}(x) = \frac{dQ_0}{dQ_1}(x) \, .

    Since we have

    1 = Q_1(\mathcal{X}) = \int_{\mathcal{X}} \frac{dQ_1}{dQ_0} \,dQ_0 \, ,

    the Bayes factor satisfies

    \int_{\mathcal{X}} \frac{1}{B_{01}} \,dQ_0 = 1 \, .

    It seems that they take this property as their definition of a Bayes factor. Actually, they do more: when Q_0 and Q_1 are not equivalent, using the Lebesgue decomposition we can write

    Q_1 = Q_1^{ac} + Q_1^s \, ,

    where Q_1^{ac} is dominated by Q_0, and Q_1^s and Q_0 are mutually singular, so that

    Q_1(A) = \int_A \frac{dQ_1^{ac}}{dQ_0} \,dQ_0 + Q_1^s(A) \, ,

    and then, since Q_1(\mathcal{X})=1, we have

\int_{\mathcal{X}} \frac{dQ_1^{ac}}{dQ_0} \,dQ_0 \leq 1 \, .

Now, as far as I can see, they make an analogy with the former case and come to their definition of a Bayes factor as any B_{01} that satisfies

    \int_{\mathcal{X}} \frac{1}{B_{01}} \,dQ_0 \leq 1 \, .

I can’t see clearly the advantage of this extended definition with the inequality sign. Also, I don’t understand how all of this extends to cases where we have composite hypotheses.
