## Bayes factors and martingales

A surprising paper came out in the last issue of Statistical Science, linking martingales and Bayes factors. In the historical part, the authors (Shafer, Shen, Vereshchagin and Vovk) recall that martingales were popularised by Martin-Löf, who is also influential in the theory of algorithmic randomness. A key property of test martingales (i.e., nonnegative martingales with expectation one) is that

$\mathbb{P}(X^*_t \ge c) = \mathbb{P}(\sup_{s\le t}X_s \ge c) \le 1/c$

which makes the reciprocals of their running maxima p-values of sorts. I had never thought about likelihood ratios this way, but it is true that a (reciprocal) likelihood ratio

$\prod_{i=1}^n \dfrac{q(x_i)}{p(x_i)}$

is a martingale when the observations are distributed according to p. The authors define a Bayes factor (for P) as any quantity satisfying (Section 3.2)

$\int (1/B) \text{d}P \le 1$

which I find hard to relate to my understanding of Bayes factors because there is no prior nor parameter involved. I first thought there was a restriction to simple null hypotheses. However, there is a composite versus composite example (Section 8.5, the binomial probability being less than or larger than 1/2). So P would then be the marginal likelihood. In this case the test martingale is

$X_t = \dfrac{P(B_{t+1}\le S_t)}{P(B_{t+1}\ge S_t)}\,, \quad B_t \sim \mathcal{B}(t,1/2)\,,\, S_t\sim \mathcal{B}(t,\theta)\,.$

Simulating the martingale is straightforward; however, I do not recover the picture they obtain (Fig. 6):

```r
theta=0.4  # hypothetical value; theta=0.5 corresponds to the null
x=sample(0:1,10^4,rep=TRUE,prob=c(1-theta,theta))
s=cumsum(x)
# log of the test martingale: log P(B_t <= s_t) - log P(B_t >= s_t)
ma=pbinom(s,1:10^4,.5,log.p=TRUE)-pbinom(s-1,1:10^4,.5,log.p=TRUE,lower.tail=FALSE)
plot(ma,type="l")
lines(cummin(ma),lty=2)                  # running minimum
lines(cummax(ma),lty=2,col="steelblue")  # running maximum
```


When theta is not 1/2, the sequence goes down almost linearly to minus infinity, but when theta is 1/2, I more often get a picture where the max and min are obtained in the first steps.

Obviously, I have not read the paper with the attention it deserves, so there may be features I missed that could be relevant for the Bayesian analysis of the behaviour of Bayes factors. However, at this stage, I fail to see the point of the “Puzzle for Bayesians” (Section 8.6), since the conclusion that “it is legitimate to collect data until a point has been disproven but not legitimate to interpret this data as proof of an alternative hypothesis within the model” is not at odds with a Bayesian interpretation of the test outcome: when the Bayes factor favours a model, it means this model is the more likely of the two given the data, not that this model is true.
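As a quick sanity check of the two martingale facts above, here is a minimal sketch (in Python rather than R, purely for illustration; the Bernoulli distributions, the threshold c, and the seed are arbitrary choices): the one-step reciprocal likelihood ratio has expectation one under p, and the running maximum of the product respects the 1/c bound.

```python
import random
from fractions import Fraction

# (1) Exact one-step check that E_p[q(X)/p(X)] = 1, so the product of
# such ratios is a test martingale under p (Bernoulli example).
p = {0: Fraction(3, 10), 1: Fraction(7, 10)}  # data-generating distribution
q = {0: Fraction(1, 2),  1: Fraction(1, 2)}   # alternative distribution
one_step = sum((q[x] / p[x]) * p[x] for x in (0, 1))
assert one_step == 1

# (2) Monte Carlo check of P(sup_{s<=t} X_s >= c) <= 1/c for the
# likelihood-ratio martingale X_t = prod q(x_i)/p(x_i) under p.
random.seed(42)

def sup_martingale(t=200):
    x, sup = 1.0, 1.0
    for _ in range(t):
        obs = 1 if random.random() < 0.7 else 0  # draw from p
        x *= float(q[obs] / p[obs])
        sup = max(sup, x)
    return sup

c, reps = 4.0, 10_000
freq = sum(sup_martingale() >= c for _ in range(reps)) / reps
assert freq <= 1 / c  # the maximal inequality, empirically
print(one_step, freq)
```

The empirical frequency is typically well below the 1/c bound, since the bound is only attained in limiting cases.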

### 2 Responses to “Bayes factors and martingales”

1. Thanks a lot for this explanation, Paulo!

2. Paulo Marques Says:

Dear Prof. Robert, this is just to share my current lack of understanding of this paper. In particular, I’d like to have a better understanding of what counts as a Bayes factor for them.

With the usual notation, we have a parameter $\Theta$ and a vector of observations $X$. Let me try to make sense of their definition of a Bayes factor, at least when we have simple null and alternative hypotheses: $H_0:\Theta=\theta_0$ and $H_1:\Theta=\theta_1$.

Define $Q_i(A)=P\left(X\in A\mid\Theta=\theta_i\right)$, for $i=0,1$, and suppose that both $Q_i$ are dominated by a $\sigma$-finite measure $\lambda$, with Radon-Nikodym derivatives $dQ_i/d\lambda=f_i$. Then, we have the usual result for the Bayes factor

$\frac{P\left(\Theta=\theta_0\mid X=x\right)}{P\left(\Theta=\theta_1\mid X=x\right)} \Bigg/ \frac{P\left(\Theta=\theta_0\right)}{P\left(\Theta=\theta_1\right)} = \frac{f_0(x)}{f_1(x)} = B_{01}(x) \, .$
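To make this identity concrete, here is a minimal check (a Python sketch with made-up prior weights and densities on a two-point sample space) that the posterior-to-prior odds ratio reduces to $f_0/f_1$:

```python
from fractions import Fraction

# Check that (posterior odds)/(prior odds) equals the density ratio
# f0/f1 for simple hypotheses; all numbers are illustrative.
pi0, pi1 = Fraction(1, 3), Fraction(2, 3)    # prior probabilities
f0 = {0: Fraction(1, 4), 1: Fraction(3, 4)}  # density under H0
f1 = {0: Fraction(3, 5), 1: Fraction(2, 5)}  # density under H1

x = 1  # observed value
evidence = pi0 * f0[x] + pi1 * f1[x]
post0 = pi0 * f0[x] / evidence  # posterior probability of H0
post1 = pi1 * f1[x] / evidence  # posterior probability of H1
bayes_factor = (post0 / post1) / (pi0 / pi1)
assert bayes_factor == f0[x] / f1[x]
print(bayes_factor)  # 15/8
```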

Now, if we suppose that $Q_0$ and $Q_1$ are equivalent, using the chain rule for the Radon-Nikodym derivatives we have

$B_{01}(x) = \frac{f_0(x)}{f_1(x)} = \frac{dQ_0}{d\lambda}(x) \Bigg/ \frac{dQ_1}{d\lambda}(x) = \frac{dQ_0}{dQ_1}(x) \, .$

Since we have

$1 = Q_1(\mathcal{X}) = \int_{\mathcal{X}} \frac{dQ_1}{dQ_0} \,dQ_0 \, ,$

the Bayes factor satisfies

$\int_{\mathcal{X}} \frac{1}{B_{01}} \,dQ_0 = 1 \, .$
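A toy numerical confirmation of this equality (a Python sketch with arbitrary discrete distributions sharing a common support, so that $Q_0$ and $Q_1$ are equivalent):

```python
from fractions import Fraction

# In the equivalent case, B01 = dQ0/dQ1 and the integral of 1/B01
# against Q0 is exactly one, since it telescopes to Q1's total mass.
q0 = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
q1 = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}

B01 = {x: q0[x] / q1[x] for x in q0}
integral = sum(q0[x] / B01[x] for x in q0)  # = sum of q1 = 1
assert integral == 1
print(integral)  # 1
```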

It seems that they take this property as their definition of a Bayes factor. Actually, they do more: when $Q_0$ and $Q_1$ are not equivalent, using the Lebesgue decomposition we can write

$Q_1 = Q_1^{ac} + Q_1^s \, ,$

where $Q_1^{ac}$ is dominated by $Q_0$, and $Q_1^s$ and $Q_0$ are mutually singular, so that

$Q_1(A) = \int_A \frac{dQ_1^{ac}}{dQ_0} \,dQ_0 + Q_1^s(A) \, ,$

and then, since $Q_1(\mathcal{X})=1$, we have

$\int_{\mathcal{X}} \frac{dQ_1^{ac}}{dQ_0} \,dQ_0 \leq 1 \, .$
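For instance (a hypothetical discrete example), if $Q_1$ puts mass on a point outside the support of $Q_0$, the absolutely continuous part integrates to $1 - Q_1^s(\mathcal{X}) < 1$:

```python
from fractions import Fraction

# Q1 places mass 1/2 at x=2, which lies outside the support of Q0,
# so the measures are not equivalent. Illustrative numbers only.
q0 = {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(0)}
q1 = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}

# Integral of dQ1^ac/dQ0 against Q0 = Q1(support of Q0):
integral = sum(q1[x] for x in q0 if q0[x] > 0)
# Mass of the singular part Q1^s:
singular_mass = sum(q1[x] for x in q1 if q0[x] == 0)
assert integral == 1 - singular_mass < 1
print(integral)  # 1/2
```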

Now, as it seems to me, they make an analogy with the former case and come to their definition of a Bayes factor as any $B_{01}$ that satisfies

$\int_{\mathcal{X}} \frac{1}{B_{01}} \,dQ_0 \leq 1 \, .$

I can’t clearly see the advantage of this extended definition with the inequality sign. Also, I don’t understand how all of this extends to cases where we have composite hypotheses.
