## an infinite regress of hierarchical priors

Posted in Statistics on October 22, 2020 by xi'an

An interesting musing posted on X validated about the impact of piling up prior models, each on the parameter of the prior closer to the data, till infinity. The question uses a hierarchy of exponential priors and an exponential sampling distribution. If the (temporary) top prior at level d is Exp(1), the marginal distribution of the exponential observation corresponds to a ratio of two independent products of Exp(1) random variables, with the even-indexed variables in the numerator and the odd-indexed ones in the denominator:

$X= \frac{\epsilon_{2\lfloor d/2 \rfloor}\cdots \epsilon_0}{\epsilon_{2\lfloor (d-1)/2 \rfloor+1}\cdots \epsilon_1}$

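Both products converge almost surely to zero as d grows, by Kakutani’s product martingale theorem: the factors are independent with mean one, while $\mathbb{E}[\sqrt{\epsilon_i}]=\Gamma(3/2)=\sqrt{\pi}/2<1$, so the infinite product of these expectations is zero. The limit is thus an indeterminate 0/0 ratio. Hierarchy has to stop somewhere! (Or, assuming an expectation of one everywhere, the variability at each level has to decrease fast enough.)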
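Both claims can be checked by a quick Monte Carlo experiment. This is a minimal sketch under an assumed rate parametrisation (X given λ₀ is Exp(λ₀), each λ_{i-1} given λ_i is Exp(λ_i), the top level being Exp(1)), and the helpers `sim_hier` and `sim_ratio` are hypothetical names. It compares the hierarchy simulated top-down with the ratio representation for d = 4, then follows the logarithms of the even and odd products as the number of levels grows.

```r
# Minimal sketch, assuming the rate parametrisation:
# X | lambda_0 ~ Exp(lambda_0), lambda_{i-1} | lambda_i ~ Exp(lambda_i),
# with the top level of a d-level hierarchy distributed as Exp(1).
set.seed(2020)

# Hypothetical helper: simulate X by descending the hierarchy from the top
sim_hier <- function(d) {
  lambda <- rexp(1, rate = 1)                       # top prior Exp(1)
  for (i in seq_len(d - 1)) lambda <- rexp(1, rate = lambda)
  rexp(1, rate = lambda)                            # exponential observation
}

# Hypothetical helper: X as a ratio of products of Exp(1) variables,
# eps_0, eps_2, ... in the numerator and eps_1, eps_3, ... in the denominator
sim_ratio <- function(d) {
  eps <- rexp(d + 1)                                # eps_0, ..., eps_d
  idx <- 0:d
  prod(eps[idx %% 2 == 0]) / prod(eps[idx %% 2 == 1])
}

d <- 4
n <- 10^4
log_hier  <- log(replicate(n, sim_hier(d)))
log_ratio <- log(replicate(n, sim_ratio(d)))
# Both samples of log X should agree in distribution; their mean is
# digamma(1) = -0.5772..., since E[log eps] = digamma(1) for eps ~ Exp(1)
print(c(hier = mean(log_hier), ratio = mean(log_ratio), theory = digamma(1)))
print(rbind(hier = quantile(log_hier, c(.1, .5, .9)),
            ratio = quantile(log_ratio, c(.1, .5, .9))))

# Kakutani: E[sqrt(eps)] = gamma(3/2) = sqrt(pi)/2 < 1 for eps ~ Exp(1)
mc_sqrt <- mean(sqrt(rexp(10^5)))
print(c(exact = gamma(1.5), monte_carlo = mc_sqrt))
eps <- rexp(2 * 500)
log_num <- cumsum(log(eps[c(TRUE, FALSE)]))         # log(eps_0 eps_2 ...)
log_den <- cumsum(log(eps[c(FALSE, TRUE)]))         # log(eps_1 eps_3 ...)
print(tail(cbind(log_num, log_den), 3))             # both drift to -Inf
```

Each log-product behaves like a random walk with negative drift E[log ε] ≈ -0.577 per level, which is why both products vanish and their ratio has no meaningful limit.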

Posted in Books, Statistics on June 22, 2018 by xi'an

Bernard Delyon and François Portier recently arXived a paper on population or evolutionary importance sampling, which Víctor Elvira pointed out to me. The proposal (or importance sampler) changes at each iteration, and the estimates are averaged across iterations. The paper also mentions AMIS, while drawing a distinction with it that I do not understand, since the simulation cost remains the same with AMIS while the variance of the resulting estimator improves. (But the paper points out later that their martingale technique of proof does not apply in this AMIS case.) Some interesting features of the paper are that

• convergence occurs when the total number of simulations grows to infinity, which is the most reasonable scale for assessing the worth of the method;
• some optimality in the oracle sense is established for the method;
• an improvement is found by eliminating outliers and favouring the update rate over the simulation rate (at a constant cost). Unsurprisingly, the optimal weight of the t-th estimator is given by its inverse variance (with eqn (13) missing an inversion step). This optimum however relies on the normalised versions of the target and proposal densities, since it assumes the expectation of their ratio is equal to one.
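Inverse-variance weighting is easy to illustrate on a toy case. The sketch below is not the paper's scheme (eqn (13) is not reproduced): it assumes a normalised N(2, 1) target, the integrand h(x) = x, and four fixed Gaussian proposals standing in for successive iterations, all of them hypothetical choices. Because both densities are normalised, the importance weights have expectation one and each per-iteration estimate is unbiased; the estimates are then combined with weights proportional to their inverse estimated variances.

```r
# Minimal sketch (hypothetical set-up): estimate E[X] = 2 under a normalised
# N(2, 1) target, with one importance sampling estimate per "iteration".
set.seed(13)
n <- 5000
prop_sd <- c(1.5, 2, 4, 8)                 # proposal N(1, prop_sd^2) at step k
est <- v <- numeric(length(prop_sd))
for (k in seq_along(prop_sd)) {
  x <- rnorm(n, mean = 1, sd = prop_sd[k])
  w <- dnorm(x, 2, 1) / dnorm(x, 1, prop_sd[k])   # normalised densities, E[w] = 1
  est[k] <- mean(w * x)                    # unbiased estimate of E[X]
  v[k] <- var(w * x) / n                   # estimated variance of est[k]
}
# Combination of independent unbiased estimates with weights prop. to 1/variance
alpha <- (1 / v) / sum(1 / v)
combined <- sum(alpha * est)
combined_var <- 1 / sum(1 / v)             # never larger than min(v)
print(round(rbind(estimate = est, sd = sqrt(v), weight = alpha), 3))
print(c(combined = combined, sd = sqrt(combined_var)))
```

Plugging estimated variances into the weights makes the combination only approximately optimal, and slightly biased, since each variance estimate is computed from the same draws as its estimate.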

When updating the proposal or importance distribution, the authors consider a parametric family with the parameter update driven by moment or generalised moment matching, or by Kullback-Leibler reduction as in our population Monte Carlo paper. The interesting technical aspects of the paper include the use of martingale and empirical risk arguments. All in all, it was quite a pleasant surprise to see some follow-up to our work on that topic, more than 10 years later.
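A minimal sketch of an update step driven by moment matching, with hypothetical choices throughout: a normalised two-component Gaussian mixture target and a Gaussian proposal family. For a Gaussian family, matching the self-normalised importance-weighted mean and variance is also what minimising the Kullback-Leibler divergence from the target to the proposal gives, so the two kinds of update coincide here. This only illustrates the update, not the authors' algorithm.

```r
# Minimal sketch (hypothetical target and settings): adaptive importance
# sampling with a Gaussian proposal updated by weighted moment matching.
set.seed(7)
# Normalised target 0.3 N(-1, 0.5^2) + 0.7 N(3, 1): mean 1.8, variance 4.135
target <- function(x) 0.3 * dnorm(x, -1, 0.5) + 0.7 * dnorm(x, 3, 1)
mu <- 0; sigma <- 5                        # deliberately poor starting proposal
n <- 5000
for (iter in 1:10) {
  x <- rnorm(n, mu, sigma)
  w <- target(x) / dnorm(x, mu, sigma)     # importance weights
  wn <- w / sum(w)                         # self-normalised weights
  ess <- 1 / sum(wn^2)                     # effective sample size of this step
  mu <- sum(wn * x)                        # match the weighted mean...
  sigma <- sqrt(sum(wn * (x - mu)^2))      # ...and the weighted variance
  cat(sprintf("iter %2d  mu %6.3f  sigma %6.3f  ESS %6.0f\n",
              iter, mu, sigma, ess))
}
```

The proposal settles near the target mean and standard deviation after a single step, and the effective sample size then stays at a stable fraction of n.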

## Peter Hall (1951-2016)

Posted in Books, Statistics, Travel, University life on January 10, 2016 by xi'an

I just heard that Peter Hall passed away yesterday in Melbourne. Very sad news from down under. Besides being a giant in the fields of statistics and probability, with an astounding publication record, Peter was also a wonderful man and so very much involved in running local, national and international societies. His contributions to the field and the profession are innumerable and his loss impacts the entire community. Peter was a regular visitor at Glasgow University in the 1990s and I crossed paths with him a few times, appreciating his kindness as well as his utmost dedication to research. In addition, he was a gifted photographer and I recall that the [now closed] wonderful guest-house where we used to stay at the top of Hillhead had a few pictures of his taken in the Highlands and framed on its walls. (If I remember well, there were also beautiful pictures of the Belgian countryside by him at CORE, in Louvain-la-Neuve.) I think the last time we met was in Melbourne, three years ago… Farewell, Peter, you certainly left an indelible print on a lot of us.

[Song Chen from Beijing University has created a memorial webpage for Peter Hall to express condolences and share memories.]

## On Congdon’s estimator

Posted in Statistics, University life on August 29, 2011 by xi'an

I got the following email from Bob:

I’ve been looking at some methods for Bayesian model selection, and read your critique in Bayesian Analysis of Peter Congdon’s method. I was wondering if it could be fixed simply by including the prior densities of the pseudo-priors in the calculation of P(M=k|y), i.e. simply removing the approximation in Congdon’s eqn. 3 so that the product over the parameters of the other models (i.e. j≠k) is included in the calculation of $P(M=k|y, \theta^{(t)})$? This seems an easy fix, so I’m wondering why you didn’t suggest it.

This relates to our Bayesian Analysis criticism of Peter Congdon’s approximation of posterior model probabilities. The difficulty with the estimator is that it uses simulations from the separate [model-based] posteriors when it should rely on simulations from the marginal [model-integrated] posterior (in order to satisfy an unbiasedness property). After a few email exchanges with Bob, I think I now correctly understand the fix he proposes, i.e. that the “other model” parameters are simulated from the corresponding model-based posteriors, rather than being jointly simulated with the parameter of the “current model” from the joint posterior. However, the correct weight in Carlin and Chib’s approximation then involves the product of the [model-based] posteriors (including their normalisation constants) as “pseudo-priors”. I also think that, even if the exact [model-based] posteriors were used, the fact that the weight involves a product over a large number of densities should induce asymmetric behaviour. Indeed this product, while on average equal to one (or 1/M if M is the number of models), is more likely to take very small values than very large values (by a supermartingale argument)…
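The asymmetry of such a product shows up on a toy example. In this minimal sketch everything is hypothetical: M independent ratios q(θ_j)/p(θ_j), with θ_j drawn from p = N(0, 1) and q = N(0.5, 1), stand in for the density ratios entering the weight. Each ratio has expectation one, but by Jensen's inequality its logarithm has a negative mean, so the product keeps mean one while its median collapses as M grows.

```r
# Toy sketch (hypothetical densities): product of M mean-one density ratios
set.seed(11)
M <- 20                                    # number of factors in the product
reps <- 10^5
theta <- matrix(rnorm(M * reps), nrow = reps)        # theta_j ~ p = N(0, 1)
log_ratio <- matrix(dnorm(theta, 0.5, 1, log = TRUE) -
                    dnorm(theta, 0, 1, log = TRUE), nrow = reps)
log_prod <- rowSums(log_ratio)             # log of prod_j q(theta_j) / p(theta_j)
# Exact for this toy case: each log-ratio is N(-1/8, 1/4), so the log-product
# is N(-M/8, M/4); the product has mean one but median exp(-M/8)
frac_below_one <- mean(log_prod < 0)
print(c(frac_below_one = frac_below_one, exact = pnorm(sqrt(M) / 4)))
print(c(median_mc = exp(median(log_prod)), median_exact = exp(-M / 8)))
```

With M = 20 the product falls below one in about 87% of the replications and its median is about 0.08, even though its expectation is exactly one.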

## Bayes factors and martingales

Posted in R, Statistics on August 11, 2011 by xi'an

A surprising paper came out in the last issue of Statistical Science, linking martingales and Bayes factors. In the historical part, the authors (Shafer, Shen, Vereshchagin and Vovk) recall that martingales were popularised by Martin-Löf, who is also influential in the theory of algorithmic randomness. A property of test martingales (i.e., non-negative martingales with expectation one) is that

$\mathbb{P}(X^*_t \ge c) = \mathbb{P}(\sup_{s\le t}X_s \ge c) \le 1/c$

which makes the inverses of their running maxima p-values of sorts, since taking $c=1/\alpha$ gives $\mathbb{P}(1/X^*_t \le \alpha) \le \alpha$. I had never thought about likelihood ratios this way, but it is true that a (reciprocal) likelihood ratio

$\prod_{i=1}^n \dfrac{q(x_i)}{p(x_i)}$

is a martingale when the observations are distributed from p, since each factor q(x_i)/p(x_i) then has expectation one. The authors define a Bayes factor (for P) as satisfying (Section 3.2)

$\int (1/B) \text{d}P \le 1$

which I find hard to relate to my understanding of Bayes factors because there is neither prior nor parameter involved. I first thought there was a restriction to simple null hypotheses. However, there is a composite versus composite example (Section 8.5, Binomial probability being less than or larger than 1/2). So P would then be the marginal likelihood. In this case the test martingale is

$X_t = \dfrac{P(B_{t+1}\le S_t)}{P(B_{t+1}\ge S_t)}\,, \quad B_t \sim \mathcal{B}(t,1/2)\,,\, S_t\sim \mathcal{B}(t,\theta)\,.$
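Two points of the argument above can be checked numerically. This is a minimal sketch under hypothetical settings, not the authors' code. Part (i) simulates running likelihood ratios with data from p = N(0, 1) and q = N(0.3, 1), and estimates how often the running maximum reaches c = 10, which the maximal inequality bounds by 1/c. Part (ii) assumes a uniform prior on θ (the post does not state the prior). Under that prior the Bayes factor of θ > 1/2 against θ ≤ 1/2 equals the posterior odds, since the prior odds are one. Since P(Beta(s+1, t−s+1) ≤ 1/2) = P(B_{t+1} ≥ s+1), these odds equal P(B_{t+1} ≤ S_t)/P(B_{t+1} ≥ S_t + 1), so under this particular prior the denominator carries a strict inequality. The helpers `post_odds_beta` and `post_odds_binom` are hypothetical names used to check this identity.

```r
set.seed(3)

## (i) running likelihood ratios under p = N(0, 1), with q = N(delta, 1)
n_paths <- 5000; len <- 500; c_level <- 10
delta <- 0.3
x <- matrix(rnorm(len * n_paths), nrow = len)    # data from p, one path per column
log_incr <- delta * x - delta^2 / 2              # log q(x_i) - log p(x_i)
log_max <- apply(log_incr, 2, function(z) max(cumsum(z)))  # log running maximum
hit <- mean(log_max >= log(c_level))             # estimates P(X*_len >= c)
mean_factor <- mean(exp(delta * rnorm(10^5) - delta^2 / 2))  # E_p[q/p] = 1
print(c(fraction_hitting = hit, bound = 1 / c_level, mean_factor = mean_factor))

## (ii) Binomial example, assuming a uniform prior on theta
post_odds_beta <- function(s, t) {   # P(theta > 1/2 | s) / P(theta <= 1/2 | s)
  pbeta(0.5, s + 1, t - s + 1, lower.tail = FALSE) / pbeta(0.5, s + 1, t - s + 1)
}
post_odds_binom <- function(s, t) {  # P(B_{t+1} <= s) / P(B_{t+1} >= s + 1)
  pbinom(s, t + 1, 0.5) / pbinom(s, t + 1, 0.5, lower.tail = FALSE)
}
tt <- c(10, 50, 200); ss <- c(3, 30, 90)
print(cbind(t = tt, s = ss, beta = post_odds_beta(ss, tt),
            binom = post_odds_binom(ss, tt)))
```

The hitting frequency in (i) typically lands somewhat below 1/c, because the random walk overshoots the level and the horizon is finite.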

Simulating the martingale is straightforward; however, I do not recover the picture they obtain (Fig. 6):

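```r
theta=.5    # success probability; try a value below 1/2, e.g. theta=.4
n=10^4
x=sample(0:1,n,replace=TRUE,prob=c(1-theta,theta))
s=cumsum(x) # S_t, number of successes after t trials
# log X_t = log P(B_{t+1} <= S_t) - log P(B_{t+1} >= S_t), with B_{t+1} ~ B(t+1,1/2)
ma=pbinom(s,2:(n+1),.5,log.p=TRUE)-pbinom(s-1,2:(n+1),.5,log.p=TRUE,lower.tail=FALSE)
plot(ma,type="l")
lines(cummax(ma),lty=2) # running maximum
lines(cummin(ma),lty=2) # running minimum
lines(log(0.1)+0.9*cummax(ma),lty=2,col="steelblue") # log of 0.1*(running max)^0.9
```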


When theta is below 1/2, the sequence goes down almost linearly to -infinity (and, symmetrically, up to +infinity when theta is above 1/2).

But when theta is 1/2, I more often get a picture where both the maximum and the minimum are reached within the first steps.

Obviously, I have not read the paper with the attention it deserved, so there may be features I missed that could be relevant for the Bayesian analysis of the behaviour of Bayes factors. However, at this stage, I fail to see the point of the “Puzzle for Bayesians” (Section 8.6) since the conclusion that “it is legitimate to collect data until a point has been disproven but not legitimate to interpret this data as proof of an alternative hypothesis within the model” is not at odds with a Bayesian interpretation of the test outcome: when the Bayes factor favours a model, it means that this model is the more likely of the two given the data, not that this model is true.