## using mixtures towards Bayes factor approximation

Posted in Statistics, Travel, University life on December 11, 2014 by xi'an

Phil O’Neill and Theodore Kypraios from the University of Nottingham arXived last week a paper on “Bayesian model choice via mixture distributions with application to epidemics and population process models”. Since we discussed this paper during my visit there earlier this year, I was definitely looking forward to the completed version of their work. Especially because there are some superficial similarities with our most recent work on… Bayesian model choice via mixtures! (To the point that I initially mistook their proposal for ours…)

The central idea in the paper is that, by considering the mixture likelihood

$\alpha\ell_1(\theta_1|\mathbf{x})+(1-\alpha)\ell_2(\theta_2|\mathbf{x})$

where **x** corresponds to the entire sample, it is straightforward to relate the moments of α to the Bayes factor, namely

$\mathfrak{B}_{12}=\dfrac{\mathbb{E}[\alpha]-\mathbb{E}[\alpha^2]-\mathbb{E}[\alpha|\mathbf{x}](1-\mathbb{E}[\alpha])}{\mathbb{E}[\alpha]\mathbb{E}[\alpha|\mathbf{x}]-\mathbb{E}[\alpha^2]}$

which means that estimating the mixture weight α by MCMC is equivalent to estimating the Bayes factor.
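
As a sanity check on this identity, here is a minimal sketch assuming a Beta(a, b) prior on α and two known marginal likelihoods, written `m1` and `m2` (hypothetical names and toy values, not taken from the paper). Integrating the parameters out of the mixture likelihood gives the posterior mean of α in closed form, and plugging it into the formula above should return the ratio `m1 / m2`.

```python
def beta_prior_moments(a, b):
    """First and second moments of alpha under a Beta(a, b) prior."""
    mean = a / (a + b)
    second = a * (a + 1) / ((a + b) * (a + b + 1))
    return mean, second


def posterior_mean_alpha(a, b, m1, m2):
    """E[alpha | x] when the 'likelihood' of alpha is alpha*m1 + (1-alpha)*m2."""
    e1, e2 = beta_prior_moments(a, b)
    return (e2 * m1 + (e1 - e2) * m2) / (e1 * m1 + (1 - e1) * m2)


def bayes_factor_from_alpha(a, b, post_mean):
    """Invert the relation: recover B12 from prior moments and E[alpha | x]."""
    e1, e2 = beta_prior_moments(a, b)
    return (e1 - e2 - post_mean * (1 - e1)) / (e1 * post_mean - e2)


# Toy values (assumptions): marginal likelihoods of the two models.
m1, m2 = 0.3, 0.1
post_mean = posterior_mean_alpha(2.0, 3.0, m1, m2)
recovered = bayes_factor_from_alpha(2.0, 3.0, post_mean)
print(post_mean, recovered, m1 / m2)  # the last two values should agree
```

In practice the marginal likelihoods are unknown and the posterior mean of α would come from MCMC output, which is the whole point of the approach. Note that the denominator vanishes when the posterior mean reaches its upper bound (a+1)/(a+b+1), so the estimate becomes unstable when one model dominates.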

What puzzled me at first was that the mixture weight is ultimately estimated from a single “datapoint”, made of the entire sample. So the posterior distribution on α is hardly different from the prior, since it is updated by a single unit! But I came to realise that this is a numerical tool and that the estimator of α is not meaningful from a statistical viewpoint (thus differing completely from our perspective). This explains why the Beta prior on α can be freely chosen so that the mixing and stability of the Markov chain are improved: this parameter is solely an algorithmic entity.

There are similarities between this approach and the pseudo-prior encompassing perspective of Carlin and Chib (1995), even though the current version does not require pseudo-priors, using true priors instead. But thinking of weakly informative priors and of the MCMC consequence (see below) leads me to wonder if pseudo-priors would not help in this setting…

Another aspect of the paper that still puzzles me is that the MCMC algorithm mixes at all: indeed, depending on the value of the binary latent variable z, one of the two parameters is updated from the true posterior while the other is updated from the prior. It thus seems unlikely that the value of z would change quickly. Creating a huge imbalance in the prior can counteract this difference, but the same problem occurs once z has moved from 0 to 1 or from 1 to 0. It seems to me that resorting to a common parameter [if possible] and using as a proposal the model-based posteriors for both parameters is the only way out of this conundrum. (We do certainly insist on this common parametrisation in our approach as it is paramount to the use of improper priors.)
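
To make the mixing concern concrete, here is a minimal Gibbs sketch on a toy problem of my own choosing, not the authors' algorithm or their epidemic models: two Gaussian models with known standard deviations `sigmas` and a N(0,1) prior on each mean. Conditional on the latent z, the parameter of the selected model is drawn from its posterior and the other one from its prior, exactly the asymmetry described above. All names (`gibbs`, `draw_theta`, `log_lik`) are hypothetical.

```python
import math
import random


def log_lik(xs, theta, sigma):
    """Gaussian log-likelihood of the whole sample xs under N(theta, sigma^2)."""
    ss = sum((x - theta) ** 2 for x in xs)
    return -0.5 * len(xs) * math.log(2 * math.pi * sigma ** 2) - 0.5 * ss / sigma ** 2


def draw_theta(xs, sigma, use_data, rng):
    """Draw theta from its N(0,1) prior, or from the conjugate posterior if use_data."""
    if not use_data:
        return rng.gauss(0.0, 1.0)
    prec = 1.0 + len(xs) / sigma ** 2
    return rng.gauss((sum(xs) / sigma ** 2) / prec, math.sqrt(1.0 / prec))


def gibbs(xs, a=1.0, b=1.0, sigmas=(1.0, 2.0), n_iter=2000, seed=0):
    """Gibbs sampler on (theta1, theta2, z, alpha) for the one-datum mixture."""
    rng = random.Random(seed)
    alpha, z = 0.5, 1
    alphas, zs = [], []
    for _ in range(n_iter):
        # The model picked by z sees the data; the other is refreshed from its prior.
        th1 = draw_theta(xs, sigmas[0], z == 1, rng)
        th2 = draw_theta(xs, sigmas[1], z == 0, rng)
        # z | alpha, theta: Bernoulli with odds alpha*l1 : (1-alpha)*l2.
        d = (math.log1p(-alpha) + log_lik(xs, th2, sigmas[1])
             - math.log(alpha) - log_lik(xs, th1, sigmas[0]))
        p1 = 0.0 if d > 700 else 1.0 / (1.0 + math.exp(d))
        z = 1 if rng.random() < p1 else 0
        # alpha | z: Beta(a + z, b + 1 - z), clipped away from 0 and 1 for the logs.
        alpha = min(max(rng.betavariate(a + z, b + 1 - z), 1e-12), 1 - 1e-12)
        alphas.append(alpha)
        zs.append(z)
    return alphas, zs


xs = [0.3, -0.5, 1.1]  # toy data (assumption)
a, b = 1.0, 1.0
alphas, zs = gibbs(xs, a, b)
# Rao-Blackwellised estimate of E[alpha | x], using E[alpha | z] = (a + z) / (a + b + 1).
rb_post_mean = (a + sum(zs) / len(zs)) / (a + b + 1)
print(rb_post_mean, sum(zs) / len(zs))
```

With three observations and overlapping models, z switches often. Increasing the sample size or separating the models makes prior draws of the inactive parameter increasingly unlikely to be accepted, which is the stickiness I worry about.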

“In contrast, we consider the case where there is only one datum.”

The idea in the paper is therefore fully computational and relates to other linkage methods that create bridges between two models. It differs from our new notion of Bayesian testing in that we consider estimating the mixture between the two models under comparison, hence considering instead the mixture

$\prod_{i=1}^n\left\{\alpha f_1(x_i|\theta_1)+(1-\alpha) f_2(x_i|\theta_2)\right\}$

which is another model altogether and does not recover the original Bayes factor (a Bayes factor that we dismiss altogether in favour of the posterior median of α and its entire distribution).

## Seminar in Nottingham

Posted in Kids, pictures, Running, Statistics, Travel, University life on March 26, 2014 by xi'an

Last Thursday, I gave a seminar in Nottingham, the true birthplace of the Gibbs sampler! I had a quite enjoyable half-day of scientific discussions in the Department of Statistics, with a fine evening tasting a local ale in the oldest (?) inn in England (Ye Olde Trip to Jerusalem) and sampling Indian dishes at 4550 Miles from Delhi (plus or minus epsilon, since the genuine distance is 4200 miles), plus a short morning run on the very green campus. In particular, I discussed with Theo Kypraios and Simon Preston parallel ABC and their recent paper in Statistics and Computing, and their use of the splitting technique of Neiswanger et al., which I discussed earlier but which is aimed here at a better ABC approximation since (a) each term in the product could correspond to a single observation and (b) hence no summary statistic was needed and a zero tolerance could be envisioned. The paper discusses how to handle samples from terms in a product of densities, either by a Gaussian approximation or by a product of kernel estimates. It also mentions connections with expectation propagation (EP), albeit not at the ABC level.

A minor idea that came to me during this discussion was to check whether or not a reparameterisation towards a uniform prior was a good idea: the plus of a uniform prior was that the power discussion was irrelevant, making both versions of the parallel MCMC algorithm coincide. The minus was not the computational issue, since most priors are from standard families with easily invertible cdfs, but rather why this was supposed to make a difference. When writing this on the train to Oxford, I started wondering about this, since an ABC implementation is impervious to this reparameterisation. Indeed, simulating θ from π and pseudo-data given θ versus simulating μ from a uniform and pseudo-data given T(μ) does not make a difference in the simulated pseudo-sample, hence in the distance-selected θ’s, and yet in one case the power does not matter while in the other case it does..!
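
The invariance can be illustrated with a minimal ABC rejection sketch, under assumptions of my own choosing rather than anything from the paper: an Exponential(1) prior with quantile function T, Gaussian pseudo-data, and the sample mean as a summary (used only to keep the toy short). With the same random stream, the θ-parameterisation (θ simulated by inversion) and the μ-parameterisation (μ uniform, pseudo-data given T(μ)) accept exactly the same values.

```python
import math
import random


def inv_cdf_exp(u, rate=1.0):
    """Quantile function T of an Exponential(rate) prior: T(U) ~ prior for U ~ U(0,1)."""
    return -math.log1p(-u) / rate


def abc_rejection(draw_param, to_theta, obs_mean, n_obs, eps, n_sims, seed):
    """Plain ABC rejection: keep theta when the pseudo-sample mean is within eps."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        theta = to_theta(draw_param(rng))
        pseudo = [rng.gauss(theta, 1.0) for _ in range(n_obs)]
        if abs(sum(pseudo) / n_obs - obs_mean) < eps:
            accepted.append(theta)
    return accepted


settings = dict(obs_mean=1.0, n_obs=5, eps=0.5, n_sims=2000, seed=42)

# theta-parameterisation: theta ~ Exp(1), simulated by inversion.
direct = abc_rejection(lambda rng: inv_cdf_exp(rng.random()), lambda t: t, **settings)

# mu-parameterisation: mu ~ U(0,1), pseudo-data simulated given T(mu).
reparam = abc_rejection(lambda rng: rng.random(), inv_cdf_exp, **settings)

print(len(direct), direct == reparam)
```

The two runs consume the uniform and Gaussian draws in the same order, so the pseudo-samples and accepted values coincide. This only shows the ABC side of the puzzle, not why the power correction differs between the two parameterisations.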

Another discussion I had during my visit led me to conclude a bit hastily that a thesis topic I had suggested to a new PhD student a few months ago had already been considered locally and earlier, although it ended up as a different, more computational than conceptual, perspective (so not all was lost for my student!). In a wider discussion around lunch, we also had an interesting foray into possible alternatives to Bayes factors and their shortcomings, which was a nice preparation for my seminar on giving up posterior probabilities for posterior error estimates. And an opportunity to mention the arXival of a proper scoring rules paper by Phil Dawid, Monica Musio and Laura Ventura, related to the one I had blogged about after the Padova workshop. And then again about a connected paper with Steve Fienberg. This lunch discussion even included some (mild) debate about Murray Aitkin’s integrated likelihood.

As a completely irrelevant aside, this trip gave me the opportunity of a “pilgrimage” to Birmingham New Street train station, 38 years after “landing” for the first time in Britain! And to experience a fresco the multiple delays and apologies of East Midlands trains (“we’re sorry we had to wait for this oil train in York”, “we have lost more time since B’ham”, “running a 37 minutes delay now”, “we apologize for the delay, due to trespassing”, …), the only positive side being that delayed trains made delayed connections possible!

## i-like Oxford [workshop, March 20-21, 2014]

Posted in Statistics, Travel, University life on February 5, 2014 by xi'an

There will be another i-like workshop this Spring, over two days in Oxford, St Anne’s College, involving talks by Xiao-Li Meng and Eric Moulines, as well as by researchers from the participating universities. Registration is now open. (I will take part as a part-time participant, travelling from Nottingham where I give a seminar on the 20th.)