Archive for testing as mixture estimation

mixture modelling for testing hypotheses

Posted in Books, Statistics, University life on January 4, 2019 by xi'an

After a fairly long delay (the first version was posted and submitted in December 2014), we eventually revised and resubmitted our paper with Kaniav Kamary [who has now graduated], Kerrie Mengersen, and Judith Rousseau on the final day of 2018. The main reason for this massive delay is mine, as I got fairly depressed by the general tone of the dozen reviews we received after submitting the paper as a Read Paper in the Journal of the Royal Statistical Society. This was despite a rather opposite reaction from the community (an admittedly biased sample!), including two dozen citations in other papers. (There seems to be a pattern in my submissions of Read Papers, witness our earlier and unsuccessful attempt with Christophe Andrieu in the early 2000s with the paper on controlled MCMC, leading to 121 citations so far according to Google Scholar.)

Anyway, thanks to my co-authors keeping up the fight, we started working on a revision including stronger convergence results, managing to show that the approach leads to an optimal separation rate, contrary to the Bayes factor, which has an extra √log(n) factor. This may sound paradoxical since, while the Bayes factor converges to 0 under the alternative model exponentially quickly, the convergence rate of the mixture weight α to 1 is of order 1/√n, but this does not mean that the separation rate of the procedure based on the mixture model is worse than that of the Bayes factor. On the contrary, while it is well known that the Bayes factor leads to a separation rate of order √log(n)/√n in parametric models, we show that our approach can lead to a testing procedure with a better separation rate of order 1/√n. We also studied a non-parametric setting where the null is a specified family of distributions (e.g., Gaussians) and the alternative is a Dirichlet process mixture, establishing that the posterior distribution concentrates around the null at the rate √log(n)/√n.
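Schematically, and leaving aside constants and regularity conditions, the parametric comparison reads as follows, writing ε_n for the separation rate (a notation introduced here for illustration only):

\epsilon_n^{\text{mix}}\asymp\dfrac{1}{\sqrt{n}}\qquad\text{versus}\qquad\epsilon_n^{\text{BF}}\asymp\sqrt{\dfrac{\log n}{n}}=\sqrt{\log n}\;\epsilon_n^{\text{mix}}

so that, in this schematic form, the mixture-based test separates alternatives that are closer to the null by a factor √log(n).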
We thus resubmitted the paper for publication, although not as a Read Paper, with hopefully more luck this time!

seminar at Harvard

Posted in Statistics, Travel on March 16, 2016 by xi'an

Next week, I will be at Harvard on Monday and Tuesday, visiting friends in the Department of Statistics and giving a seminar. The slides for the talk will be quite similar to those of my talk in Bristol a few weeks ago. Hopefully, there will not be too much overlap between both audiences! And hopefully I’ll manage to get to my conclusion before all hell breaks loose (which is why I strategically set my conclusion in the early slides!)

ASA’s statement on p-values [#2]

Posted in Books, Kids, Statistics, University life on March 9, 2016 by xi'an

 

It took a visit to FiveThirtyEight for me to realise that the ASA statement I mentioned yesterday was followed by individual entries from most members of the panel, much more diverse and deeper than the statement itself! Without discussing each and every comment, here are some points I subscribe to:

  • it does not make sense to try to replace the p-value and the 5% boundary with something else of the same nature. This was the main line of our criticism, with Andrew, of Valen Johnson’s PNAS paper.
  • nor does it make sense to try to come up with a hard-set answer about whether or not a certain parameter satisfies a certain constraint. A comparison of predictive performances at or around the observed data sounds much more sensible, if less definitive.
  • the Bayes factor is often advanced in those comments as a viable alternative to the p-value, but it suffers from difficulties exposed in our recent testing-by-mixture paper, one being the lack of an absolute scale.
  • we seem unable to escape the landscape set by Neyman and Pearson when constructing their testing formalism, including the highly unrealistic 0-1 loss function and the grossly asymmetric opposition between null and alternative hypotheses.
  • the behaviour of any procedure of choice should be evaluated under different scenarios, most likely by simulation, including some that account for misspecified models, which may require an extra bit of non-parametrics. And we should abstain from going further than evaluating whether or not the data look compatible with each of the scenarios, or how much so, through the mixture representation.
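As a minimal illustration of such a simulation-based evaluation, the Python sketch below estimates how often a two-sided z-test of a zero mean, which assumes unit-variance Gaussian data, rejects under a true null, under a shifted alternative, and under a misspecified null with Laplace noise of variance 2. The scenarios, the test, the sample size and the function names are all hypothetical choices made for the example, not taken from the ASA statement or the comments.

```python
import random
from statistics import NormalDist

def rejection_rate(sampler, n=30, n_rep=4000, level=0.05, seed=0):
    """Monte Carlo frequency with which a two-sided z-test of H0: mean = 0,
    assuming unit variance, rejects H0 when data come from `sampler`."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - level / 2)  # about 1.96 for level 0.05
    rejections = 0
    for _ in range(n_rep):
        xs = [sampler(rng) for _ in range(n)]
        z = sum(xs) / n ** 0.5  # z-statistic under the assumed N(mean, 1) model
        if abs(z) > crit:
            rejections += 1
    return rejections / n_rep

# Hypothetical scenarios: the null, a shifted alternative, and a misspecified
# null (Laplace noise with mean 0 but variance 2, so the unit-variance assumption fails).
scenarios = {
    "null N(0,1)": lambda rng: rng.gauss(0.0, 1.0),
    "alternative N(0.5,1)": lambda rng: rng.gauss(0.5, 1.0),
    "misspecified Laplace(0,1)": lambda rng: rng.choice((-1.0, 1.0)) * rng.expovariate(1.0),
}

for name, sampler in scenarios.items():
    print(f"{name:28s} rejection rate {rejection_rate(sampler):.3f}")
```

Under the misspecified scenario the rejection frequency should exceed the nominal 5%, since the test underestimates the variance of the data; exposing this kind of behaviour is what such simulations are for.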

R typos

Posted in Books, Kids, R, Statistics, Travel, University life on January 27, 2016 by xi'an

At MCMskv, Alexander Ly (from Amsterdam) pointed out to me some R programming mistakes I made in the introduction to Metropolis-Hastings algorithms I wrote a few months ago for the Wiley on-line encyclopedia! While the outcome (Monte Carlo posterior) of the corrected version is only moderately changed, this is nonetheless embarrassing! The example (if not the R code) was a mixture of a Poisson and a Geometric distribution borrowed from our testing as mixture paper. Among other things, I used a flat prior on the mixture weights instead of a Beta(1/2,1/2) prior and a simple log-normal random walk on the mean parameter instead of a more elaborate second order expansion discussed in the text. And I also inverted the probabilities of success and failure for the Geometric density. The new version is now available on arXiv, and hopefully soon on the Wiley site, but one (the?) fact worth mentioning here is that the (right) corrections in the R code first led to overflows, because I was using the Beta random walk Be(εp,ε(1-p)), whose major drawback I discussed here a few months ago: values of the weight parameter nearly equal to zero or one produced infinite values of the density… Adding 1 (or 1/2) to each parameter of the Beta proposal solved the problem, and led to a posterior on the weight still concentrating on the correct corner of the unit interval. In any case, a big thank you to Alexander for testing the R code and spotting the several mistakes…
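For readers without the R code at hand, here is a minimal Python sketch of such a sampler, not the corrected vignette code itself. Assumptions made here: the Geometric component has success probability 1/(1+λ), so that both components share the mean λ; α is the weight of the Poisson component, with a Beta(1/2,1/2) prior; λ gets an improper 1/λ prior; α moves by the Beta random walk Be(εα+1, ε(1-α)+1), i.e., with 1 added to each parameter; λ moves by a simple log-normal random walk. The function names and the values ε=20 and τ=0.1 are illustrative.

```python
import math
import random
from collections import Counter

def log_poisson(x, lam):
    # log Poisson(lam) pmf at integer x >= 0
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def log_geometric(x, lam):
    # Geometric on {0, 1, ...} with success probability p = 1/(1+lam),
    # f(x) = p * (1-p)**x, so that its mean is lam (common parametrisation)
    return -math.log1p(lam) + x * (math.log(lam) - math.log1p(lam))

def log_add(a, b):
    # numerically stable log(exp(a) + exp(b))
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_post(counts, alpha, lam):
    # mixture log-likelihood (identical observations grouped) plus log-priors:
    # Beta(1/2, 1/2) on alpha and the assumed improper 1/lam prior on lam
    ll = sum(k * log_add(math.log(alpha) + log_poisson(x, lam),
                         math.log1p(-alpha) + log_geometric(x, lam))
             for x, k in counts.items())
    return ll - 0.5 * (math.log(alpha) + math.log1p(-alpha)) - math.log(lam)

def log_beta_pdf(y, a, b):
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(y) + (b - 1) * math.log1p(-y))

def mh_poisson_geometric(data, n_iter=4000, eps=20.0, tau=0.1, seed=1):
    rng = random.Random(seed)
    counts = Counter(data)
    alpha, lam = 0.5, max(sum(data) / len(data), 0.1)
    cur = log_post(counts, alpha, lam)
    draws = []
    for _ in range(n_iter):
        # Beta random walk on alpha, Be(eps*alpha + 1, eps*(1-alpha) + 1):
        # the "+1" keeps both shape parameters away from zero
        a, b = eps * alpha + 1.0, eps * (1.0 - alpha) + 1.0
        prop = rng.betavariate(a, b)
        if 0.0 < prop < 1.0:  # guard against proposals rounded to exactly 0 or 1
            new = log_post(counts, prop, lam)
            log_q_back = log_beta_pdf(alpha, eps * prop + 1.0, eps * (1.0 - prop) + 1.0)
            log_q_fwd = log_beta_pdf(prop, a, b)
            if math.log(1.0 - rng.random()) < new - cur + log_q_back - log_q_fwd:
                alpha, cur = prop, new
        # log-normal random walk on lam; log(prop/lam) is the Jacobian term
        prop = lam * math.exp(tau * rng.gauss(0.0, 1.0))
        new = log_post(counts, alpha, prop)
        if math.log(1.0 - rng.random()) < new - cur + math.log(prop / lam):
            lam, cur = prop, new
        draws.append((alpha, lam))
    return draws
```

Both moves need a Hastings correction because neither proposal is symmetric: the ratio of Beta proposal densities for α, and the Jacobian λ'/λ for the log-normal step on λ.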

a vignette on Metropolis

Posted in Books, Kids, R, Statistics, Travel, University life on April 13, 2015 by xi'an

Over the past week, I wrote a short introduction to the Metropolis-Hastings algorithm, mostly in the style of our Introduction to Monte Carlo with R book, that is, with very little theory and with worked-out illustrations on simple examples. (And partly over the Atlantic, on my flight to New York and Columbia.) This vignette is intended for the Wiley StatsRef: Statistics Reference Online series, modulo possible revision. Again, nothing novel therein, except for new examples.

using mixtures towards Bayes factor approximation

Posted in Statistics, Travel, University life on December 11, 2014 by xi'an

Phil O’Neill and Theodore Kypraios from the University of Nottingham arXived last week a paper on “Bayesian model choice via mixture distributions with application to epidemics and population process models”. Since we discussed this paper during my visit there earlier this year, I was definitely looking forward to the completed version of their work, especially because there are some superficial similarities with our most recent work on… Bayesian model choice via mixtures! (To the point that I initially mistook their proposal for ours…)

The central idea in the paper is that, by considering the mixture likelihood

\alpha\ell_1(\theta_1|\mathbf{x})+(1-\alpha)\ell_2(\theta_2|\mathbf{x})

where x corresponds to the entire sample, it is straightforward to relate the moments of α to the Bayes factor, namely

\mathfrak{B}_{12}=\dfrac{\mathbb{E}[\alpha]-\mathbb{E}[\alpha^2]-\mathbb{E}[\alpha|\mathbf{x}](1-\mathbb{E}[\alpha])}{\mathbb{E}[\alpha]\mathbb{E}[\alpha|\mathbf{x}]-\mathbb{E}[\alpha^2]}

which means that estimating the mixture weight α by MCMC is equivalent to estimating the Bayes factor.
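A quick numerical check of this identity, as a minimal Python sketch under simplifying assumptions (not O'Neill and Kypraios' implementation): θ_1 and θ_2 are integrated out, so the entire-sample likelihood reduces to αm_1+(1-α)m_2, with hypothetical marginal likelihoods m_1=0.02 and m_2=0.005 (marg1 and marg2 in the code, so that the Bayes factor is 4), and α gets a Beta(1,1) prior. Since E[α|x]=(E[α²]m_1+(E[α]-E[α²])m_2)/(E[α]m_1+(1-E[α])m_2), solving for m_1/m_2 gives the formula above. A two-step Gibbs sampler on a latent model indicator z and on α estimates E[α|x], which is then plugged into the formula.

```python
import random

def bayes_factor_from_moments(mu1, mu2, post_mean):
    """Moment identity: mu1 = E[alpha], mu2 = E[alpha^2] under the prior,
    post_mean = E[alpha | x]; returns the Bayes factor B_12."""
    return (mu1 - mu2 - post_mean * (1 - mu1)) / (mu1 * post_mean - mu2)

def gibbs_posterior_mean(marg1, marg2, a=1.0, b=1.0, n_iter=100_000, seed=0):
    """Gibbs sampler on (z, alpha) for the single-datum mixture
    alpha * marg1 + (1 - alpha) * marg2 with a Beta(a, b) prior on alpha;
    returns the Monte Carlo estimate of E[alpha | x]."""
    rng = random.Random(seed)
    alpha, total = 0.5, 0.0
    for _ in range(n_iter):
        # z = 1 allocates the whole sample ("one datum") to model 1
        p1 = alpha * marg1 / (alpha * marg1 + (1 - alpha) * marg2)
        z = 1 if rng.random() < p1 else 0
        # conjugate update: the single datum adds one unit to one Beta parameter
        alpha = rng.betavariate(a + z, b + 1 - z)
        total += alpha
    return total / n_iter

a, b = 1.0, 1.0
mu1 = a / (a + b)                                # E[alpha] under Beta(a, b)
mu2 = a * (a + 1) / ((a + b) * (a + b + 1))      # E[alpha^2] under Beta(a, b)
marg1, marg2 = 0.02, 0.005                       # hypothetical, so B_12 = 4
e_hat = gibbs_posterior_mean(marg1, marg2, a, b)
print(e_hat, bayes_factor_from_moments(mu1, mu2, e_hat))
```

In this toy, the denominator E[α]E[α|x]-E[α²] vanishes at the upper end of the attainable range of E[α|x], so small Monte Carlo errors can be strongly amplified when the Bayes factor is large, one concrete reason why the prior on α matters computationally.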

What puzzled me at first was that the mixture weight is ultimately estimated with a single “datapoint”, made of the entire sample. So the posterior distribution on α is hardly different from the prior, since it only varies by one unit! But I came to realise that this is a numerical tool and that the estimator of α is not meaningful from a statistical viewpoint (thus differing completely from our perspective). This explains why the Beta prior on α can be freely chosen so that the mixing and stability of the Markov chain are improved: this parameter is solely an algorithmic entity.
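To see the "one unit" explicitly, write m_1 and m_2 for the marginal likelihoods of the whole sample under each model (notation introduced here) and take a Beta(a,b) prior on α. Since the single-datum mixture likelihood integrates to αm_1+(1-α)m_2 over the priors on θ_1 and θ_2, the posterior on α is a two-component Beta mixture, each component adding one unit to a single Beta parameter:

\pi(\alpha|\mathbf{x})=\dfrac{a\,m_1}{a\,m_1+b\,m_2}\,\mathcal{B}e(\alpha|a+1,b)+\dfrac{b\,m_2}{a\,m_1+b\,m_2}\,\mathcal{B}e(\alpha|a,b+1)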

There are similarities between this approach and the pseudo-prior encompassing perspective of Carlin and Chib (1995), even though the current version does not require pseudo-priors, using true priors instead. But thinking of weakly informative priors and of the MCMC consequence (see below) leads me to wonder whether pseudo-priors would help in this setting…

Another aspect of the paper that still puzzles me is that the MCMC algorithm mixes at all: indeed, depending on the value of the binary latent variable z, one of the two parameters is updated from the true posterior while the other is updated from the prior. It thus seems unlikely that the value of z would change quickly. Creating a huge imbalance in the prior can counteract this difference, but the same problem occurs once z has moved from 0 to 1 or from 1 to 0. It seems to me that resorting to a common parameter [if possible] and using as a proposal the model-based posteriors for both parameters is the only way out of this conundrum. (We certainly insist on this common parametrisation in our approach, as it is paramount for the use of improper priors.)
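A small numerical illustration of this stickiness, under toy assumptions that are not those of the paper: model 1 is N(θ_1,1), model 2 is N(θ_2,2²), both means get a N(0,10²) prior, α=1/2, and θ_1 is held at its posterior mean. The hypothetical helper below averages the conditional probability of reallocating the whole sample from model 1 to model 2 over draws of θ_2, either from the prior, as in the sampler, or from the model-2 posterior. The second figure is only a what-if: plugging posterior draws into the same update is not a valid Gibbs step, which is precisely what Carlin and Chib's pseudo-priors correct for in the z update.

```python
import math
import random

def normal_loglik(data, mean, sd):
    # log-likelihood of the whole sample under N(mean, sd^2)
    return sum(-0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)
               for x in data)

def prob_switch_to_model2(data, theta1, theta2_draws, alpha=0.5, sd1=1.0, sd2=2.0):
    """Average over theta2_draws of P(z = model 2 | alpha, theta1, theta2, x)."""
    l1 = math.log(alpha) + normal_loglik(data, theta1, sd1)
    total = 0.0
    for t2 in theta2_draws:
        l2 = math.log1p(-alpha) + normal_loglik(data, t2, sd2)
        # 1 / (1 + exp(l1 - l2)), with the exponent capped to avoid overflow
        total += 1.0 / (1.0 + math.exp(min(l1 - l2, 700.0)))
    return total / len(theta2_draws)

def conjugate_posterior(data, sd, prior_sd=10.0):
    # posterior mean and sd of a normal mean under a N(0, prior_sd^2) prior
    prec = 1.0 / prior_sd ** 2 + len(data) / sd ** 2
    return (sum(data) / sd ** 2) / prec, prec ** -0.5

rng = random.Random(0)
data = [rng.gauss(0.2, 1.4) for _ in range(30)]  # hypothetical data set
theta1, _ = conjugate_posterior(data, sd=1.0)
m2, s2 = conjugate_posterior(data, sd=2.0)
from_prior = [rng.gauss(0.0, 10.0) for _ in range(5000)]
from_post = [rng.gauss(m2, s2) for _ in range(5000)]
print(prob_switch_to_model2(data, theta1, from_prior),
      prob_switch_to_model2(data, theta1, from_post))
```

With a vague prior, most draws of θ_2 fall far from the data, so the probability of z leaving model 1 is driven down by the prior spread rather than by the relative fit of the two models.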

“In contrast, we consider the case where there is only one datum.”

The idea in the paper is therefore fully computational and relates to other linkage methods that create bridges between two models. It differs from our new notion of Bayesian testing in that we consider estimating the mixture between the two models under comparison, hence considering instead the mixture

\prod_{i=1}^n\left[\alpha f_1(x_i|\theta_1)+(1-\alpha) f_2(x_i|\theta_2)\right]

which is another model altogether and does not recover the original Bayes factor (a Bayes factor we dismiss altogether in favour of the posterior median of α and its entire distribution).
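To make the contrast concrete, here is a minimal Python sketch (hypothetical helper names, parameters held fixed) computing both log-likelihoods from per-observation log-densities log f_1(x_i|θ_1) and log f_2(x_i|θ_2): the entire-sample mixture α∏f_1+(1-α)∏f_2 of the single-datum construction, and the observation-wise mixture ∏[αf_1+(1-α)f_2]. Both reduce to ∏f_1 as α approaches 1, but they differ for intermediate weights.

```python
import math

def log_add(a, b):
    # numerically stable log(exp(a) + exp(b))
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def whole_sample_mixture_loglik(logf1, logf2, alpha):
    """log[alpha * prod_i f1(x_i) + (1 - alpha) * prod_i f2(x_i)]:
    a single mixture over the entire sample."""
    return log_add(math.log(alpha) + sum(logf1), math.log1p(-alpha) + sum(logf2))

def per_observation_mixture_loglik(logf1, logf2, alpha):
    """sum_i log[alpha * f1(x_i) + (1 - alpha) * f2(x_i)]:
    the observation-wise mixture model."""
    return sum(log_add(math.log(alpha) + a, math.log1p(-alpha) + b)
               for a, b in zip(logf1, logf2))

# Hypothetical per-observation log-densities at fixed parameter values
logf1 = [-1.0, -1.2, -0.9, -1.5]
logf2 = [-1.4, -0.8, -1.6, -1.1]
for alpha in (0.1, 0.5, 0.9):
    print(alpha,
          whole_sample_mixture_loglik(logf1, logf2, alpha),
          per_observation_mixture_loglik(logf1, logf2, alpha))
```

In the observation-wise version each x_i carries its own information about α, whereas the entire-sample version sees α through a single term, which is the "one datum" situation discussed above.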

the demise of the Bayes factor

Posted in Books, Kids, Statistics, Travel, University life on December 8, 2014 by xi'an


With Kaniav Kamary, Kerrie Mengersen, and Judith Rousseau, we have just arXived (and submitted) a paper entitled “Testing hypotheses via a mixture model”. (We actually presented an earlier version of this work in Cancún, Vienna, and Gainesville, so you may have heard of it already.) The notion we advocate in this paper is to replace the posterior probability of a model or a hypothesis with the posterior distribution of the weights of a mixture of the models under comparison. That is, given two models under comparison,

\mathfrak{M}_1:x\sim f_1(x|\theta_1) \text{ versus } \mathfrak{M}_2:x\sim f_2(x|\theta_2)

we propose to estimate the (artificial) mixture model

\mathfrak{M}_{\alpha}:x\sim\alpha f_1(x|\theta_1) + (1-\alpha) f_2(x|\theta_2)

and in particular derive the posterior distribution of α. One may object that the mixture model is neither of the two models under comparison, but it reduces to one of them at the boundary, i.e., when α=0 or α=1. Thus, if we use prior distributions on α that favour the neighbourhoods of 0 and 1, we should be able to see the posterior concentrate near 0 or 1, depending on which model is true. And indeed this is the case: for any given Beta prior on α, we observe a higher and higher concentration at the right boundary as the sample size increases, and we establish a convergence result to this effect. Furthermore, the mixture approach offers numerous advantages, among which [verbatim from the paper]:
