## Posterior predictive p-values and the convex order

Posted in Books, Statistics, University life with tags , , , , , , , , , on December 22, 2014 by xi'an

Patrick Rubin-Delanchy and Daniel Lawson [of Warhammer fame!] recently arXived a paper we had discussed with Patrick when he visited Andrew and I last summer in Paris. The topic is the evaluation of the posterior predictive probability of a larger discrepancy between data and model

$\mathbb{P}\left( f(X|\theta)\ge f(x^\text{obs}|\theta) \,|\,x^\text{obs} \right)$

which acts like a Bayesian p-value of sorts. I discussed several times the reservations I have about this notion on this blog… Including running one experiment on the uniformity of the ppp while in Duke last year. One item of those reservations being that it evaluates the posterior probability of an event that does not exist a priori. Which is somewhat connected to the issue of using the data “twice”.

“A posterior predictive p-value has a transparent Bayesian interpretation.”

Another item that was suggested [to me] in the current paper is the difficulty in defining the posterior predictive (pp), for instance by including latent variables

$\mathbb{P}\left( f(X,Z|\theta)\ge f(x^\text{obs},Z^\text{obs}|\theta) \,|\,x^\text{obs} \right)\,,$

which reminds me of the multiple possible avatars of the BIC criterion. The question addressed by Rubin-Delanchy and Lawson is how far from the uniform distribution stands this pp when the model is correct. The main result of their paper is that any sub-uniform distribution can be expressed as a particular posterior predictive. The authors also exhibit the distribution that achieves the bound produced by Xiao-Li Meng, Namely that

$\mathbb{P}(P\le \alpha) \le 2\alpha$

where P is the above (top) probability. (Hence it is uniform up to a factor 2!) Obviously, the proximity with the upper bound only occurs in a limited number of cases that do not validate the overall use of the ppp. But this is certainly a nice piece of theoretical work.

## reliable ABC model choice via random forests

Posted in pictures, R, Statistics, University life with tags , , , , , , , on October 29, 2014 by xi'an

After a somewhat prolonged labour (!), we have at last completed our paper on ABC model choice with random forests and submitted it to PNAS for possible publication. While the paper is entirely methodological, the primary domain of application of ABC model choice methods remains population genetics and the diffusion of this new methodology to the users is thus more likely via a media like PNAS than via a machine learning or statistics journal.

When compared with our recent update of the arXived paper, there is not much different in contents, as it is mostly an issue of fitting the PNAS publication canons. (Which makes the paper less readable in the posted version [in my opinion!] as it needs to fit the main document within the compulsory six pages, relegated part of the experiments and of the explanations to the Supplementary Information section.)

## posterior predictive distributions of Bayes factors

Posted in Books, Kids, Statistics with tags , , , on October 8, 2014 by xi'an

Once a Bayes factor B(y)  is computed, one needs to assess its strength. As repeated many times here, Jeffreys’ scale has no validation whatsoever, it is simply a division of the (1,∞) range into regions of convenience. Following earlier proposals in the literature (Box, 1980; García-Donato and Chen, 2005; Geweke and Amisano, 2008), an evaluation of this strength within the issue at stake, i.e. the comparison of two models, can be based on the predictive distribution. While most authors (like García-Donato and Chen) consider the prior predictive, I think using the posterior predictive distribution is more relevant since

1. it exploits the information contained in the data y, thus concentrates on a region of relevance in the parameter space(s), which is especially interesting in weakly informative settings (even though we should abstain from testing in those cases, dixit Andrew);
2. it reproduces the behaviour of the Bayes factor B(x) for values x of the observation similar to the original observation y;
3. it does not hide issues of indeterminacy linked with improper priors: the Bayes factor B(x) remains indeterminate, even with a well-defined predictive;
4. it does not separate between errors of type I and errors of type II but instead uses the natural summary provided by the Bayesian analysis, namely the predictive distribution π(x|y);
5. as long as the evaluation is not used to reach a decision, there is no issue of “using the data twice”, we are simply producing an estimator of the posterior loss, for instance the (posterior) probability of selecting the wrong model. The Bayes factor B(x) is thus functionally  independent of y, while x is probabilistically dependent on y.

Note that, even though probabilities of errors of type I and errors of type II can be computed, they fail to account for the posterior probabilities of both models. (This is the delicate issue with the solution of García-Donato and Chen.) Another nice feature is that the predictive distribution of the Bayes factor can be computed even in complex settings where ABC needs to be used.

## ABC model choice via random forests [expanded]

Posted in Statistics, University life with tags , , , , , , , , , , , on October 1, 2014 by xi'an

Today, we arXived a second version of our paper on ABC model choice with random forests. Or maybe [A]BC model choice with random forests. Since the random forest is built on a simulation from the prior predictive and no further approximation is used in the process. Except for the computation of the posterior [predictive] error rate. The update wrt the earlier version is that we ran massive simulations throughout the summer, on existing and new datasets. In particular, we have included a Human dataset extracted from the 1000 Genomes Project. Made of 51,250 SNP loci. While this dataset is not used to test new evolution scenarios, we compared six out-of-Africa scenarios, with a possible admixture for Americans of African ancestry. The scenario selected by a random forest procedure posits a single out-of-Africa colonization event with a secondary split into a European and an East Asian population lineages, and a recent genetic admixture between African and European lineages, for Americans of African origin. The procedure reported a high level of confidence since the estimated posterior error rate is equal to zero. The SNP loci were carefully selected using the following criteria: (i) all individuals have a genotype characterized by a quality score (GQ)>10, (ii) polymorphism is present in at least one of the individuals in order to fit the SNP simulation algorithm of Hudson (2002) used in DIYABC V2 (Cornuet et al., 2014), (iii) the minimum distance between two consecutive SNPs is 1 kb in order to minimize linkage disequilibrium between SNP, and (iv) SNP loci showing significant deviation from Hardy-Weinberg equilibrium at a 1% threshold in at least one of the four populations have been removed.

In terms of random forests, we optimised the size of the bootstrap subsamples for all of our datasets. While this optimisation requires extra computing time, it is negligible when compared with the enormous time taken by a logistic regression, which is [yet] the standard ABC model choice approach. Now the data has been gathered, it is only a matter of days before we can send the paper to a journal

## ABC model choice by random forests [guest post]

Posted in pictures, R, Statistics, University life with tags , , , , , , , , , , on August 11, 2014 by xi'an

[Dennis Prangle sent me his comments on our ABC model choice by random forests paper. Here they are! And I appreciate very much contributors commenting on my paper or others, so please feel free to join.]

This paper proposes a new approach to likelihood-free model choice based on random forest classifiers. These are fit to simulated model/data pairs and then run on the observed data to produce a predicted model. A novel “posterior predictive error rate” is proposed to quantify the degree of uncertainty placed on this prediction. Another interesting use of this is to tune the threshold of the standard ABC rejection approach, which is outperformed by random forests.

The paper has lots of thought-provoking new ideas and was an enjoyable read, as well as giving me the encouragement I needed to read another chapter of the indispensable Elements of Statistical Learning However I’m not fully convinced by the approach yet for a few reasons which are below along with other comments.

Alternative schemes

The paper shows that random forests outperform rejection based ABC. I’d like to see a comparison to more efficient ABC model choice algorithms such as that of Toni et al 2009. Also I’d like to see if the output of random forests could be used as summary statistics within ABC rather than as a separate inference method.

Posterior predictive error rate (PPER)

This is proposed to quantify the performance of a classifier given a particular data set. The PPER is the proportion of times the classifier’s most favoured model is incorrect for simulated model/data pairs drawn from an approximation to the posterior predictive. The approximation is produced by a standard ABC analysis.

Misclassification could be due to (a) a poor classifier or (b) uninformative data, so the PPER aggregrates these two sources of uncertainty. I think it is still very desirable to have an estimate of the uncertainty due to (b) only i.e. a posterior weight estimate. However the PPER is useful. Firstly end users may sometimes only care about the aggregated uncertainty. Secondly relative PPER values for a fixed dataset are a useful measure of uncertainty due to (a), for example in tuning the ABC threshold. Finally, one drawback of the PPER is the dependence on an ABC estimate of the posterior: how robust are the results to the details of how this is obtained?

Classification

This paper illustrates an important link between ABC and machine learning classification methods: model choice can be viewed as a classification problem. There are some other links: some classifiers make good model choice summary statistics (Prangle et al 2014) or good estimates of ABC-MCMC acceptance ratios for parameter inference problems (Pham et al 2014). So the good performance random forests makes them seem a generally useful tool for ABC (indeed they are used in the Pham et al al paper).

## ABC model choice by random forests

Posted in pictures, R, Statistics, Travel, University life with tags , , , , , , , , , , , , , on June 25, 2014 by xi'an

After more than a year of collaboration, meetings, simulations, delays, switches,  visits, more delays, more simulations, discussions, and a final marathon wrapping day last Friday, Jean-Michel Marin, Pierre Pudlo,  and I at last completed our latest collaboration on ABC, with the central arguments that (a) using random forests is a good tool for choosing the most appropriate model and (b) evaluating the posterior misclassification error rather than the posterior probability of a model is an appropriate paradigm shift. The paper has been co-signed with our population genetics colleagues, Jean-Marie Cornuet and Arnaud Estoup, as they provided helpful advice on the tools and on the genetic illustrations and as they plan to include those new tools in their future analyses and DIYABC software.  ABC model choice via random forests is now arXived and very soon to be submitted…

One scientific reason for this fairly long conception is that it took us several iterations to understand the intrinsic nature of the random forest tool and how it could be most naturally embedded in ABC schemes. We first imagined it as a filter from a set of summary statistics to a subset of significant statistics (hence the automated ABC advertised in some of my past or future talks!), with the additional appeal of an associated distance induced by the forest. However, we later realised that (a) further ABC steps were counterproductive once the model was selected by the random forest and (b) including more summary statistics was always beneficial to the performances of the forest and (c) the connections between (i) the true posterior probability of a model, (ii) the ABC version of this probability, (iii) the random forest version of the above, were at best very loose. The above picture is taken from the paper: it shows how the true and the ABC probabilities (do not) relate in the example of an MA(q) model… We thus had another round of discussions and experiments before deciding the unthinkable, namely to give up the attempts to approximate the posterior probability in this setting and to come up with another assessment of the uncertainty associated with the decision. This led us to propose to compute a posterior predictive error as the error assessment for ABC model choice. This is mostly a classification error but (a) it is based on the ABC posterior distribution rather than on the prior and (b) it does not require extra-computations when compared with other empirical measures such as cross-validation, while avoiding the sin of using the data twice!

## posterior predictive p-values

Posted in Books, Statistics, Travel, University life with tags , , , , , , , , , on February 4, 2014 by xi'an

Bayesian Data Analysis advocates in Chapter 6 using posterior predictive checks as a way of evaluating the fit of a potential model to the observed data. There is a no-nonsense feeling to it:

“If the model fits, then replicated data generated under the model should look similar to observed data. To put it another way, the observed data should look plausible under the posterior predictive distribution.”

And it aims at providing an answer to the frustrating (frustrating to me, at least) issue of Bayesian goodness-of-fit tests. There are however issues with the implementation, from deciding on which aspect of the data or of the model is to be examined, to the “use of the data twice” sin. Obviously, this is an exploratory tool with little decisional backup and it should be understood as a qualitative rather than quantitative assessment. As mentioned in my tutorial on Sunday (I wrote this post in Duke during O’Bayes 2013), it reminded me of Ratmann et al.’s ABCμ in that they both give reference distributions against which to calibrate the observed data. Most likely with a multidimensional representation. And the “use of the data twice” can be argued for or against, once a data-dependent loss function is built.

“One might worry about interpreting the significance levels of multiple tests or of tests chosen by inspection of the data (…) We do not make [a multiple test] adjustment, because we use predictive checks to see how particular aspects of the data would be expected to appear in replications. If we examine several test variables, we would not be surprised for some of them not to be fitted by the model-but if we are planning to apply the model, we might be interested in those aspects of the data that do not appear typical.”

The natural objection that having a multivariate measure of discrepancy runs into multiple testing is answered within the book with the reply that the idea is not to run formal tests. I still wonder how one should behave when faced with a vector of posterior predictive p-values (ppp).

The above picture is based on a normal mean/normal prior experiment I ran where the ratio prior-to-sampling variance increases from 100 to 10⁴. The ppp is based on the Bayes factor against a zero mean as a discrepancy. It thus grows away from zero very quickly and then levels up around 0.5, reaching only values close to 1 for very large values of x (i.e. never in practice). I find the graph interesting because if instead of the Bayes factor I use the marginal (numerator of the Bayes factor) then the picture is the exact opposite. Which, I presume, does not make a difference for Bayesian Data Analysis, since both extremes are considered as equally toxic… Still, still, still, we are is the same quandary as when using any kind of p-value: what is extreme? what is significant? Do we have again to select the dreaded 0.05?! To see how things are going, I then simulated the behaviour of the ppp under the “true” model for the pair (θ,x). And ended up with the histograms below:

which shows that under the true model the ppp does concentrate around .5 (surprisingly the range of ppp’s hardly exceeds .5 and I have no explanation for this). While the corresponding ppp does not necessarily pick any wrong model, discrepancies may be spotted by getting away from 0.5…

“The p-value is to the u-value as the posterior interval is to the confidence interval. Just as posterior intervals are not, in general, classical confidence intervals, Bayesian p-values are not generally u-values.”

Now, Bayesian Data Analysis also has this warning about ppp’s being not uniform under the true model (u-values), which is just as well considering the above example, but I cannot help wondering if the authors had intended a sort of subliminal message that they were not that far from uniform. And this brings back to the forefront the difficult interpretation of the numerical value of a ppp. That is, of its calibration. For evaluation of the fit of a model. Or for decision-making…