Archive for ABC model choice

locusts in a random forest

Posted in pictures, Statistics, University life with tags , , , , , , , , , , , on July 19, 2019 by xi'an

My friends from Montpellier, where I am visiting today, Arnaud Estoup, Jean-Michel Marin, and Louis Raynal, along with their co-authors, have recently posted on biorXiv a paper using ABC-RF (Random Forests) to analyse the divergence of two populations of desert locusts in Africa. (I actually first heard of their paper by an unsolicited email from one of these self-declared research aggregates.)

“…the present study is the first one using recently developed ABC-RF algorithms to carry out inferences about both scenario choice and parameter estimation, on a real multi-locus microsatellite dataset. It includes and illustrates three novelties in statistical analyses (…): model grouping analyses based on several key evolutionary events, assessment of the quality of predictions to evaluate the robustness of our inferences, and incorporation of previous information on the mutational setting of the used microsatellite markers”.

The construction of the competing models (or scenarios) is built upon data of past precipitations and desert evolution spanning several interglacial periods, back to the middle Pleistocene, concluding at a probable separation in the middle-late stages of the Holocene, which corresponds to the last transition from humid to arid conditions in the African continent. The probability of choosing the wrong model is exploited to determine which model(s) lead(s) to a posterior [ABC] probability lower than the corresponding prior probability, and only one scenario stands this test. As in previous ABC-RF implementations, the summary statistics are complemented by pure noise statistics in order to determine a barrier in the collection of statistics, even though those just above the noise elements (which often cluster together) may achieve better Gini importance by mere chance. An aspect of the paper that I particularly like is the discussion of the various prior modellings one can derive from existing information (or lack thereof) and the evaluation of the impact of these modellings on the resulting inference based on simulated pseudo-data.

over-confident about mis-specified models?

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , , , , on April 30, 2019 by xi'an

Ziheng Yang and Tianqui Zhu published a paper in PNAS last year that criticises Bayesian posterior probabilities used in the comparison of models under misspecification as “overconfident”. The paper is written from a phylogeneticist point of view, rather than from a statistician’s perspective, as shown by the Editor in charge of the paper [although I thought that, after Steve Fienberg‘s intervention!, a statistician had to be involved in a submission relying on statistics!] a paper , but the analysis is rather problematic, at least seen through my own lenses… With no statistical novelty, apart from looking at the distribution of posterior probabilities in toy examples. The starting argument is that Bayesian model comparison is often reporting posterior probabilities in favour of a particular model that are close or even equal to 1.

“The Bayesian method is widely used to estimate species phylogenies using molecular sequence data. While it has long been noted to produce spuriously high posterior probabilities for trees or clades, the precise reasons for this over confidence are unknown. Here we characterize the behavior of Bayesian model selection when the compared models are misspecified and demonstrate that when the models are nearly equally wrong, the method exhibits unpleasant polarized behaviors,supporting one model with high confidence while rejecting others. This provides an explanation for the empirical observation of spuriously high posterior probabilities in molecular phylogenetics.”

The paper focus on the behaviour of posterior probabilities to strongly support a model against others when the sample size is large enough, “even when” all models are wrong, the argument being apparently that the correct output should be one of equal probability between models, or maybe a uniform distribution of these model probabilities over the probability simplex. Why should it be so?! The construction of the posterior probabilities is based on a meta-model that assumes the generating model to be part of a list of mutually exclusive models. It does not account for cases where “all models are wrong” or cases where “all models are right”. The reported probability is furthermore epistemic, in that it is relative to the measure defined by the prior modelling, not to a promise of a frequentist stabilisation in a ill-defined asymptotia. By which I mean that a 99.3% probability of model M¹ being “true”does not have a universal and objective meaning. (Moderation note: the high polarisation of posterior probabilities was instrumental in our investigation of model choice with ABC tools and in proposing instead error rates in ABC random forests.)

The notion that two models are equally wrong because they are both exactly at the same Kullback-Leibler distance from the generating process (when optimised over the parameter) is such a formal [or cartoonesque] notion that it does not make much sense. There is always one model that is slightly closer and eventually takes over. It is also bizarre that the argument does not account for the complexity of each model and the resulting (Occam’s razor) penalty. Even two models with a single parameter are not necessarily of intrinsic dimension one, as shown by DIC. And thus it is not a surprise if the posterior probability mostly favours one versus the other. In any case, an healthily sceptic approach to Bayesian model choice means looking at the behaviour of the procedure (Bayes factor, posterior probability, posterior predictive, mixture weight, &tc.) under various assumptions (model M¹, M², &tc.) to calibrate the numerical value, rather than taking it at face value. By which I do not mean a frequentist evaluation of this procedure. Actually, it is rather surprising that the authors of the PNAS paper do not jump on the case when the posterior probability of model M¹ say is uniformly distributed, since this would be a perfect setting when the posterior probability is a p-value. (This is also what happens to the bootstrapped version, see the last paragraph of the paper on p.1859, the year Darwin published his Origin of Species.)

a book and three chapters on ABC

Posted in Statistics with tags , , , , , , , , , , on January 9, 2019 by xi'an

In connection with our handbook on mixtures being published, here are three chapters I contributed to from the Handbook of ABC, edited by Scott Sisson, Yanan Fan, and Mark Beaumont:

6. Likelihood-free Model Choice, by J.-M. Marin, P. Pudlo, A. Estoup and C.P. Robert

12. Approximating the Likelihood in ABC, by  C. C. Drovandi, C. Grazian, K. Mengersen and C.P. Robert

17. Application of ABC to Infer about the Genetic History of Pygmy Hunter-Gatherers Populations from Western Central Africa, by A. Estoup, P. Verdu, J.-M. Marin, C. Robert, A. Dehne-Garcia, J.-M. Cornuet and P. Pudlo

ABC variable selection

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , on July 18, 2018 by xi'an

Prior to the ISBA 2018 meeting, Yi Liu, Veronika Ročková, and Yuexi Wang arXived a paper on relying ABC for finding relevant variables, which is a very original approach in that ABC is not as much the object as it is a tool. And which Veronika considered during her Susie Bayarri lecture at ISBA 2018. In other words, it is not about selecting summary variables for running ABC but quite the opposite, selecting variables in a non-linear model through an ABC step. I was going to separate the two selections into algorithmic and statistical selections, but it is more like projections in the observation and covariate spaces. With ABC still providing an appealing approach to approximate the marginal likelihood. Now, one may wonder at the relevance of ABC for variable selection, aka model choice, given our warning call of a few years ago. But the current paper does not require low-dimension summary statistics, hence avoids the difficulty with the “other” Bayes factor.

In the paper, the authors consider a spike-and… forest prior!, where the Bayesian CART selection of active covariates proceeds through a regression tree, selected covariates appearing in the tree and others not appearing. With a sparsity prior on the tree partitions and this new ABC approach to select the subset of active covariates. A specific feature is in splitting the data, one part to learn about the regression function, simulating from this function and comparing with the remainder of the data. The paper further establishes that ABC Bayesian Forests are consistent for variable selection.

“…we observe a curious empirical connection between π(θ|x,ε), obtained with ABC Bayesian Forests  and rescaled variable importances obtained with Random Forests.”

The difference with our ABC-RF model choice paper is that we select summary statistics [for classification] rather than covariates. For instance, in the current paper, simulation of pseudo-data will depend on the selected subset of covariates, meaning simulating a model index, and then generating the pseudo-data, acceptance being a function of the L² distance between data and pseudo-data. And then relying on all ABC simulations to find which variables are in more often than not to derive the median probability model of Barbieri and Berger (2004). Which does not work very well if implemented naïvely. Because of the immense size of the model space, it is quite hard to find pseudo-data close to actual data, resulting in either very high tolerance or very low acceptance. The authors get over this difficulty by a neat device that reminds me of fractional or intrinsic (pseudo-)Bayes factors in that the dataset is split into two parts, one that learns about the posterior given the model index and another one that simulates from this posterior to compare with the left-over data. Bringing simulations closer to the data. I do not remember seeing this trick before in ABC settings, but it is very neat, assuming the small data posterior can be simulated (which may be a fundamental reason for the trick to remain unused!). Note that the split varies at each iteration, which means there is no impact of ordering the observations.

practical Bayesian inference [book review]

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , , on April 26, 2018 by xi'an

[Disclaimer: I received this book of Coryn Bailer-Jones for a review in the International Statistical Review and intend to submit a revised version of this post as my review. As usual, book reviews on the ‘Og are reflecting my own definitely personal and highly subjective views on the topic!]

It is always a bit of a challenge to review introductory textbooks as, on the one hand, they are rarely written at the level and with the focus one would personally choose to write them. And, on the other hand, it is all too easy to find issues with the material presented and the way it is presented… So be warned and proceed cautiously! In the current case, Practical Bayesian Inference tries to embrace too much, methinks, by starting from basic probability notions (that should not be unknown to physical scientists, I believe, and which would avoid introducing a flat measure as a uniform distribution over the real line!, p.20). All the way to running MCMC for parameter estimation, to compare models by Bayesian evidence, and to cover non-parametric regression and bootstrap resampling. For instance, priors only make their apparition on page 71. With a puzzling choice of an improper prior (?) leading to an improper posterior (??), which is certainly not the smoothest entry on the topic. “Improper posteriors are a bad thing“, indeed! And using truncation to turn them into proper distributions is not a clear improvement as the truncation point will significantly impact the inference. Discussing about the choice of priors from the beginning has some appeal, but it may also create confusion in the novice reader (although one never knows!). Even asking about “what is a good prior?” (p.73) is not necessarily the best (and my recommended) approach to a proper understanding of the Bayesian paradigm. And arguing about the unicity of the prior (p.119) clashes with my own view of the prior being primarily a reference measure rather than an ideal summary of the available information. (The book argues at some point that there is no fixed model parameter, another and connected source of disagreement.) There is a section on assigning priors (p.113), but it only covers the case of a possibly biased coin without much realism. A feature common to many Bayesian textbooks though. To return to the issue of improper priors (and posteriors), the book includes several warnings about the danger of hitting an undefined posterior (still called a distribution), without providing real guidance on checking for its definition. (A tough question, to be sure.)

“One big drawback of the Metropolis algorithm is that it uses a fixed step size, the magnitude of which can hardly be determined in advance…”(p.165)

When introducing computational techniques, quadratic (or Laplace) approximation of the likelihood is mingled with kernel estimators, which does not seem appropriate. Proposing to check convergence and calibrate MCMC via ACF graphs is helpful in low dimensions, but not in larger dimensions. And while warning about the dangers of forgetting the Jacobians in the Metropolis-Hastings acceptance probability when using a transform like η=ln θ is well-taken, the loose handling of changes of variables may be more confusing than helpful (p.167). Discussing and providing two R codes for the (standard) Metropolis algorithm may prove too much. Or not. But using a four page R code for fitting a simple linear regression with a flat prior (pp.182-186) may definitely put the reader off! Even though I deem the example a proper experiment in setting a Metropolis algorithm and appreciate the detailed description around the R code itself. (I just take exception at the paragraph on running the code with two or even one observation, as the fact that “the Bayesian solution always exists” (p.188) [under a proper prior] is not necessarily convincing…)

“In the real world we cannot falsify a hypothesis or model any more than we “truthify” it (…) All we can do is ask which of the available models explains the data best.” (p.224)

In a similar format, the discussion on testing of hypotheses starts with a lengthy presentation of classical tests and p-values, the chapter ending up with a list of issues. Most of them reasonable in my own referential. I also concur with the conclusive remarks quoted above that what matters is a comparison of (all relatively false) models. What I less agree [as predictable from earlier posts and papers] with is the (standard) notion that comparing two models with a Bayes factor follows from the no information (in order to avoid the heavily loaded non-informative) prior weights of ½ and ½. Or similarly that the evidence is uniquely calibrated. Or, again, using a truncated improper prior under one of the assumptions (with the ghost of the Jeffreys-Lindley paradox lurking nearby…).  While the Savage-Dickey approximation is mentioned, the first numerical resolution of the approximation to the Bayes factor is via simulations from the priors. Which may be very poor in the situation of vague and uninformative priors. And then the deadly harmonic mean makes an entry (p.242), along with nested sampling… There is also a list of issues about Bayesian model comparison, including (strong) dependence on the prior, dependence on irrelevant alternatives, lack of goodness of fit tests, computational costs, including calls to possibly intractable likelihood function, ABC being then mentioned as a solution (which it is not, mostly).

Continue reading