Archive for classification

SPA 2015 Oxford

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , on July 14, 2015 by xi'an

Today I gave a talk on Approximate Bayesian model choice via random forests at the yearly SPA (Stochastic Processes and their Applications) 2015 conference, taking place in Oxford (a nice town near Warwick) this year. In Keble College more precisely. The slides are below and while they are mostly repetitions of earlier slides, there is a not inconsequential novelty in the presentation, namely that I included our most recent and current perspective on ABC model choice. Indeed, when travelling to Montpellier two weeks ago, we realised that there was a way to solve our posterior probability conundrum!

campusDespite the heat wave that rolled all over France that week, we indeed figured out a way to estimate the posterior probability of the selected (MAP) model, way that we had deemed beyond our reach in previous versions of the talk and of the paper. The fact that we could not provide an estimate of this posterior probability and had to rely instead on a posterior expected loss was one of the arguments used by the PNAS reviewers in rejecting the paper. While the posterior expected loss remains a quantity worth approximating and reporting, the idea that stemmed from meeting together in Montpellier is that (i) the posterior probability of the MAP is actually related to another posterior loss, when conditioning on the observed summary statistics and (ii) this loss can be itself estimated via a random forest, since it is another function of the summary statistics. A posteriori, this sounds trivial but we had to have a new look at the problem to realise that using ABC samples was not the only way to produce an estimate of the posterior probability! (We are now working on the revision of the paper for resubmission within a few week… Hopefully before JSM!)

not Bayesian enough?!

Posted in Books, Statistics, University life with tags , , , , , , , on January 23, 2015 by xi'an

Elm tree in the park, Parc de Sceaux, Nov. 22, 2011Our random forest paper was alas rejected last week. Alas because I think the approach is a significant advance in ABC methodology when implemented for model choice, avoiding the delicate selection of summary statistics and the report of shaky posterior probability approximation. Alas also because the referees somewhat missed the point, apparently perceiving random forests as a way to project a large collection of summary statistics on a limited dimensional vector as in the Read Paper of Paul Fearnhead and Dennis Prarngle, while the central point in using random forests is the avoidance of a selection or projection of summary statistics.  They also dismissed ou approach based on the argument that the reduction in error rate brought by random forests over LDA or standard (k-nn) ABC is “marginal”, which indicates a degree of misunderstanding of what the classification error stand for in machine learning: the maximum possible gain in supervised learning with a large number of classes cannot be brought arbitrarily close to zero. Last but not least, the referees did not appreciate why we mostly cannot trust posterior probabilities produced by ABC model choice and hence why the posterior error loss is a valuable and almost inevitable machine learning alternative, dismissing the posterior expected loss as being not Bayesian enough (or at all), for “averaging over hypothetical datasets” (which is a replicate of Jeffreys‘ famous criticism of p-values)! Certainly a first time for me to be rejected based on this argument!

ABC model choice by random forests [guest post]

Posted in pictures, R, Statistics, University life with tags , , , , , , , , , , on August 11, 2014 by xi'an

[Dennis Prangle sent me his comments on our ABC model choice by random forests paper. Here they are! And I appreciate very much contributors commenting on my paper or others, so please feel free to join.]

treerise6This paper proposes a new approach to likelihood-free model choice based on random forest classifiers. These are fit to simulated model/data pairs and then run on the observed data to produce a predicted model. A novel “posterior predictive error rate” is proposed to quantify the degree of uncertainty placed on this prediction. Another interesting use of this is to tune the threshold of the standard ABC rejection approach, which is outperformed by random forests.

The paper has lots of thought-provoking new ideas and was an enjoyable read, as well as giving me the encouragement I needed to read another chapter of the indispensable Elements of Statistical Learning However I’m not fully convinced by the approach yet for a few reasons which are below along with other comments.

Alternative schemes

The paper shows that random forests outperform rejection based ABC. I’d like to see a comparison to more efficient ABC model choice algorithms such as that of Toni et al 2009. Also I’d like to see if the output of random forests could be used as summary statistics within ABC rather than as a separate inference method.

Posterior predictive error rate (PPER)

This is proposed to quantify the performance of a classifier given a particular data set. The PPER is the proportion of times the classifier’s most favoured model is incorrect for simulated model/data pairs drawn from an approximation to the posterior predictive. The approximation is produced by a standard ABC analysis.

Misclassification could be due to (a) a poor classifier or (b) uninformative data, so the PPER aggregrates these two sources of uncertainty. I think it is still very desirable to have an estimate of the uncertainty due to (b) only i.e. a posterior weight estimate. However the PPER is useful. Firstly end users may sometimes only care about the aggregated uncertainty. Secondly relative PPER values for a fixed dataset are a useful measure of uncertainty due to (a), for example in tuning the ABC threshold. Finally, one drawback of the PPER is the dependence on an ABC estimate of the posterior: how robust are the results to the details of how this is obtained?

Classification

This paper illustrates an important link between ABC and machine learning classification methods: model choice can be viewed as a classification problem. There are some other links: some classifiers make good model choice summary statistics (Prangle et al 2014) or good estimates of ABC-MCMC acceptance ratios for parameter inference problems (Pham et al 2014). So the good performance random forests makes them seem a generally useful tool for ABC (indeed they are used in the Pham et al al paper).

likelihood-free inference via classification

Posted in Books, Mountains, pictures, Statistics, Travel, University life with tags , , , , , on August 5, 2014 by xi'an

IMG_2268Last week, Michael Gutmann, Ritabrata Dutta, Samuel Kaski, and Jukka Corander posted on arXiv the last version of the paper they had presented at MCMSki 4. As indicated by its (above) title, it suggests implementing ABC based on classification tools. Thus making it somewhat connected to our recent random forest paper.

The starting idea in the paper is that datasets generated from distributions with different parameters should be easier to classify than datasets generated from distributions with the same parameters. And that classification accuracy naturally induces a distance between datasets and between the parameters behind those datasets. We had followed some of the same track when starting using random forests, before realising that for our model choice setting, proceeding the entire ABC way once the random forest procedure had been constructed was counter-productive. Random forests are just too deadly as efficient model choice machines to try to compete with them through an ABC postprocessing. Performances are just… Not. As. Good!

A side question: I have obviously never thought about that before but why is the naïve Bayes classification rule so called?! It never sounded very Bayesian to me to (a) use the true value of the parameter and (b) average the classification performances. Interestingly, the authors (i) show identical performances of other classification methods (Fig. 2) and (ii) an exception for MA time series: when we first experimented random forests, raw data from an MA(2) model was tested to select between MA(1) and  MA(2) models, and the performances of the resulting random forest were quite poor.

Now, an opposition between our two approaches is that Michael and his coauthors also include point estimation within the range of classification-based ABC inference. As we stressed in our paper, we restrict the range to classification and model choice because we do not think those machine learning tools are stable and powerful enough to perform regression and posterior probability approximation. I also see a practical weakness in the estimation scheme proposed in this new paper. Namely that the Monte Carlo gets into the way of the consistency theorem. And possibly of the simulation method itself. Another remark is that, while the authors compare the fit produced by different classification methods, there should be a way to aggregate them towards higher efficiency. Returning once more to our random forest paper, we saw improved performances each time we included a reference method, from LDA to SVMs. It would be interesting to see a (summary) variable selection version of the proposed method. A final remark is that computing time and effort do not seem to get mentioned in the paper (unless Indian jetlag confuses me more than usual). I wonder how fast the computing effort grows with the sample size to reach parametric and quadratic convergence rates.

ABC model choice by random forests

Posted in pictures, R, Statistics, Travel, University life with tags , , , , , , , , , , , , , on June 25, 2014 by xi'an

treerise6After more than a year of collaboration, meetings, simulations, delays, switches,  visits, more delays, more simulations, discussions, and a final marathon wrapping day last Friday, Jean-Michel Marin, Pierre Pudlo,  and I at last completed our latest collaboration on ABC, with the central arguments that (a) using random forests is a good tool for choosing the most appropriate model and (b) evaluating the posterior misclassification error rather than the posterior probability of a model is an appropriate paradigm shift. The paper has been co-signed with our population genetics colleagues, Jean-Marie Cornuet and Arnaud Estoup, as they provided helpful advice on the tools and on the genetic illustrations and as they plan to include those new tools in their future analyses and DIYABC software.  ABC model choice via random forests is now arXived and very soon to be submitted…

truePPOne scientific reason for this fairly long conception is that it took us several iterations to understand the intrinsic nature of the random forest tool and how it could be most naturally embedded in ABC schemes. We first imagined it as a filter from a set of summary statistics to a subset of significant statistics (hence the automated ABC advertised in some of my past or future talks!), with the additional appeal of an associated distance induced by the forest. However, we later realised that (a) further ABC steps were counterproductive once the model was selected by the random forest and (b) including more summary statistics was always beneficial to the performances of the forest and (c) the connections between (i) the true posterior probability of a model, (ii) the ABC version of this probability, (iii) the random forest version of the above, were at best very loose. The above picture is taken from the paper: it shows how the true and the ABC probabilities (do not) relate in the example of an MA(q) model… We thus had another round of discussions and experiments before deciding the unthinkable, namely to give up the attempts to approximate the posterior probability in this setting and to come up with another assessment of the uncertainty associated with the decision. This led us to propose to compute a posterior predictive error as the error assessment for ABC model choice. This is mostly a classification error but (a) it is based on the ABC posterior distribution rather than on the prior and (b) it does not require extra-computations when compared with other empirical measures such as cross-validation, while avoiding the sin of using the data twice!