Following a proposal put forward by Ted Meeds, Max Welling, Richard Wilkinson, Neil Lawrence and myself, our ABC in Montréal workshop has been accepted by the NIPS 2014 committee and will thus take place on either Friday, Dec. 11, or Saturday, Dec. 12, at the end of the main NIPS meeting (Dec. 8-10). (Despite the title, this workshop is not part of the ABC in … series I started five years ago. It will only last a single day with a few invited talks and no poster. And no free wine & cheese party.) On top of this workshop, our colleagues Vikash K Mansinghka, Daniel M Roy, Josh Tenenbaum, Thomas Dietterich, and Stuart J Russell have also been successful in their bid for the 3rd NIPS Workshop on Probabilistic Programming which will presumably be held on the opposite day to ours, as Vikash is speaking at our workshop, while I am speaking in this workshop. I am yet undecided as to whether or not to attend the main conference, given that I am already travelling a lot this semester and have to teach two courses, incl. a large undergraduate statistics inference course… Obviously, I will try to attend if our joint paper is accepted by the editorial board! Even though Marco will then be the speaker.
Archive for machine learning
[Dennis Prangle sent me his comments on our ABC model choice by random forests paper. Here they are! And I appreciate very much contributors commenting on my paper or others, so please feel free to join.]
This paper proposes a new approach to likelihood-free model choice based on random forest classifiers. These are fit to simulated model/data pairs and then run on the observed data to produce a predicted model. A novel “posterior predictive error rate” is proposed to quantify the degree of uncertainty placed on this prediction. Another interesting use of this is to tune the threshold of the standard ABC rejection approach, which is outperformed by random forests.
The paper has lots of thought-provoking new ideas and was an enjoyable read, as well as giving me the encouragement I needed to read another chapter of the indispensable Elements of Statistical Learning However I’m not fully convinced by the approach yet for a few reasons which are below along with other comments.
The paper shows that random forests outperform rejection based ABC. I’d like to see a comparison to more efficient ABC model choice algorithms such as that of Toni et al 2009. Also I’d like to see if the output of random forests could be used as summary statistics within ABC rather than as a separate inference method.
Posterior predictive error rate (PPER)
This is proposed to quantify the performance of a classifier given a particular data set. The PPER is the proportion of times the classifier’s most favoured model is incorrect for simulated model/data pairs drawn from an approximation to the posterior predictive. The approximation is produced by a standard ABC analysis.
Misclassification could be due to (a) a poor classifier or (b) uninformative data, so the PPER aggregrates these two sources of uncertainty. I think it is still very desirable to have an estimate of the uncertainty due to (b) only i.e. a posterior weight estimate. However the PPER is useful. Firstly end users may sometimes only care about the aggregated uncertainty. Secondly relative PPER values for a fixed dataset are a useful measure of uncertainty due to (a), for example in tuning the ABC threshold. Finally, one drawback of the PPER is the dependence on an ABC estimate of the posterior: how robust are the results to the details of how this is obtained?
This paper illustrates an important link between ABC and machine learning classification methods: model choice can be viewed as a classification problem. There are some other links: some classifiers make good model choice summary statistics (Prangle et al 2014) or good estimates of ABC-MCMC acceptance ratios for parameter inference problems (Pham et al 2014). So the good performance random forests makes them seem a generally useful tool for ABC (indeed they are used in the Pham et al al paper).
During a session of the IFCAM workshop this morning I noticed a new arXiv posting on random forests. Entitled Understanding Random Forests: From Theory to Practice, it actually corresponds to a PhD thesis written by Gilles Louppe on the topic. At the Université de Liège, Belgie/Belgium/Belgique. In this thesis, Gilles Louppe provides a rather comprehensive coverage of the random forest methodology, from specific bias-variance decompositions and convergence properties to the historical steps towards random forests, to implementation details and recommendations, to describing how to rank (co)variates by order of importance. The last point was of particular relevance for our current work on ABC model choice with random forests as it relies on random forests and relies on the frequency of appearance of a given variable to label its importance. The thesis showed me this was not a great way of selecting covariates as it did not account for correlation and could easily miss important covariates. It is a very complete, well-written and beautifully LaTeXed (with fancy grey boxes and all that jazz!). As part of his thesis, Gilles Louppe also contributed to the open source machine learning library Scikit. The thesis thus makes a most profitable and up-to-date entry into the topic of random forests…
After more than a year of collaboration, meetings, simulations, delays, switches, visits, more delays, more simulations, discussions, and a final marathon wrapping day last Friday, Jean-Michel Marin, Pierre Pudlo, and I at last completed our latest collaboration on ABC, with the central arguments that (a) using random forests is a good tool for choosing the most appropriate model and (b) evaluating the posterior misclassification error rather than the posterior probability of a model is an appropriate paradigm shift. The paper has been co-signed with our population genetics colleagues, Jean-Marie Cornuet and Arnaud Estoup, as they provided helpful advice on the tools and on the genetic illustrations and as they plan to include those new tools in their future analyses and DIYABC software. ABC model choice via random forests is now arXived and very soon to be submitted…
One scientific reason for this fairly long conception is that it took us several iterations to understand the intrinsic nature of the random forest tool and how it could be most naturally embedded in ABC schemes. We first imagined it as a filter from a set of summary statistics to a subset of significant statistics (hence the automated ABC advertised in some of my past or future talks!), with the additional appeal of an associated distance induced by the forest. However, we later realised that (a) further ABC steps were counterproductive once the model was selected by the random forest and (b) including more summary statistics was always beneficial to the performances of the forest and (c) the connections between (i) the true posterior probability of a model, (ii) the ABC version of this probability, (iii) the random forest version of the above, were at best very loose. The above picture is taken from the paper: it shows how the true and the ABC probabilities (do not) relate in the example of an MA(q) model… We thus had another round of discussions and experiments before deciding the unthinkable, namely to give up the attempts to approximate the posterior probability in this setting and to come up with another assessment of the uncertainty associated with the decision. This led us to propose to compute a posterior predictive error as the error assessment for ABC model choice. This is mostly a classification error but (a) it is based on the ABC posterior distribution rather than on the prior and (b) it does not require extra-computations when compared with other empirical measures such as cross-validation, while avoiding the sin of using the data twice!
Last session of our Big’MC seminar at Institut Henri Poincaré this year, on
Tuesday Thursday, June 19, with
Chris Holmes (Oxford) at 3pm on
Robust statistical decisions via re-weighted Monte Carlo samples
and Pierre Pudlo (iC3M, Université de Montpellier 2) at 4:15pm on [our joint work]
ABC and machine learning
Today in Warwick, I had a very nice discussion with Michael Betancourt on many statistical and computational issues but at one point in the conversation we came upon the trouble of bridging the gap between the machine learning and statistics communities. While a conference like AISTATS is certainly contributing to this, it does not reach the main bulk of the statistics community. Since, in Reykjavik, we had discussed the corresponding difficulty of people publishing a longer and “more” statistical paper in a “more” statistical journal, once the central idea was published in a machine learning conference proceeding like NIPS or AISTATS. we had this idea that creating a special fast-track in a mainstream statistics journal for a subset of those papers, using for instance a tailor-made committee in that original conference, or creating an annual survey of the top machine learning conference proceedings rewritten in a more” statistical way (and once again selected by an ad hoc committee) would help, at not too much of a cost for inducing machine learners to make the extra-effort of switching to another style. From there, we enlarged the suggestion to enlist a sufficient number of (diverse) bloggers in each major conference towards producing quick but sufficiently informative entries on their epiphany talks (if any), possibly supported by the conference organisers or the sponsoring societies. (I am always happy to welcome any guest blogger in conferences I attend!)
Here is a call from ENSAE about two positions in statistics/machine learning, starting next semester:
ENSAE ParisTech and CREST is currently inviting applications for one position at the level associate or full professor from outstanding candidates having demonstrated abilities in both research and teaching. We are interested in candidates with a Ph.D. in Statistics or Machine Learning (or related field) whose research interests are in high dimensional statistical inference, learning theory or statistics of networks.
The appointment could begin as soon as September 1, 2014. The position is for an initial three-year term, with a possible renewal option in case of positive evaluation of research and teaching activities. Salary for suitably qualified applicants is competitive and commensurate with experience. The deadline for application is May 19, 2014. Full details are given here for the first position and there for the second position.