Archive for machine learning

ABC à Montréal

Posted in Kids, pictures, Running, Statistics, Travel, University life on December 13, 2014 by xi'an

So today was the NIPS 2014 workshop, “ABC in Montréal”, which started with a fantastic talk by Juliane Liepe on some exciting applications of ABC to the migration of immune cells, with the analysis of movies involving those cells acting to heal a damaged fly wing and a cut fish tail. Quite amazing videos, really. (With the great opening line of ‘We have all cut a finger at some point in our lives’!) The statistical model behind those movies was a random walk on a grid, with different drift and bias features serving as model characteristics. Frank Wood managed to deliver his talk despite a severe case of food poisoning, with a great illustration of probabilistic programming that made me understand (at last!) the very idea of probabilistic programming. And Vikash Mansinghka presented some applications in image analysis. Those two talks led me to realise why probabilistic programming was so close to ABC, with a programming touch! Hence, presumably, why I was invited to talk today! Then Dennis Prangle presented his latest version of lazy ABC, which I have already commented on the ‘Og, somewhat connected with our delayed acceptance algorithm, to the point that maybe something common can stem from the two notions. Michael Blum ended the day with provocative answers to the provocative questions of Ted Meeds as to whether or not machine learning needed ABC (Ans. No!) and whether or not machine learning could help ABC (Ans. ???), with a happy mix-up between mechanistic and phenomenological models that helped generate discussion from the floor.

The posters were also of much interest, with calibration as a distance measure by Michael Gutmann, in continuation of the poster he gave at MCMski, and Aaron Smith presenting his work with Luke Bornn, Natesh Pillai and Dawn Woodard on why a single pseudo-sample is enough for ABC efficiency. This gave me the opportunity to discuss with him the apparent contradiction with the result of Krzysztof Łatuszyński and Anthony Lee that the geometric convergence of ABC-MCMC is only attained with a random number of pseudo-samples… And to wonder whether there is a geometric versus binomial dilemma in this setting, namely whether or not simulating pseudo-samples until one is accepted would be more efficient than just running one and discarding it when it is too far. So, although the audience was not that large (when compared with the other “ABC in…” meetings, and when considering the 2500+ attendees at NIPS over the week!), it was a great day where I learned a lot, did not doze off during talks (!), [and even had an epiphany of sorts on the treadmill when I realised I just had to take longer steps to reach 16 km/h without hyperventilating!] So thanks to my fellow organisers, Neil D. Lawrence, Ted Meeds, Max Welling, and Richard Wilkinson for setting up the program of that day! And, by the way, where’s the next “ABC in…”?! (Finland, maybe?)
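As an aside, the geometric-versus-binomial question can at least be played with numerically. The following is a toy, back-of-the-envelope sketch with a made-up one-dimensional simulator, a fixed parameter value, and an arbitrary tolerance, not the actual algorithms studied in either paper:

```python
import numpy as np

rng = np.random.default_rng(0)
obs, eps, theta, n_props = 0.0, 0.1, 0.5, 10_000  # all values purely illustrative

def pseudo_sample():
    # toy simulator: a single pseudo-observation from N(theta, 1)
    return rng.normal(theta, 1.0)

# scheme A: keep simulating pseudo-samples until one lands within the tolerance
cost_a = 0
for _ in range(n_props):
    while abs(pseudo_sample() - obs) >= eps:
        cost_a += 1
    cost_a += 1  # count the final, successful simulation as well

# scheme B: a single pseudo-sample per proposal, discarded when it is too far
hits_b = sum(abs(pseudo_sample() - obs) < eps for _ in range(n_props))

print("scheme A, simulator calls per hit:", cost_a / n_props)
print("scheme B, simulator calls per hit:", n_props / hits_b)
```

On this crude count the raw simulation cost per hit comes out comparable for the two schemes; the interesting differences concern the resulting Markov chains, which this sketch does not touch.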

reliable ABC model choice via random forests

Posted in pictures, R, Statistics, University life on October 29, 2014 by xi'an

After a somewhat prolonged labour (!), we have at last completed our paper on ABC model choice with random forests and submitted it to PNAS for possible publication. While the paper is entirely methodological, the primary domain of application of ABC model choice methods remains population genetics, and the dissemination of this new methodology to users is thus more likely via a medium like PNAS than via a machine learning or statistics journal.

When compared with our recent update of the arXived paper, there is not much difference in contents, as it is mostly a matter of fitting the PNAS publication canons. (Which makes the paper less readable in the posted version [in my opinion!], as fitting the main document within the compulsory six pages relegated part of the experiments and of the explanations to the Supplementary Information section.)

ABC@NIPS: call for papers

Posted in Statistics, Travel, University life on September 9, 2014 by xi'an

In connection with the previous announcement of ABC in Montréal, here is the call for papers that came out today:

NIPS 2014 Workshop: ABC in Montreal

December 12, 2014
Montréal, Québec, Canada

Approximate Bayesian computation (ABC) or likelihood-free (LF) methods have developed mostly beyond the radar of the machine learning community, but are important tools for a large segment of the scientific community. This is particularly true for systems and population biology, computational psychology, computational chemistry, etc. Recent work has both applied machine learning models and algorithms to general ABC inference (NNs, random forests, GPs) and applied ABC inference to machine learning (e.g., using computer graphics to solve computer vision problems via ABC). In general, however, there is significant room for collaboration between the two communities.

The workshop will consist of invited and contributed talks, poster spotlights, and a poster session. Rather than a panel discussion we will encourage open discussion between the speakers and the audience!

Examples of topics of interest in the workshop include (but are not limited to):

* Applications of ABC to machine learning, e.g., computer vision, inverse problems
* ABC in Systems Biology, Computational Science, etc.
* ABC Reinforcement Learning
* Machine learning simulator models, e.g., NN models of simulation responses, GPs etc.
* Selection of sufficient statistics
* Online and post-hoc error
* ABC with very expensive simulations and acceleration methods (surrogate modeling, choice of design/simulation points)
* ABC with probabilistic programming
* Posterior evaluation of scientific problems/interaction with scientists
* Post-computational error assessment
* Impact on resulting ABC inference
* ABC for model selection

===========
Submission:
===========

ISBA@NIPS

Posted in Statistics, Travel, University life on September 2, 2014 by xi'an

[An announcement from ISBA about sponsoring young researchers at NIPS, which connects with my earlier post that our ABC in Montréal workshop proposal had been accepted, and with a more global feeling that we (as a society) should do more to reach out towards machine learning.]

The International Society for Bayesian Analysis (ISBA) is pleased to announce its new initiative *ISBA@NIPS*, an initiative aimed at highlighting the importance and impact of Bayesian methods in the new era of data science.

Among the first actions of this initiative, ISBA is endorsing a number of *Bayesian satellite workshops* at the Neural Information Processing Systems (NIPS) Conference, which will be held in Montréal, Québec, Canada, December 8-13, 2014.

Furthermore, a special ISBA@NIPS Travel Award will be granted to the best Bayesian invited and contributed paper(s) among all the ISBA endorsed workshops.

ISBA endorsed workshops at NIPS

  1. ABC in Montréal. This workshop will include topics on: Applications of ABC to machine learning, e.g., computer vision and other inverse problems; ABC Reinforcement Learning; Machine learning models of simulations, e.g., NN models of simulation responses, GPs, etc.; Selection of sufficient statistics and massive dimension reduction methods; Online and post-hoc error; ABC with very expensive simulations and acceleration methods (surrogate modelling, choice of design/simulation points).
  2.  Networks: From Graphs to Rich Data. This workshop aims to bring together a diverse and cross-disciplinary set of researchers to discuss recent advances and future directions for developing new network methods in statistics and machine learning.
  3. Advances in Variational Inference. This workshop aims at highlighting recent advances in variational methods, including new methods for scalability using stochastic gradient methods, extensions to the streaming variational setting, improved local variational methods, inference in non-linear dynamical systems, principled regularisation in deep neural networks, and inference-based decision making in reinforcement learning, amongst others.
  4. Women in Machine Learning (WiML 2014). This is a day-long workshop that gives female faculty, research scientists, and graduate students in the machine learning community an opportunity to meet, exchange ideas and learn from each other. Under-represented minorities and undergraduates interested in machine learning research are encouraged to attend.


NIPS workshops (Dec. 12-13, 2014, Montréal)

Posted in Kids, Statistics, Travel, University life on August 25, 2014 by xi'an

Following a proposal put forward by Ted Meeds, Max Welling, Richard Wilkinson, Neil Lawrence and myself, our ABC in Montréal workshop has been accepted by the NIPS 2014 committee and will thus take place on either Friday, Dec. 12, or Saturday, Dec. 13, at the end of the main NIPS meeting (Dec. 8-11). (Despite the title, this workshop is not part of the ABC in… series I started five years ago. It will only last a single day, with a few invited talks and no posters. And no free wine & cheese party.) On top of this workshop, our colleagues Vikash K. Mansinghka, Daniel M. Roy, Josh Tenenbaum, Thomas Dietterich, and Stuart J. Russell have also been successful in their bid for the 3rd NIPS Workshop on Probabilistic Programming, which will presumably be held on the opposite day to ours, as Vikash is speaking at our workshop while I am speaking at theirs. I am as yet undecided as to whether or not to attend the main conference, given that I am already travelling a lot this semester and have to teach two courses, including a large undergraduate statistical inference course… Obviously, I will try to attend if our joint paper is accepted by the editorial board, even though Marco will then be the speaker!

ABC model choice by random forests [guest post]

Posted in pictures, R, Statistics, University life on August 11, 2014 by xi'an

[Dennis Prangle sent me his comments on our ABC model choice by random forests paper. Here they are! I very much appreciate contributors commenting on this paper or on others, so please feel free to join in.]

This paper proposes a new approach to likelihood-free model choice based on random forest classifiers. These are fit to simulated model/data pairs and then run on the observed data to produce a predicted model. A novel “posterior predictive error rate” is proposed to quantify the degree of uncertainty placed on this prediction. Another interesting use of this is to tune the threshold of the standard ABC rejection approach, which is outperformed by random forests.
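For concreteness, here is a rough, minimal sketch of that idea using scikit-learn. It is not the paper's actual implementation: the two competing models, the summary statistics, the prior, and the reference-table size below are all invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

def summaries(x):
    # ad-hoc summary statistics of a simulated dataset (purely illustrative)
    return np.array([x.mean(), x.std(), np.median(x)])

def simulate(model_index, theta, n=50):
    # two toy competing models for the same location parameter: Gaussian vs Laplace
    draw = rng.normal if model_index == 0 else rng.laplace
    return summaries(draw(theta, 1.0, n))

# reference table of simulated model/data pairs, drawn from the prior
n_ref = 5000
models = rng.integers(0, 2, n_ref)
thetas = rng.normal(0.0, 2.0, n_ref)
table = np.stack([simulate(m, t) for m, t in zip(models, thetas)])

# fit the random forest classifier to the simulated pairs...
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(table, models)

# ...and run it on the "observed" data (here generated from the Laplace model)
obs = summaries(rng.laplace(1.0, 1.0, 50))
print("predicted model:", rf.predict(obs.reshape(1, -1))[0])
```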

The paper has lots of thought-provoking new ideas and was an enjoyable read, as well as giving me the encouragement I needed to read another chapter of the indispensable Elements of Statistical Learning. However, I'm not fully convinced by the approach yet, for a few reasons which are given below along with other comments.

Alternative schemes

The paper shows that random forests outperform rejection-based ABC. I'd like to see a comparison to more efficient ABC model choice algorithms, such as that of Toni et al. (2009). I'd also like to see whether the output of random forests could be used as summary statistics within ABC, rather than as a separate inference method.

Posterior predictive error rate (PPER)

This is proposed to quantify the performance of a classifier given a particular data set. The PPER is the proportion of times the classifier’s most favoured model is incorrect for simulated model/data pairs drawn from an approximation to the posterior predictive. The approximation is produced by a standard ABC analysis.
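Continuing the rough sketch above (and reusing its rng, summaries, simulate, table, models, thetas, rf and obs), the PPER could be estimated along these lines, with the retained fraction standing in for the ABC tolerance chosen arbitrarily:

```python
# a crude stand-in for the ABC posterior: keep the 1% of reference simulations
# whose summaries are closest to the observed ones (the fraction is arbitrary)
dist = np.linalg.norm(table - obs, axis=1)
kept = np.argsort(dist)[: len(table) // 100]

# approximate posterior predictive: resimulate from the retained (model, theta)
# pairs and record how often the classifier misses the model that generated them
errors = [
    rf.predict(simulate(models[i], thetas[i]).reshape(1, -1))[0] != models[i]
    for i in kept
]
print("estimated PPER:", np.mean(errors))
```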

Misclassification could be due to (a) a poor classifier or (b) uninformative data, so the PPER aggregates these two sources of uncertainty. I think it is still very desirable to have an estimate of the uncertainty due to (b) only, i.e., a posterior weight estimate. However, the PPER is useful. Firstly, end users may sometimes only care about the aggregated uncertainty. Secondly, relative PPER values for a fixed dataset are a useful measure of the uncertainty due to (a), for example when tuning the ABC threshold. Finally, one drawback of the PPER is its dependence on an ABC estimate of the posterior: how robust are the results to the details of how this is obtained?

Classification

This paper illustrates an important link between ABC and machine learning classification methods: model choice can be viewed as a classification problem. There are some other links: some classifiers make good model choice summary statistics (Prangle et al. 2014) or good estimates of ABC-MCMC acceptance ratios for parameter inference problems (Pham et al. 2014). So the good performance of random forests makes them seem a generally useful tool for ABC (indeed they are used in the Pham et al. paper).

a thesis on random forests

Posted in Books, Kids, Statistics, University life on August 4, 2014 by xi'an

During a session of the IFCAM workshop this morning, I noticed a new arXiv posting on random forests. Entitled Understanding Random Forests: From Theory to Practice, it actually corresponds to a PhD thesis written by Gilles Louppe on the topic, at the Université de Liège, België/Belgium/Belgique. In this thesis, Gilles Louppe provides a rather comprehensive coverage of the random forest methodology, from specific bias-variance decompositions and convergence properties, to the historical steps towards random forests, to implementation details and recommendations, to how to rank (co)variates by order of importance. The last point was of particular relevance for our current work on ABC model choice with random forests, as it relies on the frequency of appearance of a given variable to label its importance. The thesis showed me this was not a great way of selecting covariates, as it does not account for correlation and could easily miss important covariates. It is a very complete, well-written, and beautifully LaTeXed thesis (with fancy grey boxes and all that jazz!). As part of his thesis work, Gilles Louppe also contributed to the open-source machine learning library scikit-learn. The thesis thus makes a most profitable and up-to-date entry into the topic of random forests…
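Since scikit-learn comes up, here is a minimal sketch, on made-up synthetic data with hypothetical variable names, of how a correlated near-copy of a covariate dilutes its random-forest importance; note that feature_importances_ in scikit-learn is an impurity-based measure, not a frequency-of-appearance count:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
n = 2000

# synthetic data: x1 and x2 drive the response, x1_copy is a noisy
# near-duplicate of x1, and the last two covariates are pure noise
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x1, x2, x1 + 0.1 * rng.normal(size=n),
                     rng.normal(size=n), rng.normal(size=n)])
y = (x1 + x2 + 0.5 * rng.normal(size=n) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
for name, imp in zip(["x1", "x2", "x1_copy", "noise1", "noise2"],
                     rf.feature_importances_):
    print(f"{name:8s} {imp:.3f}")
```

Running this, the importance that would otherwise go to x1 is split with its near-duplicate x1_copy, which is one way to see why correlation can mask genuinely important covariates.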
