Archive for machine learning

a postdoc with Christian Robert

Posted in Statistics on April 12, 2018 by xi'an

[Here is another call for a postdoctoral position in Lyon, under the supervision of my homonym Christian Robert:]

Post-Doctoral Position in Data Science and Machine Learning

Chair Data Analytics and Models for Insurance – 2018-2020

DAMI is a research chair funded by BNP Paribas Cardif, and is interested in problems related to Data Science and Models for Insurance.

Description

A post-doctoral fellowship in the areas of data science and machine learning is available at the DAMI research chair at the Claude Bernard Lyon 1 University. The post-doctoral fellow, in collaboration with the industry partner, will conduct research on advanced analytics and machine learning algorithms for actuarial science and insurance, in particular for improving risk-based pricing and developing predictive analytics.

The position is for one year (possibly two years), starting in September/October 2018.

Required Professional Expertise

– A recent PhD (received within the past 5 years) in Computer Science, Computer Engineering, Applied Statistics and Mathematics, or related fields

– Excellent research capabilities and competitive development skills

– Strong knowledge/expertise in machine learning and data analytics, including data pre-processing, model building, and model evaluation

– Experience with Python and/or R

Interested candidates are invited to submit their CV, list of publications, and contact information for two references to Christian ROBERT at univ-lyon1.fr.

gender gaps

Posted in Statistics, University life on March 31, 2018 by xi'an

Two of my colleagues [and co-authors] at Dauphine, Elyès Jouini and Clotilde Napp, published a paper in Science last week (and an associated tribune in Le Monde, which I spotted first) explaining differences in national gender inequalities in maths (as measured by PISA) in terms of the degree of overall inequality in the respective countries. [The accompanying graphs show the gaps in the highest maths performer sex ratio.] While I have no qualms about the dependency or the overall statistical cum machine learning analysis (supported by our common co-author Jean-Michel Marin), and while I obviously know nothing about the topic!, I leisurely wonder at the cultural factor (which may also partly explain the degree of inequality), considering that the countries at the bottom of the graphs are rather religious (and mostly catholic). I also find it most intriguing that the gender gap is consistently reversed when considering the highest performer sex ratio for reading, since mastering the language should be a strong factor in power structures, and differences therein should hence also lead to inequalities…

1500 nuances of gan [gan gan style]

Posted in Books, Statistics, University life on February 16, 2018 by xi'an

I recently realised that there is a currently very popular trend in machine learning called GAN [for generative adversarial networks] that strongly connects with ABC, at least in that it relies mostly on the availability of a generative model, i.e., a probability model from which one can simulate as x=G(ϵ;θ), to draw inference about θ [or predictions]. For instance, there was a GAN tutorial at NIPS 2016 by Ian Goodfellow and many talks on the topic at recent NIPS meetings, the 1500 in the title referring to the citations of the GAN paper by Goodfellow et al. (2014). (The name adversarial comes from opposing the true model to the generative model in the inference.)

If you remember Jeffreys's famous pique about classical tests being based on improbable events that did not happen, GAN, like ABC, is sort of the opposite in that it generates events until the one that was observed happens. More precisely, it generates pseudo-samples and adjusts the parameter θ until these samples get as confused as possible with the data generating ("true") distribution. (In its original incarnation, GAN is indeed an optimisation scheme in θ.) A basic presentation of GAN is that it constructs a function D(x,ϕ) that represents the probability that x came from the true model p rather than the generative model, ϕ being the parameter of a neural network trained to this effect by maximising in ϕ the two-term objective function

E[log D(x,ϕ)] + E[log(1−D(G(ϵ;θ),ϕ))]

where the first expectation is taken under the true model and the second one under the generative model, the generator being trained in turn by minimising the second term in θ.
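To make the minimax opposition concrete, here is a minimal numpy sketch on a toy location model, with the "true" model N(2,1) and generator G(ϵ;θ)=θ+ϵ; the logistic discriminator, the step sizes, and the alternating single gradient steps are illustrative choices of mine rather than part of the original GAN recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def D(x, phi):
    # logistic discriminator: probability that x comes from the true model
    return sigmoid(phi[0] * x + phi[1])

theta_true, n = 2.0, 256
theta = 0.0            # generator parameter, to be learned
phi = np.zeros(2)      # discriminator parameters (slope, intercept)
lr = 0.05

for t in range(2000):
    x = theta_true + rng.standard_normal(n)   # draws from the true model
    g = theta + rng.standard_normal(n)        # draws G(eps; theta) = theta + eps

    # ascent step in phi on E[log D(x,phi)] + E[log(1 - D(G(eps;theta),phi))]
    Dx, Dg = D(x, phi), D(g, phi)
    phi += lr * np.array([
        np.mean((1 - Dx) * x) - np.mean(Dg * g),   # gradient in the slope
        np.mean(1 - Dx) - np.mean(Dg),             # gradient in the intercept
    ])

    # descent step in theta on E[log(1 - D(G(eps;theta),phi))]
    Dg = D(g, phi)
    theta -= lr * np.mean(-Dg * phi[0])            # since dg/dtheta = 1

print(theta)   # drifts towards theta_true = 2
```

(This is the saturating version of the generator loss; Goodfellow et al. actually recommend maximising E[log D(G(ϵ;θ),ϕ)] in θ instead, to avoid vanishing gradients early in training.)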

“The discriminator tries to best distinguish samples away from the generator. The generator tries to produce samples that are indistinguishable by the discriminator.” Edward

One ABC perception of this technique is that the confusion rate

E[log(1−D(G(ϵ;θ),ϕ))]

is a form of distance between the data and the generative model. Which expectation can be approximated by repeated simulations from this generative model. Which suggests an extension from the optimisation approach to an ABCyesian version by selecting the smallest distances across a range of θ's simulated from the prior.
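As a hedged illustration of this ABC-flavoured reading, the following sketch (my own toy construction, with a normal location model and a uniform prior as stand-ins) trains a small logistic discriminator for each prior draw of θ and keeps the draws with the smallest confusion-based distance:

```python
import numpy as np

rng = np.random.default_rng(1)
x_obs = 2.0 + rng.standard_normal(200)   # pretend-observed data

def confusion(x, g, n_steps=200, lr=0.1):
    """Train a logistic discriminator to separate x from g and return the
    confusion term E[log(1 - D(g))]: close to log(1/2) when the two samples
    are indistinguishable, close to 0 when they are easily told apart."""
    phi = np.zeros(2)
    for _ in range(n_steps):
        Dx = 1 / (1 + np.exp(-(phi[0] * x + phi[1])))
        Dg = 1 / (1 + np.exp(-(phi[0] * g + phi[1])))
        phi += lr * np.array([
            np.mean((1 - Dx) * x) - np.mean(Dg * g),
            np.mean(1 - Dx) - np.mean(Dg),
        ])
    Dg = 1 / (1 + np.exp(-(phi[0] * g + phi[1])))
    return np.mean(np.log(1 - Dg + 1e-12))

# draw thetas from the prior, compute the discriminator-based distance,
# and keep the 5% of draws closest to the observed sample
thetas = rng.uniform(-5, 5, size=500)
dists = np.array([confusion(x_obs, th + rng.standard_normal(200))
                  for th in thetas])
accepted = thetas[np.argsort(dists)[:25]]
print(accepted.mean(), accepted.std())
```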

This notion relates to solutions using classification tools for density ratio estimation, connecting for instance with Gutmann and Hyvärinen (2012). And ultimately with Geyer's 1992 normalising constant estimator.

Another link between ABC and networks also came out during that trip. Proposed by Bishop (1994), mixture density networks (MDN) are mixture representations of the posterior [with component parameters that are functions of the data], trained on the prior predictive through a neural network. These MDNs can be trained on the ABC learning table [based on a specific if redundant choice of summary statistics] and used as substitutes for the posterior distribution, which brings an interesting alternative to Simon Wood's synthetic likelihood. In a paper I missed, Papamakarios and Murray suggest replacing regular ABC with this version…
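For concreteness, here is a minimal PyTorch sketch of such an MDN, trained on a toy table of (θ, summary) pairs from a normal-mean model standing in for an actual ABC learning table; the architecture, number of components, and training settings are all illustrative assumptions of mine:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# toy reference table: theta_i ~ prior, summary s_i = mean of 20 obs from N(theta_i, 1)
N, K = 5000, 3
theta = torch.empty(N, 1).uniform_(-5, 5)
s = theta + torch.randn(N, 1) / 20 ** 0.5

class MDN(nn.Module):
    """Maps a summary statistic to the parameters of a K-component
    Gaussian mixture meant to approximate the posterior of theta."""
    def __init__(self, hidden=32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(1, hidden), nn.Tanh())
        self.logits = nn.Linear(hidden, K)     # mixture weights (pre-softmax)
        self.mu = nn.Linear(hidden, K)         # component means
        self.log_sig = nn.Linear(hidden, K)    # component log-scales

    def forward(self, s):
        h = self.body(s)
        return self.logits(h), self.mu(h), self.log_sig(h).exp()

net = MDN()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for epoch in range(2000):
    logits, mu, sig = net(s)
    # negative mixture log-likelihood of theta given s, over the whole table
    log_w = torch.log_softmax(logits, dim=1)
    log_comp = torch.distributions.Normal(mu, sig).log_prob(theta)
    loss = -torch.logsumexp(log_w + log_comp, dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# the trained network now maps an observed summary to a mixture
# approximating the posterior pi(theta | s_obs)
logits, mu, sig = net(torch.tensor([[1.5]]))
print(torch.softmax(logits, 1), mu, sig)
```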

machine learning methods are useful for ABC [or my first PCI Evol Biol!]

Posted in Books, Kids, pictures, Statistics, University life on November 23, 2017 by xi'an

While I am still working on setting up a PCI [peer community in] Comput Stats, having secured the sponsorship of some societies (ASA, KSS, RSS, SFdS, and hopefully ISBA), my coauthors Jean-Michel Marin and Louis Raynal submitted our paper ABC random forests for Bayesian parameter inference to PCI Evol Biol. And after a few months of review, including a revision accounting for the reviewers' requests, our paper stood the test and the recommendation by Michael Blum and Dennis Prangle got published there. Great news, and hopefully helpful for our submission within the coming days!

random matrix advances

Posted in pictures, Statistics on October 23, 2017 by xi'an

postdocs positions in Uppsala in computational stats for machine learning

Posted in Kids, pictures, Statistics, Travel, University life on October 22, 2017 by xi'an

Lawrence Murray sent me a call for two postdoc positions in computational statistics and machine learning. In Uppsala, Sweden. With deadline November 17. Definitely attractive for a fresh PhD! Here are some of the contemplated themes:

(1) Developing efficient Bayesian inference algorithms for large-scale latent variable models in data rich scenarios.

(2) Finding ways of systematically combining different inference techniques, such as variational inference, sequential Monte Carlo, and deep inference networks, resulting in new methodology that can reap the benefits of these different approaches.

(3) Developing efficient black-box inference algorithms specifically targeted at inference in probabilistic programs. This line of research may include implementation of the new methods in the probabilistic programming language Birch, currently under development at the department.

Statistics versus Data Science [or not]

Posted in Books, Kids, Statistics, University life on October 13, 2017 by xi'an

Last week a colleague from Warwick forwarded us a short argument by Donald Macnaughton (a "Toronto-based statistician") for switching the name of our field from Statistics to Data Science. This is not the first time I have heard of this proposal and it is not the first time I express my strong disagreement with it! Here are the naughtonian arguments:

  1. Statistics is (at least in the English language) endowed with several meanings, from the compilation of numbers out of a series of observations, to the field itself, to the procedures proposed by the field. This is argued to be confusing for laypeople, as well as missing the connection with data at the core of our field and the indication that statistics extracts information from the data. Data science seems to convey both ideas… But it is equally vague in that most scientific fields, if not all, rely on data and observations and on the structured exploitation of such data. Actually a lot of so-called "data scientists" have specialised in the analysis of data from their original field, without deliberately embarking upon a career of data scientist. And without necessarily acquiring the proper tools for incorporating uncertainty quantification (aka statistics!).
  2. Statistics sounds old-fashioned and "old-guard" and "inward-looking" and unattractive to young talents, who flock instead to Data Science programs. Which is true [that they flock] but does not mean we [as a field] must flock there as well. In five or ten years, who can tell whether this attraction to data science(s) will still be that strong? We already had to switch our Masters names to Data Science or the like, which is surely more than enough.
  3. Data science encompasses other areas of science, like computer science and operations research, but this is argued not to be an issue, both in terms of potential collaborations and of gaining the upper hand as a "key part" of the field. Which is more wishful thinking than a certainty, given the existing difficulties in being recognised as a major actor in data analysis. (As for instance in a recent grant evaluation in "Big Data" where the evaluation committee involved no statistician. And where we got rejected.)