Archive for machine learning

data scientist position

Posted in R, Statistics, University life on April 8, 2014 by xi'an

Our newly created Chaire “Economie et gestion des nouvelles données” in Paris-Dauphine, ENS Ulm, École Polytechnique and ENSAE is recruiting a data scientist starting as early as May 1, the call remaining open till the position is filled. The location is one of the above labs in Paris, the duration is at least one year, the salary varies with the applicant’s profile, and the contacts are Stephane Gaiffas (stephane.gaiffas AT cmap DOT), Robin Ryder (ryder AT ceremade DOT) and Gabriel Peyré (peyre AT ceremade DOT). Here are more details:

Job description

The chaire “Economie et gestion des nouvelles données” is recruiting a talented young engineer specialized in large-scale computing and data processing. The targeted applications include machine learning, imaging sciences and finance. This is a unique opportunity to join a newly created research group between the best Parisian labs in applied mathematics and computer science (Paris-Dauphine, ENS Ulm, Ecole Polytechnique and ENSAE) working hand in hand with major industrial companies (Havas, BNP Paribas, Warner Bros.). The proposed position consists of helping researchers of the group to develop and implement large-scale data processing methods, and applying these methods to real-life problems in collaboration with the industrial partners.

A non-exhaustive list of methods that are currently investigated by researchers of the group, and that will play a key role in the computational framework developed by the recruited engineer, includes:
● Large-scale non-smooth optimization methods (proximal schemes, interior points, optimization on manifolds).
● Machine learning problems (kernelized methods, Lasso, collaborative filtering, deep learning, learning for graphs, learning for time-dependent systems), with a particular focus on large-scale problems and stochastic methods.
● Imaging problems (compressed sensing, super-resolution).
● Approximate Bayesian Computation (ABC) methods.
● Particle and Sequential Monte Carlo methods.

Candidate profile

The candidate should have a very good background in computer science with various programming environments (e.g. Matlab, Python, C++) and knowledge of high performance computing methods (e.g. GPU, parallelization, cloud computing). He/she should adhere to the open source philosophy and possibly be able to interact with the relevant communities (e.g. the scikit-learn initiative). A typical curriculum includes engineering school or Master studies in computer science / applied maths / physics, and possibly a PhD (not required).

Working environment

The recruited engineer will work within one of the labs of the chaire. He/she will benefit from a very stimulating working environment and all required computing resources. He/she will work in close interaction with the 4 research labs of the chaire, and will also have regular meetings with the industrial partners. More information about the chaire can be found online at

Advances in scalable Bayesian computation [day #3]

Posted in Books, Mountains, pictures, R, Statistics, University life on March 6, 2014 by xi'an

We have now gone over the midpoint of our workshop Advances in Scalable Bayesian Computation with three talks in the morning and an open research or open air afternoon. (Maybe surprisingly I chose to stay indoors and work on a new research topic rather than trying cross-country skiing!) If I must give a theme for the day, it would be (jokingly) corporate Big data, as the three speakers spoke of problems and solutions connected with Google, Facebook and similar companies. First, Russ Salakhutdinov presented some hierarchical structures on multimedia data, like connecting images and text, with obvious applications at Google. The first part described Boltzmann machines with impressive posterior simulations of characters and images. (Check the video at 45:00.) Then Steve Scott gave us a Google-motivated entry to embarrassingly parallel algorithms, along the lines of papers recently discussed on the ‘Og. (Too bad we forgot to start the video at the very beginning!) One of the novel things in the talk (for me) was the inclusion of BART in this framework, with the interesting feature that using the whole prior on each machine was way better than using a fraction of the prior, as predicted by the theory! And Joaquin Quiñonero Candela provided examples of machine learning techniques used by Facebook to suggest friends and ads in a most efficient way (techniques remaining hidden!).

Even though the rest of the day was free, the two hours of exercising between the pool in the early morning and the climbing wall in the late afternoon left me with no energy to try curling with a large subsample of the conference attendees, much to my sorrow!

Le Monde puzzle [#851]

Posted in Books, Kids, Statistics, University life on February 6, 2014 by xi'an

A more unusual Le Monde mathematical puzzle:

Fifty black and white tokens are set on an equilateral triangle of side 9, black on top and white on bottom. If they can only be turned three by three, determine whether it is possible to produce a triangle with all white sides on top, under each of the following constraints:

  • the three tokens must stand on a line;
  • the three tokens must stand on a line and be contiguous;
  • the three tokens must stand on the vertices of an equilateral triangle;
  • the three tokens must stand on the vertices of an equilateral triangle of side one.
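
For readers who want to take up the challenge, here is a minimal Python sketch (Python rather than R, and not a solution taken from the post) for the last constraint only. It rests on two assumptions that are not in the puzzle statement as quoted: the tokens sit in rows of 1, 2, ..., n tokens, so that n = 9 gives 45 tokens and n = 10 gives 55, neither matching the fifty above; and every unit triangle, pointing up or down, is an allowed move. Since flipping the same triple twice cancels out, only the parity of each move matters, and the question becomes whether the all-ones vector lies in the span of the move vectors over GF(2). The function names `unit_triangle_moves`, `in_span` and `solvable` are made up for illustration.

```python
def unit_triangle_moves(n):
    """Return (number of tokens, list of moves) for a triangle with n rows.

    Assumed layout: row r (0-based) holds tokens (r, 0), ..., (r, r).
    Each move is an int bitmask with one bit per token it flips.
    """
    index = {}
    for r in range(n):
        for c in range(r + 1):
            index[(r, c)] = len(index)

    def mask(*cells):
        m = 0
        for cell in cells:
            m |= 1 << index[cell]
        return m

    moves = []
    for r in range(n - 1):
        for c in range(r + 1):
            # upward unit triangle: apex (r, c), base on row r + 1
            moves.append(mask((r, c), (r + 1, c), (r + 1, c + 1)))
            if c < r:
                # downward unit triangle: top edge on row r, tip on row r + 1
                moves.append(mask((r, c), (r, c + 1), (r + 1, c + 1)))
    return len(index), moves


def in_span(target, vectors):
    """Check whether target is a GF(2) combination (XOR) of the vectors."""
    basis = {}  # leading bit position -> basis vector with that leading bit
    for v in vectors:
        while v:
            h = v.bit_length() - 1
            if h not in basis:
                basis[h] = v
                break
            v ^= basis[h]  # clear the leading bit and keep reducing
    while target:
        h = target.bit_length() - 1
        if h not in basis:
            return False
        target ^= basis[h]
    return True


def solvable(n):
    """Can every token be flipped an odd number of times using unit triangles?"""
    count, moves = unit_triangle_moves(n)
    all_ones = (1 << count) - 1  # every token must change colour
    return in_span(all_ones, moves)
```

Calling `solvable(9)` or `solvable(10)` then answers the fourth question for whichever layout matches the intended puzzle. The other three constraints only require swapping in a different list of moves, since `in_span` does not care how the triples were generated.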

I could not think of a quick fix in R, so I leave it to the interested ‘Og reader…

In the next issue of the Science&Médecine leaflet (Jan. 29), which appeared while I was in Warwick, there were a few entries of interest. First, the central article was about Big Data (again), but, for a change, the journalist took the pains to include French statisticians and machine learners in the picture, like Stéphan Clémençon, Aurélien Garivier, Jean-Michel Loubes, and Nicolas Vayatis. (In a typical French approach, the subtitle was “A challenge for maths”, rather than statistics!) Ignoring the (minor) confusion therein of “small n, large p” with the curse of dimensionality, the article does mention a few important issues like distributed computing, inhomogeneous datasets, overfitting and learning. There are also links to the new masters in data sciences at ENSAE, Telecom-ParisTech, and Paris 6-Pierre et Marie Curie. (The one in Paris-Dauphine is still under construction and will not open next year.) As a side column, the journal also wonders about the “end of Science” due to massive data influx and “Big Data” techniques that could predict and explain without requiring theories and deductive or scientific thinking. Somewhat paradoxically, the column ends with a quote from Jean-Michel Loubes, who states that one could think “our” methods start from effects to end up with causes, but that in fact the models are highly dependent on the data. And on the opinion of experts. Doesn’t that suggest some Bayesian principles at work there?!

Another column is dedicated to Edward Teller’s “dream” of using nuclear bombs for civil engineering, as in the Chariot project in Alaska. And the last entry is against Kelvin’s “to measure is to know”, with the title “To know is not to measure”, although it does not aim at a general philosophical level but rather objects to the unrestricted intrusion of bibliometrics and other indices brought from marketing. Written by a mathematician, this column is not directed against statistics and the Big Data revolution, but rather against the myth that everything can be measured and quantified. (There was also a pointer to a tribune against the pseudo-recruiting of top researchers by Saudi universities in order to improve their Shanghai ranking but I do not have time to discuss it here. And now. Maybe later.)

OxWaSP (The Oxford-Warwick Statistics Programme)

Posted in Kids, Statistics, University life on January 21, 2014 by xi'an

This is an official email promoting OxWaSP, our joint doctoral training programme, which I [objectively] think is definitely worth considering if planning a PhD in Statistics. Anywhere.

The Statistics Department – University of Oxford and the Statistics Department – University of Warwick, supported by the EPSRC, will run a joint Centre of Doctoral Training in the theory, methods and applications of Statistical Science for 21st Century data-intensive environments and large-scale models. This is the first centre of its type in the world and will equip its students to work in an area in growing demand both in academia and industry.

Each year from October 2014 OxWaSP will recruit at least 5 students attached to Warwick and at least 5 attached to Oxford. Each student will be funded with a grant for four years of study. Students spend the first year at Oxford developing advanced skills in statistical science. In the first two terms students are given research training through modular courses: Statistical Inference in Complex Models; Multivariate Stochastic Processes; Bayesian Analyses for Complex Structural Information; Machine Learning and Probabilistic Graphical Models; Stochastic Computation for Intractable Inference. In the third term, students carry out two small research projects. At the end of year 1, students begin a three-year research project with a chosen supervisor, five continuing at Oxford and five moving to the University of Warwick.

Training in years 2-4 includes annual retreats, workshops and a research course in machine learning at Amazon (Berlin). There are funded opportunities for students to work with our leading industrial partners and to travel in their third year to an international summer placement in some of the strongest Statistics groups in the USA, Europe and Asia including UC Berkeley, Columbia University, Duke University, the University of Washington in Seattle, ETH Zurich and NUS Singapore.

Applications will be considered in gathered fields with the next deadline of 24 January 2014 (non-EU applicants should apply by this date to maximise their chance of funding). Interviews for successful applicants who submit by the January deadline will take place at the end of February 2014. There will be a second deadline for applications at the end of February (Warwick) and 14th March (Oxford).

machine learning as buzzword

Posted in Books, Kids on November 12, 2013 by xi'an

In one of his posts, my friend Larry mentioned that popular posts had to mention the Bayes/frequentist opposition in the title… I think mentioning machine learning is also a good buzzword to increase the traffic! I did spot this phenomenon last week when publishing my review of Kevin Murphy’s Machine Learning: the number of views and visitors jumped by at least a half, exceeding the (admittedly modest) 1000 bar on two consecutive days. Interestingly, the number of copies of Machine Learning (sold via my amazon associate link) did not follow this trend: so far, I only spotted a few copies sold, in similar amounts to the number of copies of Spatio-temporal Statistics I reviewed the week before. Or most books I review, positively or negatively! (However, I did spot a correlated increase in overall amazon associate orderings and brazenly attributed the order of a Lego robotic set to a “machine learner”! And as of yesterday the ‘Og’s readers massively ordered 152 236 copies of the latest edition of Andrew’s Bayesian Data Analysis. Thanks!)
