Archive for Paris-Saclay campus

assistant/associate professor position in statistics/machine-learning at ENSAE

Posted in pictures, Statistics, Travel, University life on March 10, 2020 by xi'an

ENSAE (my Alma Mater) is opening a new position for next semester in statistics and/or machine-learning. At the Assistant Professor level, the position is for an initial three-year term, renewable for another three years, before the tenure evaluation. The school is located on the Université Paris-Saclay campus and teaches only at the Master and PhD levels. The deadline for application is 31 March 2020. Details and contacts on the call page.

local mayhem, again and again and again…

Posted in Kids, pictures, Travel, University life on December 27, 2019 by xi'an

Public transport in France, and in particular in Paris, has now been on strike for three weeks, in connection with a planned reform of the retirement conditions of workers with special status, like those in the train and metro companies, who can retire earlier than the legal age (62). As usual with social unrest in France, other categories joined the strike and the protest, including teachers and public health service workers, as well as police officers, fire-fighters and opera dancers, and even some students. Below are some figures from the OECD about average retirement conditions in nearby EU countries, which show that these conditions are apparently better in France. (With the usual proviso that these figures have been correctly reported.) In particular, the life expectancy at the start of retirement is the highest for both men and women. Coincidence (or not), my UCU-affiliated colleagues in Warwick were also on strike a few weeks ago about their pensions…

Travelling through and around Paris by bike, I have not been directly affected by the strikes (as heavy traffic makes biking easier!), except for the morning last week when I was teaching at ENSAE: I blew a tyre midway there and had to hop to the nearest train station to board the last train of the morning, arriving (only) 10 min late. Going back home was only feasible by taxi, which happened to be large enough to take my bicycle as well… Travelling to and from the airport for Vancouver and Birmingham was equally impossible by public transportation, meaning spending fair amounts of time in and money on taxis! And listening to taxi-drivers’ opinions or musical tastes. Nothing to moan about when considering the five to six hours spent by some friends of mine to get to work and back.


Posted in Books, Statistics, University life on October 8, 2019 by xi'an

In connection with the recent PhD thesis defence of Juliette Chevallier, in which I took a somewhat virtual part, being physically in Warwick, I read a paper she wrote with Stéphanie Allassonnière on stochastic approximation versions of the EM algorithm. Computing the MAP estimator can be done via versions of EM adapted for simulated annealing, possibly using MCMC, as for instance in the Monolix software and its MCMC-SAEM algorithm. Where SA stands sometimes for stochastic approximation and sometimes for simulated annealing, originally developed by Gilles Celeux and Jean Diebolt, then reframed by Marc Lavielle and Eric Moulines [friends and coauthors]. With an MCMC step because the simulation of the latent variables involves an intractable normalising constant. (Contrary to this paper, Umberto Picchini and Adeline Samson proposed in 2015 a genuine ABC version of this approach, in a paper that I thought I had missed, although I now remember discussing it with Adeline at JSM in Seattle. There, ABC is used as a substitute for the conditional distribution of the latent variables given data and parameter, hence as a substitute for the Q step of the (SA)EM algorithm. One more approximation step and one more simulation step and we would reach a form of ABC-Gibbs!) In this version, there are very few assumptions made on the approximation sequence, except that it must converge with the iteration index to the true distribution (for a fixed observed sample) if convergence of ABC-SAEM is to happen. The paper takes as an illustrative sequence a collection of tempered versions of the true conditionals, but this is quite formal, as I cannot fathom a feasible simulation from the tempered version and not from the untempered one. It is thus much more a version of tempered SAEM than truly connected with ABC (although a genuine ABC-EM version could be envisioned).
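To make the tempered idea concrete, here is a minimal sketch on a toy model that is an assumption of this example and not taken from the paper: latent z_i ~ N(θ, 1) and observed y_i | z_i ~ N(z_i, 1), so that the conditional of z_i given y_i and θ is N((y_i + θ)/2, 1/2) and its tempered version (power 1/T) is N((y_i + θ)/2, T/2). The function name `tempered_saem`, the temperature schedule T_k = 1 + 5/k and the step sizes are all hypothetical choices. Note that this toy case illustrates the very objection above, since both the tempered and the untempered conditionals are equally easy to simulate.

```python
import math
import random


def tempered_saem(y, n_iter=300, burn_in=50, seed=0):
    """Toy SAEM with a tempered simulation step (illustrative assumptions only).

    Model: z_i ~ N(theta, 1), y_i | z_i ~ N(z_i, 1).
    The complete-data sufficient statistic for theta is the mean of the z_i.
    """
    rng = random.Random(seed)
    n = len(y)
    theta = 0.0  # arbitrary starting value
    s = 0.0      # stochastic approximation of the sufficient statistic
    for k in range(1, n_iter + 1):
        # Hypothetical temperature schedule, decreasing to 1 with k.
        temp = 1.0 + 5.0 / k
        sd = math.sqrt(temp / 2.0)
        # Simulation step: draw latent variables from the tempered conditional.
        z_mean = sum(rng.gauss(0.5 * (yi + theta), sd) for yi in y) / n
        # Step size: 1 during burn-in, then decreasing as 1/j.
        gamma = 1.0 if k <= burn_in else 1.0 / (k - burn_in)
        # Stochastic approximation step on the sufficient statistic.
        s += gamma * (z_mean - s)
        # Maximisation step: the complete-data MLE of theta is the statistic.
        theta = s
    return theta


# Usage: the marginal MLE of theta in this toy model is the sample mean of y.
data_rng = random.Random(1)
y_obs = [data_rng.gauss(2.0, math.sqrt(2.0)) for _ in range(500)]
theta_hat = tempered_saem(y_obs)
y_bar = sum(y_obs) / len(y_obs)
```

In this toy setting the estimate should end up close to the sample mean of the observations, which is the marginal maximum likelihood estimator since y_i ~ N(θ, 2).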

two positions at UBC

Posted in Mountains, pictures, Travel, University life on September 15, 2019 by xi'an

A long-time friend at UBC pointed out to me the opening of two tenure-track Assistant Professor positions at the Department of Statistics at the University of British Columbia, Vancouver, with an anticipated start date of July 1, 2020 or January 1, 2021. The deadline for applications is October 18, 2019. Statistics at UBC is an internationally renowned department, in particular in (but not restricted to) computational statistics and Bayesian methods, and this is a great opportunity to join this department. (Not to mention the unique location of the campus and the beautiful surroundings of the city of Vancouver!)

noise contrastive estimation

Posted in Statistics on July 15, 2019 by xi'an

As I was attending Lionel Riou-Durand’s PhD thesis defence at ENSAE-CREST last week, I had a look at his papers (!). The 2018 noise contrastive paper, written with Nicolas Chopin (both authors share the CREST affiliation with me), considers Charlie Geyer’s 1994 way of bypassing the intractable normalising constant problem by virtue of an artificial logit model with additional simulated data from another distribution ψ.

“Geyer (1994) established the asymptotic properties of the MC-MLE estimates under general conditions; in particular that the x’s are realisations of an ergodic process. This is remarkable, given that most of the theory on M-estimation (i.e. estimation obtained by maximising functions) is restricted to iid data.”

Michael Gutmann and Aapo Hyvärinen also use additional simulated data, within the likelihood of a logistic classifier, an approach called noise contrastive estimation. Both methods replace the unknown ratio of normalising constants with an unbiased estimate based on the additional simulated data. The major and impressive result in this paper [now published in the Electronic Journal of Statistics] is that the noise contrastive estimation approach always enjoys a smaller variance than Geyer’s solution, at an equivalent computational cost, when the actual data observations are iid and the artificial data simulations ergodic. The difference between both estimators is however negligible against the Monte Carlo error (Theorem 2).
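A minimal sketch of noise contrastive estimation, on a toy model that is an assumption of this example and not the setting of the paper: the data are N(θ, 1), the unnormalised model is exp(θx − x²/2), its log normalising factor b is treated as one more free parameter, and the noise distribution ψ is taken to be N(0, 2²). The function names (`nce_fit`, `log_noise`), the plain gradient ascent and its tuning constants are hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def log_noise(x, scale):
    """Log-density of the noise distribution psi = N(0, scale^2)."""
    return -0.5 * (x / scale) ** 2 - math.log(scale) - 0.5 * math.log(2 * math.pi)


def nce_fit(data, noise, noise_scale, steps=400, lr=0.5):
    """Fit (theta, b) by maximising the logistic (NCE) objective.

    Assumed model: log p(x) = theta * x - x^2 / 2 + b, with b a free parameter
    standing in for the unknown log normalising factor.
    """
    log_nu = math.log(len(noise) / len(data))
    # Part of the classifier log-odds that does not depend on (theta, b).
    off_data = [-0.5 * x * x - log_noise(x, noise_scale) - log_nu for x in data]
    off_noise = [-0.5 * x * x - log_noise(x, noise_scale) - log_nu for x in noise]
    total = len(data) + len(noise)
    theta, b = 0.0, 0.0
    for _ in range(steps):
        g_theta = g_b = 0.0
        for x, off in zip(data, off_data):
            w = 1.0 - sigmoid(theta * x + b + off)  # label 1: actual data
            g_theta += w * x
            g_b += w
        for x, off in zip(noise, off_noise):
            w = -sigmoid(theta * x + b + off)       # label 0: simulated noise
            g_theta += w * x
            g_b += w
        # Gradient ascent on the averaged (concave) logistic log-likelihood.
        theta += lr * g_theta / total
        b += lr * g_b / total
    return theta, b


# Usage: data from N(1, 1), so theta = 1 and b = -1/2 - log(2*pi)/2.
rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(2000)]
noise = [rng.gauss(0.0, 2.0) for _ in range(2000)]
theta_hat, b_hat = nce_fit(data, noise, noise_scale=2.0)
true_b = -0.5 - 0.5 * math.log(2 * math.pi)
```

The classifier only ever sees the model through the difference between its log-density and that of ψ, which is why the normalising factor can be estimated alongside θ instead of being computed.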

This may be a rather naïve question, but I wonder at the choice of the alternative distribution ψ. With a vague notion that it could be optimised from a GAN perspective. A side result of interest in the paper is to provide a minimal (re)parameterisation of the truncated multivariate Gaussian distribution, if only as an exercise for future exams. Truncated multivariate Gaussian for which the normalising constant is of course unknown.