Archive for CREST

Ph.D. scholarships at ENSAE ParisTech‐CREST

Posted in Statistics with tags , , , , , , , , on April 2, 2019 by xi'an

ENSAE ParisTech and CREST are currently inviting applications for 3-year PhD scholarships in statistics (and economics, finance, and sociology). There is no constraint of nationality or curriculum, but the supervisor must be from ENSAE (Paris-Saclay) or ENSAI (Rennes-Bruz).  The deadline is May 1, to be sent to Mrs Fanda Traore, at

Applications should submitted (in French or in English), including :
– Curriculum vitae;
– Statement of research and teaching interests (10 pages);
– a cover letter
– the official transcripts of all higher education institutions from which you get a degree
– recommendation letters from professors, including a letter from the Ph.D. supervisor.

Selected candidates will be most likely interviewed at ENSAE‐CREST.

position in statistics and/or machine learning at ENSAE ParisTech‐CREST

Posted in pictures, University life with tags , , , , , , , on March 28, 2019 by xi'an

ENSAE ParisTech and CREST are currently inviting applications for a position of Assistant or Associate Professor in Statistics or Machine Learning.

The appointment starts in September, 2019, at the earliest. At the level of Assistant Professor, the position is for an initial three-year term renewable for another three years before the tenure evaluation. Salary is competitive according to qualifications. The teaching duties are reduced compared to French university standards. At the time of appointment, knowledge of French is not required but it is expected that the appointee will acquire a workable knowledge of French within a reasonable time.

Candidate Profile

– PhD in Statistics or Machine Learning.
– Outstanding research, including subjects in high-dimensional statistics and machine learning.
– Publications in leading international journals in Statistics or leading outlets in Machine Learning.

Demonstrated ability to teach courses in Mathematics, Statistics and Machine Learning for engineers and to supervise projects in Applied Statistics. The successful candidate is expected to teach at least one course in mathematics, applied mathematics or introductory statistics at the undergraduate level, and one course in the “Data Science, Statistics and Machine Learning”’ specialization track during the third year of ENSAE (Master level).

Applications should submitted (in French or in English) by email to :
– Curriculum vitae;
– Statement of research and teaching interests (2-4 pages);
– Names and addresses of three or more individuals willing to provide letters of reference.

Deadline for applications : April 29, 2019.
Selected candidates will be invited to present their work and project at ENSAE‐CREST.

Roberto Casarin’s talk at CREST tomorrow

Posted in Statistics with tags , , , , , , , , , , , on March 13, 2019 by xi'an

My former student and friend Roberto Casarin (University Ca’Foscari, Venice) will talk tomorrow at the CREST Financial Econometrics seminar on

“Bayesian Markov Switching Tensor Regression for Time-varying Networks”

Time: 10:30
Date: 14 March 2019
Place: Room 3001, ENSAE, Université Paris-Saclay

Abstract : We propose a new Bayesian Markov switching regression model for multi-dimensional arrays (tensors) of binary time series. We assume a zero-inflated logit dynamics with time-varying parameters and apply it to multi-layer temporal networks. The original contribution is threefold. First, in order to avoid over-fitting we propose a parsimonious parameterisation of the model, based on a low-rank decomposition of the tensor of regression coefficients. Second, the parameters of the tensor model are driven by a hidden Markov chain, thus allowing for structural changes. The regimes are identified through prior constraints on the mixing probability of the zero-inflated model. Finally, we model the jointly dynamics of the network and of a set of variables of interest. We follow a Bayesian approach to inference, exploiting the Pólya-Gamma data augmentation scheme for logit models in order to provide an efficient Gibbs sampler for posterior approximation. We show the effectiveness of the sampler on simulated datasets of medium-big sizes, finally we apply the methodology to a real dataset of financial networks.

Siem Reap conference

Posted in Kids, pictures, Travel, University life with tags , , , , , , , , , , , , , , , , , , on March 8, 2019 by xi'an

As I returned from the conference in Siem Reap. on a flight avoiding India and Pakistan and their [brittle and bristling!] boundary on the way back, instead flying far far north, near Arkhangelsk (but with nothing to show for it, as the flight back was fully in the dark), I reflected how enjoyable this conference had been, within a highly friendly atmosphere, meeting again with many old friends (some met prior to the creation of CREST) and new ones, a pleasure not hindered by the fabulous location near Angkor of course. (The above picture is the “last hour” group picture, missing a major part of the participants, already gone!)

Among the many talks, Stéphane Shao gave a great presentation on a paper [to appear in JASA] jointly written with Pierre Jacob, Jie Ding, and Vahid Tarokh on the Hyvärinen score and its use for Bayesian model choice, with a highly intuitive representation of this divergence function (which I first met in Padua when Phil Dawid gave a talk on this approach to Bayesian model comparison). Which is based on the use of a divergence function based on the squared error difference between the gradients of the true log-score and of the model log-score functions. Providing an alternative to the Bayes factor that can be shown to be consistent, even for some non-iid data, with some gains in the experiments represented by the above graph.

Arnak Dalalyan (CREST) presented a paper written with Lionel Riou-Durand on the convergence of non-Metropolised Langevin Monte Carlo methods, with a new discretization which leads to a substantial improvement of the upper bound on the sampling error rate measured in Wasserstein distance. Moving from p/ε to √p/√ε in the requested number of steps when p is the dimension and ε the target precision, for smooth and strongly log-concave targets.

This post gives me the opportunity to advertise for the NGO Sala Baï hostelry school, which the whole conference visited for lunch and which trains youths from underprivileged backgrounds towards jobs in hostelery, supported by donations, companies (like Krama Krama), or visiting the Sala Baï  restaurant and/or hotel while in Siem Reap.


my [homonym] talk this afternoon at CREST [Paris-Saclay]

Posted in pictures, Statistics, University life with tags , , , , , , , on March 4, 2019 by xi'an

Christian ROBERT (Université Lyon 1) « How large is the jump discontinuity in the diffusion coefficient of an Itô diffusion?”

Time: 3:30 pm – 4:30 pm
Date: 04th of March 2019
Place: Room 3105

Abstract : We consider high frequency observations from a one-dimensional diffusion process Y. We assume that the diffusion coefficient σ is continuously differentiable, but with a jump discontinuity at some levely. Such a diffusion has already been considered as a local volatility model for the underlying price of an asset, but raises several issues for pricing European options or for hedging such derivatives. We introduce kernel sign-constrained estimators of the left and right limits of σ at y, but up to constant factors. We present and discuss the asymptotic properties of these kernel estimators.  We then propose a method to evaluate these constant factors by looking for bandwiths for which the kernel estimators are stable by iteration. We finally provide an estimator of the jump discontinuity size and discuss its convergence rate.


Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , on November 5, 2018 by xi'an

A paper by Alexander Buchholz (CREST) and Nicolas Chopin (CREST) on quasi-Monte Carlo methods for ABC is going to appear in the Journal of Computational and Graphical Statistics. I had missed the opportunity when it was posted on arXiv and only became aware of the paper’s contents when I reviewed Alexander’s thesis for the doctoral school. The fact that the parameters are simulated (in ABC) from a prior that is quite generally a standard distribution while the pseudo-observations are simulated from a complex distribution (associated with the intractability of the likelihood function) means that the use of quasi-Monte Carlo sequences is in general only possible for the first part.

The ABC context studied there is close to the original version of ABC rejection scheme [as opposed to SMC and importance versions], the main difference standing with the use of M pseudo-observations instead of one (of the same size as the initial data). This repeated version has been discussed and abandoned in a strict Monte Carlo framework in favor of M=1 as it increases the overall variance, but the paper uses this version to show that the multiplication of pseudo-observations in a quasi-Monte Carlo framework does not increase the variance of the estimator. (Since the variance apparently remains constant when taking into account the generation time of the pseudo-data, we can however dispute the interest of this multiplication, except to produce a constant variance estimator, for some targets, or to be used for convergence assessment.) L The article also covers the bias correction solution of Lee and Latuszyǹski (2014).

Due to the simultaneous presence of pseudo-random and quasi-random sequences in the approximations, the authors use the notion of mixed sequences, for which they extend a one-dimension central limit theorem. The paper focus on the estimation of Z(ε), the normalization constant of the ABC density, ie the predictive probability of accepting a simulation which can be estimated at a speed of O(N⁻¹) where N is the number of QMC simulations, is a wee bit puzzling as I cannot figure the relevance of this constant (function of ε), especially since the result does not seem to generalize directly to other ABC estimators.

A second half of the paper considers a sequential version of ABC, as in ABC-SMC and ABC-PMC, where the proposal distribution is there  based on a Normal mixture with a small number of components, estimated from the (particle) sample of the previous iteration. Even though efficient techniques for estimating this mixture are available, this innovative step requires a calculation time that should be taken into account in the comparisons. The construction of a decreasing sequence of tolerances ε seems also pushed beyond and below what a sequential approach like that of Del Moral, Doucet and Jasra (2012) would produce, it seems with the justification to always prefer the lower tolerances. This is not necessarily the case, as recent articles by Li and Fearnhead (2018a, 2018b) and ours have shown (Frazier et al., 2018). Overall, since ABC methods are large consumers of simulation, it is interesting to see how the contribution of QMC sequences results in the reduction of variance and to hope to see appropriate packages added for standard distributions. However, since the most consuming part of the algorithm is due to the simulation of the pseudo-data, in most cases, it would seem that the most relevant focus should be on QMC add-ons on this part, which may be feasible for models with a huge number of standard auxiliary variables as for instance in population evolution.

clustering dynamical networks

Posted in pictures, Statistics, University life with tags , , , , , , , , , , on June 5, 2018 by xi'an

Yesterday I attended a presentation by Catherine Matias on dynamic graph structures, as she was giving a plenary talk at the 50th French statistical meeting, conveniently located a few blocks away from my office at ENSAE-CREST. In the nicely futuristic buildings of the EDF campus, which are supposed to represent cogs according to the architect, but which remind me more of these gas holders so common in the UK, at least in the past! (The E of EDF stands for electricity, but the original public company handled both gas and electricity.) This was primarily a survey of the field, which is much more diverse and multifaceted than I realised, even though I saw some recent developments by Antonietta Mira and her co-authors, as well as refereed a thesis on temporal networks at Ca’Foscari by Matteo Iacopini, which defence I will attend in early July. The difficulty in the approaches covered by Catherine stands with the amount and complexity of the latent variables induced by the models superimposed on the data. In her paper with Christophe Ambroise, she followed a variational EM approach. From the spectator perspective that is mine, I wondered at using ABC instead, which is presumably costly when the data size grows in space or in time. And at using tensor structures as in Mateo’s thesis. This reminded me as well of Luke Bornn’s modelling of basketball games following each player in real time throughout the game. (Which does not prevent the existence of latent variables.) But more vaguely and speculatively I also wonder at the meaning of the chosen models, which try to represent “everything” in the observed process, which seems doomed from the start given the heterogeneity of the data. While reaching my Keynesian pessimistic low- point, which happens rather quickly!, one could hope for projection techniques, towards reducing the dimension of the data of interest and of the parameter required by the model.