## end of a long era [1982-2017]

Posted in Books, pictures, Running, University life on May 23, 2017 by xi'an

This afternoon I went to CREST to empty my office there of books and a few papers (like the original manuscript version of Monte Carlo Statistical Methods). This is because the research centre, along with the ENSAE graduate school (my Alma mater), is moving to a new building on the Saclay plateau, next to École Polytechnique, as part of the ambitious migration of engineering schools from downtown Paris to a brand new campus there. Without getting sentimental about this move, it means leaving the INSEE building in Malakoff, on the outskirts of Paris, which has been an enjoyable part of my student and then academic life from 1982 till now. And also leaving the INSEE Paris Club runners! (I am quite uncertain about being as active at the new location, if only because getting there by bike is more of a challenge. To be addressed anyway!) And I left behind my accumulation of conference badges (although I should try to recycle them for the upcoming BNP 11 in Paris!).

## zig, zag, and subsampling

Posted in Books, Statistics, University life on December 29, 2016 by xi'an

Today, I alas missed a seminar at BiPS on the Zig-Zag (sub-)sampler of Joris Bierkens, Paul Fearnhead and Gareth Roberts, presented here in Paris by James Ridgway. Fortunately for me, I had some discussions with Murray Pollock in Warwick and then again with Changye Wu in Dauphine that shed some light on this complex but highly innovative approach to simulating in Big Data settings thanks to a correct subsampling mechanism.

The zig-zag process is a continuous-time process made of segments that turn from one diagonal to the next at random times, driven by switching rates connected with the components of the gradient of the target log-density, plus a symmetric term. Provided those random times can be generated, this process is truly available and associated with the right target distribution. When the components of the parameter are independent (an unlikely setting), those random times can be associated with an inhomogeneous Poisson process. In the general case, one needs to bound the gradients by more manageable functions that create a Poisson process that can later be thinned. Next, one needs to simulate the process for the upper bound, a task that seems hard to achieve apart from linear and piecewise constant upper bounds. The process has a bit of a slice sampling taste, except that it cannot be used as a slice sampler but requires continuous-time integration, given that the length of each segment matters. (Or maybe random time subsampling?)
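
To make the exactly simulable case concrete, here is a minimal Python sketch of the canonical one-dimensional Zig-Zag for a standard Normal target. This target is my own assumption, chosen because the integrated switching rate along a segment can then be inverted in closed form, so that no upper bound or thinning is needed. The function name and defaults are illustrative, not taken from the paper. Note that the moments are computed by integrating along each segment rather than by averaging the switching points, since the length of each segment matters.

```python
import math
import random


def zigzag_gaussian_1d(total_time, x0=0.0, v0=1, seed=0):
    """Canonical 1D Zig-Zag for a N(0, 1) target, i.e. U(x) = x**2 / 2.

    Along a segment x + v*s the switching rate max(0, v * U'(x + v*s))
    equals max(0, a + s) with a = v*x, so its integral can be inverted in
    closed form. Returns time averages of x and x**2 along the path.
    """
    rng = random.Random(seed)
    x, v, t = x0, v0, 0.0
    int_x = int_x2 = 0.0
    while t < total_time:
        a = v * x
        e = rng.expovariate(1.0)
        # Solve int_0^tau max(0, a + s) ds = e for the next switching time.
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        tau = min(tau, total_time - t)  # stop exactly at the time horizon
        # Exact integrals of x(s) = x + v*s and x(s)**2 over [0, tau], v = +/-1.
        int_x += x * tau + v * tau ** 2 / 2.0
        int_x2 += x ** 2 * tau + x * v * tau ** 2 + tau ** 3 / 3.0
        x += v * tau
        t += tau
        v = -v  # switch to the other diagonal (irrelevant after the last segment)
    return int_x / total_time, int_x2 / total_time
```

Calling `zigzag_gaussian_1d(50000.0)` should return values close to 0 and 1, the first two moments of the N(0,1) target.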

A highly innovative part of the paper concentrates on Big Data likelihoods and on the possibility of subsampling the original dataset properly and exactly. The authors propose Zig-Zag with subsampling by turning the gradients into random parts of the gradients, while remaining unbiased. There may be a cost associated with this gain of one to n (one datapoint instead of n per gradient evaluation), namely that the upper bounds may become larger, as they must handle all elements in the likelihood at once, hence be (even) less efficient. (I am more uncertain about the case of the control variates, as it relies on a Lipschitz assumption.) While I still lack an easy way to implement the approach in a specific model, I remain hopeful that this new approach will make a major dent in the current methodologies!
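
A minimal sketch of the subsampling idea, under assumptions of my own choosing: a toy Gaussian location model y_j ~ N(x, 1) with a flat prior, one uniformly chosen datapoint per proposed event, and a linear upper bound valid for all datapoints at once, which is then thinned. The bound relies on max_j |y_j|, which illustrates how bounds covering every term of the likelihood can get loose. The function name and parameters are hypothetical; this is not the authors' code, and it uses no control variates.

```python
import math
import random


def zigzag_subsampling(y, total_time, x0=0.0, burn_in=10.0, seed=0):
    """Zig-Zag with subsampling for the toy model y_j ~ N(x, 1), flat prior.

    U(x) = sum_j (x - y_j)**2 / 2 and n * (x - y_J), J uniform, is an
    unbiased estimate of U'(x). Proposed event times come from the bound
    n * max(0, c + s), c = v*x + max_j |y_j|, valid for every J at once,
    and are thinned. Returns the time-averaged posterior mean and variance.
    """
    if total_time <= burn_in:
        raise ValueError("total_time must exceed burn_in")
    rng = random.Random(seed)
    n = len(y)
    b = max(abs(yj) for yj in y)
    x, v, t = x0, 1, 0.0
    int_x = int_x2 = kept_time = 0.0
    while t < total_time:
        c = v * x + b
        e = rng.expovariate(1.0)
        # Invert the integrated bound: n * int_0^tau max(0, c + s) ds = e.
        tau = -c + math.sqrt(max(c, 0.0) ** 2 + 2.0 * e / n)
        last = tau >= total_time - t
        if last:
            tau = total_time - t
        if t >= burn_in:  # keep whole segments starting after burn-in
            int_x += x * tau + v * tau ** 2 / 2.0
            int_x2 += x ** 2 * tau + x * v * tau ** 2 + tau ** 3 / 3.0
            kept_time += tau
        x += v * tau
        t += tau
        if last:
            break
        # Thinning with one random datapoint: accept the switch with
        # probability (its rate at the new x) / (bound), where the bound
        # n * (c + tau) is positive at the proposed time.
        j = rng.randrange(n)
        rate = n * max(0.0, v * (x - y[j]))
        if rng.random() * n * (c + tau) < rate:
            v = -v
    mean = int_x / kept_time
    return mean, int_x2 / kept_time - mean ** 2
```

For this model the exact posterior is N(ȳ, 1/n), so with, say, 100 simulated datapoints the two returned values should be close to the sample mean and to 0.01, even though each thinning step only touches a single datapoint.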

## variance of an exponential order statistics

Posted in Books, Kids, pictures, R, Statistics, University life on November 10, 2016 by xi'an

This afternoon, one of my Monte Carlo students at ENSAE came to me with an exercise from Monte Carlo Statistical Methods that I did not remember having written. And I thus “charged” George Casella with authorship for that exercise!

Exercise 3.3 starts with the usual question (a) about the (Binomial) precision of a tail probability estimator, which is easy to answer by iterating simulation batches. Expressed via the empirical cdf, it is concerned with the vertical variability of this empirical cdf. The second part (b) is more unusual in that it again starts with the evaluation of a tail probability, but then switches to finding the .995 quantile by simulation and producing an estimate precise enough [to three digits]. Which amounts to assessing the horizontal variability of this empirical cdf.
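
As a hedged illustration of part (a), the sketch below takes X ~ Exp(1) and a threshold a (both assumptions, not necessarily the exercise's own settings), compares the spread of batch estimates of P(X > a) with the Binomial standard error sqrt(p(1-p)/N), and inverts that formula to get the number of simulations needed for a target standard error. The function names are hypothetical.

```python
import math
import random


def tail_prob_batches(a, n_per_batch, n_batches, seed=0):
    """Batch estimates of p = P(X > a) for X ~ Exp(1).

    Their spread measures the (vertical) variability of the empirical cdf at a.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_batches):
        hits = sum(rng.expovariate(1.0) > a for _ in range(n_per_batch))
        estimates.append(hits / n_per_batch)
    return estimates


def binomial_se(p, n):
    """Standard error sqrt(p(1-p)/n) of the empirical frequency based on n draws."""
    return math.sqrt(p * (1.0 - p) / n)


def n_for_precision(p, max_se):
    """Smallest n such that sqrt(p(1-p)/n) <= max_se."""
    return math.ceil(p * (1.0 - p) / max_se ** 2)
```

With a = 2, the batch standard deviation of `tail_prob_batches(2.0, 1000, 200)` should be close to `binomial_se(math.exp(-2.0), 1000)`, about 0.011.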

As we discussed this question, my first suggestion was to aim at a value of N, the number of Monte Carlo simulations, such that the .995 x N-th spacing had a length of less than one thousandth of the .995 x N-th order statistic. In the case of the Exponential distribution suggested in the exercise, generating order statistics is straightforward since, as shown by Devroye (Section V.3.3), the i-th spacing is an Exponential variate with rate (N-i+1). This is so fast that Devroye suggests simulating Uniform order statistics by inverting Exponential order statistics (p.220)!
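
A minimal sketch of the spacing construction (function names hypothetical): Exp(1) order statistics are built as cumulative sums of independent Exponential spacings with rates N, N-1, ..., 1, and Uniform order statistics follow by applying the increasing Exp(1) cdf. The last function implements my first criterion under two added assumptions: the random spacing is replaced by its expectation 1/(N-k), and the order statistic by the true quantile -log(1-.995).

```python
import math
import random


def exp_order_stats(n, rng):
    """Increasing Exp(1) order statistics X_(1) < ... < X_(n) from spacings.

    The i-th spacing X_(i) - X_(i-1) is Exponential with rate n - i + 1,
    so the order statistics are cumulative sums and no sorting is needed.
    """
    stats, x = [], 0.0
    for i in range(1, n + 1):
        x += rng.expovariate(n - i + 1)
        stats.append(x)
    return stats


def uniform_order_stats(n, rng):
    """Uniform order statistics as 1 - exp(-X_(k)), computed via expm1."""
    return [-math.expm1(-x) for x in exp_order_stats(n, rng)]


def n_for_spacing_criterion(frac=0.995, rel=1e-3):
    """First N such that the expected spacing 1/(N - k) after the
    k = round(frac * N)-th order statistic is below rel times the
    frac-quantile -log(1 - frac), used as a proxy for X_(k)."""
    q = -math.log1p(-frac)
    n = 1
    while True:
        k = max(1, round(frac * n))
        if k < n and 1.0 / (n - k) < rel * q:
            return n
        n += 1
```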

However, while still discussing the problem with my student, I came to a better formulation of the question, which was to figure out the variance of the .995 x N-th order statistic in the Exponential case. Working with the density of this order statistic, however, led nowhere useful. A bit later, after Google-ing the problem, I came upon this Stack Exchange solution that made use of the spacing result mentioned above, namely that, for a sample of N standard Exponential variates, the expectation and variance of the k-th order statistic are

$\mathbb{E}[X_{(k)}]=\sum\limits_{i=N-k+1}^N\frac1i,\qquad \mbox{Var}(X_{(k)})=\sum\limits_{i=N-k+1}^N\frac1{i^2}$

which leads to the proper condition on N when imposing the variability constraint.
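
The condition on N can then be computed directly from these sums. In the sketch below (names hypothetical), the precision target is expressed as a maximal standard deviation max_sd for the k = round(.995 x N)-th order statistic, which is my assumption on how to translate "three digits". A small Monte Carlo helper checks the formulas. Since the variance is close to 199/N for this quantile, the required N grows like 1/max_sd², so a tight target quickly calls for millions of simulations.

```python
import math
import random


def exp_order_stat_moments(n, k):
    """Mean and variance of the k-th order statistic of n iid Exp(1) variates:
    sums of 1/i and 1/i**2 over i = n-k+1, ..., n."""
    idx = range(n - k + 1, n + 1)
    return math.fsum(1.0 / i for i in idx), math.fsum(1.0 / i ** 2 for i in idx)


def simulate_order_stat(n, k, reps, seed=0):
    """Monte Carlo draws of the k-th order statistic, to check the formulas."""
    rng = random.Random(seed)
    return [sorted(rng.expovariate(1.0) for _ in range(n))[k - 1] for _ in range(reps)]


def n_for_quantile_sd(frac, max_sd):
    """Smallest N (doubling, then bisection) such that the standard deviation
    of the k = round(frac * N)-th order statistic is at most max_sd.
    Assumes this sd decreases with N, which holds up to the rounding of k."""
    def ok(n):
        k = max(1, round(frac * n))
        return math.sqrt(exp_order_stat_moments(n, k)[1]) <= max_sd

    hi = 1
    while not ok(hi):
        hi *= 2
    lo = hi // 2  # ok(lo) is False, or lo == 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if ok(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

For instance, `n_for_quantile_sd(0.995, 0.005)` returns the simulation size needed for a standard deviation of 0.005 on the .995 quantile estimate, in the millions as expected from the 199/N approximation.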

## Argentan, 30th and 17th and 7th edition(s)

Posted in Running, Travel on October 2, 2015 by xi'an

When I started the ‘Og, in 2008, I was about to run the 23rd edition of the Argentan half-marathon… Seven years later, I am once again getting ready for the race, after a rather good training season between the mountains of the North Cascades and the track of Malakoff, with the last week in England, Holland, and Canada having seen close to two training sessions a day. (Borderline stress injury, maybe!) The weather does not look too bad this year, so we’ll see tomorrow how I fare against myself (and the other V2 runners, incidentally!).

## more gray matters

Posted in pictures on March 21, 2015 by xi'an

## Professor position at ENSAE, on the Paris Saclay campus

Posted in Statistics on March 9, 2015 by xi'an

There is an opening at the Statistics School ENSAE for an associate or full professor position in Statistics, starting in September 2015. Currently located on the south-west edge of Paris, the school is soon to move to the mega-campus of Paris Saclay, near École Polytechnique, along with a dozen other schools. See this description of the position. The deadline is very close: March 23!

## 41ièmes Foulées de Malakoff [5k, 7°C, 18:40, 40th & 2nd V2]

Posted in Running on February 7, 2015 by xi'an

[Warning: post of limited interest to most, about a local race I ran for another year!]

Once more, I managed to run my annual 5k in Malakoff. And once again I was (barely) there on the day of the race, having landed from Birmingham a few hours earlier. Due to traffic and road closures, I arrived very late in Malakoff and could not warm up as usual, or even squeeze into the first rows on the starting line. Given those handicaps, I still managed to get close to my best time of last year (18:40 vs. 18:36). I alas finished second in my V2 category, just a few meters behind the first V2 and definitely catching up on him! My INSEE Paris Club team won the company challenge for yet another year, repeating a pattern of many years now.