Archive for the University life Category

inference with Wasserstein distance

Posted in Books, Statistics, University life on January 23, 2017 by xi'an

Today, Pierre Jacob posted on arXiv a paper of ours on the use of the Wasserstein distance in statistical inference, whose main focus is exploiting this distance to create an automated measure of discrepancy for ABC. Which is why the full title is Inference in generative models using the Wasserstein distance. Generative obviously standing for the case when a model can be generated from but cannot be associated with a closed-form likelihood. We had all discussed this notion together when I visited Harvard and Pierre last March, with much excitement. (While I have not contributed much more than that round of discussions and ideas to the paper, the authors kindly included me!) The paper contains theoretical results on the consistency of statistical inference based on those distances, as well as computational results on how the computation of these distances is practically feasible and on how the Hilbert space-filling curve used in sequential quasi-Monte Carlo can help. The notion further extends to dependent data via delay reconstruction and residual reconstruction techniques (as we did for some models in our empirical likelihood BCel paper). I am quite enthusiastic about this approach and look forward to discussing it at the 17w5015 BIRS ABC workshop, next month!
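
To give the flavour of the approach in the simplest (one-dimensional) setting: the 1-Wasserstein distance between two samples of the same size reduces to the mean absolute difference between their order statistics, and can thus replace summary statistics in a plain ABC rejection sampler. The sketch below is purely illustrative, with made-up function names, and is not code from the paper; in particular, the multivariate case there relies on the Hilbert space-filling curve to order the points.

wass1d=function(x,y) mean(abs(sort(x)-sort(y)))
 # 1-Wasserstein distance between equal-size samples

# hypothetical ABC rejection sampler using this distance as discrepancy:
# rprior() returns one (scalar) parameter draw and rmodel(theta) returns
# a synthetic sample of the same size as obs
abcwass=function(obs,rprior,rmodel,N=1e4,eps=.1){
 keep=c()
 for (i in 1:N){
  theta=rprior()
  if (wass1d(obs,rmodel(theta))<eps) keep=c(keep,theta)
 }
 keep # accepted parameter values
}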

weakly informative reparameterisations for location-scale mixtures

Posted in Books, pictures, R, Statistics, University life on January 19, 2017 by xi'an

We have been working towards a revision of our reparameterisation paper for quite a while now and took advantage of Kate Lee visiting Paris this fortnight to make a final round: we have now arXived (and submitted) the new version. The major change from the earlier version is the extension of the approach to a large class of models that includes infinitely divisible distributions, compound Gaussian, Poisson, and exponential distributions, and completely monotonic densities. The concept remains identical: change the parameterisation of a mixture from a component-wise decomposition to a construct made of the first moment(s) of the distribution and of component-wise objects constrained by the moment equation(s). There is of course a bijection between both parameterisations, but the constraints appearing in the latter produce compact parameter spaces for which (different) uniform priors can be proposed. While the resulting posteriors are no longer conjugate, even conditional on the latent variables, standard Metropolis algorithms can be implemented to produce Monte Carlo approximations of these posteriors.
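
As a toy illustration of the moment constraint (mine, not taken from the paper), consider a two-component location mixture with weights (w,1-w): once the overall mean µ is fixed, the moment equation w·µ₁+(1-w)·µ₂=µ ties the component means together, leaving a single free coordinate:

mu=0    # first-moment parameter
w=.3    # weight of the first component
mu1=1.5 # free component-wise object
mu2=(mu-w*mu1)/(1-w) # forced by the moment equation
w*mu1+(1-w)*mu2      # returns mu, as constrained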

la maison des mathématiques

Posted in Books, Kids, pictures, Statistics, University life on January 14, 2017 by xi'an

When I worked with Jean-Michel Marin at the Institut Henri Poincaré the week before Xmas, there was this framed picture standing on the ground, possibly in preparation for an exhibition in the Institute. I found this superposition of the lady cleaning the blackboard of its maths formulas and of the seemingly unaware mathematician both compelling visually, in the sheer geometric aesthetics of the act, and somewhat appalling in its message. Especially when considering the initiatives taken by IHP towards reducing the gender gap in maths. After inquiring into the issue, I found that this picture was part of a whole photograph exhibit on IHP by Vincent Moncorgé, now published as a book, La Maison des Mathématiques, by Villani, Uzan, and Moncorgé. Most pictures are on-line and I found them quite appealing. Except again for the above.

Le Monde puzzle [#990]

Posted in Books, Kids, pictures, Statistics, Travel, University life on January 12, 2017 by xi'an

To celebrate the new year (assuming it is worth celebrating!), Le Monde mathematical puzzle came up with the following:

Two sequences (x¹,x²,…) and (y¹,y²,…) are defined as follows: the current value of x is either the previous value or twice the previous value, while the current value of y is the sum of the values of x up to now. What is the minimum number of steps to reach 2016 or 2017?

By considering that all consecutive powers of 2 must appear at least once, the puzzle boils down to finding the minimal number of replications covering the remainder, namely the target year minus the sum of those consecutive powers of 2. Which itself boils down to deriving the binary decomposition of that remainder. Hence the basic R code (using intToBits):

deco=function(k=2016){
 # largest m with 2^0+...+2^m <= k: every power up to 2^m appears at least once
 m=trunc(log2(k))
 while (sum(2^(0:m))>k) m=m-1
 if (sum(2^(0:m))==k){
  return(rep(1,m+1))
 }else{
  # extra replications given by the binary decomposition of the remainder
  res=k-sum(2^(0:m))
  return(rep(1,m+1)+as.integer(intToBits(res))[1:(m+1)])}}

which produces

> sum(deco(2016))
[1] 16
> sum(deco(2017))
[1] 16
> sum(deco(1789))
[1] 18
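
As a quick sanity check (not part of the puzzle), the replication counts returned by deco() do reconstruct the target:

k=2016
d=deco(k)
sum(d*2^(0:(length(d)-1))) # returns 2016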

learning and inference for medical discovery in Oxford [postdoc]

Posted in Kids, pictures, Statistics, Travel, University life on January 10, 2017 by xi'an

[Here is a call for a two-year postdoc in Oxford sent to me by Arnaud Doucet. For those worried about moving to Britain, I think that, given the current pace—or lack thereof—of the negotiations with the EU, it is very likely that Britain will not have Brexited two years from now.]

Numerous medical problems, ranging from screening to diagnosis to treatment of chronic diseases to management of care in hospitals, require the development of novel statistical models and methods. These models and methods need to address the unique characteristics of medical data, such as sampling bias, heterogeneity, non-stationarity, and informative censoring. Existing state-of-the-art machine learning and statistics techniques often fail to exploit those characteristics. Additionally, the focus needs to be on probabilistic models that are interpretable by clinicians, so that the inference results can be integrated within medical decision-making.

We have access to unique datasets for clinical deterioration of patients in the hospital, for cancer screening, and for treatment of chronic diseases. Preliminary work has been tested and implemented at UCLA Medical Center, resulting in significantly improved management of care in this hospital.

The successful applicant will be expected to develop new probabilistic models and learning methods inspired by these applications. The focus will be primarily on methodological and theoretical developments, and will involve collaborating with Oxford researchers in machine learning, computational statistics, and medicine to bring these developments to practice.

The post-doctoral researcher will be jointly supervised by Prof. Mihaela van der Schaar and Prof. Arnaud Doucet. Both of them have a strong track-record in advising PhD students and post-doctoral researchers who subsequently became successful academics in statistics, engineering sciences, computer science and economics. The position is for 2 years.

empirical Bayes, reference priors, entropy & EM

Posted in Mountains, Statistics, Travel, University life on January 9, 2017 by xi'an

Klebanov and co-authors from Berlin arXived this paper a few weeks ago and it took me a quiet evening in Darjeeling to read it. It starts with the premises that led Robbins to introduce empirical Bayes in 1956 (although the paper does not appear in the references), where repeated experiments with different parameters are run. Except that it turns non-parametric in estimating the prior. And to avoid resorting to the non-parametric MLE, which is the empirical distribution, it adds a smoothness penalty function to the picture. (Warning: I am not a big fan of non-parametric MLE!) The idea seems to have been Good's, who acknowledged that using the entropy as penalty lacks invariance under reparameterisation. Hence the authors suggest instead to use as penalty function on the prior a joint relative entropy on both the parameter and the prior, which amounts to the average of the Kullback-Leibler divergence between the sampling distribution and the predictive based on the prior. Which is then independent of the parameterisation. And of the dominating measure. This is the only tangible connection with reference priors found in the paper.

The authors then introduce a non-parametric EM algorithm, where the unknown prior becomes the "parameter" and the M step means optimising an entropy in terms of this prior. With an infinite amount of data, the true prior (meaning the overall distribution of the genuine parameters in this repeated experiment framework) is a fixed point of the algorithm. However, it seems that the only way the algorithm can be implemented is via discretisation of the parameter space, which opens a whole Pandora's box of issues, from the discretisation size to dimensionality problems. It also weakens the motivation of the approach by regularisation arguments, since the final product remains an atomic distribution.
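
For concreteness, here is a minimal sketch of the discretised fixed-point iteration in the Normal-mean case, ignoring the entropy penalty that is the paper's contribution: the prior is an atomic distribution over a grid of parameter values, and each iteration re-weights the atoms by their average posterior responsibilities.

npem=function(x,grid,niter=100){
 w=rep(1/length(grid),length(grid)) # uniform atomic prior to start
 lik=outer(x,grid,dnorm)            # lik[i,j]=f(x_i|theta_j)
 for (t in 1:niter){
  post=lik*rep(w,each=length(x))    # column j scaled by weight w[j]
  post=post/rowSums(post)           # E step: P(theta_j|x_i)
  w=colMeans(post)                  # M step: updated prior weights
 }
 w
}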

While the alternative of estimating the marginal density of the data by kernels and then aiming at the closest entropy prior is discussed, I find it surprising that the paper does not consider the rather natural option of setting a prior on the prior, e.g., via Dirichlet processes.

truncated normal algorithms

Posted in Books, pictures, R, Statistics, University life on January 4, 2017 by xi'an

Nicolas Chopin (CREST) just posted an entry on Statisfaction about a comparison of truncated Normal algorithms run by Alan Rogers, from the University of Utah. Nicolas wrote a paper in Statistics and Computing proposing a Ziggurat-type algorithm for this purpose, a paper which I do not remember reading, thanks to my diminishing memory buffer! As shown in the picture below, when truncating to the half-line (a,∞), this method improves upon my accept-reject approach except in the far tails.

On the top graph, made by Alan Rogers, my uniform proposal (r) seems to do better for a Normal truncated to (a,b) when b<0, or when a gets large and close to b. Nicolas' ziggurat (c) works better than the Gaussian accept-reject method on the positive part. (I wonder what the exponential proposal (e) stands for, in terms of scale parameter.)
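
For the record, a classical choice for such an exponential proposal in the tail case (a,∞) is a translated exponential with rate λ=(a+√(a²+4))/2, which maximises the acceptance probability; whether the (e) curve above uses this optimal scale is precisely my question. A minimal illustrative sketch (not Alan Rogers' code):

rtail=function(n,a){
 # accept-reject for N(0,1) restricted to (a,Inf), a>0
 lambda=(a+sqrt(a^2+4))/2 # optimal exponential rate
 out=numeric(n)
 for (i in 1:n){
  repeat{
   x=a+rexp(1,lambda) # translated exponential proposal on (a,Inf)
   if (runif(1)<=exp(-(x-lambda)^2/2)) break # normalised density ratio
  }
  out[i]=x
 }
 out
}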