**T**oday, Pierre Jacob posted on arXiv a paper of ours on the use of the Wasserstein distance in statistical inference, which main focus is exploiting this distance to create an automated measure of discrepancy for ABC. Which is why the full title is Inference in generative models using the Wasserstein distance. Generative obviously standing for the case when a model can be generated from but cannot be associated with a closed-form likelihood. We had all together discussed this notion when I visited Harvard and Pierre last March, with much excitement. (While I have not contributed much more than that round of discussions and ideas to the paper, the authors kindly included me!) The paper contains theoretical results for the consistency of statistical inference based on those distances, as well as computational on how the computation of these distances is practically feasible and on how the Hilbert space-filling curve used in sequential quasi-Monte Carlo can help. The notion further extends to dependent data via delay reconstruction and residual reconstruction techniques (as we did for some models in our empirical likelihood BCel paper). I am quite enthusiastic about this approach and look forward discussing it at the 17w5015 BIRS ABC workshop, next month!

## Archive for the Books Category

## inference with Wasserstein distance

Posted in Books, Statistics, University life with tags 17w5025, adaptive Monte Carlo algorithm, Banff, BIRS, Canada, empirical distribution, Harvard University, numerical transport, optimal transport, statistical inference, synthetic data, Wasserstein distance on January 23, 2017 by xi'an## an express riddle

Posted in Books, Kids, R with tags FiveThirtyEight, harmonic series, The Riddler on January 20, 2017 by xi'an**A** quick puzzle on The Riddler this week that enjoys a quick solution once one writes it out. The core of the puzzle is about finding the average number of draws one need to empty a population of size T if each draw is uniform over the remaining number of individuals between one and the number that remain. It is indeed easy to see that this average satisfies

since all draws but one require an extra draw. A recursion then leads by elimination to deduce that

which is the beginning of the (divergent) harmonic series. In the case T=30, the solution is (almost) equal to 4.

> sum(1/(1:30))*1e10

[1] 39949871309

**A** second riddle the same week reminded me of a result in Devroye’s Non-Uniform Random Variate Generation, namely to find the average number of draws from a Uniform until the sequence goes down. Actually, the real riddle operates with a finite support Uniform, but I find the solution with the continuous Uniform more elegant. And it only took a few metro stops to solve. The solution goes as follows: the probability to stop after two Uniform draws is 1/2, after n uniform draws, it is (n-1)/n!, which does sum up to 1:

and the expectation of this distribution is e-1 by a very similar argument, as can be checked by a rudimentary Monte Carlo experiment

> over(1e7) #my implementation of the puzzle

[1] 1.7185152

## weakly informative reparameterisations for location-scale mixtures

Posted in Books, pictures, R, Statistics, University life with tags compound Gaussian distribution, compound Poisson distribution, MCMC, Metropolis-Hastings algorithm, mixtures of distributions, Monte Carlo Statistical Methods, reparameterisation on January 19, 2017 by xi'an**W**e have been working towards a revision of our reparameterisation paper for quite a while now and too advantage of Kate Lee visiting Paris this fortnight to make a final round: we have now arXived (and submitted) the new version. The major change against the earlier version is the extension of the approach to a large class of models that include infinitely divisible distributions, compound Gaussian, Poisson, and exponential distributions, and completely monotonic densities. The concept remains identical: change the parameterisation of a mixture from a component-wise decomposition to a construct made of the first moment(s) of the distribution and of component-wise objects constrained by the moment equation(s). There is of course a bijection between both parameterisations, but the constraints appearing in the latter produce compact parameter spaces for which (different) uniform priors can be proposed. While the resulting posteriors are no longer conjugate, even conditional on the latent variables, standard Metropolis algorithms can be implemented to produce Monte Carlo approximations of these posteriors.

## la maison des mathématiques

Posted in Books, Kids, pictures, Statistics, University life with tags Akashic Books, book review, exhibit, IHP, Institut Henri Poincaré, la maison des mathématiques, Paris, photograph, Vincent Moncorgé on January 14, 2017 by xi'an**W**hen I worked with Jean-Michel Marin at Institut Henri Poincaré the week before Xmas, there was this framed picture standing on the ground, possibly in preparation for exhibition in the Institute. I found this superposition of the lady cleaning the blackboard from its maths formulas and of the seemingly unaware mathematician both compelling visually in the sheer geometric aesthetics of the act and somewhat appalling in its message. Especially when considering the initiatives taken by IHP towards reducing the gender gap in maths. After inquiring into the issue, I found that this picture was part of a whole photograph exhibit on IHP by Vincent Moncorgé, now published into a book, La Maison des Mathématiques by Villani, Uzan, and Moncorgé. Most pictures are on-line and I found them quite appealing. Except again for the above.

## Le Monde puzzle [#990]

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags binary, intToBits(), Le Monde, mathematical puzzle, R on January 12, 2017 by xi'an**T**o celebrate the new year (assuming it is worth celebrating!), Le Monde mathematical puzzle came up with the following:

Two sequences (x¹,x²,…) and (y¹,y²,…) are defined as follows: the current value of x is either the previous value or twice the previous value, while the current value of y is the sum of the values of x up to now. What is the minimum number of steps to reach 2016 or 2017?

B*y* considering that all consecutive powers of 2 must appear at least one, the puzzles boils down to finding the minimal number of replications in the remainder of the year minus the sum of all powers of 2. Which itself boils down to deriving the binary decomposition of that remainder. Hence the basic R code (using intToBits):

deco=function(k=2016){ m=trunc(log2(k)) while (sum(2^(0:m))>k) m=m-1 if (sum(2^(0:m))==k){ return(rep(1,m+1)) }else{ res=k-sum(2^(0:m)) return(rep(1,m+1)+as.integer(intToBits(res))[1:(m+1)])

which produces

> sum(deco(2016)) [1] 16 > sum(deco(2017)) [1] 16 > sum(deco(1789)) [1] 18

## anytime algorithm

Posted in Books, Statistics with tags anytime algorithm, Cambridge University, computing cost, exchangeability, Harvard University, MCMC, SMC, SMC², University of Oxford, University of Warwick on January 11, 2017 by xi'an**L**awrence Murray, Sumeet Singh, Pierre Jacob, and Anthony Lee (Warwick) recently arXived a paper on Anytime Monte Carlo. (The earlier post on this topic is no coincidence, as Lawrence had told me about this problem when he visited Paris last Spring. Including a forced extension when his passport got stolen.) The difficulty with anytime algorithms for MCMC is the lack of exchangeability of the MCMC sequence (except for formal settings where regeneration can be used).

When accounting for duration of computation between steps of an MCMC generation, the Markov chain turns into a Markov jump process, whose stationary distribution α is biased by the average delivery time. Unless it is constant. The authors manage this difficulty by interlocking the original chain with a secondary chain so that even- and odd-index chains are independent. The secondary chain is then discarded. This provides a way to run an anytime MCMC. The principle can be extended to K+1 chains, run one after the other, since only one of those chains need be discarded. It also applies to SMC and SMC². The appeal of anytime simulation in this particle setting is that resampling is no longer a bottleneck. Hence easily distributed among processors. One aspect I do not fully understand is how the computing budget is handled, since allocating the same real time to each iteration of SMC seems to envision each target in the sequence as requiring the same amount of time. (An interesting side remark made in this paper is the lack of exchangeability resulting from elaborate resampling mechanisms, lack I had not thought of before.)