## Archive for random walk

## optimal simulation on a convex set

**T**his morning, we had a jam session at the maths department of Paris-Dauphine, where a few researchers and colleagues of mine presented their fields of research to the whole department. It was very interesting, despite or thanks to the variety of topics, with forays into the three-body problem(s) [and Poincaré’s mistake], mean fields for Nash equilibrium (or how to exit a movie theatre), approximate losses in machine learning, and so on. Somehow, there was some unity as well, through randomness, convexity and optimal transport. One talk close to my own interests was obviously the study of simulation within convex sets by Joseph Lehec from Paris-Dauphine [and Sébastien Bubeck & Ronen Eldan], as they had established a total variation convergence result at a speed only increasing polynomially with the dimension. The underlying simulation algorithm is rather theoretical in that it involves random walk (or Langevin-corrected) moves where any excursion outside the convex support is replaced with its projection on the set. This projection may prove pretty expensive to compute if the convex set is defined, for instance, as the intersection of many half-spaces. So I do not readily see how the scheme can be recycled into a competitor to a Metropolis-Hastings solution, in that the resulting chain hits the boundary from time to time, with the same frequency over iterations. One solution is to use Metropolis-Hastings instead, of course, while another one is to bounce on the boundary and then correct by Metropolis-Hastings… The optimal scales in the three cases are quite different, from order 1/√d in the Metropolis-Hastings cases to order 1/(d√d) in the projection case. (I did not follow the bouncing option to the end, as it lacks a normalising constant.) Here is a quick and not particularly helpful comparison of the exploration patterns of both approaches in dimension 50 for the unit ball and respective scales of 10/(d√d) [blue] and 1/√d [gold].
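To make the comparison concrete, here is a minimal R sketch of two samplers targeting the uniform distribution on the unit ball in dimension 50. It is not the code behind the comparison graph. The first is a projected random walk, since for a uniform target the Langevin moves reduce to a Gaussian random walk. The second is a random walk Metropolis-Hastings sampler. The function names are hypothetical. I also take `scale` to be the typical length of a step, so each coordinate of the Gaussian increment has standard deviation scale/√d; this convention is an assumption.

```r
# Hypothetical sketch: uniform target on the unit ball in dimension d.
# 'scale' is taken as the typical step length (assumption), so each
# coordinate of the Gaussian increment has sd scale / sqrt(d).

# Euclidean projection onto the unit ball: rescale points lying outside.
proj_ball <- function(x) {
  r <- sqrt(sum(x^2))
  if (r > 1) x / r else x
}

# Projected random walk: Gaussian move, then projection (no accept/reject).
projected_rw <- function(d = 50, scale = 10 / (d * sqrt(d)), niter = 1e4) {
  x <- rep(0, d)
  radius <- numeric(niter)
  for (t in seq_len(niter)) {
    x <- proj_ball(x + rnorm(d, sd = scale / sqrt(d)))
    radius[t] <- sqrt(sum(x^2))
  }
  radius
}

# Random walk Metropolis-Hastings: the target ratio is 1 inside the ball
# and 0 outside, so only proposals leaving the ball are rejected.
mh_rw <- function(d = 50, scale = 1 / sqrt(d), niter = 1e4) {
  x <- rep(0, d)
  radius <- numeric(niter)
  acc <- 0
  for (t in seq_len(niter)) {
    y <- x + rnorm(d, sd = scale / sqrt(d))
    if (sum(y^2) <= 1) {
      x <- y
      acc <- acc + 1
    }
    radius[t] <- sqrt(sum(x^2))
  }
  list(radius = radius, acc_rate = acc / niter)
}

set.seed(1)
r_proj <- projected_rw()
r_mh <- mh_rw()
# Under the uniform law on the d-ball, E[radius] = d / (d + 1).
c(proj = mean(r_proj[-(1:5000)]), mh = mean(r_mh$radius[-(1:5000)]), target = 50 / 51)
```

For the ball, the projection is a mere rescaling. For a polytope given by many half-spaces, `proj_ball` would have to be replaced by the solution of a quadratic programme, which is where the cost of the projection scheme lies. The projected chain also sits exactly on the boundary at many iterations.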

Posted in R, Statistics with tags convexity, Henri Poincaré, high dimensions, optimal transport, random walk, total variation, Université Paris Dauphine on February 4, 2016 by xi'an

## R typos

Posted in Books, Kids, R, Statistics, Travel, University life with tags Amsterdam, Bayesian Analysis, MCMskv, Metropolis-Hastings algorithm, mixtures, Monte Carlo Statistical Methods, R, random walk, testing as mixture estimation on January 27, 2016 by xi'an

**A**t MCMskv, Alexander Ly (from Amsterdam) pointed out to me some R programming mistakes I made in the introduction to Metropolis-Hastings algorithms I wrote a few months ago for the Wiley on-line encyclopedia! While the outcome (Monte Carlo posterior) of the corrected version is only moderately changed, this is nonetheless embarrassing! The example (if not the R code) was a mixture of a Poisson and a Geometric distribution, borrowed from our testing as mixture paper. Among other things, I used a flat prior on the mixture weights instead of a Beta(1/2,1/2) prior *and* a simple log-normal random walk on the mean parameter instead of the more elaborate second-order expansion discussed in the text. And I also inverted the probabilities of success and failure for the Geometric density. The new version is now available on arXiv, and hopefully soon on the Wiley site. One (the?) fact worth mentioning here is that the (right) corrections in the R code first led to overflows. This was because I was using the Beta random walk Be(εp,ε(1-p)), whose major drawback I discussed here a few months ago: values of the weight parameter close to zero or one produced infinite values of the density… Adding 1 (or 1/2) to each parameter of the Beta proposal solved the problem. It also led to a posterior on the weight still concentrating on the correct corner of the unit interval. In any case, a big thank you to Alexander for testing the R code and spotting the several mistakes…
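As an illustration of the fix, here is a minimal R sketch of Metropolis-Hastings with a Beta random walk proposal on a weight p. The hypothetical `shift` argument adds 1 to both shape parameters. The target is a stand-in: a Be(3.5, 7.5) posterior, from a Beta(1/2,1/2) prior and 3 successes out of 10 trials. It is not the Poisson-Geometric mixture of the encyclopedia entry, and `eps = 20` is an arbitrary choice.

```r
# Beta random walk on a weight p in (0,1):
#   p' ~ Be(eps * p + shift, eps * (1 - p) + shift).
# shift = 0 gives the Be(eps p, eps (1 - p)) random walk; shift = 1 keeps
# both shape parameters >= 1, so the proposal densities in the Hastings
# ratio stay bounded near 0 and 1.
beta_rw_mh <- function(log_target, p0 = 0.5, eps = 20, shift = 1, niter = 1e4) {
  p <- numeric(niter)
  p[1] <- p0
  for (t in 2:niter) {
    cur <- p[t - 1]
    prop <- rbeta(1, eps * cur + shift, eps * (1 - cur) + shift)
    # log Hastings ratio: target ratio times reverse / forward proposal densities
    log_ratio <- log_target(prop) - log_target(cur) +
      dbeta(cur, eps * prop + shift, eps * (1 - prop) + shift, log = TRUE) -
      dbeta(prop, eps * cur + shift, eps * (1 - cur) + shift, log = TRUE)
    # reject if the ratio is undefined (e.g. a proposal exactly at 0 or 1)
    p[t] <- if (!is.na(log_ratio) && log(runif(1)) < log_ratio) prop else cur
  }
  p
}

# Hypothetical target: Be(3.5, 7.5) posterior.
log_post <- function(p) dbeta(p, 3.5, 7.5, log = TRUE)
set.seed(1)
draws <- beta_rw_mh(log_post)
mean(draws[-(1:1000)])  # to compare with 3.5 / 11
```

Setting `shift = 0` recovers the Be(εp, ε(1-p)) random walk. Its shape parameters drop below 1 when p gets close to 0 or 1, which makes the Beta densities unbounded there.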

## independent Metropolis-Hastings

Posted in Books, Statistics with tags 0.234, block sampling, Gibbs sampler, independent Metropolis-Hastings algorithm, Metropolis-within-Gibbs algorithm, optimal acceptance rate, random walk on November 24, 2015 by xi'an

“In this paper we have demonstrated the potential benefits, both theoretical and practical, of the independence sampler over the random walk Metropolis algorithm.”

**P**eter Neal and Tsun Man Clement Lee arXived a paper on optimising the independent Metropolis-Hastings algorithm. I was a bit surprised at this “return” of the independent sampler, which I hardly mention in my lectures, so I had a look at the paper. The goal is to produce an equivalent to what Gelman, Roberts and Gilks (1996) obtained for random walk samplers. In the formal setting where the target is a product of n identical densities f, the optimal number k of components to update in one Metropolis-Hastings (within Gibbs) round is approximately 2.835/I. Here I is the symmetrised Kullback-Leibler distance between the (univariate) target f and the independent proposal q, provided I is finite. The most surprising part is that the optimal acceptance rate is again 0.234, as in the random walk case. This is surprising in that I usually associate the independent Metropolis-Hastings algorithm with high acceptance rates. But this is of course when calibrating the proposal q, not the block size k of the Gibbs part. Hence, this calibration of the independent Metropolis-within-Gibbs sampler is worth the study and almost automatically applicable. Still, it only applies to a certain category of problems where blocking can take place, as in the disease models illustrating the paper. And it requires an adequate choice of proposal distribution for, otherwise, the above quote becomes inappropriate.
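To see how the block size rule operates, here is a small R sketch for a product of n standard normal densities, with an independent N(0, s²) proposal on each component. The function names are hypothetical. I take the symmetrised Kullback-Leibler distance to be KL(f‖q) + KL(q‖f), which has a closed form for two centred normals; the paper's exact normalisation may differ.

```r
# Symmetrised KL divergence between f = N(0, 1) and q = N(0, s^2),
# taken here as KL(f || q) + KL(q || f) (assumption on the definition).
sym_kl_normal <- function(s) (1 / s^2 + s^2) / 2 - 1

# Independent Metropolis-within-Gibbs on a product of n N(0, 1) densities:
# each round, a random block of k components is proposed from q,
# independently of the current values, and accepted or rejected as a block.
indep_mwg <- function(n = 100, s = 2, niter = 5000, k = NULL) {
  if (is.null(k)) k <- max(1, round(2.835 / sym_kl_normal(s)))
  x <- rnorm(n)  # start from the target to skip burn-in
  acc <- 0
  for (t in seq_len(niter)) {
    blk <- sample.int(n, k)
    y <- rnorm(k, sd = s)
    # log of f(y) q(x) / (f(x) q(y)) summed over the block
    lr <- sum(dnorm(y, log = TRUE) - dnorm(x[blk], log = TRUE) +
              dnorm(x[blk], sd = s, log = TRUE) - dnorm(y, sd = s, log = TRUE))
    if (log(runif(1)) < lr) {
      x[blk] <- y
      acc <- acc + 1
    }
  }
  list(k = k, acc_rate = acc / niter)
}

set.seed(1)
res <- indep_mwg()
res$k  # round(2.835 / 1.125) = 3
res$acc_rate
```

Re-running with other values of `k` and monitoring `acc_rate` is one way to look at the 0.234 claim. Keep in mind that the result is asymptotic in n.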

## Non-reversible Markov Chains for Monte Carlo sampling

Posted in pictures, Statistics, Travel, University life with tags ABC, Alan Turing Institute, CRiSM, Hamiltonian Monte Carlo, intractable likelihood, lifting, Monte Carlo Statistical Methods, non-reversible diffusion, NUTS, overdamped Langevin algorithm, random walk, University of Warwick, workshop on September 24, 2015 by xi'an

**T**his “week in Warwick” was not chosen at random, as I was aware there was a workshop on non-reversible MCMC going on. (Even though CRiSM sponsored so many workshops in September that almost any week would have worked for the above sentence!) It has always been something of a mystery to me that non-reversibility could make a massive difference in practice, even though I am quite aware that it does, and I can grasp some of the theoretical arguments why it does. So it was quite rewarding to sit in this Warwick amphitheatre and learn about overdamped Langevin algorithms and non-reversible diffusions. I also got to see results where convergence times moved from n to √n, and to grasp some of the appeal of lifting, albeit in finite state spaces. Plus, the cartoon presentation of Hamiltonian Monte Carlo by Michael Betancourt was a great moment. This was not only because of the satellite bursting into flames on the screen but also because it gave a very welcome intuition about why reversibility was inefficient and HMC appealing. So I am grateful to my two colleagues, Joris Bierkens and Gareth Roberts, for organising this exciting workshop, with a most profitable scheduling favouring few and long talks. My next visit to Warwick will also coincide with a workshop on intractable likelihood, next November, this time part of the new Alan Turing Institute programme.
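Lifting in a finite state space can be illustrated by the classical persistent walk on a cycle of n states, in the spirit of Diaconis, Holmes and Neal. Below it is sketched in R on unwrapped positions, so that the distance travelled is visible. This is my own toy example, not one from the workshop, and the function names are hypothetical. After n steps the reversible walk has typically moved by about √n. The lifted walk only flips its direction with probability 1/n, so it has moved by a distance of order n.

```r
# Reversible walk on the integers: +1 or -1 with probability 1/2 each step.
# Returns the unwrapped position after 'steps' moves (on a cycle of n
# states the position would be this value modulo n).
rev_walk <- function(steps) sum(sample(c(-1, 1), steps, replace = TRUE))

# Lifted walk: an auxiliary direction d is kept and flipped with
# probability 1/n before each move; on the cycle, the uniform law on
# (position, d) is stationary.
lifted_walk <- function(steps, n) {
  d <- 1
  z <- 0
  for (t in seq_len(steps)) {
    if (runif(1) < 1 / n) d <- -d
    z <- z + d
  }
  z
}

set.seed(1)
n <- 100
dist_rev <- replicate(500, abs(rev_walk(n)))
dist_lift <- replicate(500, abs(lifted_walk(n, n)))
c(reversible = mean(dist_rev), lifted = mean(dist_lift))
```

Covering a cycle of length n thus takes of order n² steps for the reversible walk, against order n for the lifted one. This is a square-root gain of the kind mentioned above.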

## the Flatland paradox [#2]

Posted in Books, Kids, R, Statistics, University life with tags ABC, combinatorics, exact ABC, Flatland, improper priors, Larry Wasserman, marginalisation paradoxes, paradox, Pierre Druilhet, random walk, subjective versus objective Bayes, William Feller on May 27, 2015 by xi'an

**A**nother trip in the métro today (to work with Pierre Jacob and Lawrence Murray in a Paris Anticafé, as the University was closed!) led me to infer (warning: this is not the exact distribution!) the distribution of *x*, namely

since a path *x* of length *l(x)* corresponds to N draws if N-*l(x)* is an even integer *2p*. In that case, *p* indistinguishable annihilations in 4 possible directions have to be distributed over *l(x)*+1 possible locations, with Feller’s number of distinguishable distributions as a result. With a prior π(N)=1/N on N, hence on *p*, the posterior on *p* is given by

Now, given N and *x*, the probability of no annihilation on the last round is 1 when *l(x)*=N and in general

which can be integrated against the posterior. The numerical expectation is represented for a range of values of *l(x)* in the above graph. Interestingly, the posterior probability is constant for *l(x)* large and equal to 0.8125 under a flat prior over N.
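Feller’s count invoked above is the number of distinguishable ways of placing r indistinguishable balls into n cells, namely choose(n + r − 1, r). With n = l(x)+1 locations and r = p annihilations, it becomes choose(l(x) + p, p). The short R check below compares the formula with a brute-force enumeration for small values. It only checks the combinatorial count, not the resulting distribution of *x*, and the function names are hypothetical.

```r
# Feller's occupancy count: distinguishable ways of placing r
# indistinguishable balls into n cells.
feller_count <- function(n, r) choose(n + r - 1, r)

# Brute force: count vectors (k_1, ..., k_n) of non-negative integers
# summing to r.
brute_count <- function(n, r) {
  grid <- expand.grid(rep(list(0:r), n))
  sum(rowSums(grid) == r)
}

# Agreement on a small range of (n, r); with n = l(x) + 1 and r = p the
# count is choose(l(x) + p, p).
for (n in 1:4) for (r in 0:4) stopifnot(feller_count(n, r) == brute_count(n, r))
```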

Getting back to Pierre Druilhet’s approach, he sets a flat prior on the length of the path θ and from there derives that the probability of annihilation is about 3/4. However, “the uniform prior on the paths of lengths lower or equal to M” used for this derivation gives a probability of length l proportional to 3^l. This is quite different from the distribution of l(θ) given a number of draws N, which, as shown above, looks much more like a Binomial B(N,1/2).
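The 3^l behaviour comes from counting paths without U-turns. There are 4 choices for the first step and 3 for each subsequent one, hence 4·3^(l−1) such paths of length l. Below is a brute-force R check for small l (function name hypothetical). It codes the four directions as −2, −1, 1, 2, with opposite moves having opposite signs, as in the ABC code.

```r
# Number of paths of length l on the square lattice (steps coded -2,-1,1,2)
# that never step straight back, i.e. contain no U-turn.
count_no_uturn <- function(l) {
  steps <- c(-(1:2), 1:2)
  paths <- as.matrix(expand.grid(rep(list(steps), l)))
  ok <- rep(TRUE, nrow(paths))
  if (l > 1) for (j in 1:(l - 1)) ok <- ok & (paths[, j] != -paths[, j + 1])
  sum(ok)
}

sapply(1:6, count_no_uturn)  # 4 12 36 108 324 972, i.e. 4 * 3^(l - 1)
```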

However, not being quite certain about the reasoning involving Feller’s trick, I ran an ABC experiment under a flat prior restricted to (*l(x)*,4*l(x)*) and got the above. The histogram is for a posterior sample associated with *l(x)*=195, and the gold curve is the potential posterior. Since ABC is exact in this case (i.e., I only picked N’s for which l(x)=195), ABC is not to blame for the discrepancy! I asked about the distribution on the Mathematics Stack Exchange forum (and a few colleagues here as well) but got no reply so far… Here is the R code that goes with the ABC implementation:

```r
#observation:
elo=195
#ABC version
T=1e6
el=rep(NA,T)
#flat prior on N over elo:(4*elo)
N=sample(elo:(4*elo),T,replace=TRUE)
for (t in 1:T){
  #generate a path: -2,-1,1,2 code the four directions
  paz=sample(c(-(1:2),1:2),N[t],replace=TRUE)
  #eliminate U-turns (a step followed by its opposite)
  uturn=paz[-N[t]]==-paz[-1]
  while (sum(uturn>0)){
    #keep only non-overlapping U-turn pairs
    uturn[-1]=uturn[-1]*(1-uturn[-(length(paz)-1)])
    uturn=c((1:(length(paz)-1))[uturn==1],
            (2:length(paz))[uturn==1])
    paz=paz[-uturn]
    uturn=paz[-length(paz)]==-paz[-1]
  }
  el[t]=length(paz)}
#subsample to get exact posterior
poster=N[abs(el-elo)==0]
```

## single variable transformation approach to MCMC

Posted in Books, Statistics, Travel with tags arXiv, asymptotic variance, Metropolis-Hastings, mixing speed, random walk on September 9, 2014 by xi'an

I read the newly arXived paper “On Single Variable Transformation Approach to Markov Chain Monte Carlo” by Dey and Bhattacharya on the pleasant train ride between Bristol and Coventry last weekend. The paper actually follows several earlier papers by the authors that I have not read in detail. The notion of single variable transform is to add plus or minus the same random noise to all components of the current value of the Markov chain. This replaces the standard d-dimensional random walk proposal of the reference Metropolis-Hastings algorithm. Namely, all proposals are of the form

$$x'_i = x_i + b_i\,\varepsilon,\qquad b_i\in\{-1,+1\},\quad i=1,\dots,d,\qquad \varepsilon\sim g,$$

meaning the chain proceeds [after acceptance] along *one and only one* diagonal. The authors’ arguments are that (a) the proposal is cheaper and (b) the acceptance rate is higher. What I find questionable in this argument is that neither point directly matters in the evaluation of the performances of the algorithm. For instance, a higher acceptance rate in a Metropolis-Hastings algorithm does not imply faster convergence or smaller asymptotic variance. (This goes without mentioning the fact that the comparative Figure 1 is so variable with the dimension as to be of limited worth. Figures 1 and 2 are also found in an earlier arXived paper of the authors.) Furthermore, restricting the moves to the diagonals of the Euclidean space implies a positive probability of making two successive proposals along *the same* diagonal, which is a waste of time. In the two-dimensional case, joining two arbitrary points using an everywhere positive density g on ε means generating two successive values from g. This is equivalent cost-wise to generating a single noise from a two-dimensional proposal, without the intermediate step of checking the one-dimensional move along one diagonal. So much for a gain. In fine, the proposal found in this paper sums up as being a one-at-a-time version of a standard random walk Metropolis-Hastings algorithm.
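To make the comparison concrete, here is a minimal R sketch of a random walk Metropolis-Hastings sampler run with either of two moves on a hypothetical N(0, I₅) target. The first is the diagonal move (one scalar noise, an independent random sign per component); the second is a standard Gaussian move. The function names are hypothetical. I take ε ~ |N(0, s²)| with equiprobable signs, which may differ from the authors' choices.

```r
# Random walk Metropolis-Hastings with a user-supplied symmetric proposal.
mh_chain <- function(log_target, x0, propose, niter = 1e4) {
  x <- x0
  lx <- log_target(x)
  out <- matrix(NA_real_, niter, length(x0))
  acc <- 0
  for (t in seq_len(niter)) {
    y <- propose(x)
    ly <- log_target(y)
    if (log(runif(1)) < ly - lx) {
      x <- y
      lx <- ly
      acc <- acc + 1
    }
    out[t, ] <- x
  }
  list(draws = out, acc_rate = acc / niter)
}

# "Single variable" move: one scalar eps ~ |N(0, s^2)| added to every
# component with an independent random sign (a move along one diagonal).
# The reverse move uses the same eps with all signs flipped, so the
# proposal is symmetric.
diag_move <- function(x, s = 0.5) {
  x + sample(c(-1, 1), length(x), replace = TRUE) * abs(rnorm(1, sd = s))
}
# Standard d-dimensional Gaussian random walk move.
full_move <- function(x, s = 0.5) x + rnorm(length(x), sd = s)

log_target <- function(x) sum(dnorm(x, log = TRUE))  # hypothetical N(0, I_5)
set.seed(1)
res_diag <- mh_chain(log_target, rep(0, 5), diag_move)
res_full <- mh_chain(log_target, rep(0, 5), full_move)
c(diag = res_diag$acc_rate, full = res_full$acc_rate)
```

Both moves are symmetric, so the acceptance probability reduces to the target ratio. As argued above, the acceptance rates printed at the end say little about efficiency. Effective sample sizes, for instance from `coda::effectiveSize`, are a fairer yardstick.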

## MCMC on zero measure sets

Posted in R, Statistics with tags conditional density, Hastings-Metropolis sampler, Jacobian, MCMC, measure theory, measure zero set, projected measure, random walk on March 24, 2014 by xi'an

**S**imulating a bivariate normal under the constraint (or conditional on the fact) that x²-y²=1 (a non-linear, zero-measure curve in the 2-dimensional Euclidean space) is not that easy. Consider running a random walk along that curve: a random walk on y, with x deduced from x²=y²+1, accepted with a Metropolis-Hastings ratio based on the bivariate normal density. The outcome then differs from the target predicted by a change of variable and the proper derivation of the conditional. The *above* graph resulting from the R code *below* illustrates the discrepancy!

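```r
#conditional density of y along the curve (x>0), approximately normalised
targ=function(y){
  exp(-y^2)/(1.52*sqrt(1+y^2))}
T=10^5
Eps=3
ys=xs=rep(runif(1),T)
xs[1]=sqrt(1+ys[1]^2)
for (t in 2:T){
  #uniform random walk on y, x deduced from the constraint
  propy=runif(1,-Eps,Eps)+ys[t-1]
  propx=sqrt(1+propy^2)
  #acceptance based on the bivariate normal density alone (no Jacobian)
  ace=(runif(1)<(dnorm(propy)*dnorm(propx))/
    (dnorm(ys[t-1])*dnorm(xs[t-1])))
  if (ace){
    ys[t]=propy;xs[t]=propx
  }else{
    ys[t]=ys[t-1];xs[t]=xs[t-1]}}
```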

If instead we add the proper Jacobian as in

```r
ace=(runif(1)<(dnorm(propy)*dnorm(propx)/propx)/
  (dnorm(ys[t-1])*dnorm(xs[t-1])/xs[t-1]))
```

the fit is there. My open question is how to make this derivation generic, i.e. without requiring the (dreaded) computation of the (dreadful) Jacobian.
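For the record, in this specific case the Jacobian follows from the coarea formula. Parametrise the curve h(x,y) = x² − y² − 1 = 0 by y on the branch x = √(1+y²). The conditional density of y is then proportional to f(x,y)·(ds/dy)/|∇h|. Here ds/dy = √(x²+y²)/x and |∇h| = 2√(x²+y²), so the density is proportional to f(x,y)/(2x). For the bivariate normal this gives exp(−y²)/√(1+y²), the `targ` function above, whose normalising constant is exp(1/2)·K₀(1/2) ≈ 1.524. The R sketch below (function names hypothetical) checks this constant numerically. It then runs the corrected sampler and compares its second moment with numerical integration.

```r
# Unnormalised conditional density of y on the branch x = sqrt(1 + y^2):
# bivariate N(0, I) density divided by x.
cond_y <- function(y) exp(-y^2) / sqrt(1 + y^2)

# Normalising constant and E[y^2] by numerical integration.
Z <- integrate(cond_y, -Inf, Inf)$value
m2 <- integrate(function(y) y^2 * cond_y(y), -Inf, Inf)$value / Z

# Corrected random walk on y (uniform steps of half-width eps), accepting
# with the bivariate normal density divided by x.
constrained_rw <- function(niter = 1e5, eps = 3) {
  log_w <- function(y) {
    x <- sqrt(1 + y^2)
    dnorm(x, log = TRUE) + dnorm(y, log = TRUE) - log(x)
  }
  ys <- numeric(niter)
  ys[1] <- runif(1)
  for (t in 2:niter) {
    prop <- ys[t - 1] + runif(1, -eps, eps)
    ys[t] <- if (log(runif(1)) < log_w(prop) - log_w(ys[t - 1])) prop else ys[t - 1]
  }
  ys
}

set.seed(1)
ys <- constrained_rw()
c(Z = Z, numerical = m2, mcmc = mean(ys^2))
```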