**T**his morning, we had a jam session at the maths department of Paris-Dauphine where a few researchers & colleagues of mine presented their field of research to the whole department. Very interesting, despite or thanks to the variety of topics, with forays into the three-body problem(s) [and Poincaré's mistake], mean fields for Nash equilibria (or how to exit a movie theatre), approximate losses in machine learning, and so on. Somehow, there was some unity as well, through randomness, convexity and optimal transport. One talk close to my own interests was obviously the study of simulation within convex sets by Joseph Lehec from Paris-Dauphine [with Sébastien Bubeck & Ronen Eldan], as they had established a total variation convergence result at a speed only increasing polynomially with the dimension. The underlying simulation algorithm is rather theoretical in that it involves random walk (or corrected Langevin) moves where any excursion outside the convex support is replaced with its projection on the set. This projection may prove pretty expensive to compute if the convex set is defined, for instance, as the intersection of many hyperplanes. So I do not readily see how the scheme can be recycled into a competitor to a Metropolis-Hastings solution, in that the resulting chain hits the boundary from time to time, with the same frequency over iterations. A solution is of course to use Metropolis-Hastings instead, while another one is to bounce on the boundary and then correct by Metropolis-Hastings… The optimal scales in the three different cases are quite different, from √d in the Metropolis-Hastings cases to d√d in the projection case. (I did not follow the bouncing option to the end, as it lacks a normalising constant.) Here is a quick and not particularly helpful comparison of the exploration patterns of both approaches in dimension 50 for the unit sphere and respective scales of 10/d√d [blue] and 1/√d [gold].
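For readers curious about the comparison, here is a minimal sketch (not the actual code behind the figure) of the two schemes for the uniform target on the unit ball: the projection scheme maps excursions back onto the set, while the Metropolis-Hastings version simply rejects moves leaving it. The scales match those quoted above; everything else is an arbitrary choice of mine.

```r
#sketch: projected random walk versus Metropolis-Hastings on the unit ball
proj_walk=function(T,d,scale){
  x=rep(0,d);path=matrix(0,T,d)
  for (t in 1:T){
    y=x+scale*rnorm(d)
    nrm=sqrt(sum(y^2))
    if (nrm>1) y=y/nrm #project excursions back onto the ball
    x=y;path[t,]=x}
  path}
mh_walk=function(T,d,scale){
  x=rep(0,d);path=matrix(0,T,d)
  for (t in 1:T){
    y=x+scale*rnorm(d)
    if (sqrt(sum(y^2))<=1) x=y #reject moves leaving the ball
    path[t,]=x}
  path}
d=50
pro=proj_walk(1e3,d,10/(d*sqrt(d))) #projection scheme [blue]
met=mh_walk(1e3,d,1/sqrt(d))        #Metropolis-Hastings scheme [gold]
```

Both chains remain within the unit ball by construction, so any difference lies in how fast they explore it.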

## Archive for the R Category

## optimal simulation on a convex set

Posted in R, Statistics with tags convexity, Henri Poincaré, high dimensions, optimal transport, random walk, total variation, Université Paris Dauphine on February 4, 2016 by xi'an

## Le Monde puzzle [#947]

Posted in Books, Kids, pictures, R, Statistics with tags aperiodicity, chequerboard, coupling from the past, dead leaves, Le Monde, mathematical puzzle, R, Wilfrid Kendall on February 2, 2016 by xi'an

**A**nother boardgame in Le Monde mathematical puzzle:

Given an 8×8 chequerboard, consider placing 2×2 tiles over this chequerboard until (a) the entire surface is covered and (b) removing a single 2×2 tile exposes some of the original chequerboard. What is the maximal number of 2×2 tiles one can set according to this scheme? And for a 10×10 chequerboard?

**T**his puzzle reminded me of Wilfrid Kendall’s introduction to perfect sampling, with leaves falling from above, seen through a ceiling window, until the sky was no longer visible. The representation was used by Wilfrid to explain that coupling from the past did not need to go all the way back to infinity:

Defining a random coverage of the chequerboard by those 2×2 tiles amounts to selecting a random permutation þ of 1:(n-1)² and finding the subvector of þ producing a full coverage:

```r
grid=matrix(0,n,n)
path=sample(1:(n-1)^2) #random permutation
path=path+((path-1)%/%(n-1)) #account for border shift
i=1
neigh=c(0,1,n,n+1)
while (min(grid)==0){ #until all entries covered
  grid[path[i]+neigh]=grid[path[i]+neigh]+1
  i=i+1}
i=i-1
```

Then removing superfluous tiles:

```r
off=rep(TRUE,i) #off used below
for (k in sample(1:i)){
  krid=grid
  krid[path[k]+neigh]=krid[path[k]+neigh]-1
  if (min(krid)>0){ #useless tile
    off[k]=FALSE;grid=krid}}
```

And checking the remaining ones are truly essential:

```r
mingrid=0
for (k in (1:i)[off]){
  krid=grid
  krid[path[k]+neigh]=krid[path[k]+neigh]-1
  mingrid=max(mingrid,min(krid))}
sky=(mingrid>0) #rejection of the grid
```

leads to the maximum number of tiles being *[at least]* M=16, 25, 37, 54 for n=6, 8, 10, 12…
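Put together, the chunks above can be wrapped into a single function returning the number of remaining tiles for one random permutation (a sketch recombining the code above; the final essentialness check is left out, since the greedy pruning already keeps only tiles whose removal exposes the board):

```r
#one random coverage, pruned of superfluous tiles, for an n x n chequerboard
tiles=function(n){
  neigh=c(0,1,n,n+1)
  grid=matrix(0,n,n)
  path=sample(1:(n-1)^2) #random permutation
  path=path+((path-1)%/%(n-1)) #account for border shift
  i=1
  while (min(grid)==0){ #until all entries covered
    grid[path[i]+neigh]=grid[path[i]+neigh]+1
    i=i+1}
  i=i-1
  off=rep(TRUE,i)
  for (k in sample(1:i)){ #remove superfluous tiles
    krid=grid
    krid[path[k]+neigh]=krid[path[k]+neigh]-1
    if (min(krid)>0){off[k]=FALSE;grid=krid}}
  sum(off)} #number of remaining tiles
best=max(replicate(1e2,tiles(8)))
```

Repeating over many permutations and keeping the maximum gives the lower bounds reported above.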

## love-hate Metropolis algorithm

Posted in Books, pictures, R, Statistics, Travel with tags auxiliary variable, doubly intractable problems, Metropolis-Hastings algorithm, Monte Carlo Statistical Methods, multimodality, normalising constant, parallel tempering, pseudo-marginal MCMC, The night of the hunter, unbiased estimation on January 28, 2016 by xi'an

**H**yungsuk Tak, Xiao-Li Meng and David van Dyk just arXived a paper on a multiple choice proposal in Metropolis-Hastings algorithms towards dealing with multimodal targets, called “A repulsive-attractive Metropolis algorithm for multimodality” *[although I wonder why XXL did not jump at the opportunity to use the “love-hate” denomination!]*. The proposal distribution includes a [forced] downward Metropolis-Hastings move that uses the inverse of the target density π as its own target, namely 1/{π(x)+ε}, followed by a [forced] upward Metropolis-Hastings move whose target is {π(x)+ε}. The +ε is just there to avoid handling ratios of zeroes (although I wonder why using the convention 0/0=1 would not work), and it is chosen as 10⁻³²³ by default, in connection with R's smallest positive number. Whether or not the “downward” move is truly downwards and the “upward” move truly upwards obviously depends on the generating distribution: I find it rather surprising that the authors consider the *same* random walk density in both cases, as I would have imagined relying on a more dispersed distribution for the downward move in order to reach other modes more easily. For instance, the downward move could have been based on an *anti*-Langevin proposal, relying on the gradient to proceed further down…

This special choice of a single proposal however simplifies the acceptance ratio (and keeps the overall proposal symmetric). The final acceptance ratio still requires a ratio of intractable normalising constants, which the authors bypass via the Møller et al. (2006) auxiliary variable trick. While the authors mention the alternative pseudo-marginal approach of Andrieu and Roberts (2009), they do not try to implement it, although this would be straightforward here: since the normalising constants are the probabilities of accepting a downward and an upward move, respectively, they can easily be evaluated at a cost similar to the use of the auxiliary variables. That is,

– generate a few moves from the current value and record the proportion *p* of accepted downward moves;

– generate a few moves from the final proposed value and record the proportion *q* of accepted downward moves;

and replace the ratio of intractable normalising constants with *p/q*. It is not even clear that one needs those extra moves, since the algorithm already requires an acceptance in the downward and upward moves, hence generates Geometric variates associated with those probabilities p and q, variates that can be used to estimate them. From a theoretical perspective, I also wonder whether forcing the downward and upward moves truly leads to an improved convergence speed. Considering the case when the random walk is poorly calibrated for either the downward or the upward move, the number of failed attempts before an acceptance may grow beyond reason.
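The suggested plug-in can be sketched as follows (my own toy illustration, with a hypothetical bimodal target `dens` and an arbitrary random walk scale, not the authors' code): estimate each normalising constant by the raw proportion of accepted forced downward moves started from the current and from the proposed values, then take the ratio.

```r
#hypothetical bimodal target (assumption, not from the paper)
dens=function(x) .5*dnorm(x)+.5*dnorm(x,mean=4)
eps=1e-300
#proportion of accepted downward moves from x: the downward move targets
#1/{dens(.)+eps}, hence accepts a proposal z with probability
#min(1,{dens(x)+eps}/{dens(z)+eps})
down_accept=function(x,M=1e3,scale=2){
  prop=rnorm(M,x,scale)
  mean(runif(M)<(dens(x)+eps)/(dens(prop)+eps))}
p=down_accept(0) #from the current value
q=down_accept(4) #from the proposed value
ratio=p/q        #plug-in for the ratio of intractable constants
```

The Geometric alternative mentioned above would record instead the number of attempts before the first acceptance in each forced move.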

As XXL and David pointed out to me, the unusual aspect of the approach is that here the proposal density is intractable, rather than the target density itself. This makes using Andrieu and Roberts (2009) seemingly less straightforward. However, as I was reminded this afternoon at the statistics and probability seminar in Bristol, the argument for the pseudo-marginal approach based on an unbiased estimator is that w Q(w|x) has a marginal in x equal to π(x) when the expectation of w is π(x). In the current problem, the proposal in x can be extended into a proposal in (x,w), w P(w|x), whose marginal is the proposal on x.

If we complement the target π(x) with the conditional P(w|x), the acceptance probability would then involve

{π(x’) P(w’|x’) / π(x) P(w|x)} / {w’ P(w’|x’) / w P(w|x)} = {π(x’) / π(x)} {w/w’}

so it seems the pseudo-marginal (or auxiliary variable) argument also extends to the proposal. Here is a short experiment that shows no discrepancy between target and histogram:

```r
nozero=1e-300
#love-hate move
move<-function(x){
  bacwa=1;prop1=prop2=rnorm(1,x,2)
  #forced downward move, targeting 1/{pi(.)+nozero}
  while (runif(1)>{pi(x)+nozero}/{pi(prop1)+nozero}){
    prop1=rnorm(1,x,2);bacwa=bacwa+1}
  #forced upward move, targeting pi(.)+nozero
  while (runif(1)>{pi(prop2)+nozero}/{pi(prop1)+nozero})
    prop2=rnorm(1,prop1,2)
  y=x
  if (runif(1)<pi(prop2)*bacwa/pi(x)/fowa){
    y=prop2
    assign("fowa",bacwa,envir=globalenv())} #update the global estimate
  return(y)}
#arbitrary bimodal target (masks the built-in constant pi)
pi<-function(x){.25*dnorm(x)+.75*dnorm(x,mean=5)}
#running the chain
T=1e5
x=5*rnorm(1);luv8=rep(x,T)
fowa=1;prop1=rnorm(1,x,2) #initial estimate
while (runif(1)>{pi(x)+nozero}/{pi(prop1)+nozero}){
  fowa=fowa+1;prop1=rnorm(1,x,2)}
for (t in 2:T) luv8[t]=move(luv8[t-1])
```

## R typos

Posted in Books, Kids, R, Statistics, Travel, University life with tags Amsterdam, Bayesian Analysis, MCMskv, Metropolis-Hastings algorithm, mixtures, Monte Carlo Statistical Methods, R, random walk, testing as mixture estimation on January 27, 2016 by xi'an

**A**t MCMskv, Alexander Ly (from Amsterdam) pointed out to me some R programming mistakes I made in the introduction to Metropolis-Hastings algorithms I wrote a few months ago for the Wiley on-line encyclopedia! While the outcome (the Monte Carlo posterior) of the corrected version is only moderately changed, this is nonetheless embarrassing! The example (if not the R code) was a mixture of Poisson and Geometric distributions borrowed from our testing as mixture paper. Among other things, I used a flat prior on the mixture weights instead of a Beta(1/2,1/2) prior *and* a simple log-normal random walk on the mean parameter instead of the more elaborate second-order expansion discussed in the text. And I also inverted the probabilities of success and failure in the Geometric density. The new version is now available on arXiv, and hopefully soon on the Wiley site, but one (the?) fact worth mentioning here is that the (right) corrections in the R code first led to overflows, because I was using the Beta random walk Be(εp,ε(1-p)), whose major drawback I discussed here a few months ago: nearly zero or one values of the weight parameter produce infinite values of the density… Adding 1 (or 1/2) to each parameter of the Beta proposal solved the problem, and led to a posterior on the weight still concentrating on the correct corner of the unit interval. In any case, a big thank you to Alexander for testing the R code and spotting the several mistakes…
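The overflow and its fix can be checked in two lines (a toy illustration of mine, with an arbitrary scale ε=100 and a weight pushed towards zero, not the encyclopedia code):

```r
eps=100  #scale of the Beta random walk proposal (arbitrary choice)
w=1e-12  #current weight almost at the boundary
#original proposal Be(eps*w,eps*(1-w)): density blows up near zero
bad=dbeta(1e-320,eps*w,eps*(1-w))
#shifted proposal Be(eps*w+1,eps*(1-w)+1): density stays finite
fix=dbeta(1e-320,eps*w+1,eps*(1-w)+1)
```

With a first shape parameter of about 10⁻¹⁰, the density behaves like x⁻¹ near zero and overflows, while adding 1 to each parameter bounds it.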

## high dimension Metropolis-Hastings algorithms

Posted in Books, Kids, Mountains, pictures, R, Statistics with tags acceptance probability, curse of dimensionality, high dimensions, MCMC, Metropolis-Hastings algorithm, Monte Carlo Statistical Methods, umlaut on January 26, 2016 by xi'an

**W**hen discussing high dimension models with Ingmar ~~Schüster~~ Schuster [blame my fascination for accented characters!] the other day, we came across the following paradox with Metropolis-Hastings algorithms. When attempting to simulate from a multivariate standard normal distribution in a large dimension, starting from the mode of the target, i.e., its mean μ, leaving the mode μ is extremely unlikely, given the huge drop between the value of the density at the mode μ and at likely realisations (corresponding to the blue sequence). Even when relying on the very scale that makes the proposal identical to the target! Resorting to a tiny scale like Σ/p manages to escape the unhealthy neighbourhood of the highly unlikely mode (as shown with the brown sequence).

Here is the corresponding R code:

```r
library(MASS) #for mvrnorm
p=100
T=1e3
mu=rep(0,p);sigma=diag(p) #standard normal target
logdmvnorm=function(x,m,S) #log multivariate normal density, diagonal S
  -.5*(length(x)*log(2*pi)+sum(log(diag(S)))+sum((x-m)^2/diag(S)))
mh=mu #mode as starting value
vale=rep(0,T)
for (t in 1:T){
  prop=mvrnorm(1,mh,sigma/p)
  if (log(runif(1))<logdmvnorm(prop,mu,sigma)-
      logdmvnorm(mh,mu,sigma)) mh=prop
  vale[t]=logdmvnorm(mh,mu,sigma)}
```
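The size of the drop behind the paradox can be checked directly: for a standard normal in dimension p, the log-density at the mode exceeds the one at a typical realisation by half the squared radius, i.e., about p/2 (a quick check of mine, assuming an identity covariance):

```r
p=100
typ=rnorm(p) #typical realisation, at distance about sqrt(p) from the mode
#difference in log-density between the mode and a typical draw
drop=sum(dnorm(rep(0,p),log=TRUE))-sum(dnorm(typ,log=TRUE))
#drop equals sum(typ^2)/2, concentrating around p/2
```

A full-scale proposal from the mode thus faces an acceptance log-ratio of order -p/2, which explains why the chain refuses to leave.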

## MCqMC 2016

Posted in Books, pictures, R, Statistics, Travel, University life on January 18, 2016 by xi'an

**A**fter the MCqMC 2014 conference in Leuven, which I enjoyed very much, the MCqMC 2016 instalment takes place at Stanford this (late) summer. Alas, I cannot attend it, as I will be in Australia all ~~summer~~ winter, but the program looks terrific! As Art's tutorial so brilliantly showed at MCMskv last week, the connections between the quasi-Monte Carlo and Bayesian computational communities are underdeveloped. This is thus a great opportunity to reach across the gap (and Stanford is definitely warmer than San Francisco in the summer!). I thus encourage all MCMskv participants, BayesComp members, and ‘Og’s readers to attend! (Warning: the registration fees are $500.) In particular, [contributed] sessions can be proposed till Feb. 22.

## precision in MCMC

Posted in Books, R, Statistics, University life with tags aperiodicity, CNRS, dynamical systems, Images des Mathématiques, MCMC algorithms, Metropolis-Hastings algorithm, Monte Carlo Statistical Methods, pseudo-random generator, round-off error, slice sampler on January 14, 2016 by xi'an

While browsing Images des Mathématiques, I came across this article *[in French]* that studies the impact of round-off errors in number representations on a dynamical system, and I checked how much this was the case for MCMC algorithms like the slice sampler (recycling some R code from Monte Carlo Statistical Methods), by simply adding a few signif(…,dig=n) calls in the original R code and letting the precision n vary.

“…if one simulates trajectories over very long time intervals, too long relative to the chosen numerical precision, then quite often the simulation results will be completely different from what actually happens…”

Rather unsurprisingly (!), using a small enough precision (like two digits on the first row) has a visible impact on the simulation of a truncated normal. Moving to three digits seems to be sufficient in this example… One thing this tiny experiment reminds me of is the lumpability property of Kemeny and Snell, a restriction on Markov chains for aggregated (or discretised) versions to remain Markov or even ergodic. Also, in 2000, Laird Breyer, Gareth Roberts and Jeff Rosenthal wrote a Statistics and Probability Letters paper on the impact of round-off errors on geometric ergodicity. However, I presume [maybe foolishly!] that the result stated in the original paper, namely that there exists an infinite number of precision digits for which the dynamical system degenerates into a small region of the space, does not hold for MCMC. Maybe foolishly so, because the above statement means that running a dynamical system for “too” long given the chosen precision kills the intended stationary properties of the system. Which I interpret as getting non-ergodic behaviour when exceeding the period of the uniform generator. More or less.
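For completeness, here is a reconstruction of the experiment (my own sketch, not the original code): a slice sampler for a normal truncated to (0,1), with every intermediate computation rounded by signif() to a chosen number of digits.

```r
#slice sampler for N(0,1) truncated to (0,1), rounded to dig significant digits
slice=function(T=1e4,dig=2){
  x=rep(.5,T)
  for (t in 2:T){
    #vertical slice: u uniform under the unnormalised density exp(-x^2/2)
    u=signif(runif(1)*exp(-x[t-1]^2/2),dig=dig)
    #horizontal slice {x in (0,1): exp(-x^2/2) >= u}
    bound=signif(sqrt(-2*log(u)),dig=dig)
    x[t]=signif(runif(1)*min(1,bound),dig=dig)}
  x}
lowp=slice(dig=2) #coarse precision
high=slice(dig=7) #near full precision
```

Comparing the histograms of lowp and high against the truncated normal density reproduces the visible distortion at two digits mentioned above.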