## single variable transformation approach to MCMC

Posted in Books, Statistics, Travel with tags , , , , on September 9, 2014 by xi'an

I read the newly arXived paper “On Single Variable Transformation Approach to Markov Chain Monte Carlo” by Dey and Bhattacharya on the pleasant train ride between Bristol and Coventry last weekend. The paper actually follows several earlier papers by the authors that I have not read in detail. The notion of single variable transform is to add plus or minus the same random noise to all components of the current value of the Markov chain, instead of the standard d-dimensional random walk proposal of the reference Metropolis-Hastings algorithm, namely all proposals are of the form

$x_i'=x_i\pm \epsilon\ i=1,\cdots,d$

meaning the chain proceeds [after acceptance] along one and only one of the d diagonals. The authors’ arguments are that (a) the proposal is cheaper and (b) the acceptance rate is higher. What I find questionable in this argument is that this does not directly matter in the evaluation of the performances of the algorithm. For instance, higher acceptance in a Metropolis-Hasting algorithm does not imply faster convergence and smaller asymptotic variance. (This goes without mentioning the fact that the comparative Figure 1 is so variable with the dimension as to be of limited worth. Figure 1 and 2 are also found in an earlier arXived paper of the authors.) For instance, restricting the moves along the diagonals of the Euclidean space implies that there is a positive probability to make two successive proposals along the same diagonal, which is a waste of time. When considering the two-dimensional case, joining two arbitrary points using an everywhere positive density g upon ε means generating two successive values from g, which is equivalent cost-wise to generating a single noise from a two-dimensional proposal. Without the intermediate step of checking the one-dimensional move along one diagonal. So much for a gain. In fine, the proposal found in this paper sums up as being a one-at-a-time version of a standard random walk Metropolis-Hastings algorithm.

## MCMC on zero measure sets

Posted in R, Statistics with tags , , , , , , , on March 24, 2014 by xi'an

Simulating a bivariate normal under the constraint (or conditional to the fact) that x²-y²=1 (a non-linear zero measure curve in the 2-dimensional Euclidean space) is not that easy: if running a random walk along that curve (by running a random walk on y and deducing x as x²=y²+1 and accepting with a Metropolis-Hastings ratio based on the bivariate normal density), the outcome differs from the target predicted by a change of variable and the proper derivation of the conditional. The above graph resulting from the R code below illustrates the discrepancy!

targ=function(y){
exp(-y^2)/(1.52*sqrt(1+y^2))}

T=10^5
Eps=3
ys=xs=rep(runif(1),T)
xs[1]=sqrt(1+ys[1]^2)
for (t in 2:T){
propy=runif(1,-Eps,Eps)+ys[t-1]
propx=sqrt(1+propy^2)
ace=(runif(1)<(dnorm(propy)*dnorm(propx))/
(dnorm(ys[t-1])*dnorm(xs[t-1])))
if (ace){
ys[t]=propy;xs[t]=propx
}else{
ys[t]=ys[t-1];xs[t]=xs[t-1]}}


  ace=(runif(1)<(dnorm(propy)*dnorm(propx)/propx)/
(dnorm(ys[t-1])*dnorm(xs[t-1])/xs[t-1]))


the fit is there. My open question is how to make this derivation generic, i.e. without requiring the (dreaded) computation of the (dreadful) Jacobian.

## sampling from time-varying log-concave distributions

Posted in Statistics, University life with tags , , , , , on October 2, 2013 by xi'an

Sasha Rakhlin from Wharton sent me this paper he wrote (and arXived) with Hariharan Narayanan on a specific Markov chain algorithm that handles sequential Monte Carlo problems for log-concave targets. By relying on novel (by my standards) mathematical techniques, they manage to obtain geometric ergodicity results for random-walk based algorithms and log-concave targets. One of the new tools is the notion of self-concordant barrier, a sort of convex potential function F associated with a reference convex set and with Lipschitz properties. The second tool is a Gaussian distribution based on the metric induced by F. The third is the Dikin walk Markov chain, which uses this Gaussian as proposal and moves almost like the Metropolis-Hastings algorithm, except that it rejects with at least a probability of ½. The scale (or step size) of the Gaussian proposal is determined by the regularity of the log-concave target. In that setting, the total variation distance between the target at the t-th level and the distribution of the Markov chain can be fairly precisely approximated. Which leads in turn to a scaling of the number of random walk steps that are necessary to ensure convergence. Depending on the pace of the moving target, a single step of the random walk may be sufficient, which is quite an interesting feature.

## making a random walk geometrically ergodic

Posted in R, Statistics with tags , , , , , , , , , on March 2, 2013 by xi'an

While a random walk Metropolis-Hastings algorithm cannot be uniformly ergodic in a general setting (Mengersen and Tweedie, AoS, 1996), because it needs more energy to leave far away starting points, it can be geometrically ergodic depending on the target (and the proposal). In a recent Annals of Statistics paper, Leif Johnson and Charlie Geyer designed a trick to turn a random walk Metropolis-Hastings algorithm into a geometrically ergodic random walk Metropolis-Hastings algorithm by virtue of an isotropic transform (under the provision that the original target density has a moment generating function). This theoretical result is complemented by an R package called mcmc. (I have not tested it so far, having read the paper in the métro.) The examples included in the paper are however fairly academic and I wonder how the method performs in practice, on truly complex models, in particular because the change of variables relies on (a) an origin and (b) changing the curvature of space uniformly in all dimensions. Nonetheless, the idea is attractive and reminds me of a project of ours with Randal Douc,  started thanks to the ‘Og and still under completion.

## Random dive MH

Posted in R, Statistics with tags , , , , on September 2, 2010 by xi'an

A new Metropolis-Hastings algorithm that I would call “universal” was posted by Somak Dutta yesterday on arXiv. Multiplicative random walk Metropolis-Hastings on the real line contains a different Metropolis-Hastings algorithm called the random dive. The proposed new value x’ given the current value x is defined by

$\begin{cases} x\epsilon &\text{with probability } 1/2\\ x/\epsilon &\text{with probability } 1/2\end{cases}$

when $\epsilon$ is a random variable on $(-1,1)$. Thus, at each iteration, the current value is either shrunk or expanded by a random amount. When I read the paper in the Paris metro, while appreciating that this algorithm could be geometrically ergodic as opposed to the classical random walk algorithm, I was not convinced by the practical impact of the method, as I thought that the special role of zero in the proposal was not always reflected in the target. Especially when considering that the proposal is parameter-free. However, after running the following R program on a target centred away from zero, I found the proposal quite efficient.

f=function(x){.5*dnorm(x,mean=14)+.5*dnorm(x,mean=35)}
Nsim=10^5
x=rep(5,Nsim)
for (t in 2:Nsim){
coef=runif(1,min=-1)^sample(c(-1,1),1)
prop=x[t-1]*coef
prob=abs(coef)*f(prop)/f(x[t-1])
x[t]=x[t-1]
if (runif(1)<prob) x[t]=prop
}
hist(x,pro=T,nclass=113,col=”wheat2″)

Obviously, it is difficult to believe that this extension will keep working similarly well when the dimension increases but this is an interesting way of creating a heavy tail proposal.

## A repulsive random walk

Posted in R, Statistics with tags , , , , on May 28, 2010 by xi'an

Matt Asher posted an R experiment on R-bloggers yesterday simulating the random walk

$x_{t+1} = x_t + \varepsilon_t / x_t$

which has the property of avoiding zero by quickly switching to a large value as soon as $x_t$ is small. He was then wondering about the “convergence” of the random walk given that it moves very little once $x_t$ is large enough. The values he found for various horizons t seemed to indicate a stable regime.

I reran the same experiment as Matt in a Monte Carlo perspective, using the R program

resu=matrix(0,ncol=100,nrow=25)
sampl=rnorm(100)
for (i in 1:25){
for (t in 2^(i-1):2^i) sampl=sampl+rnorm(100)/sampl
resu[i,]=sampl
}
boxplot(as.data.frame(t(abs(resu))),name=as.character(1:25),col="wheat3")

The outcome of this R code plotted above shows that the range and the average of the 100 replications is increasing with t. This behaviour indicates a transient behaviour of the Markov chain, which almost surely goes to infinity and never comes back (because at infinity the variance is zero). Another indication for transience is shown by the fact that $x_t$ comes back to the interval (-1,1) with probability $\Phi(-|x_t|)$, a probability which goes to zero with $x_t$. As suggested to me by Randal Douc, this transience can be established rigorously by considering

$x_{t+1}^2 = x_t^2 + 2\epsilon_t + \epsilon_t^2/x_t^2 > x_t^2 + 2\epsilon_t>2\sum_{i=1}^t \epsilon_t$

which is thus bounded from below by a null recurrent process, which almost surely goes to infinity. Therefore the above Markov chain cannot have a stationary distribution or even a stationary measure: it almost surely goes to (plus or minus) infinity.

## Random mowing

Posted in Kids, Travel with tags , , , on April 27, 2010 by xi'an

My brother-in-law Christophe has a mowing robot (called Mara!) that is completely autonomous. It starts from its plug and goes along straight lines until it hits an obstacle or the boundaries defined by an underground cable. Then it turns at random and starts again. The concept is quite interesting but I am surprised that it operates on a purely local random walk principle, without any learning about the topology of the terrain. This means that zones with more obstacles are visited less often. (To compensate for that, the mower starts moving in circles each time it hits denser regions of grass.) And that the mower spends a lot of time looking for its plug. An algorithmic determination of the terrain would be possible, obviously, using for instance a Metropolis-Hastings scheme.