## informed proposals for local MCMC in discrete spaces

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , on April 17, 2020 by xi'an

Last year Giacomo Zanella published a paper entitled informed proposals for local MCMC in discrete spaces in JASA. Which I had missed somehow and only discovered through another paper, and which we recently discussed at Paris-Dauphine with graduate students, marooned by COVID-19 . Probability targets in discrete spaces are intrinsically hard[er] to simulate in my opinion if only because there is no natural distance, hence no natural neighbourhood. A random walk proposal like the reference kernel in the paper is not directly calibrated. Without demarginalisation there is neither a clear version of calculus for implementing MALA or HMC. What indeed is HMC on a discrete space? If this requires “embedding the binary space in a continuous space”, it does not sound very enticing if the construct is context dependent.

“This would allow for more moves to be accepted and longer moves to be performed, thus improving the algorithm’s efficiency.”

A interesting aspect of the paper is that for near atomic transition kernels K, informally for small σ’s, the proposal switch to Q finds target x normalising constant as new stationary and close to the actual target. Which incidentally reminded me of our vanilla Rao-Blackwellisation with Randal Douc. This however begets the worry that it may prove unwieldy in continuous cases, as except for Gaussian kernels, the  proposal switch to Q may prove intractable and requires further MCMC steps, in a form of infinite regress. Plus a musing that, were the original kernel K to be replaced with the new Q, another informed proposal transform could be applied to Q. Further infinite regress…

“[The optimality of the Metropolis-Hastings choice of acceptance probability] does not translate to the context of balancing functions.”

The paper indeed exhibits a setting that is rehabilitating Barker’ (1965) version of the acceptance probability, but I never  was very much convinced there was a significant difference in using one or the other. During our virtual (?) discussion, we also wondered at the adaptive abilities of the approach, e.g., selecting among a finite family of g’s (according to which criterion) or parameterising g towards an optimal choice of its parameter. And at the capacity for Rao-Blackwellisation since the proposal have to consider the entire set of neighbours prior to moving to a likely one.

## generalised Poisson difference autoregressive processes

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , on February 14, 2020 by xi'an

Yesterday, Giulia Carallo arXived the paper on generalised Poisson difference autoregressive processes that is a component of her Ph.D. thesis at Ca’ Foscari Universita di Venezia and to which I contributed while visiting Venezia last Spring. The stochastic process under study is integer valued as a difference of two generalised Poisson variates, made dependent by an INGARCH process that expresses the mean as a regression over past values of the process and past means. Which can be easily simulated as a difference of (correlated) Poisson variates. These two variates can in their turn be (re)defined through a thinning operator that I find most compelling, namely as a sum of Poisson variates with a number of terms being a (quasi-) Binomial variate depending on the previous value. This representation proves useful in establishing stationarity conditions on the process. Beyond establishing various properties of the process, the paper also examines how to conduct Bayesian inference in this context, with specialised Gibbs samplers in action. And comparing models on real datasets via Geyer‘s (1994) logistic approximation to Bayes factors.

## ABC with Gibbs steps

Posted in Statistics with tags , , , , , , , , , , , , , , , , , on June 3, 2019 by xi'an

With Grégoire Clarté, Robin Ryder and Julien Stoehr, all from Paris-Dauphine, we have just arXived a paper on the specifics of ABC-Gibbs, which is a version of ABC where the generic ABC accept-reject step is replaced by a sequence of n conditional ABC accept-reject steps, each aiming at an ABC version of a conditional distribution extracted from the joint and intractable target. Hence an ABC version of the standard Gibbs sampler. What makes it so special is that each conditional can (and should) be conditioning on a different statistic in order to decrease the dimension of this statistic, ideally down to the dimension of the corresponding component of the parameter. This successfully bypasses the curse of dimensionality but immediately meets with two difficulties. The first one is that the resulting sequence of conditionals is not coherent, since it is not a Gibbs sampler on the ABC target. The conditionals are thus incompatible and therefore convergence of the associated Markov chain becomes an issue. We produce sufficient conditions for the Gibbs sampler to converge to a stationary distribution using incompatible conditionals. The second problem is then that, provided it exists, the limiting and also intractable distribution does not enjoy a Bayesian interpretation, hence may fail to be justified from an inferential viewpoint. We however succeed in producing a version of ABC-Gibbs in a hierarchical model where the limiting distribution can be explicited and even better can be weighted towards recovering the original target. (At least with limiting zero tolerance.)

## Gibbs for incompatible kids

Posted in Books, Statistics, University life with tags , , , , , , , , , , on September 27, 2018 by xi'an

In continuation of my earlier post on Bayesian GANs, which resort to strongly incompatible conditionals, I read a 2015 paper of Chen and Ip that I had missed. (Published in the Journal of Statistical Computation and Simulation which I first confused with JCGS and which I do not know at all. Actually, when looking at its editorial board,  I recognised only one name.) But the study therein is quite disappointing and not helping as it considers Markov chains on finite state spaces, meaning that the transition distributions are matrices, meaning also that convergence is ensured if these matrices have no null probability term. And while the paper is motivated by realistic situations where incompatible conditionals can reasonably appear, the paper only produces illustrations on two and three states Markov chains. Not that helpful, in the end… The game is still afoot!

## flea circus

Posted in Books, Kids, pictures, R, Statistics with tags , , , , , , , , , , , on December 8, 2016 by xi'an

An old riddle found on X validated asking for Monte Carlo resolution  but originally given on Project Euler:

A 30×30 grid of squares contains 30² fleas, initially one flea per square. When a bell is rung, each flea jumps to an adjacent square at random. What is the expected number of unoccupied squares after 50 bell rings, up to six decimal places?

The debate on X validated is whether or not a Monte Carlo resolution is feasible. Up to six decimals, certainly not. But with some lower precision, certainly. Here is a rather basic R code where the 50 steps are operated on the 900 squares, rather than the 900 fleas. This saves some time by avoiding empty squares.

```xprmt=function(n=10,T=50){

mean=0
for (t in 1:n){

board=rep(1,900)
for (v in 1:T){

beard=rep(0,900)
if (board[1]>0){
poz=c(0,1,0,30)
ne=rmultinom(1,board[1],prob=(poz!=0))
beard[1+poz]=beard[1+poz]+ne}
#
for (i in (2:899)[board[-1][-899]>0]){
u=(i-1)%%30+1;v=(i-1)%/%30+1
poz=c(-(u>1),(u<30),-30*(v>1),30*(v<30))
ne=rmultinom(1,board[i],prob=(poz!=0))
beard[i+poz]=beard[i+poz]+ne}
#
if (board[900]>0){
poz=c(-1,0,-30,0)
ne=rmultinom(1,board[900],prob=(poz!=0))
beard[900+poz]=beard[900+poz]+ne}
board=beard}
mean=mean+sum(board==0)}
return(mean/n)}
```

The function returns an empirical average over n replications. With a presumably awkward approach to the borderline squares, since it involves adding zeros to keep the structure the same… Nonetheless, it produces an approximation that is rather close to the approximate expected value, in about 3mn on my laptop.

```> exprmt(n=1e3)
[1] 331.082
> 900/exp(1)
[1] 331.0915
```

Further gains follow from considering only half of the squares, as there are two independent processes acting in parallel. I looked at an alternative and much faster approach using the stationary distribution, with the stationary being the Multinomial (450,(2/1740,3/1740…,4/1740,…,2/1740)) with probabilities proportional to 2 in the corner, 3 on the sides, and 4 in the inside. (The process, strictly speaking, has no stationary distribution, since it is periodic. But one can consider instead the subprocess indexed by even times.) This seems to be the case, though, when looking at the occupancy frequencies, after defining the stationary as:

```inva=function(B=30){
return(c(2,rep(3,B-2),2,rep(c(3,
rep(4,B-2),3),B-2),2,rep(3,B-2),2))}
```

namely

```> mn=0;n=1e8 #14 clock hours!
> proz=rep(c(rep(c(0,1),15),rep(c(1,0),15)),15)*inva(30)
> for (t in 1:n)
+ mn=mn+table(rmultinom(1,450,prob=rep(1,450)))[1:4]
> mn=mn/n
> mn[1]=mn[1]-450
> mn
0      1      2     3
166.11 164.92  82.56 27.71
> exprmt(n=1e6) #55 clock hours!!
[1] 165.36 165.69 82.92 27.57```

my original confusion being that the Poisson approximation had not yet taken over… (Of course, computing the first frequency for the stationary distribution does not require any simulation, since it is the sum of the complement probabilities to the power 450, i.e., 166.1069.)