## scalable Langevin exact algorithm [Read Paper]

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , , , , , on June 23, 2020 by xi'an

Murray Pollock, Paul Fearnhead, Adam M. Johansen and Gareth O. Roberts (CoI: all with whom I have strong professional and personal connections!) have a Read Paper discussion happening tomorrow [under relaxed lockdown conditions in the UK, except for the absurd quatorzine on all travelers|, but still in a virtual format] that we discussed together [from our respective homes] at Paris Dauphine. And which I already discussed on this blog when it first came out.

Here are quotes I spotted during this virtual Dauphine discussion but we did not come up with enough material to build a significant discussion, although wondering at the potential for solving the O(n) bottleneck, handling doubly intractable cases like the Ising model. And noticing the nice features of the log target being estimable by unbiased estimators. And of using control variates, for once well-justified in a non-trivial environment.

“However, in practice this simple idea is unlikely to work. We can see this most clearly with the rejection sampler, as the probability of survival will decrease exponentially with t—and thus the rejection probability will often be prohibitively large.”

“This can be viewed as a rejection sampler to simulate from μ(x,t), the distribution of the Brownian motion at time  t conditional on its surviving to time t. Any realization that has been killed is ‘rejected’ and a realization that is not killed is a draw from μ(x,t). It is easy to construct an importance sampling version of this rejection sampler.”

## assessing MCMC convergence

Posted in Books, Statistics, University life with tags , , , , , , , , , , , on June 6, 2019 by xi'an

When MCMC became mainstream in the 1990’s, there was a flurry of proposals to check, assess, and even guarantee convergence to the stationary distribution, as discussed in our MCMC book. Along with Chantal Guihenneuc and Kerrie Mengersen, we also maintained for a while a reviewww webpage categorising theses. Niloy Biswas and Pierre Jacob have recently posted a paper where they propose the use of couplings (and unbiased MCMC) towards deriving bounds on different metrics between the target and the current distribution of the Markov chain. Two chains are created from a given kernel and coupled with a lag of L, meaning that after a while, the two chains become one with a time difference of L. (The supplementary material contains many details on how to induce coupling.) The distance to the target can then be bounded by a sum of distances between the two chains until they merge. The above picture from the paper is a comparison a Polya-Urn sampler with several HMC samplers for a logistic target (not involving the Pima Indian dataset!). The larger the lag L the more accurate the bound. But the larger the lag the more expensive the assessment of how many steps are needed to convergence. Especially when considering that the evaluation requires restarting the chains from scratch and rerunning until they couple again, rather than continuing one run which can only brings the chain closer to stationarity and to being distributed from the target. I thus wonder at the possibility of some Rao-Blackwellisation of the simulations used in this assessment (while realising once more than assessing convergence almost inevitably requires another order of magnitude than convergence itself!). Without a clear idea of how to do it… For instance, keeping the values of the chain(s) at the time of coupling is not directly helpful to create a sample from the target since they are not distributed from that target.

[Pierre also wrote a blog post about the paper on Statisfaction that is definitely much clearer and pedagogical than the above.]

## Le Monde puzzle [#1088]

Posted in Books, Kids, R with tags , , , , , , , , on March 29, 2019 by xi'an

A board (Ising!) Le Monde mathematical puzzle in the optimisation mode, again:

On a 7×7 board, what is the maximal number of locations that one can occupy when imposing at least two empty neighbours ?

Which I tried to solve by brute force and simulated annealing (what else?!), first defining a target

targ=function(tabz){
sum(tabz[-c(1,9),-c(1,9)]-1.2*(tabz[-c(1,9),-c(1,9)]*tabz[-c(8,9),-c(1,9)]
+tabz[-c(1,9),-c(1,9)]*tabz[-c(1,2),-c(1,9)]
+tabz[-c(1,9),-c(1,9)]*tabz[-c(1,9),-c(8,9)]
+tabz[-c(1,9),-c(1,9)]*tabz[-c(1,9),-c(1,2)]>2))}


on a 9×9 board where I penalise prohibited configuration by a factor 1.2 (a wee bit more than empty nodes). The perimeter of the 9×9 board is filled with ones and never actualised. (In the above convoluted products, the goal is to count how many neighbours of the entries equal to one are also equal to one. More than 2 is penalised.) The simulated annealing move is then updating the 9×9 grid gridz:

temp=1
maxarg=curarg=targ(gridz)
for (t in 1:1e3){
for (v in 1:1e4){
i=sample(2:8,1);j=sample(2:8,1)
newgrid=gridz;newgrid[i,j]=1-gridz[i,j]
newarg=targ(newgrid)
if (log(runif(1))<temp*(newarg-curarg)){
gridz=newgrid;curarg=newarg}}
temp=temp+.01}


and calls to the procedure always return 28 entries as the optimum, as in

     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    0    1    0    1    0    1
[2,]    0    1    1    0    1    1    0
[3,]    1    1    0    1    0    1    1
[4,]    0    0    1    0    1    0    0
[5,]    1    1    0    1    0    1    1
[6,]    0    1    1    0    1    1    0
[7,]    1    0    1    0    1    0    1


As it happens, I had misread the wording of the original puzzle, which considered a dynamic placement of the units on the board, one at a time with two free neighbours imposed.

## Le Monde puzzle [#1087]

Posted in Books, Kids, R, Statistics with tags , , , , , on February 25, 2019 by xi'an

A board-like Le Monde mathematical puzzle in the digit category:

Given a (k,m) binary matrix, what is the maximum number S of entries with only one neighbour equal to one? Solve for k=m=2,…,13, and k=6,m=8.

For instance, for k=m=2, the matrix

$\begin{matrix} 0 &0\\ 1 &1\\ \end{matrix}$

is producing the maximal number 4. I first attempted a brute force random filling of these matrices with only a few steps of explorations and got the numbers 4,8,16,34,44,57,… for the first cases. Since I was convinced that the square k² of a number k previously exhibited to reach its maximum as S=k² was again perfect in this regard, I then tried another approach based on Gibbs sampling and annealing (what else?):

gibzit=function(k,m,A=1e2,N=1e2){
temp=1 #temperature
board=sole=matrix(sample(c(0,1),(k+2)*(m+2),rep=TRUE),k+2,m+2)
board[1,]=board[k+2,]=board[,1]=board[,m+2]=0 #boundaries
maxol=counter(board,k,m) #how many one-neighbours?
for (t in 1:A){#annealing
for (r in 1:N){#basic gibbs steps
for (i in 2:(k+1))
for (j in 2:(m+1)){
prop=board
prop[i,j]=1-board[i,j]
u=runif(1)
if (log(u/(1-u))<temp*(counter(prop,k,m)-val)){
board=prop;val=counter(prop,k,m)
if (val>maxol){
maxol=val;sole=board}}
}}
temp=temp*2}
return(sole[-c(1,k+2),-c(1,m+2)])}


which leads systematically to the optimal solution, namely a perfect square k² when k is even and a perfect but one k²-1 when k is odd. When k=6, m=8, all entries can afford one neighbour exactly,

> gibzbbgiz(6,8)
[1] 48
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    0    0    1    1    0    0    1
[2,]    1    0    0    0    0    0    0    1
[3,]    0    0    1    0    0    1    0    0
[4,]    0    0    1    0    0    1    0    0
[5,]    1    0    0    0    0    0    0    1
[6,]    1    0    0    1    1    0    0    1


but this does not seem feasible when k=6, m=7, which only achieves 40 entries with one single neighbour.

## Bayesian inference with intractable normalizing functions

Posted in Books, Statistics with tags , , , , , , , , , , , on December 13, 2018 by xi'an

In the latest September issue of JASA I received a few days ago, I spotted a review paper by Jaewoo Park & Murali Haran on intractable normalising constants Z(θ). There have been many proposals for solving this problem as well as several surveys, some conferences and even a book. The current survey focus on MCMC solutions, from auxiliary variable approaches to likelihood approximation algorithms (albeit without ABC entries, even though the 2006 auxiliary variable solutions of Møller et al. et of Murray et al. do simulate pseudo-observations and hence…). This includes the MCMC approximations to auxiliary sampling proposed by Faming Liang and co-authors across several papers. And the paper Yves Atchadé, Nicolas Lartillot and I wrote ten years ago on an adaptive MCMC targeting Z(θ) and using stochastic approximation à la Wang-Landau. Park & Haran stress the relevance of using sufficient statistics in this approach towards fighting computational costs, which makes me wonder if an ABC version could be envisioned.  The paper also includes pseudo-marginal techniques like Russian Roulette (once spelled Roullette) and noisy MCMC as proposed in Alquier et al.  (2016). These methods are compared on three examples: (1) the Ising model, (2) a social network model, the Florentine business dataset used in our original paper, and a larger one where most methods prove too costly, and (3) an attraction-repulsion point process model. In conclusion, an interesting survey, taking care to spell out the calibration requirements and the theoretical validation, if of course depending on the chosen benchmarks.