## informed proposals for local MCMC in discrete spaces

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , on April 17, 2020 by xi'an

Last year Giacomo Zanella published a paper entitled informed proposals for local MCMC in discrete spaces in JASA. Which I had missed somehow and only discovered through another paper, and which we recently discussed at Paris-Dauphine with graduate students, marooned by COVID-19 . Probability targets in discrete spaces are intrinsically hard[er] to simulate in my opinion if only because there is no natural distance, hence no natural neighbourhood. A random walk proposal like the reference kernel in the paper is not directly calibrated. Without demarginalisation there is neither a clear version of calculus for implementing MALA or HMC. What indeed is HMC on a discrete space? If this requires “embedding the binary space in a continuous space”, it does not sound very enticing if the construct is context dependent.

“This would allow for more moves to be accepted and longer moves to be performed, thus improving the algorithm’s efficiency.”

A interesting aspect of the paper is that for near atomic transition kernels K, informally for small σ’s, the proposal switch to Q finds target x normalising constant as new stationary and close to the actual target. Which incidentally reminded me of our vanilla Rao-Blackwellisation with Randal Douc. This however begets the worry that it may prove unwieldy in continuous cases, as except for Gaussian kernels, the  proposal switch to Q may prove intractable and requires further MCMC steps, in a form of infinite regress. Plus a musing that, were the original kernel K to be replaced with the new Q, another informed proposal transform could be applied to Q. Further infinite regress…

“[The optimality of the Metropolis-Hastings choice of acceptance probability] does not translate to the context of balancing functions.”

The paper indeed exhibits a setting that is rehabilitating Barker’ (1965) version of the acceptance probability, but I never  was very much convinced there was a significant difference in using one or the other. During our virtual (?) discussion, we also wondered at the adaptive abilities of the approach, e.g., selecting among a finite family of g’s (according to which criterion) or parameterising g towards an optimal choice of its parameter. And at the capacity for Rao-Blackwellisation since the proposal have to consider the entire set of neighbours prior to moving to a likely one.

## optimal choice among MCMC kernels

Posted in Statistics with tags , , , , , , , , , , on March 14, 2019 by xi'an

Last week in Siem Reap, Florian Maire [who I discovered originates from a Norman town less than 10km from my hometown!] presented an arXived joint work with Pierre Vandekerkhove at the Data Science & Finance conference in Cambodia that considers the following problem: Given a large collection of MCMC kernels, how to pick the best one and how to define what best means. Going by mixtures is a default exploration of the collection, as shown in (Tierney) 1994 for instance since this improves on both kernels (esp. when each kernel is not irreducible on its own!). This paper considers a move to local weights in the mixture, weights that are not estimated from earlier simulations, contrary to what I first understood.

As made clearer in the paper the focus is on filamentary distributions that are concentrated nearby lower-dimension sets or manifolds Since then the components of the kernel collections can be restricted to directions of these manifolds… Including an interesting case of a 2-D highly peaked target where converging means mostly simulating in x¹ and covering the target means mostly simulating in x². Exhibiting a schizophrenic tension between the two goals. Weight locally dependent means correction by Metropolis step, with cost O(n). What of Rao-Blackwellisation of these mixture weights, from weight x transition to full mixture, as in our PMC paper? Unclear to me as well [during the talk] is the use in the mixture of basic Metropolis kernels, which are not absolutely continuous, because of the Dirac mass component. But this is clarified by Section 5 in the paper. A surprising result from the paper (Corollary 1) is that the use of local weights ω(i,x) that depend on the current value of the chain does jeopardize the stationary measure π(.) of the mixture chain. Which may be due to the fact that all components of the mixture are already π-invariant. Or that the index of the kernel constitutes an auxiliary (if ancillary)  variate. (Algorithm 1 in the paper reminds me of delayed acceptance. Making me wonder if computing time should be accounted for.) A final question I briefly discussed with Florian is the extension to weights that are automatically constructed from the simulations and the target.

## A chance non-typo

Posted in Books, Statistics, University life with tags , , on March 18, 2011 by xi'an

A few days ago, my colleague Medhi Dafal in the admin branch of the university came across Monte Carlo Statistical Methods on another colleague’s desk and took it with him. Yesterday morning, when I came to his office to burden him with another admin nightmare, he opened the book [at random] and pointed out a formula pretending he had trouble with it! This was the residual distribution in the Markov chapter

$\tilde K(x,\cdot)=\dfrac{K(x,\cdot) - \epsilon\mathbb{I}_C(x)\nu(\cdot)}{1-\epsilon\mathbb{I}_C(x)}$

When I first saw it, I though this was indeed a typo because of the indicator in the denominator. Now that I look at it more carefully, I realise this is correct, because C is a small set, hence the residual kernel is the standard kernel outside C. Still, for a few minutes, I had this weird impression he had found a new typo completely at random! Which would have been a significant item of information about the frequency of typos in this book… Phew!