## Archive for auxiliary variables

Posted in Books, Statistics, University life on October 27, 2016 by xi'an

In the March 2016 issue of JASA that currently sits on my desk, there is a paper by Liang, Jin, Song and Liu on the adaptive exchange algorithm, which aims at handling posteriors for sampling distributions with intractable normalising constants. The concept behind the algorithm is the exchange principle initiated by Jesper Møller and co-authors in 2006, where an auxiliary pseudo-observation is simulated for the missing constants to vanish in a Metropolis-Hastings ratio. (The name exchange was introduced in a subsequent paper by Iain Murray, Zoubin Ghahramani and David MacKay, also in 2006.)

The crux of the method is to run an iteration as [where y denotes the observation]

1. Propose a new value θ’ of the parameter from a proposal q(θ’|θ);
2. Generate a pseudo-observation z~ƒ(z|θ’);
3. Accept with probability

$\dfrac{\pi(\theta')f(y|\theta')}{\pi(\theta)f(y|\theta)}\dfrac{q(\theta|\theta')f(z|\theta)}{q(\theta'|\theta)f(z|\theta')}$

which has the appeal of cancelling all normalising constants, and the drawback of requiring an exact simulation from the very distribution with the missing constant, ƒ(.|θ). This means that in practice a finite number of MCMC steps will be used and will bias the outcome. The algorithm is unusual in that it replaces the exact proposal q(θ’|θ) with an unbiased random version q(θ’|θ)ƒ(z|θ’), z being just an augmentation of the proposal. (The current JASA paper by Liang et al. seems to confuse augment and argument, see p.378.)
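As a minimal sketch of the three steps above, consider a toy case where exact simulation is trivial: exponential observations with rate θ, pretending that the normalising constant θⁿ is unknown. The Gamma(1,1) prior, the Gaussian random-walk proposal and all function names (`log_g`, `log_prior`, `exchange_sampler`) are assumptions made for illustration, not taken from the paper.

```python
import math
import random


def log_g(data_sum, theta):
    """Unnormalised log-likelihood: f(y|theta) is proportional to exp(-theta * sum(y)).

    The normalising constant theta**n is deliberately never used.
    """
    return -theta * data_sum


def log_prior(theta, a=1.0, b=1.0):
    """Log-density of an (assumed) Gamma(a, b) prior, up to a constant."""
    return (a - 1.0) * math.log(theta) - b * theta


def exchange_sampler(y, n_iter=20000, step=0.5, theta0=1.0, seed=1):
    rng = random.Random(seed)
    n, s_y = len(y), sum(y)
    theta, chain = theta0, []
    for _ in range(n_iter):
        # Step 1: symmetric random-walk proposal (an assumption of this sketch)
        prop = theta + rng.gauss(0.0, step)
        if prop > 0.0:  # proposals outside the prior support are rejected
            # Step 2: exact pseudo-observation z ~ f(z | theta')
            s_z = sum(rng.expovariate(prop) for _ in range(n))
            # Step 3: exchange ratio, built from unnormalised terms only
            log_r = (log_prior(prop) - log_prior(theta)
                     + log_g(s_y, prop) - log_g(s_y, theta)
                     + log_g(s_z, theta) - log_g(s_z, prop))
            if math.log(1.0 - rng.random()) < log_r:
                theta = prop
        chain.append(theta)
    return chain


y = [0.5] * 20  # made-up data: n = 20, sum(y) = 10
chain = exchange_sampler(y)
kept = chain[2000:]  # discard a burn-in period
print(sum(kept) / len(kept), 21 / 11)  # chain average vs. exact posterior mean
```

Since the random walk is symmetric, the q terms cancel in the ratio. With a Gamma(a,b) prior the exact posterior is Gamma(a+n, b+Σy), which gives a simple check that the chain targets the right distribution without ever evaluating θⁿ.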

To avoid the difficulty in simulating from ƒ(.|θ), the authors draw pseudo-observations from sampling distributions with a finite number m of parameter values under the [unrealistic] assumption (A⁰) that this collection of values provides an almost complete cover of the posterior support. One of the tricks relies on an auxiliary [time-heterogeneous] chain of pseudo-observations generated by single Metropolis steps from one of these m fixed targets. These pseudo-observations are then used in the main (or target) chain to define the above exchange probability. The auxiliary chain is Markov but time-heterogeneous since the probabilities of accepting a move are evolving with time according to a simulated annealing schedule. This produces a convergent estimate of the m normalising constants. The main chain is not Markov in that it depends on the whole history of the auxiliary chain [see Step 5, p.380]. Even jointly the collection of both chains is not Markov. The paper prefers to consider the process as an adaptive Markov chain. I did not check the rather intricate details, so cannot judge the validity of the overall algorithm; I simply note that one condition (A², p.383) is incredibly strong in that it assumes the Markov transition kernel to be Doeblin uniformly on any compact set of the calibration parameters.

However, the major difficulty with this approach seems to be in its delicate calibration. From providing a reference set of m parameter values scanning the posterior support to picking transition kernels on both the parameter and the sample spaces, to properly cooling the annealing schedule [always a fun part!], there seems to be [from my armchair expert’s perspective, of course!] a wide range of opportunities for missing the target or running into zero acceptance problems.

Both examples analysed in the paper, the auto-logistic and the auto-normal models, are actually of limited complexity in that they depend on a few parameters, 2 and 4 resp., and enjoy sufficient statistics, of dimensions 2 and 4 as well. Hence simulating (pseudo-)realisations of those sufficient statistics should be less challenging than the original approach replicating an entire vector of thousands of dimensions.
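To illustrate the point about sufficient statistics, here is a small sketch assuming the usual two-parameter auto-logistic form on a rectangular grid, with spins in {-1, 1}, first-order (four-neighbour) interactions and free boundaries, so that the unnormalised density is exp(α Σxᵢ + β Σxᵢxⱼ) over neighbouring pairs. The exact parameterisation used in the paper may differ, and the function names are hypothetical.

```python
def autologistic_stats(x):
    """Return (sum of spins, sum of products over neighbouring pairs) for a grid x."""
    rows, cols = len(x), len(x[0])
    s1 = sum(sum(row) for row in x)
    s2 = 0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:          # vertical neighbour, counted once
                s2 += x[i][j] * x[i + 1][j]
            if j + 1 < cols:          # horizontal neighbour, counted once
                s2 += x[i][j] * x[i][j + 1]
    return s1, s2


def log_unnormalised(x, alpha, beta):
    """Log of the unnormalised auto-logistic density; depends on x only via the statistics."""
    s1, s2 = autologistic_stats(x)
    return alpha * s1 + beta * s2


grid = [[1, 1, -1],
        [1, -1, -1],
        [1, 1, 1]]
print(autologistic_stats(grid))
```

Whatever the size of the grid, every term in the exchange ratio only involves these two numbers, for the observation as well as for the pseudo-observation.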

## common derivation for Metropolis–Hastings and other MCMC algorithms

Posted in Books, pictures, Statistics, Travel, University life on July 25, 2016 by xi'an

Khoa Tran and Robert Kohn from UNSW just arXived a paper on a comprehensive derivation of a large range of MCMC algorithms, beyond Metropolis-Hastings. The idea is to decompose the MCMC move into

1. a random completion of the current value θ into V;
2. a deterministic move T from (θ,V) to (ξ,W), where only ξ matters.

If this sounds like a new version of Peter Green’s completion at the core of his 1995 RJMCMC algorithm, it is because it is indeed essentially the same notion. The resort to this completion allows for a standard form of the Metropolis-Hastings algorithm, which leads to the correct stationary distribution if T is self-inverse. This representation covers Metropolis-Hastings algorithms, Gibbs sampling, Metropolis-within-Gibbs and auxiliary variables methods, slice sampling, recursive proposals, directional sampling, Langevin and Hamiltonian Monte Carlo, NUTS sampling, pseudo-marginal Metropolis-Hastings algorithms, and pseudo-marginal Hamiltonian  Monte Carlo, as discussed by the authors. Given this representation of the Markov chain through a random transform, I wonder if Peter Glynn’s trick mentioned in the previous post on retrospective Monte Carlo applies in this generic setting (as it could considerably improve convergence…)
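A minimal sketch of this two-step decomposition, under simplifying assumptions of mine rather than in the authors' notation: the completion V is drawn from a conditional density q(v|θ), the deterministic map T is a volume-preserving involution (so no Jacobian term appears), and the move is accepted with the ratio π(ξ)q(W|ξ)/π(θ)q(V|θ). Taking V as a Gaussian perturbation of θ and T as the swap (θ,V) → (V,θ) recovers random-walk Metropolis. All names below are hypothetical.

```python
import math
import random


def involutive_mh(log_target, sample_aux, log_aux, involution,
                  theta0, n_iter=20000, seed=1):
    rng = random.Random(seed)
    theta, chain = theta0, []
    for _ in range(n_iter):
        v = sample_aux(rng, theta)        # 1. random completion of theta into V
        xi, w = involution(theta, v)      # 2. deterministic self-inverse move T
        # Acceptance ratio on the completed space (Jacobian assumed equal to one)
        log_r = (log_target(xi) + log_aux(w, xi)
                 - log_target(theta) - log_aux(v, theta))
        if math.log(1.0 - rng.random()) < log_r:
            theta = xi                    # only xi matters, W is discarded
        chain.append(theta)
    return chain


STEP = 2.4  # random-walk scale, an arbitrary choice for this example


def log_target(t):
    return -0.5 * t * t                   # standard normal target, up to a constant


def sample_aux(rng, t):
    return rng.gauss(t, STEP)             # V | theta ~ N(theta, STEP^2)


def log_aux(v, t):
    return -0.5 * ((v - t) / STEP) ** 2   # log q(v | theta), up to a constant


def swap(t, v):
    return v, t                           # T(theta, V) = (V, theta), so T(T(.)) is the identity


chain = involutive_mh(log_target, sample_aux, log_aux, swap, theta0=0.0)
kept = chain[1000:]
mean = sum(kept) / len(kept)
var = sum((t - mean) ** 2 for t in kept) / len(kept)
print(mean, var)
```

Other algorithms in the list correspond to other choices of completion and involution, for instance a momentum variable with a leapfrog map followed by a momentum flip in the Hamiltonian case; a map that is not volume-preserving would require the Jacobian in the ratio.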

## recent advances in Monte Carlo Methods

Posted in R, Statistics, Travel, University life on February 8, 2012 by xi'an

Next Thursday (Feb. 16), at the RSS, there will be a special half-day meeting (afternoon, starting at 13:30) on Recent Advances in Monte Carlo Methods organised by the General Application Section. The speakers are

• Richard Everitt, University of Oxford, Missing data, and what to do about it
• Anthony Lee, Warwick University, Auxiliary variables and many-core computation
• Nicolas Kantas, Imperial College London, Particle methods for computing optimal control inputs
• Nick Whiteley, Bristol University, Stability properties of some particle filters
• Simon Maskell, QinetiQ & Imperial College London, Using a Probabilistic Hypothesis Density filter to confirm tracks in a multi-target environment

(Note this is not a Read Paper meeting, so there is neither paper nor discussion!)

## ABC and Monte Carlo seminar in CREST

Posted in Statistics, University life on January 13, 2012 by xi'an

On Monday (Jan. 16, 3pm, CREST-ENSAE, Room S08), Nicolas Chopin will present a talk on:

Dealing with intractability: recent advances in Bayesian Monte-Carlo methods for intractable likelihoods
(joint works with P. Jacob, O. Papaspiliopoulos and S. Barthelmé)

This talk will start with a review of recent advancements in Monte Carlo methodology for intractable problems; that is problems involving intractable quantities, typically intractable likelihoods. I will discuss in turn ABC type methods (a.k.a. likelihood-free), auxiliary variable methods for dealing with intractable normalising constants (e.g. the exchange algorithm), and MC² type of algorithms, a recent extension of which is the PMCMC algorithm (Andrieu et al., 2010). Then, I will present two recent pieces of work in these directions. First, and more briefly, I’ll present the ABC-EP algorithm (Chopin and Barthelmé, 2011). I’ll also discuss some possible future research in ABC theory. Second, I’ll discuss the SMC² algorithm (Chopin, Jacob and Papaspiliopoulos, 2011), a new type of MC² algorithm that makes it possible to perform sequential analysis for virtually any state-space model, including models with an intractable Markov transition.

## advanced Markov chain Monte Carlo methods

Posted in Books, Statistics, University life on December 5, 2011 by xi'an

This book, Advanced Markov Chain Monte Carlo Methods: Learning from Past Samples, by Faming Liang, Chuanhai Liu, and Raymond Carroll, appeared last year and has been sitting on my desk all this time, patiently (?) waiting for a review. When I received it, I took a brief look at it (further than the cool cover!) and then decided I needed more than that to write a useful review! Here are my impressions  on Advanced Markov Chain Monte Carlo Methods after a deeper read. (I have not read any other review in the main statistical journals so far.)

The title, Advanced Markov Chain Monte Carlo Methods, is a clear warning on the level of the book: “advanced”, it certainly is!!! By page 85, the general description of MCMC simulation methods is completed, including perfect sampling and reversible jump MCMC, and the authors engage in a detailed description of highly specialised topics of their choice: Auxiliary variables (Chap. 4), Population-based MCMC (Chap. 5), Dynamic weighting (Chap. 6), Stochastic approximation Monte Carlo (Chap. 7), and MCMC with adaptive proposals (Chap. 8). The book is clearly inspired by the numerous papers the authors have written in those areas, especially Faming Liang. (The uneven distribution of the number of citations per year with peaks in 2000 and 2009 reflects this strong connection.) While the book attempts to broaden the spectrum by including introductory sections and discussing other papers, this centred focus nonetheless reduces its potential readership to graduate students and researchers who could directly work on the original papers. I would thus hesitate to teach my graduate students from this book, given that they only attend a single course on Monte Carlo methods.

## Xiao-Li Meng’s inception [in Paris]

Posted in Statistics, University life on July 27, 2011 by xi'an

Xiao-Li Meng will give a talk in Paris next September 1st, so I advertise it now, before my Parisian readers leave the city for their August retreat. Here is the abstract, explaining the above title:

Statistical Inception for the MCMC Dream: The kick is in the residual (augmentation)!

Xiao-Li Meng

Department of Statistics, Harvard University

The development of MCMC algorithms via data augmentation (DA) or equivalently auxiliary variables has some resemblance to the theme plot of the recent Hollywood hit Inception. We MCMC designers all share essentially the same “3S” dream, that is, to create algorithms that are simple, stable, and speedy. Within that grand dream, however, we have created a rather complex web of tools, with some of them producing very similar algorithms but for unclear reasons, or others that were thought to be of different origins but actually are layered when viewed from a suitable distance. These include conditional augmentation, marginal augmentation, PX-DA, partially non-centering parameterization, sandwiched algorithms, interweaving strategies, ASIS, etc. It turns out that there is a simple statistical insight that can unify essentially all these methods conceptually, and it also provides practical guidelines for their DA constructions. It is the simple concept of regression residuals, which are constructed to be orthogonal to the regression functions. All these methods in one form or another effectively build a residual augmentation. Given a DA distribution f(T, A), where T is our targeted variable (i.e., f(T) is our targeted distribution) and A is the augmented variable, there are two broad classes of residuals depending on whether we regress T on A or A on T. In this talk we will demonstrate how methods like conditional augmentation and partially non-centering parameterization build their residual augmentations by regressing A on T, whereas methods such as marginal augmentation and ASIS effectively use residual augmentations from regressing T on A. For either class, the attempted orthogonality helps to reduce the dependence among MCMC draws, and when the orthogonality leads to true independence as occurring in some special cases, we reach the dream of producing i.i.d. draws. 
(The talk is based on an upcoming discussion article, especially its rejoinder, Yu and Meng (2011, JCGS) )

The talk will take place at Institut Henri Poincaré, Thursday Sept. 1, at 15:00, as part of the Big’MC seminars.

## València 9 snapshot [4]

Posted in Statistics, University life on June 8, 2010 by xi'an

This one-before-last day at València 9 was fairly busy and I skipped the [tantalising] trip back to Sella to attend morning and afternoon talks. The first session involved Nicolas Chopin and Pierre Jacob’s free-energy paper whose earlier version I had heard at CREST, which builds on the earlier paper of Nicolas with Tony Lelièvre and Gabriel Stoltz to build a sequential Monte Carlo sampler that is biased along a preferential direction in order to fight multimodality and label switching in the case of mixtures. Peter Green rightly pointed out the difficulty in building this direction, which appears like a principal component to me, but this may open a new direction for research on a potentially adaptive direction updated with the SMC sampler… Although I always have trouble understanding the gist of causal models, Thomas Richardson’s talk about transparent parameterisation was quite interesting  in its links both with contingency tables and with identifiability issues (should Bayesians care about identifiability?! I did not really understand why the data could help in specifying the unidentified parameter in an empirical Bayes manner, though).

The morning talk by Darren Wilkinson was a particularly enticing talk in that Darren presented in a very articulate manner the specifics of analysing stochastic kinetic models for bacterial regulation and that he also introduced a likelihood-free MCMC that was not ABC-MCMC. (At first sight, it sounds like the auxiliary variable technique of Møller, Pettitt, Reeves and Berthelsen, but I want to read the paper to understand better the differences.) Despite the appalling audio and video rendering in the conference room, the filmed discussion by Samuel Kou got into a comparison with ABC. The afternoon non-parametric session left me a bit confused as to the infinite regress on Dirichlet process expansions, but I enjoyed the next talk by Geoff Nicholls on partial ordering inference immensely, even though I missed the bishop example at the beginning because the talks got shifted due to the absence of the first speaker of the session. During the poster session (where again I only saw a fourth of the material!), I had the pleasant surprise to meet a student from the University of Canterbury, Christchurch, who took my Bayesian Core class when I visited in 2006.