## recent advances in Monte Carlo Methods

Next Thursday (Feb. 16), at the RSS, there will be a special half-day meeting (afternoon, starting at 13:30) on Recent Advances in Monte Carlo Methods organised by the General Application Section. The speakers are

• Richard Everitt, University of Oxford, Missing data, and what to do about it
• Anthony Lee, Warwick University, Auxiliary variables and many-core computation
• Nicolas Kantas, Imperial College London, Particle methods for computing optimal control inputs
• Nick Whiteley, Bristol University, Stability properties of some particle filters
• Simon Maskell, QinetiQ & Imperial College London, Using a Probabilistic Hypothesis Density filter to confirm tracks in a multi-target environment

(Note this is not a Read Paper meeting, so there is no paper nor discussion!)

## ABC and Monte Carlo seminar in CREST

On Monday (Jan. 16, 3pm, CREST-ENSAE, Room S08), Nicolas Chopin will present a talk on:

Dealing with intractability: recent advances in Bayesian Monte-Carlo methods for intractable likelihoods
(joint works with P. Jacob, O. Papaspiliopoulos and S. Barthelmé)

This talk will start with a review of recent advancements in Monte Carlo methodology for intractable problems; that is, problems involving intractable quantities, typically intractable likelihoods. I will discuss in turn ABC-type methods (a.k.a. likelihood-free), auxiliary variable methods for dealing with intractable normalising constants (e.g. the exchange algorithm), and MC²-type algorithms, a recent extension of which is the PMCMC algorithm (Andrieu et al., 2010). Then, I will present two recent pieces of work in this direction. First, and more briefly, I’ll present the ABC-EP algorithm (Chopin and Barthelmé, 2011). I’ll also discuss some possible future research in ABC theory. Second, I’ll discuss the SMC² algorithm (Chopin, Jacob and Papaspiliopoulos, 2011), a new type of MC² algorithm that makes it possible to perform sequential analysis for virtually any state-space model, including models with an intractable Markov transition.

## advanced Markov chain Monte Carlo methods

This book, Advanced Markov Chain Monte Carlo Methods: Learning from Past Samples, by Faming Liang, Chuanhai Liu, and Raymond Carroll, appeared last year and has been sitting on my desk all this time, patiently (?) waiting for a review. When I received it, I took a brief look at it (further than the cool cover!) and then decided I needed more than that to write a useful review! Here are my impressions of Advanced Markov Chain Monte Carlo Methods after a deeper read. (I have not read any other review in the main statistical journals so far.)

The title, Advanced Markov Chain Monte Carlo Methods, is a clear warning on the level of the book: “advanced”, it certainly is!!! By page 85, the general description of MCMC simulation methods is completed, including perfect sampling and reversible jump MCMC, and the authors engage in a detailed description of highly specialised topics of their choice: Auxiliary variables (Chap. 4), Population-based MCMC (Chap. 5), Dynamic weighting (Chap. 6), Stochastic approximation Monte Carlo (Chap. 7), and MCMC with adaptive proposals (Chap. 8). The book is clearly inspired by the numerous papers the authors have written in those areas, especially Faming Liang. (The uneven distribution of the number of citations per year, with peaks in 2000 and 2009, reflects this strong connection.) While the book attempts to broaden the spectrum by including introductory sections and discussing other papers, the fact remains that this narrow focus reduces its potential readership to graduate students and researchers who could work directly on the original papers. I would thus hesitate to teach my graduate students from this book, given that they only attend a single course on Monte Carlo methods.

## Xiao-Li Meng’s inception [in Paris]

Xiao-Li Meng will give a talk in Paris next September 1st, so I advertise it now, before my Parisian readers leave the city for their August retreat. Here is the abstract, explaining the above title:

Statistical Inception for the MCMC Dream: The kick is in the residual (augmentation)!

Xiao-Li Meng

Department of Statistics, Harvard University

The development of MCMC algorithms via data augmentation (DA) or equivalently auxiliary variables has some resemblance to the theme plot of the recent Hollywood hit Inception. We MCMC designers all share essentially the same “3S” dream, that is, to create algorithms that are simple, stable, and speedy. Within that grand dream, however, we have created a rather complex web of tools, with some of them producing very similar algorithms but for unclear reasons, or others that were thought to be of different origins but actually are layered when viewed from a suitable distance. These include conditional augmentation, marginal augmentation, PX-DA, partially non-centering parameterization, sandwiched algorithms, interweaving strategies, ASIS, etc. It turns out that there is a simple statistical insight that can unify essentially all these methods conceptually, and it also provides practical guidelines for their DA constructions. It is the simple concept of regression residuals, which are constructed to be orthogonal to the regression functions. All these methods in one form or another effectively build a residual augmentation. Given a DA distribution f(T, A), where T is our targeted variable (i.e., f(T) is our targeted distribution) and A is the augmented variable, there are two broad classes of residuals depending on whether we regress T on A or A on T. In this talk we will demonstrate how methods like conditional augmentation and partially non-centering parameterization build their residual augmentations by regressing A on T, whereas methods such as marginal augmentation and ASIS effectively use residual augmentations from regressing T on A. For either class, the attempted orthogonality helps to reduce the dependence among MCMC draws, and when the orthogonality leads to true independence as occurring in some special cases, we reach the dream of producing i.i.d. draws. 
(The talk is based on an upcoming discussion article, especially its rejoinder, Yu and Meng (2011, JCGS).)

The talk will take place at Institut Henri Poincaré, Thursday Sept. 1, at 15:00, as part of the Big’MC seminars.

## València 9 snapshot [4]

This one-before-last day at València 9 was fairly busy and I skipped the [tantalising] trip back to Sella to attend morning and afternoon talks. The first session involved Nicolas Chopin and Pierre Jacob’s free-energy paper, whose earlier version I had heard at CREST and which builds on the earlier paper of Nicolas with Tony Lelièvre and Gabriel Stoltz to construct a sequential Monte Carlo sampler that is biased along a preferential direction in order to fight multimodality and label switching in the case of mixtures. Peter Green rightly pointed out the difficulty in building this direction, which looks like a principal component to me, but this may open a new direction for research on a potentially adaptive direction updated with the SMC sampler… Although I always have trouble understanding the gist of causal models, Thomas Richardson’s talk about transparent parameterisation was quite interesting in its links both with contingency tables and with identifiability issues (should Bayesians care about identifiability?! I did not really understand why the data could help in specifying the unidentified parameter in an empirical Bayes manner, though).

The morning talk by Darren Wilkinson was particularly enticing in that Darren presented in a very articulate manner the specifics of analysing stochastic kinetic models for bacterial regulation and that he also introduced a likelihood-free MCMC that was not ABC-MCMC. (At first sight, it sounds like the auxiliary variable technique of Møller, Pettit, Reeves and Berthelsen, but I want to read the paper to understand the differences better.) Despite the appalling audio and video rendering in the conference room, the filmed discussion by Samuel Kou got into a comparison with ABC. The afternoon non-parametric session left me a bit confused as to the infinite regress on Dirichlet process expansions, but I enjoyed the next talk by Geoff Nicholls on partial ordering inference immensely, even though I missed the bishop example at the beginning because the schedule had drifted due to the absence of the first speaker of the session. During the poster session (where again I only saw a fourth of the material!), I had the pleasant surprise to meet a student from the University of Canterbury, Christchurch, who took my Bayesian Core class when I visited in 2006.

## Confusing slice sampler

Most embarrassingly, Liaosa Xu from Virginia Tech sent the following email almost a month ago and I forgot to reply:

I have a question regarding your example 7.11 in your book Introducing Monte Carlo Methods with R.  To further decompose the uniform simulation by sampling a and b step by step, how you determine the upper bound for sampling of a? I don’t know why, for all y(i)=0, we need a+bx(i)>- log(u(i)/(1-u(i))).  It seems that for y(i)=0, we get 0>log(u(i)/(1-u(i))).  Thanks a lot for your clarification.

There is nothing wrong with our resolution of the logit simulation problem but I acknowledge the way we wrote it is most confusing! Especially when switching from $(\alpha,\beta)$ to $(a,b)$ in the middle of the example….

Starting with the likelihood/posterior

$L(\alpha, \beta | \mathbf{y}) \propto \prod_{i=1}^n \left(\dfrac{e^{ \alpha +\beta x_i }}{1 + e^{ \alpha +\beta x_i }}\right)^{y_i}\left(\dfrac{1}{1 + e^{ \alpha +\beta x_i }}\right)^{1-y_i}$

we use slice sampling to replace each logistic expression with an indicator involving a uniform auxiliary variable

$U_i \sim \mathcal{U}\left( 0,\dfrac{e^{ y_i(\alpha +\beta x_i) }}{1 + e^{ \alpha +\beta x_i }} \right)$

[which is the first formula at the top of page 220.] Now, when considering the joint distribution of

$(\alpha,\beta,u_1,...,u_n)$,

we only get a product of indicators: either indicators that

$u_i<\text{logit}^{-1}(\alpha+\beta x_i)$ or indicators that $u_i<1-\text{logit}^{-1}(\alpha+\beta x_i)$,

where $\text{logit}^{-1}(t)=e^t/(1+e^t)$, depending on whether $y_i=1$ or $y_i=0$. The first case produces the equivalent condition

$\alpha+\beta x_i > \log(u_i/(1-u_i))$

and the second case the equivalent condition

$\alpha+\beta x_i < - \log(u_i/(1-u_i))$

This is how we derive both uniform distributions in $\alpha$ and $\beta$.

What is both a typo and potentially confusing is the second formula on page 220, where we mention the uniform over the set

$\left\{ (a,b)\,:\ y_i(a+bx_i) > \log\dfrac{u_i}{1-u_i} \right\}$

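This set is missing (a) an intersection sign before the curly bracket and (b) a $(-1)^{1-y_i}$ instead of the $y_i$, so that $y_i=1$ gives $a+bx_i > \log(u_i/(1-u_i))$ and $y_i=0$ gives $a+bx_i < -\log(u_i/(1-u_i))$, as in the two conditions above. It should be

$\displaystyle{\bigcap_{i=1}^n} \left\{ (a,b)\,:\ (-1)^{1-y_i}(a+bx_i) > \log\dfrac{u_i}{1-u_i} \right\}$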
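To make the two conditions on $\alpha+\beta x_i$ concrete, here is a minimal Python sketch of the resulting Gibbs sampler on $(\alpha,\beta,u_1,\ldots,u_n)$. It is not the R code from the book: the function name `slice_gibbs_logit`, the flat prior on $(\alpha,\beta)$ and the toy data in the usage note are all assumptions made for illustration. Each $u_i$ is drawn uniformly below its logistic term, then $\alpha$ and $\beta$ are drawn in turn from the uniform distribution on the interval cut out by the $n$ inequalities.

```python
import numpy as np


def slice_gibbs_logit(x, y, n_iter=3000, seed=0):
    """Slice/Gibbs sampler for a logit model under an (assumed) flat prior.

    Returns an (n_iter, 2) array of (alpha, beta) draws.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=int)

    # For beta, the direction of each inequality depends on the sign of x_i;
    # observations with x_i == 0 carry no constraint on beta.
    nonzero = x != 0
    same_sign = (y == 1) == (x > 0)
    is_lower = nonzero & same_sign      # constraint of the form beta > r_i
    is_upper = nonzero & ~same_sign     # constraint of the form beta < r_i
    if not (np.any(y == 1) and np.any(y == 0)
            and is_lower.any() and is_upper.any()):
        raise ValueError("intervals are unbounded for this data set")

    alpha, beta = 0.0, 0.0
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # 1. u_i | alpha, beta: uniform below p_i (y_i = 1) or 1 - p_i (y_i = 0).
        eta = alpha + beta * x
        p = 1.0 / (1.0 + np.exp(-eta))
        top = np.where(y == 1, p, 1.0 - p)
        u = top * (1.0 - rng.random(x.size))        # uniform on (0, top]
        log_odds_u = np.log(u) - np.log1p(-u)       # log(u_i / (1 - u_i))

        # y_i = 1: alpha + beta x_i > thr_i ; y_i = 0: alpha + beta x_i < thr_i
        thr = np.where(y == 1, log_odds_u, -log_odds_u)

        # 2. alpha | beta, u: uniform on (max lower bound, min upper bound).
        bound = thr - beta * x
        alpha = rng.uniform(bound[y == 1].max(), bound[y == 0].min())

        # 3. beta | alpha, u: same idea after dividing by x_i.
        ratio = np.full(x.size, np.nan)
        ratio[nonzero] = (thr[nonzero] - alpha) / x[nonzero]
        beta = rng.uniform(ratio[is_lower].max(), ratio[is_upper].min())

        draws[t] = alpha, beta
    return draws
```

For instance, with the made-up data `x = [-2, -1, 0, 1, 2, -2, -1, 0, 1, 2]` and `y = [0, 0, 0, 1, 1, 0, 1, 1, 0, 1]`, `slice_gibbs_logit(x, y)` returns 3000 pairs $(\alpha,\beta)$. The flat prior only yields a proper posterior when the two classes overlap in $x$, which is why the toy data are not separable.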

## Bayesian k-nearest neighbours

Posted in Statistics on April 9, 2009 by xi'an

Our paper with Lionel Cucala, Jean-Michel Marin and Mike Titterington on A Bayesian Reassessment of Nearest Neighbor Classification has now appeared in JASA. Recall that the standard k nearest neighbor (knn) procedure is a deterministic method used in supervised classification where the classification is based on the majority rule on the neighbours. In this paper, we propose a reassessment of the knn method as a statistical technique derived from a proper probabilistic model; in particular, we differ from the assessment found in Holmes and Adams (2002, 2003), where the underlying probabilistic model is not completely coherent in terms of conditionals versus joint. In addition, we evaluate computational tools for Bayesian inference on the parameters of the corresponding model, highlighting the difficulties inherent to both pseudo-likelihood and path sampling approximations of an intractable normalizing constant and demonstrating the limitations of the pseudo-likelihood approximation in this setup.
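As a reminder of the deterministic procedure being reassessed, here is a minimal Python sketch of the standard knn majority rule. It is only the classical classifier, not the probabilistic model of the paper; the function name `knn_predict`, the Euclidean distance and the tie-breaking rule (first label reached among the nearest neighbours) are assumptions made for illustration.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)

    # Euclidean distance from x_new to every training point.
    dists = np.linalg.norm(X_train - x_new, axis=1)

    # Indices of the k closest points (stable sort keeps input order on ties).
    nearest = np.argsort(dists, kind="stable")[:k]

    # Majority rule on the neighbours' labels.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

The output is a single label with no measure of uncertainty attached, which is what a model-based version of knn aims to add.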