Archive for Gibbs sampler

Bangalore workshop [ಬೆಂಗಳೂರು ಕಾರ್ಯಾಗಾರ]

Posted in pictures, R, Running, Statistics, Travel, University life, Wines with tags , , , , , , on July 31, 2014 by xi'an

mathdeptSecond day at the Indo-French Centre for Applied Mathematics and the workshop. Maybe not the most exciting day in terms of talks (as I missed the first two plenary sessions by (a) oversleeping and (b) running across the campus!). However I had a neat talk with another conference participant that led to [what I think are] interesting questions… (And a very good meal in a local restaurant as the guest house had not booked me for dinner!)

To wit: given a target like

\lambda \exp(-\lambda) \prod_{i=1}^n \dfrac{1-\exp(-\lambda y_i)}{\lambda}\quad (*)

the simulation of λ can be demarginalised into the simulation of

\pi (\lambda,\mathbf{z})\propto \lambda \exp(-\lambda) \prod_{i=1}^n \exp(-\lambda z_i) \mathbb{I}(z_i\le y_i)

where z is a latent (and artificial) variable. This means a Gibbs sampler simulating λ given z and z given λ can produce an outcome from the target (*). Interestingly, another completion is to consider that the zi‘s are U(0,yi) and to see the quantity

\pi(\lambda,\mathbf{z}) \propto \lambda \exp(-\lambda) \prod_{i=1}^n \exp(-\lambda z_i) \mathbb{I}(z_i\le y_i)

as an unbiased estimator of the target. What’s quite intriguing is that the quantity remains the same but with different motivations: (a) demarginalisation versus unbiasedness and (b) zi ∼ Exp(λ) versus zi ∼ U(0,yi). The stationary is the same, as shown by the graph below, the core distributions are [formally] the same, … but the reasoning deeply differs.

twoversions

Obviously, since unbiased estimators of the likelihood can be justified by auxiliary variable arguments, this is not in fine a big surprise. Still, I had not thought of the analogy between demarginalisation and unbiased likelihood estimation previously. Continue reading

a day for comments

Posted in Mountains, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , , , , , on April 21, 2014 by xi'an

As I was flying over Skye (with [maybe] a first if hazy perspective on the Cuillin ridge!) to Iceland, three long sets of replies to some of my posts appeared on the ‘Og:

Thanks to them for taking the time to answer my musings…

 

adaptive subsampling for MCMC

Posted in pictures, Statistics, Travel with tags , , , , , , , , , , , on April 15, 2014 by xi'an

Oxford to Coventry, Feb. 25, 2012

“At equilibrium, we thus should not expect gains of several orders of magnitude.”

As was signaled to me several times during the MCqMC conference in Leuven, Rémi Bardenet, Arnaud Doucet and Chris Holmes (all from Oxford) just wrote a short paper for the proceedings of ICML on a way to speed up Metropolis-Hastings by reducing the number of terms one computes in the likelihood ratio involved in the acceptance probability, i.e.

\prod_{i=1}^n\frac{L(\theta^\prime|x_i)}{L(\theta|x_i)}.

The observations appearing in this likelihood ratio are a random subsample from the original sample. Even though this leads to an unbiased estimator of the true log-likelihood sum, this approach is not justified on a pseudo-marginal basis à la Andrieu-Roberts (2009). (Writing this in the train back to Paris, I am not convinced this approach is in fact applicable to this proposal as the likelihood itself is not estimated in an unbiased manner…)

In the paper, the quality of the approximation is evaluated by Hoeffding’s like inequalities, which serves as the basis for a stopping rule on the number of terms eventually evaluated in the random subsample. In fine, the method uses a sequential procedure to determine if enough terms are used to take the decision and the probability to take the same decision as with the whole sample is bounded from below. The sequential nature of the algorithm requires to either recompute the vector of likelihood terms for the previous value of the parameter or to store all of them for deriving the partial ratios. While the authors adress the issue of self-evaluating whether or not this complication is worth the effort, I wonder (from my train seat) why they focus so much on recovering the same decision as with the complete likelihood ratio and the same uniform. It would suffice to get the same distribution for the decision (an alternative that is easier to propose than to create of course). I also (idly) wonder if a Gibbs version would be manageable, i.e. by changing only some terms in the likelihood ratio at each iteration, in which case the method could be exact… (I found the above quote quite relevant as, in an alternative technique we are constructing with Marco Banterle, the speedup is particularly visible in the warmup stage.) Hence another direction in this recent flow of papers attempting to speed up MCMC methods against the incoming tsunami of “Big Data” problems.

Seminar in Nottingham

Posted in Kids, pictures, Running, Statistics, Travel, University life with tags , , , , , on March 26, 2014 by xi'an

NottLast Thursday, I gave a seminar in Nottingham, the true birthplace of the Gibbs sampler!, and I had a quite enjoyable half-day of scientific discussions in the Department of Statistics, with a fine evening tasting a local ale in the oldest (?) inn in England (Ye Olde Trip to Jerusalem) and sampling Indian dishes at 4550 Miles [plus or minus epsilon, since the genuine distance is 4200 miles) from Dehli, plus a short morning run on the very green campus. In particular, I discussed with Theo Kypraios and Simon Preston parallel ABC and their recent paper in Statistics and Computing, their use of the splitting technique of Neiswanger et al. I discussed earlier but intended here towards a better ABC approximation since (a) each term in the product could correspond to a single observation and (b) hence no summary statistic was needed and a zero tolerance could be envisioned. The  paper discusses how to handle samples from terms in a product of densities, either by a Gaussian approximation or by a product of kernel estimates. And mentions connections with expectation propagation (EP), albeit not at the ABC level.

A minor idea that came to me during this discussion was to check whether or not a reparameterisation towards a uniform prior was a good idea: the plus of a uniform prior was that the power discussion was irrelevant, making both versions of the parallel MCMC algorithm coincide. The minus was not the computational issue since most priors are from standard families, with easily invertible cdfs, but rather why this was supposed to make a difference. When writing this on the train to Oxford, I started wondering as an ABC implementation is impervious to this reparameterisation. Indeed, simulate θ from π and pseudo-data given θ versus simulate μ from uniform and pseudo-data given T(μ) does not make a difference in the simulated pseudo-sample, hence in the distance selected θ’s, and still in one case the power does not matter while in the other case it does..!

IMG_0106Another discussion I had during my visit led me to conclude a bit hastily that a thesis topic I had suggested to a new PhD student a few months ago had already been considered locally and earlier, although it ended up as a different, more computational than conceptual, perspective (so not all was lost for my student!). In a wider discussion around lunch, we also had an interesting foray on possible alternatives to Bayes factors and their shortcomings, which was a nice preparation to my seminar on giving up posterior probabilities for posterior error estimates. And an opportunity to mention the arXival of a proper scoring rules paper by Phil Dawid, Monica Musio and Laura Ventura, related with the one I had blogged about after the Padova workshop. And then again about a connected paper with Steve Fienberg. This lunch discussion even included some (mild) debate about Murray Aitkin’s integrated likelihood.

deverreAs a completely irrelevant aside, this trip gave me the opportunity of a “pilgrimage” to Birmingham New Street train station, 38 years after “landing” for the first time in Britain! And to experience a fresco the multiple delays and apologies of East Midlands trains (“we’re sorry we had to wait for this oil train in York”, “we have lost more time since B’ham”, “running a 37 minutes delay now”, “we apologize for the delay, due to trespassing”, …), the only positive side being that delayed trains made delayed connections possible!

simulating determinantal processes

Posted in Statistics, Travel with tags , , , , , , , , , , on December 6, 2013 by xi'an

In the plane to Atlanta, I happened to read a paper called Efficient simulation of the Ginibre point process by Laurent Decreusefond, Ian Flint, and Anaïs Vergne (from Telecom Paristech). “Happened to” as it was a conjunction of getting tipped by my new Dauphine colleague (and fellow blogger!) Djalil Chaffaï about the paper, having downloaded it prior to departure, and being stuck in a plane (after watching the only Chinese [somewhat] fantasy movie onboard, Saving General Yang).

This is mostly a mathematics paper. While indeed a large chunk of it is concerned with the rigorous definition of this point process in an abstract space, the last part is about simulating such processes. They are called determinantal (and not detrimental as I was tempted to interpret on my first read!) because the density of an n-set (x1x2,…,xn) is given by a kind of generalised Vandermonde determinant

p(x_1,\ldots,x_n) = \dfrac{1}{n!} \text{det} \left( T(x_i,x_j) \right)

where T is defined in terms of an orthonormal family,

T(x,y) = \sum_{i=1}^n \psi_i(x) \overline{\psi_i(y)}.

(The number n of points can be simulated via an a.s. finite Bernoulli process.) Because of this representation, the sequence of conditional densities for the xi‘s (i.e. x1, x2 given x1, etc.) can be found in closed form. In the special case of the Ginibre process, the ψi‘s are of the form

\psi_i(z) =z^m \exp\{-|z|^2/2\}/\sqrt{\pi m!}

and the process cannot be simulated for it has infinite mass, hence an a.s. infinite number of points. Somehow surprisingly (as I thought this was the point of the paper), the authors then switch to a truncated version of the process that always has a fixed number N of points. And whose density has the closed form

p(x_1,\ldots,x_n) = \dfrac{1}{\pi^N} \prod_i \frac{1}{i!} \exp\{-|z_i|^2/2\}\prod_{i<j} |z_i-z_j|^2

It has an interestingly repulsive quality in that points cannot get close to one another. (It reminded me of the pinball sampler proposed by Kerrie Mengersen and myself at one of the Valencia meetings and not pursued since.) The conclusion (of this section) is anticlimactic, though,  in that it is known that this density also corresponds to the distribution of the eigenvalues of an Hermitian matrix with standardized complex Gaussian entries. The authors mentions that the fact that the support is the whole complex space Cn is a difficulty, although I do not see why.

The following sections of the paper move to the Ginibre process restricted to a compact and then to the truncated Ginibre process restricted to a compact, for which the authors develop corresponding simulation algorithms. There is however a drag in that the sequence of conditionals, while available in closed-form, cannot be simulated efficiently but rely on a uniform accept-reject instead. While I am certainly missing most of the points in the paper, I wonder if a Gibbs sampler would not be an interesting alternative given that the full (last) conditional is a Gaussian density…

reading classics (#6)

Posted in Statistics with tags , , , , , , , on December 21, 2012 by xi'an

Today my student Xiaolin Cheng presented the mythical 1990 JASA paper of Alan Gelfand and Adrian Smith, Sampling-based approaches to calculating marginal densities. The very one that started the MCMC revolution of the 1990’s! Re-reading it through his eyes was quite enlightening, even though he stuck quite closely to the paper. (To the point of not running his own simulation, nor even reporting Gelfand and Smith’s, as shown by the slides below. This would have helped, I think…)

Indeed, those slides focus very much on the idea that such substitution samplers can provide parametric approximations to the marginal densities of the components of the simulated parameters. To the point of resorting to importance sampling as an alternative to the standard Rao-Blackwell estimate, a solution that did not survive long. (We briefly discussed this point during the seminar, as the importance function was itself based on a Rao-Blackwell estimate, with possibly tail issues. Gelfand and Smith actually conclude on the higher efficiency of the Gibbs sampler.) Maybe not so surprisingly, the approximation of the “other” marginal, namely the marginal likelihood, as it is much more involved (and would lead to the introduction of the infamous harmonic mean estimator a few years later! And Chib’s (1995), which is very close in spirit to the Gibbs sampler). While Xiaolin never mentioned Markov chains in his talk, Gelfand and Smith only report that Gibbs sampling is a Markovian scheme, and refer to both Geman and Geman (1984) and Tanner and Wong (1987), for convergence issues. Rather than directly invoking Markov arguments as in Tierney (1994) and others. A fact that I find quite interesting, a posteriori, as it highlights the strong impact Meyn and Tweedie would have, three years later.

the biggest change

Posted in Statistics, University life with tags , , , , , , on September 29, 2011 by xi'an

The current question for the ISBA Bulletin is “What is the biggest and most surprising change in the field of Statistics that you have witnessed, and what do you think will be the next one?” The answer to the second part is easy: I do not know and even if I knew I would be writing papers about it rather than spilling the beans… The answer to the first part is anything but easy. At the most literal level, taking “witnessed” at face value, I have witnessed the “birth” of Markov chain Monte Carlo methods at the conference organised in Sherbrooke by Jean-Francois Angers in June 1989… (This was already reported in our Short history of MCMC with George Casella.) I clearly remember Adrian showing the audience a slide with about ten lines of Fortran code that corresponded to the Gibbs sampler for a Bayesian analysis of a mixed effect linear model (later to be analysed in JASA). This was so shockingly simple… It certainly was the talk that had the most impact on my whole career, even though (a) I would have certainly learned about MCMC quickly enough had I missed the Sherbrooke conference and (b) there were other talks in my academic life that also induced that “wow” moment, for sure. At a less literal level, the biggest chance if not the most surprising is that the field has become huge, multifaceted, and ubiquitous. When I started studying statistics, it was certainly far from being the sexiest possible field! (At least in the general public) And the job offers were not as numerous and diverse as they are today. (The same is true for Bayesian statistics, of course. Even though it has sounded sexy from the start!)

Follow

Get every new post delivered to your Inbox.

Join 705 other followers