Archive for MCMC

adaptive subsampling for MCMC

Posted in pictures, Statistics, Travel with tags , , , , , , , , , , , on April 15, 2014 by xi'an

Oxford to Coventry, Feb. 25, 2012

“At equilibrium, we thus should not expect gains of several orders of magnitude.”

As was signaled to me several times during the MCqMC conference in Leuven, Rémi Bardenet, Arnaud Doucet and Chris Holmes (all from Oxford) just wrote a short paper for the proceedings of ICML on a way to speed up Metropolis-Hastings by reducing the number of terms one computes in the likelihood ratio involved in the acceptance probability, i.e.

\prod_{i=1}^n\frac{L(\theta^\prime|x_i)}{L(\theta|x_i)}.

The observations appearing in this likelihood ratio are a random subsample from the original sample. Even though this leads to an unbiased estimator of the true log-likelihood sum, this approach is not justified on a pseudo-marginal basis à la Andrieu-Roberts (2009). (Writing this in the train back to Paris, I am not convinced this approach is in fact applicable to this proposal as the likelihood itself is not estimated in an unbiased manner…)

In the paper, the quality of the approximation is evaluated by Hoeffding’s like inequalities, which serves as the basis for a stopping rule on the number of terms eventually evaluated in the random subsample. In fine, the method uses a sequential procedure to determine if enough terms are used to take the decision and the probability to take the same decision as with the whole sample is bounded from below. The sequential nature of the algorithm requires to either recompute the vector of likelihood terms for the previous value of the parameter or to store all of them for deriving the partial ratios. While the authors adress the issue of self-evaluating whether or not this complication is worth the effort, I wonder (from my train seat) why they focus so much on recovering the same decision as with the complete likelihood ratio and the same uniform. It would suffice to get the same distribution for the decision (an alternative that is easier to propose than to create of course). I also (idly) wonder if a Gibbs version would be manageable, i.e. by changing only some terms in the likelihood ratio at each iteration, in which case the method could be exact… (I found the above quote quite relevant as, in an alternative technique we are constructing with Marco Banterle, the speedup is particularly visible in the warmup stage.) Hence another direction in this recent flow of papers attempting to speed up MCMC methods against the incoming tsunami of “Big Data” problems.

MCqMC 2014 [day #4]

Posted in pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , on April 11, 2014 by xi'an

Leuven7

I hesitated in changing the above title for “MCqMSmaug” as the plenary talk I attended this morning was given by Wenzel Jakob, who uses Markov chain Monte Carlo methods in image rendering and light simulation. The talk was low-tech’, with plenty of pictures and animations (incl. excerpts from recent blockbusters!), but it stressed how much proper rending relies on powerful MCMC techniques. One point particularly attracted my attention, namely the notion of manifold exploration as it seemed related to my zero measure recent post. (A related video is available on Jakob’s webpage.) You may then wonder where the connection with Smaug could be found: Wenzel Jakob is listed in the credits of both Hobbit movies for his contributions to the visual effects! (Hey, MCMC made Smaug [visual effects the way they are], a cool argument for selling your next MCMC course! I will for sure include a picture of Smaug in my next R class presentation…) The next sessions of the morning opposed Sobol’s memorial to more technical light rendering and I chose Sobol, esp. because I had missed Art Owen’s tutorial on Sunday, as he gave a short presentation on using Sobol’s criteria to identify variables contributing the most to the variability or extreme values of a function, an extreme value kind of ANOVA, most interesting if far from my simulation area… The afternoon sessions saw MCMC talks by Luke Bornn and Scott Schmidler, both having connection with the Wang-Landau algorithm. Actually, Scott’s talk was the one generating the most animated discussion among all those I attended in MCqMC! (To the point of the chairman getting rather rudely making faces…)

MCqMC 2014 [day #3]

Posted in pictures, Running, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , , , , , on April 10, 2014 by xi'an

Leuven2

As the second day at MCqMC 2014, was mostly on multi-level Monte Carlo and quasi-Monte Carlo methods, I did not attend many talks but had a long run in the countryside (even saw a pheasant and a heron), worked at “home” on pressing recruiting evaluations and had a long working session with Pierre Jacob. Plus an evening out sampling (just) a few Belgian beers in the shade of the city hall…

Today was more in my ballpark as there were MCMC talks the whole day! The plenary talk was not about MCMC as Erich Novak presented a survey on the many available results bounding the complexity of approximating an integral based on a fixed number of evaluations of the integrand, some involving the dimension (and its curse), some not, some as fast as √n and some not as fast, all this depending on the regularity and the size of the classes of integrands considered. In some cases, the solution was importance sampling, in other cases, quasi-Monte Carlo, and yet other cases were still unsolved. Then Yves Atchadé gave a new perspective on computing the asymptotic variance in the central limit theorem on Markov chains when truncating the autocovariance, Matti Vihola talked about theoretical orderings of Markov chains that transmuted into the very practical consequence that using more simulations in a pseudo-marginal likelihood approximation improved acceptance rate and asymptotic variances (and this applies to aBC-MCMC as well), Radu Craiu proposed a novel processing of adaptive MCMC by treating various approximations to the true target as food for a multiple-try Metropolis algorithm, and Luca Martino had a go at resuscitating the ARMS algorithm of Gilks and Wild (used for a while in BUGS), although the talk did not dissipate all of my misgivings about the multidimensional version! I had more difficulties following the “Warwick session” which was made of four talks by current or former students from Warwick, although I appreciated the complexity of the results in infinite dimensional settings and novel approximations to diffusion based Metropolis algorithms. No further session this afternoon as the “social” activity was to visit the nearby Stella Artois brewery! This activity made us very social, for certain, even though there was hardly a soul around in this massively automated factory. (Maybe an ‘Og post to come one of those days…)

MCqMC 2014 [day #1]

Posted in pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , on April 9, 2014 by xi'an

Leuven2

As I have been kindly invited to give a talk at MCqMC 2014, here am I. in Leuven, Belgium, for this conference I have never attended before. (I was also invited for MCqMC 2012 in Sydney The talk topics and the attendees’ “sociology” are quite similar to those of the IMACS meeting in Annecy last summer. Namely, rather little on MCMC, particle filters, and other tools familiar in Bayesian computational statistics, but a lot on diffusions and stochastic differential equations and of course quasi-Monte Carlo methods. I thus find myself at a boundary of the conference range and a wee bit lost by some talks, which even titles make little sense to me.

For instance, I have trouble to connect with multi-level Monte Carlo within my own referential. My understanding of the method is one of a control variate version of tempering, namely of using a sequence of approximations to the true target and using rougher approximations as control variates for the finer approximations. But I cannot find on the Web a statistical application of the method outside of diffusions and SDEs, i.e. outside of continuous time processes… Maybe using a particle filter from one approximation to the next, down in terms of roughness, could help.

“Several years ago, Giles (2008) introduced an intriguing multi-level idea to deal with such biased settings that can dramatically improve the rate of convergence and can even, in some settings, achieve the canonical “square root” convergence rate associated with unbiased Monte Carlo.” Rhee and Glynn, 2012

Those were my thoughts before lunchtime. today (namely April 7, 2014). And then, after lunch, Peter Glynn gave his plenary talk that just answered those questions of mine’s!!! Essentially, he showed that formula Pierre Jacob also used in his Bernoulli factory paper to transform a converging-biased-into-an-unbiased estimator, based on a telescopic series representation and a random truncation… This approach is described in a paper with Chang-han Rhee, arXived a few years ago. The talk also covered more recent work (presumably related with Chang-han Rhee’s thesis) extending the above to Markov chains. As explained to me later by Pierre Jacob [of Statisfaction fame!], a regular chain does not converge fast enough to compensate for the explosive behaviour of the correction factor, which is why Rhee and Glynn used instead a backward chain, linking to the exact or perfect samplers of the 1990′s (which origin can be related to a 1992 paper of Asmussen, Glynn and Thorisson). This was certainly the most riveting talk I attended in the past years in that it brought a direct answer to a question I was starting to investigate. And more. I was also wondering how connected it was with our “exact” representation of the stationary distribution (in an Annals of Probability paper with Jim Hobert).   Since we use a stopping rule based on renewal and a geometric waiting time, a somewhat empirical version of the inverse probability found in Peter’s talk. This talk also led me to re-consider a recent discussion we had in my CREST office with Andrew about using square root(ed) importance weights, since one of Peter’s slides exhibited those square roots as optimal. Paradoxically, Peter started the talk by down-playing it, stating there was a single idea therein and a single important slide, making it a perfect after-lunch talk: I wish I had actually had thrice more time to examine each slide! (In the afternoon session, Éric Moulines also gave a thought-provoking talk on particle islands and double bootstrap, a research project I will comment in more detail the day it gets arXived.)

MCMC on zero measure sets

Posted in R, Statistics with tags , , , , , , , on March 24, 2014 by xi'an

zeromesSimulating a bivariate normal under the constraint (or conditional to the fact) that x²-y²=1 (a non-linear zero measure curve in the 2-dimensional Euclidean space) is not that easy: if running a random walk along that curve (by running a random walk on y and deducing x as x²=y²+1 and accepting with a Metropolis-Hastings ratio based on the bivariate normal density), the outcome differs from the target predicted by a change of variable and the proper derivation of the conditional. The above graph resulting from the R code below illustrates the discrepancy!

targ=function(y){
  exp(-y^2)/(1.52*sqrt(1+y^2))}

T=10^5
Eps=3
ys=xs=rep(runif(1),T)
xs[1]=sqrt(1+ys[1]^2)
for (t in 2:T){
  propy=runif(1,-Eps,Eps)+ys[t-1]
  propx=sqrt(1+propy^2)
  ace=(runif(1)<(dnorm(propy)*dnorm(propx))/
               (dnorm(ys[t-1])*dnorm(xs[t-1])))
  if (ace){
     ys[t]=propy;xs[t]=propx
     }else{
       ys[t]=ys[t-1];xs[t]=xs[t-1]}}

If instead we add the proper Jacobian as in

  ace=(runif(1)<(dnorm(propy)*dnorm(propx)/propx)/
               (dnorm(ys[t-1])*dnorm(xs[t-1])/xs[t-1]))

the fit is there. My open question is how to make this derivation generic, i.e. without requiring the (dreaded) computation of the (dreadful) Jacobian.

zeromas

Follow

Get every new post delivered to your Inbox.

Join 551 other followers