## cut, baby, cut!

Posted in Books, Kids, Mountains, R, Statistics, University life on January 29, 2014 by xi'an

At MCMSki IV, I attended (and chaired) a session where Martyn Plummer presented some developments on cut models. As I was not sure I had gotten the idea [although this happened to be one of those few sessions where the flu had not yet completely taken over!] and as I wanted to check a potential explanation for the lack of convergence discussed by Martyn during his talk, I decided to (re)present the talk at our “MCMSki decompression” seminar at CREST. Martyn sent me his slides and also kindly pointed me to the relevant section of the BUGS book, reproduced above. (Disclaimer: do not get me wrong here, the title is a pun on the infamous “drill, baby, drill!” and is not connected in any way to Martyn’s talk or work!)

I cannot say I get the idea any more clearly from this short explanation in the BUGS book, although it gives a literal meaning to the word “cut”. From this description I only understand that a cut is the removal of an edge in a probabilistic graph; however, there must/may be some arbitrariness in building the resulting (wrong) conditional distribution. In the Poisson-binomial case treated in Martyn’s talk, I interpret the cut as simulating from

$\pi(\phi|z)\pi(\theta|\phi,y)=\dfrac{\pi(\phi)f(z|\phi)}{m(z)}\dfrac{\pi(\theta|\phi)f(y|\theta,\phi)}{m(y|\phi)}$

rather than from the genuine joint posterior

$\pi(\phi|z,y)\pi(\theta|\phi,y)\propto\pi(\phi)f(z|\phi)\pi(\theta|\phi)f(y|\theta,\phi)$

hence losing some of the information about φ… Now, this cut version is a function of φ and θ that can be fed to a Metropolis–Hastings algorithm, assuming we can handle the posterior on φ and the conditional on θ given φ. If we build a Gibbs sampler instead, we face a difficulty with the normalising constant m(y|φ), so said Gibbs sampler does not work for generating from the “cut” target. Maybe an alternative could borrow from the rather large if disparate missing-constant toolbox. (In any case, we do not simulate from the original joint distribution.) The natural solution would then be to make an independent proposal on φ with target the posterior given z, followed by any scheme that preserves the conditional of θ given φ and y; “any” is rather wishful thinking at this stage, since the only practical solution I see is to run a Metropolis–Hastings sampler long enough to “reach” stationarity… I am also left with a lingering, although not life-threatening, question of whether or not BUGS code using cut distributions provides the “right” answer. Here are my five slides used during the seminar (with a random walk implementation that did not diverge from the true target…):
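To make the two-stage scheme concrete, here is a minimal R sketch of a cut sampler on a toy model of my own devising (not Martyn’s actual example): z ~ Poisson(10φ) with an Exp(1) prior on φ, and y ~ Binomial(20, θ) with θ = plogis(η) and η|φ ~ N(log φ, 1). The φ draws come exactly from π(φ|z), a Gamma(1+z, 11) posterior, and each one is followed by a short random-walk Metropolis–Hastings run on η targeting π(η|φ, y):

```r
# toy cut sampler: phi module never sees y, only z
set.seed(1)
z <- 7; y <- 12; n <- 20                  # toy data
niter <- 1e3; K <- 50                     # outer draws, inner MH steps
phi <- rgamma(niter, 1 + z, 11)           # exact draws from pi(phi|z)=Gamma(1+z,11)
logtarget <- function(e, p)               # log pi(eta|phi,y), up to a constant
  dnorm(e, log(p), 1, log = TRUE) + dbinom(y, n, plogis(e), log = TRUE)
eta <- numeric(niter); e <- 0
for (t in 1:niter){
  for (k in 1:K){                         # short MH run at the current phi[t]
    prop <- e + rnorm(1, 0, .5)
    if (log(runif(1)) < logtarget(prop, phi[t]) - logtarget(e, phi[t]))
      e <- prop
  }
  eta[t] <- e
}
theta <- plogis(eta)                      # "cut" sample of theta
```

Note that each inner MH run only approximately attains its conditional stationary distribution within K steps, which is exactly the convergence worry raised above: there is no single joint target for which this two-stage kernel is a standard MCMC scheme.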

## MCMSki IV [day 3]

Posted in Mountains, pictures, R, Statistics, Travel, University life on January 9, 2014 by xi'an

Already on the final day..! And still this frustration at being unable to attend three sessions at once… Andrew Gelman started the day with a non-computational talk that touched on themes familiar to readers of his blog: the misuse of significance tests and recommendations for better practice. I then picked the Scaling and optimisation of MCMC algorithms session organised by Gareth Roberts, with optimal scaling talks by Tony Lelièvre, Alex Théry and Chris Sherlock, while Jochen Voss spoke about the convergence rate of ABC, a paper I have already discussed on the blog. A fairly exciting session, showing that MCMC’ory (the name of a workshop I ran in Paris in the late 90’s!) is still alive and well!

After the break (sadly without the ski race!), the software round-table session was something I had been looking forward to. The four software packages covered by this round-table were BUGS, JAGS, STAN, and BiiPS, each presented according to the same pattern. I would have liked to see a “battle of the bands”, illustrating the pros & cons of each language on a couple of models & datasets. STAN got the unofficial prize for cool tee-shirts (we should have asked the STAN team for poster-prize tee-shirts). And I had to skip the final session for a flu-related doctor’s appointment…

I called for a BayesComp meeting at 7:30, hoping for current and future members to show up and discuss the format of the future MCMski meetings, maybe even proposing new locations on other “sides of the Italian Alps”! But (workshop fatigue syndrome?!), no-one showed up. So anyone interested in discussing this issue is welcome to contact me or David van Dyk, the new BayesComp program chair.

## Bayes on drugs (guest post)

Posted in Books, R, Statistics, University life on May 21, 2012 by xi'an

This post is written by Julien Cornebise.

Last week in Aachen was the 3rd Edition of the Bayes(Pharma) workshop. Its specificity: half-and-half industry/academic participants and speakers, all in Pharmaceutical statistics, with a great care to welcome newcomers to Bayes, so as to spread as much as possible the love where it will actually be used. First things first: all the slides are available online, thanks to the speakers for sharing those. Full disclaimer: being part of the scientific committee of the workshop, I had a strong subjective prior.

3 days, 70 participants: we were fully booked, and even regretfully had to turn down registrations for lack of room space (!! German regulations are quite… enforced). Time to size it up for next year, maybe?

My most vivid impression overall: I was struck by the interactivity of the questions/answers after each talk. Rarely fewer than 5 questions per talk (come on, we’ve all attended sessions where the chairman is forced to ask the lone question — no such thing here!), on all points of each talk, with cross-references from one question to the other, even from one *talk* to the other! Seeing so much interaction and discussion in spite of (or, probably, thanks to?) the diversity of the audience was a real treat: not only did the questions bring up additional details about the talk, they were, more importantly, shedding precious light on the questioners’ mindsets, their practical concerns and needs. Both academics and industry participants were learning on all counts — and, having sometimes seen failed marriages of the kind in the past (either a French round-table degenerating into nasty polemic on “research-induced tax credit”, or just a plain mismatch of interests), I was quite impressed that we were purely and simply all interested in multiple facets of the very same thing: the interface between pharma and stats.

As is now a tradition, the first day was a short course, this time by Prof. Emmanuel Lesaffre: based on his upcoming book on Bayesian Biostatistics (Xian, maybe a review someday?), it was meant to be introductory for newcomers to Bayes, but was still packed with enough “tricks of the trade” that even seasoned Bayesians could get something out of it. I very much appreciated the pedagogy in the “live” examples, with clear convergence caveats based on traceplots of common software (WinBUGS). The most vivid memory: his strong spotlight on INLA as “the future of Bayesian computation”. Although my research is mostly on MCMC/SMC, I’m now damn curious to give it a serious try — this was further reinforced by late evening discussions with Gianluca Baio, who revealed that his results were all obtained in seconds of INLA computing.

Day 2 and half-day 3 were invited and contributed talks, all motivated by top-level applications. No convergence theorems here, but practical issues, with constraints that theoreticians (including myself!) would hardly guess exist: very small sample sizes, regulatory issues, competition with legacy methodology with only seconds-long runtimes (impossible to run 1 million MCMC steps!), and sometimes even imposed software due to validation processes! Again, as stated above, the number and quality of questions is really what I will keep from those two days.

If I had to state one regret, maybe, it would be this unsatisfactory feeling that, for many newcomers, MCMC = WinBUGS — with its obvious restrictions. The lesson I learned: all the great methodological advances of the last 10 years, especially in adaptive MCMC, have not yet reached most practitioners, since they need *tools* they can use. It may be a sign that, as methodological researchers, we should put a stronger emphasis on bringing software packages forward (for R, of course, but also for JAGS or OpenBUGS!); not only a zip-file with our article’s code, but a full-fledged package, with ongoing support, maintenance, and a forum. That’s a tough balance to find, since the time spent maintaining a package does not count in the holy bibliometry… but doesn’t it have more actual impact? Besides, more packages = fewer papers but also = more citations of the corresponding papers. Some do take this road (Robert Gramacy’s packages were cited last week as examples of great support, Andrew Gelman and Matt Hoffman are working on the much-expected STAN, and I mentioned above Håvard Rue’s R-INLA), but I don’t think it is yet considered “best practice”.

As a conclusion, this Bayes-Pharma 2012 workshop reminded me a lot of the SAMSI 2010 Summer Program: while Bayes-Pharma aims to be much more introductory, they had in common this same success in blending pharma industry and academia. Could it be a specificity of pharma? In which case, I’m looking forward very much to opening ISBA’s Specialized Section on Biostat/Pharmastat that a few colleagues and I are currently working on (more on this here soon). With such a crowd on both sides of the Atlantic, and a looming Bayes 2013 in the Netherlands, that will be exciting.

## A slice of infinity

Posted in R, Statistics, University life on July 28, 2011 by xi'an

Peng Yu sent me an email about the conditions for convergence of a Gibbs sampler:

The following statement mentions convergence. But I’m not familiar what the regularity condition is.

“But it is necessary to have a finite probability of moving away from the current state at all times in order to satisfy the regularity conditions on which the whole MCMC theory depends.”

Slice sampler is discussed in your book Monte Carlo Statistical Methods. I think that the “regularity condition” may have been discussed in your book. If so, would you please let me know where it is? Thanks and look forward to hearing from you!

The quote is from Martyn Plummer and deals with a stopping rule in JAGS implementation of the slice sampler. (The correct wording should be “strictly positive probability” rather than “finite probability”, I think.) However, this has nothing to do with a “regularity condition” on the irreducibility of a Markov chain: if a slice sampler is implemented for an unbounded density target, say a Beta(1/2,1/2), there is no irreducibility condition connected with the infiniteness of the density. In theory, (a) the chain never visits the “state” where the density is infinite (if only because we are dealing with a continuous state space) and (b) after visiting a value x with a large density f(x), the slice sampler allows for a move away from it since the slice involves a uniform simulation over (0,f(x)). Deeper properties of the slice sampler (like geometric ergodicity) are explored in, e.g., this JRSS B paper by Gareth Roberts and Jeff Rosenthal and this one in the Annals of Statistics by Radford Neal. In practice, the problem is caused by values of f(x) that cannot be computed and hence produce an error message like

Singularity in likelihood found by Slicer.

If those singularities can be localised, a neighbourhood excluding them should be introduced. (More easily said than done, obviously!)

Here is an example of a slice sampler with the Beta(1/2,1/2) distribution:

#graphics
dote=function(x,y) points(x,y,col="gold",pch=19,cex=.4)
mote=function(x,y,z,w) lines(c(x,z),c(y,w),col="gold",lwd=.5)
cst=dbeta(.5,.5,.5)*.5 #normalising constant
#inverting f(x)=d, 2nd degree equation
hitden=function(d) .5+.5*sqrt(1-4*(cst/max(d,dbeta(.5,.5,.5)))^2)*c(-1,1)
#output
curve(dbeta(x,.5,.5),0,1,ylab="density",lwd=2,col="steelblue",n=1001)
x=runif(1);u=runif(1)*dbeta(x,.5,.5);dote(x,u)
for (t in 1:100){ #100 slice steps
  bo=hitden(u)
  nx=sample(c(runif(1,0,bo[1]),runif(1,bo[2],1)),1)
  nu=runif(1)*dbeta(nx,.5,.5)
  mote(x,u,nx,nu)
  x=nx;u=nu;dote(x,u)
}


which clearly explores the whole area under the Beta(1/2,1/2) density. Even when started at a large density value like f(.999999), it eventually leaves the vicinity of this highly improbable value.
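To check this last claim numerically, here is a standalone rerun of the same slice kernel (with the plotting calls stripped out) started at x = 1 − 10⁻⁶, where the density is around 300; the chain soon visits the centre of (0,1) despite the extreme starting point:

```r
set.seed(2)
cst <- dbeta(.5, .5, .5) * .5            # normalising constant, as above
hitden <- function(d)                    # slice boundaries at level d
  .5 + .5 * sqrt(1 - 4 * (cst / max(d, dbeta(.5, .5, .5)))^2) * c(-1, 1)
x <- 1 - 1e-6                            # start where dbeta(x,.5,.5) is ~3e2
xs <- numeric(100)
for (t in 1:100){
  u  <- runif(1) * dbeta(x, .5, .5)      # uniform height under f(x)
  bo <- hitden(u)                        # slice is (0,bo[1]) U (bo[2],1)
  x  <- sample(c(runif(1, 0, bo[1]), runif(1, bo[2], 1)), 1)
  xs[t] <- x
}
```

The intuition is the one given above: the vertical step u ~ U(0, f(x)) typically shrinks below the central density level within a few iterations, at which point the horizontal step can land anywhere in (0,1).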

## Trip to Lyon

Posted in Mountains, pictures, R, Statistics, University life on January 21, 2011 by xi'an

This was my first trip to Lyon in about… 35 years, I think, but I did not have much time to tour the city! My original plan was to go climbing with Ivan near La Meije right after the talk, but our respective knees had been hurting for at least a week (since Utah in my case…), so we decided to postpone. I think the talk itself went rather ok (?!), with good questions about possible extensions at the end. And Martyn Plummer (the creator and maintainer of JAGS) came up with a most relevant question about the difficult comparison with completely independent MCMC chains, a point Colin also raised a while ago.

## CoRe in CiRM [3]

Posted in Books, R, Statistics, University life on July 11, 2010 by xi'an

Still drudging along preparing the new edition of Bayesian Core. I am almost done with the normal chapter, where I also changed the Monte Carlo section to include specific tools (bridge sampling) for evidence/Bayes factor approximation. Jean-Michel has now moved to the new hierarchical model chapter and analysed the longitudinal datasets that will constitute the core of the chapter, along with the introduction of DAGs. (Meaning some time “wasted” on creating DAG graphs, I am afraid!) He is also considering including a comparison with OpenBUGS and JAGS implementations, with the convincing motive that hierarchical models are used in settings where practitioners have no time/ability to derive Gibbs samplers for the whole collection of models they want to compare… And we are vaguely wondering whether or not to use a pun in the title, from Bayesian Core to Bayesian CoRe, in order to emphasise the R link. This morning, it took me half an hour to figure out how to reset new counters (like our exercise environment) in LaTeX,

\makeatletter
\@addtoreset{exercise}{chapter} % one possibility: reset the exercise counter with each chapter
\makeatother