Archive for Julian Besag

amazing appendix

Posted in Books, Statistics, Travel, University life on February 13, 2018 by xi'an

In the first appendix of the 1995 Statistical Science paper of Besag, Green, Higdon and Mengersen, on MCMC, “Bayesian Computation and Stochastic Systems”, stands a fairly neat result I was not aware of (and which Arnaud Doucet, with his unrivalled knowledge of the literature!, pointed out to me in Oxford, sparing me the tedium of trying to prove it afresh!). I remember well reading a version of the paper in Fort Collins, Colorado, in 1993 (I think!) but remember nothing about this result.

It goes as follows: when running a Metropolis-within-Gibbs sampler for component x¹ of a collection of variates x¹,x²,…, thus aiming at simulating from the full conditional of x¹ given x⁻¹ by making a proposal q(x|x¹,x⁻¹), it is perfectly acceptable to use a proposal that depends on a parameter α (no surprise so far!), to generate this parameter α anew at each iteration (still unsurprising as α can be taken as an auxiliary variable), and to have the distribution of this parameter α depend on the other variates x²,…, i.e., on x⁻¹. This is the surprising part, as adding α as an auxiliary variable would mess up the update of x⁻¹. But the proof as found in the 1995 paper [page 35] does not require considering α as such, as it establishes global balance directly. (Or maybe still detailed balance when writing the whole Gibbs sampler as a cycle of Metropolis steps.) Terrific! And a whiff mysterious..!
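To make the setting concrete, here is a minimal Python sketch, not taken from the 1995 paper, on an assumed toy target: a bivariate normal with unit variances and correlation rho. Component x¹ is updated by a random-walk Metropolis step whose scale α is redrawn at every iteration from a distribution that depends on x², while x² is updated from its exact full conditional. The function name `mwg_sampler`, the target, and the Uniform(0.1, 1+|x²|) law for α are all illustrative assumptions.

```python
import math
import random

def mwg_sampler(n_iter, rho=0.5, seed=0):
    """Metropolis-within-Gibbs on a bivariate normal (toy target, assumed here).

    The random-walk scale alpha is redrawn at each iteration from a law
    that depends on the other component x2 (hypothetical choice).
    """
    rng = random.Random(seed)
    x1, x2 = 0.0, 0.0
    s = math.sqrt(1.0 - rho * rho)  # conditional standard deviation

    def log_cond(u, v):
        # log density, up to a constant, of x1 = u given x2 = v
        return -0.5 * ((u - rho * v) / s) ** 2

    draws = []
    for _ in range(n_iter):
        # alpha depends on x2 only, and x2 is held fixed during the x1 update
        alpha = rng.uniform(0.1, 1.0 + abs(x2))
        prop = x1 + alpha * rng.gauss(0.0, 1.0)  # symmetric proposal given alpha
        log_ratio = log_cond(prop, x2) - log_cond(x1, x2)
        if math.log(1.0 - rng.random()) < log_ratio:
            x1 = prop
        # exact Gibbs update of x2 given x1
        x2 = rng.gauss(rho * x1, s)
        draws.append((x1, x2))
    return draws
```

For a fixed α the proposal is symmetric, so the acceptance ratio only involves the full conditional of x¹. Since the law of α depends on x² alone, which does not move during this step, the x¹ update is a mixture of kernels that each leave this conditional invariant. Calling `mwg_sampler(20000)` and comparing the empirical means and correlation of the draws with 0 and rho gives a quick sanity check.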

Gibbs for kidds

Posted in Books, Kids, Statistics, University life on February 12, 2018 by xi'an


A chance (?) question on X validated brought me to re-read Gibbs for Kids, 25 years after it was written (by my close friends George and Ed). The originator of the question had difficulties with the implementation, apparently missing the cyclic pattern of the sampler, as in equations (2.3) and (2.4), and with the convergence, which is only addressed for a finite support in the American Statistician paper. The paper [which did not appear in American Statistician under this title!, but inspired an animal breeder, Dan Gianola, to write a “Gibbs for pigs” presentation in 1993 at the 44th Annual Meeting of the European Association for Animal Production, Aarhus, Denmark!!!] most appropriately only contains toy examples since those can be processed and compared to known stationary measures. This is for instance the case for the auto-exponential model

f(x,y) ∝ exp(-xy)

which is only defined as a probability density for a compact support. (The paper does not identify the model as a special case of auto-exponential model, which apparently made the originator of the model, Julian Besag in 1974, unhappy, as George and I found out when visiting Bath, where Julian was spending the final year of his life, many years later.) I use the limiting case all the time in class to point out that a Gibbs sampler can be devised and operate without a stationary probability distribution. However, being picky!, I would like to point out that, contrary to a comment made in the paper, the Gibbs sampler does not “fail” but on the contrary still “converges” in this case, in the sense that a conditional ergodic theorem applies, i.e., the ratio of the frequencies of visits to two sets A and B with finite measure does converge to the ratio of these measures. For instance, running the Gibbs sampler for 10⁶ steps and checking the relative frequencies of x’s in (1,2) and (1,3) gives 0.685, versus log(2)/log(3)=0.63, since 1/x is the stationary measure. One important and influential feature of the paper is to stress that proper conditionals do not imply proper joints. George would work much further on that topic, in particular with his PhD student at the time, my friend Jim Hobert.
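As an illustration of this frequency check, here is a minimal Python sketch of the Gibbs sampler in the limiting case where the support is the whole positive quadrant, so that x given y is exponential with rate y and y given x is exponential with rate x. Working on the log scale is an assumption of this sketch, not something stated in the post: since each update multiplies x by a ratio of two Exp(1) variates, log x performs a symmetric random walk, and tracking log x avoids overflow or underflow over long runs. The names `positive_exp` and `visit_counts` are hypothetical.

```python
import math
import random

def positive_exp(rng):
    """Draw an Exp(1) variate, redrawing in the negligible event of an exact zero."""
    e = rng.expovariate(1.0)
    while e == 0.0:
        e = rng.expovariate(1.0)
    return e

def visit_counts(n_iter, seed=0):
    """Count visits of x to (1,2) and to (1,3) along the Gibbs chain.

    Conditionals for f(x,y) proportional to exp(-xy) on the positive quadrant:
    y | x ~ Exp(rate x) and x | y ~ Exp(rate y), simulated on the log scale.
    """
    rng = random.Random(seed)
    log_x = 0.0  # start at x = 1 (arbitrary choice)
    log_2, log_3 = math.log(2.0), math.log(3.0)
    in_12 = in_13 = 0
    for _ in range(n_iter):
        # y = E / x, hence log y = log E - log x
        log_y = math.log(positive_exp(rng)) - log_x
        # x = E' / y, hence log x = log E' - log y
        log_x = math.log(positive_exp(rng)) - log_y
        if 0.0 < log_x < log_3:
            in_13 += 1
            if log_x < log_2:
                in_12 += 1
    return in_12, in_13
```

Running `in_12, in_13 = visit_counts(10**6)` and computing `in_12 / in_13` gives an empirical ratio to set against log(2)/log(3). Because the chain has no stationary probability distribution, visits to any bounded set are comparatively rare, so the ratio may well stabilise slowly and vary from one seed to the next.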

With regard to the convergence issue, Gibbs for Kids points to Schervish and Carlin (1990), which came quite early considering that Gelfand and Smith published their initial paper the very same year, but which also adopts a functional approach to convergence, along the paper’s fixed point perspective, somehow complicating the matter. Later papers by Tierney (1994), Besag (1995), and Mengersen and Tweedie (1996) considerably simplified the answer, which is that irreducibility is a necessary and sufficient condition for convergence. (Incidentally, the reference list includes a technical report of mine on latent variable model MCMC implementation that never got published.)

Wilfred Keith Hastings [1930-2016]

Posted in Books, Mountains, pictures, Statistics, Travel, University life on December 9, 2016 by xi'an

A few days ago I found on the page Jeff Rosenthal has dedicated to Hastings that he had passed away peacefully on May 13, 2016 in Victoria, British Columbia, where he lived for 45 years as a professor at the University of Victoria, after holding positions at the University of Toronto, the University of Canterbury (New Zealand), and Bell Labs (New Jersey). As pointed out by Jeff, Hastings’ main paper is his 1970 Biometrika description of Markov chain Monte Carlo methods, “Monte Carlo sampling methods using Markov chains and their applications”, which would take close to twenty years to become known to the statistics world at large, although you can trace a path through Peskun (his only PhD student), Besag and others. I am sorry it took so long to come to my knowledge and also sorry it apparently went unnoticed by most of the computational statistics community.

no thesis no more?!

Posted in Kids, University life on August 5, 2016 by xi'an

“The traditional goal is to demonstrate the candidate’s ability to conduct independent research on a novel concept and to communicate the results in an accessible way. Where the academics differ is on how best to achieve that goal.”

Nature had an editorial and more on the changing nature of the PhD thesis and the possible abandonment of the thing. This is an interesting if radical proposal. There are many cases of highly successful research careers that bypassed the PhD station. Take for instance Julian Besag. On the one hand, what matters most for granting a PhD is the ability to produce independent research that is innovative enough. For this purpose, publication in a serious scientific journal is the right filter. (Serious as opposed to predatory.) Asking referees to review a thesis when the chapters have been published and hence refereed sounds like a waste of time. Nature also mentions the issue of the oral defence, which varies a lot across countries and institutions, from nonexistent to highly formal. If the relevance of the oral defence is to assess the ability of the candidate to present his or her work in an understandable manner, conferences should do. Except that talks are never assessed. If a speaker is poor, he or she may not get invited again by those who attended the talk. May. But this is somewhat secondary in that examples abound of geniuses who were or are unable to deliver good lectures, with no consequence on the quality of their research and collaborations. Collaboration is actually a sensitive aspect, as more and more of the papers that make up the PhD thesis are written jointly. Evaluations of the contribution of the PhD candidate then get delicate, especially when several PhDs are involved. (I used to refrain from co-signing publications with my students during their thesis, but I have loosened this rule in the past years as I find myself more involved in some projects and hence more eager or impatient to see the outcome completed!)

On the other hand, a PhD thesis may help in getting students to focus on broader issues, when compared with published short papers on marginal improvements in quick succession. But this may not be enough of an incentive. The status of the PhD student is also somewhat unique and provides a buffer between studies and a research position, where the student gradually morphs into a researcher (or gives up). If we were to abandon the PhD thesis, there would need to be some equivalent structure to give students status and financial support. However, most places accommodate graduate researchers and have the ability to support them for variable periods. It would just mean adjusting for longer durations and some degree of protection…

[The above picture is copied from the site that compiles theses published in France and produces some basic statistics, which are all wrong in my case!]

Bayes 250th versus Bayes 2.5.0.

Posted in Books, Statistics, Travel, University life on July 20, 2013 by xi'an

More than a year ago Michael Sørensen (2013 EMS Chair) and Fabrizio Ruggeri (then ISBA President) kindly invited me to deliver the memorial lecture on Thomas Bayes at the 2013 European Meeting of Statisticians, which takes place in Budapest today and the following week. I gladly accepted, although with some worries at having to cover a much wider range of the field than my own research topic. And then set to work on the slides in the past week, borrowing from my most “historical” lectures on Jeffreys and Keynes, my reply to Spanos, as well as getting a little help from my nonparametric friends (yes, I do have nonparametric friends!). Here is the result, providing a partial (meaning both incomplete and biased) vision of the field.

Since my talk is on Thursday, and because the talk is sponsored by ISBA, hence representing its members, please feel free to comment and suggest changes or additions as I can still incorporate them into the slides… (Warning, I purposefully kept some slides out to preserve the most surprising entry for the talk on Thursday!)

Biometrika, volume 100

Posted in Books, Statistics, University life on March 5, 2013 by xi'an

I had been privileged to have a look at a preliminary version of the now-published retrospective written by Mike Titterington on the 100 first issues of Biometrika (more exactly, “from volume 28 onwards”, as the title states). Mike was the dedicated editor of Biometrika for many years and edited a nice book for the 100th anniversary of the journal. He started from the 100 most highly cited papers within the journal to build a coherent chronological coverage. From a Bayesian perspective, this retrospective starts with Maurice Kendall trying to reconcile frequentists and non-frequentists in 1949, while having a hard time with fiducial statistics. Then Dennis Lindley makes it to the top 100 in 1957 with the Lindley-Jeffreys paradox. From 1958 till 1961, Darroch is quoted several times for his (fine) formalisation of the capture-recapture experiments we were to study much later (Biometrika, 1992) with Ed George… In the 1960’s, Bayesian papers became more visible, including Don Fraser (1961) and Arthur Dempster’s Dempster-Shafer theory of evidence, as well as George Box and co-authors (1965, 1968) and Arnold Zellner (1964). Keith Hastings’ 1970 paper stands as the fifth most highly cited paper, even though it was ignored for almost two decades. The number of Bayesian papers kept increasing, including Binder’s (1978) cluster estimation, Efron and Morris’ (1972) James-Stein estimators, and Efron and Thisted’s (1978) terrific evaluation of Shakespeare’s vocabulary. From then, the number of Bayesian papers gets too large to cover in its entirety. The 1980’s saw papers by Julian Besag (1977, 1989, 1989 with Peter Clifford, which was yet another precursor of MCMC) and Luke Tierney’s work (1989) on Laplace approximation. Carter and Kohn’s (1994) MCMC algorithm on state space models made it to the top 40, while Peter Green’s (1995) reversible jump algorithm came close to Hastings’ (1970) record, being the 8th most highly cited paper.
Since the more recent papers do not make it to the top 100 list, Mike Titterington’s coverage gets more exhaustive as the years draw near, with an almost complete coverage for the final years. Overall, a fascinating journey through the years and the reasons why Biometrika is such a great journal and consistently so.

When Buffon meets Bertrand

Posted in R, Statistics, Travel on April 7, 2011 by xi'an

When Peter Diggle gave his “short history” of spatial statistics this morning (I typed this in the taxi from Charles de Gaulle airport, after waiting one hour for my bag!), he started with a nice slide about Buffon’s needle (and Buffon’s portrait), since Julian Besag was often prone to give this problem as a final exam to Durham students (one of whom is responsible for the candidate’s formula). This started me thinking about how this was open to a Bertrand’s paradox of its own. Indeed, randomness for the needle throw can be represented in many ways:

  • needle centre uniformly distributed over the room (or the perpendicular to the boards) with a random orientation (with a provision to have the needle fit);
  • needle endpoint uniformly distributed over the room (again a uniform over the perpendicular is enough) with a random orientation (again with a constraint);
  • random orientation from one corner of the room and a uniform location of the centre on the resulting line (with constraints on both ends for the needle to fit);
  • random orientation from one corner of the room and a uniform location of one endpoint on the resulting line, plus a Bernoulli generation to decide on the orientation (with constraints on both ends for the needle to fit);
  • etc.

I did not have time to implement those different generation mechanisms in R, but have little doubt they should lead to different probabilities of intersection between the needle and one of the board separations. I actually found a web-page at the University of Alabama Huntsville addressing this problem through exercises (plus 20,000 related entries! Including von Mises’ Probability, Statistics and Truth itself. A book I should read one of these days, following Andrew.). Note that each version corresponds to a physical mechanism. Thus there is no way to distinguish between them. Had I time, I would also like to consider the limiting case when the room gets infinite as, presumably, some of those proposals would end up being identical.
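As a starting point for such a comparison, here is a minimal sketch of the first mechanism only, written in Python rather than R and under simplifying assumptions: the floor is idealised as infinite, so the position of the needle centre is taken uniform across a single board width and the provision for the needle to fit in the room is ignored. The function name `buffon_centre_mechanism` and its parameters are hypothetical. In this idealised setting, and for a needle no longer than the board width, the crossing probability is the classical 2l/(πd), which provides a reference value for the other mechanisms.

```python
import math
import random

def buffon_centre_mechanism(n_throws, needle=1.0, board=1.0, seed=0):
    """Estimate the crossing probability for the first mechanism.

    Assumptions of this sketch: infinite floor, centre uniform across one
    board width, orientation uniform, no constraint for fitting in a room.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_throws):
        # position of the centre across the board, perpendicular to the separations
        centre = rng.uniform(0.0, board)
        # angle between the needle and the perpendicular to the boards
        theta = rng.uniform(0.0, math.pi)
        # half-length of the needle's projection on that perpendicular
        half_span = 0.5 * needle * abs(math.cos(theta))
        # the needle crosses a separation if its projection leaves [0, board]
        if centre - half_span < 0.0 or centre + half_span > board:
            hits += 1
    return hits / n_throws
```

Comparing `buffon_centre_mechanism(200000)` with `2 / math.pi` checks the sketch in the case where needle and board have the same length. The remaining mechanisms of the list would require a finite room and the fitting constraints, which this sketch does not attempt.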