In the recent days, we have had a lively discussion among AEs of the Annals of Statistics, as to whether or not set up a policy regarding publications of documents that have already been published in a shortened (8 pages) version in a machine learning conference like NIPS. Or AISTATS. While I obviously cannot disclose details here, the debate is quite interesting and may bring the machine learning and statistics communities closer if resolved in a certain way. My own and personal opinion on that matter is that what matters most is what’s best for Annals of Statistics rather than the authors’ tenure or the different standards in the machine learning community. If the submitted paper is based on a brilliant and novel idea that can appeal to a sufficiently wide part of the readership and if the maths support of that idea is strong enough, we should publish the paper. Whether or not an eight-page preliminary version has been previously published in a conference proceeding like NIPS does not seem particularly relevant to me, as I find those short papers mostly unreadable and hence do not read them. Since Annals of Statistics runs an anti-plagiarism software that is most likely efficient, blatant cases of duplications could be avoided. Of course, this does not solve all issues and papers with similar contents can and will end up being published. However, this is also the case for statistics journals and statistics, in the sense that brilliant ideas sometimes end up being split between two or three major journals.
Archive for the Books Category
[A review of Bayesian Essentials that appeared in Technometrics two weeks ago, with the first author being rechristened Jean-Michael!]
“Overall this book is a very helpful and useful introduction to Bayesian methods of data analysis. I found the use of R, the code in the book, and the companion R package, bayess, to be helpful to those who want to begin using Bayesian methods in data analysis. One topic that I would like to see added is the use of Bayesian methods in change point problems, a topic that we found useful in a recent article and which could be added to the time series chapter. Overall this is a solid book and well worth considering by its intended audience.”
David E. BOOTH
Kent State University
Khoa Tran and Robert Kohn from UNSW just arXived a paper on a comprehensive derivation of a large range of MCMC algorithms, beyond Metropolis-Hastings. The idea is to decompose the MCMC move into
- a random completion of the current value θ into V;
- a deterministic move T from (θ,V) to (ξ,W), where only ξ matters.
If this sounds like a new version of Peter Green’s completion at the core of his 1995 RJMCMC algorithm, it is because it is indeed essentially the same notion. The resort to this completion allows for a standard form of the Metropolis-Hastings algorithm, which leads to the correct stationary distribution if T is self-inverse. This representation covers Metropolis-Hastings algorithms, Gibbs sampling, Metropolis-within-Gibbs and auxiliary variables methods, slice sampling, recursive proposals, directional sampling, Langevin and Hamiltonian Monte Carlo, NUTS sampling, pseudo-marginal Metropolis-Hastings algorithms, and pseudo-marginal Hamiltonian Monte Carlo, as discussed by the authors. Given this representation of the Markov chain through a random transform, I wonder if Peter Glynn’s trick mentioned in the previous post on retrospective Monte Carlo applies in this generic setting (as it could considerably improve convergence…)
Sid Chib, Minchul Shin, and Anna Simoni (CREST) recently arXived a paper entitled “Bayesian Empirical Likelihood Estimation and Comparison of Moment Condition Models“. That Sid mentioned to me in Sardinia. The core notion is related to earlier Bayesian forays into empirical likelihood pseudo-models, like Lazar (2005) or our PNAS paper with Kerrie Mengersen and Pierre Pudlo. Namely to build a pseudo-likelihood using empirical likelihood principles and to derive the posterior associated with this pseudo-likelihood. Some novel aspects are the introduction of tolerance (nuisance) extra-parameters when some constraints do not hold, a maximum entropy (or exponentially tilted) representation of the empirical likelihood function, and a Chib-Jeliazkov representation of the marginal likelihood. The authors obtain a Bernstein-von Mises theorem under correct specification. Meaning convergence. And another one under misspecification.
While the above Bernstein-von Mises theory is somewhat expected (if worth deriving) in the light of frequentist consistency results, the paper also considers a novel and exciting aspect, namely to compare models (or rather moment restrictions) by Bayes factors derived from empirical likelihoods. A grand (encompassing) model is obtained by considering all moment restrictions at once, which first sounds like more restricted, except that the extra-parameters are there to monitor constraints that actually hold. It is unclear from my cursory read of the paper whether priors on those extra-parameters can be automatically derived from a single prior. And how much they impact the value of the Bayes factor. The consistency results found in the paper do not seem to depend on the form of priors adopted for each model (for all three cases of both correctly, one correctly and none correctly specified models). Except maybe for some local asymptotic normality (LAN). Interestingly (?), the authors consider the Poisson versus Negative Binomial test we used in our testing by mixture paper. This paper is thus bringing a better view of the theoretical properties of a pseudo-Bayesian approach based on moment conditions and empirical likelihood approximations. Without a clear vision of the implementation details, from the parameterisation of the constraints (which could be tested the same way) to the construction of the prior(s) to the handling of MCMC difficulties in realistic models.
This is the cover page of Marco Banterle‘s thesis, who will defend on Thursday [July 21, 13:00], at a rather quiet time for French universities, which is one reason for advertising it here. The thesis is built around several of Marco’s papers, like delayed acceptance, dimension expansion, and Gaussian copula for graphical models. The defence is open to everyone, so feel free to join if near Paris-Dauphine!
As I was previously unaware of this book coming up, my surprise and excitement were both extreme when I received it from CRC Press a few weeks ago! John Chambers, one of the fathers of S, precursor of R, had just published a book about extending R. It covers some reflections of the author on programming and the story of R (Parts 2 and 1), and then focus on object-oriented programming (Part 3) and the interfaces from R to other languages (Part 4). While this is “only” a programming book, and thus not strictly appealing to statisticians, reading one of the original actors’ thoughts on the past, present, and future of R is simply fantastic!!! And John Chambers is definitely not calling to simply start over and build something better, as Ross Ihaka did in this [most read] post a few years ago. (It is also great to see the names of friends appearing at times, like Julie, Luke, and Duncan!)
“I wrote most of the original software for S3 methods, which were useful for their application, in the early 1990s.”
In the (hi)story part, Chambers delves into the details of the evolution of S at Bells Labs, as described in his [first] “blue book” (which I kept on my shelf until very recently, next to the “white book“!) and of the occurrence of R in the mid-1990s. I find those sections fascinating maybe the more because I am somewhat of a contemporary, having first learned Fortran (and Pascal) in the mid-1980’s, before moving in the early 1990s to C (that I mostly coded as translated Pascal!), S-plus and eventually R, in conjunction with a (forced) migration from Unix to Linux, as my local computer managers abandoned Unix and mainframe in favour of some virtual Windows machines. And as I started running R on laptops with the help of friends more skilled than I (again keeping some of the early R manuals on my shelf until recently). Maybe one of the most surprising things about those reminiscences is that the very first version of R was dated Feb 29, 2000! Not because of Feb 29, 2000 (which, as Chambers points out, is the first use of the third-order correction to the Gregorian calendar, although I would have thought 1600 was the first one), but because I would have thought it appeared earlier, in conjunction with my first Linux laptop, but this memory is alas getting too vague!
As indicated above, the book is mostly about programming, which means in my case that some sections are definitely beyond my reach! For instance, reading “the onus is on the person writing the calling function to avoid using a reference object as the argument to an existing function that expects a named list” is not immediately clear… Nonetheless, most sections are readable [at my level] and enlightening about the mottoes “everything that exists is an object” and “everything that happens is a function” repeated throughout. (And about my psycho-rigid ways of translating Pascal into every other language!) I obviously learned about new commands and notions, like the difference between
x <- 3
x <<- 3
(but I was disappointed to learn that the number of <‘s was not related with the depth or height of the allocation!) In particular, I found the part about replacement fascinating, explaining how a command like
diag(x)[i] = 3
could modify x directly. (While definitely worth reading, the chapter on R packages could have benefited from more details. But as Chambers points out there are whole books about this.) Overall, I am afraid the book will not improve my (limited) way of programming in R but I definitely recommend it to anyone even moderately skilled in the language.
And yet another series [suggested by Amazon] I chose at random after reading the summary… The Grisha trilogy was written by Leigh Bardugo and is told by Alina Starkov, a teenage orphan from the fantasy land of Ravka [sounds like Russia, doesn’t it?!] who suddenly discovers powers she did not suspect when fighting supernatural forces. And embarks on a bleak adventure with her childhood friend to safe their country from dark forces. A rather standard trope for the fantasy literature.. The books read well, in a light sense (or mind candy variety, to borrow from the Three-Toed Sloth blog) if addictive. I went over the first one, Shadow and Bone, within a travel day to München and back. Certainly not a major trilogy. And still, those books attracted massive and enthusiastic reviews (one for each book, from different young readers) in The Guardian! And another one in the NYT, nothing less… The explanation is that what I did not get before starting the trilogy [but started suspecting well into the first volume] this is a young adult (or teenager) series. Or even a children’s book, according to The Guardian! So do not expect any level of subtlety or elaborate plots or clever connections with our own world history. Even the Russian environment is caricaturesque with an annoying flow of kvas and tea and caftans. One character is closely related to Rasputin, the ruling family reminds me of the Romanovs, old and grumpy babushkas pop in now and then, the heroes hunt a firebird, &tc. And still the addiction operates to some level. [Try at your own risk and give the books to younger readers if it does not work!]