Archive for the Statistics Category

superintelligence [book review]

Posted in Books, Statistics, Travel, University life with tags , , , , , , , , , , , , , , on November 28, 2015 by xi'an

“The first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.” I.J. Good

I saw the nice cover of Superintelligence: paths, dangers, strategies by Nick Bostrom [owling at me!] at the OUP booth at JSM this summer—nice owl cover that comes will a little philosophical fable at the beginning about sparrows—and, after reading an in-depth review [in English] by Olle Häggström, on Häggström hävdar, asked OUP for a review copy. Which they sent immediately. The reason why I got (so) interested in the book is that I am quite surprised at the level of alertness about the dangers of artificial intelligence (or computer intelligence) taking over. As reported in an earlier blog, and with no expertise whatsoever in the field, I was not and am not convinced that the uncontrolled and exponential rise of non-human or non-completely human intelligences is the number One entry in Doom Day scenarios. (As made clear by Radford Neal and Corey Yanovsky in their comments, I know nothing worth reporting about those issues, but remain presumably irrationally more concerned about climate change and/or a return to barbarity than by the incoming reign of the machines.) Thus, having no competence in the least in either intelligence (!), artificial or human, or in philosophy and ethics, the following comments on the book only reflect my neophyte’s reactions. Which means the following rant should be mostly ignored! Except maybe on a rainy day like today…

“The ideal is that of the perfect Bayesian agent, one that makes probabilistically optimal use of available information.  This idea is unattainable (…) Accordingly, one can view artificial intelligence as a quest to find shortcuts…” (p.9)

Overall, the book stands much more at a philosophical and exploratory level than at attempting any engineering or technical assessment. The graphs found within are sketches rather than outputs of carefully estimated physical processes. There is thus hardly any indication how those super AIs could be coded towards super abilities to produce paper clips (but why on Earth would we need paper clips in a world dominated by AIs?!) or to involve all resources from an entire galaxy to explore even farther. The author envisions (mostly catastrophic) scenarios that require some suspension of belief and after a while I decided to read the book mostly as a higher form of science fiction, from which a series of lower form science fiction books could easily be constructed! Some passages reminded me quite forcibly of Philip K. Dick, less of electric sheep &tc. than of Ubik, where a superpowerful AI(s) turn humans into jar brains satisfied (or ensnared) with simulated virtual realities. Much less of Asimov’s novels as robots are hardly mentioned. And the third laws of robotics dismissed as ridiculously too simplistic (and too human). Continue reading

on the origin of the Bayes factor

Posted in Books, Statistics with tags , , , , , , , on November 27, 2015 by xi'an

Alexander Etz and Eric-Jan Wagenmakers from the Department of Psychology of the University of Amsterdam just arXived a paper on the invention of the Bayes factor. In particular, they highlight the role of John Burdon Sanderson (J.B.S.) Haldane in the use of the central tool for Bayesian comparison of hypotheses. In short, Haldane used a Bayes factor before Jeffreys did!

“The idea of a significance test, I suppose, putting half the probability into a constant being 0, and distributing the other half over a range of possible values.”H. Jeffreys

The authors analyse Jeffreys’ 1935 paper on significance tests, which appears to be the very first occurrence of a Bayes factor in his bibliography, testing whether or not two probabilities are equal. They also show the roots of this derivation in earlier papers by Dorothy Wrinch and Harold Jeffreys. [As an “aside”, the early contributions of Dorothy Wrinch to the foundations of 20th Century Bayesian statistics are hardly acknowledged. A shame, when considering they constitute the basis and more of Jeffreys’ 1931 Scientific Inference, Jeffreys who wrote in her necrology “I should like to put on record my appreciation of the substantial contribution she made to [our joint] work, which is the basis of all my later work on scientific inference.” In retrospect, Dorothy Wrinch should have been co-author to this book…] As early as 1919. These early papers by Wrinch and Jeffreys are foundational in that they elaborate a construction of prior distributions that will eventually see the Jeffreys non-informative prior as its final solution [Jeffreys priors that should be called Lhostes priors according to Steve Fienberg, although I think Ernest Lhoste only considered a limited number of transformations in his invariance rule]. The 1921 paper contains de facto the Bayes factor but it does not appear to be advocated as a tool per se for conducting significance tests.

“The historical records suggest that Haldane calculated the first Bayes factor, perhaps almost by accident, before Jeffreys did.” A. Etz and E.J. Wagenmakers

As another interesting aside, the historical account points out that Jeffreys came out in 1931 with what is now called Haldane’s prior for a Binomial proportion, proposed in 1931 (when the paper was read) and in 1932 (when the paper was published in the Mathematical Proceedings of the Cambridge Philosophical Society) by Haldane. The problem tackled by Haldane is again a significance on a Binomial probability. Contrary to the authors, I find the original (quoted) text quite clear, with a prior split before a uniform on [0,½] and a point mass at ½. Haldane uses a posterior odd [of 34.7] to compare both hypotheses but… I see no trace in the quoted material that he ends up using the Bayes factor as such, that is as his decision rule. (I acknowledge decision rule is anachronistic in this setting.) On the side, Haldane also implements model averaging. Hence my reading of this reading of the 1930’s literature is that it remains unclear that Haldane perceived the Bayes factor as a Bayesian [another anachronism] inference tool, upon which [and only which] significance tests could be conducted. That Haldane had a remarkably modern view of splitting the prior according to two orthogonal measures and of correctly deriving the posterior odds is quite clear. With the very neat trick of removing the infinite integral at p=0, an issue that Jeffreys was fighting with at the same time. In conclusion, I would thus rephrase the major finding of this paper as Haldane should get the priority in deriving the Bayesian significance test for point null hypotheses, rather than in deriving the Bayes factor. But this may be my biased views of Bayes factors speaking there…

Another amazing fact I gathered from the historical work of Etz and Wagenmakers is that Haldane and Jeffreys were geographically very close while working on the same problem and hence should have known and referenced their respective works. Which did not happen.

The Richard Price Society

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , on November 26, 2015 by xi'an

As an item of news coming to me via ISBA News, I learned of the Richard Price Society and of its endeavour to lobby for the Welsh government to purchase Richard Price‘s birthplace as an historical landmark. As discussed in a previous post, Price contributed so much to Bayes’ paper that one may wonder who made the major contribution. While I am not very much inclined in turning old buildings into museums, feel free to contact the Richard Price Society to support this action! Or to sign the petition there. Which I cannot resist but  reproduce in Welsh:

Datblygwch Fferm Tynton yn Ganolfan Ymwelwyr a Gwybodaeth

​Rydym yn galw ar Lywodraeth Cymru i gydnabod cyfraniad pwysig Dr Richard Price nid yn unig i’r Oes Oleuedig yn y ddeunawfed ganrif, ond hefyd i’r broses o greu’r byd modern yr ydym yn byw ynddo heddiw, a datblygu ei fan geni a chartref ei blentyndod yn ganolfan wybodaeth i ymwelwyr lle gall pobl o bob cenedl ac oed ddarganfod sut mae ei gyfraniadau sylweddol i ddiwinyddiaeth, mathemateg ac athroniaeth wedi dylanwadu ar y byd modern.

pseudo slice sampling

Posted in Books, Statistics, University life with tags , , , , , on November 26, 2015 by xi'an

The workshop in Warwick last week made me aware of (yet) another arXiv posting I had missed: Pseudo-marginal slice sampling by Iain Murray and Matthew Graham. The idea is to mix the pseudo-marginal approach of Andrieu and Roberts (2009) with a noisy slice sampling scheme à la Neal (2003). The auxiliary random variable u used in the (pseudo-marginal) unbiased estimator of the target I(θ), Î(θ,u), and with distribution q(u) is merged with the random variable of interest so that the joint is


and a Metropolis-Hastings proposal on that target simulating from k(θ,θ’)q(u’) [meaning the auxiliary is simulated independently] recovers the pseudo-marginal Metropolis-Hastings ratio


(which is a nice alternative proof that the method works!). The novel idea in the paper is that the proposal on the auxiliary u can be of a different form, while remaining manageable. For instance, as a two-block Gibbs sampler. Or an elliptical slice sampler for the u component. The argument being that an independent update of u may lead the joint chain to get stuck. Among the illustrations in the paper, an Ising model (with no phase transition issue?) and a Gaussian process applied to the Pima Indian data set (despite a recent prohibition!). From the final discussion, I gather that the modification should be applicable to every (?) case when a pseudo-marginal approach is available, since the auxiliary distribution q(u) is treated as a black box. Quite an interesting read and proposal!

a programming bug with weird consequences

Posted in Kids, pictures, R, Statistics, University life with tags , , , , , , on November 25, 2015 by xi'an

One student of mine coded by mistake an independent Metropolis-Hastings algorithm with too small a variance in the proposal when compared with the target variance. Here is the R code of this implementation:

#target is N(0,1)
#proposal is N(0,.01)
for (t in 2:T){
  if (logu[t]>ratav){

It produces outputs of the following shape
smalvarwhich is quite amazing because of the small variance. The reason for the lengthy freezes of the chain is the occurrence with positive probability of realisations from the proposal with very small proposal density values, as they induce very small Metropolis-Hastings acceptance probabilities and are almost “impossible” to leave. This is due to the lack of control of the target, which is flat over the domain of the proposal for all practical purposes. Obviously, in such a setting, the outcome is unrelated with the N(0,1) target!

It is also unrelated with the normal proposal in that switching to a t distribution with 3 degrees of freedom produces a similar outcome:

It is only when using a Cauchy proposal that the pattern vanishes:

independent Metropolis-Hastings

Posted in Books, Statistics with tags , , , , , , on November 24, 2015 by xi'an

“In this paper we have demonstrated the potential benefits, both theoretical and practical, of the independence sampler over the random walk Metropolis algorithm.”

Peter Neal and Tsun Man Clement Lee arXived a paper on optimising the independent Metropolis-Hastings algorithm. I was a bit surprised at this “return” of the independent sampler, which I hardly mention in my lectures, so I had a look at the paper. The goal is to produce an equivalent to what Gelman, Gilks and Wild (1996) obtained for random walk samplers.  In the formal setting when the target is a product of n identical densities f, the optimal number k of components to update in one Metropolis-Hastings (within Gibbs) round is approximately 2.835/I, where I is the symmetrised Kullback-Leibler distance between the (univariate) target f and the independent proposal q. When I is finite. The most surprising part is that the optimal acceptance rate is again 0.234, as in the random walk case. This is surprising in that I usually associate the independent Metropolis-Hastings algorithm with high acceptance rates. But this is of course when calibrating the proposal q, not the block size k of the Gibbs part. Hence, while this calibration of the independent Metropolis-within-Gibbs sampler is worth the study and almost automatically applicable, it remains that it only applies to a certain category of problems where blocking can take place. As in the disease models illustrating the paper. And requires an adequate choice of proposal distribution for, otherwise, the above quote becomes inappropriate.

borderline infinite variance in importance sampling

Posted in Books, Kids, Statistics with tags , , , , , on November 23, 2015 by xi'an

borde1As I was still musing about the posts of last week around infinite variance importance sampling and its potential corrections, I wondered at whether or not there was a fundamental difference between “just” having a [finite] variance and “just” having none. In conjunction with Aki’s post. To get a better feeling, I ran a quick experiment with Exp(1) as the target and Exp(a) as the importance distribution. When estimating E[X]=1, the above graph opposes a=1.95 to a=2.05 (variance versus no variance, bright yellow versus wheat), a=2.95 to a=3.05 (third moment versus none, bright yellow versus wheat), and a=3.95 to a=4.05 (fourth moment versus none, bright yellow versus wheat). The graph below is the same for the estimation of E[exp(X/2)]=2, which has an integrand that is not square integrable under the target. Hence seems to require higher moments for the importance weight. Hard to derive universal theories from those two graphs, however they show that protection against sudden drifts in the estimation sequence. As an aside [not really!], apart from our rather confidential Confidence bands for Brownian motion and applications to Monte Carlo simulation with Wilfrid Kendall and Jean-Michel Marin, I do not know of many studies that consider the sequence of averages time-wise rather than across realisations at a given time and still think this is a more relevant perspective for simulation purposes.



Get every new post delivered to your Inbox.

Join 946 other followers