However, I got most interested in another comment by MacCoun and Perlmutter, where they advocate a systematic blinding of data to avoid conscious or unconscious biases. While I deem the idea quite interesting and connected with anonymisation techniques in data privacy, I find the presentation rather naïve in its goals (from a statistical perspective). Indeed, if we consider data produced by a scientific experiment towards the validation or invalidation of a scientific hypothesis, it usually stands on its own, with no other experiment of a similar kind to refer to. Add too much noise and only noise remains. Add too little and the original data remains visible. This means it is quite difficult to calibrate the blinding mechanism so that the blinded data remains realistic enough to be analysed, while being different enough from the original data for different conclusions to be drawn. The authors suggest blinding be done by software, by adding noise, bias, label switching, &tc. But I do not think this blinding can be done blindly, i.e., without a clear idea of what the possible models are, so that the perturbed datasets created out of the original data favour one of the models under comparison more than the others. And are realistic for at least one of those models. Thus, some preliminary analysis of the original data or of some pseudo-data from each of the proposed models is somewhat unavoidable to calibrate the blinding machinery towards realistic values. If designing a new model is part of the inferential goals, this may prove impossible… Again, I think having several analyses run in parallel with several perturbed datasets is quite a good idea to detect the impact of some prior assumptions. But this requires statistically savvy programmers. And possibly informative prior distributions.
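As a back-of-the-envelope illustration of this parallel-perturbation idea, here is a minimal Python sketch (the additive Gaussian blinding, the noise scales, and the sample-mean "analysis" are all hypothetical choices made for illustration, not the authors' proposal):

```python
import random
import statistics

random.seed(42)

# Original (unblinded) data: hypothetical measurements
data = [random.gauss(10.0, 2.0) for _ in range(200)]

def blind(data, noise_sd, seed):
    """Return a perturbed (blinded) copy of the data via additive Gaussian noise."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise_sd) for x in data]

# Run the same analysis (here, simply the sample mean) on several blinded
# copies, to gauge how sensitive the conclusion is to the blinding scale:
# too little noise and all copies agree with the original, too much and
# the copies agree with nothing.
for sd in (0.1, 1.0, 10.0):
    means = [statistics.mean(blind(data, sd, s)) for s in range(5)]
    spread = max(means) - min(means)
    print(f"noise sd={sd:>4}: spread of blinded means = {spread:.3f}")
```

The calibration difficulty discussed above shows up directly: the noise scale controlling the spread has to be chosen relative to the analysis one intends to run.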

Filed under: Books, Statistics Tagged: blinding, data privacy, maths house, Nature, Red State Blue State, reproducible research, Royal Statistical Society, Sally Clark, University of Warwick

The paper does not start in the best possible light since it seems to justify the use of a sample mean through maximum likelihood estimation, which is only the case for a limited number of probability distributions (including the Normal distribution, which may be an implicit assumption). For instance, when the data is Student’s t, the MLE is not the sample mean, no matter how shocking that might sound! (And while this is a minor issue, results about the Stein effect taking place in non-normal settings appear much earlier than 1998. And earlier than in my dissertation. See, e.g., Berger and Bock (1975), or Brandwein and Strawderman (1978).)
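To see the point numerically, one can compare the sample mean with the location MLE under a Student t(3) likelihood on a toy sample containing one gross outlier (a sketch, with a unit scale assumed and a crude grid search standing in for a proper optimiser):

```python
import math

# Data with one gross outlier: the sample mean is dragged toward it,
# while the location MLE under a Student t(3) density is not.
data = [-1.2, -0.5, 0.1, 0.3, 0.8, 1.1, 25.0]
mean = sum(data) / len(data)

def t3_loglik(mu, xs):
    """Log-likelihood (up to a constant) of a Student t with 3 degrees of
    freedom, location mu, and unit scale."""
    return sum(-2.0 * math.log(1.0 + (x - mu) ** 2 / 3.0) for x in xs)

# Crude grid search for the MLE of the location parameter
grid = [i / 1000.0 for i in range(-3000, 3001)]
mle = max(grid, key=lambda mu: t3_loglik(mu, data))

print(f"sample mean = {mean:.3f}, t(3) location MLE = {mle:.3f}")
```

The MLE sits near the bulk of the data while the sample mean is pulled far away by the single outlier, so the two estimators clearly differ.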

While the linear regression explanation for the Stein effect was already exposed in Steve Stigler’s Neyman Lecture, I still have difficulties with the argument in that, for instance, we do not know the value of the parameter, which makes the regression and the inverse regression of parameter means over Gaussian observations mere concepts and nothing practical. (Except for the interesting result that two observations make both regressions coincide.) And it does not seem at all intuitive (to me) that imposing a constraint should improve the efficiency of a maximisation program…

Another difficulty I have with the discussion of the case against the MLE is not that there exist admissible estimators that dominate the MLE (when k≥5, as demonstrated by Bill Strawderman in 1975), but on the contrary that (a) there is an infinity of them and (b) they do not come out as closed-form expressions. Even for James and Stein’s or Efron and Morris’ shrinkage estimators, there exists a continuum of them, with no classical reason for picking one against the others.
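For concreteness, this continuum can be exhibited directly: every estimator of the form (1 − c/‖x‖²)x with 0 < c < 2(k−2) dominates the MLE under quadratic loss, the choice c = k−2 giving James–Stein. A quick Monte Carlo sketch (the true mean vector and the simulation sizes are arbitrary choices for illustration):

```python
import random

random.seed(1)
k, nsim = 10, 2000
theta = [2.0] * k  # true mean vector (hypothetical choice)

def risk(estimator):
    """Monte Carlo estimate of the quadratic risk E||delta(X) - theta||^2
    for X ~ N(theta, I_k)."""
    total = 0.0
    for _ in range(nsim):
        x = [random.gauss(t, 1.0) for t in theta]
        d = estimator(x)
        total += sum((di - ti) ** 2 for di, ti in zip(d, theta))
    return total / nsim

def shrink(c):
    """Shrinkage estimator (1 - c/||x||^2) x; c = k-2 is James-Stein."""
    def est(x):
        s = sum(xi * xi for xi in x)
        return [(1.0 - c / s) * xi for xi in x]
    return est

print(f"MLE risk: {risk(lambda x: x):.3f} (theoretical value is k = {k})")
for c in (k - 2, 2 * (k - 2) - 1):  # two members of the dominating continuum
    print(f"c={c}: risk {risk(shrink(c)):.3f}")
```

Both values of c (and any other in the open interval) beat the MLE, which is exactly the embarrassment of riches described above: nothing in classical theory says which one to use.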

Not that it really matters, but I also find rechristening the Stein phenomenon as *holistic pragmatism* somewhat inappropriate. Or just ungrounded. It seems to me the phenomenon simply relates to collective decision paradoxes, with multidimensional or multi-criteria utility functions having no way of converging to a collective optimum. As illustrated in [Lakanal alumni] Allais’ paradox.

“We think the most plausible Bayesian response to Stein’s results is to either reject them outright or to adopt an instrumentalist view of personal probabilities.”

The part connecting Stein with Bayes again starts on the wrong foot, since it is untrue that *any* shrinkage estimator can be expressed as a Bayes posterior mean. This is not even true for the *original* James-Stein estimator, i.e., it is not a Bayes estimator and cannot be a Bayes posterior mean. I also neither understand nor relate to the notion of “Bayesians of the first kind”, especially when it merges with an empirical Bayes argument. More globally, the whole discourse about Bayesians “taking account of Stein’s result” does not stand on very sound ground, because Bayesians automatically integrate the shrinkage phenomenon when minimising a posterior loss, rather than trying to accommodate it as a special goal. Laughing (in the paper) at the prior assumption that all means should be “close” to zero or “close together” does not account for the choice of the location (zero) or scale (one) when measuring quantities of interest. Nor for the fact that Stein’s effect holds even when the means are far from zero or from being similar, albeit as a minuscule effect. That is, when the prior disagrees with the data, because Stein’s phenomenon is a frequentist occurrence. What I find amusing is instead to mention a “prior whose probability mass is centred about the sample mean”. (I am unsure the authors are aware that the shrinkage effect is irrelevant for all practical purposes unless the true means are close to the shrinkage centre.) *And* to state that improper priors “integrate to a number larger than 1” and that “it’s not possible to be more than 100% confident in anything”… *And* to confuse the Likelihood Principle with the prohibition of data dependent priors. *And* to consider that the MLE and any shrinkage estimator have the same expected utility under a flat prior (since, if they had, there would be no Bayes estimator!). The only part with which I can agree is, again, that Stein’s phenomenon is a *frequentist* notion.
But one that induces us to use Bayes estimators as the only coherent way to make use of the loss function. The paper is actually silent about the duality existing between losses and priors, a duality that would put Stein’s effect into a totally different light, as expressed e.g. in Herman Rubin’s paper. Because shrinkage, both in its existence and in its magnitude, is deeply connected with the choice of the loss function, arguing against an almost universal Bayesian perspective on shrinkage while adhering to a single loss function is rather paradoxical. Similarly, very little of substance can be found about empirical Bayes estimation and its philosophical foundations.
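To make the "automatic" nature of Bayesian shrinkage concrete: in the textbook normal-normal model, the posterior mean is a linear shrinkage of the observation toward the prior mean, with no Stein-motivated adjustment ever needed (a minimal sketch, assuming a unit observation variance):

```python
# Conjugate normal-normal model: x | theta ~ N(theta, 1), theta ~ N(0, tau^2).
# The posterior mean is a linear shrinkage of x toward the prior mean 0,
# with shrinkage factor tau^2 / (1 + tau^2): shrinkage comes for free
# from minimising the posterior quadratic loss.
def posterior_mean(x, tau2):
    return tau2 / (1.0 + tau2) * x

x = 3.0
for tau2 in (0.5, 1.0, 10.0):
    print(f"tau^2 = {tau2}: posterior mean = {posterior_mean(x, tau2):.3f}")
```

As the prior variance grows, the shrinkage factor tends to one and the posterior mean approaches the MLE, illustrating how the amount of shrinkage is governed jointly by prior and loss rather than grafted on afterwards.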

While it is generally agreed that shrinkage estimators trade some bias for a decrease in variance, the connection with AIC is at best tenuous. Because AIC or other model choice tools are not estimation devices *per se*. And because they force infinite shrinkage, namely to have some components of the estimator precisely equal to zero. Which is an impossibility for Bayes estimates. A much more natural (and already made) connection would be to relate shrinkage and LASSO estimators, since the difference can be rephrased as the opposition between Gaussian and Laplace priors.
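The Gaussian-versus-Laplace contrast is easiest to see in the orthonormal-design case, where ridge regression (Gaussian prior, MAP) merely rescales the least-squares coefficients while the lasso (Laplace prior, MAP) soft-thresholds them to exact zeros (a sketch; the coefficient values and penalty level are arbitrary illustrations):

```python
# With an orthonormal design, penalised least squares acts coordinate-wise
# on the OLS coefficients:
#  - ridge (Gaussian prior) shrinks every coefficient by the same factor,
#  - lasso (Laplace prior) soft-thresholds, setting small ones exactly to 0.
def ridge(b, lam):
    return [bi / (1.0 + lam) for bi in b]

def lasso(b, lam):
    return [max(abs(bi) - lam, 0.0) * (1 if bi > 0 else -1) for bi in b]

ols = [3.0, -0.4, 0.2, -2.5]
print("ridge:", ridge(ols, 0.5))
print("lasso:", lasso(ols, 0.5))
```

The lasso output contains exact zeros, the infinite-shrinkage behaviour mentioned above, which no (proper) Bayes posterior mean can reproduce.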

I also object to the concept of “linguistic invariance”, which simply means (for me) absolute invariance, namely that the estimate of the transform must be the transform of the estimate for any and all transforms. Which holds for the MLE. But also, contrary to the authors’ assertion, for Bayes estimation under my intrinsic loss functions.

“But when and how problems should be lumped together or split apart remains an important open problem in statistics.”

The authors correctly point out the accuracy of AIC (over BIC) for making predictions, but shrinkage does not necessarily suffer by comparison, as Stein’s phenomenon also holds for prediction, provided enough values are predicted at the same time… I also object to the envisioned possibility of a shrinkage estimator that would improve every component of the MLE (in a uniform sense), as it contradicts the admissibility of the single-component MLE! And the above quote shows that the decision-theoretic part of inference is not properly incorporated.

Overall, I thus clearly wonder at the purpose of the paper, given the detailed coverage of many aspects of the Stein phenomenon provided by Stephen Stigler and others over the years. Obviously, a new perspective is always welcome, but this paper somewhat lacks appeal, while missing the essential features that make the Stein phenomenon look like a poor relative of Bayesian inference. In my opinion, Stein’s phenomenon remains an epiphenomenon, which rather signals the end of the search for a gold standard in frequentist estimation than the opening of a new era of estimation. It pushed me almost irresistibly into Bayesianism, a move I do not regret to this day! *In fine*, I also have trouble seeing Stein’s phenomenon as durably impacting the field, more than 50 years later, and hence think it remains of little importance for epistemology and philosophy of science. Except maybe for marking the end of an era, when the search for “the” ideal estimator was still on the agenda.

Filed under: Books, pictures, Statistics, University life Tagged: Bayesian Analysis, Bayesian Choice, Charles Stein, decision theory, frequentist inference, James-Stein estimator, loss functions, philosophy of sciences, Stein effect, Stein's phenomenon, Stephen Stigler

**A**s it is the latest of Neal Stephenson’s novels, I was most eagerly waiting to receive Seveneves (or SevenEves). Now that I have read it, I am a bit disappointed by the book. It is a terrific concept, full of ideas, linking our current society and its limitations with what a society exiled in space could become, and with a great style as well, but as far as the story itself goes I have trouble buying it! In short, there is too much technology and not enough psychology, too many details and not enough of a grand scheme… This certainly is far from being the best book of the author, when compared with Snow Crash, Cryptonomicon, Anathem, or Reamde for instance. Even the fairly long and meandering Baroque Cycle comes out on top of this space opera à la Arthur Clarke (if only for the cables linking Earth and space stations at 36,000 kilometres…).

The basis of Seveneves is a catastrophic explosion of our Moon that leads to the obliteration of life on Earth within a range of two years. The only way out is to send a small number of people to a space station with enough genetic material to preserve the diversity of the human species. Two-thirds of the book is about the frantic scramble to make this possible. Then Earth is bombarded by pieces of the Moon, while the inhabitants of the expanded space station try to get organised and to harvest more energy from icy asteroids to get out of the way, while badly fighting for power. This leads the crowd of survivors to eventually dwindle to seven women, hence the seven Eves. Then, after a five thousand year hiatus, the last part of the book deals with the new human society, hanging in a gigantic sphere of space modules around the regenerated Earth, where we follow a team of seven (!) characters whose goal is not exactly crystal clear.

While most books by Stephenson manage to produce a good plot on top of fantastic ideas, with some characters developed with enough depth to be really compelling, this one is missing at the plot level and even more at the character level, maybe because we know most characters are supposed to die very early in the story. But they do look like caricatures, frankly! And behave like kids astray on a desert island. Unless I missed the deeper message… The construction of the spatial mega-station is described in such detail that it hurts!, but some logistic details on how to produce food or energy are clearly missing. Also missing is the feat of reconstituting an entire human species out of *seven* women, even with a huge bank of human DNA. The description of the station five thousand years later is even more excruciatingly precise, at a stage where I had mostly lost interest in the story, especially on finding very few differences in the way the new and the old societies operate. And, to avoid spoilers [ROT13]: gur er-nccnevgvba bs gur gjb tebhcf bs crbcyr jub erznvarq ba Rnegu, rvgure uvqqra va n qrrc pnir be ng gur obggbz bs gur qrrcrfg gerapu, vf pbzcyrgryl vzcynhfvoyr, sbe ubj gurl pbhyq unir fheivirq bire gubhfnaqf bs lrnef jvgu ab npprff gb erfbheprf rkprcg jung gurl unq cnpxrq ng gur ortvaavat… It took me some effort, and then some, during several sleepless nights to get through this long book, and I remain perplexed at the result, given the past masterpieces of the author.

Filed under: Books, Kids Tagged: Anathem, echidna, Neal Stephenson, ROT13, Seveneves, Snow Crash, space opera

*“The first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.”* I.J. Good

**I** saw the nice cover of Superintelligence: paths, dangers, strategies by Nick Bostrom [owling at me!] at the OUP booth at JSM this summer—a nice owl cover that comes with a little philosophical fable at the beginning about sparrows—and, after reading an in-depth review [in English] by Olle Häggström, on Häggström hävdar, asked OUP for a review copy. Which they sent immediately. The reason why I got (so) interested in the book is that I am quite surprised at the level of alertness about the dangers of artificial intelligence (or computer intelligence) taking over. As reported in an earlier post, and with no expertise whatsoever in the field, I was not and am not convinced that the uncontrolled and exponential rise of non-human or non-completely human intelligences is the number one entry in doomsday scenarios. (As made clear by Radford Neal and Corey Yanovsky in their comments, I know nothing worth reporting about those issues, but remain presumably irrationally more concerned about climate change and/or a return to barbarity than by the incoming reign of the machines.) Thus, having no competence in the least in either intelligence (!), artificial or human, or in philosophy and ethics, the following comments on the book only reflect my neophyte’s reactions. *Which means the following rant should be mostly ignored! Except maybe on a rainy day like today…*

“The ideal is that of the perfect Bayesian agent, one that makes probabilistically optimal use of available information. This idea is unattainable (…) Accordingly, one can view artificial intelligence as a quest to find shortcuts…” (p.9)

Overall, the book stands much more at a philosophical and exploratory level than at attempting any engineering or technical assessment. The graphs found within are sketches rather than outputs of carefully estimated physical processes. There is thus hardly any indication of how those super-AIs could be coded towards super-abilities to produce paper clips (but why on Earth would we need paper clips in a world dominated by AIs?!) or towards involving all resources of an entire galaxy to explore even farther. The author envisions (mostly catastrophic) scenarios that require some suspension of disbelief, and after a while I decided to read the book mostly as a higher form of science fiction, from which a series of lower-form science-fiction books could easily be constructed! Some passages reminded me quite forcefully of Philip K. Dick, less of electric sheep &tc. than of Ubik, where a superpowerful AI turns humans into jar brains satisfied with (or ensnared in) simulated virtual realities. Much less of Asimov’s novels, as robots are hardly mentioned. And the three laws of robotics are dismissed as ridiculously simplistic (and too human).

“These occasions grace us with the opportunity to abandon a life of overconfidence and resolve to become better Bayesians.” (p.130)

Another level at which to read the book is as a deep reflection on the notions of intelligence, ethics, and morality, in the human sense. Indeed, before defining and maybe controlling such notions for machines, we should reflect on how they are defined or coded for humans. I do not find the book very successful at this level (but, again, I know nothing!), as even intelligence does not get a clear definition, maybe because it is simply impossible to provide one. The section on embryo selection towards more intelligent newborns made me cringe, not only because of the eugenic overtones, but also because I am not aware of any characterisation so far of gene mutations promoting intelligence. (So far, of course, and the book generally assumes that any technology or advance that is conceivable now will eventually be achieved. Presumably thanks to our own species’ intelligence.) And of course the arguments get much less clear when ethics and morality are concerned. Which brings me to one question I kept asking myself when going through the book, namely why would we be interested in replicating a human brain and its operation, or in creating a superintelligent and self-enhancing machine, except for the sake of proving we can do it? With a secondary question: why would a superintelligent AI necessarily and invariably want to take over the world, a running assumption throughout the book?

“Within a Bayesian framework, we can think of the epistemology as a prior probability function.” (p.224)

While it is an easy counter-argument (and thus can be easily countered itself), notions that we can control the hegemonic tendencies of a powerful AI by appealing to utility and game theory are difficult to accept. This formalism hardly works for us (irrational) humans, so I see no reason why an inhuman form of intelligence could be thus constrained, as it can just as well pick another form of utility or game theory as it evolves, following an inhuman logic that we cannot even fathom. Everything is possible, not even the sky is a limit… Even the conjunction of super-AIs and of nano-technologies, from which we should be protected by the AI(s) if I follow the book (p.131). The difference between the two is actually a matter of perspective, as we can envision a swarm of nano-particles endowed with a collective super-intelligence…

“At a pre-set time, nanofactories producing nerve gas or target-seeking-mosquito-like robots might then burgeon forth simultaneously from every square metre of the globe.” (p.97)

Again, this is a leisurely read with no attempt at depth. If you want a deeper perspective, read for instance Olle Häggström’s review. Or ask Bill Gates, who “highly recommend[s] this book”, as indicated on the book cover. I found the book enjoyable in its systematic exploration of “all” possible scenarios and its connections with (Bayesian) decision theory and learning. As well as well-written, with a pleasant style, rich in references as well as theories, scholarly in its inclusion of as many aspects as possible, if possibly lacking some backup from a scientific perspective, and somehow too tentative and exploratory. I cannot say I am now frightened by the emergence of amoral super-AIs or, on the contrary, reassured that there could be ways of keeping them under human control. (A primary question I did not see processed and would have liked to see is why we should fight this emergence. If AIs are much more intelligent than us, shouldn’t we defer to this intelligence? Just like we cannot fathom chickens resisting their (unpleasant) fate, except in comics like Chicken Run… Thus completing the loop with the owl.)

Filed under: Books, Statistics, Travel, University life Tagged: 2001: A Space Odyssey, AIs, artificial intelligence, Bill Gates, Chicken Run, doomsday argument, ethics, HAL, intelligence, Isaac Asimov, JSM 2015, morality, Nick Bostrom, Philip K. DIck, Seattle

Filed under: Kids, pictures, Travel, Wines Tagged: 13 Novembre 2015, France, Paris

“The idea of a significance test, I suppose, putting half the probability into a constant being 0, and distributing the other half over a range of possible values.” H. Jeffreys

The authors analyse Jeffreys’ 1935 paper on significance tests, which appears to be the very first occurrence of a Bayes factor in his bibliography, testing whether or not two probabilities are equal. They also show the roots of this derivation in earlier papers by Dorothy Wrinch and Harold Jeffreys, as early as 1919. [As an “aside”, the early contributions of Dorothy Wrinch to the foundations of 20th Century Bayesian statistics are hardly acknowledged. A shame, when considering they constitute the basis and more of Jeffreys’ 1931 *Scientific Inference*, Jeffreys who wrote in her obituary “I should like to put on record my appreciation of the substantial contribution she made to [our joint] work, which is the basis of all my later work on scientific inference.” In retrospect, Dorothy Wrinch should have been co-author of this book…] These early papers by Wrinch and Jeffreys are foundational in that they elaborate a construction of prior distributions that will eventually see the Jeffreys non-informative prior as its final solution [*Jeffreys priors* that should be called *Lhoste’s priors* according to Steve Fienberg, although I think Ernest Lhoste only considered a limited number of transformations in his invariance rule]. The 1921 paper contains *de facto* the Bayes factor, but it does not appear to be advocated as a tool *per se* for conducting significance tests.

“The historical records suggest that Haldane calculated the first Bayes factor, perhaps almost by accident, before Jeffreys did.” A. Etz and E.J. Wagenmakers

As another interesting aside, the historical account points out that Jeffreys came up in 1931 with what is now called Haldane’s prior for a Binomial proportion, proposed in 1931 (when the paper was read) and in 1932 (when the paper was published in the *Mathematical Proceedings of the Cambridge Philosophical Society*) by Haldane. The problem tackled by Haldane is again a significance test on a Binomial probability. Contrary to the authors, I find the original (quoted) text quite clear, with a prior split between a uniform on [0,½] and a point mass at ½. Haldane uses posterior odds [of 34.7] to compare both hypotheses but… I see no trace in the quoted material that he ends up using the Bayes factor as such, that is, as his decision rule. (I acknowledge *decision rule* is anachronistic in this setting.) On the side, Haldane also implements model averaging. Hence my reading of this reading of the 1930’s literature is that it remains unclear that Haldane perceived the Bayes factor as a Bayesian [another anachronism] inference tool, upon which [and only which] significance tests could be conducted. That Haldane had a remarkably modern view of splitting the prior according to two orthogonal measures and of correctly deriving the posterior odds is quite clear. With the very neat trick of removing the infinite integral at p=0, an issue that Jeffreys was fighting with at the same time. In conclusion, I would thus rephrase the major finding of this paper as: Haldane should get the priority in deriving the Bayesian significance test for point null hypotheses, rather than in deriving the Bayes factor. But this may be my biased view of Bayes factors speaking there…
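As an illustration of the kind of computation at stake, here is the modern textbook version of the Bayesian significance test for a Binomial point null (note this sketch uses a uniform prior on [0,1] under the alternative rather than Haldane’s split uniform on [0,½], purely for simplicity):

```python
from math import comb

def bf01(x, n):
    """Bayes factor of H0: p = 1/2 against H1: p ~ Uniform(0,1),
    for x successes in n Binomial trials.
    With equal prior weights on H0 and H1, this equals the posterior odds."""
    m0 = comb(n, x) * 0.5 ** n   # marginal likelihood under the point null
    m1 = 1.0 / (n + 1)           # marginal under the uniform prior on p
    return m0 / m1

print(f"balanced data, 10/20:  BF01 = {bf01(10, 20):.2f}")  # favours the null
print(f"lopsided data, 18/20: BF01 = {bf01(18, 20):.4f}")   # favours the alternative
```

The marginal under the uniform prior is 1/(n+1) because the Beta integral of the Binomial likelihood over p is exactly uniform over the n+1 possible counts.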

Another amazing fact I gathered from the historical work of Etz and Wagenmakers is that Haldane and Jeffreys were geographically very close while working on the same problem, and hence could have been expected to know and reference each other’s work. Which did not happen.

Filed under: Books, Statistics Tagged: Bayes factors, full Bayesian significance test, Haldane's prior, Harold Jeffreys, Jack Haldane, Jeffreys priors, non-informative priors, scientific inference

## Develop Tynton Farm into a Visitor and Information Centre

We call on the Welsh Government to recognise the important contribution of Dr Richard Price, not only to the Enlightenment of the eighteenth century, but also to the process of creating the modern world we live in today, and to develop his birthplace and childhood home into an information centre for visitors where people of every nation and age can discover how his substantial contributions to theology, mathematics and philosophy have influenced the modern world.

Filed under: Books, pictures, Statistics, Travel, University life Tagged: ISBA, Richard Price, Richard Price Society, Thomas Bayes, Wales, Welsh

Î(θ,**u**)q(**u**)/C

and a Metropolis-Hastings proposal on that target simulating from k(θ,θ’)q(**u’**) *[meaning the auxiliary is simulated independently]* recovers the pseudo-marginal Metropolis-Hastings ratio

Î(θ’,**u**’)k(θ’,θ) / Î(θ,**u**)k(θ,θ’)

(which is a nice alternative proof that the method works!). The novel idea in the paper is that the proposal on the auxiliary **u** can be of a different form, while remaining manageable. For instance, as a two-block Gibbs sampler. Or an elliptical slice sampler for the **u** component. The argument being that an independent update of **u** may lead the joint chain to get stuck. Among the illustrations in the paper, an Ising model (with no phase transition issue?) and a Gaussian process applied to the Pima Indian data set (despite a recent prohibition!). From the final discussion, I gather that the modification should be applicable to every (?) case when a pseudo-marginal approach is available, since the auxiliary distribution q(**u**) is treated as a black box. Quite an interesting read and proposal!
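To fix ideas on the vanilla scheme being generalised here, the following is a toy pseudo-marginal Metropolis-Hastings sketch in Python, with the auxiliary variables **u** refreshed independently at each step (the latent-Gaussian model, the flat prior on θ, and all tuning constants are hypothetical choices for illustration, not the paper’s algorithm):

```python
import math, random

random.seed(3)
y, sig2, m = 1.0, 1.0, 10  # one observation, latent variance, MC sample size

def phi(x, mu, var):
    """Normal density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def lhat(theta):
    """Unbiased Monte Carlo estimate of the likelihood of y given theta,
    integrating out a latent N(0, sig2) effect by simple averaging
    (these averaged draws play the role of the auxiliary u)."""
    zs = [random.gauss(0.0, math.sqrt(sig2)) for _ in range(m)]
    return sum(phi(y, theta + z, 1.0) for z in zs) / m

# Pseudo-marginal Metropolis-Hastings: random-walk proposal on theta,
# independent refresh of u, and the likelihood *estimate* stored with
# the current state, exactly as in the extended-target argument above.
theta, lcur, chain = 0.0, lhat(0.0), []
for _ in range(5000):
    prop = theta + random.gauss(0.0, 0.5)
    lprop = lhat(prop)  # flat prior assumed, so the ratio is lprop/lcur
    if random.random() < lprop / lcur:
        theta, lcur = prop, lprop
    chain.append(theta)

post_mean = sum(chain) / len(chain)
print(f"posterior mean estimate: {post_mean:.2f}")
```

Under the flat prior, the exact posterior here is N(y, 1+sig2), so the chain should concentrate around θ = 1 despite only ever seeing noisy likelihood estimates; replacing the independent refresh of the auxiliaries with a correlated update is precisely the paper’s proposal.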

Filed under: Books, Statistics, University life Tagged: Alan Turing Institute, auxiliary variable, doubly intractable problems, pseudo-marginal MCMC, slice sampling, University of Warwick

Filed under: Kids, Uncategorized Tagged: International Day for the Elimination of Violence against Women, Orange Day, UNiTE to End Violence against Women

```r
# target is N(0,1), proposal is N(0,.01)
T=1e5
prop=x=rnorm(T,sd=.01)     # draw all proposals at once
ratop=dnorm(prop,log=TRUE)-dnorm(prop,sd=.01,log=TRUE)  # log target-to-proposal ratios
ratav=ratop[1]             # log ratio at the current value of the chain
logu=ratop-log(runif(T))
for (t in 2:T){
  if (logu[t]>ratav){      # accept: log U < log ratio(prop) - log ratio(current)
    x[t]=prop[t];ratav=ratop[t]
  }else{
    x[t]=x[t-1]            # reject: repeat the current value
  }
}
```

It produces output of the following shape [trace plot not shown], which is quite amazing because of the small variance. The reason for the lengthy freezes of the chain is the occurrence, with positive probability, of realisations from the proposal with very small proposal density values, as they induce very small Metropolis-Hastings acceptance probabilities and are almost “impossible” to leave. This is due to the lack of control of the target, which is flat over the domain of the proposal for all practical purposes. Obviously, in such a setting, the outcome is unrelated to the N(0,1) target!

It is also unrelated to the normal shape of the proposal, in that switching to a t distribution with 3 degrees of freedom produces a similar outcome [trace plot not shown].

It is only when using a Cauchy proposal that the pattern vanishes [trace plot not shown].
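The same experiment can be replayed in Python (a sketch mirroring, not reproducing, the R code above), comparing the narrow normal proposal with a Cauchy one, whose heavier-than-target tails keep the importance weights bounded:

```python
import math, random

random.seed(7)
T = 10**4

def indep_mh(rprop, logq):
    """Independent Metropolis-Hastings targeting N(0,1).
    rprop draws a proposal, logq is its log density (up to a constant)."""
    x = 0.0
    logw = -x * x / 2 - logq(x)  # log target/proposal ratio at current value
    path = []
    for _ in range(T):
        p = rprop()
        lw = -p * p / 2 - logq(p)
        if math.log(random.random()) < lw - logw:
            x, logw = p, lw
        path.append(x)
    return path

# Narrow N(0, .01^2) proposal: rare draws in its own tails carry huge
# importance weights and freeze the chain, as in the R experiment above.
narrow = indep_mh(lambda: random.gauss(0, 0.01),
                  lambda p: -p * p / (2 * 0.01 ** 2) - math.log(0.01))

# Standard Cauchy proposal (inverse-cdf simulation): bounded weights.
cauchy = indep_mh(lambda: math.tan(math.pi * (random.random() - 0.5)),
                  lambda p: -math.log(math.pi * (1.0 + p * p)))

def sd(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((v - m) ** 2 for v in xs) / len(xs))

print(f"narrow normal proposal: chain sd = {sd(narrow):.3f} (target sd is 1)")
print(f"Cauchy proposal: chain sd = {sd(cauchy):.3f}")
```

The narrow-proposal chain never explores the N(0,1) target, while the Cauchy-proposal chain recovers a standard deviation close to one.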

Filed under: Kids, pictures, R, Statistics, University life Tagged: acceptance probability, convergence assessment, heavy-tail distribution, independent Metropolis-Hastings algorithm, Metropolis-Hastings algorithm, normal distribution, Student's t distribution