Archive for Significance

a Simpson paradox of sorts

Posted in Books, Kids, pictures, R on May 6, 2016 by xi'an

The riddle from The Riddler this week is about finding an undirected graph with N nodes and no isolated node such that the number of nodes with more connections than the average of their neighbours is maximal. Such a graph can be represented by its adjacency matrix X of zeros and ones, on which one can spot the nodes satisfying the above condition as the positive entries of the vector (X1)^2-X^2 1, where 1 denotes the vector of ones and the first square is taken entry-wise. I thus wrote some R code aiming at optimising this target

targe <- function(F){
  N=nrow(F) # take the size from F rather than from a global N
  sum(F%*%F%*%rep(1,N)/(F%*%rep(1,N))^2<1)}

by mere simulated annealing:

rate <- function(N){ 
# generate matrix F
# 1. no single 
F=matrix(0,N,N) 
F[sample(2:N,1),1]=1 
F[1,]=F[,1] 
for (i in 2:(N-1)){ 
if (sum(F[,i])==0) 
F[i+sample.int(N-i,1),i]=1 # a node in (i+1):N; sample(N:N,1) would draw from 1:N
F[i,]=F[,i]} 
if (sum(F[,N])==0) 
F[sample(1:(N-1),1),N]=1 
F[N,]=F[,N] 
# 2. more connections 
F[lower.tri(F)]=F[lower.tri(F)]+
  sample(0:1,N*(N-1)/2,rep=TRUE,prob=c(N,1)) 
F[F>1]=1
F[upper.tri(F)]=t(F)[upper.tri(t(F))]
#simulated annealing
T=1e4
temp=N
targo=targe(F)
for (t in 1:T){
  #1. local proposal
  nod=sample(1:N,2)
  prop=F
  prop[nod[1],nod[2]]=prop[nod[2],nod[1]]=
     1-prop[nod[1],nod[2]]
  while (min(prop%*%rep(1,N))==0){
    nod=sample(1:N,2)
    prop=F
    prop[nod[1],nod[2]]=prop[nod[2],nod[1]]=
     1-prop[nod[1],nod[2]]}
  target=targe(prop)
  if (log(runif(1))*temp<target-targo){ 
    F=prop;targo=target} 
#2. global proposal 
  prop=F
  prop[lower.tri(prop)]=F[lower.tri(prop)]+
   sample(c(0,1),N*(N-1)/2,rep=TRUE,prob=c(N,1))
  prop[prop>1]=1
  prop[upper.tri(prop)]=t(prop)[upper.tri(t(prop))]
  target=targe(prop)
  if (log(runif(1))*temp<target-targo){
      F=prop;targo=target}
   temp=temp*.999
   }
return(F)}

This code returns quite consistently (modulo the simulated annealing uncertainty, which grows with N) the answer N-2 as the number of entries above average! Which is rather surprising, in a Simpson-like manner, since all entries but two are above average. (Incidentally, I found out that Edward Simpson recently wrote a paper in Significance about the Simpson-Yule paradox and about being a member of the Bletchley Park Enigma team. I must have missed the connection with the Simpson paradox when reading the paper in the first place…)
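As a sanity check on the N-2 figure, here is a minimal sketch that is not part of the original code: the complete graph with a single edge removed reaches N-2, which can also be verified by hand (the two endpoints of the missing edge have degree N-2 while all their neighbours have degree N-1). The function names count_above and almost_complete are made up for the illustration, and the sketch only shows that N-2 is attainable, not that it is the maximum.

```r
# number of nodes with strictly more connections than the average of their neighbours
count_above <- function(A) {
  deg <- rowSums(A)             # degrees, i.e. A %*% 1
  nb  <- as.vector(A %*% deg)   # sums of the neighbours' degrees, i.e. A^2 %*% 1
  sum(deg^2 > nb)               # same as deg > nb/deg since deg > 0
}

# complete graph on N nodes with the single edge {1,2} removed (hypothetical helper)
almost_complete <- function(N) {
  A <- matrix(1, N, N) - diag(N)
  A[1, 2] <- A[2, 1] <- 0
  A
}

sapply(3:8, function(N) count_above(almost_complete(N)))
# 1 2 3 4 5 6, namely N-2 for each N
```

The same count_above can be applied to the matrix returned by the simulated annealing code to compare the two answers.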

exoplanets at 99.999…%

Posted in Books, pictures, Statistics, University life on January 22, 2016 by xi'an

The latest Significance has a short article providing some coverage of the growing trend in the discovery of exoplanets, including new techniques used to detect those exoplanets from their impact on the associated stars. This [presumably] comes from the recent book Cosmos: The Infographics Book of Space [a side comment: new books seem to provide material for many articles in Significance these days!] and the above graph is also from the book, not the ultimate infographic representation in my opinion, given that a simple superposition of lines could do as well. Or better.

“A common approach to ruling out these sorts of false positives involves running sophisticated numerical algorithms, called Monte Carlo simulations, to explore a wide range of blend scenarios (…) A new planet discovery needs to have a confidence of (…) a one in a million chance that the result is in error.”

The above sentence is obviously of interest, first because the detection of false positives by Monte Carlo hints at a rough version of ABC to assess the likelihood of the observed phenomenon under the null [no detail provided] and second because the probability statement in the end is quite unclear as to its foundations… Reminding me of the Higgs boson controversy. The very last sentence of the article is however brilliant, albeit maybe unintentionally so:

“To date, 1900 confirmed discoveries have been made. We have certainly come a long way from 1989.”

Yes, 89 down, strictly speaking!

the latest Significance: Astrostats, black swans, and pregnant drivers [and zombies]

Posted in Books, Kids, pictures, Statistics, Travel, University life on February 4, 2015 by xi'an

Reading Significance is always an enjoyable moment, when I can find time to skim through the articles (before my wife gets hold of it!). This time, I lost my copy between my office and home, and borrowed it from Tom Nichols at Warwick with four mornings to read it during breakfast. This December issue is definitely interesting, as it contains several introductory articles on astro- and cosmo-statistics! One thing I had not noticed before is how large a fraction of the papers is written by authors of books, giving a quick entry or interview about their book. For instance, I found out that Roberto Trotta had written a book for the general public called The Edge of the Sky (All You Need to Know About the All-There-Is), which exposes the fundamentals of cosmology through the 1000 most common words in the English language. So Universe is replaced with All-There-Is! I can understand and to some extent applaud the intention, but it nonetheless makes for a painful read, judging from the excerpt, when researcher and telescope are not part of the accepted vocabulary. Reading the corresponding article in Significance left me a bit bemused at the reason provided for the existence of a multiverse, i.e., of multiple replicas of our universe, all with different conditions: multiplying the universes makes ours more likely, while it sounds almost impossible on its own! This sounds like a very frequentist argument… and I am not even certain it would convince a frequentist. The other articles in this special astrostatistics section were of a more statistical nature, from estimating the number of galaxies to the chances of a big asteroid impact. I did find the graphical representation of the meteorite impacts in the past century hard to read because of the impact drawing in the background. However, when I checked the link to Carlo Zapponi’s website, I found the picture was a still of a neat animation of meteorites falling since the first report.


Price’s theorem?

Posted in Statistics on March 16, 2013 by xi'an

A very interesting article by Martyn Hooper in the Feb. 2013 issue of Significance I just received. (It is available on-line for free.) It raises the question as to how much exactly Price contributed to the famous Essay… Given the percentage of the Essay that can be attributed to Price with certainty (Bayes’ part stops at page 14 out of 32 pages), given the lack of the original manuscript by Bayes, given the delay between the composition of this original manuscript (1755?), its delivery to Price (1761?) and its publication in 1763, given the absence of any other document published by Bayes on the topic, I tend to concur with Martyn Hooper (and Sharon McGrayne) that Price contributed quite significantly to the 1763 paper. Of course, it would sound quite bizarre to start calling our approach to Statistics Pricean or Pricey (or even Priceless!) Statistics, but this may constitute one of the most striking examples of Stigler’s Law of Eponymy!

Statistics may be harmful to your freedom

Posted in Statistics on January 29, 2013 by xi'an

On Wednesday, I was reading the freshly delivered Significance and esp. the several papers therein about statisticians being indicted, fired, or otherwise sued for doing statistics. I mentioned a while ago the possible interpretations of the L’Aquila verdict (where I do not know whether any of the six scientists is a statistician), but did not know about Graciela Bevacqua’s hardship in the Argentinian National Statistics Institute, nor about David Nutt being sacked from the Advisory Council on the Misuse of Drugs, nor about Peter Wilmshurst being sued by NMT (a US medical device corporation) for expressing concern about a clinical trial they conducted. What is most frightening in those stories is that those persons ended up facing those hardships without any support from their respective institutions (quite the opposite in two cases!). And then, on the way home, I further read that the former head of the Greek National Statistics Institute (Elstat) was fired and indicted for over-estimating the Greek deficit, after resisting official pressure to lower it… Tough job!

when the Earth was flat

Posted in Books on January 22, 2013 by xi'an

I received yet another popular science book to review (for Significance), When the Earth was flat by Graeme Donald. The subtitle is “All the bits of Science we got wrong”, which is both very ambitious (“All”, really?!) and modest (in that most scientific theories are approximations waiting to be invalidated and improved by the next theory). (I wrote this review during my trip to Gainesville, maybe too quickly!)

The themes addressed and debunked in this book are wide-ranging. In fact they do not necessarily fall under my definition of science. They often are related to commercial swindles and political agendas loosely based on plainly wrong scientific theories. The book is thus more about the uses of (poor) science than about Science itself.

new significance (out)

Posted in Statistics, University life on July 8, 2012 by xi'an

I have just received the latest issue of significance (June 2012) and there are plenty of interesting articles in it (with no horror story as in the previous issue!). From the cover story about finding emperor penguin colonies on satellite images via guano stains (large scale!, with a terrific and terrifying extract from Mawson’s journal) to “moral maps” à la Quételet, to teaching statistics as seen by the young statisticians section (of the RSS), to Tony O’Hagan’s favourite formula

var(X) = E[var(X|Y)]+var(E[X|Y])

(where he curiously fails to mention Pythagoras, which is how I justify the formula to my students), to the inappropriateness of using hand X-rays to determine whether Indonesian smugglers are under age or not. The less convincing section is obviously the “controversy” one, where the authors make a mechanistic proposal to bypass the drawbacks of p-values and Type I error, without contemplating the ultimate uses of tests… Very pleasant read (I could have kept it for the looong flight to Australia…)
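For the record, the variance formula above can be checked exactly on a toy data set, provided variances are computed with divisor n rather than n-1. The R sketch below is only an illustration: the numbers and the helper pvar are made up, and the Pythagoras argument appears as a comment.

```r
# population variance (divisor n), so that the identity holds exactly
pvar <- function(x) mean((x - mean(x))^2)

# made-up data: values of X with a two-level grouping variable Y
x <- c(1, 2, 4, 7, 8, 12)
y <- c("a", "a", "a", "b", "b", "b")

w     <- as.numeric(table(y)) / length(y)   # P(Y=y)
condm <- as.numeric(tapply(x, y, mean))     # E[X|Y=y]
condv <- as.numeric(tapply(x, y, pvar))     # var(X|Y=y)

# Pythagoras: X-E[X] = (X-E[X|Y]) + (E[X|Y]-E[X]) is a sum of two
# orthogonal pieces, hence the squared lengths add up
within  <- sum(w * condv)                        # E[var(X|Y)]
between <- sum(w * (condm - sum(w * condm))^2)   # var(E[X|Y])
total   <- pvar(x)                               # var(X)

isTRUE(all.equal(total, within + between))
```

With the usual var() and its n-1 divisor, the two sides would only agree approximately on a finite sample.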