Archive for August, 2011

Posts of the year

Posted in Books, R, Statistics, University life on August 31, 2011 by xi'an

Like last year, here are the most popular posts since last August:

  1. Home page 92,982
  2. In{s}a(ne)!! 6,803
  3. “simply start over and build something better” 5,834
  4. Julien on R shortcomings 2,373
  5. Parallel processing of independent Metropolis-Hastings algorithms 1,455
  6. Do we need an integrated Bayesian/likelihood inference? 1,361
  7. Coincidence in lotteries 1,256
  8. #2 blog for the statistics geek?! 863
  9. ABC model choice not to be trusted 814
  10. Sudoku via simulated annealing 706
  11. Bayes on the Beach 2010 [2] 704
  12. News about speeding R up 688
  13. Solution manual for Introducing Monte Carlo Methods with R 688
  14. R exam 617
  15. Bayesian p-values 607
  16. Monte Carlo Statistical Methods third edition 577
  17. Le Monde puzzle [49] 499
  18. The foundations of Statistics: a simulation-based approach 493
  19. The mistborn trilogy 492
  20. Lack of confidence in ABC model choice 487
  21. Solution manual to Bayesian Core on-line 481
  22. Bayes’ Theorem 459
  23. Julian Besag 1945-2010 452
  24. Millenium 1 [movie] 448
  25. ABC lectures [finale] 436

No major surprise in this ranking: R-related posts keep the upper part, partly thanks to being syndicated on R-bloggers, partly thanks to the tribunes contributed by Ross Ihaka and Julien Cornebise, even though I am surprised a rather low-key Le Monde puzzle made it to the list (maybe because it became part of my latest R exam?). Controversial book reviews are great traffic generators, even though the review of The foundations of Statistics: a simulation-based approach was posted less than a month ago. At last, it is comforting to see two of our major research papers for the 2010-2011 period on the list: Parallel processing of independent Metropolis-Hastings algorithms with Pierre and Murray, and the more controversial Lack of confidence in ABC model choice with Jean-Michel and Natesh (which appears twice in the list). The outlier in the list is undoubtedly Bayes on the Beach 2010 [2], which got undeserved traffic for pointing to Surfers Paradise, a highly popular entry! Among the unscientific entries, Sanderson’s Mistborn and Larsson’s Millennium made the list, with McCarthy’s Border trilogy missing the top list by three entries…

Keys in and out

Posted in Kids, pictures, Travel on August 30, 2011 by xi'an

Spending a few days in the Keys was both relaxing, replacing the rudimentary comfort of the catamaran with the amenities of an American house, and frustrating, because of the sudden decrease in the intensity of our activities during those days. Indeed, finding snorkeling spots was much less obvious than in the Bahamas, with less variety in the fish population. The best snorkeling spot happened to be a few meters away from Fort Zachary Taylor beach in Key West, with even a reported sighting of a manatee nearby. The few accessible beaches in the remainder of the Keys were rather disappointing, with a band of dried sea grass that put the kids off, and long flat plateaus that prevented swimming and favoured bacteria. The best memory of the Keys is presumably the quality of its sunsets, which were uniformly magical. (A related disappointment was a fruitless search for a rising full moon, due to the lack of an accessible beach anywhere close to our rental place…) We also enjoyed visiting the turtle hospital in Marathon, a private charity that provides medical care for turtles entangled in flotsam, jetsam, and fishing nets, damaged by boat [and idiotic jetski] propellers, or suffering from plastic ingestion (impaction) or from fibropapilloma… This was in sharp contrast with the dolphin and shark “research centres” we briefly considered, which are nothing more than expensive petting zoos!

published in PNAS!

Posted in Statistics, University life on August 30, 2011 by xi'an

The paper “Lack of confidence in approximate Bayesian computation model choice”, with Jean-Marie Cornuet, Jean-Michel Marin, and Natesh S. Pillai, has now appeared in the Early Edition of PNAS! It is in Open Access, so fully accessible to everyone. Thanks to the referees and to the PNAS editor, Steve Fienberg, for their support. A very fitting ending for a paper started around a (fake) log-fire in Park City! (And my very first paper in PNAS!)

another lottery coincidence

Posted in R, Statistics on August 30, 2011 by xi'an

Once again, meaningless figures are published about a man who won the French lottery (Le Loto) for the second time. The reported probability of the event is indeed one chance out of 363 (US) trillions (i.e., billions on the European long scale, or multiples of 10^12)… This number is simply the square of

{49 \choose 5}\times{10 \choose 1} = 19,068,840

which is the number of possible Loto grids. Thus, this probability applies to the event “Mr so-&-so plays a winning grid of Le Loto on May 6, 1995 and a winning grid of Le Loto on July 27, 2011”. But this is not the event that occurred: one of the bi-weekly winners of Le Loto won for a second time and this was spotted by Le Loto spokespersons. If we take the specific winner of today’s draw, Mrs such-&-such, who has played a single grid twice a week since the creation of Le Loto in 1976, i.e. about 3640 times, the probability that she also won earlier is of the order of

1-\left(1-\frac{1}{{49\choose 5}\times{10\choose 1}}\right)^{3640}=2\cdot 10^{-4}.

There are thus two chances in ten thousand that a given (single-grid) winner wins again, not much indeed, but no billion involved either. Now, this is also the probability that, for a given draw (like today’s), one of the 3640 previous winners wins again (assuming they all play only one grid, play independently of each other, &tc.). Over a given year, i.e. over 104 draws, the probability that there is no second-time winner is thus approximately

\left(1-\frac{2}{10^4}\right)^{104} = 0.98,

showing that within a year there is a 2% chance of finding a second-time winner. Not so extreme, is it?! Therefore, less bound to make the headlines…
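
As a sanity check, these numbers are easily reproduced in R (a minimal sketch, relying on the same approximations as above, namely 3640 past draws and 104 draws per year):

# number of possible Loto grids: 5 numbers out of 49, plus 1 lucky number out of 10
ngrids <- choose(49, 5) * choose(10, 1)  # 19,068,840
ngrids^2                                 # about 3.64e14, the reported 363 trillions

# probability that a given single-grid winner also won in one of the 3640 earlier draws
pagain <- 1 - (1 - 1 / ngrids)^3640      # about 2e-4

# probability of no second-time winner over a year of 104 draws
(1 - pagain)^104                         # about 0.98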

Now, the above are rough and conservative calculations. The newspaper articles about the double winner report that the man plays about 1000 euros a month (this is roughly the minimum wage!), representing the equivalent of 62 grids per draw (again I am simplifying to get the correct order of magnitude). If we repeat the above computations, assuming this man has played 62 grids per draw from the beginning of the game in 1976 till now, the probability that he wins again, conditional on the fact that he won once, is

1-\left(1-\frac{62}{{49 \choose 5}\times{10 \choose 1}}\right)^{3640} = 0.012,

a small but not impossible event. (And again, we consider the probability only for Mr so-&-so, while the event of interest is not restricted to him.) (I wrote this post before Alex pointed out the four-time lottery winner in Texas, whose “luck” seems more related to the imperfections of the lottery process…)
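
The heavy-player version of the same computation, again as a quick check in R:

ngrids <- choose(49, 5) * choose(10, 1)
1 - (1 - 62 / ngrids)^3640  # about 0.012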

I also stumbled upon this bogus site providing the “probabilities” (based on the binomial distribution, nothing less!) for each digit in Le Loto; no need for further comments. (Even the society that runs Le Loto hints at such practices, by providing the number of consecutive draws in which a given number has not appeared, with the sole warning “N’oubliez jamais que le hasard ne se contrôle pas”, i.e. “Always keep in mind that chance cannot be controlled”…!)

On Congdon’s estimator

Posted in Statistics, University life with tags , , , , on August 29, 2011 by xi'an

I got the following email from Bob:

I’ve been looking at some methods for Bayesian model selection, and read your critique in Bayesian Analysis of Peter Congdon’s method. I was wondering if it could be fixed simply by including the prior densities of the pseudo-priors in the calculation of P(M=k|y), i.e. simply removing the approximation in Congdon’s eqn. 3 so that the product over the parameters of the other models (i.e. j≠k) is included in the calculation of P(M=k|y, \theta^(t))? This seems an easy fix, so I’m wondering why you didn’t suggest it.

This relates to our Bayesian Analysis criticism of Peter Congdon’s approximation of posterior model probabilities. The difficulty with the estimator is that it uses simulations from the separate [model-based] posteriors when it should rely on simulations from the marginal [model-integrated] posterior (in order to satisfy an unbiasedness property). After a few email exchanges with Bob, I think I correctly understand the fix he proposes, i.e. that the “other model” parameters are simulated from the corresponding model-based posteriors, rather than being jointly simulated, along with the parameter of the “current model”, from the joint posterior. However, the correct weight in Carlin and Chib’s approximation then involves the product of the [model-based] posteriors (including the normalisation constants) as “pseudo-priors”. I also think that, even if the exact [model-based] posteriors were used, the fact that the weight involves a product over a large number of densities should induce an asymmetric behaviour, as illustrated by the toy simulation below. Indeed this product, while on average equal to one (or to 1/M, if M is the number of models), is more likely to take very small values than very large ones (by a supermartingale argument)…
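
As a toy illustration of this asymmetry (my own sketch, not part of the original exchange): take a product of independent positive factors, each with expectation exactly one; the product then also has expectation one, yet its median collapses towards zero as the number of factors grows. In R, with lognormal factors:

set.seed(1)
n <- 20    # number of factors in the product (e.g., number of densities)
m <- 1e5   # Monte Carlo replicates
sig <- 0.5
# each factor is lognormal with expectation exactly one
w <- matrix(rlnorm(n * m, meanlog = -sig^2 / 2, sdlog = sig), nrow = n)
prods <- apply(w, 2, prod)
mean(prods)      # close to one, as it should be
median(prods)    # exp(-n * sig^2 / 2), about 0.08, far below one
mean(prods < 1)  # about 0.87: small values dominate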
