i-like[d the] workshop

Posted in Running, Statistics, Travel, University life with tags , , , , , , , , on May 17, 2013 by xi'an

Indeed, I liked the i-like workshop very much. Among the many interesting talks of the past two days (incl. Cristiano Varin’s ranking of Series B as the top influential stat. journal!) , Matti Vihola’s and Nicolas Chopin’s had the strongest impact on me (to the point of scribbling in my notebook). In a joint work with Christophe Andrieu, Matti focussed on evaluating the impact of replacing the target with an unbiased estimate in a Metropolis-Hastings algorithm. In particular, they found necessary and sufficient conditions for keeping geometric and uniform ergodicity. My question (asked by Iain Murray) was whether they had derived ways of selecting the number of terms in the unbiased estimator towards maximal efficiency. I also wonder if optimal reparameterisations can be found in this sense (since unbiased estimators remain unbiased after reparameterisation).

Nicolas’ talk was about particle Gibbs sampling, a joint paper with Sumeet Singh recently arXived. I did not catch the whole detail of their method but/as I got intrigued by a property of Marc Beaumont’s algorithm (the very same algorithm used by Matti & Christophe). Indeed, the notion is that an unbiased estimator of the target distribution can be found in missing variable settings by picking an importance sampling distribution q on those variables. This representation leads to a pseudo-target Metropolis-Hastings algorithm. In the stationary regime, there exists a way to derive an “exact” simulation from the joint posterior on (parameter,latent). All the remaining/rejected latents are then distributed from the proposal q. What I do not see is how this impacts the next MCMC move since it implies generating a new sample of latent variables. I spoke with Nicolas about this over breakfast: the explanation is that this re-generated set of latent variables can be used in the denominator of the Metropolis-Hastings acceptance probability and is validated as a Gibbs step. (Incidentally, it may be seen as a regeneration event as well.)

bike trail from Kenilworth to the University of WarwickFurthermore, I had a terrific run in the rising sun (at 5am) all the way to Kenilworth where I was a deer, pheasants and plenty of rabbits. (As well as this sculpture that now appears to me as being a wee sexist…)

Warwickshire snapshot

Posted in pictures, Running, Travel with tags , , on May 17, 2013 by xi'an

Westwood church

i-like workshop [talk]

Posted in Statistics, Travel, University life with tags , , , , , , on May 16, 2013 by xi'an

Here are the slides of my talk at the i-like workshop in Warwick today:

I am really glad I could make it there and meet with many (highly supportive) friends for three days! The slides are quite similar to those I presented in Padova. I just added a few perspective slides…

the cartoon introduction to statistics

Posted in Books, Kids, Statistics, University life with tags , , , , , on May 16, 2013 by xi'an

A few weeks ago, I received a copy of The Cartoon Introduction to Statistics by Grady Klein and Alan Dabney, send by their publisher, Farrar, Staus and Giroux from New York City.  (Never heard of this publisher previously, but I must admit the aggregation of those three names sounds great!) As this was an unpublished version of the book, to appear in July 2013, I first assumed my copy was a draft version, with black and white drawings using limited precision graphics.. However, when checking the already published Cartoon Introduction to Economics, I realised this was the style of Grady Klein (as reflected below).

Thus, I have to assume this is how The Cartoon Introduction to Statistics will look like  when published in July… I am quite perplexed by the whole project. First, I do not see how a newcomer to the field can learn better from a cartoon with an average four sentences per page than from a regular introductory textbook. Cartoons introduce an element of fun into the explanation, with jokes and (irrelevant) side stories, but they are also distracting as readers are not always in  a position to know what matters and what does not. Second, as the drawings are done in a rough style, I find this increases the potential for confusion. For instance, the above cover reproduces an example linking the histogram of a sample of averages and the normal distribution. If a reader has never heard of histograms, I do not see how he or she could gather how they are constructed in practice. The width of the bags is related to the number of persons in each bag (50 random Americans) in the story, while it should be related to the inverse of the square root of this number in the theory.  Similarly, I find the explanation about confidence intervals lacking: when trying to reassure the readers about the fact that any given random sample from a population might be misleading, the authors state that “in the long run most cans [of worms] have averages in the clump under the hump [of the normal pdf]“. This is not reassuring at all: when using confidence intervals based on 10 or on 10⁵ normal observations, the corresponding 95% confidence intervals on their mean both have 95% chances to contain the true mean. The long run aspect refers to the repeated use of those intervals. (I am not even mentioning the classical fallacy of stating that “we are 99.7% confident that the population average is somewhere between -1.73 and -0.27″…)

In conclusion, I remember buying an illustrated entry to Marx’ Das Kapital when I started economics in graduate school (as a minor). This gave me a very quick idea of the purpose of the book. However, I read through the whole book to understand (or try to understand) Marx’ analysis of the economy. And the introduction did not help much in this regard. In the present setting, we are dealing with statistics, not economics, not philosophy. Having read a cartoon about the average length of worms within a can of worms is not going to help much in understanding the Central Limit Theorem and the subsequent derivation of confidence intervals. The validation of statistical methods is done through mathematics, which provides a formal language cartoons cannot reproduce.

Le Monde puzzle [#820]

Posted in Books, Kids, R with tags , , , , , on May 15, 2013 by xi'an

The current puzzle is… puzzling:

Given the set {1,…,N} with N<61, one iterates the following procedure: take (x,y) within the set and replace the pair with the smallest divider of x+y (bar 1). What are the values of N such that the final value in the set is 61?

I find it puzzling because the way the pairs are selected impacts the final value. Or not, depending upon N. Using the following code (with factors() from the pracma package):

library(pracma)
endof=function(N){
  coll=1:N
  for (t in 1:(N-1)){

    pair=sample(1:length(coll),2)
    dive=min(factors(sum(coll[pair])))
    coll=coll[-pair]
    coll=c(coll,dive)
    }
  print(dive)
  }

I got:

> for (t in 1:10) endof(10)
[1] 5
[1] 3
[1] 3
[1] 5
[1] 7
[1] 5
[1] 5
[1] 7
[1] 3
[1] 3> for (t in 1:10) endof(16)
[1] 2
[1] 2
[1] 2
[1] 2
[1] 2
[1] 2
[1] 2
[1] 2
[1] 2
[1] 2

For N of the form 4k or 4k-1, the final number is always 2 while for N‘s of the form 4k-2 and 4k-3, the final number varies, sometimes producing 61′s. Although I could not find solutions for N less than 17…  Looking more closely into the sequence leading to 61, I could not see a pattern, apart from producing prime numbers as, in, e.g.

61 = 2 + [12 +  (4 + {14 + [13 + 16]})]

for N=17.  (Another puzzle is that 61 plays no particular role: a long run of random calls to endof() return all prime numbers up to 79…)

Udate: Looking at the solution in today’s edition, there exist a solution for N=13 and a solution for N=14. Even though my R code fails to spot it. Of course, an exhaustive search would be feasible in these two cases.  (I had also eliminated values below as not summing up to 61.) The argument for eliminating 4k and 4k-1 is that there must be an odd number of odd numbers in the collection, otherwise, the final number is always 2.

Rによるモンテカルロ法入門

Posted in Books, R, Statistics with tags , , on May 14, 2013 by xi'an

Here is the cover of the Japanese translation of our Introducing Monte Carlo methods with R book.  A few year after the French translation. It actually appeared last year in August but I was not informed of this till a few weeks ago. The publisher is Maruzen, with an associated webpage if you want to order… Unless I am confused the translators are Hiro Ishida and Kazue Ishida; they deserve a major ありがとう ! And too bad George is no longer with us: this must have been the first translation of one of his books in Japanese..

awalé

Posted in Kids, pictures, R with tags , , , on May 13, 2013 by xi'an

Awalé board on my garden table, March 15, 2013Following Le Monde puzzle #810, I tried to code an R program (not reproduced here) to optimise an awalé game but the recursion was too rich for R:

Error: evaluation nested too deeply:
infinite recursion / options(expressions=)?

even with a very small number of holes and seeds in the awalé… Searching on the internet, it seems the computer simulation of a winning strategy for an awalé game still is an open problem! Here is a one-step R function that does  not produce sure gains for the first player, far from it, as shown by the histogram below…  I would need a less myopic strategy by iterating  this function at least twice.

onemorestep=function(x,side){
# x current state of the awale,
# side side of the awale (0 vs 1)

M=length(x);N=as.integer(M/2)
rewa=rep(0,M)
newb=matrix(0,ncol=M,nrow=M)

for (i in ((1:N)+N*side)){

 if (x[i]>0){
   y=x
   y[i]=0
   for (t in 0:(x[i]-1))
     y[1+(i+t)%%M]=y[1+(i+t)%%M]+1

   last=1+(i+t)%%M
   if (side){ gain=(last<=N)
    }else{ gain=(last>N)}

   if (gain){# ending up on the right side
     rewa[i]=0
     while (((last>0)&&(side))||((last>N)||(!side)))
     if ((y[last]==2)||(y[last]==3)){
          rewa[i]=rewa[i]+y[last];y[last]=0
          last=last-1
          }else{ break()}
     }
   newb[i,]=y
   }
  }
if (max(rewa)>0){
  sol=order(-rewa)[1]
  }else{ sol=rang=((1:N)+N*side)[x[((1:N)+N*side)]>0]
   if (length(rang)>1) sol=sample(rang,1,prob=x[rang]^3)}

   return(list(reward=max(rewa),board=newb[sol,]))
}

gains of player 1 obtained from using associated R code

Follow

Get every new post delivered to your Inbox.

Join 339 other followers