Archive for probability

surprises in probability [book review]

Posted in Books, Statistics, Travel on November 20, 2018 by xi'an

A very short book (128 pages, but with a very high price!) I received from CRC Press is Henk Tijms’ Surprises in Probability (Seventeen Short Stories). Henk Tijms is an emeritus professor of econometrics at the Vrije Universiteit Amsterdam, and he wrote these seventeen pieces either for the Dutch Statistical Society magazine or for a blog he ran for the New York Times. (The video of A Night in Casablanca above is only connected to this blog through Chico mimicking the word surprise as soup+rice.)

The author mentions that the book can be useful for teachers, and indeed this is a collection of surprising probability results, surprising in the sense that the numerical probabilities are not necessarily intuitive. Most illustrations involve betting of one sort or another, with only basic (combinatorial) probability distributions involved. Readers should not worry about even this basic probability background, since most statements are presented without proof. Most examples are very classical, from the prisoner’s problem, to the Monty Hall paradox, to the birthday problem, to Benford’s distribution of digits, to gambler’s ruin, the gambler’s fallacy, and the St Petersburg paradox, to the secretary’s problem and stopping rules. The most advanced notion is that of (finite state) Markov chains, as martingales are only mentioned in connection with pseudo-probabilist schemes for winning the lottery, for which (our very own!) Jeff Rosenthal makes an appearance, thanks to his uncovering of the Ontario Lottery scam!

“In no other branch of mathematics is it so easy for experts to blunder as in probability theory.”  Martin Gardner

A few stories have entries about Bayesian statistics, with mentions made of the O.J. Simpson, Sally Clark and Lucia de Berk miscarriages of justice, although these mentions make for a most tenuous connection. Simulation is also mentioned as a way of approximating more complex probabilities, but not to the point of discussing surprises about simulation itself, which could have been the case with the simulation of rare events.

Ten most beautiful probability formulas (Story 10) reminded me of Ian Stewart’s 17 Equations That Changed the World, obviously at another scale and in a much less convincing way. To wit, the Normal (or Gauss) density, Bayes’ formula, the gambler’s ruin formula, the square-root formula (meaning that the standard deviation of an average decreases as 1/√n), Kelly’s betting formula (?), the asymptotic law of distribution of prime numbers (??), another square-root formula for the one-dimensional random walk, the newsboy formula (?), the Pollaczek-Khintchine formula (?), and the waiting-time formula. I am not sure I would have included any of these…

All in all, this is a nice if unsurprising database for illustrations and possibly exercises in elementary probability courses, although it will require some work from the instructor to link the statements with their proofs, as one would expect from blog entries. But it makes for nice reading, especially while traveling, and I hope some fellow traveler will pick the book up from where I left it, in Mexico City airport.

position at Harvard

Posted in pictures, Running, University life on October 27, 2018 by xi'an

This is to point out an opening for a tenure-track position in statistics and probability at Harvard University, with a deadline of December 1. More specifically, the position is open to candidates in any field of statistics and probability, as well as in any interdisciplinary area where innovative and principled use of statistics and/or probability is of vital importance.

riddles on a line [#2]

Posted in Books, Kids, R on September 11, 2018 by xi'an

A second Riddle(r), with a puzzle related to the integer set Ð={1,2,3,…,N}, which can be summarised as

Given a random walk on Ð that starts at the middle value N/2, with both end states being absorbing, and where each move jumps, with equal probability, to the (integer) middle of the interval between the current value and the left or the right end, what is the average time until absorption, as a function of N?

Once the Markov transition matrix M associated with this random walk is written down, the probability of reaching an absorbing state in exactly t steps can be derived from the successive powers of M, as the difference between the probabilities of being (already) absorbed after t and after t-1 steps, from which the average absorption time follows.

avexit <- function(N=100){
 #transition matrix M for the walk
 #1 and N+2 are trapping states
 tranz=matrix(0,N+2,N+2)
 tranz[1,1]=tranz[N+2,N+2]=1
 for (i in 2:(N+1))
  tranz[i,i+max(trunc((N+1-i)/2),1)]=tranz[i,i-max(trunc((i-2)/2),1)]=1/2
 #probabilities of absorption
 prowin=proloz=as.vector(0)
 init=rep(0,N+2)
 init[trunc((N+1)/2)]=1 #first position
 curt=init
 while(1-prowin[length(prowin)]-proloz[length(prowin)]>1e-10){
  curt=curt%*%tranz
  prowin=c(prowin,curt[1])
  proloz=c(proloz,curt[N+2])}
 #probability of first reaching a trapping state at step t
 probz=diff(prowin+proloz)
 return(sum((1:length(probz))*probz))}

leading to an almost linear connection between N and expected trapping time.
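
As a quick check of this almost-linear growth, here is a small wrapper (my addition, not part of the original solution) calling avexit() over a range of values of N and plotting the resulting expected trapping times:

#sketch (not in the original post): expected trapping time against N,
#using the avexit() function defined above
enn=seq(10,200,by=10)
plot(enn,sapply(enn,avexit),type="b",
 xlab="N",ylab="expected trapping time")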

riddles on a line

Posted in Books, Kids, R on August 22, 2018 by xi'an

In the Riddler of August 18, two riddles connected with the integer set Ð={2,3,…,10}:

  1. Given a permutation of Ð, what is the probability that the most likely variation (up or down) occurs at each term?
  2. Given three players choosing three different integers in Ð sequentially, and rewards in Ð allocated to the closest of the three players (with splits in case of equal distance), what is the reward for each given an optimal strategy?

For the first question, a simple code returns 0.17…

winofail<-function(permz){
  if (length(permz)>1){
    #half the number of remaining terms, used to define the majority
    lenoperm=length(permz[-1])/2
    #success if the move up (resp. down) is also the majority direction
    #among the remaining values, with a coin flip in case of a tie
    win=(permz[1]<permz[2])*(sum(permz[-1]>permz[1])>lenoperm)+
     (permz[1]>permz[2])*(sum(permz[-1]<permz[1])>lenoperm)+
      (runif(1)<.5)*(sum(permz[-1]>permz[1])==lenoperm)
    #all subsequent moves must also follow the most likely direction
    win=win&&winofail(permz[-1])
  }else win=TRUE
  return(win)}
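
A possible Monte Carlo wrapper (my addition, not in the original post) to recover this 0.17 figure consists in averaging winofail() over random permutations of Ð:

#Monte Carlo estimate (illustrative): frequency of random permutations of Ð
#for which every move follows the most likely direction
mean(replicate(1e4,winofail(sample(2:10))))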

(but the analytic solution exhibits a cool Pascal triangular relation!) and for the second question, a quick recursion (or dynamic programming) produces 20, 19, 15 as the rewards (and 5, 9, 8 as the locations):

gainz<-function(seqz){
  #distances between each player's pick (rows) and each integer in Ð (columns)
  difz=t(abs(outer(2:10,seqz,"-")))
  #rank the players by closeness to each integer (1 = closest, 1.5 = two-way tie)
  cloz=apply(difz,2,rank)
  #each integer goes to the closest player, and is split in case of a tie
  return(apply(rbind(2:10,2:10,2:10)*
   ((cloz==1)+.5*(cloz==1.5)),1,sum))}

timeline<-function(prev){
  #recursive search for the optimal picks given the previous players' picks,
  #each player maximising her own reward as returned by gainz()
  if (length(prev)==0){ 
   sol=timeline(10);bez=gainz(sol)[1] 
   for (i in 2:9){ 
    bol=timeline(i);comp=gainz(bol)[1] 
    if (comp>bez){
        bez=comp;sol=bol}}}
  if (length(prev)==1){
    bez=-10
    for (i in (2:10)[-(prev-1)]){
      bol=timeline(c(prev,i))
      comp=gainz(bol)[2]
      if (comp>bez){
        bez=comp;sol=bol}}}
  if (length(prev)==2){
    bez=-10
    for (i in (2:10)[-(prev-1)]){
      bol=c(prev,i)
      comp=gainz(bol)[3]
      if (comp>bez){
        bez=comp;sol=bol}}}
  return(sol)}
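
For completeness, a possible way of launching the recursion (my guess at the intended usage, starting from an empty vector of previous picks) is

sol=timeline(NULL) #optimal locations, 5, 9, 8 as reported above
gainz(sol)         #corresponding rewards, 20, 19, 15 as reported above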

After reading the solution on the Riddler, I realised I had misunderstood the line as starting at 2 when it was actually starting from 1. Correcting for this leads to the same 5, 9, 8 locations of picks, with rewards of 21, 19, 15.

Why is it necessary to sample from the posterior distribution if we already KNOW the posterior distribution?

Posted in Statistics on October 27, 2017 by xi'an

I found this question on X validated somewhat hilarious, all the more because of the shouted KNOW! And because of the confused impression that, since one can write down π(θ|x) up to a constant, one KNOWS this distribution… It is actually one of the paradoxes of simulation that, from a mathematical perspective, once π(θ|x) is available as a function of (θ,x), all other quantities related to this distribution are mathematically perfectly and uniquely defined. From a numerical perspective, however, this does not help. Actually, when starting my MCMC course at ENSAE a few days later, I had the same question from a student who thought that facing a density function like

f(x) ∝ exp{-||x||²-||x||⁴-||x||⁶}

was enough to immediately produce simulations from this distribution. (I also used this example to show the degeneracy of accept-reject as the dimension d of x increases, using for instance a Gamma proposal on y=||x||. The acceptance probability plunges to zero with d, with 9 acceptances out of 10⁷ for d=20.)
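
As a rough illustration of this degeneracy, here is a minimal accept-reject sketch for the radius y=||x||, whose unnormalised density is y^(d-1)exp{-y²-y⁴-y⁶} by spherical symmetry. The fixed Gamma(2,1) proposal and the numerically computed bound below are my own (assumed) choices, not the settings used in the course, so the exact acceptance counts quoted above need not be reproduced; still, the acceptance rate should degrade as d grows for a proposal that does not adapt to the dimension.

#illustrative sketch (assumed settings, not the course code): accept-reject
#for the radius y=||x||, with unnormalised target y^(d-1)exp(-y^2-y^4-y^6)
#and a fixed Gamma(2,1) proposal; returns the empirical acceptance rate
radar=function(d,T=1e5){
  h=function(y) y^(d-1)*exp(-y^2-y^4-y^6)   #unnormalised target on the radius
  g=function(y) dgamma(y,shape=2,rate=1)    #proposal density
  M=optimize(function(y) h(y)/g(y),c(1e-3,10),maximum=TRUE)$objective
  y=rgamma(T,shape=2,rate=1)                #proposed radii
  mean(runif(T)<h(y)/(M*g(y)))}             #empirical acceptance rate
radar(d=2)   #acceptance rate in a small dimension
radar(d=20)  #acceptance rate in a larger dimension, same proposal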