Archive for Stack Exchange

Bayesian sampling without tears

Posted in Books, Kids, R, Statistics with tags , , , , , , , , , , , , on May 24, 2022 by xi'an

Following a question on Stack Overflow trying to replicate a figure from the paper written by Alan Gelfand and Adrian Smith (1990) for The American Statistician, Bayesian sampling without tears, which precedes their historical MCMC papers, I looked at the R code produced by the OP and could not spot an issue as to why their simulation did not fit the posterior produced in the paper. Which proposes acceptance-rejection and sampling-importance-resampling as two solutions to approximately simulate from the posterior. The later being illustrated by simulations from the prior being weighted by the likelihood… The illustration is made of 3 observations from the sum of two Binomials with different success probabilities, θ¹ and θ². With a Uniform prior on both.

for (i in 1:N)
  for (k in 1:3){
    llh<-0
    for (j in max(0,n2[k]-y[k]):min(y[k],n1[k]))
      llh<-llh+choose(n1[k],j)*choose(n2[k],y[k]-j)*
         theta[i,1]^j*(1-theta[i,1])^(n1[k]-j)*theta[i,2]^(y[k]-j)*
         (1-theta[i,2])^(n2[k]-y[k]+j)
    l[i]=l[i]*llh}

To double-check, I also wrote a Gibbs version:

theta=matrix(runif(2),nrow=T,ncol=2)
x1=rep(NA,3)
for(t in 1:(T-1)){
   for(j in 1:3){
     a<-max(0,n2[j]-y[j]):min(y[j],n1[j])
     x1[j]=sample(a,1,
        prob=choose(n1[j],a)*choose(n2[j],y[j]-a)*
           theta[t,1]^a*(1-theta[t,1])^(n1[j]-a)*
           theta[t,2]^(y[j]-a)*(1-theta[t,2])^(n2[j]-y[j]+a)
           )}
      theta[t+1,1]=rbeta(1,sum(x1)+1,sum(n1)-sum(x1)+1)
      theta[t+1,2]=rbeta(1,sum(y)-sum(x1)+1,sum(n2)-sum(y)+sum(x1)+1)}

which did not show any difference with the above. Nor with the likelihood surface.

stack exchange will not restrict access from Russia

Posted in Statistics with tags , , , , , on March 18, 2022 by xi'an

triple ruin

Posted in Books, Kids, pictures, R, Statistics, Wines with tags , , , , , , , , , , on December 28, 2021 by xi'an

An almost straightforward riddle from The Riddler involving a triple gambler’s ruin: Dawn competes against three players Alessandra, Berenike, and Chinue, with probabilities of winning one round ¾, ½, and ¼, respectively, until the cumulated score reaches ±15, ±30, and ±45, for the first, second, and third games. What is Dawn’s optimal sequence of adversaries?

First, a brute force R simulation shows that the optimal ordering is to play the three adversaries first weakest, third strongest and middle fair:

ord=function(p){
  z=2*(runif(1)<p[1])-1
  while(abs(z)<15)z=z+2*(runif(1)<p[1])-1
  y=2*(runif(1)<p[2])-1
  while(abs(z+y)<30)y=y+2*(runif(1)<p[2])-1
  x=2*(runif(1)<p[3])-1
  while(abs(z+y+x)<45)x=x+2*(runif(1)<p[3])-1 
  return(x+y+z>0)}
mcord=function(p,T=1e2){
  for(t in 1:T)F=F+ord(p)
  return(F/T)}
comp=function(T=1e2){
  return(c(mcord(c(.5,.55,.45),t),
    #mcord(c(.5,.45,.55),t),#1-above
    mcord(c(.55,.5,.45),t),
    #mcord(c(.45,.5,.55),t),#1-above
    mcord(c(.55,.45,.5),t)
    #mcord(c(.45,.55,.5),t)))#1-above
    ))}

where I used probabilities closer to ½ to avoid estimated probabilities equal to one.

> comp(1e3)
[1] 0.051 0.038 0.183

(and I eliminated the three other probabilities by sheer symmetry). Second, checking in Feller’s bible (Vol. 1, XIV.3) for the gambler’s ruin probability, a simple comparison of the six orderings confirms this simulation.

inf R ! [book review]

Posted in Books, R, Travel with tags , , , , , , , , , , , on June 10, 2021 by xi'an

Thanks to my answering a (basic) question on X validated involving an R code, R mistakes and some misunderstanding about Bayesian hierarchical modelling, I got pointed out to Patrick Burns’ The R inferno. This is not a recent book as the second edition is of 2012, with a 2011 version still available on-line. Which is the version I read. As hinted by the cover, the book plays on Dante’s Inferno and each chapter is associated with a circle of Hell… Including drawings by Botticelli. The style is thus most enjoyable and sometimes hilarious. Like hell!

The first circle (reserved for virtuous pagans) is about treating integral reals as if they were integers, the second circle (attributed to gluttons, although Dante’s is for the lustful) is about allocating more space along the way, as in the question I answered and in most of my students’ codes! The third circle (allocated here to blasphemous sinners, destined for Dante’s seven circle, when Dante’s third circle is to the gluttons) points out the consequences of not vectorising, with for instance the impressive capacities of the ifelse() function [exploited to the max in R codecolfing!].  And the fourth circle (made for the lustfuls rather than Dante’s avaricious and prodigals) is a short warning about the opposite over-vectorising. Circle five (destined for the treasoners, and not Dante’s wrathfuls) pushes for and advises about writing R functions. Circle six recovers Dante’s classification, welcoming (!) heretics, and prohibiting global assignments, in another short chapter. Circle seven (alloted to the simoniacs—who should be sharing the eight circle with many other sinners—rather than the violents as in Dante’s seventh) discusses object attributes, with the distinction between S3 and S4 methods somewhat lost on me. Circle eight (targeting the fraudulents, as in Dante’s original) is massive as it covers “a large number of ghosts, chimeras and devils”, a collection of difficulties and dangers and freak occurences, with the initial warning that “It is a sin to assume that code does what is intended”. A lot of these came as surprises to me and I was rarely able to spot the difficulty without the guidance of the book. Plenty to learn from these examples and counter-examples. Reaching Circle nine (where live (!) the thieves, rather than Dante’s traitors). A “special place for those who feel compelled to drag the rest of us into hell.” Discussing the proper ways to get help on fori. Like Stack Exchange. Concluding with the tongue-in-cheek comment that “there seems to be positive correlation between a person’s level of annoyance at [being asked several times the same question] and ability to answer questions.” This being a hidden test, right?!, as the correlation should be negative.

Kempner Fi

Posted in Books, Kids, R, Statistics with tags , , , , , , , on January 19, 2021 by xi'an

A short code-golf challenge led me to learn about the Kempner series, which is the series made of the inverted integers, excluding all those containing the digit 9. Most surprisingly this exclusion is enough to see the series converging (close to 23). The explanation for this convergence is that, citing Wikipedia,

“The number of n-digit positive integers that have no digit equal to ‘9’ is 8 × 9n−1

and since the inverses of these n-digit positive integers are less than 101−n the series is bounded by 80. In simpler terms, it converges because the fraction of remaining terms in the series is geometrically decreasing as (9/10)1−n. Unsurprisingly (?) the series is also atrociously slow to converge (for instance the first million terms sum up to 11) and there exist recurrence representations that speed up its computation.  Here is the code-golf version

n=scan()+2
while(n<-n-1){
F=F+1/T
while(grepl(9,T<-T+1))0}
F

that led me to learn about the R function grepl. (The explanation for the pun in the title is that Semper Fidelis is the motto of the corsair City of Saint-Malo or Sant-Maloù, Brittany.)

%d bloggers like this: