## inf R ! [book review]

Posted in Books, R, Travel with tags , , , , , , , , , , , on June 10, 2021 by xi'an

Thanks to my answering a (basic) question on X validated involving an R code, R mistakes and some misunderstanding about Bayesian hierarchical modelling, I got pointed out to Patrick Burns’ The R inferno. This is not a recent book as the second edition is of 2012, with a 2011 version still available on-line. Which is the version I read. As hinted by the cover, the book plays on Dante’s Inferno and each chapter is associated with a circle of Hell… Including drawings by Botticelli. The style is thus most enjoyable and sometimes hilarious. Like hell!

The first circle (reserved for virtuous pagans) is about treating integral reals as if they were integers, the second circle (attributed to gluttons, although Dante’s is for the lustful) is about allocating more space along the way, as in the question I answered and in most of my students’ codes! The third circle (allocated here to blasphemous sinners, destined for Dante’s seven circle, when Dante’s third circle is to the gluttons) points out the consequences of not vectorising, with for instance the impressive capacities of the ifelse() function [exploited to the max in R codecolfing!].  And the fourth circle (made for the lustfuls rather than Dante’s avaricious and prodigals) is a short warning about the opposite over-vectorising. Circle five (destined for the treasoners, and not Dante’s wrathfuls) pushes for and advises about writing R functions. Circle six recovers Dante’s classification, welcoming (!) heretics, and prohibiting global assignments, in another short chapter. Circle seven (alloted to the simoniacs—who should be sharing the eight circle with many other sinners—rather than the violents as in Dante’s seventh) discusses object attributes, with the distinction between S3 and S4 methods somewhat lost on me. Circle eight (targeting the fraudulents, as in Dante’s original) is massive as it covers “a large number of ghosts, chimeras and devils”, a collection of difficulties and dangers and freak occurences, with the initial warning that “It is a sin to assume that code does what is intended”. A lot of these came as surprises to me and I was rarely able to spot the difficulty without the guidance of the book. Plenty to learn from these examples and counter-examples. Reaching Circle nine (where live (!) the thieves, rather than Dante’s traitors). A “special place for those who feel compelled to drag the rest of us into hell.” Discussing the proper ways to get help on fori. Like Stack Exchange. Concluding with the tongue-in-cheek comment that “there seems to be positive correlation between a person’s level of annoyance at [being asked several times the same question] and ability to answer questions.” This being a hidden test, right?!, as the correlation should be negative.

## on arithmetic derivations of square roots

Posted in Books, Kids, pictures, R, Statistics with tags , , , , , , , , , on November 13, 2020 by xi'an

An intriguing question made a short-lived appearance on the CodeGolf section of Stack Exchange, before being removed, namely the (most concise possible) coding of an arithmetic derivation of the square root of an integer, S, with a 30 digit precision and using only arithmetic operators. I was not aware of the myriad of solutions available, as demonstrated on the dedicated WIkipedia page. And ended playing with three of them during a sleepless pre-election night!

The first solution for finding √S is based on a continued fraction representation of the root,

$\sqrt{S}=a+\cfrac{r}{2a+\cfrac{r}{2a+\ddots}}$

with a²≤S and r=S-a². It is straightforward to code-golf:

while((r<-S-T*T)^2>1e-9)T=(F<-2*T+r/(2*T+F))-T;F


but I found it impossible to reach the 30 digit precision (even when decreasing the error bound from 10⁻⁹). Given the strict rules of the game, this would have been considered to be a failure.

The second solution is Goldschmidt’s algorithm

b=S
T=1/sum((1:S)^2<S)
while((1-S*prod(T)^2)^2>1e-9){
b=b*T[1]^2
T=c((3-b)/2,T)}
S*prod(T)


which is longer for code-golfing but produces both √S and 1/√S (and is faster than the Babylonian method and Newton-Raphson). Again no luck with high precision and almost surely unacceptable for the game.

The third solution is the most interesting [imho] as it mimicks long division, working two digits at a time (and connection with Napier’s bones)

~=length
D=~S
S=c(S,0*(1:30))
p=d=0
a=1:9
while(~S){
F=c(F,x<-sum(a*(20*p+a)<=(g<-100*d+10*S[1]+S[2])))
d=g-x*(20*p+x)
p=x+10*p
S=S[-1:-2]}
sum(10^{1+D/2-1:~F}*F)


plus providing an arbitrary number of digits with no error. This code requires S to be entered as a sequence of digits (with a possible extra top digit 0 to make the integer D even). Returning one digit at a time, it would further have satisfied the constraints of the question (if in a poorly condensed manner).

## the strange occurrence of the one bump

Posted in Books, Kids, R, Statistics with tags , , , , , , , , on June 8, 2020 by xi'an

When answering an X validated question on running an accept-reject algorithm for the Gamma distribution by using a mixture of Beta and drifted (bt 1) Exponential distributions, I came across the above glitch in the fit of my 10⁷ simulated sample to the target, apparently displaying a wrong proportion of simulations above (or below) one.

a=.9
g<-function(T){
x=rexp(T)
v=rt(T,1)<0
x=c(1+x[v],exp(-x/a)[!v])
x[runif(T)<x^a/x/exp(x)/((x>1)*exp(1-x)+a*(x<1)*x^a/x)*a]}

It took me a while to spot the issue, namely that the output of

  z=g(T)
while(sum(!!z)<T)z=c(z,g(T))
z[1:T]


was favouring simulations from the drifted exponential by truncating. Permuting the elements of z before returning solved the issue (as shown below for a=½)!

## Le Monde puzzle [#1146]

Posted in Books, Kids, R with tags , , , , , , , , on June 5, 2020 by xi'an

The weekly puzzle from Le Monde is once more disappointing.

Everyday of the month, take 0, 1 or 2 units. If one unit taken past day, next day none can be taken. If two units taken two day ago, none can be taken the current day. What is the strategy maximising the number of units  for February? March? April? Generalise to 0,..,3 units taken each day with 0 compulsory three days after taking 3 units.

as taking 2-1-0 (or 3-2-1-0) sounds like the optimal strategy. Except at the final step when 2, 2-2, 2-2-0, and 1-0-2-2 are best. But then why would one distinguish between the three months..?! Because the outcome differ, as 30, 32, and 33, resp. (or 45, 48 and 51). With an average increase of 1 in the first case and 1.5 in the second.

Another puzzle took too much of my time, namely a code golf challenge to devise a code taking as input a matrix of moves over an n x x grid and returning as input the number of nodes and transient tributaries for each loop (or irreducible set) of the moves. Which I solved by running Markov chains from each starting point long enough to reach stationarity. Entering the moves as

n=3;M=matrix(sample(1:4,n^2,rep=T),n)


and returning the result via 390 bytes

j=cbind;l=sum;a=apply
m=l(!!M);n=m^.5
g=function(A,r=M[A])A+c((r<2)*(1-n*(A[,1]==n))-(r==2)*(1-n*(A[,1]<2)),
(r==3)*(1-n*(A[,2]==n))-(r>3)*(1-n*(A[,2]<2)))
I=c()
for(i in d<-1:n)I=rbind(I,j(i*d/d,d))
for(t in b<-1:m)I=g(I)
p=function(i)i[,1]+n*i[,2]-n-1
K=matrix(0,m,m)
for(t in b)K[b+m*p(I<-g(I))]=1
s=o=a(u<-unique(K),1,l)
for(k in 1:l(!!s))s[k]=l(!a(!!sweep(K,2,u[k,],'-'),1,l))
j(o,s-o)


which could be much shorter if only matrices could be indexed from 0 as in C.

## Le Monde puzzle [#1141]

Posted in Kids, pictures, R, University life with tags , , , , , , , , on May 4, 2020 by xi'an

The weekly puzzle from Le Monde is in honour of John Conway, who just passed away, ending up his own game of life:

On an 8×8 checker-board, Alice picks n squares as “infected”. She then propagates the disease by having each square with least two infected neighbours to become infected as well. What is the minimal value of n for the entire board to become infected? What if three infected neighbours are required?

A plain brute force R random search for proper starting points led to n=8 (with a un-code-golfed fairly ugly rendering of the neighbourhood relation, I am afraid!) with the following initial position

With three neighbours, an similar simulation failed to return anything below n=35 as for instance:

oops, n=34 when running a little longer:

which makes sense since an upper bound is found by filling one square out of two (32) and adding both empty corners (2). But this upper bound is only considering one step ahead, so is presumably way too large. (And indeed the minimal value is 28, showing that brute force does not always work! Or it may be that forcing the number of live cells to grow at each step is a coding mistake…)