## Archive for R

## Those who live by ChatGPT are destined to get advice of unpredictable quality

Posted in Statistics with tags ANOVA, ChatGTP, cross validated, predictor, R, regression on January 18, 2023 by xi'an

## Tribonacci sequence

Posted in Books, Kids, R with tags brute-force solution, Fibonacci, FiveThirtyEight, mathematical puzzle, R, The Riddler on January 3, 2023 by xi'an

**A** puzzle from The Riddler that becomes simplistic once brute force is applied:

A Tribonacci sequence is based on three starting integers a ≤ b ≤ c, and each subsequent term is the sum of the previous three. Among Tribonacci sequences containing 2023, which one achieves the smallest fourth term, a+b+c?
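For concreteness, here is a minimal R sketch of the recursion (the helper name `trib_seq` is hypothetical, not from the puzzle): it extends a starting triple until a term reaches or passes a target. Starting from (1,1,1), the sequence jumps from 1201 to 2209 and hence does not contain 2023.

```r
# Hypothetical helper: extend a Tribonacci sequence started at x
# until its last term reaches or exceeds `target`
trib_seq <- function(x, target = 2023) {
  while (tail(x, 1) < target) x <- c(x, sum(tail(x, 3)))
  x
}
s <- trib_seq(c(1, 1, 1))
print(s)
2023 %in% s  # 1201 is directly followed by 2209
```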

The R code

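```r
# TRUE if the Tribonacci sequence started at (a,b,e) hits 2023
tri <- function(a, b, e) {
  f <- 0
  while (f < 2023) { f <- a + b + e; a <- b; b <- e; e <- f }
  return(f == 2023)
}
sol <- NULL; m <- 674  # a+b+e <= 2023, hence a <= 674
for (a in 1:m) for (b in a:m) for (e in b:m)
  if (tri(a, b, e)) sol <- rbind(sol, c(a, b, e))
sol[which.min(rowSums(sol)), ]  # triple with the smallest fourth term
```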

leads to (1,1,6) as the solution, with fourth term 1+1+6=8… Incidentally, this short exercise led me to finally look for a way to pass the entries of a vector as the separate arguments of a function, by turning the vector into a list:

```r
do.call("tri", as.list(sol[2023, ]))
```
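Since the question only asks for the smallest fourth term s = a+b+c, a cheaper alternative to the full triple loop (a variant of my own, not the code above) scans s in increasing order and stops at the first triple a ≤ b ≤ c summing to s whose sequence hits 2023. The function names `hits` and `first_hit` are hypothetical.

```r
# TRUE if the Tribonacci sequence started at (a, b, e) contains `target`
# among its subsequent terms
hits <- function(a, b, e, target = 2023) {
  f <- 0
  while (f < target) { f <- a + b + e; a <- b; b <- e; e <- f }
  f == target
}
# scan fourth terms s = a + b + e in increasing order, with a <= b <= e
first_hit <- function(target = 2023) {
  for (s in 3:target)
    for (a in 1:(s %/% 3))
      for (b in a:((s - a) %/% 2)) {
        e <- s - a - b  # e >= b since b <= (s - a) / 2
        if (hits(a, b, e, target)) return(c(a, b, e))
      }
  NULL
}
res <- first_hit()
res  # 1 1 6
```

Because the scan stops at the first hit, only the triples with a+b+c ≤ 8 are examined here, instead of the millions of triples visited by the brute-force loop.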

## foliage to the max

Posted in Books, Kids, pictures with tags Broad Street pump, Fall, foliage, R, riddle, The Riddler on December 17, 2022 by xi'an

**A**n easy riddle from The Riddler that did not even require coding! Given that a tree changes colours at a random time A, distributed according to a Uniform distribution over (0,1), and that it sheds its leaves at a random time B, distributed according to a Uniform distribution over (A,1), at what time does a maximal number of trees show their new colour?

This means maximising in t the probability that A<t<B. Since P(B>t|A=a)=(1-t)/(1-a) when a<t, integrating over a∈(0,t) gives -(1-t)log(1-t), whose derivative log(1-t)+1 vanishes at t=1-e⁻¹≈0.632, resulting in a (maximal) fraction e⁻¹≈0.368 of the trees holding on to their new colour at that time.
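A quick Monte Carlo check of this derivation, as a sketch with a seed, sample size and grid of my own choosing: simulate (A,B) pairs, estimate P(A<t<B) over a grid of t values, and compare with -(1-t)log(1-t).

```r
set.seed(101)
n <- 1e6
A <- runif(n)                    # time of colour change, U(0,1)
B <- runif(n, min = A, max = 1)  # time of leaf shedding, U(A,1)
t_grid <- seq(0.05, 0.95, by = 0.01)
frac <- sapply(t_grid, function(t) mean(A < t & t < B))  # estimates of P(A < t < B)
exact <- -(1 - t_grid) * log(1 - t_grid)
t_grid[which.max(frac)]  # near 1 - exp(-1)
max(frac)                # near exp(-1)
```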

## Bayes Factors for Forensic Decision Analyses with R [book review]

Posted in Books, R, Statistics with tags Bayes factor, book review, Bruno de Finetti, Ca' Foscari University, CHANCE, Chib's approximation, classification, forensic statistics, improper prior, Jeffreys-Lindley paradox, MCMC, open access, R, springer on November 28, 2022 by xi'an

My friend EJ Wagenmakers pointed me towards an *entire* book on the BF by Bozza (from Ca' Foscari, Venezia), Taroni and Biedermann. It provides a sort of blueprint for using Bayes factors in forensics, for both investigative and evaluative purposes, with R code and free access. I am of course unable to judge the relevance of the approach for forensic science (I was under the impression that Bayesian arguments were usually not well received in the courtroom), but I find that overall the approach is rather one of repositioning the standard Bayesian tools within a forensic framework.

*“The [evaluative] purpose is to assign a value to the result of a comparison between an item of unknown source and an item from a known source.”*

And thus I found nothing shocking or striking in this standard presentation of Bayes factors, including the call to loss functions, if a bit overly expansive in its exposition. The style is also classical, with grey-background vignettes for the R coding parts, a choice we also made in our R books! If anything, I would have expected more realistic discussions and illustrations of prior specification across the hypotheses (see e.g. page 34), while the authors mostly centre on conjugate priors and the (de Finetti) trick of the equivalent prior sample size. Bayes factors are mostly assessed using a conservative version of Jeffreys’ “scale of evidence”. The computational section of the book introduces MCMC (briefly) and mentions importance sampling, the harmonic mean estimator (with a minimalist warning), and Chib’s formula (with no warning whatsoever).

*“The [investigative] purpose is to provide information in investigative proceedings (…) The scientist (…) uses the findings to generate hypotheses and suggestions for explanations of observations, in order to give guidance to investigators or litigants.”*

Chapter 2 is about standard models: inferring about a proportion (with some Monte Carlo illustration and the complication of background elements) and about a normal mean, with an improper prior making an appearance [on p.69] without any mention of the general prohibition of such generalised priors when using Bayes factors, or even of the Lindley-Jeffreys paradox. Again, the main difference with Bayesian textbooks lies in the chosen examples.
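To fix ideas on the proportion case, here is a minimal sketch under my own assumptions (not an example taken from the book): a point null θ = 0.5 against a uniform Beta(1,1) alternative, with hypothetical data of x = 7 successes in n = 10 trials. By conjugacy, the evidence under the alternative is available in closed form (here 1/(n+1)), which can be compared with a plain Monte Carlo estimate averaging the likelihood over prior simulations of θ.

```r
x <- 7; n <- 10; theta0 <- 0.5   # hypothetical data and null value
a <- 1; b <- 1                   # Beta(a, b) prior under the alternative
# exact marginal likelihood under the alternative (Beta-binomial)
m1 <- choose(n, x) * beta(a + x, b + n - x) / beta(a, b)
bf01_exact <- dbinom(x, n, theta0) / m1
# Monte Carlo: average the likelihood over prior simulations of theta
set.seed(42)
theta <- rbeta(1e5, a, b)
bf01_mc <- dbinom(x, n, theta0) / mean(dbinom(x, n, theta))
c(exact = bf01_exact, mc = bf01_mc)
```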

Chapter 3 focuses on evidence evaluation [not in the computational sense] but, again, the coverage is about standard models: processing the Binomial, multinomial, and Poisson models, again through conjugate priors. (With the side remark that Fig 3.2 is rather unhelpful: when moving the prior probability of the null from zero to one, its posterior probability also moves from zero to one!) We are back to the Normal mean case, with the model variance being known, then unknown. (An unintentionally funny remark (p.96) is that the dependence between mean and variance is seen as too restrictive and replaced with… independence!) At last (for me!), the book points out [p.99] that the BF is highly sensitive to the choice of the prior variance (Lindley-Jeffreys, where art thou?!), but with a return of the improper prior (on said variance, p.102) and no debate on the ensuing validity of the BF. Multivariate Normals are also presented, with Wishart priors on the precision matrix, and more details about Chib’s estimate of the evidence. This chapter also contains illustrations of the so-called *score-based* BF, which is simply (?) a Bayes factor using a distribution on a distance summary (between a hypothetical population and the data) and an approximation of the distributions of these summaries, provided enough data is available… I also spotted a potentially interesting foray into BF variability (Section 3.4.2), although not reaching all the way to a notion of BF posterior distributions.
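The sensitivity to the prior variance is easily reproduced in the simplest setting. The following is a sketch with numbers of my own choosing: a normal mean with known variance σ² = 1, n = 100 observations, a point null μ = 0, and a N(0, τ²) prior under the alternative. With the sample mean set two standard errors away from zero, the Bayes factor in favour of the null keeps growing as τ increases. This is the phenomenon behind the Lindley-Jeffreys paradox.

```r
sigma <- 1; n <- 100
se <- sigma / sqrt(n)
xbar <- 2 * se                  # hypothetical observed mean, z = 2
tau <- c(1, 10, 100, 1000)      # prior standard deviations under H1
# marginal of xbar: N(0, se^2) under H0, N(0, se^2 + tau^2) under H1
bf01 <- dnorm(xbar, 0, se) / dnorm(xbar, 0, sqrt(se^2 + tau^2))
round(bf01, 2)
```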

Chapter 4 covers Bayes factors for investigation, where the alternative(s) is (are) less specified, as when testing e.g. Basmati vs non-Basmati rice. But no non-parametric alternative is considered in the book. Otherwise, it looks to me rather similar to Chapter 3, i.e., back to binomial and multinomial models, with more discussion of prior specification, more normal or non-normal models (where the prior distribution is puzzlingly estimated by a kernel density estimator), a portmanteau alternative (p.157), more multivariate Normals with Wishart priors, and an entry on classification & discrimination.

*[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE. As appropriate for a book about Chance!]*

## optimal leap year

Posted in Books, Kids, Mountains, pictures, R with tags Australia, daylight saving time, Gregorian calendar, Kata Tjuta, leap year, Mounts Olga, R, sunset on November 19, 2022 by xi'an

**A** riddle about leap years: a solar year consists of approximately 365.24217 mean solar days, which is why there is a leap year approximately every four years. Approximately, because the Gregorian calendar plans 97 and not 100 leap years over 400 years. Is this the optimal solution? No, since the Gregorian difference is 3.3×10⁻⁴ day per year, or 0.132 day per 400 years, while using 85 leap years every 351 years leads to a difference of 4.76×10⁻⁶ day per year, or 0.002 day per 400 years… (With a further gain by a factor of 4 with 116 leap years every 479 years.) This can be found with a basic R code:

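```r
best <- Inf  # smallest gap found so far
for (N in 10:1000) for (L in 1:N) {
  p <- abs(L / N - .24217)  # yearly drift with L leap years every N years
  if (p < best) { best <- p; lo <- L; no <- N }
}
c(lo, no)
```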
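The drifts quoted above can be recomputed directly. The short sketch below (the row and column names are my own) compares the Gregorian rule with the two alternatives.

```r
year_frac <- 0.24217  # fractional part of the solar year, in days
rules <- data.frame(leap = c(97, 85, 116), cycle = c(400, 351, 479),
                    row.names = c("Gregorian", "85/351", "116/479"))
rules$drift_per_year <- abs(rules$leap / rules$cycle - year_frac)
rules$drift_per_400y <- 400 * rules$drift_per_year  # days per 400 years
print(signif(rules, 3))
```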