## Archive for R

## Model-Based Clustering, Classification, and Density Estimation Using mclust in R [not a book review]

Posted in Statistics with tags Adrian Raftery, book reviews, Chapman & Hall, classification, clustering, CRC Press, mclust, R, The R Series on May 29, 2023 by xi'an

## operation precisely impossible

Posted in Books, Kids, R, University life with tags almost.unique(), arithmetics, bazar, brute force, CRAN, evolution tree, mathematical puzzle, OEIS sequence, R, R package, riddle, The Riddler, unique() on May 13, 2023 by xi'an

**S**ince the solution to the previous riddle from The Riddler on the maximum number of different terms in the composed operation

a∅b∅c∅d∅e∅f

depending on the bracketing ordering and on the meaning of each ∅ among the six elementary operations, got posted today as 974,860, I went back to my R code to understand why it differed from my figures by two orders of magnitude, and realised I was overly trusting the R function *unique*, which was returning more “different” entries than it should have, especially when choosing the six starting numbers (a,…,f) as Uniform(0,1). Using integers instead led for instance to 946,558, which was not so far from the target, but still left it unclear whether or not some entries had been counted several times. I mentioned the issue to Robin, who rose to the challenge and within minutes came up with the R function *almost.unique* from the CRAN package **bazar**, which then produced outcomes like 974,513 for random initialisations, hence quite close to 974,860!
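The over-counting is easy to reproduce: `unique` compares doubles exactly, so values that agree mathematically but differ in their last bits are reported as distinct. The sketch below uses a hypothetical base-R helper, `dedup_tol`, as a stand-in for a tolerance-based comparison. It is not the implementation of `almost.unique` in **bazar**, and the relative tolerance is an arbitrary assumption.

```r
# Four numbers that are pairwise equal on paper but not in floating point
x <- c(0.1 + 0.2, 0.3, 0.7 + 0.1, 0.8)

# Exact comparison keeps all four as "different"
n_exact <- length(unique(x))

# Hypothetical helper (an assumption, not bazar's code): sort the values
# and drop any entry within a relative tolerance of its predecessor
dedup_tol <- function(x, tol = sqrt(.Machine$double.eps)) {
  x <- sort(x)
  if (length(x) < 2) return(x)
  keep <- c(TRUE, diff(x) > tol * pmax(1, abs(x[-1])))
  x[keep]
}

n_tol <- length(dedup_tol(x))
c(exact = n_exact, tolerant = n_tol)
```

Comparing each value with its sorted predecessor means that a long chain of nearly equal values collapses into a single entry, so the count depends on the tolerance when genuinely different results sit very close to one another.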

## operation impossible

Posted in Books, R with tags arithmetics, brute force, evolution tree, mathematical puzzle, R, riddle, The Riddler on April 30, 2023 by xi'an

**A** riddle from The Riddler on how many different numbers one could at most produce from six initial values and the four basic operations. In other words, how many values could the terms in

a∅(b∅{c∅[d∅(e∅f)]})

and other positionings of the brackets take? (With each ∅ being one of the four operations and a,…,f the initial values or a permutation of these.) A very crude evaluation leads to twenty million possible values, forgetting that addition and multiplication are commutative, while subtraction and division are anti-commutative. I tried a brute force approach in R, rather than exploring the tree of possible paths, but could not get anywhere near this figure, the number of different values still increasing for the largest manageable number of replicas I could try. Reducing the number of initial values to n=3, I could come closer to 123, with 95 different values, and, for n=4, not too far from 1972, with 1687 values. Moving to n=6 by hopefully exhausting all (?) entries led to another guess of 50,524,809 values. I am however unsure that the R code is able to spot identical values, given the variability of the above figure…
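For small n, the tree of possible paths can be walked exhaustively instead of sampled. The following is a minimal sketch under one possible reading of the puzzle (every initial value is used exactly once, and a pair of numbers can be merged in six ways); the function name `reach` is hypothetical and this is not the brute-force code discussed above.

```r
# Hypothetical helper: all end values obtained by repeatedly replacing two of
# the current numbers with one of a+b, a*b, a-b, b-a, a/b, b/a until a single
# number is left. Duplicates are kept; divisions by zero yield Inf or NaN.
reach <- function(v) {
  n <- length(v)
  if (n == 1) return(v)
  out <- numeric(0)
  for (i in 1:(n - 1)) for (j in (i + 1):n) {
    a <- v[i]; b <- v[j]
    rest <- v[-c(i, j)]                 # numbers not involved in this step
    for (r in c(a + b, a * b, a - b, b - a, a / b, b / a))
      out <- c(out, reach(c(rest, r)))  # recurse on the shorter vector
  }
  out
}

paths3 <- length(reach(c(2, 3, 5)))     # number of leaves of the tree for n=3
two    <- length(unique(reach(c(2, 3))))
```

The number of leaves is the product of choose(k,2) × 6 over k = n, …, 2, which is about 21 million for six values, so the sketch is only practical for small n. Counting the distinct leaves then runs into the same floating-point issue of deciding when two results are identical.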

## simulated annealing and logistic regression to the max

Posted in Kids, R, Statistics with tags guessing game, logistic regression, mathematical puzzle, order statistics, R, simulated annealing, The Riddler on April 26, 2023 by xi'an

**A** Riddler puzzle on the three binary and sequential questions one should ask three players hiding their respective U(0,1) realisations, U, V, and W, to best guess which player holds the largest number, max{U,V,W}. Assuming questions of the type *Is U<b¹*, &tc., the challenge boils down to selecting seven bounds (one for U, two for V, and four for W) in order to optimise the probability of picking the right player. This means, for a given collection of such bounds, learning this probability from the three binary answers. These can be turned into eight (2³) binary variables and I used them as entries in a logistic regression to predict that W was larger than max(U,V), itself predicted by the first two answers. The optimisation of the bounds can then be achieved by simulated annealing (or otherwise) and the approach returns (random) outputs like the following bounds (one on U, two on V, and four on W)

```
0.616 0.434 0.830 0.350 0.736 0.913 0.796 0.827
```

for an estimated probability of 0.827. This is a somewhat coherent sequence of bounds when considering the simpler case of two players. Indeed, with three bounds, the probability of winning can be readily derived in closed form and is logically optimised by (b¹,b²,b³)=(1/2,1/4,3/4), for a success probability of 0.875. The three-player output also coincides with the solution posted by The Riddler, although there is no intuition behind the figures, contrary to the two-player situation. In fact, I am surprised that the bound on W does not equal the expectation of max{U,V} under the current conditions:

```
> x=runif(1e6,0,.616);y=runif(1e6,0,.434)
> mean(y+(x-y>0)*(x-y))
[1] 0.3591648
```
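Returning to the two-player case, the success probability of 0.875 can be checked numerically. The sketch below rests on assumptions that are not spelled out in the post: the guess is U when V falls below its bound and V otherwise, and the closed form in the hypothetical `p_two` is derived here for b² ≤ b¹ ≤ b³ only. Both function names are illustrative.

```r
# Closed form derived under the assumption b2 <= b1 <= b3 (not from the post)
p_two <- function(b1, b2, b3)
  1/2 + b3 + b1 * (b2 + b3) - b1^2 - b2^2 - b3^2

# Monte Carlo check: V is compared with b2 when U < b1 and with b3 otherwise;
# U is guessed as the larger value whenever V is below its bound
sim_two <- function(b1, b2, b3, n = 1e6) {
  u <- runif(n); v <- runif(n)
  guess_u <- v < ifelse(u < b1, b2, b3)
  mean(guess_u == (u > v))              # proportion of correct guesses
}

set.seed(42)
exact <- p_two(1/2, 1/4, 3/4)
approx <- sim_two(1/2, 1/4, 3/4)
c(exact = exact, simulated = approx)
```

Setting the three partial derivatives of `p_two` to zero gives b¹ = (b² + b³)/2, b² = b¹/2 and b³ = (1 + b¹)/2, hence (1/2, 1/4, 3/4), in agreement with the values quoted above.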