Archive for the R Category

plusquamperfect squares

Posted in Books, Kids, R on April 2, 2021 by xi'an

A perfect riddle:

For some perfect squares, when you remove the last digit, you get another perfect square. The first five such perfect squares are 16, 49, 169, 256 and 361. What are the next three ones? Is there a more-than-perfect square other than 169 such that removing the last two digits also returns a perfect square?

Writing an R code for plusquamperfect squares is straightforward and returns the following first 20 values

 [1]         16         49        169        256        361       1444
 [7]       3249      18496      64009     237169     364816     519841
[13]    2079364    4678569   26666896   92294449  341991049  526060096
[19]  749609641 2998438564

Adding the second constraint does not return a solution other than 169.
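
A brute-force sketch along these lines (not necessarily the author's code; the cutoff 6e4 is an assumption, chosen so that the squares cover the twenty values above):

issq = function(n) (n > 0) & (round(sqrt(n))^2 == n) # test for (positive) perfect squares
sqs = (4:6e4)^2                  # 6e4² ≈ 3.6e9 exceeds the 20th value above
plusq = sqs[issq(sqs %/% 10)]    # squares that stay square once the last digit is dropped
head(plusq, 20)
plusq[issq(plusq %/% 100)]       # second constraint: also square without the last two digits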

composition versus inversion

Posted in Books, Kids, R, Statistics on March 31, 2021 by xi'an

While trying to convey to an OP on X validated why the inversion method was not always the panacea in pseudo-random generation, I took the example of a mixture of K exponential distributions when K is very large, in order to impress (?) upon said OP that solving F(x)=u for such a closed-form cdf F was very costly even when using a state-of-the-art (?) inversion algorithm, like uniroot, since each step involves adding the K terms in the cdf. Selecting the component from the cumulative distribution function of the component weights proves to be quite fast, since using the rather crude

x=rexp(1,lambda[1+sum(runif(1)>wes)]) # wes = cumulative vector of the mixture weights

brings a 100-fold improvement over

Q = function(u) uniroot(function(x) F(x) - u, lower = 0,
    upper = qexp(.999, rate = min(lambda)))$root # numerical tail quantile
x = Q(runif(1))

when K=10⁵, as shown by a benchmark call

         test elapsed
1       compo   0.057
2      Newton  45.736
3     uniroot   5.814

where Newton denotes a simple-minded Newton inversion. I wonder if there is a faster way to select the component in the mixture. Using a while loop starting from the most likely components proves to be much slower. And accept-reject solutions are invariably slow or fail to work with such a large number of components. Devroye’s Bible has a section (XIV.7.5) on simulating sums of variates from an infinite mixture distribution, but, for once, nothing really helpful. And another section (IV.5) on series methods, where again I could not find a direct connection.
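
For completeness, here is a minimal self-contained sketch of the setup assumed in the snippets above; the object names (wei, wes, lambda, F) and the choice of weights and rates are assumptions rather than the original code.

K = 1e5
wei = runif(K); wei = wei/sum(wei)       # mixture weights
lambda = rgamma(K, 2)                    # exponential rates
wes = cumsum(wei)                        # cumulative weights used in the composition step
F = function(x) sum(wei*pexp(x, lambda)) # mixture cdf, adding K terms per evaluation
# composition: pick the component from the cumulative weights, then draw from it
x1 = rexp(1, lambda[1+sum(runif(1)>wes)])
# inversion: solve F(x)=u numerically over a range covering most of the mass
Q = function(u) uniroot(function(x) F(x)-u, lower=0,
      upper=qexp(.999, rate=min(lambda)))$root
x2 = Q(runif(1))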

handbook of mixture analysis [review]

Posted in Books, R, Statistics on March 19, 2021 by xi'an

“In my opinion, the editors have done an excellent job when selecting the contents of the handbook and putting the different chapters together. For instance, this can be appreciated by the fact that, despite the large number of authors and contributions, all chapters have kept the same notation. Furthermore, in addition to a sound description of the underlying theory and methods, several chapters include information about how to fit the presented models using the R programming language. However, I missed pointers to repositories to download the code and datasets for some of the examples used in the book. To sum up, this is an excellent reference book on mixture models.” Virgilio Gómez-Rubio, JRSS A, 2021

meandering

Posted in Books, Kids, R, Statistics on March 12, 2021 by xi'an

A bit of a misunderstanding from Randall Munroe and then some: the function F returns a triplet, hence G should return a triplet as well. Even if the limit does return three identical values. And he should have also included the (infamous) harmonic mean! And the subtext (behind the picture) mentions random forest statistics, using every mean one can think of and dropping those that do worse, while here all solutions return the same value, hence do not directly discriminate between the averages (and there is no objective function to create the nodes in the trees, &tc.).

Here is a test R code including the harmonic mean:

# arithmetic, geometric, median, and harmonic means of x
xkcd=function(x)c(mean(x),exp(mean(log(x))),median(x),1/mean(1/x))
# iterate the four means N times; rep(N==1,4) keeps ifelse returning a 4-vector
xxxkcd=function(x,N=10)ifelse(rep(N==1,4),xkcd(x),xxxkcd(xkcd(x),N-1))
xxxkcd(rexp(11))
[1] 1.018197 1.018197 1.018197 1.018197

stack overload

Posted in Books, Kids, R on March 3, 2021 by xi'an

The Riddle this week is rather straightforward to explain: stacking identical objects (bars of length and mass two, say) on top of one another so that the centre of each new bar is uniformly distributed along the previous bar, what is the distribution of the number of bars when the stack collapses? If I am not confused, the stack collapses the first time the centre of gravity of an upper stack leaves the interval represented by the bar just below. Namely

\left|\frac{1}{N-j} \sum_{i=j+1}^N (x_i - x_j)\right| > 1

where the x_i are the bar centres, or equivalently

\max_{2\le j\le N-1} \left|\frac{1}{N-j} \sum_{i=j+1}^N \sum_{k=j+1}^i \epsilon_k \right| > 1

where the ε_i‘s are U(-1,1). Which is straightforward to code in R by looking at means of cumulated sums.
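
A minimal sketch of such a simulation (not the author's code; the function name and the range of j simply follow the formula above):

# add bars until the collapse condition is met, return the number of bars
collapse = function(){
  eps = runif(1, -1, 1)                  # offset of bar 2 over bar 1
  repeat{
    eps = c(eps, runif(1, -1, 1))        # offset of the new bar over the one below
    N = length(eps) + 1                  # current number of bars
    x = c(0, cumsum(eps))                # bar centres x_1, ..., x_N
    # centres of gravity of bars j+1, ..., N relative to bar j, for j = 2, ..., N-1
    cog = rev(cumsum(rev(x)))[3:N]/((N-2):1) - x[2:(N-1)]
    if (max(abs(cog)) > 1) return(N)
  }
}
nbars = replicate(1e4, collapse())       # Monte Carlo sample of the collapse size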