## composition versus inversion

Posted in Books, Kids, R, Statistics with tags , , , , , , , on March 31, 2021 by xi'an

While trying to convey to an OP on X validated why the inversion method was not always the panacea in pseudo-random generation, I took the example of a mixture of K exponential distributions when K is very large, in order to impress (?) upon said OP that solving F(x)=u for such a closed-form cdf F was very costly even when using a state-of-the-art (?) inversion algorithm, like uniroot, since each step involves adding the K terms in the cdf. Selecting the component from the cumulative distribution function on the component proves to be quite fast since using the rather crude

`x=rexp(1,lambda[1+sum(runif(1)>wes)])`

brings a 100-fold improvement over

```Q = function(u) uniroot((function(x) F(x) - u), lower = 0,
upper = qexp(.999,rate=min(la)))[1] #numerical tail quantile
x=Q(runif(1))```

when K=10⁵, as shown by a benchmark call

```         test elapsed
1       compo   0.057
2      Newton  45.736
3     uniroot   5.814
```

where Newton denotes a simple-minded Newton inversion. I wonder if there is a faster way to select the component in the mixture. Using a while loop starting from the most likely components proves to be much slower. And accept-reject solutions are invariably slow or fail to work with such a large number of components. Devroye’s Bible has a section (XIV.7.5) on simulating sums of variates from an infinite mixture distribution, but, for once,  nothing really helpful. And another section (IV.5) on series methods, where again I could not find a direct connection.

## meandering

Posted in Books, Kids, R, Statistics with tags , , , , , , , on March 12, 2021 by xi'an

A bit of a misunderstanding from Randall Munroe and then some: the function F returns a triplet, hence G should return a triplet as well. Even if the limit does return three identical values. And he should have also included the (infamous) harmonic mean! And the subtext (behind the picture) mentions random forest statistics, using every mean one can think of and dropping those that are doing worse, while here all solutions return the same value, hence do not directly discriminate between the averages (and there is no objective function to create the nodes in the trees, &tc.).

Here is a test R code including the harmonic mean:

```xkcd=function(x)c(mean(x),exp(mean(log(x))),median(x),1/mean(1/x))
xxxkcd=function(x,N=10)ifelse(rep(N==1,4),xkcd(x),xxxkcd(xkcd(x),N-1))
xxxkcd(rexp(11))
[1] 1.018197 1.018197 1.018197 1.018197
```