## sum of Paretos

Posted in Books, Kids, R, Statistics on May 12, 2022 by xi'an

A rather curious question on X validated about the evolution of

$\mathbb E^{U,V}\left[\sum_{i=1}^M U_i\Big/\sum_{i=1}^M U_i/V_i \right]\quad U_i,V_i\sim\mathcal U(0,1)$

when M increases. Actually, this expectation is asymptotically equivalent to

$\mathbb E^{U,V}\left[M\big/\sum_{i=1}^M 2U_i/V_i \right]\quad U_i,V_i\sim\mathcal U(0,1)$

or again

$\mathbb E^{V}\left[1\big/(1+2\overline R_{M/2})\right]$

where the average is made of Pareto(1,1) variates, since one can invoke Slutsky's theorem several times. (And the above comparison of the integrated rv's does not show a major difference.) Comparing several Monte Carlo sequences shows a lot of variability, though, which is not surprising given the lack of expectation of the Pareto(1,1) distribution. But over the time I spent on that puzzle last weekend, I could not figure out the limiting value, despite uncovering the asymptotic behaviour of the average.
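For the record, a minimal Monte Carlo sketch of the ratio (my own R illustration, with sample sizes chosen arbitrarily rather than those behind the blog plots) shows both the rough stabilisation in M and the heavy-tail variability across replications:

```r
# Monte Carlo sketch of E[sum(U)/sum(U/V)], U,V ~ U(0,1), as M grows;
# sample sizes are illustrative only
set.seed(1)
ratio <- function(M) {
  U <- runif(M)
  V <- runif(M)
  sum(U) / sum(U / V)
}
for (M in 10^(2:4)) {
  est <- replicate(500, ratio(M))
  cat("M =", M, " mean:", round(mean(est), 4),
      " sd:", round(sd(est), 4), "\n")
}
```

Note that, since V<1 implies U/V>U, the ratio always lies in (0,1); the heavy (Pareto-like) tail of U/V beyond 1 is what drives the variability between sequences.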

## truncated mixtures

Posted in Books, pictures, R, Statistics on May 4, 2022 by xi'an

A question on X validated about EM steps for a truncated Normal mixture led me to ponder whether or not a more ambitious completion [more ambitious than the standard component allocation] was appropriate. Namely, if the mixture is truncated to the interval (a,b), with an observed sample x of size n, this sample could be augmented into an untruncated sample y by latent samples over the complement of (a,b), with random sizes corresponding to the probabilities of falling within (-∞,a), (a,b), and (b,∞). In other words, y is made of three parts, including x, with sizes N¹, n, N³, respectively, the vector (N¹, n, N³) being a trinomial M(N⁺,p) random variable and N⁺ an extra unknown in the model. Assuming a (pseudo-) conjugate prior, an approximate Gibbs sampler can be run (by ignoring the dependence of p on the mixture parameters!). I did not go as far as implementing the idea for the mixture, but had a quick try for a simple truncated Normal. And did not spot any explosive behaviour in N⁺, which is what I was worried about. Of course, this is mostly anecdotal since the completion does not bring a significant improvement in coding or convergence (the plots correspond to 10⁴ simulations, for a sample of size n=400).
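For the simple truncated Normal case, one possible version of this completion Gibbs sampler reads as follows. This is a sketch under my own assumptions (unit variance, flat prior on the mean, and the outside count imputed as a Negative Binomial), not necessarily the exact implementation behind the plots:

```r
# Completion Gibbs sketch for x ~ N(mu,1) truncated to (a,b): at each
# step, impute latent points falling outside (a,b), then update mu from
# the completed sample. Flat prior on mu; all settings are illustrative.
set.seed(42)
a <- -1; b <- 2; n <- 400
x <- qnorm(runif(n, pnorm(a), pnorm(b)))      # truncated N(0,1) sample
mu <- mean(x)
T <- 1e3; mus <- numeric(T)
for (t in 1:T) {
  p.in  <- pnorm(b - mu) - pnorm(a - mu)      # mass of (a,b) under N(mu,1)
  N.out <- rnbinom(1, size = n, prob = p.in)  # latent number of "misses"
  lo <- pnorm(a - mu); hi <- pnorm(b - mu)
  u <- runif(N.out)
  y <- mu + qnorm(ifelse(u < lo / (lo + 1 - hi),  # left vs right tail
                         runif(N.out, 0, lo),
                         runif(N.out, hi, 1)))
  mu <- rnorm(1, mean(c(x, y)), 1 / sqrt(n + N.out))  # conjugate update
  mus[t] <- mu
}
mean(mus[-(1:200)])   # posterior mean estimate of mu (true value 0)
```

The inverse-cdf draws for y are exact simulations from the two tails of the N(mu,1) distribution, so no rejection step is needed.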

## sketchbook [stop the war!]

Posted in Books, pictures, Statistics, University life on April 24, 2022 by xi'an

## information loss from the median

Posted in Books, Kids, Statistics on April 19, 2022 by xi'an

An interesting side item from an X validated question about calculating the Fisher information for the Normal median (as an estimator of the mean). While this information is not available in closed form, it has a “nice” expression

$1+n\mathbb E[Z_{n/2:n}\varphi(Z_{n/2:n})]-n\mathbb E[Z_{n/2:n-1}\varphi(Z_{n/2:n-1})]+\frac{n(n-1)}{n/2-2}\,\mathbb E[\varphi(Z_{n/2-2:n-2})^2]+\frac{n(n-1)}{n-n/2-1}\,\mathbb E[\varphi(Z_{n/2:n-2})^2]$

which can easily be approximated by simulation (much faster than by estimating the variance of said median). This shows that the median is about 1.57 times less informative than the empirical mean. Bonus points for computing the information brought by the MAD statistic! (The information loss against the MLE is 2.69, since the Monte Carlo ratio of their variances is 0.37.)
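The 1.57 factor is, unsurprisingly, the classical π/2 asymptotic variance ratio between the Normal sample median and the sample mean, which a quick R simulation recovers (simulation sizes are mine):

```r
# Monte Carlo check that the Normal sample median is about pi/2 ~ 1.57
# times more variable (hence less informative) than the sample mean
set.seed(1)
n <- 500; R <- 5e3
sims <- matrix(rnorm(n * R), n, R)      # R replications of an n-sample
meds <- apply(sims, 2, median)
var(meds) / var(colMeans(sims))          # close to pi/2 = 1.5708
```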

## mixtures of sums vs. sum of mixtures

Posted in Statistics on April 13, 2022 by xi'an

A (mildly) interesting question on X validated last night, namely the distribution of a sum of n iid variables distributed from a mixture of exponentials. The rather obvious answer is a mixture of (n+1) distributions, each of them corresponding to a sum of two Gamma variates (except for the two extreme cases). But the more interesting component for my personal consumption is that the distribution of this sum of two Gammas with different scales writes up as a signed mixture of Gammas, which comes as a handy (if artificial) illustration for a paper we are completing with Julien Stoehr.
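A quick Monte Carlo sanity check of the (n+1)-component representation (my own illustration, with arbitrary weight and rates): conditional on the Binomial count K of first-component draws, the sum is Gamma(K,λ₁)+Gamma(n-K,λ₂).

```r
# check: the sum of n iid draws from p*Exp(l1) + (1-p)*Exp(l2) has the
# same distribution as Gamma(K, l1) + Gamma(n-K, l2), K ~ Binomial(n, p)
set.seed(1)
n <- 10; p <- 0.3; l1 <- 1; l2 <- 5; R <- 1e5
z <- matrix(rbinom(n * R, 1, p), n, R)                # component labels
x <- ifelse(z == 1, rexp(n * R, l1), rexp(n * R, l2)) # mixture draws
direct <- colSums(x)                                  # sums of mixtures
K <- rbinom(R, n, p)                                  # mixture of sums
mix <- rgamma(R, K, l1) + rgamma(R, n - K, l2)        # shape 0 returns 0
c(mean(direct), mean(mix))   # both close to n*(p/l1 + (1-p)/l2) = 4.4
```

The two extreme cases K=0 and K=n are handled for free, since rgamma with shape zero returns zero.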