Archive for arithmetic mean

geometric climbing

Posted in Mountains, pictures on August 5, 2021 by xi'an

In the qualifying round for the Tokyo Olympics, the French climber Mickaël Mawem ended up first, while his brother Bassa was the fastest on the speed climb (as a 2018 and 2019 World Champion) but ruptured a tendon while lead climbing and had to be flown back to Paris for an operation. The New York Times inappropriately and condescendingly qualified this first position as being “unexpected” when Mickaël is the 2019 European Champion in bouldering… The NYT piles on the belittling by stating that “Anouck Jaubert of France used a second-place finish in speed to squeak into the final”… (The other French female climber did not make it, despite being one of the first women to reach the 9b level.)

I remain puzzled by the whole concept of mixing the three sports together. As well as by the scoring system, based on a geometric average of the three rankings, which means in particular that the eight finalists will suffer less than in the qualifying round from a poor performance in one of the three climbs (as Adam Ondra for the speed climb). In addition, there is an obscure advantage coming to Adam Ondra from Bassa Mawem withdrawing: according to the NYT, “Ondra will receive a bye and an automatic slot in the speed semifinals” meaning “that a likely eighth-place finish in speed — a ranking number that can be hard to overcome in the multiplication of the combined format — will now be no worse than fourth for Ondra”. (The remark on the strong impact of the multiplication is incorrect, in that a poor ranking has less impact on a geometric mean than on an arithmetic one!)
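To make the last remark concrete, here is a minimal R sketch comparing the two aggregations on made-up rankings. It assumes the combined score is the product of the three ranks (equivalently, their geometric mean), with the lowest score winning; the two climbers and their ranks are purely hypothetical.

```r
# hypothetical rankings in (speed, bouldering, lead) among eight finalists
specialist <- c(1, 1, 8)   # wins two events, finishes last in the third
allrounder <- c(3, 3, 3)   # third in every event

# product and sum of ranks, plus the corresponding means (lower is better)
score <- function(r) c(product = prod(r), sum = sum(r),
                       geometric = prod(r)^(1 / length(r)),
                       arithmetic = mean(r))
print(rbind(specialist = score(specialist), allrounder = score(allrounder)))
```

With these ranks the products are 8 against 27, so the specialist wins under the multiplicative rule, while the sums are 10 against 9, so the same climber would lose under an additive rule: a single last place hurts less with the geometric mean.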

meandering

Posted in Books, Kids, R, Statistics on March 12, 2021 by xi'an

A bit of a misunderstanding from Randall Munroe and then some: the function F returns a triplet, hence G should return a triplet as well. Even if the limit does return three identical values. And he should have also included the (infamous) harmonic mean! And the subtext (behind the picture) mentions random forest statistics, using every mean one can think of and dropping those that are doing worse, while here all solutions return the same value, hence do not directly discriminate between the averages (and there is no objective function to create the nodes in the trees, etc.).

Here is a test in R, including the harmonic mean:

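# arithmetic mean, geometric mean, median, and harmonic mean of x
xkcd=function(x)c(mean(x),exp(mean(log(x))),median(x),1/mean(1/x))
# apply xkcd N times in a row, feeding each output back as input
xxxkcd=function(x,N=10)if(N==1)xkcd(x)else xxxkcd(xkcd(x),N-1)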
xxxkcd(rexp(11))
[1] 1.018197 1.018197 1.018197 1.018197

an arithmetic mean identity

Posted in Books, pictures, R, Statistics, Travel, University life on December 19, 2019 by xi'an

A 2017 paper by Ana Pajor, published in Bayesian Analysis and already discussed on the ‘Og last year, addresses my favourite problem [of computing the marginal likelihood], linking with another paper by Lenk published in 2009 in JCGS. Lenk’s (2009) paper is actually using a technique related to the harmonic mean correction based on HPD regions that Darren Wraith and myself proposed at MaxEnt 2009. And which Jean-Michel and I presented at Frontiers of statistical decision making and Bayesian analysis in 2010. As I had only vague memories about the arithmetic mean version, we discussed the paper together with graduate students in Paris Dauphine.

The arithmetic mean solution, representing the marginal likelihood as the prior average of the likelihood, is a well-known approach, also used as the basis for nested sampling. The improvement here consists in restricting the simulation to a set Ð with sufficiently high posterior probability. I am quite uneasy about P(Ð|y) being estimated by 1, as the shape of the set containing all posterior simulations is completely arbitrary, parameterisation dependent, and very random since based on the extremes of this posterior sample. Plus, the set Ð converges to the entire parameter space with the number of posterior simulations. An alternative that we advocated in our earlier paper is to take Ð as the HPD region or a variational Bayes version. But the central issue with the HPD regions is how to construct these from an MCMC output and how to compute both P(Ð) and P(Ð|y). It does not seem like a good idea to set P(Ð|y) to the intended α level for the HPD coverage. Using a non-parametric version for estimating Ð could be in the end the only reasonable solution.
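As an illustration of the truncated arithmetic mean described above, here is a minimal R sketch on a toy conjugate normal model. Everything in it is an assumption of mine rather than the code of the paper or of this post: the model (y_i ~ N(θ,1), θ ~ N(0,τ²)), the sample size, the prior variance, the choice of Ð as the interval spanned by the posterior draws, and all variable names.

```r
set.seed(1)
n <- 20; tau2 <- 100                   # hypothetical sample size and prior variance
y <- rnorm(n, mean = 1)                # simulated data, y_i ~ N(theta, 1)
s2n <- 1 / (n + 1 / tau2)              # exact posterior variance
mun <- s2n * sum(y)                    # exact posterior mean

# log-likelihood of the whole sample, evaluated at each value in theta
loglik <- function(theta) sapply(theta, function(t) sum(dnorm(y, t, 1, log = TRUE)))

# exact log marginal via Chib's identity: log L + log prior - log posterior
truth <- loglik(mun) + dnorm(mun, 0, sqrt(tau2), log = TRUE) -
  dnorm(mun, mun, sqrt(s2n), log = TRUE)

# set D: interval spanned by exact posterior draws, with P(D|y) taken as 1
post <- rnorm(1e4, mun, sqrt(s2n))
D <- range(post)

# arithmetic mean of the likelihood over prior draws, truncated to D
N <- 1e5
prior <- rnorm(N, 0, sqrt(tau2))
ll <- loglik(prior[prior >= D[1] & prior <= D[2]])
# log of (1/N) * sum of L(theta_i) over the draws falling in D, computed stably
amD <- max(ll) + log(sum(exp(ll - max(ll)))) - log(N)
print(c(truncated_arithmetic = amD, exact = truth))
```

In this one-dimensional setting the two values should agree closely, since prior draws still fall in Ð often enough. The concerns raised above (arbitrary shape of Ð, P(Ð|y) set to 1) are not addressed by this sketch.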

As a test, I reran the example of a conjugate normal model used in the paper, based on (exact) simulations from both the prior and the posterior, and obtained approximations that were all close to the true marginal. With Chib’s being exact in that case (of course!), and an arithmetic mean surprisingly close without an importance correction:

> print(c(hame,chme,came,chib))
[1] -107.6821 -106.5968 -115.5950 -115.3610

Both harmonic versions are of the right order but not trustworthy, the truncation to such a set Ð as the one chosen in this paper having little impact.
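For readers unfamiliar with the plain harmonic mean estimator, a minimal R sketch follows. It uses a toy conjugate normal model of my own choosing (y_i ~ N(θ,1), θ ~ N(0,τ²), with hypothetical n and τ²), not the setting or the code behind the numbers above, and it omits the truncation to Ð.

```r
set.seed(2)
n <- 20; tau2 <- 100                   # hypothetical sample size and prior variance
y <- rnorm(n, mean = 1)                # simulated data, y_i ~ N(theta, 1)
s2n <- 1 / (n + 1 / tau2)              # exact posterior variance
mun <- s2n * sum(y)                    # exact posterior mean
loglik <- function(theta) sapply(theta, function(t) sum(dnorm(y, t, 1, log = TRUE)))

# exact log marginal via Chib's identity: log L + log prior - log posterior
truth <- loglik(mun) + dnorm(mun, 0, sqrt(tau2), log = TRUE) -
  dnorm(mun, mun, sqrt(s2n), log = TRUE)

# harmonic mean identity: 1/m(y) is the posterior expectation of 1/L(theta)
post <- rnorm(1e4, mun, sqrt(s2n))     # exact posterior draws
nll <- -loglik(post)
# log harmonic mean = -log(mean(1/L)), computed stably on the log scale
hm <- -(max(nll) + log(mean(exp(nll - max(nll)))))
print(c(harmonic = hm, exact = truth))
```

The average of 1/L is driven by values of θ far in the prior tails, which posterior draws almost never reach, so the estimate tends to overshoot the exact log marginal while staying of the right order.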
