## the Grumble distribution and an ODE

Posted in Books, Kids, R, Statistics, University life on December 3, 2014 by xi'an

As ‘Og’s readers may have noticed, I paid some recent visits to Cross Validated (although I find this too addictive to be sustainable on a long-term basis and, as already reported a few years ago, frustrating at several levels, from questions asked without any preliminary personal effort, to a lack of background material to understand hints towards the answer, to not even considering answers [once the homework due date was past?], &tc.). Anyway, some questions are nonetheless great puzzles, to wit this one about the possible transformation of a random variable R with density

$p(r|\lambda) = \dfrac{2\lambda r\exp\left(\lambda\exp\left(-r^{2}\right)-r^{2}\right)}{\exp\left(\lambda\right)-1}$

into a Gumbel distribution. While the better answer is that it translates into a power law,

$V=e^{e^{-R^2}}\sim q(v|\lambda)\propto v^{\lambda-1}\mathbb{I}_{(1,e)}(v)$,

I thought using the S=R² transform could work but obtained a wrong sign in the pseudo-Gumbel density

$W=S-\log(\lambda)\sim \eth(w)\propto\exp\left(\exp(-w)-w\right)$

and then went on to seek another transform into a Gumbel rv T, which amounted to solving the differential equation

$\exp\left(-e^{-t}-t\right)\text{d}t=\exp\left(e^{-w}-w\right)\text{d}w$
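As a quick sanity check on the power-law claim, the probability P(V ≤ v) can be computed by numerically integrating the density of R over the set where exp(exp(-R²)) ≤ v, that is, R ≥ sqrt(-log(log(v))), and compared with the cdf (v^λ − 1)/(e^λ − 1) implied by q(v|λ) on (1,e). The following is only a minimal sketch: the value of λ, the grid of v's, and the names `p`, `p_num` and `p_pow` are arbitrary choices, not part of the original question.

```r
# Density of R as given above
lambda <- 2.5                      # arbitrary illustrative value
p <- function(r) 2 * lambda * r * exp(lambda * exp(-r^2) - r^2) / (exp(lambda) - 1)

# A few points inside the support (1, e) of V = exp(exp(-R^2))
v0 <- c(1.2, 1.5, 2, 2.5)

# P(V <= v) = P(R >= sqrt(-log(log(v)))), by numerical integration
p_num <- sapply(v0, function(v)
  integrate(p, lower = sqrt(-log(log(v))), upper = Inf)$value)

# cdf of the power law q(v|lambda) on (1, e)
p_pow <- (v0^lambda - 1) / (exp(lambda) - 1)

print(cbind(v = v0, numerical = p_num, power_law = p_pow))
```

The two columns should agree up to the numerical integration error, which is the change of variable u = exp(-r²) in disguise.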

As I could not solve the ODE analytically, I programmed a simple numerical resolution (a first-order Runge-Kutta, i.e. Euler, scheme) as follows:

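solvR=function(prec=10^3,maxz=1){
  # explicit Euler steps forward from w=1 to w=maxz, starting at t(1)=1
  z=seq(1,maxz,le=prec)
  t=rep(1,prec) #t(1)=1
  for (i in 2:prec)
    t[i]=t[i-1]+(z[i]-z[i-1])*exp(-z[i-1]+
      exp(-z[i-1])+t[i-1]+exp(-t[i-1]))
  zold=z
  # same scheme run backwards from w=1 down to w=.1/maxz
  z=seq(.1/maxz,1,le=prec)
  # first prec-1 entries are placeholders, overwritten in the loop below
  t=c(t[-prec],t)
  for (i in (prec-1):1)
    t[i]=t[i+1]+(z[i]-z[i+1])*exp(-z[i+1]+
      exp(-z[i+1])+t[i+1]+exp(-t[i+1]))
  # two columns: the grid of w values and the corresponding t(w)
  return(cbind(c(z[-prec],zold),t))
}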


Which shows that [the increasing] t(w) quickly gets too large for the function to be depicted. But this is a fairly useless result in that a transform of the original variable and of its parameter into an arbitrary distribution is always possible, given that W above has a fixed distribution… Hence the pun on Gumbel in the title.
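For what it is worth, the differential equation above is separable: its left-hand side is the differential of exp(−e^{−t}) and its right-hand side that of −exp(e^{−w}), so any solution satisfies exp(−e^{−t}) + exp(e^{−w}) = C for some constant C. The sketch below is a cross-check added here rather than part of the original computation: it compares an explicit Euler scheme with this implicit solution under the initial condition t(1)=1. The function names, the grid size and the end point 1.5 are arbitrary choices.

```r
# Slope dt/dw implied by exp(-e^{-t}-t) dt = exp(e^{-w}-w) dw
slope <- function(w, t) exp(exp(-w) - w + exp(-t) + t)

# Explicit Euler scheme from (w0, t0) up to w1 on a grid of n points
euler <- function(w0 = 1, t0 = 1, w1 = 1.5, n = 1e4) {
  w <- seq(w0, w1, length.out = n)
  t <- numeric(n)
  t[1] <- t0
  for (i in 2:n)
    t[i] <- t[i - 1] + (w[i] - w[i - 1]) * slope(w[i - 1], t[i - 1])
  cbind(w = w, t = t)
}

# Implicit solution: exp(-e^{-t}) + exp(e^{-w}) = C, with C set by t(w0) = t0
closed <- function(w, w0 = 1, t0 = 1) {
  C <- exp(-exp(-t0)) + exp(exp(-w0))
  -log(-log(C - exp(exp(-w))))
}

sol <- euler()
err <- max(abs(sol[, "t"] - closed(sol[, "w"])))
print(err)  # largest gap between the Euler path and the implicit solution
```

With t(1)=1, the implicit form only makes sense while 0 < C − exp(e^{−w}) < 1, so t(w) escapes to infinity at a finite w (about 2.05 by this formula), in line with the explosive behaviour of the numerical solution.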

## some LaTeX tricks

Posted in Books, Kids, Statistics, University life on November 21, 2014 by xi'an

Here are a few LaTeX tricks I learned or rediscovered when working on several papers the past week:

1. I am always forgetting how to make aligned equations with a single equation number, so I found this solution on the TeX forum of stackexchange. Namely, use the equation environment and then an aligned environment inside. Or the split environment. But it does not always work…
2. Another frustrating black hole is how to deal with integral signs that do not adapt to the integrand. Too bad we cannot use \left\int, really! Another stackexchange question led me to the bigints package. Not perfect though.
3. Pierre Pudlo also showed me the commands \graphicspath{{dir1/}{dir2/}} (each directory ending with a slash) and \DeclareGraphicsExtensions{.pdf,.png,.jpg} to avoid coding the entire path to each image and to put an order on the extension type, respectively. The second one is fairly handy when working on drafts. The first one does not seem to work with symbolic links, though…
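The three tricks above can be put together in a small test document. This is only a sketch under assumptions: `dir1`, `dir2` and `myplot` are placeholder names, the formulas are arbitrary, and `\bigint` is one of the commands of the bigints package.

```latex
\documentclass{article}
\usepackage{amsmath}  % provides the aligned and split environments
\usepackage{bigints}  % provides \bigint and its smaller variants
\usepackage{graphicx}
% dir1 and dir2 are placeholders; each path ends with a slash
\graphicspath{{dir1/}{dir2/}}
% extensions are tried in this order when none is given
\DeclareGraphicsExtensions{.pdf,.png,.jpg}
\begin{document}
% several aligned lines sharing a single equation number
\begin{equation}
\begin{aligned}
(a+b)^2 &= (a+b)(a+b)\\
        &= a^2+2ab+b^2
\end{aligned}
\end{equation}
% a taller integral sign in front of a tall integrand
\[
\bigint_0^1 \frac{\dfrac{1}{1+x}}{\dfrac{1}{1+x^2}}\,\mathrm{d}x
\]
% \includegraphics{myplot} % hypothetical file, searched for in dir1/ then dir2/
\end{document}
```

Replacing the inner `aligned` with `split` gives the alternative mentioned in the first item.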

## unicode in LaTeX

Posted in Books, Linux, Statistics, University life on October 9, 2014 by xi'an

As I was hurriedly trying to cram several ‘Og posts into a conference paper (!), I looked around for a way of including Unicode characters straight away. And found this solution on StackExchange:

\usepackage[mathletters]{ucs}
\usepackage[utf8x]{inputenc}

which just suited me fine!

## Le Monde sans puzzle [& sans penguins]

Posted in Books, Kids, R, University life on April 12, 2014 by xi'an

As the Le Monde mathematical puzzle of this week was a geometric one (the quadrangle ABCD is divided into two parts with the same area, &tc…), with no clear R resolution, I chose to bypass it. In this April 3 issue, several items of interest: first, a report by Etienne Ghys on Yakov Sinaï’s Abel Prize for his work “between determinism and randomness”, centred on ergodic theory for dynamical systems, which sounded like the ultimate paradox the first time I heard my former colleague Denis Bosq give a talk about it in Paris 6. Then a frightening fact: the summer conditions have been so unusually harsh in Antarctica (or at least near the Dumont d’Urville French austral station) that none of the 15,000 Adélie penguin couples studied there managed to keep their chick alive. This was due to an ice shelf that did not melt at all over the summer, forcing the penguins to walk an extra 40 km to reach the sea… Another entry covered the legal obligation for all French universities to offer a second chance exam, no matter how students are evaluated in the first round. (Too bad, I always find writing a second round exam a nuisance.)

## Le Monde puzzle [#843]

Posted in Books, Kids, R on December 7, 2013 by xi'an

A Le Monde mathematical puzzle of moderate difficulty:

How many binary quintuplets (a,b,c,d,e) can be found such that any pair of quintuplets differs by at least two digits?

I solved it by the following R code that iteratively eliminates quintuplets that are not different enough from the first ones, for a random order of the 2⁵ quintuplets because the order matters in the resulting number (the intToBits trick was provided by an answer on StackExchange/stackoverflow):

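sol=0
for (t in 1:10^5){ #random permutations
  # columns of votes (name assumed) = the 32 quintuplets, in random order
  votes=sapply(0:31,function(x){
    as.integer(intToBits(x))})[1:5,sample(1:32)]
  V=32;inin=rep(TRUE,V);J=1
  while (J<V){
    for (i in (J+1):V) # drop quintuplets within one digit of the J-th one
      if (sum(abs(votes[,J]-votes[,i]))<2)
        inin[i]=FALSE
    votes=votes[,inin,drop=FALSE];V=ncol(votes);inin=rep(TRUE,V)
    J=J+1}
  if (sol<V){ # keep the largest collection found so far
    sol=V;levote=votes}
}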


which returns solutions like

> sol
[1] 16
> levote
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,]  0    0    0    0    1    1    1    1    0     1     0
[2,]  0    1    0    1    0    1    0    1    0     1     1
[3,]  0    1    1    0    1    0    1    1    1     0     0
[4,]  0    1    1    1    0    0    0    0    0     1     0
[5,]  0    0    0    0    0    0    1    0    0     0     0
[,12] [,13] [,14] [,15] [,16]
[1,]    0    1     1     0     1
[2,]    0    1     1     0     1
[3,]    1    0     0     1     1
[4,]    0    0     1     1     0
[5,]    1    0     1     0     1
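The value 16 can be cross-checked without any random search. The 32 quintuplets split into 16 pairs that differ only in their last digit, and a valid collection can contain at most one member of each pair, hence at most 16 quintuplets; conversely, the 16 quintuplets with an even number of ones are pairwise at least two digits apart. A minimal sketch of the second half of the argument (the names `quint`, `even` and `min_dist` are mine):

```r
# All 32 binary quintuplets, one per column
quint <- sapply(0:31, function(x) as.integer(intToBits(x))[1:5])

# Keep the quintuplets with an even number of ones
even <- quint[, colSums(quint) %% 2 == 0]

# Pairwise number of differing digits (Manhattan distance on 0/1 vectors)
d <- as.matrix(dist(t(even), method = "manhattan"))
min_dist <- min(d[upper.tri(d)])

print(c(size = ncol(even), min_dist = min_dist))
```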


In the same Science leaflet, Marco Zito had yet another tribune worth bloggin’ about (or against), under the title “Voyage au bout du bruit” (with no apologies to Céline!), where he blathers about (background) noise [“bruit”] versus signal without ever mentioning statistics. I will not repeat the earlier feat of translating the tribune, but he also includes an interesting piece of trivia: in the old TV sets of my childhood, the “snow” seen in the absence of a transmission signal is due in part to the CMB!

## numbers

Posted in Statistics on December 2, 2012 by xi'an

Last week, the ‘Og reached 2000 posts, 4000 comments, and 600,000 views. These are the most popular entries:

| Post | Views |
|---|---|
| In{s}a(ne)!! | 8,277 |
| “simply start over and build something better” | 7,069 |
| George Casella | 5,757 |
| Julien on R shortcomings | 3,226 |
| Sudoku via simulated annealing | 2,995 |
| #2 blog for the statistics geek?! | 2,676 |
| Bayesian p-values | 2,395 |
| Solution manual to Bayesian Core on-line | 2,111 |
| Of black swans and bleak prospects | 2,009 |
| Solution manual for Introducing Monte Carlo Methods with R | 1,996 |
| Parallel processing of independent Metropolis-Hastings algorithms | 1,862 |
| Bayes’ Theorem | 1,721 |
| Bayes on the Beach 2010 [2] | 1,718 |
| Do we need an integrated Bayesian/likelihood inference? | 1,585 |
| Coincidence in lotteries | 1,486 |
| Julian Besag 1945-2010 | 1,407 |

As noted earlier this year, the posts on the future of R remain the top visited posts. Sadly and comfortingly, the entry I wrote to mourn George’s passing was the most visited this year. Bayes on the Beach 2010 [2] gets traffic for the wrong reason, simply for mentioning Surfers’ Paradise… As a coincidence, I also reached the 4000 level on Stack Exchange’s Cross Validated, but this is so completely anecdotal…

## an unbiased estimator of the Hellinger distance?

Posted in Statistics on October 22, 2012 by xi'an

Here is a question I posted on Stack Exchange a while ago:

In a setting where one observes X1,…,Xn distributed from a distribution with (unknown) density f, I wonder if there is an unbiased estimator (based on the Xi’s) of the Hellinger distance to another distribution with known density f0, namely

$\mathfrak{H}(f,f_0)=\left\{1-\int\sqrt{f_0(x)f(x)}\text{d}x\right\}^{1/2}$

Now, Paulo has posted an answer that is rather interesting, if formally “off the point”. There exists a natural unbiased estimator of H², if not of H, based on the original sample and using the alternative representation

$\mathfrak{H}^2(f,f_0)=1-\mathbb{E}_f[\sqrt{f_0(X)/f(X)}]$

for the Hellinger distance. In addition, this estimator is guaranteed to enjoy a finite variance since

$\mathbb{E}_f[\sqrt{f_0(X)/f(X)}^2]=1\,.$

Considering this question again, I am now fairly convinced there cannot be an unbiased estimator of H, as it behaves like a standard deviation for which there usually is no unbiased estimator!
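The estimator of H² based on the representation above can be illustrated on a toy case. This is only a sketch under a strong assumption, namely that f can be evaluated, which is precisely what the original question rules out: here f is the N(0,1) density and f0 the N(1,1) density (both arbitrary choices), for which H² = 1 − exp(−1/8) in closed form.

```r
set.seed(42)
n <- 1e5

# Sample from f = N(0,1); f0 = N(1,1) is the reference density
x <- rnorm(n)

# Empirical version of 1 - E_f[ sqrt(f0(X)/f(X)) ], unbiased for H^2
ratio <- sqrt(dnorm(x, mean = 1) / dnorm(x))
h2_hat <- 1 - mean(ratio)

# Closed form for two normals with unit variance and means one apart
h2_true <- 1 - exp(-1 / 8)

print(c(estimate = h2_hat, truth = h2_true))
```

Taking the square root of `h2_hat` gives an estimate of H itself, but one that is biased by Jensen's inequality (and undefined whenever `h2_hat` is negative), in line with the standard deviation analogy.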