## Le Monde sans puzzle [& sans penguins]

Posted in Books, Kids, R, University life on April 12, 2014 by xi'an

As the Le Monde mathematical puzzle of this week was a geometric one (the quadrangle ABCD is divided into two parts with the same area, &tc.), with no clear R resolution, I chose to bypass it. This April 3 issue contained several items of interest. First, a report by Etienne Ghys on Yakov Sinaï’s Abel Prize for his work “between determinism and randomness”, centred on ergodic theory for dynamical systems, which sounded like the ultimate paradox the first time I heard my former colleague Denis Bosq give a talk about it in Paris 6. Then a frightening fact: the summer conditions have been so unusually harsh in Antarctica (or at least near the Dumont d’Urville French austral station) that none of the 15,000 Adélie penguin couples studied there managed to keep their chick alive. This was due to an ice shelf that did not melt at all over the summer, forcing the penguins to walk an extra 40 km to reach the sea… Finally, an entry on the legal obligation for all French universities to offer a second-chance exam, no matter how students are evaluated in the first round. (Too bad: I always find writing a second-round exam a nuisance.)

## Le Monde puzzle [#843]

Posted in Books, Kids, R on December 7, 2013 by xi'an

A Le Monde mathematical puzzle of moderate difficulty:

How many binary quintuplets (a,b,c,d,e) can be found such that any pair of quintuplets differs by at least two digits?

I solved it with the following R code, which iteratively eliminates the quintuplets that are not different enough from the ones already kept, repeating this over random orderings of the 2⁵ quintuplets since the order matters in the resulting number (the intToBits trick was provided by an answer on StackExchange/stackoverflow):

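```r
# greedy search over random orderings of the 32 binary quintuplets:
# keep a quintuplet only if it differs in at least two digits from
# every quintuplet kept before it, and record the largest set found
sol=0
for (t in 1:10^5){ #random permutations
  # columns are the 32 quintuplets (bits of 0:31), in random order
  perms=sapply(0:31,function(x){
    as.integer(intToBits(x))})[1:5,sample(1:32)]
  V=32;inin=rep(TRUE,V);J=1
  while (J<V){
    for (i in (J+1):V) # flag quintuplets one digit away from the J-th
      if (sum(perms[,J]!=perms[,i])<2)
        inin[i]=FALSE
    perms=perms[,inin,drop=FALSE];V=sum(inin);inin=rep(TRUE,V)
    J=J+1}
  if (sol<V){ # keep the best solution found so far
    sol=V;levote=perms}
}
```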


which returns solutions like

```
> sol
[1] 16
> levote
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,]  0    0    0    0    1    1    1    1    0     1     0
[2,]  0    1    0    1    0    1    0    1    0     1     1
[3,]  0    1    1    0    1    0    1    1    1     0     0
[4,]  0    1    1    1    0    0    0    0    0     1     0
[5,]  0    0    0    0    0    0    1    0    0     0     0
[,12] [,13] [,14] [,15] [,16]
[1,]    0    1     1     0     1
[2,]    0    1     1     0     1
[3,]    1    0     0     1     1
[4,]    0    0     1     1     0
[5,]    1    0     1     0     1
```
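As a deterministic sanity check on the value 16 (a sketch added here, not part of the original solution), one can verify that the 16 quintuplets with an even number of ones pairwise differ by at least two digits. Conversely, no valid set can exceed 16: the 32 quintuplets split into 16 pairs that differ only in their last digit, and a valid set holds at most one quintuplet from each pair.

```r
# all 32 binary quintuplets as columns (same intToBits trick as above)
quints=sapply(0:31,function(x) as.integer(intToBits(x))[1:5])
# keep those with an even number of ones (a single parity-check code)
even=quints[,colSums(quints)%%2==0]
# pairwise Hamming distances between the kept quintuplets
hamming=function(u,v) sum(u!=v)
D=outer(1:ncol(even),1:ncol(even),
  Vectorize(function(i,j) hamming(even[,i],even[,j])))
ncol(even)            # 16
min(D[upper.tri(D)])  # 2
```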


In the same Science leaflet, Marco Zito had yet another tribune worth bloggin’ about (or against), under the title “Voyage au bout du bruit” [Journey to the end of noise] (with no apologies to Céline!), where he blathers about (background) noise [“bruit”] versus signal without ever mentioning statistics. I will not repeat the earlier feat of translating the tribune, but he also includes an interesting piece of trivia: in the old TV sets of my childhood, the “snow” seen in the absence of a transmission signal is due in part to the CMB (cosmic microwave background)!

## numbers

Posted in Statistics on December 2, 2012 by xi'an

Last week, the ’Og reached 2000 posts, 4000 comments, and 600,000 views. These are the most popular entries:

| Post | Views |
|---|---|
| In{s}a(ne)!! | 8,277 |
| “simply start over and build something better” | 7,069 |
| George Casella | 5,757 |
| Julien on R shortcomings | 3,226 |
| Sudoku via simulated annealing | 2,995 |
| #2 blog for the statistics geek?! | 2,676 |
| Bayesian p-values | 2,395 |
| Solution manual to Bayesian Core on-line | 2,111 |
| Of black swans and bleak prospects | 2,009 |
| Solution manual for Introducing Monte Carlo Methods with R | 1,996 |
| Parallel processing of independent Metropolis-Hastings algorithms | 1,862 |
| Bayes’ Theorem | 1,721 |
| Bayes on the Beach 2010 [2] | 1,718 |
| Do we need an integrated Bayesian/likelihood inference? | 1,585 |
| Coincidence in lotteries | 1,486 |
| Julian Besag 1945-2010 | 1,407 |

As noted earlier this year, the posts on the future of R remain the top visited posts. Sadly and comfortingly, the entry I wrote to mourn George’s passing was the most visited this year. Bayes on the Beach 2010 [2] gets traffic for the wrong reason, simply for mentioning Surfers’ Paradise… As a coincidence, I also reached the 4000 level on Stack Exchange (Cross Validated), but this is so completely anecdotal…

## an unbiased estimator of the Hellinger distance?

Posted in Statistics on October 22, 2012 by xi'an

Here is a question I posted on Stack Exchange a while ago:

In a setting where one observes $X_1,\ldots,X_n$ distributed from a distribution with (unknown) density $f$, I wonder if there is an unbiased estimator (based on the $X_i$’s) of the Hellinger distance to another distribution with known density $f_0$, namely

$\mathfrak{H}(f,f_0)=\left\{1-\int\sqrt{f_0(x)f(x)}\,\text{d}x\right\}^{1/2}$
Now, Paulo has posted an answer that is rather interesting, if formally “off the point”. There exists a natural unbiased estimator of $\mathfrak{H}^2$, if not of $\mathfrak{H}$, based on the original sample and using the alternative representation
$\mathfrak{H}^2(f,f_0)=1-\mathbb{E}_f[\sqrt{f_0(X)/f(X)}]$

of the squared Hellinger distance. In addition, this estimator is guaranteed to enjoy a finite variance since

$\mathbb{E}_f[\sqrt{f_0(X)/f(X)}^2]=1\,.$
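A quick R sketch of this estimator, under the strong assumption (not that of the question) that $f$ can be evaluated, purely to illustrate the unbiasedness and the variance bound: $f$ is taken as the N(0,1) density and $f_0$ as the N(1,1) density, both arbitrary choices for which $\mathfrak{H}^2=1-\exp(-1/8)$ in closed form.

```r
set.seed(1)
n=10^5
# sample from f, assumed here to be the N(0,1) density
x=rnorm(n)
# sqrt(f0/f) at the sample points, with f0 assumed to be the N(1,1) density
w=sqrt(dnorm(x,mean=1)/dnorm(x))
H2hat=1-mean(w)      # unbiased estimate of the squared Hellinger distance
H2true=1-exp(-1/8)   # closed form for two unit-variance normals one unit apart
c(H2hat,H2true)
# since E_f[w^2]=1, the variance of w is 1-(1-H2true)^2 = 1-exp(-1/4)
var(w)
```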

Considering this question again, I am now fairly convinced there cannot be an unbiased estimator of H, as it behaves like a standard deviation for which there usually is no unbiased estimator!

## estimating a constant (not really)

Posted in Books, Statistics, University life on October 12, 2012 by xi'an

Larry Wasserman wrote a blog entry on the normalizing constant paradox, where he repeats that he does not understand my earlier point… Let me try to recap here this point and the various comments I made on StackExchange (while keeping in mind all this is for intellectual fun!).

The entry is somewhat paradoxical in that Larry acknowledges (in that post) that the analysis in his book, All of Statistics, is wrong. The fact that “g(x)/c is a valid density only for one value of c” (and hence cannot lead to a notion of likelihood on c) is the very reason why I stated that there can be no statistical inference nor prior distribution about c: a sample from f does not bring statistical information about c, and there can be no statistical estimate of c based on this sample. (In case you did not notice, I insist upon statistical!)

To me this problem is completely different from a statistical problem, at least in the modern sense: if I need to approximate the constant c (as I do in fact when computing Bayes factors), I can produce an arbitrarily long sample from a certain importance distribution and derive a converging (and sometimes unbiased) approximation of c. Once again, this is Monte Carlo integration, a numerical technique based on the Law of Large Numbers and the stabilisation of frequencies. (Call it a frequentist method if you wish. I completely agree that MCMC methods are inherently frequentist in that sense, and see no problem with this because they are not statistical methods. Of course, this may be the core of the disagreement with Larry and others: they include the Law of Large Numbers within statistics, and I do not. This lack of separation between both notions also shows up in a recent general public talk on Poincaré’s mistakes by Cédric Villani! All this may just mean I am irremediably Bayesian, seeing anything motivated by frequencies as non-statistical!) But that process does not mean that c can take a range of values that would index a family of densities compatible with a given sample. In this Monte Carlo integration approach, the distribution of the sample is completely under control (modulo the errors induced by pseudo-random generation). This approach is therefore outside the realm of Bayesian analysis “that puts distributions on fixed but unknown constants”, because those unknown constants parameterise the distribution of an observed sample. Ergo, c is not a parameter of the sample, and the sample Larry argues about (“we have data sampled from a distribution”) contains no information whatsoever about c that is not already in the function g. (It is not “data” in this respect, but a stochastic sequence that can be used for approximation purposes.) Which gets me back to my first argument, namely that c is known (and at the same time difficult or impossible to compute)!
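As an illustration of this Monte Carlo route, here is a minimal importance-sampling sketch in R. The choice of g (an unnormalised standard normal density, so that c = 1/√(2π) is available for checking) and of a Student t importance distribution with 3 degrees of freedom are both arbitrary assumptions. The average of the weights is an unbiased estimate of the integral of g, while its inverse only converges to c.

```r
set.seed(2)
# unnormalised density g, with f = c g; here c = 1/sqrt(2*pi)
g=function(x) exp(-x^2/2)
M=10^5
# importance distribution: Student t with 3 df (heavier tails than g)
y=rt(M,df=3)
Ihat=mean(g(y)/dt(y,df=3))  # unbiased estimate of the integral of g
chat=1/Ihat                 # converging (but biased) approximation of c
c(chat,1/sqrt(2*pi))
```

Increasing M improves the approximation at will, which is the point: the precision is set by the computing budget, not by an observed sample.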

Let me also answer here the comments “why is this any different from estimating the speed of light c?” and “why can’t you do this with the 100th digit of π?”, made on the earlier post or on StackExchange. Estimating the speed of light means for me (who repeatedly flunked Physics exams after leaving high school!) that we have a physical experiment that measures the speed of light (like the original one by Rœmer at the Observatoire de Paris, which I visited last week) and that the statistical analysis infers c from those measurements, accounting for the imprecision of the measuring instruments (as we do when analysing astronomical data). If, now, there exists a physical formula of the kind

$c=\int_\Xi \psi(\xi) \varphi(\xi) \text{d}\xi$

where φ is a probability density, I can imagine stochastic approximations of c based on this formula, but I do not consider it a statistical problem any longer. The case is thus clearer for the 100th digit of π: it is also a fixed number, which I can approximate by a stochastic experiment but to which I cannot attach a statistical tag. (It is 9, by the way.) Throwing darts at random as I did during my Oz tour is not a statistical procedure, but simple Monte Carlo à la Buffon…
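Dart-throwing fits the integral representation above once φ is taken as the uniform density on the unit square and ψ(ξ) = 4·𝕀{ξ₁²+ξ₂²≤1} (a choice made here for illustration), since the quarter disc has area π/4. A minimal R sketch of the resulting Monte Carlo approximation of π:

```r
set.seed(3)
N=10^6
# xi ~ phi, uniform on the unit square
xi1=runif(N); xi2=runif(N)
# psi(xi) = 4 times the indicator of the quarter disc
psi=4*(xi1^2+xi2^2<=1)
pihat=mean(psi)   # Monte Carlo approximation of pi
c(pihat,pi)
```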

Overall, I still do not see this as a paradox for our field (and certainly not as a critique of Bayesian analysis), because there is no reason a statistical technique should be able to address any and every numerical problem. (Once again, Persi Diaconis would almost certainly differ, as he defended a Bayesian perspective on numerical analysis in the early days of MCMC…) There may be a “Bayesian” solution to this particular problem (and that would be nice) and there may be none (and that would be OK too!), but I am not even convinced I would call this solution “Bayesian”! (Again, let us remember this is mostly for intellectual fun!)

## testing via credible sets

Posted in Statistics, University life on October 8, 2012 by xi'an

Måns Thulin released today an arXiv document on some decision-theoretic justifications for [running] Bayesian hypothesis testing through credible sets. His main point is that using the unnatural prior setting mass on a point-null hypothesis can be avoided by rejecting the null when the point-null value of the parameter does not belong to the credible interval, and that this decision procedure can be validated through the use of special loss functions. While I stress to my students that point-null hypotheses are very unnatural and should be avoided at all costs, and also that constructing a confidence interval is not the same as designing a test (the former assesses the precision of the estimation, while the latter opposes two different and even incompatible models), let us consider Måns’ arguments for their own sake.
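To fix ideas, here is a minimal R sketch of the credible-set procedure, in an assumed conjugate setting that is not taken from the paper: X₁,…,Xₙ ~ N(θ,σ²) with σ known and a N(μ₀,τ²) prior on θ. As the posterior is normal, the HPD interval is symmetric around the posterior mean, and θ=θ₀ is rejected when θ₀ falls outside it. The function name and default values are hypothetical.

```r
# reject H0: theta = theta0 when theta0 lies outside the posterior HPD interval
credible_test=function(x,theta0,sigma=1,mu0=0,tau=10,level=0.95){
  n=length(x)
  # conjugate normal posterior on theta
  prec=1/tau^2+n/sigma^2
  pmean=(mu0/tau^2+sum(x)/sigma^2)/prec
  psd=sqrt(1/prec)
  z=qnorm(1-(1-level)/2)
  hpd=pmean+c(-1,1)*z*psd  # HPD interval of a normal posterior
  theta0<hpd[1] || theta0>hpd[2]
}
x=rep(1,50)
credible_test(x,theta0=0)  # TRUE: 0 lies outside the 95% HPD interval
credible_test(x,theta0=1)  # FALSE
```

Note that no prior mass is put on θ₀ here, which is precisely what the procedure aims at avoiding.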

The idea of the paper is that there exist loss functions for testing point-null hypotheses that lead to HPD, symmetric and one-sided intervals as acceptance regions, depending on the loss function. This was already found in Pereira & Stern (1999). The issue with these loss functions is that they involve the corresponding credible sets in their definition, hence are somewhat tautological. For instance, when considering the HPD set and T(x) as the largest HPD set not containing the point-null value of the parameter, the corresponding loss function is

$L(\theta,\varphi,x) = \begin{cases}a\mathbb{I}_{T(x)^c}(\theta) &\text{when }\varphi=0\\ b+c\mathbb{I}_{T(x)}(\theta) &\text{when }\varphi=1\end{cases}$

parameterised by a, b, c, and depending on the HPD region through T(x).

Måns then introduces new loss functions that do not depend on x and still lead to either the symmetric or the one-sided credible intervals as acceptance regions. However, one test actually has two different alternatives (Theorem 2), which makes it essentially a composition of two one-sided tests, while the other test reduces to a one-sided test (Theorem 3), so even at this face-value level, I do not find the result that convincing. (For the one-sided test, George Casella and Roger Berger (1986) established links between Bayesian posterior probabilities and frequentist p-values.) Both Theorem 3 and the last result of the paper (Theorem 4) use a generic loss function that depends neither on a credible set nor on the observation (related to eqn. (5.2.1) in my book!, as quoted by the paper), but (and this is a big but) they only hold for prior distributions setting (prior) mass on both the null and the alternative. Otherwise, the solution is to always reject the hypothesis with zero prior probability… This is actually an interesting argument in the why-are-credible-sets-unsuitable-for-testing debate, as the approach cannot bypass the introduction of a prior mass on Θ0!
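The role of the prior mass on Θ0 is easy to see in the simplest point-null setting, sketched below in R under assumed choices: x̄ is the mean of n observations from N(θ,σ²), and the prior puts mass ρ on θ=0 and spreads the remaining 1−ρ as N(0,τ²). With ρ=0 the posterior probability of the null is zero whatever the data, hence the null is always rejected. The function name and defaults are hypothetical.

```r
# posterior probability of H0: theta = 0, for a prior mixing a point mass rho
# at 0 with a N(0, tau^2) distribution under the alternative
post_prob_null=function(xbar,n,sigma=1,tau=1,rho=0.5){
  m0=dnorm(xbar,0,sigma/sqrt(n))          # marginal density of xbar under H0
  m1=dnorm(xbar,0,sqrt(sigma^2/n+tau^2))  # marginal density of xbar under H1
  rho*m0/(rho*m0+(1-rho)*m1)
}
post_prob_null(0,n=20,rho=0)    # 0: no prior mass, no posterior mass
post_prob_null(0,n=20,rho=0.5)  # above 1/2 when xbar sits at the null value
```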

Overall, I furthermore consider that a decision-theoretic approach to testing should encompass future steps rather than focus on the reply to the (admittedly dumb) question “is θ zero?”. Therefore, it must have both plan A and plan B at the ready, which means preparing (and using!) prior distributions under both hypotheses. Even on point-null hypotheses.

Now, after I wrote the above, I came upon a Stack Exchange page initiated by Måns last July. This is presumably not the first time a paper stems from Stack Exchange, but this is a fairly interesting outcome: thanks to the debate on his question, Måns managed to get a coherent manuscript written. Great! (In a sense, this reminded me of the polymath experiments of Terry Tao, Timothy Gowers and others. Meaning that maybe most contributors could have become coauthors to the paper!)

## estimating a constant

Posted in Books, Statistics on October 3, 2012 by xi'an

Paulo (a.k.a. Zen) posted a comment on StackExchange on Larry Wasserman’s paradox about Bayesians and likelihoodists (or likelihood-wallahs, to quote Basu!) being unable to solve the problem of estimating the normalising constant c of the sample density f, known up to a constant:

$f(x) = c g(x)$

(Example 11.10, page 188, of All of Statistics)

My own comment is that, with all due respect to Larry!, I do not see much appeal in this example, esp. as a potential criticism of Bayesians and likelihood-wallahs… The constant c is known, being equal to

$1/\int_\mathcal{X} g(x)\text{d}x$

If c is the only “unknown” in the picture, given a sample x1,…,xn, then there is no statistical issue whatsoever about the “problem”, and I do not agree with the postulate that there exist estimators of c. Nor priors on c (other than the Dirac mass on the above value). This is not in the least a statistical problem but rather a numerical issue. That the sample x1,…,xn can be (re)used through a (frequentist) density estimate to provide a numerical approximation of c

$\hat c = \hat f(x_0) \big/ g(x_0)$

is a mere curiosity. Not a criticism of alternative statistical approaches: e.g., I could also use a Bayesian density estimate…
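A minimal R sketch of this density-estimate approximation, under assumptions of mine: g(x)=exp(−x²/2) (so that c = 1/√(2π)), a sample of size 10⁴ from f, x₀ = 0, and R's default kernel density estimator. Plain numerical integration of g, which uses no sample at all, is added for comparison.

```r
set.seed(4)
g=function(x) exp(-x^2/2)  # f = c g with c = 1/sqrt(2*pi)
x=rnorm(10^4)              # the sample x1,...,xn from f
x0=0
d=density(x)               # default Gaussian kernel density estimate
fhat0=approx(d$x,d$y,xout=x0)$y
chat=fhat0/g(x0)           # density-estimate approximation of c
# deterministic quadrature of g, no sample involved
cquad=1/integrate(g,-Inf,Inf)$value
c(chat,cquad,1/sqrt(2*pi))
```

The density-based value carries both smoothing bias and sampling noise tied to n, while the quadrature value is accurate to many digits.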

Furthermore, the estimate provided by the sample x1,…,xn is not of particular interest since its precision is imposed by the sample size n (and converges at non-parametric rates, which is not a particularly relevant issue!), while I could use importance sampling (or even numerical integration) if I was truly interested in c. I however find the discussion interesting for many reasons:

1. it somehow relates to the infamous harmonic mean estimator issue, often discussed on the ’Og!;
2. it brings more light on the paradoxical differences between statistics and Monte Carlo methods, in that statistics is usually constrained by the sample while Monte Carlo methods have more freedom in generating samples (up to some budget limits). It does not make sense to speak of estimators in Monte Carlo methods because there is no parameter in the picture, only “unknown” constants. Both fields rely on samples and probability theory, and share many features, but there is nothing like a “best unbiased estimator” in Monte Carlo integration, see the case of the “optimal importance function” leading to a zero variance;
3. in connection with the previous point, the fascinating Bernoulli factory problem is not a statistical problem because it requires an infinite sequence of Bernoullis to operate;
4. the discussion induced Chris Sims to contribute to StackExchange!
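The zero-variance “optimal importance function” of the second point takes a few lines of R to exhibit, assuming g(x)=exp(−x²/2): drawing from q = c g, here the N(0,1) density, makes every ratio g(Y)/q(Y) equal to the integral of g, namely 1/c, so the importance sampling estimator has zero variance. Building q, however, requires knowing c in the first place.

```r
set.seed(5)
g=function(x) exp(-x^2/2)
y=rnorm(10)          # draws from q = c g, the optimal importance density
ratio=g(y)/dnorm(y)  # every ratio equals 1/c = sqrt(2*pi)
range(ratio)
var(ratio)           # zero, up to floating-point error
```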