## Le Monde puzzle [#1085]

Posted in Books, Kids, R with tags , , , , , on February 18, 2019 by xi'an

A new Le Monde mathematical puzzle in the digit category:

Given 13 arbitrary relative integers chosen by Bo, Abigail can select any subset of them to be drifted by plus or minus one by Bo, repeatedly until Abigail reaches the largest possible number N of multiples of 5. What is the minimal possible value of N under the assumption that Bo tries to minimise it?

I got stuck on that one, as building a recursive functiion led me nowhere: the potential for infinite loop (add one, subtract one, add one, …) rather than memory issues forced me into a finite horizon for the R function, which then did not return anything substantial in a manageable time. Over the week and the swimming sessions, I thought of simplifying the steps, like (a) work modulo 5, (b) bias moves towards 1 or 4, away from 2 and 3, by keeping only one entry in 2 and 3, and all but one at 1 and 4, but could only produce five 0’s upon a sequence of attempts… With the intuition that only 3 entries should remain in the end, which was comforted by Le Monde solution the week after.

## Fisher’s lost information

Posted in Books, Kids, pictures, Statistics, Travel with tags , , , , , , , on February 11, 2019 by xi'an

After a post on X validated and a good discussion at work, I came to the conclusion [after many years of sweeping the puzzle under the carpet] that the (a?) Fisher information obtained for the Uniform distribution U(0,θ) as θ⁻¹ is meaningless. Indeed, there are many arguments:

1. The lack of derivability of the indicator function for x=θ is a non-issue since the derivative is defined almost everywhere.
2. In many textbooks, the Fisher information θ⁻² is derived from the Fréchet-Darmois-Cramèr-Rao inequality, which does not apply for the Uniform U(0,θ) distribution.
3. One connected argument for the expression of the Fisher information as the expectation of the squared score is that it is the variance of the score, since its expectation is zero. Except that it is not zero for the Uniform U(0,θ) distribution.
4. For the same reason, the opposite of the second derivative of the log-likelihood is not equal to the expectation of the squared score. It is actually -θ⁻²!
5. Looking at the Taylor expansion justification of the (observed) Fisher information, expanding the log-likelihood around the maximum likelihood estimator does not work since the maximum likelihood estimator does not cancel the score.
6. When computing the Fisher information for an n-sample rather than a 1-sample, the information is n²θ⁻², rather than nθ⁻².
7. Since the speed of convergence of the maximum likelihood estimator is of order n⁻², the central limit theorem does not apply and the limiting variance of the maximum likelihood estimator is not the Fisher information.

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , on February 8, 2019 by xi'an

As a reading suggestion for my (last) OxWaSP Bayesian course at Oxford, I included the classic 1973 Marginalisation paradoxes by Phil Dawid, Mervyn Stone [whom I met when visiting UCL in 1992 since he was sharing an office with my friend Costas Goutis], and Jim Zidek. Paper that also appears in my (recent) slides as an exercise. And has been discussed many times on this  ‘Og.

Reading the paper in the train to Oxford was quite pleasant, with a few discoveries like an interesting pike at Fraser’s structural (crypto-fiducial?!) distributions that “do not need Bayesian improper priors to fall into the same paradoxes”. And a most fascinating if surprising inclusion of the Box-Müller random generator in an argument, something of a precursor to perfect sampling (?). And a clear declaration that (right-Haar) invariant priors are at the source of the resolution of the paradox. With a much less clear notion of “un-Bayesian priors” as those leading to a paradox. Especially when the authors exhibit a red herring where the paradox cannot disappear, no matter what the prior is. Rich discussion (with none of the current 400 word length constraint), including the suggestion of neutral points, namely those that do identify a posterior, whatever that means. Funny conclusion, as well:

“In Stone and Dawid’s Biometrika paper, B1 promised never to use improper priors again. That resolution was short-lived and let us hope that these two blinkered Bayesians will find a way out of their present confusion and make another comeback.” D.J. Bartholomew (LSE)

and another

“An eminent Oxford statistician with decidedly mathematical inclinations once remarked to me that he was in favour of Bayesian theory because it made statisticians learn about Haar measure.” A.D. McLaren (Glasgow)

and yet another

“The fundamentals of statistical inference lie beneath a sea of mathematics and scientific opinion that is polluted with red herrings, not all spawned by Bayesians of course.” G.N. Wilkinson (Rothamsted Station)

Lindley’s discussion is more serious if not unkind. Dennis Lindley essentially follows the lead of the authors to conclude that “improper priors must go”. To the point of retracting what was written in his book! Although concluding about the consequences for standard statistics, since they allow for admissible procedures that are associated with improper priors. If the later must go, the former must go as well!!! (A bit of sophistry involved in this argument…) Efron’s point is more constructive in this regard since he recalls the dangers of using proper priors with huge variance. And the little hope one can hold about having a prior that is uninformative in every dimension. (A point much more blatantly expressed by Dickey mocking “magic unique prior distributions”.) And Dempster points out even more clearly that the fundamental difficulty with these paradoxes is that the prior marginal does not exist. Don Fraser may be the most brutal discussant of all, stating that the paradoxes are not new and that “the conclusions are erroneous or unfounded”. Also complaining about Lindley’s review of his book [suggesting prior integration could save the day] in Biometrika, where he was not allowed a rejoinder. It reflects on the then intense opposition between Bayesians and fiducialist Fisherians. (Funny enough, given the place of these marginalisation paradoxes in his book, I was mistakenly convinced that Jaynes was one of the discussants of this historical paper. He is mentioned in the reply by the authors.)

## Le Monde puzzle [#1083]

Posted in Books, Kids, R, Travel with tags , , , , , , on February 7, 2019 by xi'an

A Le Monde mathematical puzzle that seems hard to solve without the backup of a computer (and just simple enough to code on a flight to Montpellier):

Given the number N=2,019, find a decomposition of N as a sum of non-trivial powers of integers such that (a) the number of integers in the sum is maximal or (b) all powers are equal to 4.  Is it possible to write N as a sum of two powers?

It is straightforward to identify all possible terms in these sums by listing all powers of integers less than N

pool=(1:trunc(sqrt(2019)))^2
for (pow in 3:11)
pool=unique(c(pool,(2:trunc(2019^(1/pow)))^pow))


which leads to 57 distinct powers. Sampling at random from this collection at random produces a sum of 21 perfect powers:

 1+4+8+9+16+25+27+32+36+49+64+81+100+121+125+128+144+169+196+243+441

But looking at the 22 smallest numbers in the pool of powers leads to 2019, which is a sure answer. Restricting the terms to powers of 4 leads to the sequence

1⁴+2⁴+3⁴+5⁴+6⁴ = 2019

And starting from the pools of all possible powers in a decomposition of 2019 as the sum of two powers shows this is impossible.

## efficiency and the Fréchet-Darmois-Cramèr-Rao bound

Posted in Books, Kids, Statistics with tags , , , , , , , , , , , on February 4, 2019 by xi'an

Following some entries on X validated, and after grading a mathematical statistics exam involving Cramèr-Rao, or Fréchet-Darmois-Cramèr-Rao to include both French contributors pictured above, I wonder as usual at the relevance of a concept of efficiency outside [and even inside] the restricted case of unbiased estimators. The general (frequentist) version is that the variance of an estimator δ of [any transform of] θ with bias b(θ) is

I(θ)⁻¹ (1+b'(θ))²

while a Bayesian version is the van Trees inequality on the integrated squared error loss

(E(I(θ))+I(π))⁻¹

where I(θ) and I(π) are the Fisher information and the prior entropy, respectively. But this opens a whole can of worms, in my opinion since

• establishing that a given estimator is efficient requires computing both the bias and the variance of that estimator, not an easy task when considering a Bayes estimator or even the James-Stein estimator. I actually do not know if any of the estimators dominating the standard Normal mean estimator has been shown to be efficient (although there exist results for closed form expressions of the James-Stein estimator quadratic risk, including one of mine the Canadian Journal of Statistics published verbatim in 1988). Or is there a result that a Bayes estimator associated with the quadratic loss is by default efficient in either the first or second sense?
• while the initial Fréchet-Darmois-Cramèr-Rao bound is restricted to unbiased estimators (i.e., b(θ)≡0) and unable to produce efficient estimators in all settings but for the natural parameter in the setting of exponential families, moving to the general case means there exists one efficiency notion for every bias function b(θ), which makes the notion quite weak, while not necessarily producing efficient estimators anyway, the major impediment to taking this notion seriously;
• moving from the variance to the squared error loss is not more “natural” than using any [other] convex combination of variance and squared bias, creating a whole new class of optimalities (a grocery of cans of worms!);
• I never got into the van Trees inequality so cannot say much, except that the comparison between various priors is delicate since the integrated risks are against different parameter measures.

## L’enfant de poussière [book review]

Posted in Books, Kids with tags , , , , , , , , , on February 3, 2019 by xi'an

I read this book in French, as this was the language in which it was written and also because I was given a free copy for writing a review! This is a rather unusual book, the first volume of a series called the cycle of Syffe (where Syffe is both the main character and the name of a tribe), well-written by a young author, although the style is at time a wee bit heavy. As for instance in “Les mains sur les hanches, mes yeux balayèrent l’horizon qui semblait s’étaler de la pointe de mes bottes jusqu’au bout du monde.”

The story in itself borrows to some usual memes of the genre, from following a group of young people (very young in this case), forced into dramatic circumstances by the upheaval of their world, here the death of a king leading to a breakup of his kingdom, and meeting unexpected tutors who will turn them into heroes of sort, if they survive the training. The closest books I can think of are (my favourite) Elizabeth Moon’s Deed of Paksenarrion (without the über-religious aspects [so far!]) and Glen Cook’s Black Company, which both follow mercenary companies in a fragmented world at war. A little bit of Mark Lawrence’s Prince of Thorn as well, since in the later a young kid is driving a band of bandits. And not to forget Joe Abercrombie for the rather similar gritty style. (Gritty enough to make me decide after a few chapters that this was definitely not a young adult novel, as I had doubts about it first.)

The book, first of the cycle, thus follows the misadventures of a very young orphan, and I repeat “very young”, because this is an issue with the story, when 8 to 10 years old are shown in situations and with attitudes that do not sound likely. Even for orphans, even in a medieval world with short lifespans and plenty of economic reasons to turn kids into cheap labour. From spy, to stable boy, to child-soldier. Without turning to spoilers, there are also a bucketful of fortune reversals in the book, meaning that the surroundings and circumstances keep changing, sometimes really fast, sometimes quite slowly, as with the years when Syffe acquires fighting skills from an old mercenary from a tribe of free and deadly fighters. The pace is still good enough for the book to be a page-turner that I read in less than a week! And the few battle scenes are realistic in the Abercrombie referential, that is, with everyone scared and unclear why they are there. There is also some magic involved, which is always a risk in the plot, but apart from a lengthy passage on a malevolent Dream with much too real consequences (nothing to do with Tel’aran’rhiod in the Wheel of Time!), the author handles it quite well, maintaining an ambivalence in Syffe about his super-natural experiences, supported by one of his mentors’ freethinker ethics. As for the completeness of the background, i.e., the universe imagined by the author, it often feels too provincial, too local, with the incoming wars between the local lords sounding very much parochial, although the scope gets gradually wider, along with the maturation of Syffe and the darkening of the overall atmosphere. After finishing the book, I read that seven volumes in total are planned in the cycle!

## estimation exam [best of]

Posted in Books, Kids, Statistics with tags , , , , , , , , on January 29, 2019 by xi'an

Yesterday, I received a few copies of our CRC Press Handbook of Mixture Analysis, while grading my mathematical statistics exam 160 copies. Among the few goodies, I noticed the always popular magical equality

E[1/T]=1/E[T]

that must have been used in so many homeworks and exam handouts by now that it should become a folk theorem. More innovative is the argument that E[1/min{X¹,X²,…}] does not exist for iid U(0,θ) because it is the minimum and thus is the only one among the order statistics with the ability to touch zero. Another universal shortcut was the completeness conclusion that when the integral

$\int_0^\theta \varphi(x) x^k \text{d}x$

was zero for all θ’s then φ had to be equal to zero with no further argument (only one student thought to take the derivative). Plus a growing inability in the cohort to differentiate even simple functions… (At least, most students got the bootstrap right, as exemplified by their R code.) And three stars to the student who thought of completely gluing his anonymisation tag, on every one of his five sheets!, making identification indeed impossible, except by elimination of the 159 other names.