## Judith Rousseau gets Bernoulli Society Ethel Newbold Prize

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , , on July 31, 2015 by xi'an

As announced at the 60th ISI World Meeting in Rio de Janeiro, my friend, co-author, and former PhD student Judith Rousseau got the first Ethel Newbold Prize! Congrats, Judith! And well-deserved! The prize is awarded by the Bernoulli Society on the following basis

The Ethel Newbold Prize is to be awarded biannually to an outstanding statistical scientist for a body of work that represents excellence in research in mathematical statistics, and/or excellence in research that links developments in a substantive field to new advances in statistics. In any year in which the award is due, the prize will not be awarded unless the set of all nominations includes candidates from both genders.

and is funded by Wiley. I support very much this (inclusive) approach of “recognizing the importance of women in statistics”, without creating a prize restricted to women nominees (and hence exclusive).  Thanks to the members of the Program Committee of the Bernoulli Society for setting that prize and to Nancy Reid in particular.

Ethel Newbold was a British statistician who worked during WWI in the Ministry of Munitions and then became a member of the newly created Medical Research Council, working on medical and industrial studies. She was the first woman to receive the Guy Medal in Silver in 1928. Just to stress that much remains to be done towards gender balance, the second and last woman to get a Guy Medal in Silver is Sylvia Richardson, in 2009… (In addition, Valerie Isham, Nicky Best, and Fiona Steele got a Guy Medal in Bronze, out of the 71 so far awarded, while no woman ever got a Guy Medal in Gold.) Funny occurrences of coincidence: Ethel May Newbold was educated at Tunbridge Wells, the place where Bayes was a minister, while Sylvia is now head of the Medical Research Council biostatistics unit in Cambridge.

Posted in Books, pictures, Statistics, University life with tags , , , , , , on July 30, 2015 by xi'an

Ingmar Schuster, who visited Paris-Dauphine last Spring (and is soon to return here as a postdoc funded by Fondation des Sciences Mathématiques de Paris) has arXived last week a paper on gradient importance sampling. In this paper, he builds a sequential importance sampling (or population Monte Carlo) algorithm that exploits the additional information contained in the gradient of the target. The proposal or importance function being essentially the MALA move as its proposal, mixed across the elements of the previous population. When compared with our original PMC mixture of random walk proposals found in e.g. this paper, each term in the mixture thus involves an extra gradient, with a scale factor that decreases to zero as 1/t√t. Ingmar compares his proposal with an adaptive Metropolis, an adaptive MALTa and an HM algorithms, for two mixture distributions and the banana target of Haario et al. (1999) we also used in our paper. As well as a logistic regression. In each case, he finds both a smaller squared error and a smaller bias for the same computing time (evaluated as the number of likelihood evaluations). While we discussed this scheme when he visited, I remain intrigued as to why it works so well when compared with the other solutions. One possible explanation is that the use of the gradient drift is more efficient on a population of particles than on a single Markov chain, provided the population covers all modes of importance on the target surface: the “fatal” attraction of the local model is then much less of an issue…

## Bayesian model averaging in astrophysics

Posted in Books, Statistics, University life with tags , , , , , , , , , , on July 29, 2015 by xi'an

[A 2013 post that somewhat got lost in a pile of postponed entries and referee’s reports…]

In this review paper, now published in Statistical Analysis and Data Mining 6, 3 (2013), David Parkinson and Andrew R. Liddle go over the (Bayesian) model selection and model averaging perspectives. Their argument in favour of model averaging is that model selection via Bayes factors may simply be too inconclusive to favour one model and only one model. While this is a correct perspective, this is about it for the theoretical background provided therein. The authors then move to the computational aspects and the first difficulty is their approximation (6) to the evidence

$P(D|M) = E \approx \frac{1}{n} \sum_{i=1}^n L(\theta_i)Pr(\theta_i)\, ,$

where they average the likelihood x prior terms over simulations from the posterior, which does not provide a valid (either unbiased or converging) approximation. They surprisingly fail to account for the huge statistical literature on evidence and Bayes factor approximation, incl. Chen, Shao and Ibrahim (2000). Which covers earlier developments like bridge sampling (Gelman and Meng, 1998).

As often the case in astrophysics, at least since 2007, the authors’ description of nested sampling drifts away from perceiving it as a regular Monte Carlo technique, with the same convergence speed n1/2 as other Monte Carlo techniques and the same dependence on dimension. It is certainly not the only simulation method where the produced “samples, as well as contributing to the evidence integral, can also be used as posterior samples.” The authors then move to “population Monte Carlo [which] is an adaptive form of importance sampling designed to give a good estimate of the evidence”, a particularly restrictive description of a generic adaptive importance sampling method (Cappé et al., 2004). The approximation of the evidence (9) based on PMC also seems invalid:

$E \approx \frac{1}{n} \sum_{i=1}^n \dfrac{L(\theta_i)}{q(\theta_i)}\, ,$

is missing the prior in the numerator. (The switch from θ in Section 3.1 to X in Section 3.4 is  confusing.) Further, the sentence “PMC gives an unbiased estimator of the evidence in a very small number of such iterations” is misleading in that PMC is unbiased at each iteration. Reversible jump is not described at all (the supposedly higher efficiency of this algorithm is far from guaranteed when facing a small number of models, which is the case here, since the moves between models are governed by a random walk and the acceptance probabilities can be quite low).

The second quite unrelated part of the paper covers published applications in astrophysics. Unrelated because the three different methods exposed in the first part are not compared on the same dataset. Model averaging is obviously based on a computational device that explores the posteriors of the different models under comparison (or, rather, averaging), however no recommendation is found in the paper as to efficiently implement the averaging or anything of the kind. In conclusion, I thus find this review somehow anticlimactic.

## Egyptian fractions [Le Monde puzzle #922]

Posted in Books, Kids, R with tags , , , , , , , on July 28, 2015 by xi'an

For its summer edition, Le Monde mathematical puzzle switched to a lighter version with immediate solution. This #922 considers Egyptian fractions which only have distinct denominators (meaning the numerator is always 1) and can be summed. This means 3/4 is represented as ½+¼. Each denominator only appears once. As I discovered when looking on line, a lot of people are fascinated with this representation and have devised different algorithms to achieve decompositions with various properties. Including Fibonacci who devised a specific algorithm called the greedy algorithm in 1202 in the Liber Abaci. In the current Le Monde edition, the questions were somewhat modest and dealt with the smallest decompositions of 2/5, 5/12, and 50/77 under some additional constraint.

Since the issue was covered in so many places, I just spent one hour or so constructing a basic solution à la Fibonacci and then tried to improve it against a length criterion. Here are my R codes (using the numbers library):

osiris=function(a,b){
#can the fraction a/b be simplified
diva=primeFactors(a)
divb=primeFactors(b)
divc=c(unique(diva),unique(divb))
while (sum(duplicated(divc))>0){
n=divc[duplicated(divc)]
for (i in n){a=div(a,i);b=div(b,i)}
diva=primeFactors(a)
divb=primeFactors(b)
divc=c(unique(diva),unique(divb))
}
return(list(a=a,b=b))
}


presumably superfluous for simplifying fractions

horus=function(a,b,teth=NULL){
#simplification
anubis=osiris(a,b)
a=anubis$a;b=anubis$b
#decomposition by removing 1/b
isis=NULL
if (!(b %in% teth)){
a=a-1
isis=c(isis,b)
teth=c(teth,b)}
if (a>0){
#simplification
anubis=osiris(a,b)
bet=b;a=anubis$a;b=anubis$b
if (bet>b){ isis=c(isis,horus(a,b,teth))}else{
# find largest integer
k=ceiling(b/a)
while (k %in% teth) k=k+1
a=k*a-b
b=k*b
isis=c(isis,k,horus(a,b,teth=c(teth,k)))
}}
return(isis)}


which produces a Fibonacci solution (with the additional inclusion of the original denominator) and

nut=20
seth=function(a,b,isis=NULL){
#simplification
anubis=osiris(a,b)
a=anubis$a;b=anubis$b
if ((a==1)&(!(b %in% isis))){isis=c(isis,b)}else{
ra=hapy=ceiling(b/a)
if (max(a,b)<1e5) hapy=horus(a,b,teth=isis)
k=unique(c(hapy,ceiling(ra/runif(nut,min=.1,max=1))))
propa=propb=propc=propd=rep(NaN,le=length((k %in% isis)))
bastet=1
for (i in k[!(k %in% isis)]){
propa[bastet]=i*a-b
propb[bastet]=i*b
propc[bastet]=i
propd[bastet]=length(horus(i*a-b,i*b,teth=c(isis,i)))
bastet=bastet+1
}
k=propc[order(propd)[1]]
isis=seth(k*a-b,k*b,isis=c(isis,k))
}
return(isis)}


which compares solutions against their lengths. When calling those functions for the three fractions above the solutions are

> seth(2,5)
[1] 15 3
> seth(5,12)
[1] 12  3
> seth(50,77)
[1]   2 154   7


with no pretension whatsoever to return anything optimal (and with some like crashes when the magnitude of the entries grows, try for instance 5/121). For this latest counter-example, the alternative horus works quite superbly:

> horus(5,121)
[1] 121 31 3751 1876 7036876


## inflation, evidence and falsifiability

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , on July 27, 2015 by xi'an

[Ewan Cameron pointed this paper to me and blogged about his impressions a few weeks ago. And then Peter Coles wrote a (properly) critical blog entry yesterday. Here are my quick impressions, as an add-on.]

“As the cosmological data continues to improve with its inevitable twists, it has become evident that whatever the observations turn out to be they will be lauded as \proof of inflation”.” G. Gubitosi et al.

In an arXive with the above title, Gubitosi et al. embark upon a generic and critical [and astrostatistical] evaluation of Bayesian evidence and the Bayesian paradigm. Perfect topic and material for another blog post!

“Part of the problem stems from the widespread use of the concept of Bayesian evidence and the Bayes factor (…) The limitations of the existing formalism emerge, however, as soon as we insist on falsifiability as a pre-requisite for a scientific theory (….) the concept is more suited to playing the lottery than to enforcing falsifiability: winning is more important than being predictive.” G. Gubitosi et al.

It is somehow quite hard not to quote most of the paper, because prose such as the above abounds. Now, compared with standards, the authors introduce an higher level than models, called paradigms, as collections of models. (I wonder what is the next level, monads? universes? paradises?) Each paradigm is associated with a marginal likelihood, obtained by integrating over models and model parameters. Which is also the evidence of or for the paradigm. And then, assuming a prior on the paradigms, one can compute the posterior over the paradigms… What is the novelty, then, that “forces” falsifiability upon Bayesian testing (or the reverse)?!

“However, science is not about playing the lottery and winning, but falsifiability instead, that is, about winning given that you have bore the full brunt of potential loss, by taking full chances of not winning a priori. This is not well incorporated into the Bayesian evidence because the framework is designed for other ends, those of model selection rather than paradigm evaluation.” G. Gubitosi et al.

The paper starts by a criticism of the Bayes factor in the point null test of a Gaussian mean, as overly penalising the null against the alternative being only a power law. Not much new there, it is well known that the Bayes factor does not converge at the same speed under the null and under the alternative… The first proposal of those authors is to consider the distribution of the marginal likelihood of the null model under the [or a] prior predictive encompassing both hypotheses or only the alternative [there is a lack of precision at this stage of the paper], in order to calibrate the observed value against the expected. What is the connection with falsifiability? The notion that, under the prior predictive, most of the mass is on very low values of the evidence, leading to concluding against the null. If replacing the null with the alternative marginal likelihood, its mass then becomes concentrated on the largest values of the evidence, which is translated as an unfalsifiable theory. In simpler terms, it means you can never prove a mean θ is different from zero. Not a tremendously item of news, all things considered…

“…we can measure the predictivity of a model (or paradigm) by examining the distribution of the Bayesian evidence assuming uniformly distributed data.”

The alternative is to define a tail probability for the evidence, i.e. the probability to be below an arbitrarily set bound. What remains unclear to me in this notion is the definition of a prior on the data, as it seems to be model dependent, hence prohibits comparison between models since this would involve incompatible priors. The paper goes further into that direction by penalising models according to their predictability, P, as exp{-(1-P²)/P²}. And paradigms as well.

“(…) theoretical matters may end up being far more relevant than any probabilistic issues, of whatever nature. The fact that inflation is not an unavoidable part of any quantum gravity framework may prove to be its greatest undoing.”

Establishing a principled way to weight models would certainly be a major step in the validation of posterior probabilities as a quantitative tool for Bayesian inference, as hinted at in my 1993 paper on the Lindley-Jeffreys paradox, but I do not see such a principle emerging from the paper. Not only because of the arbitrariness in constructing both the predictivity and the associated prior weight, but also because of the impossibility to define a joint predictive, that is a predictive across models, without including the weights of those models. This makes the prior probabilities appearing on “both sides” of the defining equation… (And I will not mention the issues of constructing a prior distribution of a Bayes factor that are related to Aitkin‘s integrated likelihood. And won’t obviously try to enter the cosmological debate about inflation.)

## Mýrin aka Jar City [book review]

Posted in Books, Mountains, pictures, Travel with tags , , , , , , , , , , on July 26, 2015 by xi'an

Mýrin (“The Bog”) is the third novel in the Inspector Erlendur series written by Arnaldur Indridason. It contains the major themes of the series, from the fascination for unexplained disappearances in Iceland to Elendur’s inability to deal with his family responsibilities, to domestic violence, to exhumations. The death that starts the novel takes place in the district of Norðurmýri, “the northern marsh”, not far from the iconic Hallgrimskirkja, and not far either from DeCODE, the genetic company I visited last June and which stores genetic information about close to a million Icelanders, the Íslendingabók. And which plays an important and nefarious role in the current novel. While this episode takes place mostly between Reykjavik and Keflavik, hence does not offer any foray into Icelandic landscapes, it reflects quite vividly on the cultural pressure still present in the recent years to keep rapes and sexual violence a private matter, hidden from an indifferent or worse police force. It also shows how the police misses (in 2001) the important genetic clues for being yet unaware of the immense and frightening possibilities of handling the genetic code of an entire population. (The English and French titles refer to the unauthorised private collections of body part accumulated [in jars] by doctors after autopsies, families being unaware of the fact.) As usual, solving the case is the least important part of the story, which tells about broken lifes and survivors against all odds.

## Le Monde puzzle [#920]

Posted in Books, Kids, R, Statistics, University life with tags , on July 23, 2015 by xi'an

A puzzling Le Monde mathematical puzzle (or blame the heat wave):

A pocket calculator with ten keys (0,1,…,9) starts with a random digit n between 0 and 9. A number on the screen can then be modified into another number by two rules:
1. pressing k changes the k-th digit v whenever it exists into (v+1)(v+2) where addition is modulo 10;
2. pressing 0k deletes the (k-1)th and (k+1)th digits if they both exist and are identical (otherwise nothing happens.
Which 9-digit numbers can always be produced whatever the initial digit?

I did not find an easy entry to this puzzle, in particular because it did not state what to do once 9 digits had been reached: would the extra digits disappear? But then, those to the left or to the right? The description also fails to explain how to handle n=000 000 004 versus n=4.

Instead, I tried to look at the numbers with less than 7 digits that could appear, using some extra rules of my own like preventing numbers with more than 9 digits. Rules which resulted in a sure stopping rule when applying both rules above at random:

leplein=rep(0,1e6)
for (v in 1:1e6){
x=as.vector(sample(1:9,1))
for (t in 1:1e5){
k=length(x) #as sequence of digits
if (k<3){

i=sample(rep(1:k,2),1)
x[i]=(x[i]+1)%%10
y=c(x[1:i],(x[i]+1)%%10)
if (i<k){ x=c(y,x[(i+1):k])}else{ x=y}
}else{

prop1=prop2=NULL
difs=(2:(k-1))[abs(x[-(1:2)]-x[-((k-1):k)])==0]
if (length(difs)>0) prop1=sample(rep(difs,2),1)
if (k<9) prop2=sample(rep(1:k,2),1)

if (length(c(prop1,prop2))>1){
if (runif(1)<.5){

x[prop2]=(x[prop2]+1)%%10
y=c(x[1:prop2],(x[prop2]+1)%%10)
if (prop2<k){ x=c(y,x[(prop2+1):k])}else{ x=y}
}else{
x=x[-c(prop1-1,prop1+1)]}
while ((length(x)>1)&(x[1]==0)) x=x[-1]}

if (length(c(prop1,prop2))==1){
if (is.null(prop2)){ x=x[-c(prop1-1,prop1+1)]
}else{
x[prop2]=(x[prop2]+1)%%10
y=c(x[1:prop2],(x[prop2]+1)%%10)
if (prop2<k){ x=c(y,x[(prop2+1):k])
}else{ x=y}
x=c(x[1:(prop2-1)],
(x[prop2]+1)%%10,
(x[prop2]+2)%%10,x[(prop2+1):k])}
while ((length(x)>1)&(x[1]==0)) x=x[-1]}

if (length(c(prop1,prop2))==0) break()
}

k=length(x)
if (k<7) leplein[sum(x*10^((k-1):0))]=
leplein[sum(x*10^((k-1):0))]+1
}}


code that fills an occupancy table for the numbers less than a million over 10⁶ iterations. The solution as shown below (with the number of zero entries over each column) is rather surprising in that it shows an occupancy that is quite regular over a grid. While it does not answer the original question…