## Archive for the Books Category

## Ueli Steck dies on Nuptse [Ueli Steck tödlich verunglückt]

Posted in Books, Mountains, Running with tags Bernese Alps, Eiger, Everest, Himalayas, mountaineering, Nuptse, Switzerland, Ueli Steck on April 30, 2017 by xi'an

**U**eli Steck was a Swiss climber renowned for breaking speed records on the hardest routes of the Alps, including the legendary Eigerwand, and for having been evacuated under death threats from the Everest base camp two years ago. I have been following on Instagram his preparation for another speed attempt at Everest over the past weeks, and it is a huge shock to learn he fell to his death on Nuptse yesterday. Total respect to this immense Extrembergsteiger, who has now joined the sad cenacle of top climbers who did not make it back…

## Bayes, reproducibility and the Quest for Truth

Posted in Books, Statistics, University life with tags Bayesian foundations, frequency properties, frequentist coverage, L'Aquila, Statistical Science, truth on April 27, 2017 by xi'an

**D**on Fraser, Mylène Bédard, and three coauthors have written a paper with the above dramatic title in Statistical Science about the reproducibility of Bayesian inference in the framework of what they call a mathematical prior, connecting with the earlier quick-and-dirty tag Don attributed to Bayesian credible intervals.

“We provide simple (…) counter-examples to general claims that Bayes can offer accuracy for statistical inference. To obtain this accuracy with Bayes, more effort is required compared to recent likelihood methods (…) [and] accuracy beyond first order is routinely not available (…) An alternative is to view default Bayes as an exploratory technique and then ask does it do as it overtly claims? Is it reproducible as understood in contemporary science? (…) No one has answers although speculative claims abound.” (p. 1)

The early stages of the paper question the nature of a prior distribution in terms of objectivity and reproducibility, which strikes me as a return to older debates on the nature of probability, and as a dubious insistence on the reality of a prior when the said reality is customarily and implicitly assumed for the sampling distribution. While we “can certainly ask how [a posterior] quantile relates to the true value of the parameter”, I see no compelling reason why the associated quantile should be endowed with a frequentist coverage meaning, i.e., be more than a normative indication of the deviation from the true value. (Assuming there is such a parameter.) To consider that the credible interval of interest can be “objectively” assessed by simulation experiments evaluating its coverage is thus doomed from the start (since there is no reason for the nominal coverage to hold) and situated on the wrong plane, since it stems from the hypothetical frequentist model for a range of parameter values. Instead, I find simulations from (generating) models useful in a general ABC sense, namely that, by producing realisations from the predictive, one can assess at which degree of roughness the data is compatible with the formal construct. To bind reproducibility to the frequentist framework thus sounds wrong [to me], as it is itself model-based. In other words, I do not find the definition of reproducibility used in the paper to be objective (literally bouncing back from the Gelman and Hennig Read Paper).

At several points in the paper, the legal consequences of using a subjective prior are evoked as legally binding and, implicitly, as dangerous, with the example of the L'Aquila expert trial. I have trouble seeing the relevance of this entry, as an adverse lawyer is just as entitled to attack the expert on her or his sampling model. More fundamentally, I feel quite uneasy about bringing this type of argument into the debate!

## marginal likelihoods from MCMC

Posted in Books, pictures, Statistics, University life with tags ABC, arXiv, Bayesian Methods in Cosmology, curse of dimensionality, evidence, INLA, k-nearest neighbour, marginal likelihood, nested sampling, Planck experiment, San Antonio, satellite on April 26, 2017 by xi'an

**A** new arXiv entry on ways to approximate marginal likelihoods based on MCMC output, by astronomers (apparently), with an application to the 2015 Planck satellite analysis of cosmic microwave background radiation data, which reminded me of our joint work with the cosmologists of the Paris Institut d'Astrophysique ten years ago. In their literature review, the authors miss several surveys on the approximation of those marginals, including our San Antonio chapter on Bayes factor approximations, and mention our ABC survey somewhat inappropriately, since it does not advocate the use of ABC for such a purpose. (They mention as well variational Bayes approximations, INLA, and powered likelihoods, if not nested sampling.)

The proposal of this paper is to identify the marginal *m* [actually denoted *a* there] as the normalising constant of an unnormalised posterior density. To do so, the authors estimate the posterior by a non-parametric approach, namely a k-nearest-neighbour estimate, with the additional twist of producing a sort of Bayesian posterior on the constant *m*. [And the unusual notion of number density, used for the unnormalised posterior.] The Bayesian estimation of *m* relies on a Poisson sampling assumption on the k-nearest-neighbour distribution. (Sort of, since k is actually fixed, not random.)
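For intuition, the identity underlying this kind of estimate can be sketched without the Bayesian layer: since the unnormalised posterior q(θ) equals *m* times the posterior density p(θ), the ratio q(θᵢ)/p̂(θᵢ) at each MCMC draw estimates *m* once p̂ is a k-nearest-neighbour density estimate. The following Python sketch is my own minimal version of that ratio argument, not the authors' Poisson-based construction; the function name and the averaging of log ratios are choices of mine:

```python
import numpy as np
from math import pi, gamma, log

def knn_log_marginal(samples, log_q, k=10):
    """Crude kNN estimate of log m from MCMC output, where q = m * p.

    samples: array of shape (N,) or (N, d) of posterior draws
    log_q:   vectorised log unnormalised posterior, mapping (N, d) to (N,)
    """
    X = np.asarray(samples, float)
    if X.ndim == 1:
        X = X[:, None]
    N, d = X.shape
    # brute-force pairwise Euclidean distances (fine for moderate N)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(D, np.inf)            # exclude the point itself
    r = np.sort(D, axis=1)[:, k - 1]       # distance to k-th nearest neighbour
    Vd = pi ** (d / 2) / gamma(d / 2 + 1)  # volume of the unit d-ball
    # kNN density estimate: p_hat(x) = k / (N * Vd * r^d), on the log scale
    log_p_hat = log(k) - log(N) - log(Vd) - d * np.log(r)
    # q/p = m at every sample point; average the log ratios for stability
    return float(np.mean(log_q(X) - log_p_hat))
```

On a standard Gaussian target with q(θ)=exp(-θ²/2), the estimate should approach log √(2π) ≈ 0.919, up to the usual kNN bias of order 1/k, which already illustrates how the quality of the answer hinges on the non-parametric density estimate.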

If the above sounds confusing and imprecise, it is because I am myself rather mystified by the whole approach and find it difficult to see the point of this alternative. The Bayesian numerics do not seem to have any other purpose than producing a MAP estimate, and using a non-parametric density estimate opens a Pandora's box of difficulties, the most obvious one being the curse of dimension(ality). This reminded me of the paper by Delyon and Portier, commented on this blog, where they achieve super-efficient convergence when using a kernel estimator, but at a considerable cost and with a similar sensitivity to dimension.

## 16 avril 1917

Posted in Books, Kids, pictures with tags Chemin des Dames, Craonne, first World War, France, mutinies of 1917, WW I on April 16, 2017 by xi'an

**T**oday is the centenary of the battle of Le Chemin des Dames (April 16-25, 1917) during WW I, which ended up as a slaughter (271,000 French and 163,000 German casualties) and a complete military disaster. It led to a significant rise in mutinies (pretty much disconnected from the starting Russian revolution) and to British divisions taking over this sector. While there are many other examples of an insane disregard of infantry troops by the war commanders, this place has stuck in the French collective memory. I remember as a kid listening to my neighbour telling me about this place as his worst experience during the war. (He never mentioned the mutinies, which remained somewhat shameful for most of the century.)

## ODOF, not Hodor [statlearn 2017]

Posted in Books, Kids, pictures, Statistics, University life with tags ABC, abcrf, Game of Thrones, George Martin, Kristian Nairn, Lyon, ODOF, random forests, Statlearn 2017 on April 15, 2017 by xi'an

## optimultiplication [a riddle]

Posted in Books, Kids, R, Statistics with tags coding, conditional probability, FiveThirtyEight, mathematical puzzle, R, The Riddler on April 14, 2017 by xi'an

**T**he riddle of this week is about optimising the positioning of the four digits of a multiplication of two two-digit numbers, and it is open to a coding resolution:

Four digits are drawn without replacement from {0,1,…,9}, one at a time. What is the optimal strategy to position those four digits, two digits per row, as they are drawn, toward minimising the average product?

Although the problem can be solved algebraically by computing **E**[X₄|x₁,…] and **E**[X₃X₄|x₁,…], I wrote three R codes to “optimise” the location of the first three digits: the first digit ends up as a unit if it is 5 or more, and as a multiple of ten otherwise, on the first row. For the second draw, the rule is slightly more variable: with this R code,

```r
second <- function(i, j, N = 1e5){
  # simulate the two remaining draws to approximate E[X3*X4|x1,x2]
  drew = matrix(0, N, 2)
  for (t in 1:N)
    drew[t, ] = sample((0:9)[-c(i + 1, j + 1)], 2)
  conmean = (45 - i - j) / 8            # conditional mean of one remaining digit
  conprod = mean(drew[, 1] * drew[, 2]) # Monte Carlo estimate of E[X3*X4|x1,x2]
  if (i < 5){ # first digit i in front (tens) position of the first row
    pos = c((110 * i + 11 * j) * conmean,
            100 * i * j + 10 * (i + j) * conmean + conprod,
            (100 * i + j) * conmean + 10 * i * j + 10 * conprod)
  } else {    # first digit i in unit position of the first row
    pos = c((110 * j + 11 * i) * conmean,
            10 * i * j + (100 * j + i) * conmean + 10 * conprod,
            10 * (i + j) * conmean + i * j + 100 * conprod)
  }
  return(order(pos)[1])
}
```

the resulting digit again ends up as a unit if it is 5 or more (except when x₁=7,8,9, where the threshold drops to 4), and as a multiple of ten otherwise, but on the second row. Except when x₁=0 and x₂=1,2,3,4, in which case both digits end up on the first row together, 0 obviously in front.

For the third and last open choice, there is only one remaining random draw, which means that the decision only depends on x₁, x₂, x₃ and on **E**[X₄|x₁,x₂,x₃]=(45-x₁-x₂-x₃)/7. The value of attaching x₃ to x₂ or to x₁ then varies monotonically in x₃, depending on whether x₁>x₂ or x₁<x₂:

```r
fourth = function(i, j, k){
  # i, j, k are the first three digits; comean = E[X4|x1,x2,x3]
  comean = (45 - i - j - k) / 7
  if ((i < 1) & (j < 5))           pos = c(10 * comean + k, comean + 10 * k)
  if ((i < 5) & (j > 4))           pos = c(100 * i * comean + k * j, j * comean + 100 * i * k)
  if ((i > 0) & (i < 5) & (j < 5)) pos = c(i * comean + k * j, j * comean + i * k)
  if ((i < 7) & (i > 4) & (j < 5)) pos = c(i * comean + 100 * k * j, j * comean + 100 * i * k)
  if ((i < 7) & (i > 4) & (j > 4)) pos = c(i * comean + k * j, j * comean + i * k)
  if ((i > 6) & (j < 4))           pos = c(i * comean + 100 * k * j, j * comean + 100 * i * k)
  if ((i > 6) & (j > 3))           pos = c(i * comean + k * j, j * comean + i * k)
  return(order(pos)[1])
}
```

Running this R code for all combinations of x₁ and x₂ shows that, except for the cases x₁≥5 and x₂=0, for which x₃ invariably remains in front of x₁, each position is reached for some values of x₃.
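As a cross-check, the sequential problem is small enough for exact backward induction: at each draw, place the new digit in the empty slot minimising the exact conditional expectation of the product, averaging over the uniformly distributed unused digits. The Python sketch below is my own brute-force verification, not part of the original solution; the slot layout (tens₁, units₁, tens₂, units₂) and the function names are assumptions of mine.

```python
from functools import lru_cache

DIGITS = tuple(range(10))

@lru_cache(maxsize=None)
def value(slots):
    """Expected final product under optimal play from a partial assignment.

    slots = (a, b, c, d) with None for empty positions; the two rows of
    the multiplication are 10*a + b and 10*c + d.
    """
    if None not in slots:
        a, b, c, d = slots
        return (10 * a + b) * (10 * c + d)
    remaining = [x for x in DIGITS if x not in slots]
    total = 0.0
    for x in remaining:  # the next digit is uniform over the unused ones
        total += min(value(slots[:p] + (x,) + slots[p + 1:])
                     for p in range(4) if slots[p] is None)
    return total / len(remaining)

def best_first_position(x):
    """Index (0-3) of the optimal slot for the first drawn digit x."""
    empty = (None, None, None, None)
    options = {p: value(empty[:p] + (x,) + empty[p + 1:]) for p in range(4)}
    return min(options, key=options.get)
```

Running `best_first_position` over 0-9 should recover the first rule stated above, a unit position for digits 5 and above and a tens position below, up to the symmetry between the two rows.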