Archive for the Books Category

Alex Honnold free solos Freerider (5.13a/7c+)

Posted in Books, Kids, Mountains, pictures, Travel on June 11, 2017 by xi'an

The Seven Pillars of Statistical Wisdom [book review]

Posted in Books, pictures, Statistics, University life on June 10, 2017 by xi'an

I remember quite well attending the ASA President’s Invited Address of Stephen Stigler at JSM 2014, Boston, on the seven pillars of statistical wisdom. In connection with T.E. Lawrence’s 1926 book. Itself in connection with Proverbs IX:1. Unfortunately wrongly translated as seven pillars rather than seven sages.

As pointed out in the Acknowledgements section, the book came prior to the address by several years. I found it immensely enjoyable, first for putting the field in a (historical and) coherent perspective through those seven pillars, second for exposing new facts and curios about the history of statistics, third because of a literary style one would wish to see more often in scholarly texts and of a most pleasant design (and the list of reasons could go on for quite a while, one being the several references to Jorge Luis Borges!). But the main reason is to highlight the unified nature of Statistics and the reasons why it does not constitute a subfield of either Mathematics or Computer Science. In these days where centrifugal forces threaten to split the field into seven or more disciplines, the message is welcome and urgent.

Here are Stephen’s pillars (some comments being already there in the post I wrote after the address):

  1. aggregation, which leads to gaining information by throwing away information, aka the sufficiency principle. One (of several) remarkable stories in this section is the attempt by Francis Galton, never lacking in imagination, to visualise the average man or woman by superimposing the pictures of several people of a given group. In 1870!
  2. information accumulating at the √n rate, aka precision of statistical estimates, aka CLT confidence [quoting de Moivre at the core of this discovery]. Another nice story is Newton’s wardenship of the English Mint, with musings about [his] potentially exploiting this concentration to cheat the Mint and remain undetected!
  3. likelihood as the right calibration of the amount of information brought by a dataset [including Bayes’ essay as an answer to Hume and Laplace’s tests] and by Fisher in possibly the most impressive single-handed advance in our field;
  4. intercomparison [i.e. scaling procedures from variability within the data, sample variation], from Student’s [a.k.a., Gosset’s] t-test, better understood and advertised by Fisher than by the author, and eventually leading to the bootstrap;
  5. regression [linked with Darwin’s evolution of species, albeit paradoxically, as Darwin claimed to have faith in nothing but the irrelevant Rule of Three, a challenging consequence of this theory being an unobserved increase in trait variability across generations] exposed by Darwin’s cousin Galton [with a detailed and exhilarating entry on the quincunx!] as conditional expectation, hence as a true Bayesian tool, the Bayesian approach being more specifically addressed in (on?) this pillar;
  6. design of experiments [re-enters Fisher, with his revolutionary vision of changing all factors in Latin square designs], with a fascinating insert on the 18th Century French Loterie, which by 1811, i.e., during the Napoleonic wars, provided 4% of the national budget!;
  7. residuals which again relate to Darwin, Laplace, but also Yule’s first multiple regression (in 1899), Fisher’s introduction of parametric models, and Pearson’s χ² test. Plus Nightingale’s diagrams that never cease to impress me.
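As a small illustration of the second pillar, here is a minimal sketch of the √n rate, assuming standard Normal observations (the helper `sd_of_mean`, the seed, and the sample sizes are my own choices for illustration, not taken from the book): quadrupling the sample size should only halve the spread of the sample mean.

```r
set.seed(42)
# empirical standard deviation of the mean of n standard Normal draws,
# estimated over `reps` independent replications (hypothetical helper)
sd_of_mean <- function(n, reps = 5000) sd(replicate(reps, mean(rnorm(n))))
# four times more data, only twice the precision
ratio <- sd_of_mean(100) / sd_of_mean(400)
ratio   # close to sqrt(400 / 100) = 2, up to Monte Carlo error
```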

The conclusion of the book revisits the seven pillars to ascertain the nature of and potential need for an eighth pillar. It is somewhat pessimistic, at least my reading of it was, as it cannot (and presumably does not want to) produce any direction about this new pillar and hence about the capacity of the field of statistics to handle incoming challenges and competition. With some amount of exaggeration (!) I do hope the analogy of the seven pillars that raises in me the image of the beautiful ruins of a Greek temple atop a Sicilian hill, in the setting sun, with little known about its original purpose, remains a mere analogy and does not extend to predict the future of the field! By its very nature, this wonderful book is about foundations of Statistics and therefore much more set in the past and on past advances than on the present, but those foundations need to move, grow, and be nurtured if the field is not to become a field of ruins, a methodology of the past!

Le Monde puzzle [#1011]

Posted in Books, Kids on June 9, 2017 by xi'an

A combinatorial Le Monde mathematical puzzle (with two independent parts):

Given the following grid,

  1. What is the longest path from A to B that does not use the same edge twice?
  2. What is the probability that two minimal length paths from A to B [of length 13] share the same middle [7th] edge?

The first question can be solved by brute force simulation. I ran a very simple-minded self-avoiding random walk starting from A and restarting each time a dead-end was reached. (The details are not of capital interest: I entered the above grid as an 8×7 matrix for the nodes and associated with each node a four-bit number indicating which edges had been visited, picking at random among those not yet visited.) The longest path I found over 10⁷ simulations is 51 edges long, confirmed by an additional exploration of the paths on both square grids, separately. The associated path is as follows, the irregular shape being obtained by jittering the node locations towards a better visualisation of the order of the visits.

The second puzzle can be solved directly by looking at the proportion of pairs of minimal length paths sharing the same seventh edge, which is ¼ (as checked by a further simulation of minimal length random walks).
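The edge-self-avoiding random walk used for the first question can be sketched as follows. This is a minimal illustration and not the code I ran: the graph is passed as a generic edge list rather than as an 8×7 matrix with four-bit codes, the function name `longest_trail` is hypothetical, and the toy square below is not the grid of the puzzle (which is not reproduced here).

```r
# Random walk that never reuses an edge, restarted at each dead end.
# `edges` is a two-column matrix of node labels (assumed representation).
longest_trail <- function(edges, A, B, nsim = 1e4) {
  best <- 0
  for (s in seq_len(nsim)) {
    used <- rep(FALSE, nrow(edges))
    node <- A
    len <- 0
    repeat {
      # unused edges touching the current node
      free <- which(!used & (edges[, 1] == node | edges[, 2] == node))
      if (length(free) == 0) break             # dead end: restart from A
      e <- free[sample.int(length(free), 1)]   # pick one unused edge at random
      used[e] <- TRUE
      node <- if (edges[e, 1] == node) edges[e, 2] else edges[e, 1]
      len <- len + 1
      if (node == B) best <- max(best, len)    # only record walks ending at B
    }
  }
  best
}

# toy example: a single square with corners 1,2,3,4 and A, B on opposite corners
square <- rbind(c(1, 2), c(1, 3), c(2, 4), c(3, 4))
longest_trail(square, A = 1, B = 4, nsim = 200)   # 2 on this toy square
```

On a real grid the returned value is only a lower bound on the longest path, since the random exploration may miss it.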

fast ε-free ABC

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life on June 8, 2017 by xi'an

Last Fall, George Papamakarios and Iain Murray from Edinburgh arXived an ABC paper on fast ε-free inference on simulation models with Bayesian conditional density estimation, a paper that I missed. The idea there is to approximate the posterior density by maximising the likelihood associated with a parameterised family of distributions on θ, conditional on the associated x, the data being the ABC reference table. The family chosen there is a mixture of K Gaussian components, whose parameters are then estimated by a (Bayesian) neural network using x as input and θ as output. The parameter values are simulated from an adaptive proposal that aims at approximating the posterior better and better. As in population Monte Carlo, actually. Except for the neural network part, for which I fail to understand why it brings a significant improvement when compared with EM solutions. The overall difficulty with this approach is that I do not see a way out of the curse of dimensionality: when the dimension of θ increases, the approximation to the posterior distribution of θ does deteriorate, even in the best of cases, as with any other non-parametric resolution. It would have been of (further) interest to see a comparison with a most rudimentary approach, namely the one we proposed based on empirical likelihoods.
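To fix ideas, here is a deliberately crude sketch of the conditional density estimation step, under assumptions of my own and not the method of the paper: a toy Normal model, a single Gaussian component (K=1) whose mean is linear in x, and a least squares fit by `lm` in place of the mixture density network. The reference table plays the role of the data and no tolerance ε appears anywhere.

```r
set.seed(1)
n <- 1e4
theta <- rnorm(n)              # prior draws, assuming a N(0,1) prior
x <- rnorm(n, mean = theta)    # one simulated observation per draw: the reference table
# Gaussian conditional density q(theta | x) with linear mean, fitted on the table
fit <- lm(theta ~ x)
x_obs <- 1.3                   # hypothetical observed value
post_mean <- unname(predict(fit, newdata = data.frame(x = x_obs)))
post_sd <- summary(fit)$sigma  # residual standard error, close to the ML estimate
c(post_mean, post_sd)          # exact posterior here: mean x_obs/2, sd sqrt(1/2)
```

In this toy case the family contains the true posterior, which is why the fit is good; nothing of the sort is guaranteed in realistic settings.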

from least squares to signal processing and particle filtering

Posted in Kids, Statistics, University life, Books on June 6, 2017 by xi'an

Nozer Singpurwalla, Nick Polson, and Refik Soyer have just arXived a remarkable survey on the history of signal processing, from Gauß, Yule, Kolmogorov and Wiener, to Ragazzini, Shannon, Kálmán [who, I was surprised to learn, died in Gainesville last year!], Gibbs sampling, and the particle filters of the 1990’s.

an increase of 18% a day?!

Posted in Books, Statistics on June 3, 2017 by xi'an

A striking figure I saw earlier this week in a newspaper and confirmed by checking on the World Health Organisation (WHO) website today:

…if the association of red meat and colorectal cancer were proven to be causal, data from the same studies suggest that the risk of colorectal cancer could increase by 17% for every 100 gram portion of red meat eaten daily…

The way I interpret this sentence, and the every in it, when I read it is that each time I eat a portion of 100g of red meat, my probability of getting cancer increases by 17%. Actually the previous sentence in the report sounds even more dire:

An analysis of data from 10 studies estimated that every 50 gram portion of processed meat eaten daily increases the risk of colorectal cancer by about 18%.

Which means that eating a sausage a day would multiply the probability by about… 10²³! This cannot be: turning statistics into “plain” language can be so confusing! Or else pardon my French!!!
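The arithmetic behind this (mis)reading is plain compounding. Here is a minimal sketch, assuming each daily portion multiplies the risk by 1.18; the time horizon is an assumption, and the variable names are mine.

```r
daily <- 1.18                          # literal reading: +18% with every daily portion
# log10 of the factor by which the risk gets multiplied after `days` portions
log10_factor <- function(days) days * log10(daily)
days_to_1e23 <- 23 / log10(daily)      # horizon producing a factor of 10^23
days_to_1e23                           # about 320 days
log10_factor(365)                      # about 26.2, i.e. a factor near 10^26 over a full year
```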

Le Monde puzzle [#1010]

Posted in Books, Kids on June 2, 2017 by xi'an

An arithmetic Le Monde mathematical puzzle (or two independent ones, again!):

  1. Take the integers from 1 to 19, pick two of them with identical parity at random and replace the pair with their average. Repeat 17 times to obtain a single integer. What are the values between 1 and 19 that cannot be achieved?
  2. Take the integers from 1 to 19, pick four of them at random so that the average is an integer and replace the quadruplet with their average. Repeat 5 times to obtain a single integer. What are the values between 1 and 19 that can be achieved?

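The first question seems pretty open to brute force simulation. Here is the R code I wrote:

M=19  # the puzzle uses the integers from 1 to M=19
numbz=1:M
for (t in 2:M){
  # reshuffle until the first two entries share the same parity (at most 100 tries)
  numbz=sample(numbz);count=0
  while((count<100)&&(sum(numbz[1:2])%%2>0)){
    numbz=sample(numbz);count=count+1}
  if (count==100) break  # no same-parity pair found: stop this attempt
  # replace the pair with its (integer) average
  numbz[1]=as.integer(mean(numbz[1:2]))
  numbz=numbz[-2]}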

with the stopping rule resulting from the fact that the remaining two integers may sometimes be of opposite parity (a possibility omitted in the wording of the puzzle, along with a mistake in the number of repetitions). However, the outcome of this random exploration misses the extreme possible values. For instance, 10⁶ attempts produce the range

4 5 6 7 8 9 10 11 12 13 14 15 16 17

while the extremes should be 2 and 18 according to this scratch computation:

which appears to have too low a probability of occurring to be part of the 10⁶ instances. Running the code for a mere (!) 10⁷ iterations managed to reach 3 as well. (Interestingly, the above sequence uses 2 the most and 19 the least, but weights 19 the most and 2 the least!)
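That 2 can indeed be reached is easy to check without simulation. The sketch below follows one particular merging order (not necessarily the one of the scratch computation): always average the current largest value with the largest remaining value of the same parity. The function name `greedy_low` is hypothetical.

```r
# Greedy merge: average the largest value with the largest other value
# of identical parity, until a single integer remains.
greedy_low <- function(M = 19) {
  numbz <- 1:M
  while (length(numbz) > 1) {
    numbz <- sort(numbz, decreasing = TRUE)
    top <- numbz[1]
    # position in numbz of the first other entry with the parity of `top`
    partner <- which(numbz[-1] %% 2 == top %% 2)[1] + 1
    if (is.na(partner)) stop("no same-parity partner left")
    numbz <- c((top + numbz[partner]) / 2, numbz[-c(1, partner)])
  }
  numbz
}
greedy_low()          # 2, after 18 merges
(19 + 1) - greedy_low()   # 18, by the symmetry x -> 20 - x, which preserves parity
```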

The second puzzle is also open to random exploration with a very similar R code:

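utcome=NULL
for (z in 1:1e6){
  numbz=1:19
  for (t in 1:6){
    # reshuffle until the first four entries sum to a multiple of 4 (at most 100 tries)
    numbz=sample(numbz);count=0
    while ((sum(numbz[1:4])%%4>0)&&(count<100)){
      numbz=sample(numbz);count=count+1}
    if (count==100) break  # no admissible quadruplet found: abandon this run
    # replace the quadruplet with its (integer) average
    numbz[1]=as.integer(mean(numbz[1:4]))
    numbz=numbz[-(2:4)]}
  # only keep the final value of completed runs
  if (count<100) utcome=c(utcome,numbz)}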

returning the values

4 7 10 13 16
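These five values agree with a simple invariant, added here as a check rather than as part of the original solution: replacing four integers of average m with m lowers the total by 3m, so the total modulo 3 never changes. Since 1+…+19=190 leaves a remainder of 1, so must the final integer, which is furthermore a weighted average of all nineteen integers with positive weights and hence lies strictly between 1 and 19.

```r
total <- sum(1:19)                  # 190, invariant modulo 3 under the replacement
inner <- 2:18                       # strictly between the extremes 1 and 19
candidates <- inner[inner %% 3 == total %% 3]
candidates                          # 4 7 10 13 16
```

The invariant only gives a necessary condition; the simulation above is what shows that all five candidates are actually reached.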