Archive for list

the best books of the NYT readers

Posted in Books, Kids, Travel on February 9, 2022 by xi'an

Two years after Le Monde reported on the list of the 101 favourite novels of [some of] its readers, which I found most fascinating as a sociological entry on said readers, rather than as a meaningful ordering of literary monuments (!), even though it led me to read Damasio’s La Horde du Contrevent, as well as Jean-Philippe Jaworski’s Gagner la Guerre [To the victors go the spoils], The New York Times did something similar to celebrate the Book Review’s 125th anniversary. If on a lesser scale, as it only produces

        1. To Kill a Mockingbird by Harper Lee
        2. The Fellowship of the Ring by J.R.R. Tolkien
        3. 1984 by George Orwell
        4. One Hundred Years of Solitude by Gabriel García Márquez
        5. Beloved by Toni Morrison

as the top five books of the last 125 years, Lee’s, Tolkien’s, and García Márquez’s appearing in both lists, if with a different ranking. (The nomination rules were not exactly the same, though, with only novels for Le Monde and only “recent” books and only one per author for the New York Times.) Here is a longer list of the 25 top contenders, from which NYT readers voted [an opportunity I missed!]:

some of which I had never heard of. And not including a single Faulkner… Except for One Hundred Years of Solitude, first published as Cien años de soledad, all novels there were originally written in English. Sadly, the number one book, To Kill a Mockingbird, is also one of the most censored by school boards in the USA! (And so are books by Toni Morrison.)

the most important statistical ideas of the past 50 years

Posted in Books, pictures, Statistics, Travel on January 10, 2020 by xi'an

[Picture: a grand building entrance near the train station in Helsinki]

Aki and Andrew are celebrating the New Year in advance by composing a list of the most important statistical ideas occurring (roughly) since they were born (or since Fisher died)! Like

  • substitution of computing for mathematical analysis (incl. bootstrap)
  • fitting a model with a large number of parameters, using some regularization procedure to get stable estimates and good predictions (e.g., Gaussian processes, neural networks, generative adversarial networks, variational autoencoders)
  • multilevel or hierarchical modelling (incl. Bayesian inference)
  • advances in statistical algorithms for efficient computing (with a long list of innovations since 1970, including ABC!), pointing out that a large fraction was of the divide & conquer flavour (in connection with large, if not necessarily Big, data)
  • statistical decision analysis (e.g., Bayesian optimization and reinforcement learning, getting beyond classical experimental design)
  • robustness (under partial specification, misspecification or in the M-open world)
  • EDA à la Tukey and statistical graphics (and R!)
  • causal inference (via counterfactuals)

Now, had I been painfully arm-bent into coming up with such a list, it would certainly have been shorter, for lack of opinion about some of these directions (even though the Biometrika deputy editorship has certainly helped in reassessing the popularity of different branches!), and I would presumably have been biased towards Bayes as well as more mathematical flavours. Hence objecting to the witty comment that “theoretical statistics is the theory of applied statistics” (p.10) and including Ghosal and van der Vaart (2017) as a major reference. Also bemoaning the lack of long-term structure and theoretical support of a branch of the machine-learning literature.

Maybe more space and analysis could also have been spent on “debates remain regarding appropriate use and interpretation of statistical methods” (p.11), in that a major difficulty with the latest in data science is not so much the method(s) as the data on which they are based, which, in a large fraction of cases, is not representative and is poorly, if at all, corrected for this bias. The “replication crisis” is thus only one (tiny) aspect of the challenge.

Le Monde puzzle [#738]

Posted in R on September 2, 2011 by xi'an

The Friday puzzle in Le Monde this week is about “friendly perfect squares”, namely perfect squares x²>10 and y²>10 with the same number of digits and such that, when shifting all digits of x² by the same value a (modulo 10), one recovers y². For instance, 121 is “friend” with 676 (a=5). Here is my R code:

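xtrct=function(x){
  # extract the decimal digits of x, most significant digit first
  x=as.integer(x)
  k=trunc(log(x,10))   # exponent of the leading power of 10
  digs=NULL
  for (i in 0:k){
    # subtract the digits already found, then divide by 10^(k-i)
    digs[i+1]=trunc((x-sum(digs[1:i]*10^(k:(k-i+1))))/10^(k-i))}
  return(digs)
  }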

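pdfct=(4:999)^2   # perfect squares between 10 and 10^6
for (t in 1:5){
  # squares with exactly t+1 digits, stored as one column of digits per square
  pfctsq=pdfct[(pdfct>=10^t)&(pdfct<10^(t+1))]
  rstrct=apply(as.matrix(pfctsq),1,xtrct)

  for (i in 1:(dim(rstrct)[2]-2)){
    # digitwise differences between each later square and the i-th one
    # (taken as is, without reduction modulo 10)
    dive=apply(matrix(rstrct[,(i+1):dim(rstrct)[2]]-
      rstrct[,i],nrow=t+1),2,unique)
    if (is.matrix(dive))
       dive=lapply(seq_len(ncol(dive)), function(i) dive[,i])
    dive=as.integer(lapply(dive,length))
    # a single distinct difference means all digits moved by the same value
    if (sum(dive==1)>0)
       print(c(pfctsq[i],pfctsq[
       ((i+1):dim(rstrct)[2])[(dive==1)]]))
    }
  }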

which returns

[1] 121 676
[1] 1156 4489
[1] 2025 3136
[1] 13225 24336
[1] 111556 444889

namely the pairs (121,676), (1156,4489), (2025,3136), (13225,24336), and (111556,444889) as the solutions. The strange lines of R code

    if (is.matrix(dive))
       dive=lapply(seq_len(ncol(dive)), function(i) dive[,i])

are due to the fact that, when the above result of apply is a matrix rather than a list, turning it into a list means each entry of the matrix becomes an entry of the list, instead of each column. After trying to solve the problem on my own for a long while (!), I found the above trick on Stack Overflow. (As usual, the puzzle is used as an exercise in [basic] R programming. There always exists a neat mathematical solution!)
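To make the matrix-versus-list issue concrete, here is a minimal sketch on two toy matrices. The matrices and the helper n_unique are made up for illustration and are not part of the puzzle code. It shows when apply returns a list, when it returns a matrix, and one possible way to count distinct values per column that does not depend on which of the two happens.

```r
# Toy digit-difference matrices (hypothetical data), one column per candidate
mixed <- cbind(c(5, 5, 5), c(1, 2, 3))   # unique() lengths per column: 1 and 3
equal <- cbind(c(1, 2, 3), c(4, 5, 6))   # unique() lengths per column: 3 and 3

r_mixed <- apply(mixed, 2, unique)   # results of unequal length: a list
r_equal <- apply(equal, 2, unique)   # results of equal length: a 3 x 2 matrix

is.list(r_mixed)     # TRUE
is.matrix(r_equal)   # TRUE

# Counting lengths directly on the matrix treats each of its 6 entries
# as a separate element of length 1, which wrongly flags every column
naive <- as.integer(lapply(r_equal, length))

# Hypothetical helper: loop over the columns explicitly, so that there is
# always exactly one count per column
n_unique <- function(m) {
  vapply(seq_len(ncol(m)), function(j) length(unique(m[, j])), integer(1))
}

n_unique(mixed)
n_unique(equal)
```

Only the first column of mixed has a single distinct value, which is the condition the puzzle code tests with dive==1. Looping over columns in this way sidesteps the simplification done by apply, at the cost of no longer reusing its output.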