Archive for the Kids Category

Le Monde puzzle [#1112]

Posted in Books, Kids, R with tags , , , on October 3, 2019 by xi'an

Another low-key arithmetic problem as Le Monde current mathematical puzzle:

Find the 16 integers x¹,x²,x³,x⁴,y¹,y²,y³,y⁴,z¹,z²,z³,z⁴,w¹,w²,w³,w⁴ such that the groups x¹,y¹,z¹,w¹, &tc., are made of distinct positive integers, the sum of the x’s is 24, of the y’s 20, of the z’s 19 and of the w’s 17. Furthermore, x¹<max(y¹,z¹,w¹), x²<max(y²,z²,w²), x³<max(y³,z³,w³)=z³, and x⁴<max(y⁴,z⁴,w⁴), while x¹<x².

There are thus 4 x 3 unknowns, all bounded by either 20-3=17 or 19-3=16 for x³,y³,z³. It is then a case for brute force resolution since drawing all quadruplets by rmultinom until all conditions are satisfied

valid=function(x,y,z,w){
 (z[3]>max(y[3],w[3]))&(x[1]<x[2])&(sum(x)==24)&
 (sum(y)==20)&(sum(z)==19)&(sum(w)==17)}

returns quickly several solutions. Meaning I misread the question and missed the constraint that the four values at each step were the same up to a permutation, decreasing the number of unknowns to four, a,b,c,d (ordered). And then three because the sum of the four is 20, average of the four sums. It seems to me that the first sum of x’s being 24 and involving only a,b, and c implies that 4c is larger than 24, ie c>6, hence d>7, while a>0 and b>1, leaving only two degrees of freedom for choosing the four values, meaning that only

  1 2 7 10
1 2 8 9
1 3 7 9
1 4 7 8
2 3 7 8

are possible. Sampling at random across these possible choices and allocating the numbers at random to x,y,z, and w leads rather quickly to the solution

     [,1] [,2] [,3] [,4]
[1,]    3    7    7    7
[2,]    2    8    2    8
[3,]    7    2    8    2
[4,]    8    3    3    3

if you are looking for me today…

Posted in Kids, pictures, Travel, University life with tags , , , , on October 1, 2019 by xi'an

EU without education is like a bird without wings [apologies to ostriches]

Posted in Kids, University life with tags , , , , on September 21, 2019 by xi'an

[Reposting a call for keeping education and research in the EU commission denomination:]

The candidates for the new EU commissioners were presented last week. In the new commission the areas of education and research are not explicitly represented anymore and instead are subsumed under the “innovation and youth” title. This emphasizes economic exploitability (i.e. “innovation”) over its foundation, which is education and research, and it reduces “education” to “youth” while being essential to all ages.

We, as members of the scientific community of Europe, wish to address this situation early on and emphasize both to the general public, as well as to relevant politicians on the national and European Union level, that without dedication to education and research there will neither exist a sound basis for innovation in Europe, nor can we fulfill the promise of a high standard of living for the citizens of Europe in a fierce global competition.

President von der Leyen, in her mission letter to commissioner Gabriel, has emphasized that “education, research and innovation will be key to our competitiveness”.

With this open letter we demand that the EU commission revises the title for commissioner Gabriel to “Education, Research, Innovation and Youth” reflecting Europe’s dedication to all of these crucial areas. We also call upon the European Parliament to request this change in name before confirming the nominees for commissioner.

Please support this letter by signing up as a supporter via https://indico.uis.no/event/5/registrations/

Le Monde puzzle [#1110]

Posted in Books, Kids, R, Travel with tags , , , , , , , , , on September 16, 2019 by xi'an

A low-key sorting problem as Le Monde current mathematical puzzle:

If the numbers from 1 to 67 are randomly permuted and if the sorting algorithm consists in picking a number i with a position higher than its rank i and moving it at the correct i-th position, what is the maximal number of steps to sort this set of integers when the selected integer is chosen optimaly?

As the question does not clearly state what happens to the number j that stood in the i-th position, I made the assumption that the entire sequence from position i to position n is moved one position upwards (rather than having i and j exchanged). In which case my intuition was that moving the smallest moveable number was optimal, resulting in the following R code

sorite<-function(permu){ n=length(permu) p=0 while(max(abs(permu-(1:n)))>0){
    j=min(permu[permu<(1:n)])
    p=p+1
    permu=unique(c(permu[(1:n)<j],j,permu[j:n]))} 
  return(p)}

which takes at most n-1 steps to reorder the sequence. I however checked this insertion sort was indeed the case through a recursive function

resorite<-function(permu){ n=length(permu);p=0 while(max(abs(permu-(1:n)))>0){
    j=cand=permu[permu<(1:n)]
    if (length(cand)==1){
      p=p+1
      permu=unique(c(permu[(1:n)<j],j,permu[j:n]))
    }else{
      sol=n^3
      for (i in cand){
        qermu=unique(c(permu[(1:n)<i],i,permu[i:n]))
        rol=resorite(qermu)
        if (rol<sol)sol=rol}
      p=p+1+sol;break()}} 
  return(p)}

which did confirm my intuition.

9 pitfalls of data science [book review]

Posted in Books, Kids, Statistics, Travel, University life with tags , , , , , , , , , , , , , on September 11, 2019 by xi'an

I received The 9 pitfalls of data science by Gary Smith [who has written a significant number of general public books on personal investment, statistics and AIs] and Jay Cordes from OUP for review a few weeks ago and read it on my trip to Salzburg. This short book contains a lot of anecdotes and what I would qualify of small talk on job experiences and colleagues’ idiosyncrasies…. More fundamentally, it reads as a sequence of examples of bad or misused statistics, as many general public books on statistics do, but with little to say on how to spot such misuses of statistics. Its title (It seems like the 9 pitfalls of… is a rather common début for a book title!) however started a (short) conversation with my neighbour on the train to Salzburg as she wanted to know if the job opportunities in data sciences were better in Germany than in Austria. A practically important question for which I had no clue. And I do not think the book would have helped either! (My neighbour in the earlier plane to München had a book on growing lotus, which was not particularly enticing for launching a conversation either.)

Chapter I “Using bad data” is made of examples of truncated or cherry picked data often associated with poor graphics. Only one dimensional outcome and also very US centric. Chapter II “Data before theory” highlights spurious correlations and post hoc predictions, criticism of data mining, some examples being quite standard. Chapter III “Worshiping maths” sounds like the perfect opposite of the previous cahpter: it discusses the fact that all models are wrong but some may be more wrong than others. And gives examples of over fitting, p-value hacking, regression applied to longitudinal data. With the message that (maths) assumptions are handy and helpful but not always realistic. Chapter IV “Worshiping computers” is about the new golden calf and contains rather standard stuff on trusting the computer output because it is a machine. However, the book is somewhat falling foul of the same mistake by trusting a Monte Carlo simulation of a shortfall probability for retirees since Monte Carlo also depends on a model! Computer simulations may be fine for Bingo night or poker tournaments but much more uncertain for complex decisions like retirement investments. It is also missing the biasing aspects in constructing recidivism prediction models pointed out in Weapons of math destruction. Until Chapter 9 at least. The chapter is also mentioning adversarial attacks if not GANs (!). Chapter V “Torturing data” mentions famous cheaters like Wansink of the bottomless bowl and pizza papers and contains more about p-hacking and reproducibility. Chapter VI “Fooling yourself” is a rather weak chapter in my opinion. Apart from Ioannidis take on Theranos’ lack of scientific backing, it spends quite a lot of space on stories about poker gains in the unregulated era of online poker, with boasts of significant gains that are possibly earned from compulsive gamblers playing their family savings, which is not particularly praiseworthy. And about Brazilian jiu-jitsu. Chapter VII “Correlation vs causation” predictably mentions Judea Pearl (whose book of why I just could not finish after reading one rant too many about statisticians being unable to get causality right! Especially after discussing the book with Andrew.). But not so much to gather from the chapter, which could have instead delved into deep learning and its ways to avoid overfitting. The first example of this chapter is more about confusing conditionals (what is conditional on what?) than turning causation around. Chapter VII “Regression to the mean” sees Galton’s quincunx reappearing here after Pearl’s book where I learned (and checked with Steve Stiegler) that the device was indeed intended for that purpose of illustrating regression to the mean. While the attractive fallacy is worth pointing out there are much worse abuses of regression that could be presented. CHANCE’s Howard Wainer also makes an appearance along SAT scores. Chapter IX “Doing harm” does engage into the issue that predicting social features like recidivism by a (black box) software is highly worrying (and just plain wrong) if only because of this black box nature. Moving predictably to chess and go with the right comment that this does not say much about real data problems. A word of warning about DNA testing containing very little about ancestry, if only because of the company limited and biased database. With further calls for data privacy and a rather useless entry on North Korea. Chapter X “The Great Recession“, which discusses the subprime scandal (as in Stewart’s book), contains a set of (mostly superfluous) equations from Samuelson’s paper (supposed to scare or impress the reader?!) leading to the rather obvious result that the expected concave utility of a weighted average of iid positive rvs is maximal when all the weights are equal, result that is criticised by laughing at the assumption of iid-ness in the case of mortgages. Along with those who bought exotic derivatives whose construction they could not understand. The (short) chapter keeps going through all the (a posteriori) obvious ingredients for a financial disaster to link them to most of the nine pitfalls. Except the second about data before theory, because there was no data, only theory with no connection with reality. This final chapter is rather enjoyable, if coming after the facts. And containing this altogether unnecessary mathematical entry. [Usual warning: this review or a revised version of it is likely to appear in CHANCE, in my book reviews column.]

Le Monde puzzle [#1109]

Posted in Kids, R with tags , , , , , on September 4, 2019 by xi'an

A digital problem as Le Monde current mathematical puzzle:

Noble numbers are such that they only involve different digits and are multiple of all their digits. What is the largest noble number?

Hmmmm…. Brute force? Since the maximal number of digits is 10, one may as well try:

k=soz=9
for (t in 1:1e3){
sol=1
while (sol<10^(k-1)){ 
 u=sample(0:8,k);i=digit2int(u) 
 if (max(i%%u[u>0])==0) soz=max(soz,sol<-i)}}

which returns 9875643120… (I made the conscious choice to exclude zero from the dividers. Which was not a choice spelled out in the original question.)

an independent sampler that maximizes the acceptance rate of the MH algorithm

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , , on September 3, 2019 by xi'an

An ICLR 2019 paper by Neklyudov, Egorov and Vetrov on an optimal choice of the proposal in an independent Metropolis algorithm I discovered via an X validated question. Namely whether or not the expected Metropolis-Hastings acceptance ratio is always one (which it is not when the support of the proposal is restricted). The paper mentions the domination of the Accept-Reject algorithm by the associated independent Metropolis-Hastings algorithm, which has actually been stated in our Monte Carlo Statistical Methods (1999, Lemma 6.3.2) and may prove even older. The authors also note that the expected acceptance probability is equal to one minus the total variation distance between the joint defined as target x Metropolis-Hastings proposal distribution and its time-reversed version. Which seems to suffer from the same difficulty as the one mentioned in the X validated question. Namely that it only holds when the support of the Metropolis-Hastings proposal is at least the support of the target (or else when the support of the joint defined as target x Metropolis-Hastings proposal distribution is somewhat symmetric. Replacing total variation with Kullback-Leibler then leads to a manageable optimisation target if the proposal is a parameterised independent distribution. With a GAN version when the proposal is not explicitly available. I find it rather strange that one still seeks independent proposals for running Metropolis-Hastings algorithms as the result will depend on the family of proposals considered and as performances will deteriorate with dimension (the authors mention a 10% acceptance rate, which sounds quite low). [As an aside, ICLR 2020 will take part in Addis Abeba next April.]