Archive for USA

my first parkrun [19:56,3/87,78.8%]

Posted in Kids, pictures, Running, Travel on January 19, 2020 by xi'an

This morning, I had my first parkrun race in Gainesville, before heading back to Paris. (Thanks to Florence Forbes who pointed out this initiative to me.) Which reminded me of the race I ran in Helsinki a few years ago. Without the “self-transcendence” topping…! While the route was very urban, it was a fun opportunity to run a race with a few other runners. My time of 19:56 is not my best by far but, excuses, excuses, I was not feeling too well and the temperature was quite high (21°), and I still finished among the first three runners, just seconds behind two young fellows who looked like they were still in high school. (I am now holding the record of that race for my age group as well!) Anyway, this is a great way to join races when travelling and not worry about registration, certificates, etc.

Parkrun also provides an age-grade adjusted ranking (78.8%), which is interesting but statistically puzzling as this is the ratio of the fastest time (ever?) in the age x gender category over one’s time. Given that fastest times are extremes, this depends on a single individual and hence has a high variability. Especially in higher (meaning older!) veteran categories. A quantile in the empirical distribution would sound better. I came across this somewhat statistical analysis of the grade,
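To make the contrast concrete, here is a minimal Python sketch of the two scores, assuming the age grade is the category reference time divided by one's own time. The function names and the list of category times are hypothetical illustrations, not parkrun's actual tables or code; the only figures taken from the post are 19:56 and 78.8%.

```python
def parse_time(mmss: str) -> int:
    """Convert a 'mm:ss' finishing time into seconds."""
    minutes, seconds = mmss.split(":")
    return 60 * int(minutes) + int(seconds)


def age_grade(reference_seconds: float, time_seconds: float) -> float:
    """Age grade in percent: reference (fastest) time over one's own time.

    The score rests on a single extreme value, the reference time.
    """
    return 100.0 * reference_seconds / time_seconds


def empirical_quantile_grade(time_seconds: float, category_times: list) -> float:
    """Percentage of runners in the category who are at least as slow.

    One possible quantile-based alternative; it uses the whole empirical
    distribution instead of its minimum.
    """
    at_least_as_slow = sum(1 for t in category_times if t >= time_seconds)
    return 100.0 * at_least_as_slow / len(category_times)


my_time = parse_time("19:56")        # 1196 seconds
implied_reference = 0.788 * my_time  # reference time backed out of the 78.8% grade

# Hypothetical category times in seconds (made up for illustration only).
category_times = [1100, 1196, 1250, 1320, 1400, 1475, 1550, 1700]

print(f"implied reference time: {implied_reference:.0f} s")
print(f"age grade: {age_grade(implied_reference, my_time):.1f}%")
print(f"quantile grade: {empirical_quantile_grade(my_time, category_times):.1f}%")
```

Changing the single reference time shifts every age grade in the category, while the quantile version only moves when the bulk of the observed times moves.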

Panch at the helm!

Posted in pictures, Travel, University life on January 8, 2020 by xi'an

Reading somewhat by chance a Nature article on the new Director of the National Science Foundation (NSF) nominated by Trump (and yet to be confirmed by the Senate), I found that his name Sethuraman Panchanathan was the name of a friend of my wife 30⁺ years ago when they were both graduate students in image processing at the University of Ottawa, Department of Electrical Engineering… And looking further into the matter, I realised that this was indeed the very friend we knew from that time, with whom we shared laughs, dinners, and a few day trips together around Ottawa! While this is not the ultimate surprise, given that science administration is usually run by scientists, taken from a population pool that is not that large, as exemplified by earlier cases at the national or European level where I had some acquaintance with a then senior officer, it is nonetheless striking (and fun) to hear of a friend moving to a high visibility position after such a long gap. (When comparing NSF and ERC, the European Research Council, with French mathematician Jean-Pierre Bourguignon as current director also appearing in a recent Nature article, I was surprised to see that the ERC budget was more than twice the NSF budget.) Well, good luck to him for sailing these highly political waters!

off to BayesComp 20, Gainesville

Posted in pictures, Statistics, Travel, University life on January 7, 2020 by xi'an

BayesComp 2020 at a glance

Posted in Statistics, Travel, University life on December 18, 2019 by xi'an

wildlife photography of the year

Posted in Statistics on October 22, 2019 by xi'an

NeurIPS without visa

Posted in pictures, Statistics, Travel, University life on September 22, 2019 by xi'an


I came by chance upon this 2018 entry in Synced stating that NeurIPS now takes place in Canada, between Montréal and Vancouver, primarily because visas to Canada are easier to get than visas to the USA, even though some researchers still have difficulties in securing theirs. Especially researchers from some African countries, which is presented in the article as one of the reasons the next ICLR takes place in Addis Ababa. Which I wish I could attend! In the meantime, I will be taking part in an ABC workshop in Vancouver, December 8, prior to NeurIPS 2019, before visiting the Department of Statistics at UBC the day after. (My previous visit there was in 1990, I believe!) Incidentally but interestingly, the lottery entries for NeurIPS 2019 are open till September 25, to the public (those not contributing to the conference or any of its affiliated groups). This is certainly better than having bots buying all entries within 12 minutes of the opening time!

More globally, this entry makes me wonder how learned societies could invest in ensuring that locations for their (international) meetings allow for maximum inclusion in terms of these visa difficulties, while also ensuring freedom and safety for all members. Which may prove a de facto impossibility. For instance, Ethiopia has a rather poor record in terms of human rights and, in particular, homosexuality is criminalised there. An alternative would be to hold the conferences in parallel locations chosen to multiply the chances for this inclusion, but this could prove counter-productive [for inclusion] by creating groups that would never ever meet. An insolvable conundrum?

9 pitfalls of data science [book review]

Posted in Books, Kids, Statistics, Travel, University life on September 11, 2019 by xi'an

I received The 9 pitfalls of data science by Gary Smith [who has written a significant number of general public books on personal investment, statistics and AIs] and Jay Cordes from OUP for review a few weeks ago and read it on my trip to Salzburg. This short book contains a lot of anecdotes and what I would qualify as small talk on job experiences and colleagues’ idiosyncrasies… More fundamentally, it reads as a sequence of examples of bad or misused statistics, as many general public books on statistics do, but with little to say on how to spot such misuses of statistics. Its title (It seems like the 9 pitfalls of… is a rather common début for a book title!) however started a (short) conversation with my neighbour on the train to Salzburg as she wanted to know if the job opportunities in data science were better in Germany than in Austria. A practically important question for which I had no clue. And I do not think the book would have helped either! (My neighbour in the earlier plane to München had a book on growing lotus, which was not particularly enticing for launching a conversation either.)

Chapter I “Using bad data” is made of examples of truncated or cherry-picked data, often associated with poor graphics. Only one-dimensional outcomes and also very US-centric. Chapter II “Data before theory” highlights spurious correlations and post hoc predictions, criticism of data mining, some examples being quite standard. Chapter III “Worshiping maths” sounds like the perfect opposite of the previous chapter: it discusses the fact that all models are wrong but some may be more wrong than others. And gives examples of overfitting, p-value hacking, regression applied to longitudinal data. With the message that (maths) assumptions are handy and helpful but not always realistic. Chapter IV “Worshiping computers” is about the new golden calf and contains rather standard stuff on trusting the computer output because it is a machine. However, the book somewhat falls foul of the same mistake by trusting a Monte Carlo simulation of a shortfall probability for retirees since Monte Carlo also depends on a model! Computer simulations may be fine for Bingo night or poker tournaments but much more uncertain for complex decisions like retirement investments. It is also missing the biasing aspects in constructing recidivism prediction models pointed out in Weapons of Math Destruction. Until Chapter 9 at least. The chapter is also mentioning adversarial attacks if not GANs (!). Chapter V “Torturing data” mentions famous cheaters like Wansink of the bottomless bowl and pizza papers and contains more about p-hacking and reproducibility. Chapter VI “Fooling yourself” is a rather weak chapter in my opinion. Apart from Ioannidis’ take on Theranos’ lack of scientific backing, it spends quite a lot of space on stories about poker gains in the unregulated era of online poker, with boasts of significant gains that are possibly earned from compulsive gamblers playing their family savings, which is not particularly praiseworthy. And about Brazilian jiu-jitsu.
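On the Chapter IV remark that a Monte Carlo shortfall probability also depends on a model, a minimal Python sketch may help. Everything in it is a hypothetical assumption chosen for illustration (the function name, Gaussian yearly returns, the starting capital, the withdrawal, the 30-year horizon); none of it comes from the book.

```python
import random


def shortfall_probability(mean_return, volatility, n_paths=5000, years=30,
                          initial=1_000_000.0, withdrawal=50_000.0, seed=0):
    """Monte Carlo estimate of the chance of running out of money.

    Assumed model: a fixed yearly withdrawal followed by an independent
    Gaussian yearly return. Both are modelling choices, not facts.
    """
    rng = random.Random(seed)
    shortfalls = 0
    for _ in range(n_paths):
        wealth = initial
        for _ in range(years):
            # Withdraw first, then apply this year's simulated return.
            wealth = (wealth - withdrawal) * (1.0 + rng.gauss(mean_return, volatility))
            if wealth <= 0.0:
                shortfalls += 1
                break
    return shortfalls / n_paths


# Same retiree, same simulation code, two different assumed return models.
p_optimistic = shortfall_probability(mean_return=0.06, volatility=0.10)
p_pessimistic = shortfall_probability(mean_return=0.02, volatility=0.20)
print(f"optimistic model:  {p_optimistic:.3f}")
print(f"pessimistic model: {p_pessimistic:.3f}")
```

The two estimates differ only because the assumed return distribution differs, which is the sense in which the simulated number is no more trustworthy than the model fed into it.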
Chapter VII “Correlation vs causation” predictably mentions Judea Pearl (whose Book of Why I just could not finish after reading one rant too many about statisticians being unable to get causality right! Especially after discussing the book with Andrew.). But not so much to gather from the chapter, which could have instead delved into deep learning and its ways to avoid overfitting. The first example of this chapter is more about confusing conditionals (what is conditional on what?) than turning causation around. Chapter VIII “Regression to the mean” sees Galton’s quincunx reappearing here after Pearl’s book where I learned (and checked with Steve Stigler) that the device was indeed intended for that purpose of illustrating regression to the mean. While the attractive fallacy is worth pointing out, there are much worse abuses of regression that could be presented. CHANCE’s Howard Wainer also makes an appearance along with SAT scores. Chapter IX “Doing harm” does engage with the issue that predicting social features like recidivism by a (black box) software is highly worrying (and just plain wrong) if only because of this black box nature. Moving predictably to chess and go with the right comment that this does not say much about real data problems. A word of warning about DNA testing containing very little about ancestry, if only because of the company’s limited and biased database. With further calls for data privacy and a rather useless entry on North Korea. Chapter X “The Great Recession”, which discusses the subprime scandal (as in Stewart’s book), contains a set of (mostly superfluous) equations from Samuelson’s paper (supposed to scare or impress the reader?!) leading to the rather obvious result that the expected concave utility of a weighted average of iid positive rvs is maximal when all the weights are equal, a result that is criticised by laughing at the assumption of iid-ness in the case of mortgages.
Along with those who bought exotic derivatives whose construction they could not understand. The (short) chapter keeps going through all the (a posteriori) obvious ingredients for a financial disaster to link them to most of the nine pitfalls. Except the second about data before theory, because there was no data, only theory with no connection with reality. This final chapter is rather enjoyable, if coming after the fact. And containing this altogether unnecessary mathematical entry. [Usual warning: this review or a revised version of it is likely to appear in CHANCE, in my book reviews column.]
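As an aside on the Chapter X result, the equal-weight claim follows from a short symmetry argument. The sketch below is a standard derivation, not necessarily the one in Samuelson's paper or in the book. The notation ($U$ for the concave utility, $X_1,\dots,X_n$ for the iid positive returns, $w$ for the weights, $f$ for the expected utility) is an assumption introduced here, expectations are assumed finite, and the `align*` environment requires `amsmath`.

```latex
\begin{align*}
f(w) &= \mathbb{E}\Big[U\Big(\sum_{i=1}^n w_i X_i\Big)\Big],
  \qquad w_i \ge 0,\quad \sum_{i=1}^n w_i = 1,\\
f(w_\pi) &= f(w) \quad\text{for every permutation } \pi
  \quad\text{(the $X_i$ are iid, hence exchangeable)},\\
\frac{1}{n!}\sum_{\pi} w_\pi &= \Big(\tfrac{1}{n},\dots,\tfrac{1}{n}\Big) =: \bar w,\\
f(\bar w) &= f\Big(\frac{1}{n!}\sum_{\pi} w_\pi\Big)
  \;\ge\; \frac{1}{n!}\sum_{\pi} f(w_\pi) = f(w)
  \quad\text{(concavity of $f$, inherited from $U$)}.
\end{align*}
```

The iid assumption enters only through the second line, which is exactly the step that fails for mortgages defaulting together.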