Archive for USA

9 pitfalls of data science [book review]

Posted in Books, Kids, Statistics, Travel, University life on September 11, 2019 by xi'an

I received The 9 pitfalls of data science by Gary Smith [who has written a significant number of general public books on personal investment, statistics and AIs] and Jay Cordes from OUP for review a few weeks ago and read it on my trip to Salzburg. This short book contains a lot of anecdotes and what I would qualify as small talk on job experiences and colleagues’ idiosyncrasies… More fundamentally, it reads as a sequence of examples of bad or misused statistics, as many general public books on statistics do, but with little to say on how to spot such misuses of statistics. Its title (it seems like the 9 pitfalls of… is a rather common début for a book title!) however started a (short) conversation with my neighbour on the train to Salzburg as she wanted to know if the job opportunities in data science were better in Germany than in Austria. A practically important question about which I had no clue. And I do not think the book would have helped either! (My neighbour on the earlier plane to München had a book on growing lotus, which was not particularly enticing for launching a conversation either.)

Chapter I “Using bad data” is made of examples of truncated or cherry-picked data, often associated with poor graphics. Only one-dimensional outcomes, and also very US-centric. Chapter II “Data before theory” highlights spurious correlations and post hoc predictions, with criticism of data mining, some examples being quite standard. Chapter III “Worshiping maths” sounds like the perfect opposite of the previous chapter: it discusses the fact that all models are wrong but some may be more wrong than others. And gives examples of overfitting, p-value hacking, regression applied to longitudinal data. With the message that (maths) assumptions are handy and helpful but not always realistic. Chapter IV “Worshiping computers” is about the new golden calf and contains rather standard stuff on trusting the computer output because it is a machine. However, the book somewhat falls foul of the same mistake by trusting a Monte Carlo simulation of a shortfall probability for retirees, since Monte Carlo also depends on a model! Computer simulations may be fine for Bingo night or poker tournaments but are much more uncertain for complex decisions like retirement investments. It is also missing the biasing aspects in constructing recidivism prediction models pointed out in Weapons of math destruction. Until Chapter IX at least. The chapter also mentions adversarial attacks if not GANs (!). Chapter V “Torturing data” mentions famous cheaters like Wansink of the bottomless bowl and pizza papers and contains more about p-hacking and reproducibility. Chapter VI “Fooling yourself” is a rather weak chapter in my opinion. Apart from Ioannidis’ take on Theranos’ lack of scientific backing, it spends quite a lot of space on stories about poker gains in the unregulated era of online poker, with boasts of significant gains that are possibly earned from compulsive gamblers playing their family savings, which is not particularly praiseworthy. And about Brazilian jiu-jitsu.
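To make the Chapter IV remark concrete, here is a minimal sketch of how a Monte Carlo shortfall probability moves with the return model fed into the simulation. Everything in it is an assumption of mine rather than the book's setting: the function names, the starting wealth, the yearly spending, the 30-year horizon, and both return models (a Gaussian one, and the same Gaussian with an occasional crash year) are purely illustrative.

```python
import random


def shortfall_probability(draw_return, n_paths=5000, years=30,
                          wealth=1_000_000.0, spending=45_000.0, seed=1):
    """Estimate P(savings run out within `years`) under a given return model.

    `draw_return(rng)` returns one simulated yearly return, e.g. 0.05 for +5%.
    All numerical defaults are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    ruined = 0
    for _ in range(n_paths):
        w = wealth
        for _ in range(years):
            # Withdraw the yearly spending, then apply the simulated return
            # (the growth factor is floored at zero: no loss beyond 100%).
            growth = max(0.0, 1.0 + draw_return(rng))
            w = (w - spending) * growth
            if w <= 0.0:
                ruined += 1
                break
    return ruined / n_paths


def gaussian_returns(rng):
    # Hypothetical model A: iid Gaussian yearly returns.
    return rng.gauss(0.05, 0.12)


def crash_prone_returns(rng):
    # Hypothetical model B: same Gaussian, except a -35% crash in 5% of years.
    if rng.random() < 0.05:
        return -0.35
    return rng.gauss(0.05, 0.12)


if __name__ == "__main__":
    print("Gaussian model   :", shortfall_probability(gaussian_returns))
    print("Crash-prone model:", shortfall_probability(crash_prone_returns))
```

Both runs are equally "Monte Carlo" and equally precise as simulations, yet they return different shortfall probabilities: the answer is a property of the assumed return model, not of the computer.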
Chapter VII “Correlation vs causation” predictably mentions Judea Pearl (whose Book of Why I just could not finish after reading one rant too many about statisticians being unable to get causality right! Especially after discussing the book with Andrew.). But there is not so much to gather from the chapter, which could have instead delved into deep learning and its ways to avoid overfitting. The first example of this chapter is more about confusing conditionals (what is conditional on what?) than turning causation around. Chapter VIII “Regression to the mean” sees Galton’s quincunx reappearing here after Pearl’s book, where I learned (and checked with Steve Stigler) that the device was indeed intended for that purpose of illustrating regression to the mean. While the attractive fallacy is worth pointing out, there are much worse abuses of regression that could be presented. CHANCE’s Howard Wainer also makes an appearance along with SAT scores. Chapter IX “Doing harm” does engage with the issue that predicting social features like recidivism by a (black box) software is highly worrying (and just plain wrong) if only because of this black box nature. Moving predictably to chess and go with the right comment that this does not say much about real data problems. A word of warning about DNA testing containing very little about ancestry, if only because of the company’s limited and biased database. With further calls for data privacy and a rather useless entry on North Korea. Chapter X “The Great Recession”, which discusses the subprime scandal (as in Stewart’s book), contains a set of (mostly superfluous) equations from Samuelson’s paper (supposed to scare or impress the reader?!) leading to the rather obvious result that the expected concave utility of a weighted average of iid positive rvs is maximal when all the weights are equal, a result that is criticised by laughing at the assumption of iid-ness in the case of mortgages.
Along with those who bought exotic derivatives whose construction they could not understand. The (short) chapter keeps going through all the (a posteriori) obvious ingredients for a financial disaster to link them to most of the nine pitfalls. Except the second about data before theory, because there was no data, only theory with no connection with reality. This final chapter is rather enjoyable, if coming after the facts. And containing this altogether unnecessary mathematical entry. [Usual warning: this review or a revised version of it is likely to appear in CHANCE, in my book reviews column.]
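For the record, the equal-weight result mentioned for Chapter X takes a few lines. The following is my own sketch under stated assumptions (finite expectations, the notation $U$, $w_i$, $\sigma$ being mine), not necessarily the route taken in Samuelson's paper or in the book:

```latex
Let $X_1,\dots,X_n$ be iid positive random variables, $U$ a concave utility,
and $w_1,\dots,w_n\ge 0$ weights with $\sum_{i=1}^n w_i=1$. Summing over the
$n!$ permutations $\sigma$ of $\{1,\dots,n\}$,
\begin{align*}
\mathbb{E}\Big[U\Big(\sum_{i=1}^n w_i X_i\Big)\Big]
 &= \frac{1}{n!}\sum_{\sigma}
    \mathbb{E}\Big[U\Big(\sum_{i=1}^n w_{\sigma(i)} X_i\Big)\Big]
    && \text{(the $X_i$'s are exchangeable)}\\
 &\le \mathbb{E}\Big[U\Big(\sum_{i=1}^n
    \Big\{\frac{1}{n!}\sum_{\sigma} w_{\sigma(i)}\Big\} X_i\Big)\Big]
    && \text{(concavity of $U$, i.e., Jensen)}\\
 &= \mathbb{E}\Big[U\Big(\frac{1}{n}\sum_{i=1}^n X_i\Big)\Big],
\end{align*}
since $\frac{1}{n!}\sum_{\sigma} w_{\sigma(i)}=\frac{1}{n}\sum_{j=1}^n w_j=\frac{1}{n}$
for every $i$.
```

The first line is the only place where the iid assumption enters, through exchangeability, which is exactly the assumption that fails for mortgages.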

FALL [book review]

Posted in Books, pictures, Travel on August 30, 2019 by xi'an

The “last” book I took with me to Japan is Neal Stephenson’s FALL. With subtitle “Dodge in Hell”. It shares some characters with REAMDE but nothing prevents reading it independently as a single volume. Or not reading it at all! I am rather disappointed by the book and hence sorry I had to carry it throughout Japan and back. And slightly X’ed at Nature for writing such a positive review. And at The Guardian. (There is a theme there, as I took REAMDE for a trip to India with a similar feeling at the end. Maybe the sheer weight of the book is pulling my morale down…) The most important feature common to both books is the game industry, since the main (?) character is a game company manager, who is wealthy enough to ensure the rest of the story holds some financial likelihood. And whose training as a game designer impacts the construction of the afterlife that takes a good (or rather terrible) half of the heavy volume. The long minutes leading to his untimely death are also excruciatingly rendered (with none of the experimental nature of Leopold Bloom’s morning). With the side information that Dodge suffers from ocular migraine, a nuisance that has visited me pretty regularly since my teenage years! The scientific aspects of the story are not particularly exciting either, since the core concept is that by registering the entire neuronal network of the brain of individuals after their death, a computer could revive them by simulating this network. With dead people keeping their personality if very little of their memories. And even more fanciful, interacting with one another and producing a signal that can be understood by (living) humans. Despite having no sensory organs. The reconstruction of a world by the simulated NNs is unbearably slow and frankly uninteresting as it both reproduces living behaviours and borrows very heavily from the great myths, mostly Greek, with no discernible depth.
The living side of the story is not much better, although with a little touch of the post-apocalyptic flavour I appreciated in Stephenson. But not enough to recover from the fall.

Among other things that put me off the book is the complete lack of connection with the massive challenges currently facing humanity. Energy crisis? Climate change? Nope. Keep taking a hydroplane to get from Seattle to islands on Puget Sound? Sure. Spending abyssal amounts of energy to animate this electronic Hades? By all means. More and more brittle democracies? Who cares, the Afterworld is a pantheon where gods clash and rule lower beings. Worse, the plot never reaches beyond America, from the heavily focused philosophical or religious background to the characters’ life trajectories. Characters are surprisingly unidimensional, with no flaw until they become evil. Or die. Academics are not even unidimensional. For instance Sophie’s thesis defence is at best a chat in a café… And talks at a specialist workshop switch from impressive mathematical terms to a 3D representation of the activity of the simulated neuronal networks. While these few individuals keep impacting the whole World for their whole life. And beyond… By comparison, the Riverworld series of Philip José Farmer (that I read forty years ago) is much more enjoyable as a tale of the Afterworld, even if one can object to “famous” people being central to the action. At least there are more of them and, judging from their (first) life, they may have something interesting and innovative to say.

Denver snapshot [jatp]

Posted in pictures, Travel, Wines on July 28, 2019 by xi'an

off to Denver! [JSM2019]

Posted in Statistics on July 27, 2019 by xi'an

As of today, I am attending JSM 2019 in Denver, giving an “Introductory Overview Lecture” on The ABC of Approximate Bayesian Computation on Sunday afternoon and chairing an ABC session on Monday morning. As far as I know these are the only ABC sessions at JSM this year… And hence the only sessions I will be attending. (I have not been to Denver and the area since 1993, when I visited Kerrie Mengersen and Richard Tweedie in Fort Collins. And hiked up to Longs Peak with Gerard. Alas, no time for climbing in the Rockies this time.)

The Long, Cruel History of the Anti-Abortion Crusade [reposted]

Posted in Books, Kids, Travel on July 14, 2019 by xi'an

[Excerpts from an editorial in the NYT by John Irving, American author of The Cider House Rules, a novel we enjoyed reading 30 years ago]

“(…) I respect your personal reasons not to have an abortion — no one is forcing you to have one. I respect your choice. I’m pro-choice — often called pro-abortion by the anti-abortion crusaders, although no one is pro-abortion. What’s unequal about the argument is the choice; the difference between pro-life and pro-choice is the choice. Pro-life proponents have no qualms about forcing women to go through childbirth — they give women no choice (…)

I must remind the Roman Catholic Church of the First Amendment to the United States Constitution: “Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof.” In other words, we are free to practice the religion of our choice, and we are protected from having someone else’s religion practiced on us. Freedom of religion in the United States also means freedom from religion (…)

The prevailing impetus to oppose abortion is to punish the woman who doesn’t want the child. The sacralizing of the fetus is a ploy. How can “life” be sacred (and begin at six weeks, or at conception), if a child’s life isn’t sacred after it’s born? Clearly, a woman’s life is never sacred; as clearly, a woman has no reproductive rights (…)

Of an unmarried woman or girl who got pregnant, people of my grandparents’ generation used to say: “She is paying the piper.” Meaning, she deserves what she gets — namely, to give birth to a child. That cruelty is the abiding impetus behind the dishonestly named right-to-life movement. Pro-life always was (and remains) a marketing term. Whatever the anti-abortion crusaders call themselves, they don’t care what happens to an unwanted child — not after the child is born — and they’ve never cared about the mother.”

Stein’s method in machine learning [workshop]

Posted in pictures, Running, Statistics, Travel, University life on April 5, 2019 by xi'an

There will be an ICML workshop on Stein’s method in machine learning & statistics, next July 14 or 15, located in Long Beach, CA. Organised by François-Xavier Briol (formerly Warwick), Lester Mackey, Chris Oates (formerly Warwick), Qiang Liu, and Larry Goldstein. To quote from the webpage of the workshop:

Stein’s method is a technique from probability theory for bounding the distance between probability measures using differential and difference operators. Although the method was initially designed as a technique for proving central limit theorems, it has recently caught the attention of the machine learning (ML) community and has been used for a variety of practical tasks. Recent applications include goodness-of-fit testing, generative modeling, global non-convex optimisation, variational inference, de novo sampling, constructing powerful control variates for Monte Carlo variance reduction, and measuring the quality of Markov chain Monte Carlo algorithms.

Speakers include Anima Anandkumar, Lawrence Carin, Louis Chen, Andrew Duncan, Arthur Gretton, and Susan Holmes. I am quite sorry to miss two workshops dedicated to Stein’s work in a row, the other one being at NUS, Singapore, around the Stein paradox.
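As a toy illustration of the “differential operators” in the quoted description, here is a minimal sketch for the simplest possible case, a standard Normal target, where the Stein operator is (Af)(x) = f′(x) − x f(x) and E[(Af)(X)] = 0 when X ~ N(0,1). The function names, the test function f = sin, and the shifted comparison sample are my own illustrative choices, not anything taken from the workshop material.

```python
import math
import random


def stein_operator(f, f_prime):
    """Stein operator for the N(0,1) target: (A f)(x) = f'(x) - x f(x)."""
    return lambda x: f_prime(x) - x * f(x)


def mean_stein(samples, f=math.sin, f_prime=math.cos):
    """Sample average of (A f)(x); close to zero if samples follow N(0,1)."""
    a_f = stein_operator(f, f_prime)
    return sum(a_f(x) for x in samples) / len(samples)


def draw(mean, n, seed):
    """Draw n iid N(mean, 1) variates with a fixed seed (illustrative helper)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, 1.0) for _ in range(n)]


if __name__ == "__main__":
    # Samples from the target: the average should be near zero.
    print("N(0,1) sample  :", mean_stein(draw(0.0, 100_000, seed=0)))
    # Samples from a shifted Normal: the average moves away from zero.
    print("N(0.5,1) sample:", mean_stein(draw(0.5, 100_000, seed=0)))
```

The departure from zero for the shifted sample is the basic signal that Stein-based discrepancies and goodness-of-fit tests build upon, with a whole class of functions f in place of the single one used here.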

BayesComp 20: call for contributed sessions!

Posted in pictures, Statistics, Travel, University life on March 20, 2019 by xi'an

Just to remind readers of the upcoming deadline for BayesComp sessions:

The deadline for providing a title and brief abstract for the session is April 1, 2019. Please provide the names and affiliations of the organizer and the three speakers (the organizer can be one of them). Each session lasts 90 minutes and each talk should be 30 minutes long including Q&A. Contributed sessions can also consist of tutorials on the use of novel software. Decisions will be made by April 15, 2019. Please send your proposals to Christian Robert, co-chair of the scientific committee. We look forward to seeing you at BayesComp 20!

In case you do not feel like organising a whole session by yourself, contact the ISBA section you feel affinity with and suggest that it help build this session together!