Archive for Toronto

Statistics versus Data Science [or not]

Posted in Books, Kids, Statistics, University life on October 13, 2017 by xi'an

Last week a colleague from Warwick forwarded us a short argument by Donald Macnaughton (a “Toronto-based statistician”) for switching the name of our field from Statistics to Data Science. This is not the first time I have heard of this proposal, nor the first time I have expressed my strong disagreement with it! Here are the naughtonian arguments:

  1. Statistics is (at least in English) endowed with several meanings, from the numbers compiled out of a series of observations, to the field itself, to the procedures proposed by the field. This is argued to be confusing for laypeople, and to miss both the connection with data at the core of our field and the indication that statistics extracts information from data. Data science seems to convey both ideas… But it is equally vague, in that most if not all scientific fields rely on data and observations, and on the structured exploitation of such data. Actually, a lot of so-called “data scientists” have specialised in the analysis of data from their original field, without deliberately embarking upon a career as data scientists, and without necessarily acquiring the proper tools for incorporating uncertainty quantification (aka statistics!).
  2. Statistics sounds old-fashioned, “old-guard”, “inward-looking”, and unattractive to young talents, who flock to Data Science programs instead. Which is true [that they flock], but does not mean we [as a field] must flock there as well. Who can tell whether the attraction of data science(s) will still be that strong in five or ten years? We already had to switch the names of our Master programs to Data Science or the like, and this is surely more than enough.
  3. Data science encompasses other areas of science, like computer science and operations research, but this is argued not to be an issue, both in terms of potential collaborations and of gaining the upper hand as a “key part” of the field. Which is more wishful thinking than a certainty, given the existing difficulties statisticians face in being recognised as major actors in data analysis. (As, for instance, in a recent “Big Data” grant evaluation where the evaluation committee included no statistician. And where we got rejected.)

snapshot from Toronto [guest picture]

Posted in pictures, Travel on July 10, 2016 by xi'an

a maths mansion!

Posted in Books, Kids, pictures, Travel on October 11, 2015 by xi'an

I read in The Guardian today that James Stewart’s house is for sale. James Stewart was the prolific author of many college and high-school textbooks on calculus and pre-calculus. I have trouble understanding how one can write so many books on the same topic, but he apparently managed, to the point of having this immense house designed by architects to his taste. Which looks a bit passé to me. Judging from the covers of the books, and from the shape of the house, he had a fascination for the integral sign (which indeed has an intrinsic beauty!). Still amazing, considering it was paid for by his royalties. Less amazing when checking the price of those books: about $250 apiece. Multiplied by hundreds of thousands of copies sold every year, this adds up to being able to afford such a maths mansion! (I am not so sure I can take over the undergrad market by recycling The Bayesian Choice…!)

Paris Machine Learning Meeting #10 Season 2

Posted in Books, Kids, pictures, Statistics, University life on June 17, 2015 by xi'an

Invalides, Paris, May 8, 2012

Tonight, I am invited to give a speed-presentation talk at the last Paris Machine Learning meeting of Season 2, with themes including DL, Recovering Robots, Vowpal Wabbit, PredicSis, Matlab, and Bayesian tests [by yours truly!]. The meeting will take place in Jussieu, Amphi 25. Here are my slides for the meeting:

As it happened, the meeting was quite crowded with talks and plagued with technical difficulties in transmitting talks from Berlin and Toronto, so I got to speak about three hours after the start, which was less than optimal for the most technical presentation of the evening. I actually wonder whether I even managed to convey the main idea of replacing Bayes factors with the posterior distribution of the mixture weight! [I had plenty of time to reflect upon this on my way back home, as RER trains were rare and crowded and I had to let several go by until one had enough room for me and my bike!]
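The idea of testing through the posterior of a mixture weight can be sketched in a deliberately simplified setting, assuming both competing models are fully specified densities f1 and f2 (here, hypothetically, N(0,1) versus N(1,1)). The data are embedded in the mixture α f1 + (1 - α) f2, and a Gibbs sampler with a Beta(a0, a0) prior on α (a0 = 0.5 is an illustrative choice) produces draws from the posterior of α; draws concentrating near 1 favour the first model. The general approach also has to handle unknown parameters within each model, which this sketch ignores, and the function name `mixture_weight_gibbs` is hypothetical.

```r
# Gibbs sampler for the weight alpha of the mixture alpha*f1 + (1-alpha)*f2,
# assuming (hypothetically) that both component densities are fully known
mixture_weight_gibbs = function(x, f1, f2, a0 = 0.5, niter = 5000) {
  n = length(x)
  alpha = numeric(niter)
  a = 0.5                                  # starting value for the weight
  for (it in 1:niter) {
    p1 = a * f1(x)
    p2 = (1 - a) * f2(x)
    z = runif(n) < p1 / (p1 + p2)          # TRUE when x[i] is allocated to f1
    n1 = sum(z)
    a = rbeta(1, a0 + n1, a0 + n - n1)     # conjugate Beta update of the weight
    alpha[it] = a
  }
  alpha
}

set.seed(1)
x = rnorm(100)                             # data simulated from the first model
f1 = function(x) dnorm(x, mean = 0, sd = 1)
f2 = function(x) dnorm(x, mean = 1, sd = 1)
post = mixture_weight_gibbs(x, f1, f2)
summary(post[-(1:1000)])                   # posterior of alpha after burn-in
```

Rather than a single Bayes factor, the output is a whole posterior distribution on α, whose concentration towards 0 or 1 indicates which model the data support.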

bikes vs cars

Posted in Kids, pictures, Running, Travel on May 9, 2015 by xi'an

Trailer for a film by Fredrik Gertten about the plight of cyclists in most cities. Don’t miss Rob Ford, the infamous ex-mayor of Toronto, and his justification for closing bike lanes in the city, comparing cycling to swimming with sharks… and siding with the sharks.

convergence speeds

Posted in pictures, Running, Statistics, Travel, University life on December 5, 2013 by xi'an

While waiting for Jean-Michel to leave a thesis defence committee he was part of, I read this recently arXived survey by Novak and Rudolf, “Computation of expectations by Markov chain Monte Carlo methods”. The first part hinted at a sort of Bernoulli factory problem: when computing the expectation of f against the uniform distribution on G,

For x ∈ G we can compute f(x) and G is given by a membership oracle, i.e. we are able to check whether any x is in G or not.
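In this membership-oracle setting, a random-walk Metropolis-Hastings sampler targeting the uniform distribution on G needs nothing more than the oracle: with a symmetric proposal and a uniform target, the acceptance probability reduces to accepting the move if and only if the proposed point lies in G. The following is a minimal sketch under illustrative assumptions, not the setting of Novak and Rudolf: G is taken to be the unit disc, f(x) = |x|² (whose exact average over the disc is 1/2), and the uniform-box proposal, step size and function names are all hypothetical choices.

```r
# Hypothetical example: G is the unit disc in R^2, known only through an oracle
in_G = function(x) sum(x^2) <= 1
f = function(x) sum(x^2)    # exact expectation under the uniform on the disc: 1/2

# Random-walk Metropolis-Hastings for the uniform distribution on G
rw_uniform_mcmc = function(f, in_G, x0, n, step = 0.5) {
  x = x0
  total = 0
  for (i in 1:n) {
    y = x + runif(length(x), -step, step)  # symmetric uniform-box proposal
    if (in_G(y)) x = y                     # uniform target: accept iff y is in G
    total = total + f(x)                   # ergodic average keeps repeated states
  }
  total / n
}

set.seed(1)
est = rw_uniform_mcmc(f, in_G, x0 = c(0, 0), n = 50000)
est
```

The mean square error of such an ergodic average is exactly what the bounds recalled in the survey control, through quantities like the spectral gap of the chain.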

However, the remainder of the paper does not go in that direction and instead recalls convergence results for MCMC schemes under various norms, like spectral gaps and Cheeger’s inequality. Useful as a quick reminder, e.g. for the Master students in my Monte Carlo Statistical Methods class, but altogether well-known. The paper contains some precise bounds on the mean square error of the Monte Carlo approximation to the integral. For instance, for the hit-and-run algorithm, the uniform bound (for functions f bounded by 1) is

9.5\cdot 10^{7}\dfrac{dr}{\sqrt{n}}+6.4\cdot 10^{15}\dfrac{d^2r^2}{n}

where d is the dimension of the space and r a scale of the volume of G. For the Metropolis-Hastings algorithm, with (independent) uniform proposal on G, the bound becomes


where C is an upper bound on the target density (no longer the uniform). [I rephrased Theorem 2 by replacing vol(G) with the volume of the containing hyper-ball to connect both results, α_d being the proportionality constant.] The paper also covers the case of the random walk Metropolis-Hastings algorithm, with the deceptively simple bound

1089\dfrac{(d+1)\max\{\alpha,\sqrt{d+1}\}}{\sqrt{n}}+8.38\cdot 10^5\dfrac{(d+1)\max\{\alpha^2,d+1\}}{n}

but this only holds in the special case where G is the ball of radius d. The paper concludes with a list of open problems.
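To see what the hit-and-run constants mean in terms of sample size, the bound can be evaluated directly and inverted for n. Writing s = 1/√n, the bound is A s + B s² with A = 9.5·10⁷ dr and B = 6.4·10¹⁵ d²r², which is at most ε exactly when s ≤ 2ε / (A + √(A² + 4Bε)), a form of the quadratic root that avoids numerical cancellation. This is a minimal sketch: the function names and the values d = 10, r = 1, ε = 0.01 are illustrative assumptions.

```r
# Hit-and-run error bound for functions bounded by 1, as quoted above
hit_and_run_bound = function(d, r, n) {
  9.5e7 * d * r / sqrt(n) + 6.4e15 * d^2 * r^2 / n
}

# Smallest n for which the bound is at most eps: with s = 1/sqrt(n), solve
# A*s + B*s^2 = eps through the cancellation-free root 2*eps/(A + sqrt(A^2 + 4*B*eps))
n_needed = function(d, r, eps) {
  A = 9.5e7 * d * r
  B = 6.4e15 * d^2 * r^2
  s = 2 * eps / (A + sqrt(A^2 + 4 * B * eps))
  1 / s^2
}

n = n_needed(d = 10, r = 1, eps = 0.01)
n
hit_and_run_bound(d = 10, r = 1, n = n)    # should be (numerically) equal to eps
```

For these values the required n exceeds 10^21, which illustrates how conservative such uniform constants are compared with what hit-and-run achieves in practice.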

Unusual timing shows how random mass murder can be (or even less)

Posted in Books, R, Statistics, Travel on November 29, 2013 by xi'an

This post follows up on the original one about a USA Today headline I read during my flight to Toronto last month. As a reminder, the unusual pattern was observing four U.S. mass murders happening within four days, “for the first time in at least seven years”. Which means that the difference between the first and the last of the four dates is at most 3, not 4!

I asked my friend Anirban Das Gupta from Purdue University about the exact value of this probability, and the first thing he pointed out was that I had used a different meaning of “within 4”. He then went into an elaborate calculation to find an upper bound on this probability, an upper bound that was way above both my Monte Carlo approximation and my rough calculation in the last post. I rechecked my R code and found it was not achieving the right approximation, since one date was within 3 days of at least three other dates… I thus rewrote the R code as follows

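T=10^5;hit=0 #number of simulated years (illustrative) and counter
for (t in 1:T){
  day=sort(sample(1:365,30,replace=TRUE)) #30 random days
  day=sort(c(day,day[day>362]-365)) #account for toric difference: last 3 days wrap around
  hit=hit+any(diff(day,lag=3)<=3)} #four dates within a span of at most 3 days
hit/T #estimated probability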

[I checked it was OK for two dates within 1 day, which returned the birthday problem probability] and found 0.070214, which is much larger than the earlier value and shows it takes on average 14 years for the “unlikely” event to happen (1/0.070214 ≈ 14.2)! And the chance that it happens within seven years is 40% (1 - (1 - 0.070214)^7 ≈ 0.40).
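The birthday-problem sanity check can be reproduced along the following lines. The helper `near_coincidence` is hypothetical, written in the same spirit as the code above: it estimates by simulation the probability that k of the 30 yearly dates fall within a span of at most `gap` days, the year being treated as circular. With k = 2 and gap = 0 (two dates on the same day, i.e. “within 1 day” in the convention of this post) it should agree with the exact birthday-problem probability 1 - ∏(365 - i)/365 over i = 0, …, 29.

```r
# Hypothetical helper: probability that k of m uniformly drawn days fall within
# a span of at most `gap` days, the year being treated as circular (toric)
near_coincidence = function(k, gap, m = 30, T = 10^4) {
  hits = 0
  for (t in 1:T) {
    day = sort(sample(1:365, m, replace = TRUE))
    day = sort(c(day, day[day > 365 - gap] - 365))   # wrap the last `gap` days around
    hits = hits + any(diff(day, lag = k - 1) <= gap) # k sorted dates spanning <= gap
  }
  hits / T
}

exact = 1 - prod((365 - 0:29) / 365)  # classical birthday problem for 30 dates
set.seed(2)
sim = near_coincidence(k = 2, gap = 0)
c(exact = exact, simulated = sim)
```

Calling `near_coincidence(k = 4, gap = 3)` then targets the four-murders-within-four-days event discussed in this post.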

Another coincidence relates to this evaluation, namely that two elderly couples in France committed joint suicide within three days of each other last week. However, I could not find any figures on the number of couple suicides per year. Maybe because it is extremely rare. Or undetected…