## snapshot from Toronto [guest picture]

Posted in pictures, Travel on July 10, 2016 by xi'an

## a maths mansion!

Posted in Books, Kids, pictures, Travel on October 11, 2015 by xi'an

I read in The Guardian today about James Stewart’s house being for sale. James Stewart was a prolific author of college and high-school books on calculus and pre-calculus. I have trouble understanding how one can write so many books on the same topic, but he apparently managed, to the point of having this immense house designed by architects to his taste. Which sounds a bit passé in my opinion. Judging from the covers of the books, and from the shape of the house, he had a fascination for the integral sign (which has indeed an intrinsic beauty!). Still amazing considering it was paid for by his royalties. Less amazing when checking the price of those books: they are about \$250 apiece. Multiplied by hundreds of thousands of copies sold every year, it adds up to being able to afford such a maths mansion! (I am not so sure I can take over the undergrad market by recycling the Bayesian Choice..!)

## Paris Machine Learning Meeting #10 Season 2

Posted in Books, Kids, pictures, Statistics, University life on June 17, 2015 by xi'an

Tonight, I am invited to give a speed-presenting talk at the last Paris Machine Learning meeting of Season 2, with the themes of DL, Recovering Robots, Vowpal Wabbit, Predcsis, Matlab, and Bayesian test [by yours truly!]. The meeting will take place in Jussieu, Amphi 25. Here are my slides for the meeting:

As it happened, the meeting was quite crowded with talks and plagued with technical difficulties in transmitting talks from Berlin and Toronto, so I came to talk about three hours after the beginning, which was less than optimal for the most technical presentation of the evening. I actually wonder if I even managed to convey the main idea of replacing Bayes factors with posteriors of the mixture weight! [I had plenty of time to reflect upon this on my way back home as I had to wait for several rare and crowded RER trains until one had enough room for me and my bike!]

## bikes vs cars

Posted in Kids, pictures, Running, Travel on May 9, 2015 by xi'an

Trailer for a film by Fredrik Gertten about the poor situation of cyclists in most cities. Don’t miss Rob Ford, infamous ex-mayor of Toronto, and his justification for closing bike lanes in the city, comparing cycling to swimming with sharks… and siding with the sharks.

## convergence speeds

Posted in pictures, Running, Statistics, Travel, University life on December 5, 2013 by xi'an

While waiting for Jean-Michel to leave a thesis defence committee he was part of, I read this recently arXived survey by Novak and Rudolf, Computation of expectations by Markov chain Monte Carlo methods. The first part hinted at a sort of Bernoulli factory problem: when computing the expectation of f against the uniform distribution on G,

For x ∈ G we can compute f (x) and G is given by a membership oracle, i.e. we are able to check whether any x is in G or not.

However, the remainder of the paper does not go in that direction but recalls instead convergence results for MCMC schemes under various norms, like spectral gap and Cheeger’s inequalities. So useful for a quick reminder, e.g. for the Master students of my Monte Carlo Statistical Methods class, but altogether well-known. The paper contains some precise bounds on the mean square error of the Monte Carlo approximation to the integral. For instance, for the hit-and-run algorithm, the uniform bound (for functions f bounded by 1) is

$9.5\cdot 10^{7}\dfrac{dr}{\sqrt{n}}+6.4\cdot 10^{15}\dfrac{d^2r^2}{n}$

where d is the dimension of the space and r a scale of the volume of G. For the Metropolis-Hastings algorithm, with (independent) uniform proposal on G, the bound becomes

$\dfrac{2C\alpha_dr^d}{n}+\dfrac{4C^2\alpha_d^2r^{2d}}{n^2}\,,$

where C is an upper bound on the target density (no longer the uniform). [I rephrased Theorem 2 by replacing vol(G) with the containing hyper-ball to connect both results, $\alpha_d$ being the proportionality constant.] The paper also covers the case of the random walk Metropolis-Hastings algorithm, with the deceptively simple bound

$1089\dfrac{(d+1)\max\{\alpha,\sqrt{d+1}\}}{\sqrt{n}}+8.38\cdot 10^5\dfrac{(d+1)\max\{\alpha^2,d+1\}}{n}$

but this is in the special case when G is the ball of radius d. The paper concludes with a list of open problems.
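To get a feel for the size of the constants in the hit-and-run bound quoted above, here is a minimal R sketch that evaluates it and looks for the first power of ten at which it drops below a tolerance. The function names `hit_and_run_bound` and `n_needed`, as well as the values of d, r and the tolerance, are illustrative assumptions and do not come from the paper.

```r
# Hit-and-run bound as quoted above (functions f bounded by 1):
# 9.5e7 * d * r / sqrt(n) + 6.4e15 * d^2 * r^2 / n
hit_and_run_bound <- function(n, d, r) {
  9.5e7 * d * r / sqrt(n) + 6.4e15 * d^2 * r^2 / n
}

# Smallest power of ten n for which the bound falls below eps
# (hypothetical helper; the bound decreases in n, so the loop terminates)
n_needed <- function(d, r, eps) {
  n <- 1
  while (hit_and_run_bound(n, d, r) > eps) n <- n * 10
  n
}

# Illustrative values only: dimension 10, scale 1, tolerance 0.1
n_needed(d = 10, r = 1, eps = 0.1)
```

With these illustrative values the bound is still about 0.101 at n = 10^20, so the loop only stops at n = 10^21, which gives an idea of how conservative such explicit bounds can be.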

## Unusual timing shows how random mass murder can be (or even less)

Posted in Books, R, Statistics, Travel on November 29, 2013 by xi'an

This post follows the original one on the headline of the USA Today I read during my flight to Toronto last month. I remind you that the unusual pattern was about observing four U.S. mass murders happening within four days, “for the first time in at least seven years”. Which means that the difference between the four dates is at most 3, not 4!

I asked my friend Anirban Das Gupta from Purdue University about the exact value of this probability and the first thing he pointed out was that I used a different meaning of “within 4”. He then went into an elaborate calculation to find an upper bound on this probability, an upper bound that was way above my Monte Carlo approximation and my rough calculation of last post. I rechecked my R code and found it was not achieving the right approximation since one date was within 3 days of three other days, at least… I thus rewrote the R code as follows

T=10^6
four=rep(0,T)
for (t in 1:T){
day=sort(sample(1:365,30,replace=TRUE)) #30 random days
day=c(day,day[day>362]-365) #wrap days 363 to 365 onto -2 to 0 for the toric difference
tem=outer(day,day,"-") #all pairwise differences
four[t]=max(apply((tem>-1)&(tem<4),1,sum))>3 #at least four dates in a four-day window
}
mean(four)


[checked it was ok for two dates within 1 day, resulting in the birthday problem probability] and found 0.070214, which is much larger than the earlier value and shows it takes on average 14 years for the “unlikely” event to happen! And the chance that it happens within seven years is 40%.
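The 14 years and 40% figures follow from the simulated value if one assumes that years are independent with a common probability p, which is an assumption rather than something checked here. A minimal sketch of the arithmetic, with hypothetical variable names:

```r
p <- 0.070214                  # simulated yearly probability reported above
wait_years <- 1 / p            # mean of a geometric waiting time, in years
p_seven <- 1 - (1 - p)^7       # at least one occurrence within seven years
c(wait_years, p_seven)
```

This returns roughly 14.2 years and 0.40, in line with the figures above.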

Another coincidence relates to this evaluation, namely the fact that two elderly couples in France committed couple suicide within three days, last week. I however could not find the figures for the number of couple suicides per year. Maybe because it is extremely rare. Or undetected…

## Unusual timing shows how random mass murder can be (or not)

Posted in Books, R, Statistics, Travel on November 4, 2013 by xi'an

This was one headline in the USA Today I picked from the hotel lobby on my way to Pittsburgh airport and then Toronto this morning. The unusual pattern was about observing four U.S. mass murders happening within four days, “for the first time in at least seven years”. The article did not explain why this was unusual. And reported one mass murder expert’s opinion instead of a statistician’s…

Now, there are about 30 mass murders in the U.S. each year (!), so the probability of finding at least four of those 30 events within 4 days of one another should be related to von Mises’ birthday problem. For instance, Abramson and Moser derived in 1970 that the probability that at least two people (among n) have birthdays within k days of one another (for an m days year) is

$p(n,k,m) = 1 - \dfrac{(m-nk-1)!}{m^{n-1}(m-nk-n)!}$

but I did not find an extension to the case of the four (to borrow from Conan Doyle!)… A quick approximation would be to turn the problem into a birthday problem with 364/4=91 days and count the probability that four share the same birthday

${30 \choose 4} \frac{90^{26}}{91^{29}}=0.0273$

which is surprisingly large. So I checked with some R code in the plane:

T=10^5
four=rep(0,T)
for (t in 1:T){
day=sample(1:365,30,rep=TRUE)
four[t]=(max(apply((abs(outer(day,day,"-"))<4),1,sum))>4)}
mean(four)


and found 0.0278, which means the above approximation is far from terrible! I think it may actually be “exact” in the sense that observing exactly four murders within four days of one another is given by this probability. The cases of five, six, &tc. murders are omitted but they are also highly negligible. And from this number, we can see that there is an 18% probability that the case of the four occurs within seven years. Not so unlikely, then.
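Both closed-form expressions quoted in this post can be evaluated directly in R, working on the log scale to avoid overflowing factorials and powers. This is only a sketch: the function names `approx_four` and `abramson_moser` are hypothetical, and the second function simply transcribes the formula as written above.

```r
# Approximation with n events and m blocks of four days:
# choose(n, 4) * (m - 1)^(n - 4) / m^(n - 1)
approx_four <- function(n = 30, m = 91) {
  exp(lchoose(n, 4) + (n - 4) * log(m - 1) - (n - 1) * log(m))
}
approx_four()

# Abramson and Moser formula as quoted above:
# 1 - (m - nk - 1)! / (m^(n - 1) * (m - nk - n)!), using x! = gamma(x + 1)
abramson_moser <- function(n, k, m) {
  1 - exp(lgamma(m - n * k) - lgamma(m - n * k - n + 1) - (n - 1) * log(m))
}

# Sanity check: k = 0 should reduce to the classical birthday problem
abramson_moser(23, 0, 365)
```

The first call gives about 0.0273, and the k = 0 check returns the familiar value of about 0.507 for 23 people, which suggests the transcription of the formula is consistent.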