Archive for the Statistics Category

statistics: a data science for the 21st century

Posted in Statistics on May 15, 2018 by xi'an

Peter

rage against the [Nature] Machine [Intelligence]

Posted in Books, Statistics, University life on May 15, 2018 by xi'an

Yesterday evening, my friend and colleague Pierre Alquier (CREST-ENSAE) got interviewed (for a few seconds on-line!, around minute 06) by the French national radio, France Culture, about the recent call to boycott the forthcoming Nature Machine Intelligence electronic journal. A call addressed to the machine learning community and based on the fact that the major machine learning journals, like JMLR, are free of charge. Related conferences like AISTATS and NIPS likewise make their accepted papers available on-line for free. As noted in the call

“Machine learning has been at the forefront of the movement for free and open access to research. For example, in 2001 the Editorial Board of the Machine Learning Journal resigned en masse to form a new zero-cost open access journal, the Journal of Machine Learning Research (JMLR).”

a Swiss summer school on data assimilation

Posted in Books, Kids, Mountains, pictures, Statistics, Travel, University life on May 14, 2018 by xi'an

My friend Antonietta Mira sent me the announcement of a combined summer school and workshop on “Data Assimilation” that will take place from September 11th to 15th in Lugano, Switzerland. With Tamara Broderick, Philippe Moireau, and Andrew Stuart as teachers. (Registration, incl. lunches, is 120 CHF for the whole week.)

just in case your summer of British conferences is not yet fully-booked…

Posted in Statistics on May 11, 2018 by xi'an

the riddle of the stands

Posted in Books, Kids, R on May 11, 2018 by xi'an

Last week's simple riddle on The Riddler, about the minimum number of urinals needed for n men to pee when the occupation rule is to stay as far as possible from anyone there and never to stand next to another man, is quickly solved by a short R code:

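ocupee=function(M){
 # number of men fitting in M urinals under sequential positioning (assumes M>=5)
 ok=rep(0,M)
 ok[1]=ok[M]=1            # the first two men take both ends
 ok[trunc(1+M/2)]=1       # the third one takes the middle
 # a newcomer only fits in a gap with at least three free urinals, i.e. a difference above 3
 while (max(diff(which(ok!=0)))>3){
  pos=which(ok!=0)        # occupied positions
  i=which.max(diff(pos))  # first largest gap
  ok[pos[i]+trunc(diff(pos)[i]/2)]=1 # stand in the middle of that gap
  }
 return(sum(ok>0))
 }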

with maximal occupation illustrated by the graph below:

Meaning that the sequential positioning scheme is not optimal, since it requires N+2^{\lceil \log_2(N-1) \rceil} urinals for N men, rather than the 2N-1 urinals required when occupying one urinal out of two. What is most funny in this simple exercise is the connection exposed in the Riddler with an xkcd blag entry written a few years ago about the topic.
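As a quick check of these two counts, here is a minimal R sketch tabulating both for small N. The function names sequential and alternate are made up for the illustration (they are not part of the code above), and the power of two is built by doubling rather than through log2 to avoid any rounding issue.

```r
# urinals needed by n men (n >= 2) under sequential positioning,
# following the formula quoted in the text: n + 2^ceiling(log2(n-1))
sequential <- function(N) sapply(N, function(n) {
  p <- 1
  while (p < n - 1) p <- 2 * p  # smallest power of two that is >= n-1
  n + p
})

# urinals needed when occupying one urinal out of two
alternate <- function(N) 2 * N - 1

N <- 2:10
print(cbind(N, sequential = sequential(N), alternate = alternate(N)))
```

Since the power of two is at least N-1, the sequential count is never below 2N-1, and both agree exactly when N-1 is itself a power of two (N=2,3,5,9,…).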

linear Diophantine equations

Posted in Statistics on May 10, 2018 by xi'an

When re-expressed in maths terms, the current Riddler is about finding a sequence x⁰,x¹,…,x⁸ of integers such that

x⁰=7x¹+1
6x¹=7x²+1
…
6x⁶=7x⁷+1
6x⁷=7x⁸

which turns into a linear system with integer valued solutions, that is, a system of linear Diophantine equations. Which can be easily solved by brute-force R coding:

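A=matrix(0,7,7)            # system in (x1,...,x7)
for (i in 1:7) A[i,i]=6    # 6x_i...
for (i in 1:6) A[i,i+1]=-7 # ...-7x_{i+1}
for (x in 1:1e6){          # brute-force search over x8
  zol=solve(a=A,b=c(rep(1,6),7*x))
  if (max(abs(zol-round(zol)))<1e-3) print(x)} # x1,...,x7 all integers
x=39990 #x8=5*6*31*43, smallest solution
7*solve(a=A,b=c(rep(1,6),7*x))[1]+1 #x0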

which produces x⁰=823537. But it would be nicer to directly solve the linear system under the constraint. For instance, the inverse of the matrix A above is an upper triangular matrix with (upper-)diagonals

1/6, 7/6², 7²/6³,…,7⁶/6⁷

but this does not help considerably, except for showing that x⁸ must solve 7 equations involving powers of 6 and 7… This system of equations can be solved by successive substitutions but this still feels very pedestrian!
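One possible shortcut, offered as a sketch rather than as the intended solution: adding one to each unknown turns 6x¹=7x²+1,…,6x⁶=7x⁷+1 into 6(x¹+1)=7(x²+1),…,6(x⁶+1)=7(x⁷+1), so that x⁷+1 must be a multiple of 6⁶, say 6⁶t, while the last equation requires x⁷ to be a multiple of 7. The R function below (whose name riddler and argument t are introduced for this illustration only) rebuilds the whole sequence from t by exact backward substitution and returns NULL when x⁷ is not divisible by 7.

```r
# rebuild (x0,...,x8) from the integer parameter t, with x7 = 6^6 t - 1;
# x[k+1] stores x_k, and all values remain exactly representable as doubles
riddler <- function(t) {
  x7 <- 6^6 * t - 1
  if (x7 %% 7 != 0) return(NULL)  # 6 x7 = 7 x8 has no integer solution in x8
  x <- numeric(9)
  x[9] <- 6 * x7 / 7              # x8
  x[8] <- x7                      # x7
  for (k in 6:1) x[k + 1] <- (7 * x[k + 2] + 1) / 6  # 6 x_k = 7 x_{k+1} + 1
  x[1] <- 7 * x[2] + 1            # x0 = 7 x1 + 1
  x
}

print(riddler(1))
```

With t=1 this recovers the values found by brute force, x⁸=39990 and x⁰=823537=7⁷-6, and since 6⁶ leaves a remainder of 1 when divided by 7, the next admissible value is t=8.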

 

deaths at sea and a workshop

Posted in pictures, Statistics, Travel, University life on May 9, 2018 by xi'an

For several years, actually since the beginning of the Syrian revolution, I have been looking for data and for statisticians working on migrant deaths resulting from crossing the Mediterranean. With very little success, either because the researchers I met had poor and fragmented data, or because the agencies I contacted showed no (good) will in sharing these statistics. Frontex being the most blatant example. I thus read with a lot of interest the article “Uncounted: Invisible Deaths on Europe’s Borders”, which analyses the reasons for not producing statistics on the deaths at sea of desperate migrants crossing in ill-suited boats.

In connection with this pressing issue, Kerrie Mengersen, Pierre Pudlo and I are organising next November a small workshop on Young Bayesians and Big Data for social good, at CIRM, Marseille, France. It will take place on the weekend before our main conference, Bayesian statistics in the Big Data era, that is, on 23-26 November 2018. Registration is free (and on-site accommodation is cheap) but the number of attendees is limited, so apply asap! Senior participants include at this stage Tamara Broderick (MIT), Julien Cornebise (Element AI, TBC), David Corliss (Peace Work), Ruth King (Edinburgh), and Cody Ross (UCSD, TBC). The workshop aims at bringing participants to work together on methodological challenges and characteristic datasets. The outcome of the workshop will be presented at the beginning of the Bayesian statistics in the Big Data era conference, on Monday 26 November.