Here is the fifth and last set of slides for my third-year statistics course, trying to introduce Bayesian statistics in the most natural way and hence starting with… Rasmus’ socks and ABC!!! This is an interesting experiment as I have no idea how my students will react. Either they will see the point beyond the anecdote or they will miss it (being quite unhappy so far about the lack of mathematical rigour in my course and exercises…). We only have two weeks left, so I am afraid the concept will not have time to sink in!
Archive for Université Paris Dauphine
reflections on the probability space induced by moment conditions with implications for Bayesian Inference [slides]
Posted in Books, Statistics, University life with tags ABC, Arnold Zellner, Christian Gouriéroux, conference, empirical likelihood, fiducial distribution, measure theory, method of moments, Paris, R.A. Fisher, slides, structural model, Université Paris Dauphine on December 4, 2014 by xi'an
Here are the slides of my upcoming discussion of Ron Gallant’s paper, tomorrow.
Today was the second session of our Reading Classics Seminar for the academic year 2014-2015. I have not reported on this seminar so far because it had a difficult start, with hardly any students present at the first classes and therefore several restarts until we reached a small group of interested students. Actually, this is the final year for my TSI Master at Paris-Dauphine, as it will become integrated within the new MASH Master next year. The latter started this year and drew away half of our potential applicants, presumably because of its wider spectrum, ranging over machine-learning, optimisation, programming and a tiny bit of statistics… If we manage to salvage [within the new Master] our speciality of offering the only Bayesian Statistics training in France, this will not be a complete disaster!
Anyway, the first seminar was about the great 1939 Biometrika paper by Pitman about the best invariant estimator appearing magically as a Bayes estimator! Alas, the student did not grasp the invariance part and hence focussed on less relevant technical parts, which was not a great experience (and therefore led me to abstain from posting the slides here). The second paper was not on my list but was proposed by another student only yesterday, when he realised he was to present today! This paper, entitled “The Counter-intuitive Non-informative Prior for the Bernoulli Family”, was published in the Journal of Statistics Education in 2004 by Zhu and Lu. I had not heard of the paper (or of the journal) previously and I do not think it is worth advertising any further, as it gives a very poor entry to non-informative priors in the simplest of settings, namely for Bernoulli B(p) observations. Indeed, the stance of the paper is to define a non-informative prior as one returning the MLE of p as its posterior expectation (missing altogether the facts that such a definition is not parameterisation-invariant and that, given the modal nature of the MLE, a posterior mode would be much more appropriate, leading to the uniform prior on p as a solution) and to conclude that the corresponding prior is made of two Dirac masses at 0 and 1! Which again misses several key points, like properly defining convergence in a space of probability distributions and handling an improper prior differently from a proper prior. Especially since, in the next section, the authors switch to Haldane’s prior, the Be(0,0) distribution! A prior that cannot be used since the posterior is not defined when all the observations are identical. Certainly not a paper to make it to the list! (My student simply pasted pages from this paper as his slides and so I again see no point in reposting them here.)
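To make the mean-versus-mode remark concrete, here is a minimal sketch of the conjugate Beta-Bernoulli update, using exact fractions. The function names and the toy data (3 successes out of 10) are my own assumptions for illustration and do not come from the paper. It checks that Haldane’s Be(0,0) returns the MLE as posterior mean, that the uniform Be(1,1) returns the MLE as posterior mode, and that the Haldane posterior is improper when all observations are identical.

```python
from fractions import Fraction


def posterior_params(a, b, successes, n):
    """Conjugate update: a Be(a, b) prior and s successes out of n Bernoulli
    trials give a Be(a + s, b + n - s) posterior."""
    return a + successes, b + (n - successes)


def beta_mean(a, b):
    """Mean of Be(a, b), defined only when the distribution is proper."""
    if a <= 0 or b <= 0:
        raise ValueError("Be(a, b) is improper unless a > 0 and b > 0")
    return Fraction(a, a + b)


def beta_mode(a, b):
    """Mode of Be(a, b) on [0, 1], unique when a, b >= 1 and a + b > 2."""
    if a < 1 or b < 1 or a + b == 2:
        raise ValueError("no unique mode on [0, 1] for these parameters")
    return Fraction(a - 1, a + b - 2)


# Hypothetical toy data: 3 successes in 10 trials.
successes, n = 3, 10
mle = Fraction(successes, n)

# Haldane's improper Be(0, 0): the posterior mean coincides with the MLE.
haldane_mean = beta_mean(*posterior_params(0, 0, successes, n))

# Uniform Be(1, 1): the posterior mode coincides with the MLE,
# while the posterior mean is shrunk towards 1/2.
uniform_mode = beta_mode(*posterior_params(1, 1, successes, n))
uniform_mean = beta_mean(*posterior_params(1, 1, successes, n))

print(mle, haldane_mean, uniform_mode, uniform_mean)
```

Calling `beta_mean(*posterior_params(0, 0, 0, 10))`, that is, Haldane’s prior with ten identical observations, raises a `ValueError` since the resulting Be(0,10) is not a probability distribution.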
Here is the fourth set of slides for my third-year statistics course, trying to build intuition, through graphs, about the likelihood surface and why on Earth one would want to find its maximum?! I am still uncertain whether or not I will reach the point where I can teach more asymptotics, so maybe I will also include asymptotic normality of the MLE under regularity conditions in this chapter…
Over the past years, I have seen a construction grow and grow under my office windows in Paris-Dauphine, ruining my view of the towers of La Défense, as seen in the above picture. This huge building, designed by architect Frank Gehry, has now opened as the Fondation Louis Vuitton museum, exhibiting artworks owned by LVMH and Bernard Arnault. Since I am very close to it and could not get an idea of what it looked like from my office, I took a Vélib yesterday and biked the kilometre between Porte Dauphine and the museum. As it had just opened, it was fairly crowded but I could still take a few pictures of this elegant sail-boat made of glass panels, without entering the art gallery itself…
Once the novelty has worn off and the crowds have thinned out, I will be back to look at the exhibits. In the meantime, I for sure will not forget its presence…!
There is an open call from the Fondation Sciences Mathématiques de Paris (FSMP) for a postdoctoral funding program, with 18 position-years available for stays at Université Paris-Dauphine (and other participating universities). The net support is quite decent (by the standards of French academic salaries) and the application form is easy to fill in. So, if you are interested in coming to Paris to work on ABC, MCMC, Bayesian model choice, etc., feel free to contact me (or another Parisian statistician) and to apply! The deadline is December 01, 2014, and the decision will be made by January 15, 2015. The starting date for the postdoc is October 01, 2015.
Information about social entities is often spread across multiple large databases, each degraded by noise, and without unique identifiers shared across databases. Entity resolution, the reconstruction of the actual entities and their attributes, is essential to using big data and is challenging not only for inference but also for computation.
In this talk, I motivate entity resolution by the current conflict in Syria. Although the conflict has been tremendously well documented, we still do not know how many people have been killed by conflict-related violence. We describe a novel approach towards estimating death counts in Syria and the challenges that are unique to this database. We first introduce computational speed-ups that avoid all-to-all record comparisons, based upon locality-sensitive hashing from the computer science literature. We then introduce a novel approach to entity resolution by discovering a bipartite graph, which links manifest records to a common set of latent entities. Our model quantifies the uncertainty in the inference and propagates this uncertainty into subsequent analyses. Finally, we speak to the successes and challenges of solving a problem that is at the forefront of national headlines and news.
This is joint work with Rob Hall (Etsy), Steve Fienberg (CMU), and Anshu Shrivastava (Cornell University).
[Note that Rebecca will visit the maths department in Paris-Dauphine for two weeks and give a short course in our data science Master on data confidentiality, privacy and statistical disclosure (syllabus).]
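For readers unfamiliar with the locality-sensitive hashing step mentioned in the abstract, here is a rough sketch of how MinHash with banding can avoid all-to-all record comparisons. This is my own toy illustration and not the speakers' implementation: the function names, the character-shingle representation, the use of MD5 as a hash family, and the made-up records are all assumptions.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Set of overlapping character k-grams of a lower-cased record."""
    text = text.lower()
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token, indexed by an integer seed."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens, num_hashes=20):
    """MinHash signature: the minimum hash value of the set under each seed."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def candidate_pairs(records, bands=10, rows=2):
    """Index pairs of records whose signatures agree on at least one band.

    Only these pairs need a full comparison; records that never share a
    bucket are never compared.
    """
    buckets = defaultdict(set)
    for idx, record in enumerate(records):
        sig = minhash_signature(shingles(record), bands * rows)
        for b in range(bands):
            key = (b, sig[b * rows:(b + 1) * rows])
            buckets[key].add(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


# Made-up records for illustration only.
records = ["maria garcia 1980", "maria garcia 1980", "xq zzw 77"]
pairs = candidate_pairs(records)
print(pairs)
```

Exact duplicates always land in the same buckets, while records with similar shingle sets do so with a probability that grows with their Jaccard similarity. The number of bands and rows tunes the trade-off between missed matches and wasted comparisons.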