Conditional love [guest post]

Posted in Books, Kids, Statistics, University life on August 4, 2015 by xi'an

[When Dan Simpson told me he was reading Terenin’s and Draper’s latest arXival in a nice Bath pub—and not a nice bath tub!—, I asked him for a blog entry and he agreed. Here is his piece, read at your own risk! If you remember to skip the part about Céline Dion, you should enjoy it very much!!!]

Probability has traditionally been described, as per Kolmogorov and his ardent follower Katy Perry, unconditionally. This is, of course, excellent for those of us who really like measure theory, as the maths is identical. Unfortunately mathematical convenience is not necessarily enough and a large part of the applied statistical community is working with Bayesian methods. These are unavoidably conditional and, as such, it is natural to ask if there is a fundamentally conditional basis for probability.

Bruno de Finetti, and later Richard Cox and Edwin Jaynes, considered conditional bases for Bayesian probability that are, unfortunately, incomplete. The critical problem is that they mainly consider finite state spaces and construct finitely additive systems of conditional probability. For a variety of reasons, neither of these restrictions holds much truck in the modern world of statistics.

In a recently arXiv’d paper, Alexander Terenin and David Draper devise a set of axioms that make the Cox-Jaynes system of conditional probability rigorous. Furthermore, they show that the complete set of Kolmogorov axioms (including countable additivity) can be derived as theorems from their axioms by conditioning on the entire sample space.

This is a deep and fundamental paper, which unfortunately means that I most probably do not grasp its complexities (especially as, for some reason, I keep reading it in pubs!). However, I’m going to have a shot at having some thoughts on it, because I feel like it’s the sort of paper one should have thoughts on.

Laplace great⁶-grand child!

Posted in Kids, pictures, Statistics, University life on August 3, 2015 by xi'an

Looking at the Family Tree application (which I discovered via Peter Coles’ blog), I just found out that I was Laplace’s [academic] great-great-great-great-great-great-great-grand-child! Through Poisson and Chasles. Going even further, as Simeon Poisson was also advised by Lagrange, my academic lineage reaches Euler and the Bernoullis. Pushing always further, I even found William of Ockham along one of the “direct” branches! Amazing ancestry, to which my own deeds pay little homage if any… (However, I somewhat doubt the strength of the links for the older names, since pursuing them ends up at John the Baptist!)

I wonder how many other academic descendants of Laplace are alive today. Too bad Family Tree does not seem to offer this option! Given the longevity of both Laplace and Poisson, they presumably taught many students, which means a lot of my colleagues and even of my Bayesian colleagues should share the same illustrious ancestry. For instance, I share part of this ancestry with Gérard Letac. And Jean-Michel Marin. Actually, checking with the Mathematics Genealogy Project, I see that Laplace had… one student!, but still a grand total of [at least] 85,738 descendants… Incidentally, looking at the direct line, most of those had very few [recorded] descendants.

weird bug

Posted in Kids, pictures on August 2, 2015 by xi'an


new tomatoes [update]

Posted in Kids, pictures on August 1, 2015 by xi'an


Judith Rousseau gets Bernoulli Society Ethel Newbold Prize

Posted in Books, Kids, Statistics, University life on July 31, 2015 by xi'an

As announced at the 60th ISI World Meeting in Rio de Janeiro, my friend, co-author, and former PhD student Judith Rousseau got the first Ethel Newbold Prize! Congrats, Judith! And well-deserved! The prize is awarded by the Bernoulli Society on the following basis

The Ethel Newbold Prize is to be awarded biannually to an outstanding statistical scientist for a body of work that represents excellence in research in mathematical statistics, and/or excellence in research that links developments in a substantive field to new advances in statistics. In any year in which the award is due, the prize will not be awarded unless the set of all nominations includes candidates from both genders.

and is funded by Wiley. I very much support this (inclusive) approach of “recognizing the importance of women in statistics”, without creating a prize restricted to women nominees (and hence exclusive). Thanks to the members of the Program Committee of the Bernoulli Society for setting up that prize and to Nancy Reid in particular.

Ethel Newbold was a British statistician who worked during WWI in the Ministry of Munitions and then became a member of the newly created Medical Research Council, working on medical and industrial studies. She was the first woman to receive the Guy Medal in Silver in 1928. Just to stress that much remains to be done towards gender balance, the second and last woman to get a Guy Medal in Silver is Sylvia Richardson, in 2009… (In addition, Valerie Isham, Nicky Best, and Fiona Steele got a Guy Medal in Bronze, out of the 71 so far awarded, while no woman ever got a Guy Medal in Gold.) Funny occurrences of coincidence: Ethel May Newbold was educated at Tunbridge Wells, the place where Bayes was a minister, while Sylvia is now head of the Medical Research Council biostatistics unit in Cambridge.

gradient importance sampling

Posted in Books, pictures, Statistics, University life on July 30, 2015 by xi'an

[Photo: from my office, La Défense & Bois de Boulogne, Paris, May 15, 2012]

Ingmar Schuster, who visited Paris-Dauphine last Spring (and is soon to return here as a postdoc funded by Fondation des Sciences Mathématiques de Paris), arXived a paper on gradient importance sampling last week. In this paper, he builds a sequential importance sampling (or population Monte Carlo) algorithm that exploits the additional information contained in the gradient of the target. The proposal or importance function is essentially the MALA move, mixed across the elements of the previous population. When compared with our original PMC mixture of random walk proposals found in e.g. this paper, each term in the mixture thus involves an extra gradient, with a scale factor that decreases to zero as 1/(t√t). Ingmar compares his proposal with adaptive Metropolis, adaptive MALTa, and HM algorithms, on two mixture distributions and on the banana target of Haario et al. (1999) that we also used in our paper, as well as on a logistic regression. In each case, he finds both a smaller squared error and a smaller bias for the same computing time (evaluated as the number of likelihood evaluations).

While we discussed this scheme when he visited, I remain intrigued as to why it works so well when compared with the other solutions. One possible explanation is that the use of the gradient drift is more efficient on a population of particles than on a single Markov chain, provided the population covers all modes of importance on the target surface: the “fatal” attraction of the local mode is then much less of an issue…
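To make the idea concrete, here is a toy sketch of a population Monte Carlo step with a gradient drift in each mixture component. It is my own illustration and not Ingmar's algorithm: the one-dimensional two-mode target, the fixed proposal scale `sigma`, the initial drift factor `delta0`, and the function names are all assumptions made for the example. Only the mixture-over-the-previous-population structure and the 1/(t√t) decay of the gradient scale follow the description above.

```python
import numpy as np

def log_target(x):
    # Hypothetical unnormalised target exp(-x^2/2) * cosh(2x), with modes near -2 and +2.
    # Its normalising constant is sqrt(2*pi) * exp(2), which lets us check the output.
    return -0.5 * x**2 + np.logaddexp(2.0 * x, -2.0 * x) - np.log(2.0)

def grad_log_target(x):
    # Derivative of log_target: -x + 2 tanh(2x).
    return -x + 2.0 * np.tanh(2.0 * x)

def gradient_pmc(n_particles=1000, n_iter=10, sigma=1.0, delta0=0.5, seed=0):
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 3.0, size=n_particles)  # crude initial population
    z_hats, mean_hats = [], []
    for t in range(1, n_iter + 1):
        delta_t = delta0 / t**1.5  # gradient scale decreasing as 1/(t*sqrt(t))
        # One mixture component per previous particle, shifted along the gradient.
        centers = particles + delta_t * grad_log_target(particles)
        picks = rng.integers(n_particles, size=n_particles)
        proposals = centers[picks] + sigma * rng.normal(size=n_particles)
        # Importance function: equal-weight Gaussian mixture over all components.
        diffs = proposals[:, None] - centers[None, :]
        q = np.mean(np.exp(-0.5 * (diffs / sigma) ** 2), axis=1)
        q /= sigma * np.sqrt(2.0 * np.pi)
        weights = np.exp(log_target(proposals)) / q
        z_hats.append(weights.mean())  # estimate of the normalising constant
        probs = weights / weights.sum()
        mean_hats.append(np.sum(probs * proposals))  # self-normalised mean estimate
        # Multinomial resampling produces the next population.
        particles = rng.choice(proposals, size=n_particles, p=probs)
    return np.array(z_hats), np.array(mean_hats)
```

Calling `gradient_pmc()` returns one normalising-constant estimate and one mean estimate per iteration. Since the weights use the full mixture in the denominator, each iteration is a valid importance sampling step given the previous population, whatever the quality of that population.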

Bayesian model averaging in astrophysics

Posted in Books, Statistics, University life on July 29, 2015 by xi'an

[A 2013 post that somewhat got lost in a pile of postponed entries and referee’s reports…]

In this review paper, now published in Statistical Analysis and Data Mining 6, 3 (2013), David Parkinson and Andrew R. Liddle go over the (Bayesian) model selection and model averaging perspectives. Their argument in favour of model averaging is that model selection via Bayes factors may simply be too inconclusive to favour one model and only one model. While this is a correct perspective, this is about it for the theoretical background provided therein. The authors then move to the computational aspects and the first difficulty is their approximation (6) to the evidence

P(D|M) = E \approx \frac{1}{n} \sum_{i=1}^n L(\theta_i)Pr(\theta_i)\, ,

where they average the likelihood × prior terms over simulations from the posterior, which does not provide a valid (either unbiased or converging) approximation. They surprisingly fail to account for the huge statistical literature on evidence and Bayes factor approximation, including Chen, Shao and Ibrahim (2000), which covers earlier developments like bridge sampling (Gelman and Meng, 1998).
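A quick numerical check of this point, as a minimal sketch on a toy model of my own choosing (not taken from the paper): one observation x ~ N(θ, 1) with prior θ ~ N(0, 1), so that the evidence is the N(0, 2) density at x and the posterior is N(x/2, 1/2). The function names are hypothetical. Averaging likelihood × prior over posterior draws then converges to the evidence times the integral of the squared posterior density, here 1/√(2π) ≈ 0.40 times the evidence, while averaging the likelihood over prior draws does converge to the evidence.

```python
import numpy as np

def normal_pdf(x, mean, var):
    # Density of N(mean, var) evaluated at x.
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def evidence_estimates(x_obs=1.0, n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    # Exact evidence in the toy conjugate model: marginally x ~ N(0, 2).
    exact = normal_pdf(x_obs, 0.0, 2.0)
    # Average of likelihood x prior over draws from the posterior N(x/2, 1/2).
    theta_post = rng.normal(x_obs / 2.0, np.sqrt(0.5), size=n)
    post_avg = np.mean(normal_pdf(x_obs, theta_post, 1.0)
                       * normal_pdf(theta_post, 0.0, 1.0))
    # Average of the likelihood over draws from the prior N(0, 1): valid, if crude.
    theta_prior = rng.normal(0.0, 1.0, size=n)
    prior_avg = np.mean(normal_pdf(x_obs, theta_prior, 1.0))
    return exact, post_avg, prior_avg
```

Comparing `post_avg / exact` with `prior_avg / exact` shows the first ratio settling near 0.40 rather than 1, however many simulations are used.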

As is often the case in astrophysics, at least since 2007, the authors’ description of nested sampling drifts away from perceiving it as a regular Monte Carlo technique, with the same convergence speed n^{1/2} as other Monte Carlo techniques and the same dependence on dimension. It is certainly not the only simulation method where the produced “samples, as well as contributing to the evidence integral, can also be used as posterior samples.” The authors then move to “population Monte Carlo [which] is an adaptive form of importance sampling designed to give a good estimate of the evidence”, a particularly restrictive description of a generic adaptive importance sampling method (Cappé et al., 2004). The approximation of the evidence (9) based on PMC also seems invalid:

E \approx \frac{1}{n} \sum_{i=1}^n \dfrac{L(\theta_i)}{q(\theta_i)}\, ,

is missing the prior in the numerator. (The switch from θ in Section 3.1 to X in Section 3.4 is confusing.) Further, the sentence “PMC gives an unbiased estimator of the evidence in a very small number of such iterations” is misleading in that PMC is unbiased at each iteration. Reversible jump is not described at all (the supposedly higher efficiency of this algorithm is far from guaranteed when facing a small number of models, which is the case here, since the moves between models are governed by a random walk and the acceptance probabilities can be quite low).
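To illustrate the missing prior, here is a minimal importance sampling sketch on an assumed toy model (again my own choice, not the paper's setting): x ~ N(θ, 1), θ ~ N(0, 1), with a fixed N(0, 4) importance function q standing in for an adapted PMC proposal. The function names are hypothetical. With the prior in the numerator the average converges to the evidence, whereas without it the average converges to the integral of the likelihood in θ, which equals 1 in this model and has nothing to do with the evidence.

```python
import numpy as np

def normal_pdf(x, mean, var):
    # Density of N(mean, var) evaluated at x.
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def is_evidence(x_obs=1.0, n=200_000, seed=1):
    rng = np.random.default_rng(seed)
    q_mean, q_var = 0.0, 4.0  # assumed importance function, wider than the posterior
    theta = rng.normal(q_mean, np.sqrt(q_var), size=n)
    lik = normal_pdf(x_obs, theta, 1.0)
    prior = normal_pdf(theta, 0.0, 1.0)
    q = normal_pdf(theta, q_mean, q_var)
    with_prior = np.mean(lik * prior / q)  # importance sampling estimate of the evidence
    without_prior = np.mean(lik / q)       # estimates the integral of L, not the evidence
    exact = normal_pdf(x_obs, 0.0, 2.0)    # marginally x ~ N(0, 2)
    return exact, with_prior, without_prior
```

In this example `with_prior` agrees with `exact` up to Monte Carlo error, while `without_prior` stays close to 1.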

The second, quite unrelated, part of the paper covers published applications in astrophysics. Unrelated because the three different methods exposed in the first part are not compared on the same dataset. Model averaging is obviously based on a computational device that explores the posteriors of the different models under comparison (or, rather, averaging); however, no recommendation is found in the paper on how to implement the averaging efficiently, or anything of the kind. In conclusion, I thus find this review somewhat anticlimactic.

