As we were working on the Handbook of mixture analysis with Sylvia Früwirth-Schnatter and Gilles Celeux today, near Saint-Germain des Près, I realised that there was a mistake in our 1990 mixture paper with Jean Diebolt [published in 1994], in that when we are proposing to use improper “Jeffreys” priors under the restriction that no component of the Gaussian mixture is “empty”, meaning that there are at least two observations generated from each component, the likelihood needs to be renormalised to be a density for the sample. This normalisation constant only depends on the weights of the mixture, which means that, when simulating from the full conditional distribution of the weights, there should be an extra-acceptance step to account for this correction. Of course, the term is essentially equal to one for a large enough sample but this remains a mistake nonetheless! It is funny that it remained undetected for so long in my most cited paper. Checking on Larry’s 1999 paper exploring the idea of excluding terms from the likelihood to allow for improper priors, I did not spot him using a correction either.
Archive for Royal Statistical Society
I received this news from the RSS today that all the RSS journals are turning 100% electronic. No paper version any longer! I deeply regret this move on which, as an RSS member, I would have appreciated to be consulted as I find much easier to browse through the current issue when it arrives in my mailbox, rather than being t best reminded by an email that I will most likely ignore and erase. And as I consider the production of the journals the prime goal of the Royal Statistical Society. And as I read that only 25% of the members had opted so far for the electronic format, which does not sound to me like a majority. In addition, moving to electronic-only journals does not bring the perks one would expect from electronic journals:
- no bonuses like supplementary material, code, open or edited comments
- no reduction in the subscription rate of the journals and penalty fees if one still wants a paper version, which amounts to a massive increase in the subscription price
- no disengagement from the commercial publisher, whose role become even less relevant
- no access to the issues of the years one has paid for, once one stops subscribing.
“The benefits of electronic publishing include: faster publishing speeds; increased content; instant access from a range of electronic devices; additional functionality; and of course, environmental sustainability.”
The move is sold with typical marketing noise. But I do not buy it: publishing speeds will remain the same as driven by the reviewing part, I do not see where the contents are increased, and I cannot seriously read a journal article from my phone, so this range of electronic devices remains a gadget. Not happy!
Éric Marchand from Sherbrooke, Québec [historical birthplace of MCMC, since Adrian Smith gave his first talk on his Gibbs sampler there, in June 1989], noticed my recent posts about the approximation of e by Monte Carlo methods and sent me a paper he wrote in The Mathematical Gazette of November 1995 [full MCMC era!] about original proofs on the expectation of some stopping rules being e, like the length of increasing runs. And Gnedenko’s uniform summation until exceeding one. Amazing that this simple problem generated so much investigation!!!
While in Warwick this week, I borrowed a recent issue (Oct. 08, 2015) of Nature from Tom Nichols and read it over diners in a maths house. Its featured topic was reproducibility, with a long initial (or introductory) article about “Fooling ourselves”, starting with an illustration from Andrew himself who had gotten a sign wrong in one of those election studies that are the basis of Red State, Blue State. While this article is not bringing radically new perspectives on the topic, there is nothing shocking about it and it even goes on mentioning Peter Green and his Royal Statistical Society President’s tribune about the Sally Clark case and Eric-Jan Wagenmakers with a collaboration with competing teams that sounded like “putting one’s head on a guillotine”. Which relates to a following “comment” on crowdsourcing research or data analysis.
I however got most interested by another comment by MacCoun and Perlmutter, where they advocate a systematic blinding of data to avoid conscious or unconscious biases. While I deem the idea quite interesting and connected with anonymisation techniques in data privacy, I find the presentation rather naïve in its goals (from a statistical perspective). Indeed, if we consider data produced by a scientific experiment towards the validation or invalidation of a scientific hypothesis, it usually stands on its own, with no other experiment of a similar kind to refer to. Add too much noise and only noise remains. Add too little and the original data remains visible. This means it is quite difficult to calibrate the blinding mechanisms in order for the blinded data to remain realistic enough to be analysed. Or to be different enough from the original data for different conclusions to be drawn. The authors suggest blinding being done by a software, by adding noise, bias, label switching, &tc. But I do not think this blinding can be done blindly, i.e., without a clear idea of what the possible models are, so that the perturbed datasets created out of the original data favour more one of the models under comparison. And are realistic for at least one of those models. Thus, some preliminary analysis of the original or of some pseudo-data from each of the proposed models is somewhat unavoidable to calibrate the blinding machinery towards realistic values. If designing a new model is part of the inferential goals, this may prove impossible… Again, I think having several analyses run in parallel with several perturbed datasets quite a good idea to detect the impact of some prior assumptions. But this requires statistically savvy programmers. And possibly informative prior distributions.
Today is October 20, World Statistics Day as launched by the UN. And supported by local and international societies. In connection with that day, among many events, the RSS will be hosting a reception, China will hold a seminar in… Xi’an, how appropriate!, my friend Kerrie Mengersen will give a talk at the Queensland University of Technology (QUT) on The power and promise of immersive virtual environments. (Bringing her pet crocodile to the talk, hopefully!)
And I will also give a talk in Louvain-la-Neuve, Belgium, on Le délicat dilemme des tests d’hypothèse et de leur résolution bayésienne. At ISBA, which stands for Institute of Statistics, Biostatistics and Actuarial Sciences and not for the Bayesian society!. within UCL, which stands for Université Catholique de Louvain and not for University College London! (And which is not to be confused with the Katholieke Universiteit Leuven, in Leuven, where I was last year for MCqMC. About 25 kilometers away.) In case this is not confusing enough, here are my slides (in English, while the talk will be in French):
As announced at the 60th ISI World Meeting in Rio de Janeiro, my friend, co-author, and former PhD student Judith Rousseau got the first Ethel Newbold Prize! Congrats, Judith! And well-deserved! The prize is awarded by the Bernoulli Society on the following basis
The Ethel Newbold Prize is to be awarded biannually to an outstanding statistical scientist for a body of work that represents excellence in research in mathematical statistics, and/or excellence in research that links developments in a substantive field to new advances in statistics. In any year in which the award is due, the prize will not be awarded unless the set of all nominations includes candidates from both genders.
and is funded by Wiley. I support very much this (inclusive) approach of “recognizing the importance of women in statistics”, without creating a prize restricted to women nominees (and hence exclusive). Thanks to the members of the Program Committee of the Bernoulli Society for setting that prize and to Nancy Reid in particular.
Ethel Newbold was a British statistician who worked during WWI in the Ministry of Munitions and then became a member of the newly created Medical Research Council, working on medical and industrial studies. She was the first woman to receive the Guy Medal in Silver in 1928. Just to stress that much remains to be done towards gender balance, the second and last woman to get a Guy Medal in Silver is Sylvia Richardson, in 2009… (In addition, Valerie Isham, Nicky Best, and Fiona Steele got a Guy Medal in Bronze, out of the 71 so far awarded, while no woman ever got a Guy Medal in Gold.) Funny occurrences of coincidence: Ethel May Newbold was educated at Tunbridge Wells, the place where Bayes was a minister, while Sylvia is now head of the Medical Research Council biostatistics unit in Cambridge.
As a coincidence, I received my copy of JRSS Series B with the Read Paper by Mathieu Gerber and Nicolas Chopin on sequential quasi Monte Carlo just as I was preparing an arXival of a few discussions on the paper! Among the [numerous and diverse] discussions, a few were of particular interest to me [I highlighted members of the University of Warwick and of Université Paris-Dauphine to suggest potential biases!]:
- Mike Pitt (Warwick), Murray Pollock et al. (Warwick) and Finke et al. (Warwick) all suggested combining quasi Monte Carlo with pseudomarginal Metropolis-Hastings, pMCMC (Pitt) and Rao-Bklackwellisation (Finke et al.);
- Arnaud Doucet pointed out that John Skilling had used the Hilbert (ordering) curve in a 2004 paper;
- Chris Oates, Dan Simpson and Mark Girolami (Warwick) suggested combining quasi Monte Carlo with their functional control variate idea;
- Richard Everitt wondered about the dimension barrier of d=6 and about possible slice extensions;
- Zhijian He and Art Owen pointed out simple solutions to handle a random number of uniforms (for simulating each step in sequential Monte Carlo), namely to start with quasi Monte Carlo and end up with regular Monte Carlo, in an hybrid manner;
- Hans Künsch points out the connection with systematic resampling à la Carpenter, Clifford and Fearnhead (1999) and wonders about separating the impact of quasi Monte Carlo between resampling and propagating [which vaguely links to one of my comments];
- Pierre L’Ecuyer points out a possible improvement over the Hilbert curve by a preliminary sorting;
- Frederik Lindsten and Sumeet Singh propose using ABC to extend the backward smoother to intractable cases [but still with a fixed number of uniforms to use at each step], as well as Mateu and Ryder (Paris-Dauphine) for a more general class of intractable models;
- Omiros Papaspiliopoulos wonders at the possibility of a quasi Markov chain with “low discrepancy paths”;
- Daniel Rudolf suggest linking the error rate of sequential quasi Monte Carlo with the bounds of Vapnik and Ĉervonenkis (1977).
The arXiv document also includes the discussions by Julyan Arbel and Igor Prünster (Turino) on the Bayesian nonparametric side of sqMC and by Robin Ryder (Dauphine) on the potential of sqMC for ABC.