Éric Marchand from Sherbrooke, Québec [historical birthplace of MCMC, since Adrian Smith gave his first talk on his Gibbs sampler there, in June 1989], noticed my recent posts about the approximation of e by Monte Carlo methods and sent me a paper he wrote in The Mathematical Gazette of November 1995 [full MCMC era!] about original proofs on the expectation of some stopping rules being e, like the length of increasing runs. And Gnedenko’s uniform summation until exceeding one. Amazing that this simple problem generated so much investigation!!!
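Both stopping rules mentioned above are easy to check by simulation. The following is a minimal sketch, not taken from Marchand's paper: the function names, the seed and the number of replications are arbitrary choices of mine, and the only facts used are that the expected number of uniforms needed for the running sum to exceed one is e, and that the same holds for the number of uniforms drawn until the first descent.

```python
import math
import random


def draws_until_sum_exceeds_one(rng):
    """Gnedenko's rule: count uniforms until their running sum exceeds 1."""
    total, n = 0.0, 0
    while total <= 1.0:
        total += rng.random()
        n += 1
    return n


def draws_until_first_descent(rng):
    """Count uniforms until one falls below its predecessor (end of the increasing run)."""
    n, previous = 1, rng.random()
    while True:
        current = rng.random()
        n += 1
        if current < previous:
            return n
        previous = current


def estimate_e(stopping_rule, n_rep=200_000, seed=1):
    """Average the stopping time over n_rep independent replications."""
    rng = random.Random(seed)
    return sum(stopping_rule(rng) for _ in range(n_rep)) / n_rep


e_sum = estimate_e(draws_until_sum_exceeds_one)
e_run = estimate_e(draws_until_first_descent)
print(e_sum, e_run, math.e)
```

Both stopping times have the same distribution, P(N > n) = 1/n!, so the two averages have the same precision, roughly two or three correct digits at this number of replications.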
While in Warwick this week, I borrowed a recent issue (Oct. 08, 2015) of Nature from Tom Nichols and read it over dinners in a maths house. Its featured topic was reproducibility, with a long initial (or introductory) article about “Fooling ourselves”, starting with an illustration from Andrew himself who had gotten a sign wrong in one of those election studies that are the basis of Red State, Blue State. While this article is not bringing radically new perspectives on the topic, there is nothing shocking about it and it even goes on mentioning Peter Green and his Royal Statistical Society President’s tribune about the Sally Clark case and Eric-Jan Wagenmakers with a collaboration with competing teams that sounded like “putting one’s head on a guillotine”. Which relates to a following “comment” on crowdsourcing research or data analysis.
I however got most interested by another comment by MacCoun and Perlmutter, where they advocate a systematic blinding of data to avoid conscious or unconscious biases. While I deem the idea quite interesting and connected with anonymisation techniques in data privacy, I find the presentation rather naïve in its goals (from a statistical perspective). Indeed, if we consider data produced by a scientific experiment towards the validation or invalidation of a scientific hypothesis, it usually stands on its own, with no other experiment of a similar kind to refer to. Add too much noise and only noise remains. Add too little and the original data remains visible. This means it is quite difficult to calibrate the blinding mechanisms in order for the blinded data to remain realistic enough to be analysed. Or to be different enough from the original data for different conclusions to be drawn. The authors suggest that the blinding be done by software, by adding noise, bias, label switching, etc. But I do not think this blinding can be done blindly, i.e., without a clear idea of what the possible models are, so that the perturbed datasets created out of the original data favour more one of the models under comparison. And are realistic for at least one of those models. Thus, some preliminary analysis of the original or of some pseudo-data from each of the proposed models is somewhat unavoidable to calibrate the blinding machinery towards realistic values. If designing a new model is part of the inferential goals, this may prove impossible… Again, I think having several analyses run in parallel with several perturbed datasets is quite a good idea to detect the impact of some prior assumptions. But this requires statistically savvy programmers. And possibly informative prior distributions.
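The calibration issue can be illustrated with a toy sketch. This is not the procedure of MacCoun and Perlmutter: the Gaussian data, the additive Gaussian noise, the two noise levels and the helper names `blind` and `pearson` are all assumptions made for illustration. It simply measures how much of the original sample survives in the blinded one.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def blind(data, noise_sd, rng):
    """Hypothetical blinding step: add independent Gaussian noise to each value."""
    return [x + rng.gauss(0.0, noise_sd) for x in data]


rng = random.Random(2015)
original = [rng.gauss(0.5, 1.0) for _ in range(2000)]

# Too little noise: the blinded sample is essentially the original one.
corr_small = pearson(original, blind(original, 0.05, rng))
# Too much noise: hardly anything of the original sample remains.
corr_large = pearson(original, blind(original, 20.0, rng))
print(corr_small, corr_large)
```

Picking a noise level between these two extremes requires knowing the scale of the data and of the effect under study, which is the preliminary analysis argued above to be unavoidable.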
Today is October 20, World Statistics Day as launched by the UN. And supported by local and international societies. In connection with that day, among many events, the RSS will be hosting a reception, China will hold a seminar in… Xi’an, how appropriate!, and my friend Kerrie Mengersen will give a talk at the Queensland University of Technology (QUT) on The power and promise of immersive virtual environments. (Bringing her pet crocodile to the talk, hopefully!)
And I will also give a talk in Louvain-la-Neuve, Belgium, on Le délicat dilemme des tests d’hypothèse et de leur résolution bayésienne [the delicate dilemma of hypothesis tests and of their Bayesian resolution]. At ISBA, which stands for Institute of Statistics, Biostatistics and Actuarial Sciences and not for the Bayesian society!, within UCL, which stands for Université Catholique de Louvain and not for University College London! (And which is not to be confused with the Katholieke Universiteit Leuven, in Leuven, where I was last year for MCqMC. About 25 kilometers away.) In case this is not confusing enough, here are my slides (in English, while the talk will be in French):
As announced at the 60th ISI World Meeting in Rio de Janeiro, my friend, co-author, and former PhD student Judith Rousseau got the first Ethel Newbold Prize! Congrats, Judith! And well-deserved! The prize is awarded by the Bernoulli Society on the following basis:
The Ethel Newbold Prize is to be awarded biannually to an outstanding statistical scientist for a body of work that represents excellence in research in mathematical statistics, and/or excellence in research that links developments in a substantive field to new advances in statistics. In any year in which the award is due, the prize will not be awarded unless the set of all nominations includes candidates from both genders.
and is funded by Wiley. I very much support this (inclusive) approach of “recognizing the importance of women in statistics”, without creating a prize restricted to women nominees (and hence exclusive). Thanks to the members of the Program Committee of the Bernoulli Society for setting up that prize and to Nancy Reid in particular.
Ethel Newbold was a British statistician who worked during WWI in the Ministry of Munitions and then became a member of the newly created Medical Research Council, working on medical and industrial studies. She was the first woman to receive the Guy Medal in Silver in 1928. Just to stress that much remains to be done towards gender balance, the second and last woman to get a Guy Medal in Silver is Sylvia Richardson, in 2009… (In addition, Valerie Isham, Nicky Best, and Fiona Steele got a Guy Medal in Bronze, out of the 71 so far awarded, while no woman ever got a Guy Medal in Gold.) Funny occurrences of coincidence: Ethel May Newbold was educated at Tunbridge Wells, the place where Bayes was a minister, while Sylvia is now head of the Medical Research Council biostatistics unit in Cambridge.
As a coincidence, I received my copy of JRSS Series B with the Read Paper by Mathieu Gerber and Nicolas Chopin on sequential quasi Monte Carlo just as I was preparing an arXival of a few discussions on the paper! Among the [numerous and diverse] discussions, a few were of particular interest to me [I highlighted members of the University of Warwick and of Université Paris-Dauphine to suggest potential biases!]:
- Mike Pitt (Warwick), Murray Pollock et al. (Warwick) and Finke et al. (Warwick) all suggested combining quasi Monte Carlo with pseudomarginal Metropolis-Hastings, pMCMC (Pitt) and Rao-Blackwellisation (Finke et al.);
- Arnaud Doucet pointed out that John Skilling had used the Hilbert (ordering) curve in a 2004 paper;
- Chris Oates, Dan Simpson and Mark Girolami (Warwick) suggested combining quasi Monte Carlo with their functional control variate idea;
- Richard Everitt wondered about the dimension barrier of d=6 and about possible slice extensions;
- Zhijian He and Art Owen pointed out simple solutions to handle a random number of uniforms (for simulating each step in sequential Monte Carlo), namely to start with quasi Monte Carlo and end up with regular Monte Carlo, in a hybrid manner;
- Hans Künsch points out the connection with systematic resampling à la Carpenter, Clifford and Fearnhead (1999) and wonders about separating the impact of quasi Monte Carlo between resampling and propagating [which vaguely links to one of my comments];
- Pierre L’Ecuyer points out a possible improvement over the Hilbert curve by a preliminary sorting;
- Frederik Lindsten and Sumeet Singh propose using ABC to extend the backward smoother to intractable cases [but still with a fixed number of uniforms to use at each step], as well as Mateu and Ryder (Paris-Dauphine) for a more general class of intractable models;
- Omiros Papaspiliopoulos wonders at the possibility of a quasi Markov chain with “low discrepancy paths”;
- Daniel Rudolf suggests linking the error rate of sequential quasi Monte Carlo with the bounds of Vapnik and Červonenkis (1977).
The arXiv document also includes the discussions by Julyan Arbel and Igor Prünster (Torino) on the Bayesian nonparametric side of sqMC and by Robin Ryder (Dauphine) on the potential of sqMC for ABC.
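For readers unfamiliar with the systematic resampling mentioned in Hans Künsch's discussion, here is a minimal sketch of the standard scheme. It is not the authors' code and the function name is mine. The whole set of ancestor indices is a deterministic function of the weights and of a single uniform, which places an evenly spaced grid of points over the cumulative weights.

```python
def systematic_resample(weights, u):
    """Return ancestor indices driven by a single uniform u in [0, 1).

    The i-th resampling point is (i + u) / n, i.e. a regular grid of n points
    shifted by one common uniform, inverted through the cumulative weights.
    """
    n = len(weights)
    total = sum(weights)
    indices = []
    j = 0
    cumulative = weights[0] / total
    for i in range(n):
        point = (i + u) / n
        # Move along the cumulative weights until the grid point is covered.
        while point > cumulative and j < n - 1:
            j += 1
            cumulative += weights[j] / total
        indices.append(j)
    return indices


weights = [0.1, 0.2, 0.3, 0.4]
ancestors = systematic_resample(weights, u=0.5)
print(ancestors)  # [1, 2, 3, 3]
```

Each particle is selected either the floor or the ceiling of n times its weight, which is where the variance reduction over multinomial resampling comes from. Whether this shifted grid should count as a special case of quasi-Monte Carlo in the sense of the paper is the question raised in the discussions, not something this sketch settles.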
Cristiano Varin, Manuela Cattelan and David Firth (Warwick) have written a paper on the statistical analysis of citations and index factors, a paper that is going to be Read at the Royal Statistical Society next May the 13th. And hence is completely open to contributed discussions. Now, I have written several entries on the ‘Og about the limited trust I place in citation indicators, as well as about the abuse made of those. However, I do not think I will contribute to the discussion as my reservations are about the whole bibliometrics excesses and not about the methodology used in the paper.
The paper builds several models on the citation data provided by the “Web of Science” compiled by Thomson Reuters. The focus is on 47 Statistics journals, with a citation horizon of ten years, which is much more reasonable than the two years in the regular impact factor. A first feature of interest in the descriptive analysis of the data is that all journals have a majority of citations from and to journals outside statistics or at least outside the list. Which I find quite surprising. The authors also build a cluster based on the exchange of citations, resulting in rather predictable clusters, even though JCGS and Statistics and Computing escape the computational cluster to end up in theory and methods alongside Annals of Statistics and JRSS Series B.
In addition to the unsavoury impact factor, a ranking method discussed in the paper is the eigenfactor score that starts with a Markov exploration of articles by going at random to one of the papers in the reference list and so on. (Which shares drawbacks with the impact factor, e.g., in that it does not account for the good or bad reason the paper is cited.) Most methods produce the Big Four at the top, with Series B ranked #1, and Communications in Statistics A and B at the bottom, along with Journal of Applied Statistics. Again, rather anticlimactic.
The major modelling input is based on Stephen Stigler’s model, a generalised linear model on the log-odds of cross citations. The Big Four once again receive high scores, with Series B still much ahead. (The authors later question the bias due to the Read Paper effect, but cannot easily evaluate this impact. While some Read Papers like Spiegelhalter et al. 2002 DIC do generate enormous citation traffic, to the point of getting re-read!, other journals also contain discussion papers. And are free to include an on-line contributed discussion section if they wish.) Using an extra ranking lasso step does not change things.
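As a reminder of what this model looks like, here is a sketch in my own notation (the symbols and the name “export score” are assumptions about the standard presentation, not quotes from the paper). The model is of Bradley-Terry type: among the citations exchanged between journals i and j, the log-odds of i being the cited journal is a difference of journal-specific scores.

```latex
\log \frac{\Pr(\text{journal } i \text{ is cited by journal } j)}
          {\Pr(\text{journal } j \text{ is cited by journal } i)}
  = \mu_i - \mu_j
```

Only the differences between the scores are identified, so a constraint such as the scores summing to zero is needed, and the ranking of the journals follows the ordering of the estimated scores.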
In order to check the relevance of such rankings, the authors also look at the connection with the conclusions of the (UK) 2008 Research Assessment Exercise. They conclude that the normalised eigenfactor score and Stigler model are more correlated with the RAE ranking than the other indicators. Which means either that the scores are good predictors or that the RAE panel relied too heavily on bibliometrics! The more global conclusion is that clusters of journals or researchers have very close indicators, hence that ranking should be conducted with more caution than it is currently. And, more importantly, that reverting the indices from journals to researchers has no validation and little information.
As posted a few days ago, Mathieu Gerber and Nicolas Chopin will Read a Paper to the Royal Statistical Society this afternoon, on sequential quasi-Monte Carlo sampling. Here are some comments on the paper that are preliminaries to my written discussion (to be sent before the slightly awkward deadline of Jan 2, 2015).
Quasi-Monte Carlo methods are definitely not popular within the (mainstream) statistical community, despite regular attempts by respected researchers like Art Owen and Pierre L’Écuyer to induce more use of those methods. It is thus to be hoped that the current attempt will be more successful, its being Read to the Royal Statistical Society being a major step towards a wide diffusion. I am looking forward to the collection of discussions that will result from the coming afternoon (and bemoan once again having to miss it!).
“It is also the resampling step that makes the introduction of QMC into SMC sampling non-trivial.” (p.3)
At a mathematical level, the fact that randomised low discrepancy sequences produce both unbiased estimators and error rates of order
[formula missing]
means that randomised quasi-Monte Carlo methods should always be used, instead of regular Monte Carlo methods! So why is it not always used?! The difficulty stands [I think] in expressing the Monte Carlo estimators in terms of a deterministic function of a fixed number of uniforms (and possibly of past simulated values). At least this is why I never attempted crossing the Rubicon into the quasi-Monte Carlo realm… And maybe also why the step had to appear in connection with particle filters, which can be seen as dynamic importance sampling methods and hence enjoy a local iid-ness that relates better to quasi-Monte Carlo integrators than single-chain MCMC algorithms. For instance, each resampling step in a particle filter consists in a repeated multinomial generation, hence should have been turned into quasi-Monte Carlo ages ago. (However, rather than the basic solution drafted in Table 2, lower variance solutions like systematic and residual sampling have been proposed in the particle literature and I wonder if any of these is a special form of quasi-Monte Carlo.) In the present setting, the authors move further and apply quasi-Monte Carlo to the particles themselves. However, they still assume the deterministic transform
[formula missing]
which is the q-block on which I stumbled each time I contemplated quasi-Monte Carlo… So the fundamental difficulty with the whole proposal is that the generation from the Markov proposal
[formula missing]
has to be of the above form. Is the strength of this assumption discussed anywhere in the paper? All baseline distributions there are normal. And in the case it does not easily apply, what would the gain be in only using the second step (i.e., quasi-Monte Carlo-ing the multinomial simulation from the empirical cdf)? In a sequential setting with unknown parameters θ, the transform is modified each time θ is modified and I wonder at the impact on computing cost if the inverse cdf is not available analytically. And I presume simulating the θ’s cannot benefit from quasi-Monte Carlo improvements.
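To make the requirement concrete in the easy (normal) case, here is a minimal sketch of a transition written as a deterministic function of the previous state and of one uniform. The Gaussian random walk, the function name and the scale parameter are illustrative assumptions of mine, not the models of the paper.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def gaussian_transition(x_prev, u, sigma=1.0):
    """Next state of a Gaussian random walk, driven by a single uniform u in (0, 1).

    The inverse normal cdf turns u into the innovation, so the same u always
    gives the same move: this is what lets a low discrepancy point replace a
    pseudo-random one.
    """
    return x_prev + sigma * STANDARD_NORMAL.inv_cdf(u)


x_median_move = gaussian_transition(x_prev=0.3, u=0.5)   # innovation is zero
x_upper_move = gaussian_transition(x_prev=0.0, u=0.975)  # about 1.96
print(x_median_move, x_upper_move)
```

When the transition is not normal, or more generally when no inverse cdf is available in closed form, the single call to `inv_cdf` has to be replaced by a numerical inversion (or by some other fixed-length transform of uniforms), which is the cost I am wondering about above.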
The paper obviously cannot get into every detail, but I would also welcome indications on the cost of deriving the Hilbert curve, in particular in connection with the dimension d as it has to separate all of the N particles, and on the stopping rule on m that means only H_m is used.
Another question stands with the multiplicity of low discrepancy sequences and their impact on the overall convergence. If Art Owen’s (1997) nested scrambling leads to the best rate, as implied by Theorem 7, why should we ever consider another choice?
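As a point of comparison for this question, here is a minimal sketch of the simplest randomisation I know of, a random shift modulo one of the van der Corput sequence. To be clear, this is not Owen's nested scrambling, which is more involved, and the integrand, sample size and seed are arbitrary choices of mine. The shift keeps the estimator unbiased while preserving the even spread of the points.

```python
import random


def van_der_corput(k, base=2):
    """k-th term of the van der Corput sequence: digits of k mirrored about the radix point."""
    value, denom = 0.0, 1.0
    while k > 0:
        k, digit = divmod(k, base)
        denom *= base
        value += digit / denom
    return value


def shifted_qmc_mean(f, n, rng):
    """Average f over n van der Corput points, all shifted by one common uniform (mod 1)."""
    shift = rng.random()
    return sum(f((van_der_corput(k) + shift) % 1.0) for k in range(n)) / n


def plain_mc_mean(f, n, rng):
    """Average f over n independent pseudo-random uniforms."""
    return sum(f(rng.random()) for _ in range(n)) / n


def square(x):
    return x * x


rng = random.Random(7)
n_points = 1024
qmc_error = abs(shifted_qmc_mean(square, n_points, rng) - 1 / 3)
mc_error = abs(plain_mc_mean(square, n_points, rng) - 1 / 3)
print(qmc_error, mc_error)
```

On this one-dimensional toy integral the shifted points form a regular grid, so the error is of order 1/N whatever the shift, against a standard error of order 1/√N for plain Monte Carlo. The interesting comparisons between randomisation schemes only start in higher dimensions and with smoother integrands, which is where the nested scrambling rate matters.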
In connection with Lemma 1 and the sequential quasi-Monte Carlo approximation of the evidence, I wonder at any possible Rao-Blackwellisation using all proposed moves rather than only those accepted. I mean, from a quasi-Monte Carlo viewpoint, is Rao-Blackwellisation easier and is it of any significant interest?
What are the computing costs and gains for forward and backward sampling? They are not discussed there. I also fail to understand the trick at the end of 4.2.1, using SQMC on a single vector instead of (t+1) of them. Again assuming inverse cdfs are available? Any connection with Polson et al.’s particle learning literature?
Last questions: what is the (learning) effort for lazy me to move to SQMC? Any hope of stepping outside particle filtering?