As mentioned here a few days ago, I have been revising my paper on the Jeffreys-Lindley paradox for Philosophy of Science. It came as a bit of a (very pleasant) surprise that this journal was ready to consider a revised version of the paper, given that I have no formal training in philosophy and that the first version of the paper was rather hurriedly assembled from a short text written for the 95th birthday of Dennis Lindley and from my blog post on Aris Spanos’ “Who should be afraid of the Jeffreys-Lindley paradox?“, recently published in Philosophy of Science. So I found both reviewers very supportive and I am grateful for their suggestions to improve both the scope and the presentation of the paper. It has been resubmitted and re-arXived, and I am now waiting for the decision of the editorial team with the appropriate philosophical sense of detachment…
On the last/my day of the ISBA meeting in Varanasi, I attended a few talks before being kindly driven to the airport (early, too early, but with the unpredictable traffic there, it was better to err on the cautionary side!). In the dynamical model session, Simon Wilson presented a way to approximate posteriors for HMMs based on Chib’s (or Bayes’!) formula, while Jonathan Stroud presented another approach to state-space model approximation involving a move of the state parameter based on a normal approximation of its conditional given the observable, an approximation which seemed acceptable for the cloud analysis model he was processing. Nicolas Chopin then gave a quick introduction to particle MCMC, all the way to SMC². (As a stern chairman of the session, I know Nicolas felt he did not have enough time, but he did a really good job of motivating those different methods, in particular in explaining why the auxiliary variable approach makes the unbiased estimator of the likelihood a valid MCMC method.) Peter Green’s plenary talk was about an emission tomography image analysis whose statistical processing turned into a complex (Bernstein-von Mises) convergence theorem (whose preliminary version I saw in Bristol during Natalia Bochkina’s talk).
Overall, as forewarned by and expected from the program, this ISBA meeting was of the highest scientific quality. (I only wish I had had Hindu god abilities to duplicate myself and attend several parallel sessions at the same time!) Besides, much besides!, the warm attention paid to everyone by the organisers was just simply un-be-lie-vable! The cultural program was on par with the scientific program. The numerous graduate students and faculty involved in the workshop organisation had a minute knowledge of our schedules and locations, and were constantly anticipating our needs and moves. Almost to a fault, i.e. to a point that was close to embarrassing for our cultural habits. I am therefore immensely grateful [personally and as former ISBA president] to all those people who contributed to the success of this ISBA meeting, and first and foremost to Professor Satyanshu Upadhyay, who worked relentlessly towards this goal for many months! (As a conference organiser, I realise I was and am simply unable to provide this level of welcome to the participants, even for much smaller meetings… The contrast with my previous conference in Berlin could not be more extreme as, for a much higher registration fee, the return was very, very limited.) I will forever (at least until my next reincarnation!) keep the memory of this meeting as a very special one, quite besides its giving me the opportunity of my first visit to India…
A second full day at the ISBA meeting in Varanasi: I attended a non-parametric session with Sonia Petrone talking about mixtures of regressions (more precisely, piecewise linear functions) and Ramses Mena defining stationary processes via a Gibbs-like construction (which I would have liked more time to fully understand). Then Jamie Robins gave a talk related to the paradox raised by Robins and Ritov and discussed recently by Chris Sims. (Jamie asked for my opinion at the end of the talk, but I had none, considering the problem to be more of an epiphenomenon than a genuine statistical difficulty… I may comment more on this question later, and almost feel compelled to by Jamie’s interpellation, but I had not much to say at this stage! It sounds like another of those infinite-dimensional problems where the Bayesian solution can get stranded.) I then attended Murray Aitkin’s talk, where he reanalysed the Berkof et al. (2003) dataset using his integrated likelihood. The afternoon was a succession of plenary talks by Susie Bayarri, Fabrizio Ruggeri and Peter Müller. (It could have been called the afternoon of the ISBA past-presidents, as I also talked in this series!) Susie introduced a new notion of effective sample size, called TESS, not in the importance sampling sense of an independent-sample equivalent used in simulation, but in the model comparison sense of a penalisation and prior scaling factor for information criteria. This was the first time I heard about this notion and I found it definitely worth pursuing, in particular in search of a connection with the g-prior. (Nice name too!, connecting to a great book with a quote from Hardy about Tess being the victim of her beauty…) The day ended with a group excursion on boats up the Ganges to attend the sunset (Ganga Aarti, आरती) ceremony at Dasaswamedh Ghat, a ceremony that remained rather esoteric [for me] without the proper explanation.
(I received the following set of comments from Mark Chang after publishing a review of his book on the ‘Og. Here they are, verbatim, except for a few editing and spelling changes. It’s a huge post as Chang reproduces all of my comments as well.)
Professor Christian Robert reviewed my book: “Paradoxes in Scientific Inference”. I found that the majority of his criticisms had no foundation and were based on his truncated way of reading. I gave point-by-point responses below. For clarity, I kept his original comments.
Robert’s Comments: This CRC Press book was sent to me for review in CHANCE: Paradoxes in Scientific Inference is written by Mark Chang, vice-president of AMAG Pharmaceuticals. The topic of scientific paradoxes is one of my primary interests and I have learned a lot by looking at the Lindley-Jeffreys and Savage-Dickey paradoxes. However, I did not find a renewed sense of excitement when reading the book. The very first (and maybe the best!) paradox with Paradoxes in Scientific Inference is that it is a book from the future! Indeed, its copyright year is 2013 (!), although I got it a few months ago. (Not to mention the cover mimicking Escher’s “paradoxical” pictures with dice, a sculpture due to Shigeo Fukuda and apparently not credited in the book. As I do not want to get into another dice cover polemic, I will abstain from further comments!)
Thank you, Robert, for reading and commenting on part of my book. I had the same question about the copyright year being 2013 when the book was actually published in the previous year. I believe the same thing happened to my other books too. The incorrect year causes confusion for future citations. The cover was designed by the publisher. They gave me a few options and I picked the one with dice. I was told that the publisher has the copyright for the artwork. I am not aware of the original artist.
As mentioned in my review of Paradoxes in Scientific Inference, I was a bit confused by its presentation of the likelihood principle, and this led me to ponder for a week or so whether or not there was an issue with Birnbaum’s proof (or, much more likely, with my vision of it!). After reading Birnbaum’s proof again, while sitting in a quiet room at ICERM for a little while, I do not see any reason to doubt it. (Keep reading at your own risk!)
My confusion was caused by mixing sufficiency in the sense of Birnbaum’s mixed experiment with sufficiency in the sense of our ABC model choice PNAS paper, namely that sufficient statistics are not always sufficient to select the right model. The sufficient statistic in the proof reduces the (2,x2) observation from Model 2 to (1,x1) from Model 1 when there is an observation x1 that produces a likelihood proportional to the likelihood of x2, and the statistic is indeed sufficient: the distribution of (2,x2) given (1,x1) does not depend on the parameter θ. Of course, the statistic is not sufficient (most of the time) for deciding between Model 1 and Model 2, but this model choice issue is foreign to Birnbaum’s construction.
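As a toy illustration of the proportional-likelihood situation underlying the argument (my own sketch, not from Birnbaum or the PNAS paper), take the classic binomial/negative-binomial pair: observing 3 successes in 12 binomial trials, versus needing 12 trials to reach 3 successes, gives likelihoods proportional in θ, which is exactly the condition under which the mixed-experiment statistic collapsing (2,x2) onto (1,x1) is sufficient:

```python
from math import comb

def lik_binom(theta):
    # E1: Binomial(12, θ), observed x1 = 3 successes
    return comb(12, 3) * theta**3 * (1 - theta)**9

def lik_negbin(theta):
    # E2: Negative Binomial, observed x2 = 12 trials to reach 3 successes
    return comb(11, 2) * theta**3 * (1 - theta)**9

# the ratio is free of θ, i.e. the likelihoods are proportional
ratios = [lik_binom(t) / lik_negbin(t) for t in (0.1, 0.3, 0.5, 0.7)]
```

Since the ratio is constant in θ (here comb(12,3)/comb(11,2) = 4), conditioning on the collapsed statistic loses nothing about θ, even though the same statistic would be useless for choosing between the two experiments.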
This CRC Press book was sent to me for review in CHANCE: Paradoxes in Scientific Inference is written by Mark Chang, vice-president of AMAG Pharmaceuticals. The topic of scientific paradoxes is one of my primary interests and I have learned a lot by looking at Lindley-Jeffreys and Savage-Dickey paradoxes. However, I did not find a renewed sense of excitement when reading the book. The very first (and maybe the best!) paradox with Paradoxes in Scientific Inference is that it is a book from the future! Indeed, its copyright year is 2013 (!), although I got it a few months ago. (Not mentioning here the cover mimicking Escher’s “paradoxical” pictures with dices. A sculpture due to Shigeo Fukuda and apparently not quoted in the book. As I do not want to get into another dice cover polemic, I will abstain from further comments!)
Now, getting into a deeper level of criticism (!), I find the book very uneven and overall quite disappointing. (It is even lacking in its statistical foundations.) Especially given my initial level of excitement about the topic!
First, there is a tendency to turn everything into a paradox: obviously, when writing a book about paradoxes, everything looks like a paradox! This means bringing into the picture every paradox known to man and then some, i.e., things that are either un-paradoxical (e.g., Gödel’s incompleteness result) or uninteresting in a scientific book (e.g., the birthday paradox, which may be surprising but is far from a paradox!). Fermat’s theorem is also quoted as a paradox, even though there is nothing in the text indicating in which sense it is a paradox. (Or is it because it is simple to express, hard to prove?!) Similarly, Brownian motion is considered a paradox, as “reconcil[ing] the paradox between two of the greatest theories of physics (…): thermodynamics and the kinetic theory of gases” (p.51). The author also considers the MLE being biased to be a paradox (p.117), while omitting the much more substantial “paradox” of the non-existence of unbiased estimators of most parameters, which simply means unbiasedness is irrelevant. Or the even more puzzling “paradox” that the secondary MLE derived from the likelihood associated with the distribution of a primary MLE may differ from the primary. (My favourite!)
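For readers who have not met the biased-MLE “paradox”, a quick simulation (my own sketch, not taken from the book) shows the Gaussian variance MLE, which divides by n rather than n-1, underestimating σ² on average by the factor (n-1)/n:

```python
import random

random.seed(1)
n, reps = 5, 200_000
total = 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]  # true σ² = 1
    m = sum(xs) / n
    total += sum((x - m) ** 2 for x in xs) / n  # MLE divides by n, not n-1
mean_mle = total / reps  # averages near (n-1)/n = 0.8, not 1
```

Hardly a paradox: a systematic (and easily corrected) downward bias, which is why calling it one seems a stretch.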
“When the null hypothesis is rejected, the p-value is the probability of the type I error.” Paradoxes in Scientific Inference (p.105)
“The p-value is the conditional probability given H0.” Paradoxes in Scientific Inference (p.106)
Second, the depth of the statistical analysis in the book is often lacking. For instance, Simpson’s paradox is not analysed from a statistical perspective, only reported as a fact. Sticking to statistics, take for instance the discussion of Lindley’s paradox. The author seems to think that the problem is with the different conclusions produced by the frequentist, likelihood, and Bayesian analyses (p.122). This is completely wrong: Lindley’s (or Lindley-Jeffreys‘s) paradox is about the lack of significance of Bayes factors based on improper priors. Similarly, when the likelihood ratio test is introduced, the reference threshold is given as equal to 1 and no mention is later made of compensating for different degrees of freedom or of guarding against over-fitting. The discussion about p-values is equally garbled, witness the above quotes, which (a) condition upon the rejection and (b) ignore the dependence of the p-value on a realised random variable.
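To make the phenomenon concrete, here is a minimal numeric sketch (my own, under the assumption of a N(0, τ²) prior on θ with τ² = 1 under the alternative, and x̄ ~ N(θ, 1/n)): holding the z-statistic fixed at 1.96, so the p-value stays at about 0.05 for every sample size, the Bayes factor in favour of the null grows without bound as n increases:

```python
from math import sqrt, exp

z, tau2 = 1.96, 1.0  # fixed "significant" z-value; assumed prior variance τ²

def bf01(n):
    # Bayes factor of H0: θ = 0 against H1: θ ~ N(0, τ²), given x̄ ~ N(θ, 1/n)
    w = n * tau2 / (1.0 + n * tau2)
    return sqrt(1.0 + n * tau2) * exp(-0.5 * z * z * w)

bfs = [bf01(n) for n in (10, 100, 10_000)]
# BF01 increases with n: the same p ≈ 0.05 ends up strongly supporting H0
```

So the clash is not between frequentist, likelihood and Bayesian “conclusions” per se, but between a fixed significance threshold and a Bayes factor whose scaling depends on the (diffuse) prior, which is the point the book misses.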
The second day of our workshop on computational statistics at the ICMS started with a terrific talk by Xiao-Li Meng. Although this talk was related to his Inception talk in Paris last summer and to the JCGS discussion paper, he brought new geometric aspects to the phenomenon (managing a zero correlation, and hence i.i.d.-ness, in the simulation of a Gaussian random effect posterior distribution). While I was reflecting on the difficulty of extending the perspective beyond normal models, he introduced a probit example where exact null correlation cannot be found but an adaptive scheme allows one to explore the range of correlation coefficients. This somehow made me think of a possible version of this approach from a tempering perspective, where different data augmentation schemes would be merged into an “optimal” geometric mixture, rather than via interweaving.
As an aside, Xiao-Li mentioned the ideas of Bayesian sufficiency and Bayesian ancillarity in the construction of his data augmentation schemes. He then concluded that sufficiency is identical in the classical and Bayesian approaches, while ancillarity could be defined in several ways. I have already posted on that, but it seems to me that sufficiency is a weaker notion in the Bayesian perspective, in the sense that all that matters is that the posterior is the same given the observation y and given the observed statistic, rather than uniformly over all possible values of the random variable Y as in the classical sense. As for ancillarity, it is also natural to consider that an ancillary statistic does not bring information on the parameter, i.e. that the prior and the posterior distributions are the same given the observed ancillary statistic. Going further to define ancillarity as posterior independence between “true” parameters and auxiliary variables, as Xiao-Li suggested, does not seem very sound, as it leads to the paradoxes Basu liked so much!
Today, the overlap with the previous meetings in Bristol and in Banff was again limited: Arnaud Doucet rewrote his talk towards less technicality, which means I got the idea much more clearly than last week. The idea of having a sequence of pseudo-parameters with the same pseudo-prior seems to open a wide range of possible adaptive schemes. Faming Liang also gave a talk fairly similar to the one he presented in Banff. And so did David van Dyk, which led me to think anew about collapsed Gibbs samplers in connection with ABC and a project I just started here in Edinburgh.
Otherwise, the intense schedule of the day saw us through eleven talks. Daniele Impartato called for distributions (in the physics or Laurent Schwartz meaning of the term!) to decrease the variance of Monte Carlo estimations, an approach I hope to look into further, as Schwartz’s book is the first math book I ever bought!, an investment I tried to capitalise on once by writing a paper mixing James-Stein estimation and distributions for generalised integration by parts, a paper that was repeatedly rejected until I gave up! Jim Griffin showed us improvements brought to the exploration of a large number of potential covariates in linear and generalised linear models. Natesh Pillai tried to drag us through several of his papers on covariance matrix estimation, although I fear he lost me along the way! Let me perversely blame the schedule (rather than an early rise to run around Arthur’s Seat!) for falling asleep during Alex Beskos’ talk on Hamiltonian MCMC for diffusions, even though I was looking forward to this talk. (Apologies to Alex!) Then Simon Byrne gave us a quick tour of differential geometry in connection with orthogonalisation for Hamiltonian MCMC. Which brought me back very briefly to the early time when I was still considering starting a PhD in differential geometry, and then even more briefly played with the idea of mixing differential geometry and statistics à la Shun’ichi Amari… Ian Murray and Simo Sarkka completed the day with a cartoonesque talk on latent Gaussians that connected well with Xiao-Li’s, and a talk on Gaussian approximations to diffusions with unknown parameters, which kept within the main theme of the conference, namely inference on partly observed diffusions.
As written above, this was too intense a day, with hardly any free time to discuss the talks or the ongoing projects, which makes me prefer the pace adopted in Bristol or in Banff. (Having to meet a local student on leave from Dauphine for a year here did not help, of course!)