## paradoxes in scientific inference: a reply from the author

*(I received the following set of comments from Mark Chang after publishing a review of his book on the ‘Og. Here they are, verbatim, except for a few editing and spelling changes. It’s a huge post as Chang reproduces all of my comments as well.)*

Professor Christian Robert reviewed my book, “*Paradoxes in Scientific Inference*”. I found that the majority of his criticisms had no foundation and were based on his truncated way of reading. I give point-by-point responses below. For clarity, I have kept his original comments.

Robert’s Comments: This CRC Press book was sent to me for review in CHANCE: *Paradoxes in Scientific Inference* is written by Mark Chang, vice-president of AMAG Pharmaceuticals. The topic of scientific paradoxes is one of my primary interests and I have learned a lot by looking at Lindley-Jeffreys and Savage-Dickey paradoxes. However, I did not find a renewed sense of excitement when reading the book. The very first (and maybe the best!) paradox with *Paradoxes in Scientific Inference* is that it is a book from the future! Indeed, its copyright year is 2013 (!), although I got it a few months ago. (Not mentioning here the cover mimicking Escher’s “paradoxical” pictures with dices. A sculpture due to Shigeo Fukuda and apparently not quoted in the book. As I do not want to get into another dice cover polemic, I will abstain from further comments!)

Thank you, Robert, for reading and commenting on part of my book. I had the same question about the copyright year being 2013 when the book was actually published the previous year. I believe the same thing has happened to my other books too. The incorrect year causes confusion for future citations. The cover was designed by the publisher. They gave me a few options and I picked the one with dice. I was told that the publisher holds the copyright for the artwork. I am not aware of the original artist.

Robert’s Comments: Now, getting into a deeper level of criticism (!), I find the book very uneven and overall quite disappointing. (Even missing in its statistical foundations.) Esp. given my initial level of excitement about the topic!

The book is intended for a broad audience beyond statisticians, including general scientists. I have noticed the unevenness of the presentation despite my huge and sincere effort. One of the manuscript reviewers, who had experience writing a book of this type (on paradoxes), shared the same view. Paradoxes span several subject fields and several levels of difficulty.

Re: your comment “Even missing in its statistical foundations”: unfortunately, I have found that many of the comments are simple misinterpretations of what is said in the book. Though the comments appear to be very specific, I am afraid they come from a truncated way of reading and lack in-depth understanding.

Robert’s Comments: First, there is a tendency to turn everything into a paradox: obviously, when writing a book about *paradoxes*, everything looks like a paradox! This means bringing into the picture every paradox known to man and then some, i.e., things that are either un-paradoxical (e.g., Gödel’s incompleteness result) or uninteresting in a scientific book (e.g., the birthday paradox, which may be surprising but is far from a paradox!). Fermat’s theorem is also quoted as a paradox, even though there is nothing in the text indicating in which sense it is a paradox. (Or is it because it is simple to express, hard to prove?!) Similarly, Brownian motion is considered a paradox, as “reconcil[ing] the paradox between two of the greatest theories of physics (…): thermodynamics and the kinetic theory of gases” (p.51). For instance, the author considers the MLE being biased to be a paradox (p.117), while omitting the much more substantial “paradox” of the non-existence of unbiased estimators of most parameters (which simply means unbiasedness is irrelevant). Or the other even more puzzling “paradox” that the secondary MLE derived from the likelihood associated with the distribution of a primary MLE may differ from the primary. (My favourite!)
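For reference, the birthday computation itself is short. A minimal sketch, assuming 365 equally likely birthdays, no leap days, and independent people:

```python
def prob_shared_birthday(k: int, days: int = 365) -> float:
    """Probability that at least two of k people share a birthday.

    Assumes every one of `days` birthdays is equally likely and that
    people are independent (leap days ignored)."""
    p_all_distinct = 1.0
    for i in range(k):
        # the (i+1)-th person must avoid the i birthdays already taken
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


for k in (22, 23, 50):
    print(k, round(prob_shared_birthday(k), 4))
```

With 23 people the probability already exceeds one half. The result is surprising, but nothing in the computation is contradictory.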

As defined in the first paragraph of the preface, a paradox is … or something counter-intuitive. Some paradoxes appeal to certain readers but not to others. The paradoxes were carefully selected to balance the interests of different readers, and a large number of paradoxes were screened out. For instance, the majority of paradoxes in math, biology, physics, and chemistry were not included. More than 50% of the paradoxes in the book “Paradoxes from A to Z” were not included (about 85 in total). For statistics, the majority of paradoxes in “Paradoxes in Probability Theory and Mathematical Statistics”, which involve statistical details/formulations, were not included. For the same reason, I only took a few examples from “Counterexamples in Probability”. On the other hand, I included Fermat’s theorem because it was a conjecture so counter-intuitive that people tried in vain for centuries to prove or disprove it. Nevertheless, it is one of the least paradoxical entries in the book (especially after it became a theorem).

Robert misread why Brownian motion is a paradox (again a superficial interpretation due to cursory reading!!). I didn’t say it is called a paradox because of “reconciling the paradox between…”. In fact, the next paragraph on pp.50-51 clearly states: “Brownian motion has some fantastic paradoxical properties, and here are three:” (1) it is everywhere continuous but nowhere differentiable (I cannot imagine how to draw such a curve by hand), (2) it is a one-dimensional and also a two-dimensional motion, and (3) it has the fractal property of self-similarity.

Everyone is familiar with the bias of some MLEs. But it is paradoxical (counter-intuitive) to me, even if there were only one biased MLE in the world, because: how could something real in the world most likely (maximum likelihood) occur in a biased way? How can a thing most likely occur other than in its true way? How can a truth be biased away from the truth, even if there is only one such thing? The question is not about how many MLEs are biased!
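The textbook example of a biased MLE is the variance of a normal sample with unknown mean. The MLE divides the sum of squared deviations by n, and its expectation is (n − 1)σ²/n rather than σ². A small simulation sketch makes the bias visible; the sample size, σ, seed and number of replications are arbitrary choices for illustration:

```python
import random
import statistics


def mle_variance(xs):
    """Normal-model MLE of the variance: squared deviations divided by n, not n - 1."""
    m = statistics.fmean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def average_mle_variance(n, sigma=1.0, reps=50_000, seed=1):
    """Monte Carlo estimate of E[MLE] for samples of size n from N(0, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += mle_variance(xs)
    return total / reps


n = 5
print(average_mle_variance(n), (n - 1) / n)   # simulated mean vs. theory, with sigma^2 = 1
```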

Robert’s Comments: “When the null hypothesis is rejected, the p-value is the probability of the type I error.” *Paradoxes in Scientific Inference* (p.105)

“The p-value is the conditional probability given H.” *Paradoxes in Scientific Inference* (p.106)

Second, the depth of the statistical analysis in the book is often found missing. For instance, Simpson’s paradox is not analysed from a statistical perspective, only reported as a fact. Sticking to statistics, take for instance the discussion of Lindley’s paradox. The author seems to think that the problem is with the different conclusions produced by the frequentist, likelihood, and Bayesian analyses (p.122). This is completely wrong: Lindley’s (or Lindley-Jeffreys’s) paradox is about the lack of significance of Bayes factors based on improper priors. Similarly, when the likelihood ratio test is introduced, the reference threshold is given as equal to 1 and no mention is later made of compensating for different degrees of freedom/against over-fitting. The discussion about p-values is equally garbled, witness the above quote which (a) conditions upon the rejection and (b) ignores the dependence of the p-value on a realized random variable.
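As a sketch of what a statistical look at Simpson’s paradox involves: the reversal comes from pooling strata whose sizes differ sharply between the groups being compared, so each pooled rate is a differently weighted average of the stratum rates. The counts below are hypothetical, chosen only to produce the reversal:

```python
# Hypothetical counts: (successes, patients) for each treatment within each stratum.
counts = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}


def rate(successes, patients):
    return successes / patients


def pooled_rate(strata):
    """Pooling ignores the stratum, i.e. weights each stratum rate by its size."""
    successes = sum(s for s, _ in strata.values())
    patients = sum(p for _, p in strata.values())
    return successes / patients


for stratum in ("mild", "severe"):
    print(stratum, {trt: round(rate(*counts[trt][stratum]), 3) for trt in counts})
print("pooled", {trt: round(pooled_rate(counts[trt]), 3) for trt in counts})
```

Treatment A does better in both strata yet worse overall, because 263 of its 350 patients are severe cases, against 80 of 350 for B.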

It is true that the mathematical/statistical details were reduced to a minimum, as I stated in the preface: “Although I have tried to keep mathematical formulations minimal, I have not totally eliminated them, so as to avoid mathematical anxiety that might result from either approach”. Apparently I could not avoid causing such anxiety!!

People talk about two different definitions of Lindley’s paradox: the first (the one I use) concerns the different conclusions reached by frequentists and Bayesians, and the second concerns the lack of significance of Bayes factors based on improper priors (Robert’s definition). In my opinion, the first definition is more popular than the second. Let me list a few sources for the first definition: Glenn Shafer (professor from Stanford University), Journal of the American Statistical Association Vol. 77, No. 378 (Jun., 1982), pp. 325-334; Green and Elgersma (2010), cited on p.124 of the book; and Wikipedia. I also searched Google for the definition: over 30 of the top links used the first definition, the one I have used, and none of them used the second (Robert’s definition).

Regarding the p-value, I did mention (on p.105 of the book) its dependence on the realization of a random variable: “… p-value, which is defined as the (least upper bound of the) probability of getting data the same as or more extreme than the observed data when the null hypothesis Ho is true.” Again, Robert read this in a truncated way. The actual text reads: “…we can measure the strength of the evidence against Ho using the p-value, which is defined as the (least upper bound of the) probability of getting data the same as or more extreme than the observed data when the null hypothesis Ho is true. The p-value will be compared with a nominal threshold α (e.g., 0.05) to determine if the null hypothesis should be rejected. When the null hypothesis is rejected, the p-value is the probability of the type I error (or type I error rate).” My intention is to emphasize that (1) the “true” false positive (type-I) error can only happen when the null hypothesis is true and rejected; (2) the p-value can be associated with the “true” type-I error rate if we reject the null hypothesis based on the p-value calculated from the observed data x (the probability of getting data the same as or more extreme than this x when the null hypothesis is true and the experiment is repeated); that is, the false rejection rate (type-I error rate) is equal to the p-value. Such an interpretation is important in practice because, when a clinical trial is done and the null hypothesis is rejected, we are often asked by physicians and others: what does the p-value (e.g., 0.001) mean in terms of type-I error rate? What is the difference in terms of type-I error between, e.g., a p-value of 0.001 and one of 0.01? On the other hand, if the p-value is larger (e.g., 0.4) and the null hypothesis is not rejected, the type-I error is not a concern (it does not actually occur). I have tried hard in the book to make it easy for non-statistical readers and accurate enough for statistically sophisticated readers too.
Apparently, what I have done is not enough; in particular, some phrases can be misleading when taken out of context.
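The quoted definition can be made concrete in an assumed setting: a one-sided z-test on normal data with known σ, with H0: μ = 0. The sketch computes the p-value of an observed statistic exactly, then approximates the same quantity by repeating the experiment many times under H0 and counting how often the statistic is as extreme as, or more extreme than, the observed one. The sample size, seed and number of repetitions are arbitrary:

```python
import math
import random


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_value_exact(z_obs):
    """One-sided p-value: P(Z >= z_obs) when H0 holds, so that Z ~ N(0, 1)."""
    return 1.0 - normal_cdf(z_obs)


def p_value_by_repetition(z_obs, n=25, sigma=1.0, reps=200_000, seed=2):
    """Repeat the experiment under H0 (mu = 0) and count how often the test
    statistic is as extreme as, or more extreme than, the observed one."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = rng.gauss(0.0, se)     # sampling distribution of the mean under H0
        if xbar / se >= z_obs:
            hits += 1
    return hits / reps


print(p_value_exact(2.0), p_value_by_repetition(2.0))
```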

Robert’s Comments: “The peaks of the likelihood function indicate (on average) something other than the distribution associated with the drawn sample. As such, how can we say the likelihood is evidence supporting the distribution?” *Paradoxes in Scientific Inference* (p.119)

The chapter on statistical controversies actually focuses on the opposition between frequentist, likelihood, and Bayesian paradigms. The author seems to have studied Mayo and Spanos’ *Error and Inference* to great lengths. (As I did, as I did!) He spends around twenty pages in Chapter 3 on this opposition and on the conditionality, sufficiency, and likelihood principles that were reunited by Birnbaum and recently deconstructed by Mayo. In my opinion, Chang makes a mess of describing the issues at stake in this debate and leaves the reader more bemused at the end than at the beginning of the chapter. For instance, the conditionality principle is confused with the p-value being computed conditional on the null (hypothesis) model (p.110). Or the selected experiment being unknown (p.110). The likelihood function is considered as a sufficient statistic (p.137). The “paradox” of an absence of non-trivial sufficient statistics in all models but exponential families (the Pitman-Koopman lemma) is not mentioned. The fact that ancillary statistics bring information about the precision of a sufficient statistic is presented as a paradox (p.112). Having the same physical parameter θ is confused with having the same probability distribution indexed by θ, which is definitely not the same thing (p.115)! The likelihood principle is confused with the likelihood ratio test (p.117) and with the maximum likelihood estimation (witness the above quote). The dismissal of Mayo’s rejection of Birnbaum’s proof (a rejection I fail to understand) is not any clearer: “her statement about the sufficient statistic under a mixed distribution (a fixed distribution) is irrelevant” (p.138). This actually made me think of another interpretation of Mayo’s argument that could prove her right! More on that in another post.

The criticism about my confusing the conditionality principle with the p-value is again simply wrong. Consider, for example, that the paradox of the conditionality principle on p.110 is set in the mixed-experiment setting, as later described consistently and in detail on p.137 for the Birnbaum experiment (which Robert is familiar with). The different experiments E’ and E’’ can indeed correspond to Ho and Ha, respectively. This is a philosophical point and needs clarification: different values of a physical parameter are always associated with different experiments, at least for a frequentist (for whom a parameter is a fixed value). For a Bayesian scientist, a parameter (indexed by θ) that is studied with an experiment can have a distribution. I would call such a parameter θ a representation of the physical parameter in the knowledge space, or simply the experimenter’s knowledge of the corresponding physical parameter. Because it is knowledge of a parameter, it incorporates, explicitly or implicitly, knowledge from other things (e.g., results from experiments with different populations/subjects), and in this sense it is sourced from different physical parameters. Such knowledge pooling is covered by what I call causal space in the book: when we make any statement with probability, the statement can be viewed as a statement about an aggregative property of a group of similar things in a causal space.

The phrase “… since the likelihood itself is a sufficient statistic” on p.137 was unintentionally left there. I apologize for the error and the inconvenience caused to the reader(s). Thanks for pointing it out.

There is no confusion between having the same physical parameter θ and having the same probability distribution indexed by θ. In fact, the paradox is a discussion of the controversy over “the same distribution indexed by θ” in the likelihood principle. Let me clarify further: the meaning of “the same probability distribution indexed by θ” is not well defined, because virtually any two different distributions can be considered the same distribution indexed by θ. For instance, suppose f(θ) and g(θ) are two distributions (for testing Ho: θ=0 against Ha: θ=1); we can say f(θ) and g(θ) are the same distribution F(θ) indexed by θ, where F(θ) = θ f(θ) + (1 - θ) g(θ) + θ(1 - θ) k(θ) and k(θ) is an appropriate random function. The example given on pp.115-116 of the book involves a binomial distribution f(θ) and a negative binomial distribution g(θ). Furthermore, θ does need to be a related physical parameter when we use the likelihood principle. Otherwise, we can shift θ by an arbitrary constant c in g(θ), so that the new F(θ) = θ f(θ) + (1 - θ) g(θ+c) + θ(1 - θ) k(θ), in which the index θ in f(θ) and g(θ+c) could have completely different meanings.
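The binomial versus negative binomial contrast can be sketched numerically. The data (9 heads and 3 tails) and the one-sided test of θ = 1/2 against θ > 1/2 are the classic textbook version, assumed here; they are not necessarily the numbers used on pp.115-116 of the book. Both likelihoods are proportional to θ^9(1 − θ)^3, so the likelihood principle treats the two experiments as carrying the same evidence about θ, while the p-values differ (299/4096 ≈ 0.073 against 67/2048 ≈ 0.033). Exact fractions are used to avoid rounding questions:

```python
from fractions import Fraction
from math import comb

HEADS, TAILS = 9, 3           # assumed data
N = HEADS + TAILS


def binom_lik(theta):
    """Experiment 1: toss exactly N times, observe HEADS heads."""
    return comb(N, HEADS) * theta**HEADS * (1 - theta) ** TAILS


def negbin_lik(theta):
    """Experiment 2: toss until the TAILS-th tail, observe HEADS heads."""
    return comb(N - 1, HEADS) * theta**HEADS * (1 - theta) ** TAILS


half = Fraction(1, 2)
# One-sided p-values for H0: theta = 1/2 against theta > 1/2 ("as many heads or more")
p_binom = sum(comb(N, k) * half**N for k in range(HEADS, N + 1))
p_negbin = 1 - sum(comb(y + TAILS - 1, y) * half ** (y + TAILS) for y in range(HEADS))

theta = Fraction(1, 3)
print(binom_lik(theta) / negbin_lik(theta))        # constant ratio: likelihoods proportional
print(p_binom, float(p_binom), p_negbin, float(p_negbin))
```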

I don’t know where the reviewer got the impression that I confused the likelihood ratio test with the likelihood principle. I did not say or imply anywhere that the MLE or the likelihood ratio test is a consequence of the likelihood principle. I treat them explicitly in separate subsections: Section 3.2.3, Likelihood principle (p.113), and Section 3.2.4, Law of likelihood (p.117). The reviewer may have confused the likelihood principle with the law of likelihood, which are two totally distinct things. I used the law of likelihood (not the likelihood principle!) to introduce the likelihood ratio test. In the paradox of the likelihood principle (p.117) and the paradox of the law of likelihood (p.117), I have challenged (at least in some cases) the popular interpretation that a likelihood function represents relative plausibility, and I use biased MLEs as the discussion point to further challenge the likelihood principle (not in the way most frequentists did, from the type-I error point of view!). I did not use the MLE directly to challenge the likelihood principle, since the two are unrelated in a sense, but I use the bias of some MLEs to raise the controversy. Specifically, in the text on pp.117 and 119, the paradox of the likelihood principle and the paradox of the law of likelihood raise this controversy: the peak of a likelihood function (the corresponding MLE) does not coincide with an unbiased estimate when the estimator is biased; thus the likelihood function does not appear to represent (at least in the case of biased estimators) the relative plausibility of different values of the parameter, which in turn makes the likelihood principle (which concerns the ratio of likelihood functions, not the likelihood ratio test!) questionable.
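The difference between the law of likelihood and a likelihood ratio test can be illustrated in an assumed normal model with known σ = 1. For two simple hypotheses, the ratio L(μ1)/L(μ0) can fall on either side of 1, so 1 is a natural reference value. When the alternative is composite and μ is replaced by its MLE, the maximised ratio is never below 1, so a threshold of 1 would always favour the larger model; the usual calibration (Wilks) compares 2 log LR with a χ² quantile whose degrees of freedom equal the number of extra parameters. Sample size, seed and number of trials below are arbitrary:

```python
import math
import random
from statistics import NormalDist, fmean


def log_lik(mu, xs, sigma=1.0):
    """Normal log-likelihood with known sigma."""
    n = len(xs)
    return -0.5 * sum(((x - mu) / sigma) ** 2 for x in xs) - n * math.log(sigma * math.sqrt(2.0 * math.pi))


def simple_lr(xs, mu0, mu1):
    """Law of likelihood: support for mu1 over mu0, two fixed (simple) hypotheses."""
    return math.exp(log_lik(mu1, xs) - log_lik(mu0, xs))


def maximised_lr(xs, mu0):
    """Composite alternative: mu replaced by its MLE, the sample mean."""
    return math.exp(log_lik(fmean(xs), xs) - log_lik(mu0, xs))


# Wilks: 2 log LR ~ chi-square(1) under H0, so a 5% test rejects when LR exceeds this cutoff
chi2_1_95 = NormalDist().inv_cdf(0.975) ** 2
lr_cutoff = math.exp(chi2_1_95 / 2.0)

rng = random.Random(3)
trials = 10_000
at_least_one = above_cutoff = 0
for _ in range(trials):
    xs = [rng.gauss(0.0, 1.0) for _ in range(20)]   # data generated under H0: mu = 0
    lr = maximised_lr(xs, 0.0)
    at_least_one += lr >= 1.0 - 1e-12               # small tolerance for rounding
    above_cutoff += lr > lr_cutoff
print(at_least_one / trials, above_cutoff / trials, round(lr_cutoff, 3))
```

Under H0 the maximised ratio is at least 1 in every trial, while the Wilks cutoff (about 6.83 here) gives a rejection rate close to 5%.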

It is really disappointing to know that the reviewer thought that I was confused about such basic, fundamental and simple concepts of MLE, likelihood ratio test and likelihood principle.

Robert’s Comments: “From a single observation x from a normal distribution with unknown mean μ and standard deviation σ it is possible to create a confidence interval on μ with finite length.” *Paradoxes in Scientific Inference* (p.103)

One of the first paradoxes in the statistics chapter is the one endorsed by the above quote. I found it intriguing that this interval could be of the form x±η|x| with η only depending on the confidence coverage… Then I checked and saw that the confidence coverage was defined by default, i.e., the actual coverage is at least the nominal coverage, which is much less exciting (and much less paradoxical).
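Robert’s remark can be checked with a closed-form sketch, assuming the interval x ± η|x| with η > 1 and a single observation X ~ N(μ, σ²). The interval misses μ exactly when Z = (X − μ)/σ falls between −ηθ/(η − 1) and −ηθ/(η + 1), with θ = |μ|/σ. Coverage therefore depends on the unknown θ: it tends to 1 as θ goes to 0 or to infinity, and the stated confidence level is only its minimum. With η = 9.68 that minimum comes out at about 0.95, and with η = 4.84 at about 0.90 (values computed with this code, not taken from the book):

```python
import math


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def coverage(eta: float, theta: float) -> float:
    """P(|X - mu| <= eta * |X|) for one X ~ N(mu, sigma^2), with theta = |mu| / sigma.

    Requires eta > 1. The interval misses mu exactly when Z = (X - mu) / sigma
    lies between -eta*theta/(eta - 1) and -eta*theta/(eta + 1)."""
    theta = abs(theta)
    lower = -eta * theta / (eta - 1.0)
    upper = -eta * theta / (eta + 1.0)
    return 1.0 - (normal_cdf(upper) - normal_cdf(lower))


def min_coverage(eta: float) -> float:
    """Worst-case coverage over theta, from setting d(coverage)/d(theta) = 0."""
    a = eta / (eta + 1.0)
    b = eta / (eta - 1.0)
    theta_star = math.sqrt(2.0 * math.log(b / a) / (b * b - a * a))
    return coverage(eta, theta_star)


for eta in (4.84, 9.68):
    print(eta, [round(coverage(eta, t), 3) for t in (0.0, 0.5, 1.0, 3.0)], round(min_coverage(eta), 4))
```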

“One of the proudest accomplishments of my childhood was creating an electric bell, though later I found it was just a reinvention. Other reinventions I remember are discovering some of the interesting properties of the number 9 and the solution for a general quadratic equation.” *Paradoxes in Scientific Inference* (p.24)

The book abounds in quotes like the above, where the author does not shy away from promoting himself. For instance, on page 2, he adds his own quotes to a list of aphorisms from major figures like Montaigne, Lao-Tzu, or Picasso. Take also the gem “I will feel so rewarded if this book can help a young reader in some way to become a thinker” (p.viii) The author further claims several times to bring a unification of the frequentist and Bayesian perspectives, even though I fail to see how he did it. E.g., “whether frequentist or Bayesian, concepts of probability are based on the collection of similar phenomena or experiments” (p.63) does not bring a particularly clear answer. Similarly, the murky discussion of the Monty Hall dilemma does not characterise the distinction between frequentist and Bayesian reasoning (if anything, this is a frequentist setting). A last illustration is the ‘paradox of posterior distributions’ (p.124) where Cheng got it plain wrong about the sequential update of a prior distribution not being equal to the final posterior (see, e.g., Section 1.4 in *The Bayesian Choice*). A nice quote is recycled from my book though (a completely irrelevant anecdote is that George Casella actually hated this quote!):

“If you believe anything happens (…) for a reason, then samples may never be independent, else there would be no randomness. Just as T. Hilberman [sic] put it (Robert 1994): ‘From where we stand, the rain seems random. If we could stand somewhere else, we would see the order in it.’” *Paradoxes in Scientific Inference* (p.140)

I think this criticism is totally out of context and uncalled for. It feels a little harsh, since this criticism is at a personal level. I admit that I do some self-promoting, as many people do. Publishing a book is itself a form of self-promotion. Sometimes I announce my upcoming books at conferences. That’s self-promoting!! But I don’t see how the text on p.24 (quoted by Robert above) can be self-promoting. During the 10-year “Cultural Revolution” in China, anyone who dared to do any research should be proud of himself/herself, because it was against the social norm or authority and could be punished (there was no law at that time). How can it be self-promoting to have reinvented something as simple as an electric bell, or the solution of quadratic equations, invented or solved hundreds or thousands of years ago? If you read the sentence that follows in the same paragraph, you will know what I am really proud of: “…However, I am still proud of myself even knowing they are just reinventions because I did them all in the “dark age” of China: the 10 years of the Chinese Cultural Revolution.” I was just reliving the moment some 40 years ago.

The quote “Help… become thinker” may or may not be considered self-promoting in context. I fully understand where the criticism came from. I should have phrased it differently. In industry, productivity is often over-emphasized. As a result, many young industry colleagues don’t have enough opportunity or time to think in their work and be more creative. “A statistician should be a thinker not a sample size calculator” was my quote to them. Hence, I hope it is now clear what I meant by the term “thinker”.

The quotes I put on page 2 are ones I found interesting and most relevant. They are not necessarily aphorisms from major figures, as you would know if you saw the name Shaw (Marvin C. Shaw; I found the quote on the Web). I still don’t know who Shaw is, even after researching again a few days ago, soon after I read Robert’s comments. All I know so far is that he/she has published two books about religion on Amazon. No affiliation is provided. I put my full name (instead of just my last name) under my quotes to clearly identify myself. Readers would not be so naïve as to think I am on a par with such great personalities and leaders just because my name appears alongside theirs in my own book; I would not expect my readers to be at this low level of intelligence. Quoting oneself in one’s own book is a funny way of writing, but not necessarily self-promoting.

Regarding the criticism about the sequential update of a prior distribution not being equal to the final posterior, it is not specific enough for me to comment on (my name is misspelled here as Cheng instead of Chang). By the way, Robert cites his own book twice here, but this is by no means to be considered self-promoting. In fact, I read his book (1st Ed., 1994). It is a nice book and it made me think (not that I am a thinker yet).
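The standard result behind Robert’s remark can be checked directly: when observations are conditionally independent given θ, applying Bayes’ theorem one observation at a time (each posterior serving as the next prior) gives exactly the same posterior as a single update on all the data. The grid of θ values, the uniform prior and the data below are hypothetical; exact fractions rule out any floating-point doubt:

```python
from fractions import Fraction

# Hypothetical ingredients: a 9-point grid for theta, a uniform prior, Bernoulli data.
thetas = [Fraction(k, 10) for k in range(1, 10)]
prior = {t: Fraction(1, len(thetas)) for t in thetas}
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]


def normalise(weights):
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def bernoulli_lik(theta, y):
    return theta if y == 1 else 1 - theta


def posterior_batch(prior, data):
    """Bayes' theorem applied once, to the whole sample."""
    weights = {}
    for t, p in prior.items():
        lik = Fraction(1)
        for y in data:
            lik *= bernoulli_lik(t, y)   # observations independent given theta
        weights[t] = p * lik
    return normalise(weights)


def posterior_sequential(prior, data):
    """Bayes' theorem applied one observation at a time; each posterior is the next prior."""
    current = dict(prior)
    for y in data:
        current = normalise({t: p * bernoulli_lik(t, y) for t, p in current.items()})
    return current


print(posterior_batch(prior, data) == posterior_sequential(prior, data))
```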

Regarding the unification of different statistical paradigms, it rests mainly on the concept of causal space, briefly touched on earlier. As to whether I have indeed contributed anything here or it is an overstatement, I look forward to receiving more comments from other readers.

Robert’s Comments: Most surprisingly, the book contains exercises in every chapter, whose purpose is lost on me. What is the point in asking to students “Write an essay on the role of the Barber’s Paradox in developing modern set theory” or “How does the story of Achilles and the tortoise address the issues of the sum of infinite numbers of arbitrarily small numbers”..?! Not to mention the top one: “Can you think of any applications from what you have learned from this chapter?” Erm…frankly, no!

I quite understand the criticism. The exercises were added at the last minute, as one of the initial reviewers suggested. Some exercises are more interesting than others, depending on the reader’s interest. Real-world applications of paradoxes in the fields of accounting, computer science, transportation, electronic network design, etc. are clearly presented in the book. Please be aware that this book is intended for a broader audience than just statisticians. The level of mathematics and statistics was considerably reduced from the initial draft to incorporate the comments from the manuscript reviewers.

Finally, I truly thank Professor Robert for reading the book and for his straightforward criticisms. I feel it would have been more helpful had he read more carefully and not misinterpreted several central concepts in this book.

January 7, 2013 at 1:41 am

I’d like to make one more clarification on the likelihood principle.

Like the Lindley paradox, the strong likelihood principle (SLP) can be read in two ways: in a restricted sense and in an unrestricted sense.

(1) In the restricted sense, the parameter of interest in the two experiments concerns the same physical parameter.

(2) In the unrestricted sense, the parameter of interest concerns the statistical model parameter, which may concern the same or different physical parameters.

The restricted-sense SLP is the most intuitive and the most often discussed (see the familiar example on p.115 of the book). However, under this reading, the SLP as a “general principle” cannot even apply to parameter inference in some simple cases, such as the mixed experiment mentioned on p.136 of the book, because the mixed experiment concerns two different physical parameters.

If one takes the SLP in the unrestricted sense (as a general/fundamental principle, which is how I would prefer to read it), we are confronted with the paradox raised on p.115 (third paragraph) of my book.

Furthermore, whether under the restricted or the unrestricted SLP, I raised the controversy about the validity of the SLP by pointing out the existence of a biased MLE (on p.111 and in the previous discussion).

January 13, 2013 at 10:02 pm

The MLE being (almost always) biased does not make an argument against or in favour of the likelihood principle. It is a completely unrelated issue, if any. (All Bayes estimates are biased as well.)

December 27, 2012 at 7:58 pm

[…] made a brief comment on a blatant error in Mark Chang’s treatment of my Birnbaum disproof on Xi’an’s Og. Chang is responding to Christian Robert’s critical review of his book, Paradoxes in Scientific […]

December 27, 2012 at 9:08 am

I only got to look at Mark Chang’s book a few days ago. I have many concerns regarding his treatment of points from Mayo and Spanos (2010), in particular the chapters by Cox and Mayo (2010) and Mayo (2010). Notably, having set out, nearly verbatim (but without quotes), my first variation of Birnbaum’s argument (Mayo 2010, 309), Chang takes, as evidence that “Mayo’s disproof is faulty”, assertions that I make only concerning the second variation of the Birnbaum argument (310-11). Chang has written (Chang, 138) the first version in detail, but obviously doesn’t understand it. The problem with the first version is that the two premises cannot both be true at the same time (the crucial term shifts its meaning in the two premises). The second formulation, by contrast, allows both premises to be true. I label the two premises of the second variation as (1) and (2)’. The problem in the second formulation is: “The antecedent of premise (1) is the denial of the antecedent of premise (2)’.” (Mayo 2010, 311). (Note the prime on (2)’.) These are both conditional claims, hence they have antecedents. Chang gives this quote, but has missed its reference. I might mention that I don’t see the relevance of Chang’s point about sufficiency to either variation of Birnbaum’s proof (bottom para, Chang 138).

A less informal and clearer treatment may be found in a recent paper: http://www.phil.vt.edu/dmayo/conference_2010/9-18-12MayoBirnbaum.pdf.

I am inviting comments for posting (some time in January) as explained at this link: http://errorstatistics.com/2012/10/31/u-phil-blogging-the-likelihood-principle-new-summary/.

I invite Chang to contribute, perhaps with a newly clarified attempt to reject my disproof of Birnbaum.

January 8, 2013 at 12:57 am

In Mayo’s paper (Mayo 2010, p.305-314), the symbol * is used to denote the result when the likelihoods of the two experiments are proportional.

My understanding is that Mayo’s disproof versions 1 and 2 are essentially based on the same contradiction: (antecedent of) premise (1) is based on the unconditional formulation and premise (2) or antecedent of premise (2)’ is based on the conditional formulation.

In my view, premise (1) does not contradict premise (2) or (2)’ (Mayo 2010, p.309), for the following reasons:

Premise (1) says that in the case of * results, the conditional and unconditional results should be the same. This does not contradict the statement “inference should be conditional on the experiment actually performed.” In other words, premises (1) and (2) can both be based on conditional formulations, but in the case of * results, premise (1) asserts that the conditional and unconditional results should be the same.

On the other hand, did Birnbaum prove anything meaningful? My answer is “No”. With the adoption of the conditionality principle (CP) and the strong likelihood principle (SLP), there is still plenty of room for one to choose a different inferential procedure. What Birnbaum did was to report the same result (TBB) when the two likelihoods were proportional (neither SP nor CP requires one to report the result this way!); this is what the SLP wants. Therefore, What Birnbaum actually did was use the SLP to prove the SLP – as simple as that!

I will be happy to log onto Mayo’s blog for further discussion later.

January 30, 2013 at 2:55 am

“Therefore, What Birnbaum actually did was use the SLP to prove the SLP – as simple as that!”

Yes, that is called a circular proof, and my arguments were intended to show that Birnbaum’s arguments fail because they are circular. Therefore, Chang was very mistaken in dismissing my argument in his book as he does! If he wants to write to me, we can converse further. I do not have his e-mail. We happen to be discussing the SLP on my blog at present.

February 7, 2013 at 2:29 am

Therefore, Birnbaum didn’t prove that CP+SP will necessarily lead to SLP. And Mayo didn’t provide a convincing disproof of Birnbaum’s “proof” in her first and second variations, even though her conclusion was correct.

To disprove Birnbaum’s “proof”, the following (very short) argument will be sufficient:

“Without violating the CP and SP, we can report the results (make inference) like the way Birnbaum did (follow the SLP) or report the results as a frequentist would do. Therefore, SLP is not a necessary result of CP+SP.”

Dr Mayo, I have provided an email through your blog. Talk to you soon via email.

December 26, 2012 at 10:16 pm

It’s tempting to reply to a reply, but I think in general it is best to trust the readers. The only thing I’d like to hear more about from you is the Lindley paradox.

I only took one class in Bayesian inference, and as far as I remember, the issue was not discussed. So it’s hard for a reader like me, who took a few graduate-level classes in statistics but not many, to judge parts like that.

Maybe I can find something about it in your book? I’d have bought it earlier, but I still think the e-book is quite expensive (about US$45.00 on Amazon for an e-book, while the marginal cost is almost zero!).

December 26, 2012 at 10:58 pm

Thanks, Manoel! The Lindley paradox has indeed two facets, one stating that the p-value and the posterior probabilities can differ to the extreme (0 versus 1, essentially). The other one, that I find more interesting, is that the Bayes factor may have no definite limit when the prior variance under the alternative goes to infinity …
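Both facets can be seen in a minimal sketch. The setting is an assumption chosen for illustration: normal data with known σ = 1, H0: μ = 0 against H1: μ ~ N(0, τ²), and prior probability 1/2 on H0. Holding the test statistic at z = 1.96 keeps the two-sided p-value near 0.05 for every sample size, yet the posterior probability of H0 climbs towards 1 as n grows; and for a fixed n, the Bayes factor B01 grows without bound as the prior variance τ² increases:

```python
import math
from statistics import NormalDist


def bayes_factor_01(z: float, n: int, tau: float, sigma: float = 1.0) -> float:
    """B01 for H0: mu = 0 against H1: mu ~ N(0, tau^2), with known sigma.

    z = sqrt(n) * xbar / sigma is the usual test statistic; both hypotheses
    are compared through the marginal density of the sample mean."""
    se = sigma / math.sqrt(n)
    xbar = z * se
    m0 = NormalDist(0.0, se).pdf(xbar)                          # xbar | H0 ~ N(0, se^2)
    m1 = NormalDist(0.0, math.sqrt(tau**2 + se**2)).pdf(xbar)   # xbar | H1 ~ N(0, tau^2 + se^2)
    return m0 / m1


def posterior_prob_h0(b01: float, prior_h0: float = 0.5) -> float:
    odds = prior_h0 / (1.0 - prior_h0) * b01   # posterior odds = prior odds x B01
    return odds / (1.0 + odds)


z = 1.96                                    # two-sided p-value about 0.05 for every n
p_value = 2.0 * (1.0 - NormalDist().cdf(z))

# Facet 1: same p-value, growing n, P(H0 | data) heads towards 1
for n in (10, 1_000, 100_000):
    print(n, round(p_value, 3), round(posterior_prob_h0(bayes_factor_01(z, n, tau=1.0)), 3))

# Facet 2: fixed n, B01 grows without bound as the prior variance tau^2 grows
for tau in (1.0, 10.0, 1_000.0):
    print(tau, round(bayes_factor_01(z, n=100, tau=tau), 2))
```

In this setting the Bayes factor has the closed form B01 = sqrt(1 + nτ²) exp(−(z²/2) nτ²/(1 + nτ²)): the exponential factor stays bounded while the square-root factor grows with both n and τ.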

December 26, 2012 at 9:18 am

I considered writing a reply to the reply, then thought better of it. I may remove the part about self-promotion in the CHANCE review, since the author feels so strongly about it, but I stand by the remainder of the review. The discussion about the mixture F(θ) in the above is a further argument (in my opinion) of the confusion entertained by the author about the likelihood principle. Readers can judge by themselves.