## informative hypotheses (book review)

**T**he title of this book, *Informative Hypotheses*, somehow put me off from the start: the author, Herbert Hoijtink, seems to distinguish between informative and uninformative (deformative? disinformative?) hypotheses. Namely, something like

H0: μ_{1}=μ_{2}=μ_{3}=μ_{4}

is “very informative” and unrealistic, and the alternative Ha is completely uninformative, while the “alternative null”

H1: μ_{1}<μ_{2}=μ_{3}<μ_{4}

is informative. (Hence the < signs on the cover. One of my book-reviewing idiosyncrasies is to look for hidden meaning behind the cover design…) The idea is thus to have the researcher give some input in the construction of the null hypothesis (as if hypothesis tests usually were not about questions that mattered….).

**I**n fact, this distinction put me off so much that I only ended up reading chapters 1 (an introduction), 3 (an introduction [to the Bayesian processing of such hypotheses]) and 10 (on Bayesian foundations of testing informative hypotheses). Hence the very biased review of *Informative Hypotheses* that follows….

**G**iven an existing (but out of print?) reference like Robertson, Wright and Dykstra (1988), which I particularly enjoyed when working on isotonic regression in the mid 90’s, I do not see much added value in the present book. The important references are mostly centred on works by the author and his co-authors or students (often Unpublished or In Press), which gives me the impression the book was hurriedly gathered from those papers.

“The Bayes factor (…) is default, objective, based on an appropriate quantification of complexity.” (p.197)

**T**he first chapter of *Informative Hypotheses* is a motivation for the study of those informative hypotheses, with a focus on ANOVA models. There is not much in the chapter that explains what is so special about those ordering (null) hypotheses and why a whole book is required to cover their processing. A noteworthy specificity of the approach, nonetheless, is that point null hypotheses seem to be replaced with “about equality constraints” (p.9), |μ_{2}-μ_{3}|<d, where d is specified by the researcher as significant. This chapter also gives illustrations of ordered (or informative) hypotheses in the settings of analysis of covariance (ANCOVA) and regression models, but does not indicate (yet) how to run the tests. The concluding section is about the epistemological focus of the book, quoting Popper, Sober and Carnap, although I do not see much support in those quotes.

“Objective means that Bayes factors based on this prior distribution are essentially independent of this prior distribution.” (p.53)

**C**hapter 3 starts the introduction to Bayesian statistics with the strange idea of calling the likelihood the “density of the data”. It is indeed the probability density of the model evaluated at the data but… it conveys a confusing meaning since it is not a density when plotted against the parameters (as in Figure 1, p. 44, where, incidentally, the exact probability model is not specified). The prior distribution is defined as a normal × inverse chi-square distribution on the vector of the means (in the ANOVA model) and the common variance. Due to the classification of the variance as a nuisance parameter, the author can get away with putting an improper prior on this parameter (p.46). The normal prior is chosen to be “neutral”, i.e. to give the same prior weight to the null and the alternative hypotheses. This seems logical at some initial level, but constructing such a prior for convoluted hypotheses may simply be impossible… Because the null hypothesis has a positive mass (maybe .5) under the “unconstrained prior” (p.48), the author can also get away with projecting this prior onto the constrained space of the null hypothesis, even when setting the prior variance to ∞ (p.50). The Bayes factor is then the ratio of the (posterior and prior) normalising constants over the constrained parameter space. The book still mentions the Lindley-Bartlett paradox (p.60) in the case of the about equality hypotheses. The appendix to this chapter mentions the issue of improper priors and the need for accommodating infinite mass with training samples, providing a minimum training sample solution using mixtures that sounds fairly *ad hoc* to me.
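
For readers wondering where this ratio comes from, here is a short sketch of the argument as I understand it, not a transcription of the book's own derivation. The notation is my assumption: π is a proper unconstrained prior on Θ, ℓ(θ|x) the likelihood, Θ_m the constrained subset with positive prior mass, and c_m and f_m the prior and posterior masses of Θ_m (the symbols that reappear in the Chapter 10 quote). Restricting the prior to Θ_m and renormalising gives

$$
\pi_m(\theta) = \frac{\pi(\theta)\,\mathbb{I}_{\Theta_m}(\theta)}{c_m},
\qquad
c_m = \int_{\Theta_m} \pi(\theta)\,\text{d}\theta,
$$

so that the Bayes factor of the constrained against the unconstrained model is

$$
B_{mu}
= \frac{\int_{\Theta_m} \ell(\theta|x)\,\pi(\theta)\,\text{d}\theta \,\big/\, c_m}{\int_{\Theta} \ell(\theta|x)\,\pi(\theta)\,\text{d}\theta}
= \frac{1}{c_m}\int_{\Theta_m} \pi(\theta|x)\,\text{d}\theta
= \frac{f_m}{c_m}.
$$

The identity only requires c_m>0, which is why it breaks down for an exact point null unless one goes through a limiting argument.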

“Bayes factors for the evaluation of informative hypotheses have a simple form.” (p. 193)

**C**hapter 10 is the final chapter of *Informative Hypotheses*, on “Foundations of Bayesian evaluation of informative hypotheses”, and I was expecting a more in-depth analysis of those special hypotheses, but it is mostly a repetition of what is found in Chapter 3, the wider generality never being exploited to a useful depth. There is also this gem quoted above that, because Bayes factors are the ratio of two (normalising) constants, f_{m}/c_{m}, they have a “simple form”. The reference to Carlin and Chib (1995) for computing other cases then sounds pretty obscure. (Another tiny gem is that I spotted the R software contingency spelled in three different ways.) The book mentions the Savage-Dickey representation of the Bayes factor, but I could not spot the connection from the few lines (p.193) dedicated to this ratio. More generally, I do not find the generality of this chapter particularly convincing, most of it replicating the notions found in Chapter 3, like the use of posterior priors. The numerical approximation of Bayes factors is proposed via simulation from the unconstrained prior and posterior (p.207), then via a stepwise decomposition of the Bayes factor (p.208) and a Gibbs sampler that relies on inverse cdf sampling.
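
To make the first of these approximations concrete, here is a minimal Python sketch of the simulation idea, under assumptions of my own that are simpler than the book's setting: independent normal group means with a *known* common variance and a conjugate normal prior, so that the unconstrained posterior is available in closed form. The function names (`encompassing_bayes_factor`, `increasing`) and all numerical values are hypothetical and only serve the illustration. The estimate is the proportion of posterior draws satisfying the constraint divided by the proportion of prior draws doing so.

```python
import numpy as np


def encompassing_bayes_factor(ybar, n, sigma2, prior_mean, prior_var,
                              constraint, draws=200_000, seed=0):
    """Monte Carlo estimate of f/c for a constrained hypothesis.

    Assumes (for illustration only) group means with known variance sigma2
    and independent N(prior_mean, prior_var) priors on each mean.
    Returns (f / c, f, c); c must be positive for the ratio to be defined.
    """
    rng = np.random.default_rng(seed)
    ybar = np.asarray(ybar, dtype=float)
    n = np.asarray(n, dtype=float)
    k = ybar.size

    # Conjugate normal update, group by group (known-variance assumption).
    post_var = 1.0 / (1.0 / prior_var + n / sigma2)
    post_mean = post_var * (prior_mean / prior_var + n * ybar / sigma2)

    # Draws from the unconstrained prior and the unconstrained posterior.
    prior_draws = rng.normal(prior_mean, np.sqrt(prior_var), size=(draws, k))
    post_draws = rng.normal(post_mean, np.sqrt(post_var), size=(draws, k))

    c = constraint(prior_draws).mean()  # prior proportion in the constrained set
    f = constraint(post_draws).mean()   # posterior proportion in the constrained set
    return f / c, f, c


def increasing(mu):
    """True for each row of draws whose means are strictly increasing."""
    return np.all(np.diff(mu, axis=1) > 0, axis=1)


if __name__ == "__main__":
    # Hypothetical summary statistics for three groups of 50 observations.
    bf, f, c = encompassing_bayes_factor(
        ybar=[0.0, 1.0, 2.0], n=[50, 50, 50], sigma2=1.0,
        prior_mean=0.0, prior_var=100.0, constraint=increasing,
    )
    print(f"f = {f:.3f}, c = {c:.3f}, Bayes factor = {bf:.2f}")
```

With three exchangeable means, c should be close to 1/6 by symmetry, so the Bayes factor in favour of a strict ordering cannot exceed about 6 in this toy setting, however strongly the data agree with it. The sketch also shows why the plain version struggles with tight constraints: when c is tiny, very few prior draws fall in the constrained set, which is presumably what motivates the stepwise decomposition.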

**O**verall, I feel that this book came out too early, without a proper basis and dissemination of the ideas of the author: to wit, a large number of references are connected to the author, some In Press, others Unpublished (which leads to a rather abstract *“see Hoijtink (Unpublished) for a related theorem”* (p.195)). From my incomplete reading, I did not gather a sense of novel perspective but rather of a topic that seemed too narrow for a whole book.

September 20, 2013 at 2:41 pm

It is good to see the book reviewed here. I’ve bought it myself and liked it more than you did. In particular, Hoijtink calculates the Bayes factor for order-restricted models by the ratio of prior to posterior samples consistent with the restriction. This is a very simple trick to obtain Bayes factors that obviates the need for integration. I think it is in this sense that the Bayes factor has “a simple form”. So I don’t think that fm/cm is the ratio of two (normalising) constants. Instead, these are the proportions of prior and posterior samples that obey the restriction. And that’s neat and helpful. When the restriction becomes an exact equality the trick converges to the Savage-Dickey test. This is discussed in Wetzels, R., Grasman, R. P. P. P., & Wagenmakers, E.-J. (2010). An encompassing prior generalization of the Savage-Dickey density ratio. Computational Statistics & Data Analysis, 54, 2094-2102. The paper is on my website, I hope the url displays correctly:

Cheers,

E.J.

September 20, 2013 at 8:18 pm

Thanks Eric! I accept the appeal of using the proportion of simulations from prior and posterior that obey the restriction. I am however a bit wary of the Savage-Dickey limit for reasons exposed in a paper of ours. My overall and only major criticism of the book is that it feels too narrow a topic for a whole book.