## from down-under, Lake Menteith upside-down

Posted in Books, R, Statistics with tags , , , , on January 23, 2013 by xi'an

The dataset used in Bayesian Core for the chapter on image processing is a Landsat picture of Lake of Menteith in Scotland (close to Loch Lomond). (Yes, Lake of Menteith, not Loch Menteith!) Here is the image produced in the book. I just got an email from Matt Moores at QUT that the image is both rotated and flipped:

The image of Lake Mentieth in figure 8.6 of Bayesian Core is upside-down and back-to-front, so to speak. Also, I recently read a paper by Lionel Cucala & J-M Marin that has the same error.

This is due to the difference between matrix indices and image coordinates: matrices in R are indexed by [row,column] but image coordinates are [x,y]. Also, y=1 is the first row of the matrix, but the bottom row of pixels in an image.

Only a one line change to the R code is required to display the image in the correct orientation:

image(1:100,1:100,t(as.matrix(lm3)[100:1,]),col=gray(256:1/256),xlab="",ylab="")


As can be checked on Googlemap, the picture is indeed rotated by a -90⁰ angle and the transpose correction does the job!

## yet more questions about Monte Carlo Statistical Methods

Posted in Books, Statistics, University life with tags , , , , , , , , , , on December 8, 2011 by xi'an

As a coincidence, here is the third email I this week about typos in Monte Carlo Statistical Method, from Peng Yu this time. (Which suits me well in terms of posts as  I am currently travelling to Provo, Utah!)

I’m reading the section on importance sampling. But there are a few cases in your book MCSM2 that are not clear to me.

On page 96: “Theorem 3.12 suggests looking for distributions g for which |h|f/g is almost constant with finite variance.”

What is the precise meaning of “almost constant”? If |h|f/g is almost constant, how come its variance is not finite?

“Almost constant” is not a well-defined property, I am afraid. By this sentence on page 96 we meant using densities g that made |h|f/g as little varying as possible while being manageable. Hence the insistence on the finite variance. Of course, the closer |h|f/g is to a constant function the more likely the variance is to be finite.

“It is important to note that although the finite variance constraint is not necessary for the convergence of (3.8) and of (3.11), importance sampling performs quite poorly when (3.12) ….”

It is not obvious to me why when (3.12) importance sampling performs poorly. I might have overlooked some very simple facts. Would you please remind me why it is the case? From the previous discussion in the same section, it seems that h(x) is missing in (3.12). I think that (3.12) should be (please compare with the first equation in section 3.3.2)

$\int h^2(x) f^2(x) / g(x) \text{d}x = + \infty$

The preference for a finite variance of f/g and against (3.12) is that we would like the importance function g to work well for most integrable functions h. Hence a requirement that the importance weight f/g itself behaves well. It guarantees some robustness across the h‘s and also avoids checking for the finite variance (as in your displayed equation) for all functions h that are square-integrable against g, by virtue of the Cauchy-Schwarz inequality.

## confusing errata in Monte Carlo Statistical Methods

Posted in Books, Statistics, University life with tags , , , , , on December 7, 2011 by xi'an

Following the earlier errata on Monte Carlo Statistical Methods, I received this email from Nick Foti:

I have a quick question about example 8.2 in Monte Carlo Statistical Methods which derives a slice sampler for a truncated N(-3,1) distribution (note, the book says it is a N(3,1) distribution, but the code and derivation are for a N(-3,1)). My question is that the slice set A(t+1) is described as

$\{y : f_1(y) \geq u f_1(x^{(t)}) \};$

which makes sense if u ~ U(0,1) as it corresponds to the previously described algorithm. However, in the errata for the book it says that u should actually be u(t+1) which according to the algorithm should be distributed as U(0,f1(x)). So unless something clever is being done with ratios of the f1‘s, it seems like the u(t+1) should be U(0,1) in this case, right?

There is indeed a typo in Example 8.4: the mean 3 should be -3… As for the errata, it addresses the missing time indices. Nick is right in assuming that those uniforms are indeed on (0,1), rather than on (0,f1(x)) as in Algorithm A.31. Apologies to all those confused by the errata!

## new typos in Monte Carlo Statistical Methods

Posted in Books, Statistics, University life with tags , , , , , , , , on December 7, 2011 by xi'an

Thanks to Jay Bartroff for pointing out those typos after teaching from Monte Carlo Statistical Methods:

• On page 52, the gamma Ga(α, β) distribution uses β as a rate parameter while in other places it is a scale parameter, see, e.g. eqn (2.2) [correct, I must say the parameterisation of the gamma distribution is a pain and, while we tried to homogenise the notation with the second parameter being the rate, there are places like this where either the rate convention (as in the exponential distribution) or the scale convention (as in the generation) is the natural one…]
• Still on page 52, in Example 2.20, truncated normals are said to be discussed after Example 1.5, but they’re not. [There is a mention made of constrained parameters right after but this is rather cryptic!]
• On page 53, the ratio f/gα following the second displayed eqn is missing some terms [or, rather, the equality sign should be a proportional sign]
• Still on page 53, in eqn (2.11), the whole expression, rather than the square root, should be divided by 2 [yes, surprising typo given that it was derived correctly in the original paper!]
• On page 92, the exact constraint is that supp(g) actually needs only contain the intersection of supp(f) and supp(h), such as when approximating tail probabilities [correct if the importance sampling method is only used for a single function h, else the condition stands as is]
• On page 94, fY does not need that integral in the denominator [correct, we already corrected for the truncation by subtracting 4.5 in the exponential]
• On page 114, Problem 3.22, ωi is missing a factor of 1/n [correct]
• On page 218, in Example 6.24, P00=0 [indeed, our remark that Pxx>0 should start with x=1. Note that this does not change the aperiodicity, though]
• On page 282, the log α after the 2nd displayed equation should be eα [correct, this was pointed out in a previous list of typos, but clearly not corrected in the latest printing!]
• On page 282, in the 5th displayed equation there are missing factors π(α’|b)/π(α0|b) in rejection probability [actually, no, because, those terms being both proposals and priors, they cancel in the ratio. We could add a sentence to this effect to explain why, though.]
• On page 634, the reference page for exponential distribution is mistakenly given as 99 [wow, very thorough reading! There is an exponential distribution involved on page 99 but I agree this is not the relevant page…]

## more typos in Monte Carlo statistical methods

Posted in Books, Statistics, University life with tags , , , , , , , , , on October 28, 2011 by xi'an

Jan Hanning kindly sent me this email about several difficulties with Chapters 3, Monte Carlo Integration, and  5, Monte Carlo Optimization, when teaching out of our book Monte Carlo Statistical Methods [my replies in italics between square brackets, apologies for the late reply and posting, as well as for the confusion thus created. Of course, the additional typos will soon be included in the typo lists on my book webpage.]:

1. I seem to be unable to reproduce Table 3.3 on page 88 – especially the chi-square column does not look quite right. [No, they definitely are not right: the true χ² quantiles should be 2.70, 3.84, and 6.63, at the levels 0.1, 0.05, and 0.01, respectively. I actually fail to understand how we got this table that wrong…]
2. The second question  I have is the choice of the U(0,1) in this Example 3.6. It  feels to me that a choice of Beta(23.5,18.5) for p1 and Beta(36.5,5.5) for p2 might give a better representation based on the data we have. Any comments? [I am plainly uncertain about this… Yours is the choice based on the posterior Beta coefficient distributions associated with Jeffreys prior, hence making the best use of the data. I wonder whether or not we should remove this example altogether… It is certainly “better” than the uniform. However, in my opinion, there is no proper choice for the distribution of the pi‘s because we are mixing there a likelihood-ratio solution with a Bayesian perspective on the predictive distribution of the likelihood-ratio. If anything, this exposes the shortcomings of a classical approach, but it is likely to confuse the students! Anyway, this is a very interesting problem.]
3. My students discovered that Problem 5.19 has the following typos, copying from their e-mail: “x_x” should be “x_i” [sure!]. There are a few “( )”s missing here and there [yes!]. Most importantly, the likelihood/density seems incorrect. The normalizing constant should be the reciprocal of the one showed in the book [oh dear, indeed, the constant in the exponential density did not get to the denominator…]. As a result, all the formulas would differ except the ones in part (a). [they clearly need to be rewritten, sorry about this mess!]
4. I am unsure about the if and only if part of the Theorem 5.15 [namely that the likelihood sequence is stationary if and only if the Q function in the E step has reached a stationary point]. It appears to me that a condition for the “if part” is missing [the “only if” part is a direct consequence of Jensen’s inequality]. Indeed Theorem 1 of Dempster et al 1977 has an extra condition [note that the original proof for convergence of EM has a flaw, as discussed here]. Am I missing something obvious? [maybe: it seems to me that, once Q reaches a fixed point, the likelihood L does not change… It is thus tautological, not a proof of convergence! But the theorem says a wee more, so this needs investigating. As Jan remarked, there is no symmetry in the Q function…]
5. Should there be a (n-m) in the last term of formula (5.17)? [yes, indeed!, multiply the last term by (n-m)]
6. Finally, I am a bit confused about the likelihood in Example 5.22 [which is a capture-recapture model]. Assume that Hij=k [meaning the animal i is in state k at time j]. Do you assume that you observed Xijr [which is the capture indicator for animal i at time j in zone k: it is equal to 1 for at most one k] as a Binomial B(n,pr) even for r≠k? [no, we observe all Xijr‘s with r≠k equal to zero]  The nature of the problem seems to suggest that the answer is no [for other indices, Xijr is always zero, indeed] If that is the case I do not see where the power on top of (1-pk) in the middle of the page 185 comes from [when the capture indices are zero, they do not contribute to the sum, which explains for this condensed formula. Therefore, I do not think there is anything wrong with this over-parameterised representation of the missing variables.]
7. In Section 5.3.4, there seems to be a missing minus sign in the approximation formula for the variance [indeed, shame on us for missing the minus in the observed information matrix!]
8. I could not find the definition of $\mathbb{N}^*$ in Theorem 6.15. Is it all natural numbers or all integers? May be it would help to include it in Appendix B. [Surprising! This is the set of all positive integers, I thought this was a standard math notation…]
9. In Definition 6.27, you probably want to say covering of A and not X. [Yes, we were already thinking of the next theorem, most likely!]
10. In Proposition 6.33 –   all x in A instead of all x in X. [Yes, again! As shown in the proof. Even though it also holds for all x in X]

Thanks a ton to Jan and to his UNC students (and apologies for leading them astray with those typos!!!)

## yet another typo

Posted in Books, Statistics, University life with tags , , on August 21, 2011 by xi'an

An email from Stefan Webb pointed out an embarrassing typo in the Appendix of The Bayesian Choice:

There is a type on page 522 in the definition of the density function of the hypergeometric distribution (A.14). It should read “pN choose x”, not “pn choose x” in the numerator of f since you have reparameterized the standard form with m = pN.

Indeed, the density of the hypergeometric distribution should be

$f(x|p)=\dfrac{{pN\choose x}{(1-p)N\choose n-x}}{{N\choose n}}\mathbb{I}_{\{n-(1-p)N,\ldots,pN\}}(x) \mathbb{I}_{\{0,1,\ldots,n\}}(x)$

$f(x|p)=\dfrac{{pn\choose x}{(1-p)N\choose n-x}}{{N\choose n}}\mathbb{I}_{\{n-(1-p)N,\ldots,pN\}}(x) \mathbb{I}_{\{0,1,\ldots,n\}}(x)$

## Sufficiency [BC]

Posted in Books, Statistics with tags , , , , on May 10, 2011 by xi'an

Here is an email I received about The Bayesian Choice a few days ago:

I am an undergraduate student in Japan. I am self-studying your classical book The Bayesian Choice. The book is wonderful with many instructive examples. Although it is a little bit hard for me right now, I think it will be very useful for my future research.

There is one point that I do not understand in Example 1.3.2 (p.14-15). I know a standard result that the sample mean and sample variance are independent, with the sample mean follows

$\mathcal{N}(\mu,(1/n)\sigma^2)$

while $s^2/\sigma^2$follows a chi-square of n-1 degree of freedom. In this example is it correct that one must factorize the likelihood function to $g(T(x)|\theta)$ which must be the product of these two normal and chi-square densities, and $h(x|T(x))$ which is free of $\theta$ ?

In the book I do not see why $g(T(x) | \theta)$ is the product of normal and chi-square densities. The first part correctly corresponds to the density of $\mathcal{N}(\mu,(1/n)*\sigma^2)$ . But the second part is not the density of n-1 degree of freedom chi-square of $s^2/\sigma^2$.

The example, as often, skips a lot of details, meaning that when one starts from the likelihood

$\sigma^{-n} e^{-(\bar x-\theta)^2 n/2\sigma^2} \, e^{-s^2/2\sigma^2} / (2\pi)^n,$

this expression only depends on T(x). Furthermore, it involves the normal density on $\bar x$ and part of the chi-square density on . One can then plug in the missing power of to make $g(T(x)|\theta)$ appear. The extra terms are then canceled by a function we can call $h(x|T(x))$However, there is a typo in this example in that $\sigma^n$ in the chi-square density should be $\sigma^{n-1}$!