Archive for Statistics Forum

scale matters [maths as well]

Posted in pictures, R, Statistics with tags , , , , , , , , on June 2, 2021 by xi'an

A question from X validated on why an independent Metropolis sampler of a three component Normal mixture based on a single Normal proposal was failing to recover the said mixture…

When looking at the OP’s R code, I did not notice anything amiss at first glance (I was about to drive back from Annecy, hence did not look too closely) and reran the attached code with a larger variance in the proposal, which returned the above picture for the MCMC sample, close enough (?) to the target. Later, from home, I checked the code further and noticed that the Metropolis ratio was only using the ratio of the targets. Dividing by the ratio of the proposals made a significant (?) to the representation of the target.

More interestingly, the OP was fundamentally confused between independent and random-walk Rosenbluth algorithms, from using the wrong ratio to aiming at the wrong scale factor and average acceptance ratio, and furthermore challenged by the very notion of Hessian matrix, which is often suggested as a default scale.

golden Bayesian!

Posted in Statistics with tags , , , , , , , , , on November 11, 2017 by xi'an

simulation under zero measure constraints

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , on November 17, 2016 by xi'an

A theme that comes up fairly regularly on X validated is the production of a sample with given moments, either for calibration motives or from a misunderstanding of the difference between a distribution mean and a sample average. Here are some entries on that topic:

In most of those questions, the constraint in on the sum or mean of the sample, which allows for an easy resolution by a change of variables. It however gets somewhat harder when the constraint involves more moments or, worse, an implicit solution to an equation. A good example of the later is the quest for a sample with a given maximum likelihood estimate in the case this MLE cannot be derived analytically. As for instance with a location-scale t sample…

Actually, even when the constraint is solely on the sum, a relevant question is the production of an efficient simulation mechanism. Using a Gibbs sampler that changes one component of the sample at each iteration does not qualify, even though it eventually produces the proper sample. Except for small samples. As in this example

n=3;T=1e4
s0=.5 #fixed average
sampl=matrix(s0,T,n)
for (t in 2:T){
 sampl[t,]=sampl[t-1,]
 for (i in 1:(n-1)){
  sampl[t,i]=runif(1,
  min=max(0,n*s0-sum(sampl[t,c(-i,-n)])-1),
  max=min(1,n*s0-sum(sampl[t,c(-i,-n)])))
 sampl[t,n]=n*s0-sum(sampl[t,-n])}}

For very large samples, I figure that proposing from the unconstrained density can achieve a sufficient efficiency, but the in-between setting remains an interesting problem.

The foundations of Statistics [reply]

Posted in Books, R, Statistics, University life with tags , , , , , , , on July 19, 2011 by xi'an

Shravan Vasishth has written a response to my review both published on the Statistics Forum. His response is quite straightforward and honest. In particular, he acknowledges not being a statistician and that he “should spend more time studying statistics”. I also understand the authors’ frustration at trying “to recruit several statisticians (at different points) to join [them] as co-authors for this book, in order to save [them] from [them]selves, so to speak. Nobody was willing to do join in.” (Despite the kind proposal to join as a co-author to a new edition, I  would be rather unwilling as well, mostly because of the concept to avoid calculus at all cost… I will actually meet with Shravan at the end of the month to discuss specifics of the statistical flaws in this book.)

However, I still do not understand why the book was published without a proper review from a statistician. Springer is a/my serious scientific editor and book proposals usually go through several reviews, prior to and after redaction. Shravan Vasishth asks for alternative references, which I personally cannot provide for lack of teaching at this level, but this is somehow besides the point: even if a book at the intended level and for the intended audience did not exist, this would not justify the publication of a book on statistics (and only statistics) by authors not proficient enough in the topic.

One point of the response I do not get is the third item about the blog and letting my “rage get the better of [myself] (the rage is no doubt there for good reason)”. Indeed, while I readily acknowledge the review is utterly negative, I have tried to stick to facts, either statistical flaws (like the unbiasedness of s) or presentation defects. The reference to a blog in the book could be a major incentive to adopt the book, so if the blog does not live as a blog, it is both a disappointment to the reader and a sort of a breach of advertising. I perfectly understand the many reasons for not maintaining a blog (!), but then the site should have been advertised as a site rather than a blog. This was the meaning of the paragraph

The authors advertise a blog about the book that contains very little information. (The last entry is from December 2010: “The book is out”.) This was a neat idea, had it been implemented.

that does not sound full of rage to me… Anyway, this is a minor point.

The foundations of Statistics: a simulation-based approach

Posted in Books, R, Statistics, University life with tags , , , , , , , , , , , on July 12, 2011 by xi'an

“We have seen that a perfect correlation is perfectly linear, so an imperfect correlation will be `imperfectly linear’.” page 128

This book has been written by two linguists, Shravan Vasishth and Michael Broe, in order to teach statistics “in  areas that are traditionally not mathematically demanding” at a deeper level than traditional textbooks “without using too much mathematics”, towards building “the confidence necessary for carrying more sophisticated analyses” through R simulation. This is a praiseworthy goal, bound to produce a great book. However, and most sadly, I find the book does not live up to expectations. As in Radford Neal’s recent coverage of introductory probability books with R, there are statements there that show a deep misunderstanding of the topic… (This post has also been published on the Statistics Forum.) Continue reading