## Archive for Statistics Forum

## golden Bayesian!

Posted in Statistics with tags badge, Bayesian, cross validated, introductory textbooks, MCMC, Monte Carlo Statistical Methods, simulation, Stack Exchange, Statistics Forum, wikipedia on November 11, 2017 by xi'an

## simulation under zero measure constraints

Posted in Books, Kids, R, Statistics, University life with tags cross validated, Gibbs sampler, maximum likelihood estimation, Monte Carlo algorithm, Monte Carlo Statistical Methods, simulation, Stack Exchange, Statistics Forum on November 17, 2016 by xi'an

**A** theme that comes up fairly regularly on X validated is the production of a sample with given moments, either for calibration purposes or from a misunderstanding of the difference between a distribution mean and a sample average. Here are some entries on that topic:

- How to sample from a distribution so that mean of samples equals expected value?
- Sample random variables conditional on their sum
- Simulation involving conditioning on sum of random variables
- conditional expectation of squared standard normal

In most of those questions, the constraint is on the sum or mean of the sample, which allows for an easy resolution by a change of variables. It however gets somewhat harder when the constraint involves more moments or, worse, an implicit solution to an equation. A good example of the latter is the quest for a sample with a given maximum likelihood estimate when this MLE cannot be derived analytically. As for instance with a location-scale t sample…

Actually, even when the constraint is solely on the sum, a relevant question is the production of an efficient simulation mechanism. Using a Gibbs sampler that changes one component of the sample at each iteration does not qualify, even though it eventually produces the proper sample, except for small samples, as in this example:

```
n=3; T=1e4
s0=.5                     # fixed average
sampl=matrix(s0,T,n)      # start from the constant sample
for (t in 2:T){
  sampl[t,]=sampl[t-1,]
  for (i in 1:(n-1)){
    # range keeping both component i and component n inside [0,1]
    sampl[t,i]=runif(1,
      min=max(0,n*s0-sum(sampl[t,c(-i,-n)])-1),
      max=min(1,n*s0-sum(sampl[t,c(-i,-n)])))
    # last component is imposed by the sum constraint
    sampl[t,n]=n*s0-sum(sampl[t,-n])}}
```

For very large samples, I figure that proposing from the unconstrained density can achieve a sufficient efficiency, but the in-between setting remains an interesting problem.
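For the unconstrained-proposal idea, a minimal rejection sketch (my illustration, not from the post, with made-up values of n, s0, and T): the first n-1 components are proposed from the unconstrained uniform, and the draw is accepted when the last component, fixed by the sum constraint, also falls within [0,1].

```
n=10; s0=.5; T=1e3
samp=matrix(0,T,n)
t=1
while (t<=T){
  x=runif(n-1)              # unconstrained uniform proposal
  xn=n*s0-sum(x)            # last component imposed by the sum constraint
  if ((xn>=0)&&(xn<=1)){    # accept when it stays within [0,1]
    samp[t,]=c(x,xn)
    t=t+1}}
```

By construction every accepted row averages exactly to s0, but the acceptance rate degrades as s0 moves away from 1/2 or n grows, which is where the in-between problem bites.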

## The foundations of Statistics [reply]

Posted in Books, R, Statistics, University life with tags blog, foundations, introductory textbooks, linguistics, mathematics, R, simulation, Statistics Forum on July 19, 2011 by xi'an

**S**hravan Vasishth has written a response to my review, both published on the Statistics Forum. His response is quite straightforward and honest. In particular, he acknowledges not being a statistician and that he “should spend more time studying statistics”. I also understand the authors’ frustration at trying “to recruit several statisticians (at different points) to join [them] as co-authors for this book, in order to save [them] from [them]selves, so to speak. Nobody was willing to join in.” (Despite the kind proposal to join as a co-author to a new edition, I would be rather unwilling as well, mostly because of the principle of avoiding calculus at all costs… I will actually meet with Shravan at the end of the month to discuss specifics of the statistical flaws in this book.)

**H**owever, I still do not understand why the book was published without a proper review from a statistician. Springer is a serious scientific publisher, and book proposals usually go through several reviews, both before and after the writing stage. Shravan Vasishth asks for alternative references, which I personally cannot provide for lack of teaching at this level, but this is somewhat beside the point: even if a book at the intended level and for the intended audience did not exist, this would not justify the publication of a book on statistics (and only statistics) by authors not proficient enough in the topic.

**O**ne point of the response I do not get is the third item about the blog and letting my “rage get the better of [myself] (the rage is no doubt there for good reason)”. Indeed, while I readily acknowledge the review is utterly negative, I have tried to stick to facts, either statistical flaws (like the unbiasedness of *s*) or presentation defects. The reference to a blog in the book could be a major incentive to adopt the book, so if the blog does not live as a blog, it is both a disappointment to the reader and a sort of breach of advertising. I perfectly understand the many reasons for not maintaining a blog (!), but then the site should have been advertised as a site rather than a blog. This was the meaning of the paragraph

> The authors advertise a blog about the book that contains very little information. (The last entry is from December 2010: “The book is out”.) This was a neat idea, had it been implemented.

that does not sound full of rage to me… Anyway, this is a minor point.

## The foundations of Statistics: a simulation-based approach

Posted in Books, R, Statistics, University life with tags blog, Emacs, foundations, hypothesis testing, introductory textbooks, LaTeX, linguistics, mathematics, Python, R, simulation, Statistics Forum on July 12, 2011 by xi'an

> “We have seen that a perfect correlation is perfectly linear, so an imperfect correlation will be ‘imperfectly linear’.” (page 128)

**T**his book has been written by two linguists, Shravan Vasishth and Michael Broe, in order to teach statistics “in areas that are traditionally not mathematically demanding” at a deeper level than traditional textbooks “without using too much mathematics”, towards building “the confidence necessary for carrying more sophisticated analyses” through R simulation. This is a praiseworthy goal, bound to produce a great book. However, and most sadly, I find the book does not live up to expectations. As in Radford Neal’s recent coverage of introductory probability books with R, there are statements that show a deep misunderstanding of the topic… *(This post has also been published on the Statistics Forum.)*

## Time series

Posted in Books, R, Statistics with tags ARMA models, Bayesian statistics, graduate course, stationarity, Statistics Forum, textbook, time series on March 29, 2011 by xi'an

*(This post got published on The Statistics Forum yesterday.)*

**T**he short book review section of the International Statistical Review sent me Raquel Prado’s and Mike West’s book, **Time Series (Modeling, Computation, and Inference)**, to review. The current post is not about this specific book, but rather on why I am unsatisfied with the textbooks in this area (and correlatively why I am always reluctant to teach a graduate course on the topic). Again, I stress that

*the following is not specifically about the book by Raquel Prado and Mike West!*

**W**ith the noticeable exception of Brockwell and Davis’ **Time Series: Theory and Methods**, most time-series books seem to suffer (in my opinion) from the same difficulty, which amounts to being unable to provide the reader with a coherent and logical introduction to the field. (This echoes a complaint made by Håvard Rue a few weeks ago in Zurich.) Instead, time-series books appear to haphazardly pile up notions and techniques, theory and methods, without paying much attention to the coherency of the presentation. That is how I was introduced to the field (even though it was by a fantastic teacher!) and the feeling has not left me since then. It may be due to the fact that the field stemmed partly from signal processing in engineering and partly from econometrics, but such presentations never achieve a Unitarian front on how to handle time series. In particular, the opposition between the time domain and the frequency domain always escapes me. This is presumably due to my inability to see the relevance of the spectral approach, as harmonic regression simply appears (to me) as a special type of non-linear regression with sinusoidal regressors and with a well-defined likelihood that requires neither Fourier frequencies nor a periodogram (nor spectral density estimation). Even within the time domain, I find the handling of stationarity by time-series books to be mostly cavalier. Why stationarity is important is never addressed, which leaves the reader with the hard choice between imposing and not imposing stationarity. (My original feeling was to let the issue be decided by the data, but this is not possible!) Similarly, causality is often invoked as a reason to set constraints on MA coefficients, even though this amounts to a non-mathematical justification, namely preventing dependence on the future. I thus wonder if being a Unitarian (i.e. following a single logical process for analysing time-series data) is at all possible in the time-series world! E.g., in **Bayesian Core**, we processed AR, MA, and ARMA models from a single perspective, conditioning on the initial values of the series and imposing all the usual constraints on the roots of the lag polynomials, but this choice was far from perfectly justified…
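As an aside on the harmonic-regression view above, a minimal sketch (my illustration, with made-up frequency and amplitudes): once the frequency f is treated as known, the sinusoidal regressors make the model linear in its coefficients, and the likelihood is the plain Gaussian regression likelihood, with no call to Fourier frequencies or the periodogram. (It is an unknown f that makes the problem genuinely non-linear.)

```
set.seed(1)
t=1:200
f=1/25                              # frequency treated as known here
y=2*cos(2*pi*f*t)-1.5*sin(2*pi*f*t)+rnorm(200)
fit=lm(y~cos(2*pi*f*t)+sin(2*pi*f*t))
coef(fit)                           # recovers the amplitudes up to noise
```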