Nonlinear Time Series just appeared

Posted in Books, R, Statistics, University life on February 26, 2014 by xi'an

My friends Randal Douc and Éric Moulines just published this new time series book with David Stoffer. (David also wrote Time Series Analysis and its Applications with Robert Shumway a year ago.) The book reflects well on the research of Randal and Éric over the past decade, namely convergence results on Markov chains for validating both inference in nonlinear time series and algorithms applied to those objects. The latter include MCMC, pMCMC, sequential Monte Carlo, particle filters, and the EM algorithm. While I am too close to the authors to write a balanced review for CHANCE (the book is under review by another researcher, before you ask!), I think this is an important book that reflects the state of the art in the rigorous study of those models. Obviously, the mathematical rigour advocated by the authors makes Nonlinear Time Series a rather advanced book (despite the authors’ reassuring statement that “nothing excessively deep is used”), better suited to PhD students and researchers than to starting graduates (and definitely not advised for self-study), but the availability of the R code (on the highly personal page of David Stoffer) balances the mathematical bent of the book in the first and third parts. A great reference book!

Statistics for spatio-temporal data [book review]

Posted in Books, Statistics, University life on October 14, 2013 by xi'an

Here is the new reference book about spatial and spatio-temporal statistical modelling! Noel Cressie wrote the earlier classic Statistics for Spatial Data in 1993 and he has now co-authored with Christopher Wikle (a plenary speaker at ISBA 2014 in Cancún) the new bible on the topic. And with a very nice cover of a Guatemalan lienzo about the Spanish conquest. (Disclaimer: as I am a good friend of Noel, do not expect this review to remain unbiased!)

“…we state the obvious, that political boundaries cannot hold back a one-meter rise in sea level; our environment is ultimately a global resource and its stewardship is an international responsibility.” (p.11)

The book is a sum (in the French/Latin meaning of somme/summa when applied to books; I am not sure this explanation makes any sense!) and, like its predecessor, it covers an enormous range of topics and methods. So do not expect a textbook coverage of most notions and be prepared to read further articles referenced in the text. One of the many differences with the earlier book is that MCMC appears from the start as a stepping stone that is necessary to handle

“…there are model-selection criteria that could be invoked (e.g., AIC, BIC, DIC, etc.), which concentrate on the twin pillars of predictability and parsimony. But they do not address the third pillar, namely scientific interpretability (i.e., knowledge).” (p.33)

The first chapter of the book is actually a preface motivating the topics covered by the book, which may be confusing on a first read, especially for a graduate student, as there is no math formula and no model introduced at this stage. Anyway, this is not really a book made for a linear read. It is quite witty (with too many quotes to report here!) and often funny (I learned for instance that Einstein’s quote “Everything should be made as simple as possible, but not simpler” was a paraphrase of an earlier lecture, invented by the Reader’s Digest!).

“Thus, we believe that it is not helpful to try to classify probability distributions that determine the statistical models, as subjective or objective. Better questions to ask are about the sensitivity of inferences to model choices and whether such choices make sense scientifically.” (p.32)

The overall tone of the book is mostly Bayesian, in a non-conflictual conditional probability way, insisting on hierarchical (Bayesian) model building. Incidentally, it uses the same bracket notation for generic distributions (densities) as in Gelfand and Smith (JASA, 1990), i.e. [X|Y] and [X|Z,y][Z|y,θ], a notation that did not get much of a fan club. (I actually do not know where it stemmed from.) The second chapter contains an illustration of the search for the USS Scorpion using a Bayesian model (including priors built from experts’ opinions), an example which is also covered [without the maths!] in Sharon McGrayne’s The Theory That Would Not Die.

The book is too rich and my time is too tight (!) to cover each chapter in detail. (For instance, I am not so happy with the temporal chapter in that it moves away from the Bayesian perspective without much of a justification.) Suffice it to say then that it appears like an updated and improved version of its predecessor, with 45 pages of references, some of them quite recent. If I were to teach from this book at a Master level, it would take the whole academic year and then some, assuming enough mathematical culture from the student audience.

As an addendum, I noticed several negative reviews on amazon due to the poor quality of the printing, but the copy I received from John Wiley was quite fine, with the many colour graphs well-rendered. Maybe an earlier printing or a different printing agreement?

non-stationary AR(10)

Posted in Books, R, Statistics, University life on January 19, 2012 by xi'an

In the revision of Bayesian Core on which Jean-Michel Marin and I worked together most of last week, having missed our CIRM break last summer (!), we have now included an illustration of what happens to an AR(p) time series when the customary stationarity+causality condition on the roots of the associated polynomial is not satisfied. More specifically, we generated several time series with the same underlying white noise and random coefficients that have a fair chance of providing non-stationary series, and then plotted each series up to time 260 with the R code

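p=10
T=260
dat=seri=rnorm(T) #white noise, shared by the four series

par(mfrow=c(2,2),mar=c(2,2,1,1))
for (i in 1:4){
  coef=runif(p,min=-.5,max=.5) #random AR coefficients
  #AR(p) recursion, started from the first p white noise values
  for (t in ((p+1):T))
    seri[t]=sum(coef*seri[(t-p):(t-1)])+dat[t]
  plot(seri,type="l",lwd=2,ylab="")
}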


leading to outputs like the following one
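
As a complement, here is a minimal sketch (not part of the book revision) of how one could estimate how often such uniform draws of the coefficients violate the stationarity+causality condition. The helper name `is_causal` and the number of replications are assumptions made for the illustration. Since the simulation code above multiplies `coef[1]` by the oldest value `seri[t-p]`, the coefficients are reversed before building the polynomial.

```r
set.seed(1)
p <- 10

# TRUE when all roots of 1 - sum_i rho_i u^i lie outside the unit circle;
# 'coef' is in the order used by the simulation (coef[p] is the lag-1 term)
is_causal <- function(coef) {
  rho <- rev(coef)                      # rho[i] is the lag-i coefficient
  all(Mod(polyroot(c(1, -rho))) > 1)    # polyroot takes increasing powers
}

# Monte Carlo estimate of the share of non-stationary coefficient draws
draws <- replicate(1000, is_causal(runif(p, min = -.5, max = .5)))
mean(!draws)
```

The last line returns the proportion of the 1000 coefficient vectors for which at least one root falls on or inside the unit circle.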

Checking for stationarity [X-valid'ed]

Posted in Books, Statistics, University life on January 16, 2012 by xi'an

While working with Jean-Michel Marin on the revision of Bayesian Core, and more specifically on the time series chapter, I was wondering about the following problem:

It is well-known [at least to readers of Bayesian Core] that an AR(p) process

$x_t=\sum_{i=1}^p \varrho_i x_{t-i} + \epsilon_t$

is causal and stationary if and only if the roots of the polynomial

$\mathcal{P}(u) = 1 - \sum_{i=1}^p \varrho_i u^i$

are all outside the unit circle in the complex plane. This defines an implicit (and unfriendly!) parameter space for the original parameters of the AR(p) model. In particular, when considering a candidate parameter, determining whether or not the constraint is satisfied implies finding the roots of the associated polynomial. The question I asked on Cross Validated a few days ago was whether or not there existed a faster algorithm than the naïve one that consists in (a) finding the roots of P and (b) checking that none of them is inside the unit circle. Two hours later I got a reply from J. Bowman about the Schur-Cohn procedure that answers the question about the roots in $O(p^2)$ steps without going through the determination of the roots. (This is presumably the same Issai Schur as in Schur’s lemma.) However, J. Bowman also pointed out that the corresponding order for polynomial root solvers is $O(p^2)$! Nonetheless, I think the Schur-Cohn procedure is way faster.
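
For illustration, here is a hedged R sketch of both checks. The function names are hypothetical. The second function is a step-down (inverse Levinson-Durbin) recursion, a close relative of the Schur-Cohn test rather than necessarily the exact procedure described in the Cross Validated answer: it recovers the successive partial autocorrelations from the coefficients and requires each of them to be smaller than one in absolute value, at a cost of the order of p² operations and without computing any root.

```r
# (a)+(b) naive check: compute the roots of P(u) = 1 - sum_i rho_i u^i
# and verify that all of them lie strictly outside the unit circle
is_stationary_roots <- function(rho) {
  all(Mod(polyroot(c(1, -rho))) > 1)
}

# Root-free check: step down from order p to order 1, reading off the
# partial autocorrelation (last coefficient) at each order
is_stationary_stepdown <- function(rho) {
  a <- rho
  for (k in length(a):1) {
    r <- a[k]                          # partial autocorrelation of order k
    if (abs(r) >= 1) return(FALSE)
    if (k > 1)                         # coefficients of the order k-1 model
      a <- (a[1:(k - 1)] + r * a[(k - 1):1]) / (1 - r^2)
  }
  TRUE
}

# Usage on one random candidate, drawn as in the non-stationary AR(10) post
set.seed(1)
rho <- runif(10, min = -.5, max = .5)
c(roots = is_stationary_roots(rho), stepdown = is_stationary_stepdown(rho))
```

Both functions should agree except for candidates sitting numerically on the boundary of the stationarity region, where rounding errors may tip either check.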

Time series

Posted in Books, R, Statistics on March 29, 2011 by xi'an

(This post got published on The Statistics Forum yesterday.)

The short book review section of the International Statistical Review sent me Raquel Prado’s and Mike West’s book, Time Series (Modeling, Computation, and Inference) to review. The current post is not about this specific book, but rather on why I am unsatisfied with the textbooks in this area (and correlatively why I am always reluctant to teach a graduate course on the topic). Again, I stress that the following is not specifically about the book by Raquel Prado and Mike West!

With the notable exception of Brockwell and Davis’ Time Series: Theory and Methods, most time-series books seem to suffer (in my opinion) from the same difficulty, which sums up as being unable to provide the reader with a coherent and logical description of/introduction to the field. (This echoes a complaint made by Håvard Rue a few weeks ago in Zurich.) Instead, time-series books appear to haphazardly pile up notions and techniques, theory and methods, without paying much attention to the coherency of the presentation. That’s how I was introduced to the field (even though it was by a fantastic teacher!) and the feeling has not left me since then. It may be due to the fact that the field stemmed partly from signal processing in engineering and partly from econometrics, but such presentations never achieve a Unitarian front on how to handle time-series. In particular, the opposition between the time domain and the frequency domain always escapes me. This is presumably due to my inability to see the relevance of the spectral approach, as harmonic regression simply appears (to me) as a special type of non-linear regression with sinusoidal regressors and with a well-defined likelihood that requires neither Fourier frequencies nor periodograms (nor spectral density estimation). Even within the time domain, I find the handling of stationarity by time-series books to be mostly cavalier. Why stationarity is important is never addressed, which leaves the reader with the hard choice between imposing stationarity and not imposing stationarity. (My original feeling was to let the issue be decided by the data, but this is not possible!) Similarly, causality is often invoked as a reason to set constraints on MA coefficients, even though this resorts to a non-mathematical justification, namely preventing dependence on the future. I thus wonder if being a Unitarian (i.e. following a single logical process for analysing time-series data) is at all possible in the time-series world! E.g., in Bayesian Core, we processed AR, MA, ARMA models in a single perspective, conditioning on the initial values of the series and imposing all the usual constraints on the roots of the lag polynomials but this choice was far from perfectly justified…
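
To make the remark on harmonic regression concrete, here is a minimal R sketch under assumptions of my own choosing: a single sinusoid, Gaussian white noise errors, simulated data, and made-up names (`omega_true`, `profile_rss`). For a fixed frequency the model is linear in the two amplitudes, so they can be profiled out by least squares, and the frequency is then estimated by minimising the residual sum of squares, which is equivalent to maximising the Gaussian likelihood. No Fourier frequency or periodogram is involved.

```r
set.seed(42)
n <- 200
tt <- 1:n
omega_true <- 0.3   # arbitrary frequency, not of the form 2*pi*k/n
y <- 2 * cos(omega_true * tt) + sin(omega_true * tt) + rnorm(n)

# Residual sum of squares at frequency omega, after fitting the two
# amplitudes by ordinary least squares (regression without intercept)
profile_rss <- function(omega) {
  fit <- lm(y ~ 0 + cos(omega * tt) + sin(omega * tt))
  sum(resid(fit)^2)
}

# The profile is multimodal in omega: coarse grid first, then local refinement
grid <- seq(0.05, pi - 0.05, length.out = 2000)
omega0 <- grid[which.min(sapply(grid, profile_rss))]
omega_hat <- optimize(profile_rss, interval = omega0 + c(-1, 1) * 0.01)$minimum
omega_hat
```

The grid search is only there because the criterion has many local minima in the frequency; any other global optimisation strategy would serve the same purpose.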