Bayesian filtering and smoothing [book review]
When in Warwick last October, I met Simo Särkkä, who told me he had published an IMS monograph on Bayesian filtering and smoothing the year before. I thought it would be an appropriate book to review for CHANCE and tried to get a copy from Oxford University Press, unsuccessfully. I thus bought my own copy, which I received two weeks ago, and took the opportunity of my Czech vacation to read it… [A warning pre-empting accusations of self-plagiarism: this is a preliminary draft for a review to appear in CHANCE under my true name!]
“From the Bayesian estimation point of view both the states and the static parameters are unknown (random) parameters of the system.” (p.20)
Bayesian filtering and smoothing is an introduction to the topic that essentially starts from ground zero. Chapter 1 motivates the use of filtering and smoothing through examples and highlights the naturally Bayesian approach to the problem(s). Two graphs illustrate the difference between filtering and smoothing by plotting the successive confidence intervals for the same series of observations. The performances are obviously poorer with filtering, but the fact that those intervals are point-wise rather than joint, i.e., that the graphs do not provide a confidence band, should be stressed. (The exercise section of that chapter is superfluous in that it suggests re-reading Kalman’s original paper and rephrases the Monty Hall paradox in a story unconnected with filtering!) Chapter 2 gives an introduction to Bayesian statistics in general, with a few pages on Bayesian computational methods. A first remark is that the above quote is both correct and mildly confusing in that the parameters can be consistently estimated, while the latent states cannot. A second remark is that justifying the MAP as associated with the 0-1 loss is incorrect in continuous settings. The third chapter deals with the batch updating of the posterior distribution, i.e., the fact that the posterior at time t is the prior at time t+1, with applications to state-space systems including the Kalman filter. The fourth to sixth chapters concentrate on this Kalman filter and its extensions, and I find them somewhat unsatisfactory in that the collection of such filters is overwhelming for a neophyte. And no assessment of the estimation error when the model is misspecified appears at this stage. And, as usual, I find the unscented Kalman filter hard to fathom! The same feeling applies to the smoothing chapters, from Chapter 8 to Chapter 10, which mimic the earlier ones.
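To make the updating step concrete, where the posterior at time t serves as the prior at time t+1, here is a minimal sketch of a scalar Kalman filter. It is not taken from the book: the random-walk-plus-noise model, the function names `kalman_filter_1d` and `simulate`, and the values of `q`, `r`, `m0` and `p0` are all assumptions chosen for illustration.

```python
import random


def kalman_filter_1d(ys, q, r, m0=0.0, p0=1.0):
    """Scalar Kalman filter for the assumed model
    x_t = x_{t-1} + N(0, q),  y_t = x_t + N(0, r).
    Returns the filtering means and variances at each time."""
    m, p = m0, p0
    means, variances = [], []
    for y in ys:
        # Prediction: the posterior at time t-1 becomes the prior at time t.
        m_pred, p_pred = m, p + q
        # Update: condition the Gaussian prior on the new observation y_t.
        gain = p_pred / (p_pred + r)
        m = m_pred + gain * (y - m_pred)
        p = (1.0 - gain) * p_pred
        means.append(m)
        variances.append(p)
    return means, variances


def simulate(n, q, r, seed=0):
    """Simulate latent states and observations from the same assumed model."""
    rng = random.Random(seed)
    x, xs, ys = 0.0, [], []
    for _ in range(n):
        x += rng.gauss(0.0, q ** 0.5)
        xs.append(x)
        ys.append(x + rng.gauss(0.0, r ** 0.5))
    return xs, ys


xs, ys = simulate(200, q=0.1, r=1.0)
means, variances = kalman_filter_1d(ys, q=0.1, r=1.0)
```

Under these assumptions the filtering variance does not depend on the data and settles at the positive root of p² + qp − qr = 0, which is one way of seeing that the point-wise intervals have constant width after a few steps.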
“The degeneracy problem can be solved by a resampling procedure.” (p.123)
By comparison, the seventh chapter on particle filters appears too introductory from my biased perspective. For instance, the above motivation for resampling in sequential importance (re)sampling is not clear enough. As stated, it sounds too much like a trick, with no mention of the fast decrease in the number of first-generation ancestors as the number of generations grows, and thus of the need for either increasing the number of particles fast enough or checking for quick forgetting. Chapter 11 is the equivalent of the above for particle smoothing. I would have liked more details on the full posterior smoothing distribution, instead of the marginal posterior smoothing distribution at a given time t. And more of a discussion on the comparative merits of the different algorithms.
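The decrease in the number of first-generation ancestors can be seen in a toy simulation. The sketch below is an illustration under loud assumptions rather than anything from the book: there is no state-space model at all, the importance weights are simply independent exponential draws at each generation, resampling is multinomial, and the name `surviving_ancestors` is hypothetical.

```python
import random


def surviving_ancestors(n_particles, n_generations, seed=0):
    """Count how many distinct generation-0 ancestors are still represented
    after each multinomial resampling step (weights are assumed, not derived
    from any model)."""
    rng = random.Random(seed)
    # ancestors[i] is the generation-0 index that particle i descends from.
    ancestors = list(range(n_particles))
    counts = [len(set(ancestors))]
    for _ in range(n_generations):
        # Hypothetical unnormalised importance weights for this generation.
        weights = [rng.expovariate(1.0) for _ in range(n_particles)]
        # Multinomial resampling: draw parent indices proportionally to weights.
        parents = rng.choices(range(n_particles), weights=weights, k=n_particles)
        # Each child inherits the generation-0 ancestor of its parent.
        ancestors = [ancestors[j] for j in parents]
        counts.append(len(set(ancestors)))
    return counts


counts = surviving_ancestors(n_particles=100, n_generations=50)
print(counts[0], counts[10], counts[-1])
```

The count can never go up, since each resampled population descends from a subset of the previous one, which is why the genealogy collapses unless the number of particles grows or the model forgets its past quickly.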
Chapter 12 is much longer than the other chapters as it caters to the much more realistic issue of parameter estimation. The chapter borrows at times from Cappé, Moulines and Rydén (2007), where I contributed to the Bayesian estimation chapter. This is actually the first time in Bayesian filtering and smoothing that MCMC is mentioned, including references to adaptive MCMC and HMC. The chapter also covers some EM versions. And pMCMC à la Andrieu et al. (2010). Although a picture like Fig. 12.2 seems to convey the message that this particle MCMC approach is actually quite inefficient.
“An important question (…) which of the numerous methods should I choose?”
The book ends with an Epilogue (Chapter 13), suggesting the use of (Monte Carlo) sampling only after all other methods have failed. Which implies assessing that those methods have indeed failed. Maybe the suggestion of running what seems like the most appropriate method first with synthetic data (rather than the real data) could be included. For one thing, it does not add much to the computing cost. All in all, and despite some criticisms voiced above, I find the book quite a handy and compact introduction to the field, albeit slightly terse for an undergraduate audience.