Archive for state space model

efficient approximate Bayesian inference for models with intractable likelihood

Posted in Books, pictures, Statistics, University life on July 6, 2015 by xi'an

[Image: Awalé board on my garden table, March 15, 2013]

Dahlin, Villani [Mattias, not Cédric] and Schön arXived a paper this week with the above title. The type of intractable likelihood they consider is a non-linear state-space (HMM) model and the SMC-ABC they propose is based on an optimised Laplace approximation, that is, on replacing the posterior distribution on the parameter θ with a normal distribution obtained by a second-order Taylor expansion of the log-likelihood. There is no obvious solution for deriving this approximation in the case of intractable likelihood functions and the authors make use of a Bayesian optimisation technique called Gaussian process optimisation (GPO), meaning that the Laplace approximation is the Laplace approximation of a surrogate log-posterior. GPO is a Bayesian numerical method in the spirit of the probabilistic numerics discussed on the ‘Og a few weeks ago. In the current setting, this means iterating three steps:

  1. derive an approximation of the log-posterior ξ at the current θ using SMC-ABC
  2. construct a surrogate log-posterior by a Gaussian process using the past (ξ,θ)’s
  3. determine the next value of θ
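
As a rough illustration of this three-step loop, here is a minimal Python sketch on a toy one-dimensional problem. Everything in it is an assumption made for illustration rather than the authors' implementation: `noisy_log_posterior` is a hypothetical stand-in for the SMC-ABC estimate of step 1, the Matérn 5/2 kernel has fixed hyperparameters, and the acquisition rule is a plain upper confidence bound swapped in for the paper's own rule. The last lines read a Laplace approximation off the surrogate, taking the mode of the Gaussian process mean and a finite-difference curvature at that mode.

```python
import numpy as np

def matern52(a, b, ell, sigma):
    """Matérn 5/2 covariance between two 1-D arrays of inputs."""
    r = np.abs(a[:, None] - b[None, :]) / ell
    return sigma**2 * (1 + np.sqrt(5) * r + 5 * r**2 / 3) * np.exp(-np.sqrt(5) * r)

def gp_posterior(theta, xi, grid, ell=1.0, sigma=10.0, noise_sd=0.1):
    """Step 2: zero-mean GP posterior mean and sd on `grid` from noisy (theta, xi)."""
    K = matern52(theta, theta, ell, sigma) + noise_sd**2 * np.eye(len(theta))
    Ks = matern52(grid, theta, ell, sigma)
    mean = Ks @ np.linalg.solve(K, xi)
    var = sigma**2 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.sqrt(np.maximum(var, 0.0))

def noisy_log_posterior(theta, rng):
    """Step 1 stand-in (hypothetical): noisy toy log-posterior peaked at 1.5."""
    return -0.5 * (theta - 1.5) ** 2 / 0.3**2 + 0.1 * rng.normal()

def gpo_laplace(n_iter=25, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(-1.0, 4.0, 501)
    thetas = rng.uniform(-1.0, 4.0, size=3).tolist()
    xis = [noisy_log_posterior(t, rng) for t in thetas]
    for _ in range(n_iter):
        th, xi = np.array(thetas), np.array(xis)
        # centre the evaluations so that a zero prior mean is not absurd
        mean, sd = gp_posterior(th, xi - xi.mean(), grid)
        # Step 3: next design point maximises an upper confidence bound (assumed rule)
        theta_next = float(grid[np.argmax(mean + 2.0 * sd)])
        thetas.append(theta_next)
        xis.append(noisy_log_posterior(theta_next, rng))
    th, xi = np.array(thetas), np.array(xis)
    mean, _ = gp_posterior(th, xi - xi.mean(), grid)
    # Laplace approximation of the surrogate: mode and inverse negative curvature
    i = int(np.clip(np.argmax(mean), 1, len(grid) - 2))
    h = grid[1] - grid[0]
    curvature = (mean[i - 1] - 2 * mean[i] + mean[i + 1]) / h**2
    return float(grid[i]), float(-1.0 / curvature)
```

Calling `gpo_laplace()` returns the surrogate mode and the variance of the normal approximation; on this toy target the mode should land in the vicinity of 1.5.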

In the first step, a standard particle filter cannot be used to approximate the observed log-posterior at θ because the conditional density of observed given latent is intractable. The solution is to use ABC for the HMM model, in the spirit of many papers by Ajay Jasra and co-authors. However, I find the construction of the substitute model allowing for a particle filter very obscure… (A side effect of the heat wave?!) I can spot a noisy ABC feature in equation (7), but am at a loss as to how the reparameterisation by the transform τ is compatible with the observed-given-latent conditional being unavailable: if the pair (x,v) at time t has a closed form expression, so does (x,y), at least in principle, since y is a deterministic transform of (x,v). Another thing I do not catch is why having a particle filter available prevents the use of a pMCMC approximation.
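
For readers unfamiliar with ABC filtering in HMMs, the generic idea can be sketched as below. This is not the substitute model of the paper (which is precisely what I find obscure) but a plain bootstrap filter where the unavailable observation density is replaced with a Gaussian kernel comparing the actual observation with a pseudo-observation simulated from each particle. The AR(1) latent dynamics, the function name `abc_particle_filter`, the user-supplied `simulate_obs` and the bandwidth `eps` are all assumptions made for the illustration.

```python
import numpy as np

def abc_particle_filter(y, phi, sigma_x, simulate_obs, eps, n_particles=500, seed=0):
    """ABC bootstrap filter for x_t = phi * x_{t-1} + sigma_x * noise (|phi| < 1).

    The observation density is never evaluated: each particle simulates a
    pseudo-observation and is weighted by a Gaussian kernel of bandwidth eps.
    Returns an estimate of the (ABC-perturbed) log-likelihood.
    """
    rng = np.random.default_rng(seed)
    # start from the stationary distribution of the latent AR(1) process
    x = rng.normal(0.0, sigma_x / np.sqrt(1.0 - phi**2), size=n_particles)
    loglik = 0.0
    for y_t in y:
        # propagate particles through the latent dynamics
        x = phi * x + sigma_x * rng.normal(size=n_particles)
        # simulate pseudo-observations instead of evaluating p(y_t | x_t)
        y_sim = simulate_obs(x, rng)
        logw = -0.5 * ((y_t - y_sim) / eps) ** 2 - np.log(eps * np.sqrt(2.0 * np.pi))
        # log-mean-exp of the weights gives the likelihood increment
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())
        # multinomial resampling
        x = rng.choice(x, size=n_particles, p=w / w.sum())
    return loglik
```

With a Gaussian kernel this amounts to running a standard filter on observations perturbed by N(0, eps²) noise, which is the noisy ABC interpretation.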

The second step constructs a Gaussian process posterior on the log-likelihood, with Gaussian errors on the ξ’s. The Gaussian process mean is chosen as zero, while the covariance function is a Matérn function, with hyperparameters that are estimated by maximum likelihood (based on the argument that the marginal likelihood is available in closed form), which turns the approach into an empirical Bayes version.
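
To make the closed-form argument concrete: for a zero-mean Gaussian process with covariance matrix K (kernel plus noise variance on the diagonal), the log marginal likelihood of the ξ’s is −½ ξᵀK⁻¹ξ − ½ log|K| − (n/2) log 2π. The sketch below evaluates it through a Cholesky factor and maximises it over the length-scale alone by a crude grid search. The Matérn smoothness 5/2, the grid, and the choice of holding the other hyperparameters fixed are my assumptions, not details taken from the paper.

```python
import numpy as np

def matern52(a, b, ell, sigma_f):
    """Matérn 5/2 covariance between two 1-D arrays of inputs."""
    r = np.abs(a[:, None] - b[None, :]) / ell
    return sigma_f**2 * (1 + np.sqrt(5) * r + 5 * r**2 / 3) * np.exp(-np.sqrt(5) * r)

def log_marginal_likelihood(theta, xi, ell, sigma_f, noise_sd):
    """Closed-form log marginal likelihood of a zero-mean GP with Gaussian errors."""
    n = len(theta)
    K = matern52(theta, theta, ell, sigma_f) + noise_sd**2 * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, xi))  # K^{-1} xi
    # sum of log diag(L) equals half the log-determinant of K
    return -0.5 * xi @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

def fit_length_scale(theta, xi, sigma_f, noise_sd, grid=None):
    """Empirical Bayes step reduced to one hyperparameter: grid-search MLE of ell."""
    if grid is None:
        grid = np.logspace(-1, 1, 41)
    values = [log_marginal_likelihood(theta, xi, ell, sigma_f, noise_sd) for ell in grid]
    return float(grid[int(np.argmax(values))])
```

A full version would optimise all hyperparameters jointly with a numerical optimiser; the grid search only keeps the example short.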

The next design point in the sequence of θ’s is the argument of the maximum of a certain acquisition function, which is chosen here as a sort of maximum regret under the posterior predictive of the Gaussian process, with possible jittering. At this stage, it reminded me of the Gaussian process approach proposed by Michael Gutmann in his NIPS poster last year.

Overall, the method is just too convoluted for me to assess its worth and efficiency without a practical implementation to… practice upon, for which I do not have time! Hence I would welcome any comment from readers having attempted such implementations. I also wonder at the lack of link with Simon Wood’s Gaussian approximation that appeared in Nature (2010) and was well-discussed in the Read Paper of Fearnhead and Prangle (2012).

Approximate Bayesian Computation in state space models

Posted in Statistics, Travel, University life on October 2, 2014 by xi'an

While it took quite a while (!), with several visits by three of us to our respective antipodes, incl. my exciting trip to Melbourne and Monash University two years ago, our paper on ABC for state space models was arXived yesterday! Thanks to my coauthors, Gael Martin, Brendan McCabe, and Worapree Maneesoonthorn, I am very glad of this outcome and of the new perspective on ABC it produces. For one thing, it concentrates on the selection of summary statistics from a more econometrics than usual point of view, defining asymptotic sufficiency in this context and demonstrating that both asymptotic sufficiency and Bayes consistency can be achieved when using maximum likelihood estimators of the parameters of an auxiliary model as summary statistics. In addition, the proximity to (asymptotic) sufficiency yielded by the MLE is replicated by the score vector. Using the score instead of the MLE as a summary statistic allows for huge gains in terms of speed, since the auxiliary model no longer needs to be refitted on every simulated dataset. The method is then applied to a continuous time state space model, using as auxiliary model an augmented unscented Kalman filter. We also found in the various state space models tested therein that the ABC approach based on the marginal [likelihood] score was performing quite well, including wrt Fearnhead’s and Prangle’s (2012) approach… I like the idea of using such a generic object as the unscented Kalman filter for state space models, even when it is not a particularly accurate representation of the true model. Another appealing feature of the paper is in the connections made with indirect inference.
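
To illustrate the score-as-summary idea on something much simpler than the models of the paper, here is a toy Python sketch. The "true" model is an MA(1) series with parameter θ and the auxiliary model is a Gaussian AR(1), both chosen by me for the illustration, as are the uniform prior, the function names and the Euclidean distance on the average score. The auxiliary MLE is computed once, on the observed series, where the score is zero by construction; each simulated series then only requires evaluating the auxiliary score at that fixed value, which is where the speed gain comes from.

```python
import numpy as np

def fit_ar1(y):
    """Conditional MLE of the Gaussian AR(1) auxiliary model: (rho, s2)."""
    rho = np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)
    s2 = np.mean((y[1:] - rho * y[:-1]) ** 2)
    return rho, s2

def aux_score(z, rho, s2):
    """Average score of the AR(1) log-likelihood at (rho, s2); z is (..., n)."""
    resid = z[..., 1:] - rho * z[..., :-1]
    d_rho = np.mean(resid * z[..., :-1], axis=-1) / s2
    d_s2 = -0.5 / s2 + 0.5 * np.mean(resid**2, axis=-1) / s2**2
    return np.stack([d_rho, d_s2], axis=-1)

def simulate_ma1(theta, n, rng):
    """Simulate MA(1) series z_t = e_t + theta * e_{t-1}, one row per theta."""
    theta = np.atleast_1d(theta)
    e = rng.normal(size=(len(theta), n + 1))
    return e[:, 1:] + theta[:, None] * e[:, :-1]

def abc_score(y, n_sims=5000, keep=0.01, seed=1):
    """ABC rejection keeping the simulations whose auxiliary score is closest to zero."""
    rng = np.random.default_rng(seed)
    rho_hat, s2_hat = fit_ar1(y)                    # auxiliary model fitted once
    theta = rng.uniform(-1.0, 1.0, size=n_sims)     # assumed uniform prior
    z = simulate_ma1(theta, len(y), rng)
    # the observed score is zero at the MLE, so the distance is the score norm
    dist = np.linalg.norm(aux_score(z, rho_hat, s2_hat), axis=1)
    return theta[np.argsort(dist)[: int(keep * n_sims)]]
```

The returned array is an ABC sample from the approximate posterior on θ; the two score components are left unweighted here, while a more careful version would scale them by an estimate of their covariance.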

Statistical modeling and computation [apologies]

Posted in Books, R, Statistics, University life on June 11, 2014 by xi'an

In my book review of the recent book by Dirk Kroese and Joshua Chan,  Statistical Modeling and Computation, I mistakenly and persistently typed the name of the second author as Joshua Chen. This typo alas made it to the printed and on-line versions of the subsequent CHANCE 27(2) column. I am thus very much sorry for this mistake of mine and most sincerely apologise to the authors. Indeed, it always annoys me to have my name mistyped (usually as Roberts!) in references.  [If nothing else, this typo signals it is high time for a change of my prescription glasses.]

Statistical modeling and computation [book review]

Posted in Books, R, Statistics, University life on January 22, 2014 by xi'an

Dirk Kroese (from UQ, Brisbane) and Joshua Chan (from ANU, Canberra) just published a book entitled Statistical Modeling and Computation, distributed by Springer-Verlag (I cannot tell which series it is part of from the cover or frontpages…) The book is intended mostly for an undergrad audience (or for graduate students with no probability or statistics background). Given that prerequisite, Statistical Modeling and Computation is fairly standard in that it recalls probability basics, the principles of statistical inference, and classical parametric models. In a third part, the authors cover “advanced models” like generalised linear models, time series and state-space models. The specificity of the book lies in the inclusion of simulation methods, in particular MCMC methods, and illustrations by Matlab code boxes. (Codes that are available on the companion website, along with R translations.) It thus has a lot in common with our Bayesian Essentials with R, meaning that I am not the most appropriate or least biased reviewer for this book.

particle efficient importance sampling

Posted in Statistics on October 15, 2013 by xi'an

Marcel Scharth and Robert Kohn just arXived a new article entitled “particle efficient importance sampling”. What is the efficiency about?! The spectacular diminution in variance (the authors mention a factor of 6,000 when compared with regular particle filters!) in a stochastic volatility simulation study.

If I got the details right, the improvement stems from a paper by Richard and Zhang (Journal of Econometrics, 2007). In a state-space/hidden Markov model setting, (non-sequential) importance sampling tries to approximate the smoothing distribution one term at a time, i.e. p(x_t|x_{t-1},y_{1:n}), but Richard and Zhang (2007) modify the target by looking at


where the last term χ(x_{t-1},y_{1:n}) is the normalising constant of the proposal kernel for the previous (in t-1) target, k(x_{t-1}|x_{t-2},y_{1:n}). This kernel is actually parameterised as k(x_{t-1}|x_{t-2},a_t(y_{1:n})) and the EIS algorithm optimises those parameters, one term at a time. The current paper expands Richard and Zhang (2007) by using particles to approximate the likelihood contribution and reduce the variance once the “optimal” EIS solution is obtained. (They also reproduce Richard’s and Zhang’s tricks of relying on the same common random numbers.)

This approach sounds like a “miracle” to me, in the sense(s) that (a) the “normalising constant” is far from being uniquely defined (and just as far from being constant in the parameter a_t) and (b) it is unrelated to the target distribution (except for the optimisation step). In the extreme case when the normalising constant is also constant… in a_t, this step clearly is useless. (This also opens the potential for an optimisation in the choice of χ(x_{t-1},y_{1:n})…)

The simulation study starts from a univariate stochastic volatility model relying on two hidden correlated AR(1) models. (There may be a typo in the definition in Section 4.1, i.e. a Φ_i missing.) In those simulations, EIS brings a significant variance reduction when compared with standard particle filters and particle EIS further improves upon EIS by a factor of 2 to 20 (in the variance). I could not spot in the paper which choice had been made for χ()… which is annoying as I gathered from my reading that it must have a strong impact on the efficiency attached to the name of the method!

