Archive for state space model

ABC forecasts

Posted in Books, pictures, Statistics with tags , , , , , , , , on January 9, 2018 by xi'an

My friends and co-authors David Frazier, Gael Martin, Brendan McCabe, and Worapree Maneesoonthorn arXived a paper on ABC forecasting at the turn of the year. ABC prediction is a natural extension of ABC inference in that, provided the full conditional of a future observation given past data and parameters is available but the posterior is not, ABC simulations of the parameters induce an approximation of the predictive. The paper thus considers the impact of this extension on the precision of the predictions. And argues that it is possible that this approximation is preferable to running MCMC in some settings. A first interesting result is that using ABC and hence conditioning on an insufficient summary statistic has no asymptotic impact on the resulting prediction, provided Bayesian concentration of the corresponding posterior takes place as in our convergence paper under revision.

“…conditioning inference about θ on η(y) rather than y makes no difference to the probabilistic statements made about [future observations]”

The above result holds both in terms of convergence in total variation and for proper scoring rules. Even though there is always a loss in accuracy in using ABC. Now, one may think this is a direct consequence of our (and others) earlier convergence results, but numerical experiments on standard time series show the distinct feature that, while the [MCMC] posterior and ABC posterior distributions on the parameters clearly differ, the predictives are more or less identical! With a potential speed gain in using ABC, although comparing parallel ABC versus non-parallel MCMC is rather delicate. For instance, a preliminary parallel ABC could be run as a burnin’ step for parallel MCMC, since all chains would then be roughly in the stationary regime. Another interesting outcome of these experiments is a case when the summary statistics produces a non-consistent ABC posterior, but still leads to a very similar predictive, as shown on this graph.This unexpected accuracy in prediction may further be exploited in state space models, towards producing particle algorithms that are greatly accelerated. Of course, an easy objection to this acceleration is that the impact of the approximation is unknown and un-assessed. However, such an acceleration leaves room for multiple implementations, possibly with different sets of summaries, to check for consistency over replicates.

ABC in Stockholm [on-board again]

Posted in Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , on May 18, 2016 by xi'an

abcruiseAfter a smooth cruise from Helsinki to Stockholm, a glorious sunrise over the Ålend Islands, and a morning break for getting an hasty view of the city, ABC in Helsinki (a.k.a. ABCruise) resumed while still in Stockholm. The first talk was by Laurent Calvet about dynamic (state-space) models, when the likelihood is not available and replaced with a proximity between the observed and the simulated observables, at each discrete time in the series. The authors are using a proxy predictive for the incoming observable and derive an optimal—in a non-parametric sense—bandwidth based on this proxy. Michael Gutmann then gave a presentation that somewhat connected with his talk at ABC in Roma, and poster at NIPS 2014, about using Bayesian optimisation to reduce the rejections in ABC algorithms. Which means building a model of a discrepancy or distance by Bayesian optimisation. I definitely like this perspective as it reduces the simulation to one of a discrepancy (after a learning step). And does not require a threshold. Aki Vehtari expanded on this idea with a series of illustrations. A difficulty I have with the approach is the construction of the acquisition function… The last session while pretty late was definitely exciting with talks by Richard Wilkinson on surrogate or emulator models, which goes very much in a direction I support, namely that approximate models should be accepted on their own, by Julien Stoehr with clustering and machine learning tools to incorporate more summary statistics, and Tim Meeds who concluded with two (small) talks!, centred on the notion of deterministic algorithms that explicitly incorporate the random generators within the comparison, resulting in post-simulation recentering à la Beaumont et al. (2003), plus new advances with further incorporations of those random generators turned deterministic functions within variational Bayes inference

On Wednesday morning, we will land back in Helsinki and head back to our respective homes, after another exciting ABC in… workshop. I am terribly impressed by the way this workshop at sea operated, providing perfect opportunities for informal interactions and collaborations, without ever getting claustrophobic or dense. Enjoying very long days also helped. While it seems unlikely we can repeat this successful implementation, I hope we can aim at similar formats in the coming occurrences. Kitos paljon to our Finnish hosts!

can we trust computer simulations? [day #2]

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , , on July 13, 2015 by xi'an

Herrenhausen“Sometimes the models are better than the data.” G. Krinner

Second day at the conference on building trust in computer simulations. Starting with a highly debated issue, climate change projections. Since so many criticisms are addressed to climate models as being not only wrong but also unverifiable. And uncheckable. As explained by Gerhart Krinner, the IPCC has developed methodologies to compare models and evaluate predictions. However, from what I understood, this validation does not say anything about the future, which is the part of the predictions that matters. And that is attacked by critics and feeds climatic-skeptics. Because it is so easy to argue against the homogeneity of the climate evolution and for “what you’ve seen is not what you’ll get“! (Even though climatic-skeptics are the least likely to use this time-heterogeneity argument, being convinced as they are of the lack of human impact over the climate.)  The second talk was by Viktoria Radchuk about validation in ecology. Defined here as a test of predictions against independent data (and designs). And mentioning Simon Wood’s synthetic likelihood as the Bayesian reference for conducting model choice (as a synthetic likelihoods ratio). I had never thought of this use (found in Wood’s original paper) for synthetic likelihood, I feel a bit queasy about using a synthetic likelihood ratio as a genuine likelihood ratio. Which led to a lively discussion at the end of her talk. The next talk was about validation in economics by Matteo Richiardi, who discussed state-space models where the hidden state is observed through a summary statistic, perfect playground for ABC! But Matteo opted instead for a non-parametric approach that seems to increase imprecision and that I have never seen used in state-space models. The last part of the talk was about non-ergodic models, for which checking for validity becomes much more problematic, in my opinion. Unless one manages multiple observations of the non-ergodic path. Nicole Saam concluded this “Validation in…” morning with Validation in Sociology. With a more pessimistic approach to the possibility of finding a falsifying strategy, because of the vague nature of sociology models. For which data can never be fully informative. She illustrated the issue with an EU negotiation analysis. Where most hypotheses could hardly be tested.

“Bayesians persist with poor examples of randomness.” L. Smith

“Bayesians can be extremely reasonable.” L. Smith

The afternoon session was dedicated to methodology, mostly statistics! Andrew Robinson started with a talk on (frequentist) model validation. Called splitters and lumpers. Illustrated by a forest growth model. He went through traditional hypothesis tests like Neyman-Pearson’s that try to split between samples. And (bio)equivalence tests that take difference as the null. Using his equivalence R package. Then Leonard Smith took over [in a literal way!] from a sort-of-Bayesian perspective, in a work joint with Jim Berger and Gary Rosner on pragmatic Bayes which was mostly negative about Bayesian modelling. Introducing (to me) the compelling notion of structural model error as a representation of the inadequacy of the model. With illustrations from weather and climate models. His criticism of the Bayesian approach is that it cannot be holistic while pretending to be [my wording]. And being inadequate to measure model inadequacy, to the point of making prior choice meaningless. Funny enough, he went back to the ball dropping experiment David Higdon discussed at one JSM I attended a while ago, with the unexpected outcome that one ball did not make it to the bottom of the shaft. A more positive side was that posteriors are useful models but should not be interpreted from a probabilistic perspective. Move beyond probability was his final message. (For most of the talk, I misunderstood P(BS), the probability of a big surprise, for something else…) This was certainly the most provocative talk of the conference  and the discussion could have gone on for the rest of day! Somewhat, Lenny was voluntarily provocative in piling the responsibility upon the Bayesian’s head for being overconfident and not accounting for the physicist’ limitations in modelling the phenomenon of interest. Next talk was by Edward Dougherty on methods used in biology. He separated within-model uncertainty from outside-model inadequacy. The within model part is mostly easy to agree upon. Even though difficulties in estimating parameters creates uncertainty classes of models. Especially because of being from a small data discipline. He analysed the impact of machine learning techniques like classification as being useless without prior knowledge. And argued in favour of the Bayesian minimum mean square error estimator. Which can also lead to a classifier. And experimental design. (Using MSE seems rather reductive when facing large dimensional parameters.) Last talk of the day was by Nicolas Becu, a geographer, with a surprising approach to validation via stakeholders. A priori not too enticing a name! The discussion was of a more philosophical nature, going back to (re)define validation against reality and imperfect models. And including social aspects of validation, e.g., reality being socially constructed. This led to the stakeholders, because a model is then a shared representation. Nicolas illustrated the construction by simulation “games” of a collective model in a community of Thai farmers and in a group of water users.

In a rather unique fashion, we also had an evening discussion on points we share and points we disagreed upon. After dinner (and wine), which did not help I fear! Bill Oberkampf mentioned the use of manufactured solutions to check code, which seemed very much related to physics. But then we got mired into the necessity of dividing between verification and validation. Which sounded very and too much engineering-like to me. Maybe because I do not usually integrate coding errors and algorithmic errors into my reasoning (verification)… Although sharing code and making it available makes a big difference. Or maybe because considering all models are wrong is neither part of my methodology (validation). This part ended up in a fairly pessimistic conclusion on the lack of trust in most published articles. At least in the biological sciences.

efficient approximate Bayesian inference for models with intractable likelihood

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , on July 6, 2015 by xi'an

Awalé board on my garden table, March 15, 2013Dalhin, Villani [Mattias, not Cédric] and Schön arXived a paper this week with the above title. The type of intractable likelihood they consider is a non-linear state-space (HMM) model and the SMC-ABC they propose is based on an optimised Laplace approximation. That is, replacing the posterior distribution on the parameter θ with a normal distribution obtained by a Taylor expansion of the log-likelihood. There is no obvious solution for deriving this approximation in the case of intractable likelihood functions and the authors make use of a Bayesian optimisation technique called Gaussian process optimisation (GPO). Meaning that the Laplace approximation is the Laplace approximation of a surrogate log-posterior. GPO is a Bayesian numerical method in the spirit of the probabilistic numerics discussed on the ‘Og a few weeks ago. In the current setting, this means iterating three steps

  1. derive an approximation of the log-posterior ξ at the current θ using SMC-ABC
  2. construct a surrogate log-posterior by a Gaussian process using the past (ξ,θ)’s
  3. determine the next value of θ

In the first step, a standard particle filter cannot be used to approximate the observed log-posterior at θ because the conditional density of observed given latent is intractable. The solution is to use ABC for the HMM model, in the spirit of many papers by Ajay Jasra and co-authors. However, I find the construction of the substitute model allowing for a particle filter very obscure… (A side effect of the heat wave?!) I can spot a noisy ABC feature in equation (7), but am at a loss as to how the reparameterisation by the transform τ is compatible with the observed-given-latent conditional being unavailable: if the pair (x,v) at time t has a closed form expression, so does (x,y), at least on principle, since y is a deterministic transform of (x,v). Another thing I do not catch is why having a particle filter available prevent the use of a pMCMC approximation.

The second step constructs a Gaussian process posterior on the log-likelihood, with Gaussian errors on the ξ’s. The Gaussian process mean is chosen as zero, while the covariance function is a Matérn function. With hyperparameters that are estimated by maximum likelihood estimators (based on the argument that the marginal likelihood is available in closed form). Turning the approach into an empirical Bayes version.

The next design point in the sequence of θ’s is the argument of the maximum of a certain acquisition function, which is chosen here as a sort of maximum regret associated with the posterior predictive associated with the Gaussian process. With possible jittering. At this stage, it reminded me of the Gaussian process approach proposed by Michael Gutmann in his NIPS poster last year.

Overall, the method is just too convoluted for me to assess its worth and efficiency without a practical implementation to… practice upon, for which I do not have time! Hence I would welcome any comment from readers having attempted such implementations. I also wonder at the lack of link with Simon Wood‘s Gaussian approximation that appeared in Nature (2010) and was well-discussed in the Read Paper of Fearnhead and Prangle (2012).

Approximate Bayesian Computation in state space models

Posted in Statistics, Travel, University life with tags , , , , , , , on October 2, 2014 by xi'an

While it took quite a while (!), with several visits by three of us to our respective antipodes, incl. my exciting trip to Melbourne and Monash University two years ago, our paper on ABC for state space models was arXived yesterday! Thanks to my coauthors, Gael Martin, Brendan McCabe, and  Worapree Maneesoonthorn,  I am very glad of this outcome and of the new perspective on ABC it produces.  For one thing, it concentrates on the selection of summary statistics from a more econometrics than usual point of view, defining asymptotic sufficiency in this context and demonstrated that both asymptotic sufficiency and Bayes consistency can be achieved when using maximum likelihood estimators of the parameters of an auxiliary model as summary statistics. In addition, the proximity to (asymptotic) sufficiency yielded by the MLE is replicated by the score vector. Using the score instead of the MLE as a summary statistics allows for huge gains in terms of speed. The method is then applied to a continuous time state space model, using as auxiliary model an augmented unscented Kalman filter. We also found in the various state space models tested therein that the ABC approach based on the marginal [likelihood] score was performing quite well, including wrt Fearnhead’s and Prangle’s (2012) approach… I like the idea of using such a generic object as the unscented Kalman filter for state space models, even when it is not a particularly accurate representation of the true model. Another appealing feature of the paper is in the connections made with indirect inference.

Statistical modeling and computation [apologies]

Posted in Books, R, Statistics, University life with tags , , , , , , , , , , , on June 11, 2014 by xi'an

In my book review of the recent book by Dirk Kroese and Joshua Chan,  Statistical Modeling and Computation, I mistakenly and persistently typed the name of the second author as Joshua Chen. This typo alas made it to the printed and on-line versions of the subsequent CHANCE 27(2) column. I am thus very much sorry for this mistake of mine and most sincerely apologise to the authors. Indeed, it always annoys me to have my name mistyped (usually as Roberts!) in references.  [If nothing else, this typo signals it is high time for a change of my prescription glasses.]

Statistical modeling and computation [book review]

Posted in Books, R, Statistics, University life with tags , , , , , , , , , , , , , on January 22, 2014 by xi'an

Dirk Kroese (from UQ, Brisbane) and Joshua Chan (from ANU, Canberra) just published a book entitled Statistical Modeling and Computation, distributed by Springer-Verlag (I cannot tell which series it is part of from the cover or frontpages…) The book is intended mostly for an undergrad audience (or for graduate students with no probability or statistics background). Given that prerequisite, Statistical Modeling and Computation is fairly standard in that it recalls probability basics, the principles of statistical inference, and classical parametric models. In a third part, the authors cover “advanced models” like generalised linear models, time series and state-space models. The specificity of the book lies in the inclusion of simulation methods, in particular MCMC methods, and illustrations by Matlab code boxes. (Codes that are available on the companion website, along with R translations.) It thus has a lot in common with our Bayesian Essentials with R, meaning that I am not the most appropriate or least unbiased reviewer for this book. Continue reading