**E**arlier this year, Luca Martino wrote and arXived a review on multiple try MCMC. As its name suggests, the starting point of this algorithm is to propose N potential moves simultaneously instead of one, possibly according to N different proposal (conditional) densities, and to select one by a normalised importance sampling weight. The move is accepted by a Metropolis-Hastings step based on the ratio of the normalisation constants [at the current and at the one-before-current stages]. Besides the cost of computing the summation and generating the different variates, this method also faces the drawback of requiring N-1 supplementary simulations that are only used for achieving detailed balance and computing a backward summation of importance weights. (A first section of the review is dedicated to independent Metropolis-Hastings proposals, q(θ), which make life simpler, but are less realistic in my opinion since some prior knowledge or experimentation is necessary to build a relevant distribution q(θ).) An alternative covered in the survey is ensemble Monte Carlo (Neal, 2011), which produces a whole sample at each iteration, with target the product of the initial targets. This reminded me of our pinball sampler, which aimed at producing a spread-out sample while keeping the marginal correct. Although the motivation sounds closer to a particle sampler. Especially with this associated notion of an empirical approximation of the target. The next part of the review is about delayed rejection, which is a natural alternative approach to speeding up MCMC by considering several possibilities, if sequentially. Started in Antonietta Mira‘s 1999 PhD thesis. The difficulty with this approach is that the acceptance probability gets increasingly complex as the number of delays grows, which may annihilate its appeal relative to simultaneous multiple tries.

## Archive for particle filter

## MCMC with multiple tries

Posted in Books, pictures, Statistics, University life with tags All Blacks, delayed acceptance, ensemble Monte Carlo, MCMC, Monte Carlo Statistical Methods, multiple-try Metropolis algorithm, particle filter, population Monte Carlo, rugby, survey on April 5, 2018 by xi'an## stability of noisy Metropolis-Hastings

Posted in Statistics with tags Markov chain, Markov chain Monte Carlo algorithm, MCMC convergence, particle filter, pseudo-marginal MCMC, sequential Monte Carlo, University of Warwick on September 28, 2016 by xi'an**F**elipe Medina-Aguayo, Antony Lee and Gareth Roberts (all at Warwick University) have recently published—even though the paper was accepted a year ago—a paper in Statistics and Computing about a variant to the pseudo-marginal Metropolis-Hastings algorithm. The modification is to simulate an estimate of the likelihood or posterior at the current value of the Markov chain at every iteration, rather than reproducing the current estimate. The reason for this refreshment of the weight estimate is to prevent stickiness in the chain, when a random weight leads to a very high value of the posterior. Unfortunately, this change leads to a Markov chain with the wrong stationary distribution. When this stationary exists! The paper actually produces examples of transient noisy chains, even in simple cases such as a geometric target distribution. And even when taking the average of a large number of weights. But the paper also contains sufficient conditions, like negative weight moments or uniform ergodicity of the proposal, for the noisy chain to be geometrically ergodic. Even though the applicability of those conditions to complex targets is not always obvious.

## multilevel Monte Carlo for estimating constants

Posted in Books, Statistics, University life with tags multilevel Monte Carlo, normalising constant, particle filter, sequential Monte Carlo, telescoping formula, unbiased estimation on March 18, 2016 by xi'an**P**ierre Del Moral, Ajay Jasra, Kody Law, and Yan Zhou just arXived a paper entitled Sequential Monte Carlo samplers for normalizing constants. Which obviously attracted my interest! The context is one of a sequential Monte Carlo problem, with an associated sequence of targets and of attached normalising constants. While the quantity of interest only relates to the final distribution in the sequence, using Mike Giles‘ multilevel Monte Carlo approach allows for a more accurate estimation and recycling all the past particles, thanks to the telescoping formula. And the sequential representation also allows for an unbiased estimator, as is well known in the sequential Monte Carlo literature. The paper derives accurate bounds on both the variances of two normalisation constant estimators and the costs of producing such estimators (assuming there is an index typo in Corollary 3.1, where L-2 should be L-1). The improvement when compared with traditional SMC is clear on the example contained in the paper. As I read the paper rather quickly and without much attention to the notations, I may have missed the point, but I did not see any conclusion on the choice of the particle population size at each iteration of the SMC. After asking Ajay about it, he pointed out that this size can be derived as

(with notations taken from the paper).

## at CIRM [#3]

Posted in Kids, Mountains, pictures, Running, Statistics, Travel, University life with tags ABC, ABC-SMC, Bayesian statistics, CIRM, component of a mixture, cross validated, expectation-propagation, high dimensions, identifiability, Luminy, Marseille, MCMC, Mont Puget, Monte Carlo Statistical Methods, particle filter, particle Gibbs sampler, summer school on March 4, 2016 by xi'an**S**imon Barthelmé gave his mini-course on EP, with loads of details on the implementation of the method. Focussing on the EP-ABC and MCMC-EP versions today. Leaving open the difficulty of assessing to which limit EP is converging. But mentioning the potential for asynchronous EP (on which I would like to hear more). Ironically using several times a logistic regression example, if not on the Pima Indians benchmark! He also talked about approximate EP solutions that relate to consensus MCMC. With a connection to Mark Beaumont’s talk at NIPS [at the time as mine!] on the comparison with ABC. While we saw several talks on EP during this week, I am still agnostic about the potential of the approach. It certainly produces a fast proxy to the true posterior and hence can be exploited *ad nauseam* in inference methods based on pseudo-models like indirect inference. In conjunction with other quick and dirty approximations when available. As in ABC, it would be most useful to know how far from the (ideal) posterior distribution does the approximation stands. Machine learning approaches presumably allow for an evaluation of the predictive performances, but less so for the modelling accuracy, even with new sampling steps. [But I know nothing, I know!]

Dennis Prangle presented some on-going research on high dimension [data] ABC. Raising the question of what is the true meaning of dimension in ABC algorithms. Or of sample size. Because the inference relies on the event d(s(y),s(y’))≤ξ or on the likelihood l(θ|x). Both one-dimensional. Mentioning Iain Murray’s talk at NIPS [that I also missed]. Re-expressing as well the perspective that ABC can be seen as a missing or estimated normalising constant problem as in Bornn et al. (2015) I discussed earlier. The central idea is to use SMC to simulate a particle cloud evolving as the target tolerance ξ decreases. Which supposes a latent variable structure lurking in the background.

Judith Rousseau gave her talk on non-parametric mixtures and the possibility to learn parametrically about the component weights. Starting with a rather “magic” result by Allman et al. (2009) that three repeated observations per individual, all terms in a mixture are identifiable. Maybe related to that simpler fact that mixtures of Bernoullis are not identifiable while mixtures of Binomial are identifiable, even when n=2. As “shown” in this plot made for X validated. Actually truly related because Allman et al. (2009) prove identifiability through a finite dimensional model. (I am surprised I missed this most interesting paper!) With the side condition that a mixture of p components made of r Bernoulli products is identifiable when p ≥ 2[log² r] +1, when log² is base 2-logarithm. And [x] the upper rounding. I also find most relevant this distinction between the weights and the remainder of the mixture as weights behave quite differently, hardly parameters in a sense.

## Mathematical underpinnings of Analytics (theory and applications)

Posted in Books, Statistics, University life with tags An Essay towards solving a Problem in the Doctrine of Chances, analytics, Bayes factor, Bayesian Analysis, book review, bootstrap, CHANCE, Friedrich Nietzsche, genetic algorithm, Laplace succession rule, logistic regression, machine learning, particle filter, Patti Smith, The Bayesian Choice, Yogi Berra on September 25, 2015 by xi'an

“Today, a week or two spent reading Jaynes’ book can be a life-changing experience.” (p.8)

**I** received this book by Peter Grindrod, Mathematical underpinnings of Analytics (theory and applications), from Oxford University Press, quite a while ago. (Not that long ago since the book got published in 2015.) As a book for review for CHANCE. And let it sit on my desk and in my travel bag for the same while as it was unclear to me that it was connected with Statistics and CHANCE. What is [are?!] *analytics*?! I did not find much of a definition of *analytics* when I at last opened the book, and even less mentions of statistics or machine-learning, but Wikipedia told me the following:

“Analytics is a multidimensional discipline. There is extensive use of mathematics and statistics, the use of descriptive techniques and predictive models to gain valuable knowledge from data—data analysis. The insights from data are used to recommend action or to guide decision making rooted in business context. Thus, analytics is not so much concerned with individual analyses or analysis steps, but with the entire methodology.”

Barring the absurdity of speaking of a “multidimensional discipline” [and even worse of linking with the mathematical notion of dimension!], this tells me analytics is a mix of data analysis and decision making. Hence relying on (some) statistics. Fine.

“Perhaps in ten years, time, the mathematics of behavioural analytics will be common place: every mathematics department will be doing some of it.”(p.10)

First, and to start with some positive words (!), a book that quotes both Friedrich Nietzsche and Patti Smith cannot get everything wrong! (Of course, including a most likely apocryphal quote from the now late Yogi Berra does not partake from this category!) Second, from a general perspective, I feel the book meanders its way through chapters towards a higher level of statistical consciousness, from graphs to clustering, to hidden Markov models, without precisely mentioning statistics or statistical model, while insisting very much upon Bayesian procedures and Bayesian thinking. Overall, I can relate to most items mentioned in Peter Grindrod’s book, but mostly by first reconstructing the notions behind. While I personally appreciate the distanced and often ironic tone of the book, reflecting upon the author’s experience in retail modelling, I am thus wondering at which audience Mathematical underpinnings of Analytics aims, for a practitioner would have a hard time jumping the gap between the concepts exposed therein and one’s practice, while a theoretician would require more formal and deeper entries on the topics broached by the book. I just doubt this entry will be enough to lead maths departments to adopt behavioural analytics as part of their curriculum… Continue reading

## arXiv frenzy

Posted in R, Statistics, University life with tags arXiv, Bayesian statistics, MCMC, Monte Carlo Statistical Methods, Montréal, NIPS 2015, particle filter on June 23, 2015 by xi'an**I**n the few past days, there has been so many arXiv postings of interest—presumably the NIPS submission effect!—that I cannot hope to cover them in the coming weeks! Hopefully, some will still come out on the ‘Og in a near future:

- arXiv:1506.06629: Scalable Approximations of Marginal Posteriors in Variable Selection by Willem van den Boom, Galen Reeves, David B. Dunson
- arXiv:1506.06285: The MCMC split sampler: A block Gibbs sampling scheme for latent Gaussian models by Óli Páll Geirsson, Birgir Hrafnkelsson, Daniel Simpson, Helgi Sigurðarson
*[also deserves a special mention for gathering only ***son authors!]* - arXiv:1506.06268: Bayesian Nonparametric Modeling of Higher Order Markov Chains by Abhra Sarkar, David B. Dunson
- arXiv:1506.06117: Convergence of Sequential Quasi-Monte Carlo Smoothing Algorithms by Mathieu Gerber, Nicolas Chopin
- arXiv:1506.06101: Robust Bayesian inference via coarsening by Jeffrey W. Miller, David B. Dunson
- arXiv:1506.05934: Expectation Particle Belief Propagation by Thibaut Lienart, Yee Whye Teh, Arnaud Doucet
- arXiv:1506.05860: Variational Gaussian Copula Inference by Shaobo Han, Xuejun Liao, David B. Dunson, Lawrence Carin
- arXiv:1506.05855: The Frequentist Information Criterion (FIC): The unification of information-based and frequentist inference by Colin H. LaMont, Paul A. Wiggins
- arXiv:1506.05757: Bayesian Inference for the Multivariate Extended-Skew Normal Distribution by Mathieu Gerber, Florian Pelgrin
- arXiv:1506.05741: Accelerated dimension-independent adaptive Metropolis by Yuxin Chen, David Keyes, Kody J.H. Law, Hatem Ltaief
- arXiv:1506.05269: Bayesian Survival Model based on Moment Characterization by Julyan Arbel, Antonio Lijoi, Bernardo Nipoti
- arXiv:1506.04778: Fast sampling with Gaussian scale-mixture priors in high-dimensional regression by Anirban Bhattacharya, Antik Chakraborty, Bani K. Mallick
- arXiv:1506.04416: Bayesian Dark Knowledge by Anoop Korattikara, Vivek Rathod, Kevin Murphy, Max Welling
*[a special mention for this title!]* - arXiv:1506.03693: Optimization Monte Carlo: Efficient and Embarrassingly Parallel Likelihood-Free Inference by Edward Meeds, Max Welling
- arXiv:1506.03074: Variational consensus Monte Carlo by Maxim Rabinovich, Elaine Angelino, Michael I. Jordan
- arXiv:1506.02564: Gradient-free Hamiltonian Monte Carlo with Efficient Kernel Exponential Families by Heiko Strathmann, Dino Sejdinovic, Samuel Livingstone, Zoltan Szabo, Arthur Gretton
*[comments coming soon!]*

## Bayesian filtering and smoothing [book review]

Posted in Books, Statistics, Travel, University life with tags book review, CHANCE, EM algorithm, filtering, IMS Textbooks, Kalman filter, MAP estimators, particle filter, particle MCMC, plagiarism, Simo Särkkä, smoothing, The Monty Hall problem on February 25, 2015 by xi'an**W**hen in Warwick last October, I met Simo Särkkä, who told me he had published an IMS monograph on Bayesian filtering and smoothing the year before. I thought it would be an appropriate book to review for CHANCE and tried to get a copy from Oxford University Press, unsuccessfully. I thus bought my own book that I received two weeks ago and took the opportunity of my Czech vacations to read it… *[A warning pre-empting accusations of self-plagiarism: this is a preliminary draft for a review to appear in CHANCE under my true name!]*

“From the Bayesian estimation point of view both the states and the static parameters are unknown (random) parameters of the system.” (p.20)

Bayesian filtering and smoothing is an introduction to the topic that essentially starts from ground zero. Chapter 1 motivates the use of filtering and smoothing through examples and highlights the naturally Bayesian approach to the problem(s). Two graphs illustrate the difference between filtering and smoothing by plotting for the same series of observations the successive confidence bands. The performances are obviously poorer with filtering but the fact that those intervals are point-wise rather than joint, i.e., that the graphs do not provide a confidence band. (The exercise section of that chapter is superfluous in that it suggests re-reading Kalman’s original paper and rephrases the Monty Hall paradox in a story unconnected with filtering!) Chapter 2 gives an introduction to Bayesian statistics in general, with a few pages on Bayesian computational methods. A first remark is that the above quote is both correct and mildly confusing in that the parameters can be consistently estimated, while the latent states cannot. A second remark is that justifying the MAP as associated with the 0-1 loss is incorrect in continuous settings. The third chapter deals with the batch updating of the posterior distribution, i.e., that the posterior at time t is the prior at time t+1. With applications to state-space systems including the Kalman filter. The fourth to sixth chapters concentrate on this Kalman filter and its extension, and I find it somewhat unsatisfactory in that the collection of such filters is overwhelming for a neophyte. And no assessment of the estimation error when the model is misspecified appears at this stage. And, as usual, I find the unscented Kalman filter hard to fathom! The same feeling applies to the smoothing chapters, from Chapter 8 to Chapter 10. Which mimic the earlier ones. Continue reading