Archive for Statistics and Computing

MCMC, with common misunderstandings
Posted in Books, pictures, R, Statistics, University life with tags ABC, Bayesian computing, computational statistics, Gibbs sampling, Handbook of Computational Statistics and Data Science, HMC, IMS Lawrence D. Brown PhD Student Award, MCMC, PhD thesis, Q&A format, Statistics and Computing, survey, variational Bayes methods on January 27, 2020 by xi'an
As I was asked to write a chapter on MCMC methods for an incoming Handbook of Computational Statistics and Data Science, published by Wiley, rather than cautiously declining (!), I decided to recycle the answers I wrote on X validated to what I consider the most characteristic misunderstandings about MCMC and other computing methods, using as background the introduction produced by Wu Changye in his PhD thesis. I am now waiting for the opinion of the editors of the Handbook on this Q&A style. The outcome is certainly lighter than other recent surveys, such as the one written with Peter Green, Krys Łatuszyński, and Marcelo Pereyra for Statistics and Computing, or the one with Victor Elvira, Nick Tawn, and Changye Wu.

Springer no more!
Posted in Books, Kids, Statistics, University life with tags French universities, open access, PCI Comput Stat, predatory publishing, Springer-Verlag, Statistics and Computing on April 4, 2018 by xi'an
Just learned that, starting tomorrow night, I will no longer have access to any of the Springer journals, as the negotiations between Springer and the consortium of French universities, research institutes, higher education schools, and museums have failed. The commercial publisher refuses to stem the ever-increasing subscription fees, while happily taking in the fast-increasing open access fees it pressures out of authors, a unique example of triple taxation (researchers' salaries, open access charges, and enormous non-negotiable subscription rates for the whole package of journals)… This follows the earlier move of their German counterparts. Well, this is an opportunity for the boards of all these journals to withdraw and create phantom versions of their former journals, evaluating and reviewing papers already available on arXiv! And I should definitely get my act together, rise from my winter-is-coming lethargy, and launch PCI Comput Stat now!!!
parameter space for mixture models
Posted in Statistics, University life with tags mixtures of distributions, reparameterisation, Statistics and Computing, Taylor expansion on March 24, 2017 by xi'an
“The paper defines a new solution to the problem of defining a suitable parameter space for mixture models.”
When I received the table of contents of the incoming Statistics & Computing and saw a paper by V. Maroufy and P. Marriott about the above, I was quite excited about a new approach to mixture parameterisation. Especially after our recent reposting of the weakly informative reparameterisation paper. Alas, after reading the paper, I fail to see the (statistical) point of the whole exercise.
Starting from the basic fact that mixtures face many identifiability issues, not only invariance under component permutation but also the possibility of adding spurious components, the authors move to an entirely different galaxy by defining mixtures of so-called local mixtures, a notion developed by one of the authors. The notion is just incomprehensible to me: the object is a weighted sum of the basic component of the original mixture, e.g., a Normal density, and of k of its derivatives wrt its mean, a sort of parameterised Taylor expansion (see the sketch at the end of this post). Which incidentally implies the parameter is unidimensional. The weights of this strange mixture are furthermore constrained by the positivity of the resulting mixture, a constraint that seems impossible to satisfy in the Normal case when the number of derivatives is odd, and hard to analyse in any case since possibly negative components do not enjoy an interpretation as probability densities. In exponential families, the local mixture is the original exponential family density multiplied by a polynomial.

The current paper moves one step further [from the reasonable] by considering mixtures [in the standard sense] of such objects, whose components are parameterised by their mean parameter and a collection of weights. The authors then restrict the mean parameters to belong to a finite and fixed set, whose elements are determined by a maximum error rate on any compound distribution derived from this exponential family structure. The remainder of the paper discusses the choice of the mean parameters and an EM algorithm to estimate the parameters, with a confusing lower bound on the mixture weights that impacts their estimation. And no mention is made of the positivity constraint. I remain completely bemused by the paper and its purpose: I do not even fathom how this qualifies as a mixture.
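For the record, here is a minimal sketch of the local mixture object as I understand it, in my own notation (f the base density, μ its mean, λ the vector of weights; the exact indexing may well differ from the authors' formulation):

g(x; \mu, \lambda) \;=\; f(x; \mu) \;+\; \sum_{j=2}^{k} \lambda_j \, \frac{\partial^j}{\partial \mu^j} f(x; \mu), \qquad g(x; \mu, \lambda) \ge 0 \ \text{ for all } x,

the final positivity requirement on λ being the constraint I find hard to satisfy in the Normal case when the highest derivative order is odd, since the ratio g/f then behaves like an odd-degree polynomial in the tails.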
Statistics & Computing [toc]
Posted in Books, Statistics with tags academic journals, Bayesian computation, Monte Carlo Statistical Methods, Springer-Verlag, Statistics and Computing on June 29, 2016 by xi'an
The latest [June] issue of Statistics & Computing is full of interesting Bayesian and Monte Carlo entries, some of which are even open access!

Computation of Gaussian orthant probabilities in high dimension
James Ridgway, pages 899-916
more of the same!
Posted in Books, pictures, Statistics, University life with tags AISTATS 2016, Gibbs sampling, ICLR 2016, JAGS, latent variable, MAP estimators, Monte Carlo Statistical Methods, simulated annealing, Statistics and Computing on December 10, 2015 by xi'an
Daniel Seita, Haoyu Chen, and John Canny arXived last week a paper entitled “Fast parallel SAME Gibbs sampling on general discrete Bayesian networks“. The distributions of the observables are defined by full conditional probability tables on the nodes of a graphical model. The distributions on the latent or missing nodes of the network are multinomial, with Dirichlet priors. To derive the MAP in such models, although this goal is not explicitly stated in the paper till the second page, the authors refer to the recent paper by Zhao et al. (2015), discussed on the ‘Og just as recently, which applies our SAME methodology. Since the paper is mostly computational (and submitted to ICLR 2016, which takes place juuust before AISTATS 2016), I do not have much to comment about it. Except to notice that the authors mention our paper as “Technical report, Statistics and Computing, 2002”. I am not sure the editor of Statistics and Computing will appreciate! The proper reference is Statistics and Computing, 12:77-84, 2002.
“We argue that SAME is beneficial for Gibbs sampling because it helps to reduce excess variance.”
Still, I am a wee bit surprised at both the above statement and at the comparison with a JAGS implementation. Because SAME augments the number of latent vectors as the number of iterations increases, it should be slower than a regular Gibbs sampler with a single latent vector, by a mere curse of dimension. And because I do not get the connection with JAGS either: SAME could be programmed in JAGS, couldn’t it? If the authors mean a regular Gibbs sampler with no latent vector augmentation, the comparison makes little sense, as one algorithm aims at the MAP (with a modest five replicas), while the other encompasses the complete posterior distribution. But this sounds unlikely when considering that the larger the number m of replicas, the better their alternative fares against JAGS. It would thus be interesting to understand what the authors mean by JAGS in this setup!
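For readers unfamiliar with SAME, here is a minimal sketch of the replication mechanism on a toy model of my own choosing (a two-component Gaussian mixture with unit variances and known equal weights, nothing to do with the discrete Bayesian networks of the paper); the prior and the replica schedule are arbitrary illustrative choices:

set.seed(1)
x <- c(rnorm(50, -2), rnorm(50, 2))            # toy data from the assumed mixture
K <- 2
mu <- rnorm(K)                                 # initial component means
for (t in 1:200) {
  m <- ceiling(t / 20)                         # number of replicated latent vectors, growing with t
  S <- matrix(0, K, 2)                         # sufficient statistics cumulated over the m replicas
  for (r in 1:m) {
    p <- sapply(1:K, function(k) dnorm(x, mu[k], 1))
    z <- apply(p, 1, function(w) sample(K, 1, prob = w))  # one latent allocation vector
    for (k in 1:K) S[k, ] <- S[k, ] + c(sum(x[z == k]), sum(z == k))
  }
  for (k in 1:K) {                             # mu_k given the m replicated z's, under a N(0, 10^2) prior
    prec <- 1 / 100 + S[k, 2]
    mu[k] <- rnorm(1, S[k, 1] / prec, sqrt(1 / prec))
  }
}
mu                                             # concentrates near the marginal MAP as m grows

With m fixed at one this is a plain Gibbs sampler on the posterior; letting m grow amounts to simulated annealing on the marginal posterior of the parameter, which is also why the cost per iteration necessarily increases with m.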
Statistics and Computing special issue on BNP
Posted in Books, Statistics, University life with tags algorithms, call for papers, machine learning, modelling, nonparametric statistics, special issue, Statistics and Computing on June 16, 2015 by xi'an
[verbatim from the call for papers:]
Statistics and Computing is preparing a special issue on Bayesian Nonparametrics, for publication by early 2016. We invite researchers to submit manuscripts for publication in the special issue. We expect that the focus theme will increase the visibility and impact of papers in the volume.
By making use of infinite-dimensional mathematical structures, Bayesian nonparametric statistics allows the complexity of a learned model to grow as the size of a data set grows. This flexibility can be particularly suited to modern data sets but can also present a number of computational and modelling challenges. In this special issue, we will showcase novel applications of Bayesian nonparametric models, new computational tools and algorithms for learning these models, and new models for the diverse structures and relations that may be present in data.
To submit to the special issue, please use the Statistics and Computing online submission system. To indicate consideration for the special issue, choose “Special Issue: Bayesian Nonparametrics” as the article type. Papers must be prepared in accordance with the Statistics and Computing journal guidelines.
Papers will go through the usual peer review process. The special issue website will be updated with any relevant deadlines and information.
Deadline for manuscript submission: August 20, 2015
scalable Bayesian inference for the inverse temperature of a hidden Potts model
Posted in Books, R, Statistics, University life with tags ABC, Approximate Bayesian computation, Australia, Brisbane, exchange algorithm, Ising model, JCGS, path sampling, Potts model, pseudo-likelihood, QUT, Statistics and Computing on April 7, 2015 by xi'an
Matt Moores, Tony Pettitt, and Kerrie Mengersen arXived a paper yesterday comparing different computational approaches to the processing of hidden Potts models and of the intractable normalising constant in the Potts model. This is a very interesting paper, first because it provides a comprehensive survey of the main methods used in handling this annoying normalising constant Z(β), namely pseudo-likelihood, the exchange algorithm, path sampling (a.k.a. thermodynamic integration), and ABC. A massive simulation experiment, with individual simulation times up to 400 hours, leads to selecting path sampling (what else?!) as the (XL) method of choice, thanks to a precomputation of the expectation of the sufficient statistic, E[S(z)|β]. I just wonder why the same was not done for ABC, as in the recent Statistics and Computing paper we wrote with Matt and Kerrie. As it happens, I was actually discussing yesterday at Columbia potential, if huge, improvements in processing Ising and Potts models by first approximating the distribution of S(z) for some or all β before launching ABC or the exchange algorithm. (In fact, this is a more generic desideratum for all ABC methods: simulating the summary statistics directly, if approximately, would bring huge gains in computing time, and thus possibly in final precision.) Simulating the distribution of the summary and sufficient Potts statistic S(z) reduces to simulating this distribution with a null correlation, as exploited in Cucala and Marin (2013, JCGS, Special ICMS issue). However, there does not seem to be an efficient way to do so, i.e. without reverting to simulating the entire grid z…
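To fix ideas, here is a minimal sketch (my own toy code, not the authors') of what precomputing E[S(z)|β] and plugging it into path sampling could look like, for a small q-state Potts model simulated by single-site Gibbs sweeps; the lattice size, grid of β values, and numbers of sweeps are arbitrary illustrative choices:

set.seed(2)
q <- 3; n <- 16                                  # 3 colours on a 16 x 16 lattice
neigh <- function(i, j) {                        # first-order neighbours within the lattice
  nb <- rbind(c(i - 1, j), c(i + 1, j), c(i, j - 1), c(i, j + 1))
  nb[nb[, 1] >= 1 & nb[, 1] <= n & nb[, 2] >= 1 & nb[, 2] <= n, , drop = FALSE]
}
suff <- function(z) {                            # S(z) = number of like-coloured neighbour pairs
  sum(z[-n, ] == z[-1, ]) + sum(z[, -n] == z[, -1])
}
gibbs_sweep <- function(z, beta) {               # one full single-site Gibbs sweep at inverse temperature beta
  for (i in 1:n) for (j in 1:n) {
    counts <- tabulate(z[neigh(i, j)], nbins = q)
    z[i, j] <- sample(q, 1, prob = exp(beta * counts))
  }
  z
}
betas <- seq(0, 1, by = 0.1)
ES <- sapply(betas, function(b) {                # Monte Carlo estimate of E[S(z) | beta] on the grid
  z <- matrix(sample(q, n^2, replace = TRUE), n, n)
  for (t in 1:100) z <- gibbs_sweep(z, b)        # burn-in
  out <- numeric(100)
  for (s in 1:100) { z <- gibbs_sweep(z, b); out[s] <- suff(z) }
  mean(out)
})
# path sampling: log Z(1) - log Z(0) = integral of E[S(z) | beta] over beta, by the trapezoidal rule
logZdiff <- sum(diff(betas) * (head(ES, -1) + tail(ES, -1)) / 2)

The same precomputed draws of S(z) over the β grid could obviously be recycled as a reference table for an ABC or exchange-type step, which is the kind of shortcut I have in mind above.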