efficient MCMC sampling

Posted in Statistics with tags , , , on June 24, 2019 by xi'an

Maxime Vono, Daniel Paulin and Arnaud Doucet recently arXived a paper about a regularisation technique that allows for efficient sampling from a complex posterior which potential function factorises as a large sum of transforms of linear projections of the parameter θ

$U(\theta)=\sum_i U_i(A_i\theta)$

The central idea in the paper [which was new to me] is to introduce auxiliary variates for the different terms in the sum, replacing the projections in the transforms, with an additional regularisation forcing these auxiliary variates to be as close as possible from the corresponding projection

$U(\theta,\mathbf z)=\sum_i U_i(z_i)+\varrho^{-1}||z_i-A_i\theta||^2$

This is only an approximation to the true target but it enjoys the possibility to run a massive Gibbs sampler in quite a reduced dimension. As the variance ρ of the regularisation term goes to zero the marginal posterior on the parameter θ converges to the true posterior. The authors manage to achieve precise convergence rates both in total variation and in Wasserstein distance.

From a practical point of view, only judging from the logistic example, it is hard to fathom how much this approach improves upon other approaches (provided they still apply) as the impact of the value of ρ should be assessed on top of the convergence of the high-dimensional Gibbs sampler. Or is there an annealing version in the pipe-line? While parallelisation is a major argument, it also seems that the Gibbs sampler need a central monitoring for each new simulation of θ. Unless some asynchronous version can be implemented.

skipping sampler

Posted in Books, Statistics, University life with tags , , , , on June 13, 2019 by xi'an

“The Skipping Sampler is an adaptation of the MH algorithm designed to sample from targets which have areas of zero density. It ‘skips’ across such areas, much as a flat stone can skip or skim repeatedly across the surface of water.”

An interesting challenge is simulating from a density restricted to a set C when little is known about C, apart from a mean to check whether or not a given value x is in C or not. John Moriarty, Jure Vogrinc (University of Warwick), and Alessandro Zocca make a new proposal to address this problem in a recently arXived paper. Which somewhat reminded me of the delayed rejection methods proposed by Antonietta Mira. And of our pinball sampler.

The paper spends a large amount of space about transferring from the Euclidean representation of the symmetric proposal density q to its polar representation. Which is rather trivial, but brings the questions of the efficient polar proposals  and of selecting the right type of Euclidean distance for the intended target. The method proposed therein is to select a direction first and keep skipping step by step in that direction until the set C is met again (re-entered). Or until a stopping (halting) boundary has been hit. This makes for a more complex proposal than usual but somewhat surprisingly the symmetry in q is sufficient to make the acceptance probability only depend on the target density.

While the convergence is properly established, I wonder at the practicality of the approach when compared with a regular random walk Metropolis algorithm in that both require a scaling to the jump that relates to the support of the target. Neither too small nor too large. If the set C is that unknown that only local (in or out) information is available, scaling of the jumps (and of the stopping rule) may prove problematic. In equivalent ways for both samplers. In a completely blind exploration, sequential (or population) Monte Carlo would seem more appropriate, at least to learn about the scale of jumps and location of the set C. If this set is defined as an intersection of constraints, a tempered (and sequential) solution would be helpful.  When checking the appurtenance to C becomes a computational challenge, more advance schemes have to be constructed, I would think.

robust Bayesian synthetic likelihood

Posted in Statistics with tags , , , , , , , , , , , , , on May 16, 2019 by xi'an

David Frazier (Monash University) and Chris Drovandi (QUT) have recently come up with a robustness study of Bayesian synthetic likelihood that somehow mirrors our own work with David. In a sense, Bayesian synthetic likelihood is definitely misspecified from the start in assuming a Normal distribution on the summary statistics. When the data generating process is misspecified, even were the Normal distribution the “true” model or an appropriately converging pseudo-likelihood, the simulation based evaluation of the first two moments of the Normal is biased. Of course, for a choice of a summary statistic with limited information, the model can still be weakly compatible with the data in that there exists a pseudo-true value of the parameter θ⁰ for which the synthetic mean μ(θ⁰) is the mean of the statistics. (Sorry if this explanation of mine sounds unclear!) Or rather the Monte Carlo estimate of μ(θ⁰) coincidences with that mean.The same Normal toy example as in our paper leads to very poor performances in the MCMC exploration of the (unsympathetic) synthetic target. The robustification of the approach as proposed in the paper is to bring in an extra parameter to correct for the bias in the mean, using an additional Laplace prior on the bias to aim at sparsity. Or the same for the variance matrix towards inflating it. This over-parameterisation of the model obviously avoids the MCMC to get stuck (when implementing a random walk Metropolis with the target as a scale).

MCMC importance samplers for intractable likelihoods

Posted in Books, pictures, Statistics with tags , , , , , , , , , , , on May 3, 2019 by xi'an

Jordan Franks just posted on arXiv his PhD dissertation at the University of Jyväskylä, where he discuses several of his works:

1. M. Vihola, J. Helske, and J. Franks. Importance sampling type estimators based on approximate marginal MCMC. Preprint arXiv:1609.02541v5, 2016.
2. J. Franks and M. Vihola. Importance sampling correction versus standard averages of reversible MCMCs in terms of the asymptotic variance. Preprint arXiv:1706.09873v4, 2017.
3. J. Franks, A. Jasra, K. J. H. Law and M. Vihola.Unbiased inference for discretely observed hidden Markov model diffusions. Preprint arXiv:1807.10259v4, 2018.
4. M. Vihola and J. Franks. On the use of ABC-MCMC with inflated tolerance and post-correction. Preprint arXiv:1902.00412, 2019

focusing on accelerated approximate MCMC (in the sense of pseudo-marginal MCMC) and delayed acceptance (as in our recently accepted paper). Comparing delayed acceptance with MCMC importance sampling to the advantage of the later. And discussing the choice of the tolerance sequence for ABC-MCMC. (Although I did not get from the thesis itself the target of the improvement discussed.)

likelihood free nested sampling

Posted in Books, Statistics with tags , , , , , , , , , , , on April 26, 2019 by xi'an

A recent paper by Mikelson and Khammash found on bioRxiv considers the (paradoxical?) mixture of nested sampling and intractable likelihood. They however cover only the case when a particle filter or another unbiased estimator of the likelihood function can be found. Unless I am missing something in the paper, this seems a very costly and convoluted approach when pseudo-marginal MCMC is available. Or the rather substantial literature on computational approaches to state-space models. Furthermore simulating under the lower likelihood constraint gets even more intricate than for standard nested sampling as the parameter space is augmented with the likelihood estimator as an extra variable. And this makes a constrained simulation the harder, to the point that the paper need resort to a Dirichlet process Gaussian mixture approximation of the constrained density. It thus sounds quite an intricate approach to the problem. (For one of the realistic examples, the authors mention a 12 hour computation on a 48 core cluster. Producing an approximation of the evidence that is not unarguably stabilised, contrary to the above.) Once again, not being completely up-to-date in sequential Monte Carlo, I may miss a difficulty in analysing such models with other methods, but the proposal seems to be highly demanding with respect to the target.

EntropyMCMC [R package]

Posted in Statistics with tags , , , , , , , , , , , , on March 26, 2019 by xi'an

My colleague from the Université d’Orléans, Didier Chauveau, has just published on CRAN a new R package called EntropyMCMC, which contains convergence assessment tools for MCMC algorithms, based on non-parametric estimates of the Kullback-Leibler divergence between current distribution and target. (A while ago, quite a while ago!, we actually collaborated with a few others on the Springer-Verlag Lecture Note #135 Discretization and MCMC convergence assessments.) This follows from a series of papers by Didier Chauveau and Pierre Vandekerkhove that started with a nearest neighbour entropy estimate. The evaluation of this entropy is based on N iid (parallel) chains, which involves a parallel implementation. While the missing normalising constant is overwhelmingly unknown, the authors this is not a major issue “since we are mostly interested in the stabilization” of the entropy distance. Or in the comparison of two MCMC algorithms. [Disclaimer: I have not experimented with the package so far, hence cannot vouch for its performances over large dimensions or problematic targets, but would as usual welcome comments and feedback on readers’ experiences.]

asymptotics of synthetic likelihood

Posted in pictures, Statistics, Travel with tags , , , , , , , , , , on March 11, 2019 by xi'an

David Nott, Chris Drovandi and Robert Kohn just arXived a paper on a comparison between ABC and synthetic likelihood, which is both interesting and timely given that synthetic likelihood seems to be lacking behind in terms of theoretical evaluation. I am however as puzzled by the results therein as I was by the earlier paper by Price et al. on the same topic. Maybe due to the Cambodia jetlag, which is where and when I read the paper.

My puzzlement, thus, comes from the difficulty in comparing both approaches on a strictly common ground. The paper first establishes convergence and asymptotic normality for synthetic likelihood, based on the 2003 MCMC paper of Chernozukov and Hong [which I never studied in details but that appears like the MCMC reference in the econometrics literature]. The results are similar to recent ABC convergence results, unsurprisingly when assuming a CLT on the summary statistic vector. One additional dimension of the paper is to consider convergence for a misspecified covariance matrix in the synthetic likelihood [and it will come back with a revenge]. And asymptotic normality of the synthetic score function. Which is obviously unavailable in intractable models.

The first point I have difficulty with is how the computing time required for approximating mean and variance in the synthetic likelihood, by Monte Carlo means, is not accounted for in the comparison between ABC and synthetic likelihood versions. Remember that ABC only requires one (or at most two) pseudo-samples per parameter simulation. The latter requires M, which is later constrained to increase to infinity with the sample size. Simulations that are usually the costliest in the algorithms. If ABC were to use M simulated samples as well, since it already relies on a kernel, it could as well construct [at least on principle] a similar estimator of the [summary statistic] density. Or else produce M times more pairs (parameter x pseudo-sample). The authors pointed out (once this post out) that they do account for the factor M when computing the effective sample size (before Lemma 4, page 12), but I still miss why the ESS converging to N=MN/M when M goes to infinity is such a positive feature.

Another point deals with the use of multiple approximate posteriors in the comparison. Since the approximations differ, it is unclear that convergence to a given approximation is all that should matter, if the approximation is less efficient [when compared with the original and out-of-reach posterior distribution]. Especially for a finite sample size n. This chasm in the targets becomes more evident when the authors discuss the use of a constrained synthetic likelihood covariance matrix towards requiring less pseudo-samples, i.e. lower values of M, because of a smaller number of parameters to estimate. This should be balanced against the loss in concentration of the synthetic approximation, as exemplified by the realistic examples in the paper. (It is also hard to see why M could be not of order √n for Monte Carlo reasons.)

The last section in the paper is revolving around diverse issues for misspecified models, from wrong covariance matrix to wrong generating model. As we just submitted a paper on ABC for misspecified models, I will not engage into a debate on this point but find the proposed strategy that goes through an approximation of the log-likelihood surface by a Gaussian process and a derivation of the covariance matrix of the score function apparently greedy in both calibration and computing. And not so clearly validated when the generating model is misspecified.