## ABC-SAEM

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , , , , , , on October 8, 2019 by xi'an

In connection with the recent PhD thesis defence of Juliette Chevallier, in which I took a somewhat virtual part for being physically in Warwick, I read a paper she wrote with Stéphanie Allassonnière on stochastic approximation versions of the EM algorithm. Computing the MAP estimator can be done via some adapted for simulated annealing versions of EM, possibly using MCMC as for instance in the Monolix software and its MCMC-SAEM algorithm. Where SA stands sometimes for stochastic approximation and sometimes for simulated annealing, originally developed by Gilles Celeux and Jean Diebolt, then reframed by Marc Lavielle and Eric Moulines [friends and coauthors]. With an MCMC step because the simulation of the latent variables involves an untractable normalising constant. (Contrary to this paper, Umberto Picchini and Adeline Samson proposed in 2015 a genuine ABC version of this approach, paper that I thought I missed—although I now remember discussing it with Adeline at JSM in Seattle—, ABC is used as a substitute for the conditional distribution of the latent variables given data and parameter. To be used as a substitute for the Q step of the (SA)EM algorithm. One more approximation step and one more simulation step and we would reach a form of ABC-Gibbs!) In this version, there are very few assumptions made on the approximation sequence, except that it converges with the iteration index to the true distribution (for a fixed observed sample) if convergence of ABC-SAEM is to happen. The paper takes as an illustrative sequence a collection of tempered versions of the true conditionals, but this is quite formal as I cannot fathom a feasible simulation from the tempered version and not from the untempered one. It is thus much more a version of tempered SAEM than truly connected with ABC (although a genuine ABC-EM version could be envisioned).

## what if what???

Posted in Books, Statistics with tags , , , , , on October 7, 2019 by xi'an

[Here is a section of the Wikipedia page on Monte Carlo methods which makes little sense to me. What if it was not part of this page?!]

### Monte Carlo simulation versus “what if” scenarios

There are ways of using probabilities that are definitely not Monte Carlo simulations – for example, deterministic modeling using single-point estimates. Each uncertain variable within a model is assigned a “best guess” estimate. Scenarios (such as best, worst, or most likely case) for each input variable are chosen and the results recorded.[55]

By contrast, Monte Carlo simulations sample from a probability distribution for each variable to produce hundreds or thousands of possible outcomes. The results are analyzed to get probabilities of different outcomes occurring.[56] For example, a comparison of a spreadsheet cost construction model run using traditional “what if” scenarios, and then running the comparison again with Monte Carlo simulation and triangular probability distributions shows that the Monte Carlo analysis has a narrower range than the “what if” analysis. This is because the “what if” analysis gives equal weight to all scenarios (see quantifying uncertainty in corporate finance), while the Monte Carlo method hardly samples in the very low probability regions. The samples in such regions are called “rare events”.

## Hausdorff school on MCMC [28 March-02 April, 2020]

Posted in pictures, Statistics, Travel with tags , , , , , , , , , , , , , on September 26, 2019 by xi'an

The Hausdorff Centre for Mathematics will hold a week on recent advances in MCMC in Bonn, Germany, March 30 – April 3, 2020. Preceded by two days of tutorials. (“These tutorials will introduce basic MCMC methods and mathematical tools for studying the convergence to the invariant measure.”) There is travel support available, but the application deadline is quite close, as of 30 September.

Note that, in a Spring of German conference, the SIAM Conference on Uncertainty Quantification will take place in Munich (Garching) the week before, on March 24-27. With at least one likelihood-free session. Not to mention the ABC in Grenoble workshop in France, on 19-20 March. (Although these places are not exactly nearby!)

Posted in Statistics with tags , , , , , on August 11, 2019 by xi'an

Samuel Wiqvist and co-authors from Scandinavia have recently arXived a paper on a new version of delayed acceptance MCMC. The ADA in the novel algorithm stands for approximate and accelerated, where the approximation in the first stage is to use a Gaussian process to replace the likelihood. In our approach, we used subsets for partial likelihoods, ordering them so that the most varying sub-likelihoods were evaluated first. Furthermore, if a parameter reaches the second stage, the likelihood is not necessarily evaluated, based on the global probability that a second stage is rejected or accepted. Which of course creates an approximation. Even when using a local predictor of the probability. The outcome of a comparison in two complex models is that the delayed approach does not necessarily do better than particle MCMC in terms of effective sample size per second, since it does reject significantly more. Using various types of surrogate likelihoods and assessments of the approximation effect could boost the appeal of the method. Maybe using ABC first could suggest another surrogate?

## efficient MCMC sampling

Posted in Statistics with tags , , , on June 24, 2019 by xi'an

Maxime Vono, Daniel Paulin and Arnaud Doucet recently arXived a paper about a regularisation technique that allows for efficient sampling from a complex posterior which potential function factorises as a large sum of transforms of linear projections of the parameter θ

$U(\theta)=\sum_i U_i(A_i\theta)$

The central idea in the paper [which was new to me] is to introduce auxiliary variates for the different terms in the sum, replacing the projections in the transforms, with an additional regularisation forcing these auxiliary variates to be as close as possible from the corresponding projection

$U(\theta,\mathbf z)=\sum_i U_i(z_i)+\varrho^{-1}||z_i-A_i\theta||^2$

This is only an approximation to the true target but it enjoys the possibility to run a massive Gibbs sampler in quite a reduced dimension. As the variance ρ of the regularisation term goes to zero the marginal posterior on the parameter θ converges to the true posterior. The authors manage to achieve precise convergence rates both in total variation and in Wasserstein distance.

From a practical point of view, only judging from the logistic example, it is hard to fathom how much this approach improves upon other approaches (provided they still apply) as the impact of the value of ρ should be assessed on top of the convergence of the high-dimensional Gibbs sampler. Or is there an annealing version in the pipe-line? While parallelisation is a major argument, it also seems that the Gibbs sampler need a central monitoring for each new simulation of θ. Unless some asynchronous version can be implemented.

## skipping sampler

Posted in Books, Statistics, University life with tags , , , , on June 13, 2019 by xi'an

“The Skipping Sampler is an adaptation of the MH algorithm designed to sample from targets which have areas of zero density. It ‘skips’ across such areas, much as a flat stone can skip or skim repeatedly across the surface of water.”

An interesting challenge is simulating from a density restricted to a set C when little is known about C, apart from a mean to check whether or not a given value x is in C or not. John Moriarty, Jure Vogrinc (University of Warwick), and Alessandro Zocca make a new proposal to address this problem in a recently arXived paper. Which somewhat reminded me of the delayed rejection methods proposed by Antonietta Mira. And of our pinball sampler.

The paper spends a large amount of space about transferring from the Euclidean representation of the symmetric proposal density q to its polar representation. Which is rather trivial, but brings the questions of the efficient polar proposals  and of selecting the right type of Euclidean distance for the intended target. The method proposed therein is to select a direction first and keep skipping step by step in that direction until the set C is met again (re-entered). Or until a stopping (halting) boundary has been hit. This makes for a more complex proposal than usual but somewhat surprisingly the symmetry in q is sufficient to make the acceptance probability only depend on the target density.

While the convergence is properly established, I wonder at the practicality of the approach when compared with a regular random walk Metropolis algorithm in that both require a scaling to the jump that relates to the support of the target. Neither too small nor too large. If the set C is that unknown that only local (in or out) information is available, scaling of the jumps (and of the stopping rule) may prove problematic. In equivalent ways for both samplers. In a completely blind exploration, sequential (or population) Monte Carlo would seem more appropriate, at least to learn about the scale of jumps and location of the set C. If this set is defined as an intersection of constraints, a tempered (and sequential) solution would be helpful.  When checking the appurtenance to C becomes a computational challenge, more advance schemes have to be constructed, I would think.

## robust Bayesian synthetic likelihood

Posted in Statistics with tags , , , , , , , , , , , , , on May 16, 2019 by xi'an

David Frazier (Monash University) and Chris Drovandi (QUT) have recently come up with a robustness study of Bayesian synthetic likelihood that somehow mirrors our own work with David. In a sense, Bayesian synthetic likelihood is definitely misspecified from the start in assuming a Normal distribution on the summary statistics. When the data generating process is misspecified, even were the Normal distribution the “true” model or an appropriately converging pseudo-likelihood, the simulation based evaluation of the first two moments of the Normal is biased. Of course, for a choice of a summary statistic with limited information, the model can still be weakly compatible with the data in that there exists a pseudo-true value of the parameter θ⁰ for which the synthetic mean μ(θ⁰) is the mean of the statistics. (Sorry if this explanation of mine sounds unclear!) Or rather the Monte Carlo estimate of μ(θ⁰) coincidences with that mean.The same Normal toy example as in our paper leads to very poor performances in the MCMC exploration of the (unsympathetic) synthetic target. The robustification of the approach as proposed in the paper is to bring in an extra parameter to correct for the bias in the mean, using an additional Laplace prior on the bias to aim at sparsity. Or the same for the variance matrix towards inflating it. This over-parameterisation of the model obviously avoids the MCMC to get stuck (when implementing a random walk Metropolis with the target as a scale).