**I**n connection with the recent PhD thesis defence of Juliette Chevallier, in which I took a somewhat virtual part since I was physically in Warwick, I read a paper she wrote with Stéphanie Allassonnière on stochastic approximation versions of the EM algorithm. Computing the MAP estimator can be done via versions of EM adapted for simulated annealing, possibly using MCMC as for instance in the Monolix software and its MCMC-SAEM algorithm. Where SA stands sometimes for stochastic approximation and sometimes for simulated annealing, originally developed by Gilles Celeux and Jean Diebolt, then reframed by Marc Lavielle and Eric Moulines [friends and coauthors]. With an MCMC step because the simulation of the latent variables involves an intractable normalising constant. (Contrary to this paper, Umberto Picchini and Adeline Samson proposed in 2015 a genuine ABC version of this approach, a paper that I thought I had missed, although I now remember discussing it with Adeline at JSM in Seattle. There, ABC is used as a substitute for the conditional distribution of the latent variables given data and parameter. To be used as a substitute for the Q step of the (SA)EM algorithm. One more approximation step and one more simulation step and we would reach a form of ABC-Gibbs!) In this version, there are very few assumptions made on the approximation sequence, except that it converges with the iteration index to the true distribution (for a fixed observed sample) if convergence of ABC-SAEM is to happen. The paper takes as an illustrative sequence a collection of tempered versions of the true conditionals, but this is quite formal as I cannot fathom a feasible simulation from the tempered version and not from the untempered one. It is thus much more a version of tempered SAEM than truly connected with ABC (although a genuine ABC-EM version could be envisioned).
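To make the mechanics of a tempered SAEM concrete, here is a minimal sketch on a toy model, which is an assumption of this illustration and not the setting of the paper: latent z_i ~ N(θ, 1), observations y_i | z_i ~ N(z_i, 1), and a flat prior so that the MAP is the MLE. The conditional of z_i is then N((θ + y_i)/2, 1/2) and its tempered version merely inflates the variance by the temperature, so both are trivially simulated here (which is exactly why the example is formal). The function names, the temperature schedule and the step sizes are all hypothetical choices.

```python
import math
import random


def tempered_saem(y, n_iter=500, burn_in=50, temp0=5.0, seed=0):
    """Toy tempered SAEM for z_i ~ N(theta, 1), y_i | z_i ~ N(z_i, 1)."""
    rng = random.Random(seed)
    n = len(y)
    theta = 0.0  # arbitrary starting value
    s = 0.0      # running approximation of the sufficient statistic mean(z)
    for k in range(1, n_iter + 1):
        # Temperature decreasing to 1, so the sampling distribution
        # converges to the true conditional as k grows.
        temp = 1.0 + (temp0 - 1.0) / k
        # Simulation step: the exact conditional of z_i is
        # N((theta + y_i) / 2, 1/2); raising its density to the power
        # 1/temp multiplies the variance by temp.
        sd = math.sqrt(0.5 * temp)
        z = [rng.gauss(0.5 * (theta + yi), sd) for yi in y]
        # Stochastic approximation step on the sufficient statistic:
        # step size 1 during burn-in, then decreasing as 1/j.
        gamma = 1.0 if k <= burn_in else 1.0 / (k - burn_in)
        s += gamma * (sum(z) / n - s)
        # Maximisation step: the complete-data MLE of theta is mean(z).
        theta = s
    return theta


def simulate_data(n, theta_true=2.0, seed=42):
    """Draw y_i = z_i + noise with z_i ~ N(theta_true, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(rng.gauss(theta_true, 1.0), 1.0) for _ in range(n)]
```

In this toy model the marginal distribution of y_i is N(θ, 2), so the output of `tempered_saem(simulate_data(200))` can be checked against the sample mean of the observations, which is the exact MLE.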

## Archive for stochastic approximation

## ABC-SAEM

Posted in Books, Statistics, University life with tags ABC, ABC-Gibbs, ABC-MCMC, Alan Turing, École Polytechnique, EM, JSM 2015, MAP estimators, MCMC, MCMC-SAEM, Monolix, Paris-Saclay campus, PhD thesis, SAEM, Seattle, simulated annealing, stochastic approximation, University of Warwick, well-tempered algorithm on October 8, 2019 by xi'an

## Approximate Maximum Likelihood Estimation

Posted in Books, Mountains, pictures, Statistics, Travel, University life with tags ABC, Austria, Don Rubin, James Spall, Kiefer-Wolfowitz algorithm, Linz, optimisation, Peter Diggle, stochastic approximation, stochastic gradient on September 21, 2015 by xi'an

**B**ertl *et al.* arXived last July a paper on a maximum likelihood estimator based on an alternative to ABC techniques. And to indirect inference. (One of the authors in *et al.* is Andreas Futschik whom I visited last year in Linz.) Paper that I only spotted when gathering references for a reading list on ABC… The method is related to the “original ABC paper” of Diggle and Gratton (1984) which, parallel to Rubin (1984), contains in retrospect the idea of ABC methods. The starting point is stochastic approximation, namely the optimisation of a function of a parameter θ when written as an expectation of a random variable Y, **E**[Y|θ], as in the Kiefer-Wolfowitz algorithm. However, in the case of the likelihood function, there is rarely an unbiased estimator and the authors propose instead to use a kernel density estimator of the density of the summary statistic. This means that, at each iteration of the Kiefer-Wolfowitz algorithm, two sets of observations and hence of summary statistics are simulated and two kernel density estimates derived, both to be applied to the observed summary. The sequences underlying the Kiefer-Wolfowitz algorithm are taken from (the excellent optimisation book of) Spall (2003). Along with on-the-go adaptation and convergence test.
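The two-sided scheme described above can be sketched as follows, under assumptions that are mine and not the authors': a one-dimensional toy model where the summary is the mean of n draws from N(θ, 1), a Gaussian kernel with a normal-reference bandwidth, finite differences taken on the logarithm of the density estimate, and gain sequences `a_k` and `c_k` with arbitrary constants (not the ones recommended in Spall's book). There is no adaptation and no convergence test in this sketch.

```python
import math
import random
import statistics


def simulate_summary(theta, n, rng):
    """Summary statistic (sample mean) of n draws from N(theta, 1)."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n


def log_kde_at(point, sample):
    """Log of a Gaussian kernel density estimate evaluated at `point`."""
    m = len(sample)
    # Normal-reference rule of thumb for the bandwidth.
    h = 1.06 * statistics.stdev(sample) * m ** (-0.2)
    total = sum(math.exp(-0.5 * ((point - s) / h) ** 2) for s in sample)
    density = total / (m * h * math.sqrt(2.0 * math.pi))
    return math.log(max(density, 1e-300))  # floor avoids log(0)


def kw_kde_mle(s_obs, n, theta0, n_iter=200, m=200, seed=1):
    """Kiefer-Wolfowitz ascent on a kernel estimate of the log-likelihood."""
    rng = random.Random(seed)
    theta = theta0
    for k in range(1, n_iter + 1):
        a_k = 1.0 / (k + 10)        # step sizes (hypothetical constants)
        c_k = 0.3 / k ** (1.0 / 6)  # finite-difference widths (hypothetical)
        # Two sets of simulated summaries, one on each side of theta.
        plus = [simulate_summary(theta + c_k, n, rng) for _ in range(m)]
        minus = [simulate_summary(theta - c_k, n, rng) for _ in range(m)]
        # Both density estimates are evaluated at the observed summary.
        grad = (log_kde_at(s_obs, plus) - log_kde_at(s_obs, minus)) / (2.0 * c_k)
        theta += a_k * grad
    return theta
```

In this toy setting the likelihood of the summary is maximised at θ equal to the observed summary, so a call like `kw_kde_mle(s_obs=1.0, n=10, theta0=0.5)` should end up close to 1. Each iteration costs 2m pseudo-samples, which gives a feel for the time overload once several starting points are used.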

The theoretical difficulty in this extension is however that the kernel density estimator is not unbiased and thus that, rigorously speaking, the validation of the Kiefer-Wolfowitz algorithm does not apply here. On the practical side, the need for multiple starting points and multiple simulations of pseudo-samples may induce considerable time overload. Especially if bootstrap is used to evaluate the precision of the MLE approximation. Besides normal and M/G/1 queue examples, the authors illustrate the approach on a population genetic dataset of Borneo and Sumatra orang-utans. With 5 parameters and 28 summary statistics. Which thus means using a kernel density estimator in dimension 28, a rather perilous adventure..!

## WSC 20[1]1

Posted in Mountains, pictures, Statistics, Travel, University life with tags ABC, Arizona, batch means, Bruce Schmeiser, Monte Carlo Statistical Methods, Phoenix, R, simulated annealing, simulation, stochastic approximation, WSC 11 on December 13, 2011 by xi'an

**T**his morning I attended the “Bruce Schmeiser session” at WSC 2011. I once had a meeting with Bruce (and Jim Berger) in Purdue to talk about MCMC methods but I never interacted directly with him. The first two talks were about batch methods, which I did not know previously, and I had trouble understanding what the problem was: for a truly iid normal sample, building an optimal confidence interval on the mean relies on the sufficient statistic rather than on the batch mean variance… It is only through the second talk that I understood that neither normality nor independence was guaranteed, hence the batches. I still wonder whether or not a bootstrap strategy could be used instead, given the lack of confidence in the model assumptions. The third talk was about a stochastic approximation algorithm developed by Bruce Schmeiser, called retrospective approximation, where successive and improving approximations of the target to maximise are used in order not to waste time at the beginning. I thus found the algorithm had a simulated annealing flavour, even though the connection is rather tenuous…
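For readers as unfamiliar with batch means as I was, here is a minimal sketch of the idea on a correlated series, not taken from the talks: the output is cut into consecutive batches, and the variability of the batch averages replaces the iid variance estimate. The AR(1) example, the number of batches and the function names are assumptions of this illustration, and a normal quantile is used where a Student t quantile with (number of batches − 1) degrees of freedom would be the usual choice.

```python
import math
import random
import statistics


def ar1_series(n, phi=0.9, mean=3.0, seed=7):
    """Correlated output: AR(1) fluctuations around `mean`."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(mean + x)
    return out


def batch_means_ci(series, n_batches=20, level=0.95):
    """Confidence interval on the mean based on non-overlapping batch means."""
    size = len(series) // n_batches
    means = [statistics.fmean(series[i * size:(i + 1) * size])
             for i in range(n_batches)]
    centre = statistics.fmean(means)
    # Batch means are treated as approximately independent.
    se = statistics.stdev(means) / math.sqrt(n_batches)
    q = statistics.NormalDist().inv_cdf(0.5 + level / 2.0)
    return centre - q * se, centre + q * se


def naive_ci(series, level=0.95):
    """Interval that (wrongly, here) treats the series as iid."""
    centre = statistics.fmean(series)
    se = statistics.stdev(series) / math.sqrt(len(series))
    q = statistics.NormalDist().inv_cdf(0.5 + level / 2.0)
    return centre - q * se, centre + q * se
```

On a positively correlated series such as `ar1_series(20000)`, the batch means interval comes out wider than the naive one, since the iid formula underestimates the variance of the average.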

**T**he second session of WSC 2011 I attended was about importance sampling. The first talk was about mixtures of importance sampling distributions towards improved efficiency for cross-entropy, à la Rubinstein and Kroese. Its implementation seemed to depend very much on some inner knowledge of the target problem. The second talk was on zero-variance approximations for computing the probability that two nodes are connected in a graph, using clever collapsing schemes. The third talk of the session was unrelated to the theme since it was about cross-validated non-parametric density estimation.

**M**y own session was not terribly well attended and, judging from some questions I got at the end, I am still unsure I had chosen the right level. Nonetheless, I got interesting discussions afterwards which showed that ABC was also appealing to some members of the audience. And I had a long chat with Enlu Zhou, a nice assistant professor from Urbana-Champaign who was teaching out of *Monte Carlo Statistical Methods*, and had challenging questions about restricted support MCMC. Overall, an interesting day, completed with a light conference dinner in the pleasant company of Jingchen Liu from Columbia and some friends of his.

## Stochastic approximation in mixtures

Posted in R, Statistics with tags J.K. Ghosh, mixtures, Newton-Raphson, Robbins-Monro, Statistical Science, stochastic approximation on February 23, 2011 by xi'an

**O**n Friday, a 2008 paper on *Stochastic Approximation and Newton’s Estimate of a Mixing Distribution* by Ryan Martin and J.K. Ghosh was posted on arXiv. (I do not really see why it took so long to post on arXiv a 2008 *Statistical Science* paper but given that it is not available on Project Euclid, it may be that not all papers in *Statistical Science* are published immediately. Anyway, this is irrelevant to my point here!)

The paper provides a very nice introduction to stochastic approximation methods, making the link to recent works by Christophe Andrieu, Heikki Haario, Faming Liang, Eric Moulines, Eero Saksman, and co-authors. Martin and Ghosh also reinterpret Newton-Raphson as a special case of stochastic approximation. The whole paper is very pleasant to read, quite in tune with *Statistical Science*. I will most certainly use this material in my graduate courses and also include part of it in the revision of *Monte Carlo Statistical Methods*.

## Stochastic approximation in Bristol

Posted in R, Statistics, Travel, University life with tags Banff, Bristol, stochastic approximation on September 2, 2010 by xi'an

**T**his is very short notice, but for those in the vicinity and not at the RSS conference, there is a highly interesting workshop taking place in Bristol in ten days (I would certainly have gone, had I not been at the same time in Banff!):

We would like to invite you to contribute to our 3 day workshop on “Stochastic approximation: methodology, theory and applications in statistics” that will take place in the Mathematics Department of the University of Bristol (UK) from 13-15 September 2010. The aim of this workshop is to gather world specialists on stochastic approximation and its applications, who might not have the opportunity to meet otherwise, in order to present and discuss recent methodological and theoretical developments in the area, as well as applications. The current list of invited speakers who have agreed to present their work is

Michel Benaïm

Han-Fu Chen

Jayanta Ghosh

Éric Moulines

Georg Pflug

Boris Polyak

Pierre Tarres

George Yin

The workshop is part of SuSTaIn: a programme funded by the EPSRC and the University of Bristol with the ambitious goal of strengthening the discipline of Statistics in the UK, by equipping it to face the challenges of future applications. We will offer limited funding to a restricted number of participants. Priority will be given to young researchers. If you wish to participate, please fill in the registration form