## Archive for JSM 2015

## ABC-SAEM

Posted in Books, Statistics, University life with tags ABC, ABC-Gibbs, ABC-MCMC, Alan Turing, École Polytechnique, EM, JSM 2015, MAP estimators, MCMC, MCMC-SAEM, Monolix, Paris-Saclay campus, PhD thesis, SAEM, Seattle, simulated annealing, stochastic approximation, University of Warwick, well-tempered algorithm on October 8, 2019 by xi'an

**I**n connection with the recent PhD thesis defence of Juliette Chevallier, in which I took a somewhat virtual part for being physically in Warwick, I read a paper she wrote with Stéphanie Allassonnière on stochastic approximation versions of the EM algorithm. Computing the MAP estimator can be done via versions of EM adapted for simulated annealing, possibly using MCMC as for instance in the Monolix software and its MCMC-SAEM algorithm. Where SA stands sometimes for stochastic approximation and sometimes for simulated annealing, originally developed by Gilles Celeux and Jean Diebolt, then reframed by Marc Lavielle and Eric Moulines [friends and coauthors]. With an MCMC step because the simulation of the latent variables involves an intractable normalising constant. (Contrary to this paper, Umberto Picchini and Adeline Samson proposed in 2015 a genuine ABC version of this approach, in a paper that I thought I had missed [although I now remember discussing it with Adeline at JSM in Seattle], where ABC is used as a substitute for the conditional distribution of the latent variables given data and parameter. To be used as a substitute for the Q step of the (SA)EM algorithm. One more approximation step and one more simulation step and we would reach a form of ABC-Gibbs!) In this version, there are very few assumptions made on the approximation sequence, except that it converges with the iteration index to the true distribution (for a fixed observed sample) if convergence of ABC-SAEM is to happen. The paper takes as an illustrative sequence a collection of tempered versions of the true conditionals, but this is quite formal as I cannot fathom a feasible simulation from the tempered version and not from the untempered one. It is thus much more a version of tempered SAEM than truly connected with ABC (although a genuine ABC-EM version could be envisioned).

## scalable Langevin exact algorithm

Posted in Books, Statistics, Travel, University life with tags Brownian motion, control variate, importance sampling, JSM 2015, Langevin diffusion, normalising constant, Poisson process, quasi-stationary distribution, scalable MCMC, Seattle, sequential Monte Carlo, University of Warwick on October 18, 2016 by xi'an

“By employing a modification to existing naïve subsampling techniques we can obtain an algorithm which is still exact but has sub-linear iterative cost as a function of data size.”

**A** few weeks ago Murray Pollock, Paul Fearnhead, Adam Johansen and Gareth Roberts (all from Warwick except for Paul) arXived a paper The Scalable Langevin Exact Algorithm: Bayesian Inference for Big Data. (This was also the topic of Murray’s talk last year at JSM in Seattle.) One major advance found in the paper is the derivation of an “exact” algorithm that is sub-linear in the data size. As discussed in the introduction, the current approaches to large data problems either suffer from being approximate (like divide-and-conquer methods) or do not achieve significant reduction in the computing time, being of order O(n). The authors mention Teh and Welling (2011) and their stochastic gradient approximation to the Langevin diffusion, when the gradient is based on a subsample. Without the Metropolis correction that would ensure an exact target but at a cost of order O(n). (Which makes the technique rather difficult to recommend.)
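To make the stochastic gradient idea concrete, here is a minimal sketch of an unadjusted Langevin chain driven by a subsampled gradient, on a toy model of my own choosing (Gaussian data with unit variance and a Gaussian prior on the mean), not an example taken from either paper. The function name, step size and batch size are all assumptions made for illustration.

```python
import numpy as np

def sgld_gaussian_mean(x, prior_var, step, batch_size, n_iter, rng):
    """Unadjusted stochastic gradient Langevin chain for the mean of N(theta, 1) data."""
    n = len(x)
    theta = 0.0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        # Subsample (with replacement) instead of touching all n observations.
        batch = x[rng.integers(0, n, size=batch_size)]
        # Unbiased estimate of the gradient of the log-posterior.
        grad = -theta / prior_var + (n / batch_size) * np.sum(batch - theta)
        # Euler step of the Langevin diffusion, with no accept/reject correction.
        theta += 0.5 * step * grad + np.sqrt(step) * rng.standard_normal()
        chain[t] = theta
    return chain

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=1000)
chain = sgld_gaussian_mean(x, prior_var=100.0, step=1e-4,
                           batch_size=50, n_iter=5000, rng=rng)

# Exact conjugate posterior, available here only because the toy model is Gaussian.
post_var = 1.0 / (len(x) + 1.0 / 100.0)
post_mean = post_var * x.sum()
print("chain mean / exact posterior mean:", chain[500:].mean(), post_mean)
print("chain var  / exact posterior var: ", chain[500:].var(), post_var)
```

In this setting the chain mean should sit close to the exact posterior mean, while one would expect the chain variance to exceed the exact posterior variance, since the subsampling noise is never corrected for. Which is the approximation the paragraph above refers to.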

A novel [for me] notion at the core of this paper is the concept of *quasi-stationary distribution*, which is the limiting distribution of a Markov chain X[t] conditional on a Markov stopping time [being larger than t]. The approach is based on diffusions with appropriate stationary distributions like the Langevin diffusion. (Actually, as in most papers I have read and remember, the current paper only considers the Langevin diffusion.) In order to avoid the issues with unadjusted and Metropolis-adjusted Langevin schemes, a killed Brownian motion is created, which means a Brownian motion conditional on being alive until time T when the instantaneous killing rate is a given function of the chain, Φ(X[t]), related to the stationary measure of the Langevin diffusion ν. Under appropriate conditions, the density of this killed Brownian motion converges [in T] to √ν. Which immediately hints at creating a new Langevin diffusion targeting ν² instead of ν. And killing it with the proper rate, which can be done by thinning a Poisson process. Simulating the original path can be done by path-space rejection sampling, following the technique set by Gareth Roberts and co-authors more than ten years ago. Based on finite dimensional realisations of the path on [0,T]. And including the killing part can be done by importance sampling and checking that the simulated killing time is larger than the current (exponentially simulated) time.
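The killing-by-thinning step can be sketched as follows, under the simplifying assumption (mine, not the paper's) that the killing rate Φ is bounded by a known global constant `phi_max`. The function names and the particular bounded rate used in the demonstration are hypothetical. Candidate killing times come from a Poisson process of rate `phi_max`, the Brownian motion is simulated exactly at those times, and each candidate is accepted with probability Φ(x)/`phi_max`.

```python
import numpy as np

def killing_time(phi, phi_max, x0, horizon, rng):
    """Killing time of a Brownian motion started at x0, or None if it survives to `horizon`.

    Assumes 0 <= phi(x) <= phi_max for all x (a global bound, for illustration only).
    """
    t, x = 0.0, float(x0)
    while True:
        # Next event of the dominating Poisson process of rate phi_max.
        gap = rng.exponential(1.0 / phi_max)
        if t + gap > horizon:
            return None
        t += gap
        # Exact Brownian increment over the gap.
        x += np.sqrt(gap) * rng.standard_normal()
        # Thinning: keep the event as a killing with probability phi(x) / phi_max.
        if rng.uniform() < phi(x) / phi_max:
            return t

def survival_probability(phi, phi_max, x0, horizon, n_paths, rng):
    """Monte Carlo estimate of the probability of being alive at time `horizon`."""
    alive = sum(killing_time(phi, phi_max, x0, horizon, rng) is None
                for _ in range(n_paths))
    return alive / n_paths

rng = np.random.default_rng(1)
bounded_rate = lambda x: 2.0 * x**2 / (1.0 + x**2)  # hypothetical rate, bounded by 2
print(survival_probability(bounded_rate, 2.0, 0.0, 1.0, 2000, rng))
```

With a constant rate the construction reduces to an exponential killing time, which gives a quick sanity check: the survival probability at time T should then be close to exp(-rate × T).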

One practical difficulty in the implementation of this neat principle is the derivation of the normalising constant, whose evaluation degrades with the time horizon T. The solution adopted in the paper is through a sequential Monte Carlo method, using another discretisation of the time interval [0,T] (relying on the original one would get too costly?). As for subsampling, since the survival probability for the Brownian motion is based on an unbiased estimator, subsampling does not hurt if conducted in a random manner. Although this increases the variance in principle, the use of a control variate computed just once helps in reducing the complexity to O(1).
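The control variate device can be illustrated on a generic sum over the data, here the gradient of a one-dimensional logistic log-likelihood, which is my own toy choice rather than the quantity estimated in the paper. All names and numerical settings are assumptions. The full sum is computed once at a reference value `theta_ref`, after which each estimate at a nearby `theta` only requires a small subsample of differences.

```python
import numpy as np

def per_datum_grad(theta, z, y):
    """Gradient in theta of the logistic log-likelihood, one term per observation."""
    return (y - 1.0 / (1.0 + np.exp(-theta * z))) * z

def naive_estimate(theta, z, y, idx):
    """Unbiased subsample estimate of the full gradient sum, without control variate."""
    return len(z) * per_datum_grad(theta, z[idx], y[idx]).mean()

def control_variate_estimate(theta, theta_ref, total_ref, z, y, idx):
    """Same target, but only the differences to the reference value are subsampled."""
    diff = (per_datum_grad(theta, z[idx], y[idx])
            - per_datum_grad(theta_ref, z[idx], y[idx]))
    return total_ref + len(z) * diff.mean()

rng = np.random.default_rng(0)
n, m, reps = 1000, 20, 2000
z = rng.standard_normal(n)
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-z))).astype(float)

theta_ref, theta = 1.0, 1.05
total_ref = per_datum_grad(theta_ref, z, y).sum()  # one-off pass over all the data
full = per_datum_grad(theta, z, y).sum()           # target, computed here for checking only

naive = np.empty(reps)
cv = np.empty(reps)
for r in range(reps):
    idx = rng.integers(0, n, size=m)               # random subsample, with replacement
    naive[r] = naive_estimate(theta, z, y, idx)
    cv[r] = control_variate_estimate(theta, theta_ref, total_ref, z, y, idx)

print("full sum:", full)
print("naive mean / sd:", naive.mean(), naive.std())
print("control variate mean / sd:", cv.mean(), cv.std())
```

Both estimators are unbiased for the full sum, but when `theta` stays close to `theta_ref` the subsampled differences are small and the control variate version should exhibit a much smaller spread, at a per-estimate cost that depends on the subsample size m and not on n.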

This is a tough paper and I have not gone through the effort of trying to implement it, but this is an original and innovative construct I would like to monitor in further detail on a toy example, maybe next week while in Warwick. Or at least to discuss it with the authors.

## superintelligence [book review]

Posted in Books, Statistics, Travel, University life with tags 2001: A Space Odyssey, AIs, artificial intelligence, Bill Gates, Chicken Run, doomsday argument, ethics, HAL, intelligence, Isaac Asimov, JSM 2015, morality, Nick Bostrom, Philip K. Dick, Seattle on November 28, 2015 by xi'an

“The first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.” I.J. Good

**I** saw the nice cover of Superintelligence: paths, dangers, strategies by Nick Bostrom [owling at me!] at the OUP booth at JSM this summer (a nice owl cover that comes with a little philosophical fable at the beginning about sparrows) and, after reading an in-depth review [in English] by Olle Häggström, on Häggström hävdar, asked OUP for a review copy. Which they sent immediately. The reason why I got (so) interested in the book is that I am quite surprised at the level of alertness about the dangers of artificial intelligence (or computer intelligence) taking over. As reported in an earlier blog, and with no expertise whatsoever in the field, I was not and am not convinced that the uncontrolled and exponential rise of non-human or non-completely human intelligences is the number One entry in Doomsday scenarios. (As made clear by Radford Neal and Corey Yanovsky in their comments, I know nothing worth reporting about those issues, but remain presumably irrationally more concerned about climate change and/or a return to barbarity than by the incoming reign of the machines.) Thus, having no competence in the least in either intelligence (!), artificial or human, or in philosophy and ethics, the following comments on the book only reflect my neophyte’s reactions. *Which means the following rant should be mostly ignored! Except maybe on a rainy day like today…*

“The ideal is that of the perfect Bayesian agent, one that makes probabilistically optimal use of available information. This idea is unattainable (…) Accordingly, one can view artificial intelligence as a quest to find shortcuts…” (p.9)

Overall, the book stands much more at a philosophical and exploratory level than at attempting any engineering or technical assessment. The graphs found within are sketches rather than outputs of carefully estimated physical processes. There is thus hardly any indication of how those super AIs could be coded towards super abilities to produce paper clips (but why on Earth would we need paper clips in a world dominated by AIs?!) or to involve all resources from an entire galaxy to explore even farther. The author envisions (mostly catastrophic) scenarios that require some suspension of disbelief and after a while I decided to read the book mostly as a higher form of science fiction, from which a series of lower form science fiction books could easily be constructed! Some passages reminded me quite forcibly of Philip K. Dick, less of electric sheep etc. than of Ubik, where superpowerful AI(s) turn humans into jar brains satisfied (or ensnared) with simulated virtual realities. Much less of Asimov’s novels as robots are hardly mentioned. And the three laws of robotics are dismissed as ridiculously too simplistic (and too human).

## congrats!

Posted in Statistics, University life with tags CMU, Jerry Sacks, JSM 2015, MIT Technology Review, NISS, record linkage, Syrian civil war, University of Pittsburgh on August 24, 2015 by xi'an

**T**wo items of news that reached my mailbox at about the same time: my friends and CMU coauthors Rebecca (Beka) Steorts and Steve Fienberg both received a major award in the past few days. Congrats to both of them!!! At JSM 2015, Steve got the 2015 Jerome Sacks Award for Cross-Disciplinary Research “*for a remarkable career devoted to the development and application of statistical methodology to solve problems for the benefit of society, including aspects of human rights, privacy and confidentiality, forensics, survey and census-taking, and more; and for exceptional leadership in a variety of professional and governmental organizations, including in the founding of NISS*.” The Award is delivered by the National Institute of Statistical Sciences (NISS) in honour of Jerry Sacks. And Beka has been selected as one of the 35 innovators under 35 for 2015, a list published yearly by the *MIT Technology Review*. In particular for her record-linkage work on estimating the number of casualties in the Syrian civil war. (Which led the *Review* to classify her as a humanitarian rather than a visionary, which list includes two other machine learners.) Great!

## STAN [no dead end]

Posted in Books, Statistics, Travel with tags JSM 2015, London, STAN on August 22, 2015 by xi'an

**M**ichael Betancourt found this street name in London and used it for his talk in Seattle. Even though he should have photoshopped the dead end symbol, which begged for my sarcastic comment during the talk…

## JSM 2015 [day #4]

Posted in pictures, Running, Statistics, Travel, University life with tags ASA, bag of little bootstraps, consistency, harmonic mean estimator, JSM 2015, Langevin diffusion, Langevin MCMC algorithm, latent variable, marginal likelihood, MCMC, Monte Carlo Statistical Methods, MrBayes, phylogenetic trees, R.A. Fisher, Seattle, spectral clustering, spectral gap, STAN, University of Warwick on August 13, 2015 by xi'an

**M**y first session today was Markov Chain Monte Carlo for Contemporary Statistical Applications with a heap of interesting directions in MCMC research! Now, without any possible bias (!), I would definitely nominate Murray Pollock (incidentally from Warwick) as the winner for best slides, funniest presentation, and most enjoyable accent! More seriously, the scalable Langevin algorithm he developed with Paul Fearnhead, Adam Johansen, and Gareth Roberts, is quite impressive in avoiding computing costly likelihoods. With of course caveats on which targets it applies to. Murali Haran showed a new proposal to handle high dimension random effect models by a projection trick that reduces the dimension. Natesh Pillai introduced us (or at least me!) to a spectral clustering that allowed for an automated partition of the target space, itself the starting point to his parallel MCMC algorithm. Quite exciting, even though I do not perceive partitions as an ideal solution to this problem. The final talk in the session was Galin Jones’ presentation of consistency results and conditions for multivariate quantities which is a surprisingly unexplored domain. MCMC is still alive and running!

The second MCMC session of the morning, Monte Carlo Methods Facing New Challenges in Statistics and Science, was equally diverse, with Lynn Kuo’s talk on the HAWK approach, where we discovered that harmonic mean estimators are still in use, e.g., in MrBayes software employed in phylogenetic inference. The proposal to replace this awful estimator that should never be seen again (!) was rather closely related to an earlier solution of ours for marginal likelihood approximation, based there on a partition of the whole space rather than an HPD region in our case… Then, Michael Betancourt brilliantly acted as a proxy for Andrew to present the STAN language, with a flashy trailer he most recently designed. Featuring Andrew as the sole actor. And with great arguments for using it, including the potential to run expectation propagation (as a way of life). *In fine*, Faming Liang proposed a bootstrap subsampling version of the Metropolis-Hastings algorithm, where the likelihood is approximated from subsamples, while acknowledging the resulting bias in the limiting distribution.
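For readers who have not met the harmonic mean estimator, here is a minimal sketch on a conjugate Gaussian toy model of my own choosing, where the marginal likelihood is available in closed form for comparison. Function names and settings are assumptions, and this is neither the HAWK proposal nor our HPD-based alternative. The estimator averages inverse likelihoods over posterior draws, an average whose variance is infinite in this model as soon as the prior is much wider than the posterior.

```python
import numpy as np

def log_likelihood(theta, x):
    """Log-likelihood of N(theta, 1) data, vectorised over an array of theta values."""
    theta = np.atleast_1d(theta)
    n = len(x)
    sq = np.sum(x**2) - 2.0 * theta * np.sum(x) + n * theta**2
    return -0.5 * n * np.log(2.0 * np.pi) - 0.5 * sq

def exact_log_marginal(x, prior_var):
    """Closed-form log marginal likelihood under a N(0, prior_var) prior on theta."""
    n = len(x)
    quad = np.sum(x**2) - prior_var * np.sum(x)**2 / (1.0 + n * prior_var)
    return (-0.5 * n * np.log(2.0 * np.pi)
            - 0.5 * np.log(1.0 + n * prior_var) - 0.5 * quad)

def harmonic_mean_log_marginal(x, prior_var, n_draws, rng):
    """Log of the harmonic mean estimator, based on exact posterior draws."""
    n = len(x)
    post_var = 1.0 / (n + 1.0 / prior_var)
    theta = rng.normal(post_var * np.sum(x), np.sqrt(post_var), size=n_draws)
    neg = -log_likelihood(theta, x)
    shift = neg.max()  # log-sum-exp trick for numerical stability
    return -(shift + np.log(np.mean(np.exp(neg - shift))))

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=20)
print("exact log marginal:", exact_log_marginal(x, 100.0))
for rep in range(5):
    print("harmonic mean estimate:", harmonic_mean_log_marginal(x, 100.0, 10_000, rng))
```

Rerunning the estimator a few times and comparing with the exact value gives a feel for how unreliable it is. The estimates barely react to the prior variance, while the exact marginal does.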

My first afternoon session was another entry on Statistical Phylogenetics, somewhat continued from yesterday’s session. Making me realise I had not seen a single talk on ABC for the entire meeting! The issues discussed in the session were linked with aligning sequences and comparing many trees. Again in settings where likelihoods can be computed more or less explicitly. Without any expertise in the matter, I wondered at a construction that would turn all trees into realizations of a continuous model. For instance by growing one branch at a time while removing the MRCA root… And maybe using a particle-like method to grow trees. As an aside, Vladimir Minin told me yesterday night about genetic mutations that could switch on and off phenotypes repeatedly across generations… For instance the ability to glow in the dark for species of deep sea fish.

When stating that I did not see a single talk about ABC, I omitted Steve Fienberg’s Fisher Lecture R.A. Fisher and the Statistical ABCs, keeping the *morceau de choix* for the end! Even though of course Steve did not mention the algorithm! A was for *asymptotics*, or ancillarity, B for *Bayesian* (or biducial??), C for *causation* (or cuffiency???)… Among other gems, I appreciated that Steve mentioned my great-grandfather Darmois in connection with exponential families! And the connection with Jon Wellner’s LeCam Lecture from a few days ago. And reminding us that Savage was a Fisher lecturer himself. And that Fisher introduced fiducial distributions quite early. And for defending the Bayesian perspective. Steve also set some challenges like asymptotics for networks, Bayesian model assessment (I liked the notion of stepping out of the model), and randomization when experimenting with networks. And for big data issues. And for personalized medicine, building on his cancer treatment. No trace of the ABC algorithm, obviously, but a wonderful Fisher lecture, also most obviously!! Bravo, Steve, keep thriving!!!

## JSM 2015 [day #3]

Posted in Books, Statistics, University life with tags ABC, astrostatistics, fiducial distribution, foundations, galaxy formation, JSM 2015, Kingman's coalescent, Mount Rainier, phylogenetic tree, population genetics, Seattle on August 12, 2015 by xi'an

**M**y first morning session was about inference for phylogenies. While I was expecting some developments around Kingman’s coalescent models my coauthors needed and developed ABC for, I was surprised to see models that were producing closed form (or close enough to) likelihoods. Due to strong restrictions on the population sizes and migration possibilities, as explained later to me by Vladimir Minin. No need for ABC there since MCMC was working on the species trees, with Vladimir Minin making use of [the Savage Award winner] Vinayak Rao’s approach on trees that differ from the coalescent. And enough structure to even consider and demonstrate tree identifiability in Laura Kubatko’s case.

I then stopped by the astrostatistics session as the first talk by Gwendolin Eddie was about galaxy mass estimation, a problem I may actually be working on in the Fall, but it ended up being a completely different problem and I was further surprised that the issue of whether or not the data was missing at random was not considered by the authors.

Christening a session Unifying foundation(s) may be calling for trouble, at least from me! In this spirit, Xiao-Li Meng gave a talk attempting a sort of unification of the frequentist, Bayesian, and fiducial paradigms by introducing the notion of personalized inference, which is a notion I had vaguely thought of in the past. How much or how far do you condition upon? However, I have never thought of this as justifying fiducial inference in any way and Xiao-Li’s lively arguments during and after the session were not enough to convince me of the opposite: Prior-free does not translate into (arbitrary) choice-free. In the earlier talk about confidence distributions by Regina Liu and Minge Xie, that I partly missed for Galactic reasons, I entered the room at the very time when ABC was briefly described as a confidence distribution because it was not producing a convergent approximation to the exact posterior, a logic that escapes me (unless those confidence distributions are described in such a loose way as to include about any method of inference). Dongchu Sun also gave us a crash course on reference priors, with a notion of random posteriors I had not heard of before… As well as constructive posteriors… (They seemed to mean constructible matching priors as far as I understood.)

The final talk in this session by Chuanhei Liu on a new approach (modestly!) called inferential model was incomprehensible, with the speaker repeatedly stating that the principles were too hard to explain in five minutes and needed an incoming book… I later took a brief look at an associated paper, which relates to fiducial inference and to Dempster’s belief functions. For me, it has the same Münchhausen feeling of creating a probability out of nothing, creating a distribution on the parameter by ignoring the fact that the fiducial equation x=a(θ,u) modifies the distribution of u once x is observed.