**I**n connection with the recent PhD thesis defence of Juliette Chevallier, in which I took a somewhat virtual part for being physically in Warwick, I read a paper she wrote with Stéphanie Allassonnière on stochastic approximation versions of the EM algorithm. Computing the MAP estimator can be done via some adapted for simulated annealing versions of EM, possibly using MCMC as for instance in the Monolix software and its MCMC-SAEM algorithm. Where SA stands sometimes for stochastic approximation and sometimes for simulated annealing, originally developed by Gilles Celeux and Jean Diebolt, then reframed by Marc Lavielle and Eric Moulines [friends and coauthors]. With an MCMC step because the simulation of the latent variables involves an untractable normalising constant. (Contrary to this paper, Umberto Picchini and Adeline Samson proposed in 2015 a genuine ABC version of this approach, paper that I thought I missed—although I now remember discussing it with Adeline at JSM in Seattle—, ABC is used as a substitute for the conditional distribution of the latent variables given data and parameter. To be used as a substitute for the Q step of the (SA)EM algorithm. One more approximation step and one more simulation step and we would reach a form of ABC-Gibbs!) In this version, there are very few assumptions made on the approximation sequence, except that it converges with the iteration index to the true distribution (for a fixed observed sample) if convergence of ABC-SAEM is to happen. The paper takes as an illustrative sequence a collection of tempered versions of the true conditionals, but this is quite formal as I cannot fathom a feasible simulation from the tempered version and not from the untempered one. It is thus much more a version of tempered SAEM than truly connected with ABC (although a genuine ABC-EM version could be envisioned).

## Archive for Seattle

## ABC-SAEM

Posted in Books, Statistics, University life with tags ABC, ABC-Gibbs, ABC-MCMC, Alan Turing, École Polytechnique, EM, JSM 2015, MAP estimators, MCMC, MCMC-SAEM, Monolix, Paris-Saclay campus, PhD thesis, SAEM, Seattle, simulated annealing, stochastic approximation, University of Warwick, well-tempered algorithm on October 8, 2019 by xi'an## Mount Baker in the sky [jatp]

Posted in Mountains, pictures, Travel with tags forest fire, jatp, JSM 2018, Kulshan, Mount Baker, North Cascades National Park, plane picture, Seattle, smog, volcano on September 23, 2018 by xi'an## a Bayesian interpretation of FDRs?

Posted in Statistics with tags baseball data, empirical Bayes methods, false discovery rate, FDRs, ferry harbour, FNR, hypothesis testing, multiple tests, Seattle, shrinkage estimation, Washington State on April 12, 2018 by xi'an**T**his week, I happened to re-read John Storey’ 2003 “The positive discovery rate: a Bayesian interpretation and the q-value”, because I wanted to check a connection with our testing by mixture [still in limbo] paper. I however failed to find what I was looking for because I could not find any Bayesian flavour in the paper apart from an FRD expressed as a “posterior probability” of the null, in the sense that the setting was one of opposing two simple hypotheses. When there is an unknown parameter common to the multiple hypotheses being tested, a prior distribution on the parameter makes these multiple hypotheses connected. What makes the connection puzzling is the assumption that the observed statistics defining the significance region are *independent* (Theorem 1). And it seems to depend on the choice of the significance region, which should be induced by the Bayesian modelling, not the opposite. (This alternative explanation does not help either, maybe because it is on baseball… Or maybe because the sentence “If a player’s [posterior mean] is above .3, it’s more likely than not that their true average is as well” does not seem to appear naturally from a Bayesian formulation.) *[Disclaimer: I am not hinting at anything wrong or objectionable in Storey’s paper, just being puzzled by the Bayesian tag!]*

## scalable Langevin exact algorithm

Posted in Books, Statistics, Travel, University life with tags Brownian motion, control variate, importance sampling, JSM 2015, Langevin diffusion, normalising constant, Poisson process, quasi-stationary distribution, scalable MCMC, Seattle, sequential Monte Carlo, University of Warwick on October 18, 2016 by xi'an

“By employing a modification to existing naïve subsampling techniques we can obtain an algorithm which is still exact but has sub-linear iterative cost as a function of data size.”

**A** few weeks ago Murray Pollock, Paul Fearnhead, Adam Johansen and Gareth Roberts (all from Warwick except for Paul) arXived a paper The Scalable Langevin Exact Algorithm: Bayesian Inference for Big Data. (This was also the topic of Murray’s talk last year at JSM in Seattle.) One major advance found in the paper is the derivation of an “exact” algorithm that is sub-linear in the data size. As discussed in the introduction, the current approaches to large data problems either suffer from being approximate (like divide-and-conquer methods) or do not achieve significant reduction in the computing time, being of order O(n). The authors mention Teh and Welling (2011) sand their tochastic gradient approximation to the Langevin diffusion, when the gradient is based on a subsample. Without the Metropolis correction that would ensure an exact target but at a cost of order O(n). (Which makes the technique rather difficult to recommend.)

A novel [for me] notion at the core of this paper is the concept of *quasi-stationary distribution*, which is the limiting distribution of a Markov chain X[t] conditional on a Markov stopping time [being larger than t]. The approach is based on diffusions with appropriate stationary distributions like the Langevin diffusion. (Actually, as in most papers I have read and remember, the current paper only considers the Langevin diffusion.) In order to avoid the issues with unadjusted and Metropolis-adjusted Langevin schemes, a killed Brownian motion is created, which means a Brownian motion conditional of being alive till time T when the instantaneous killing rate is a given function of the chain, Φ(X[t]), related with the stationary measure of the Langevin diffusion ν. Under appropriate conditions, the density of this killed Brownian motion converges [in T] to √ν. Which immediately hints at creating a new Langevin diffusion targeting ν² instead of ν. And killing it with the proper rate, which can be done by thinning a Poisson process. Simulating the original path can be done by path-space rejection sampling, following the technique set by Gareth Roberts and co-authors more than ten years ago. Based on finite dimensional realisations of the path on [0,T]. And including the killing part can be done by importance sampling and checking that the simulated killing time is larger than the current (exponentially simulated) time.

One practical difficulty in the implementation of this neat principle is the derivation of the normalising constant, which evaluation degrades with the time horizon T. The solution adopted in the paper is through a sequential Monte Carlo method, using another discretisation of the time interval [0,T] (relying on the original one would get too costly?). As for subsampling, since the survival probability for the Brownian motion is based on an unbiased estimator, subsampling does not hurt if conducted in a random manner. Although this increases the variance on principle, the use of a control variate computed just once helps in reducing the complexity to O(1).

This is a tough paper and I have not gone through the effort of trying to implement it, but this is an original and innovative construct I would like to monitor in further details on a toy example, maybe next week while in Warwick. Or at least to discuss it with the authors.

## photography exhibit in Seattle

Posted in Mountains, pictures, Running, Travel with tags Brittany Aäe, Caffe Vita, climbing, Fremont, Mazama, North Cascade Climbing Guides, North Cascades National Park, Seattle, Washington State on October 4, 2016 by xi'an## Cadence

Posted in pictures, Travel, Wines with tags Cabernet, Cadence, Merlot, Pacific North West, Petit Verdot, Red Moutain, Seattle, Washington State on December 20, 2015 by xi'an## superintelligence [book review]

Posted in Books, Statistics, Travel, University life with tags 2001: A Space Odyssey, AIs, artificial intelligence, Bill Gates, Chicken Run, doomsday argument, ethics, HAL, intelligence, Isaac Asimov, JSM 2015, morality, Nick Bostrom, Philip K. DIck, Seattle on November 28, 2015 by xi'an

“The first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.”I.J. Good

**I** saw the nice cover of Superintelligence: paths, dangers, strategies by Nick Bostrom [owling at me!] at the OUP booth at JSM this summer—nice owl cover that comes will a little philosophical fable at the beginning about sparrows—and, after reading an in-depth review [in English] by Olle Häggström, on Häggström hävdar, asked OUP for a review copy. Which they sent immediately. The reason why I got (so) interested in the book is that I am quite surprised at the level of alertness about the dangers of artificial intelligence (or computer intelligence) taking over. As reported in an earlier blog, and with no expertise whatsoever in the field, I was not and am not convinced that the uncontrolled and exponential rise of non-human or non-completely human intelligences is the number One entry in Doom Day scenarios. (As made clear by Radford Neal and Corey Yanovsky in their comments, I know nothing worth reporting about those issues, but remain presumably irrationally more concerned about climate change and/or a return to barbarity than by the incoming reign of the machines.) Thus, having no competence in the least in either intelligence (!), artificial or human, or in philosophy and ethics, the following comments on the book only reflect my neophyte’s reactions. *Which means the following rant should be mostly ignored! Except maybe on a rainy day like today…*

“The ideal is that of the perfect Bayesian agent, one that makes probabilistically optimal use of available information. This idea is unattainable (…) Accordingly, one can view artificial intelligence as a quest to find shortcuts…” (p.9)

Overall, the book stands much more at a philosophical and exploratory level than at attempting any engineering or technical assessment. The graphs found within are sketches rather than outputs of carefully estimated physical processes. There is thus hardly any indication how those super AIs could be coded towards super abilities to produce paper clips (but why on Earth would we need paper clips in a world dominated by AIs?!) or to involve all resources from an entire galaxy to explore even farther. The author envisions (mostly catastrophic) scenarios that require some suspension of belief and after a while I decided to read the book mostly as a higher form of science fiction, from which a series of lower form science fiction books could easily be constructed! Some passages reminded me quite forcibly of Philip K. Dick, less of electric sheep &tc. than of Ubik, where a superpowerful AI(s) turn humans into jar brains satisfied (or ensnared) with simulated virtual realities. Much less of Asimov’s novels as robots are hardly mentioned. And the third laws of robotics dismissed as ridiculously too simplistic (and too human). Continue reading