Archive for JSM 2015

ABC-SAEM

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , , , , , , on October 8, 2019 by xi'an

In connection with the recent PhD thesis defence of Juliette Chevallier, in which I took a somewhat virtual part for being physically in Warwick, I read a paper she wrote with Stéphanie Allassonnière on stochastic approximation versions of the EM algorithm. Computing the MAP estimator can be done via some adapted for simulated annealing versions of EM, possibly using MCMC as for instance in the Monolix software and its MCMC-SAEM algorithm. Where SA stands sometimes for stochastic approximation and sometimes for simulated annealing, originally developed by Gilles Celeux and Jean Diebolt, then reframed by Marc Lavielle and Eric Moulines [friends and coauthors]. With an MCMC step because the simulation of the latent variables involves an untractable normalising constant. (Contrary to this paper, Umberto Picchini and Adeline Samson proposed in 2015 a genuine ABC version of this approach, paper that I thought I missed—although I now remember discussing it with Adeline at JSM in Seattle—, ABC is used as a substitute for the conditional distribution of the latent variables given data and parameter. To be used as a substitute for the Q step of the (SA)EM algorithm. One more approximation step and one more simulation step and we would reach a form of ABC-Gibbs!) In this version, there are very few assumptions made on the approximation sequence, except that it converges with the iteration index to the true distribution (for a fixed observed sample) if convergence of ABC-SAEM is to happen. The paper takes as an illustrative sequence a collection of tempered versions of the true conditionals, but this is quite formal as I cannot fathom a feasible simulation from the tempered version and not from the untempered one. It is thus much more a version of tempered SAEM than truly connected with ABC (although a genuine ABC-EM version could be envisioned).

scalable Langevin exact algorithm

Posted in Books, Statistics, Travel, University life with tags , , , , , , , , , , , on October 18, 2016 by xi'an

“By employing a modification to existing naïve subsampling techniques we can obtain an algorithm which is still exact but has sub-linear iterative cost as a function of data size.”

A few weeks ago Murray Pollock, Paul Fearnhead, Adam Johansen and Gareth Roberts (all from Warwick except for Paul) arXived a paper The Scalable Langevin Exact Algorithm: Bayesian Inference for Big Data. (This was also the topic of Murray’s talk last year at JSM in Seattle.) One major advance found in the paper is the derivation of an “exact” algorithm that is sub-linear in the data size. As discussed in the introduction, the current approaches to large data problems either suffer from being approximate (like divide-and-conquer methods) or do not achieve significant reduction in the computing time, being of order O(n). The authors mention Teh and Welling (2011) sand their tochastic gradient approximation to the Langevin diffusion, when the gradient is based on a subsample. Without the Metropolis correction that would ensure an exact target but at a cost of order O(n). (Which makes the technique rather difficult to recommend.)

A novel [for me] notion at the core of this paper is the concept of quasi-stationary distribution, which is the limiting distribution of a Markov chain X[t] conditional on a Markov stopping time [being larger than t]. The approach is based on diffusions with appropriate stationary distributions like the Langevin diffusion. (Actually, as in most papers I have read and remember, the current paper only considers the Langevin diffusion.) In order to avoid the issues with unadjusted and Metropolis-adjusted Langevin schemes, a killed Brownian motion is created, which means a Brownian motion conditional of being alive till time T when the instantaneous killing rate is a given function of the chain, Φ(X[t]), related with the stationary measure of the Langevin diffusion ν. Under appropriate conditions, the density of this killed Brownian motion converges [in T] to √ν. Which immediately hints at creating a new Langevin diffusion targeting ν² instead of ν. And killing it with the proper rate, which can be done by thinning a Poisson process. Simulating the original path can be done by path-space rejection sampling, following the technique set by Gareth Roberts and co-authors more than ten years ago. Based on finite dimensional realisations of the path on [0,T]. And including the killing part can be done by importance sampling and checking that the simulated killing time is larger than the current (exponentially simulated) time.

One practical difficulty in the implementation of this neat principle is the derivation of the normalising constant, which evaluation degrades with the time horizon T. The solution adopted in the paper is through a sequential Monte Carlo method, using another discretisation of the time interval [0,T] (relying on the original one would get too costly?). As for subsampling, since the survival probability for the Brownian motion is based on an unbiased estimator, subsampling does not hurt if conducted in a random manner. Although this increases the variance on principle, the use of a control variate computed just once helps in reducing the complexity to O(1).

This is a tough paper and I have not gone through the effort of trying to implement it, but this is an original and innovative construct I would like to monitor in further details on a toy example, maybe next week while in Warwick. Or at least to discuss it with the authors.

superintelligence [book review]

Posted in Books, Statistics, Travel, University life with tags , , , , , , , , , , , , , , on November 28, 2015 by xi'an

“The first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.” I.J. Good

I saw the nice cover of Superintelligence: paths, dangers, strategies by Nick Bostrom [owling at me!] at the OUP booth at JSM this summer—nice owl cover that comes will a little philosophical fable at the beginning about sparrows—and, after reading an in-depth review [in English] by Olle Häggström, on Häggström hävdar, asked OUP for a review copy. Which they sent immediately. The reason why I got (so) interested in the book is that I am quite surprised at the level of alertness about the dangers of artificial intelligence (or computer intelligence) taking over. As reported in an earlier blog, and with no expertise whatsoever in the field, I was not and am not convinced that the uncontrolled and exponential rise of non-human or non-completely human intelligences is the number One entry in Doom Day scenarios. (As made clear by Radford Neal and Corey Yanovsky in their comments, I know nothing worth reporting about those issues, but remain presumably irrationally more concerned about climate change and/or a return to barbarity than by the incoming reign of the machines.) Thus, having no competence in the least in either intelligence (!), artificial or human, or in philosophy and ethics, the following comments on the book only reflect my neophyte’s reactions. Which means the following rant should be mostly ignored! Except maybe on a rainy day like today…

“The ideal is that of the perfect Bayesian agent, one that makes probabilistically optimal use of available information.  This idea is unattainable (…) Accordingly, one can view artificial intelligence as a quest to find shortcuts…” (p.9)

Overall, the book stands much more at a philosophical and exploratory level than at attempting any engineering or technical assessment. The graphs found within are sketches rather than outputs of carefully estimated physical processes. There is thus hardly any indication how those super AIs could be coded towards super abilities to produce paper clips (but why on Earth would we need paper clips in a world dominated by AIs?!) or to involve all resources from an entire galaxy to explore even farther. The author envisions (mostly catastrophic) scenarios that require some suspension of belief and after a while I decided to read the book mostly as a higher form of science fiction, from which a series of lower form science fiction books could easily be constructed! Some passages reminded me quite forcibly of Philip K. Dick, less of electric sheep &tc. than of Ubik, where a superpowerful AI(s) turn humans into jar brains satisfied (or ensnared) with simulated virtual realities. Much less of Asimov’s novels as robots are hardly mentioned. And the third laws of robotics dismissed as ridiculously too simplistic (and too human). Continue reading

congrats!

Posted in Statistics, University life with tags , , , , , , , on August 24, 2015 by xi'an

Two items of news that reached my mailbox at about the same time: my friends and CMU coauthors Rebecca (Beka) Steorts and Steve Fienberg both received a major award in the past few days. Congrats to both of them!!! At JSM 2015, Steve got the 2015 Jerome Sacks Award for Cross-Disciplinary Research “for a remarkable career devoted to the development and application of statistical methodology to solve problems for the benefit of society, including aspects of human rights, privacy and confidentiality, forensics, survey and census-taking, and more; and for exceptional leadership in a variety of professional and governmental organizations, including in the founding of NISS.” The Award is delivered by the National Institute of Statistical Sciences (NISS) in honour of Jerry Sacks. And Beka has been selected as one of the 35 innovators under 35 for 2015, a list published yearly by the MIT Technology Review. In particular for her record-linkage work on estimating the number of casualties in the Syrian civil war. (Which led the Review to classify her as a humanitarian rather than a visionary, which list includes two other machine learners.) Great!

STAN [no dead end]

Posted in Books, Statistics, Travel with tags , , on August 22, 2015 by xi'an

stanmoreMichael Betancourt found this street name in London and used it for his talk in Seattle. Even though he should have photoshopped the dead end symbol, which begged for my sarcastic comment during the talk…