Archive for Edinburgh

the naming of the Dead [book review]

Posted in Statistics on July 21, 2018 by xi'an

When leaving for ISBA 2018 in Edinburgh, I picked a Rebus book from my bookshelf, a book that happened to be The Naming of the Dead, which was published in 2006 and takes place in 2005, during the week of the G8 summit in Scotland and of the London Underground bombings. Quite a major week in recent British history! But also for Rebus and his colleague Siobhan Clarke, who investigate a sacrificial murder close, too close, to the location of the G8 meeting and as a result collide with superiors, secret services, protesters, politicians, and executives, including a brush with Bush that leads to his bike accident at Gleneagles, with both of them ending up suspended from the force. But more than this close connection with true events in and around Edinburgh, the book is a masterpiece, maybe Rankin’s best, because of the depiction of the characters, who have even more depth and dimensions than in the other novels. And for the analysis of the events of that week. Having been in Edinburgh at the time I started re-reading the book also made the description of the city much more vivid and realistic, as I could locate and sometimes remember some places. (The conclusion of some subplots may be less realistic than I would like it to be, but this is of very minor relevance.)

ABC variable selection

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life on July 18, 2018 by xi'an

Prior to the ISBA 2018 meeting, Yi Liu, Veronika Ročková, and Yuexi Wang arXived a paper on relying on ABC for finding relevant variables, which is a very original approach in that ABC is not as much the object as it is a tool. And which Veronika considered during her Susie Bayarri lecture at ISBA 2018. In other words, it is not about selecting summary variables for running ABC but quite the opposite, selecting variables in a non-linear model through an ABC step. I was going to separate the two selections into algorithmic and statistical selections, but it is more like projections in the observation and covariate spaces. With ABC still providing an appealing approach to approximate the marginal likelihood. Now, one may wonder at the relevance of ABC for variable selection, aka model choice, given our warning call of a few years ago. But the current paper does not require low-dimension summary statistics, hence avoids the difficulty with the “other” Bayes factor.

In the paper, the authors consider a spike-and… forest prior!, where the Bayesian CART selection of active covariates proceeds through a regression tree, selected covariates appearing in the tree and others not appearing. With a sparsity prior on the tree partitions and this new ABC approach to select the subset of active covariates. A specific feature is in splitting the data, one part to learn about the regression function, simulating from this function and comparing with the remainder of the data. The paper further establishes that ABC Bayesian Forests are consistent for variable selection.

“…we observe a curious empirical connection between π(θ|x,ε), obtained with ABC Bayesian Forests  and rescaled variable importances obtained with Random Forests.”

The difference with our ABC-RF model choice paper is that we select summary statistics [for classification] rather than covariates. For instance, in the current paper, simulation of pseudo-data will depend on the selected subset of covariates, meaning simulating a model index, and then generating the pseudo-data, acceptance being a function of the L² distance between data and pseudo-data. And then relying on all ABC simulations to find which variables are in more often than not to derive the median probability model of Barbieri and Berger (2004). Which does not work very well if implemented naïvely. Because of the immense size of the model space, it is quite hard to find pseudo-data close to actual data, resulting in either very high tolerance or very low acceptance. The authors get over this difficulty by a neat device that reminds me of fractional or intrinsic (pseudo-)Bayes factors in that the dataset is split into two parts, one that learns about the posterior given the model index and another one that simulates from this posterior to compare with the left-over data. Bringing simulations closer to the data. I do not remember seeing this trick before in ABC settings, but it is very neat, assuming the small data posterior can be simulated (which may be a fundamental reason for the trick to remain unused!). Note that the split varies at each iteration, which means there is no impact of ordering the observations.
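To make the data-splitting device more concrete, here is a minimal sketch of my reading of it, and not the authors' implementation. In particular, the Bayesian forest is swapped for a conjugate Normal linear regression (so that the small-data posterior can be simulated exactly), the noise scale `sigma`, the prior scale `tau`, the inclusion probability `q`, and the function name `abc_variable_selection` are all assumptions of mine, and the tolerance is set as a quantile of the simulated L² distances.

```python
import numpy as np

def abc_variable_selection(X, y, n_iter=5000, keep=0.02, q=0.2,
                           tau=3.0, sigma=1.0, seed=0):
    """ABC variable selection with a fresh random data split per iteration.

    Stand-in model (an assumption): y = X[:, gamma] @ beta + N(0, sigma^2),
    with beta ~ N(0, tau^2 I) and each covariate included with probability q.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    subsets = np.zeros((n_iter, p), dtype=bool)
    dists = np.empty(n_iter)
    for t in range(n_iter):
        # the split varies at each iteration, so ordering has no impact
        perm = rng.permutation(n)
        fit, hold = perm[: n // 2], perm[n // 2:]
        # simulate a model index from the sparsity prior
        gamma = rng.random(p) < q
        subsets[t] = gamma
        k = int(gamma.sum())
        if k > 0:
            Xf, Xh = X[fit][:, gamma], X[hold][:, gamma]
            # conjugate posterior of beta given the first half of the data
            prec = Xf.T @ Xf / sigma**2 + np.eye(k) / tau**2
            mean = np.linalg.solve(prec, Xf.T @ y[fit] / sigma**2)
            chol = np.linalg.cholesky(prec)
            beta = mean + np.linalg.solve(chol.T, rng.standard_normal(k))
            mu = Xh @ beta
        else:
            mu = np.zeros(hold.size)
        # pseudo-data for the left-over half, compared in L2 distance
        pseudo = mu + sigma * rng.standard_normal(hold.size)
        dists[t] = np.linalg.norm(pseudo - y[hold])
    # tolerance taken as a small quantile of the simulated distances
    eps = np.quantile(dists, keep)
    inclusion = subsets[dists <= eps].mean(axis=0)
    # median probability model: covariates in more often than not
    return inclusion, inclusion > 0.5

# toy data: only covariates 0 and 2 are active
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.standard_normal(200)
inclusion, mpm = abc_variable_selection(X, y)
print(np.round(inclusion, 2), mpm)
```

The inclusion frequencies over the accepted simulations play the role of the marginal posterior inclusion probabilities, and thresholding them at one half returns the median probability model.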

ISBA 18 tidbits

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life on July 2, 2018 by xi'an

Among a continuous sequence of appealing sessions at this ISBA 2018 meeting [says a member of the scientific committee!], I happened to attend two talks [with a wee bit of overlap] by Sid Chib in two consecutive sessions, because his co-author Anna Simoni (CREST) was unfortunately sick. Their work was about models defined by a collection of moment conditions, as often happens in econometrics, developed in a recent JASA paper by Chib, Shin, and Simoni (2017). With an extension about moving to defining conditional expectations by use of a functional basis. The main approach relies on exponentially tilted empirical likelihoods, which reminded me of the empirical likelihood [BCel] implementation we ran with Kerrie Mengersen and Pierre Pudlo a few years ago. As a substitute for ABC. This issue made me wonder how Bayesian the estimating equation concept is, as it should somewhat involve a nonparametric prior under the moment constraints.

Note that Sid’s [talks and] papers are disconnected from ABC, as everything comes in closed form, apart from the empirical likelihood derivation, as we actually found in our own work!, but this could become a substitute model for ABC uses. For instance, identifying the parameter θ of the model by identifying equations. Would that impose too much input from the modeller? I figure I came up with this notion mostly because of the emphasis on proxy models the previous day at ABC in ‘burgh! Another connected item of interest in the work is the possibility of accounting for misspecification of these moment conditions by introducing a vector of errors with a spike & slab distribution, although I am not sure this is 100% necessary without getting further into the paper(s) [blame conference pressure on my time].

Another highlight was attending a fantastic poster session Monday night on computational methods, except I would have needed four more hours to get through each and every poster. This new version of ISBA has split the posters between two sites (great) and themes (not so great!), while I would have preferred more sites covering all themes over all nights, to lower the noise (still bearable this year) and to increase the possibility to check all posters of interest in a particular theme…

Mentioning as well a great talk by Dan Roy about assessing deep learning performances by what he calls non-vacuous error bounds. Namely, through PAC-Bayesian bounds. One major comment of his was about deep learning models being much more non-parametric (number of parameters rising with number of observations) than parametric models, meaning that generative adversarial constructs such as the one I discussed a few days ago may face a fundamental difficulty as models are taken at face value there.

On closed-form solutions, a closed-form Bayes factor for component selection in mixture models by Fúquene, Steel and Rossell that resembles the Savage-Dickey version, without the measure theoretic difficulties. But with non-local priors. And closed-form conjugate priors for the probit regression model, using unified skew-normal priors, as exhibited by Daniele Durante. Which are products of Normal cdfs and pdfs, and which allow for closed form marginal likelihoods and marginal posteriors as well. (The approach is not exactly conjugate as the prior and the posterior are not in the same family.)

And in the final session I attended, there were two talks on scalable MCMC, one on coresets, which will require some time and effort to assimilate, by Trevor Campbell and Tamara Broderick, and another one using Poisson subsampling. By Matias Quiroz and co-authors. Which did not completely convince me (but this was the end of a long day…)

All in all, this has been a great edition of the ISBA meetings, if quite intense due to a non-stop schedule, with a very efficient organisation that made parallel sessions manageable and poster sessions back to a reasonable scale [although I did not once manage to cross the street to the other session]. Being in unreasonably sunny Edinburgh helped a lot obviously! I am a wee bit disappointed that no one else followed my call to wear a kilt, but I had low expectations to start with… And too bad I missed the Ironman 70.3 Edinburgh by one day!

go, Iron scots!

Posted in Statistics on June 30, 2018 by xi'an

ABC in Ed’burgh

Posted in Mountains, pictures, Running, Statistics, Travel, University life on June 28, 2018 by xi'an

A glorious day for this new edition of the “ABC in…” workshops, in the capital City of Edinburgh! I very much enjoyed this ABC day for demonstrating that ABC is still alive and kicking!, i.e., enjoying plenty of new developments and reinterpretations. With more talks and posters on the way during the main ISBA 2018 meeting. (All nine talks are available on the webpage of the conference.)

After Michael Gutmann’s tutorial on ABC, Gael Martin (Monash) presented her recent work with David Frazier, Ole Maneesoonthorn, and Brendan McCabe on ABC for prediction. Maybe unsurprisingly, Bayesian consistency for the given summary statistics is a sufficient condition for concentration of the ABC predictor, but ABC seems to do better for the prediction problem than for parameter estimation, not losing to exact Bayesian inference, possibly because in essence the summary statistics there need not be of a large dimension to be consistent. The following talk by Guillaume Kon Kam King was also about prediction, for the specific problem of gas offer, with a latent Wright-Fisher point process in the model. He used a population ABC solution to handle this model.

Alexander Buchholz (CREST) introduced an ABC approach with quasi-Monte Carlo steps that helps in reducing the variability and hence improves the approximation in ABC. He also looked at a Negative Geometric variant of regular ABC by running a random number of proposals until reaching a given number of acceptances, which while being more costly produces more stability.
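As a reminder of what running a random number of proposals until a given number of acceptances looks like, here is a minimal sketch on a toy Normal mean model. The model, the N(0, 5²) prior, the tolerance `eps`, and the function name `abc_until_n_accepted` are all assumptions chosen for illustration, and the quasi-Monte Carlo part of the talk is not reproduced.

```python
import numpy as np

def abc_until_n_accepted(y_obs, n_accept=200, eps=0.1, seed=0):
    """Rejection ABC that stops after n_accept acceptances.

    Toy model (an assumption): y_i ~ N(theta, 1), theta ~ N(0, 5^2),
    with the sample mean as summary statistic.
    """
    rng = np.random.default_rng(seed)
    s_obs = y_obs.mean()
    accepted = []
    n_proposals = 0  # random number of proposals
    while len(accepted) < n_accept:
        theta = rng.normal(0.0, 5.0)                        # prior draw
        pseudo = rng.normal(theta, 1.0, size=y_obs.size)    # pseudo-data
        n_proposals += 1
        if abs(pseudo.mean() - s_obs) <= eps:
            accepted.append(theta)
    return np.array(accepted), n_proposals

y_obs = np.random.default_rng(1).normal(1.0, 1.0, size=50)
sample, n_proposals = abc_until_n_accepted(y_obs)
print(sample.size, n_proposals, sample.mean())
```

The sample size is fixed by construction while the computing cost is random, which is the price paid for the extra stability.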

Other talks by Trevelyan McKinley, Marko Järvenpää, Matt Moores (Warwick), and Chris Drovandi (QUT) illustrated the need for substitute models as a first step, and not solely via Gaussian processes. With for instance the new notion of a loss function to evaluate this approximation. Chris made a case in favour of synthetic likelihood vs ABC approaches, due to degradation of the performances of nonparametric density estimation with the dimension. But I remain a doubting Thomas [Bayes] on that point as high dimensions in the data or the summary statistics are not necessarily the issue, as also addressed in the paper on ABC-CDE discussed in a recent post. While synthetic likelihood requires estimating a mean function and a covariance function of the parameter, of the dimension of the summary statistic. Even though estimated by simulation.
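For the record, the mean and covariance estimation by simulation behind synthetic likelihood can be sketched as follows. This is a generic Gaussian synthetic likelihood under assumptions of my own (the toy simulator `toy_summaries`, the number of replicates `n_rep`, and at least two summary statistics), not the version presented in the talk.

```python
import numpy as np

def synthetic_loglik(theta, s_obs, simulate_summaries, n_rep=200, rng=None):
    """Gaussian synthetic log-likelihood of s_obs at theta.

    The mean vector and covariance matrix of the summaries are both
    estimated from n_rep simulations at the same parameter value.
    """
    rng = np.random.default_rng() if rng is None else rng
    sims = np.array([simulate_summaries(theta, rng) for _ in range(n_rep)])
    mu = sims.mean(axis=0)
    cov = np.atleast_2d(np.cov(sims, rowvar=False))
    diff = s_obs - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    d = s_obs.size
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)

def toy_summaries(theta, rng, n=100):
    """Hypothetical simulator: mean and log-variance of n draws from N(theta, 1)."""
    x = rng.normal(theta, 1.0, size=n)
    return np.array([x.mean(), np.log(x.var(ddof=1))])

rng = np.random.default_rng(0)
s_obs = np.array([1.0, 0.0])
for theta in (0.0, 1.0, 3.0):
    print(theta, synthetic_loglik(theta, s_obs, toy_summaries, rng=rng))
```

Both estimates need to be recomputed at every new value of the parameter, and the covariance matrix grows with the square of the dimension of the summary statistic.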

Another neat feature of the day was a special session on cosmostatistics with talks by Emille Ishida and Jessica Cisewski, from explaining how ABC was starting to make an impact on cosmo- and astro-statistics, to the special example of the stellar initial mass distribution in clusters.

Call is now open for the next “ABC in”! Note that, while these workshops have been often formally sponsored by ISBA and its BayesComp section, they are not managed by a society or a board of administrators, and hence are not much constrained by a specific format. It would just be nice to keep the low fees as part of the tradition.

from Arthur’s Seat [spot ISBA participants]

Posted in Mountains, pictures, Running, Travel on June 27, 2018 by xi'an

Bayesian GANs [#2]

Posted in Books, pictures, R, Statistics on June 27, 2018 by xi'an

As an illustration of the lack of convergence of the Gibbs sampler applied to the two “conditionals” defined in the Bayesian GANs paper discussed yesterday, I took the simplest possible example of a Normal mean generative model (one parameter) with a logistic discriminator (one parameter) and implemented the scheme (during an ISBA 2018 session). With flat priors on both parameters. And a Normal random walk as Metropolis-Hastings proposal. As expected, since there is no stationary distribution associated with the Markov chain, simulated chains do not exhibit a stationary pattern.

And they eventually reach an overflow error or a trapping state as the log-likelihood gets close to zero (red curve).
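For readers wishing to experiment, here is a minimal Python sketch of such a scheme. It is not the code behind the above pictures. The generator G(z;μ)=μ+z with z~N(0,1), the discriminator D(x;β)=1/(1+exp(-βx)), the form of the two “conditionals”, the step size, and all function names are assumptions based on my reading of the Bayesian GANs paper, with flat priors on both parameters and a log-sigmoid computed in a numerically stable way to delay overflows.

```python
import numpy as np

def log_d(u, beta):
    """log D(u; beta) for the logistic discriminator, computed stably."""
    return -np.logaddexp(0.0, -beta * u)

def log_one_minus_d(u, beta):
    """log(1 - D(u; beta)), computed stably."""
    return -np.logaddexp(0.0, beta * u)

def bayesian_gan_gibbs(x, n_iter=2000, n_fake=50, step=0.5, seed=0):
    """Metropolis-within-Gibbs on the two assumed 'conditionals'."""
    rng = np.random.default_rng(seed)
    mu, beta = 0.0, 0.1
    mus, betas = np.empty(n_iter), np.empty(n_iter)
    for t in range(n_iter):
        # generator step: target proportional to prod_i D(mu + z_i; beta)
        z = rng.standard_normal(n_fake)
        prop = mu + step * rng.standard_normal()
        log_ratio = log_d(prop + z, beta).sum() - log_d(mu + z, beta).sum()
        if np.log(rng.random()) < log_ratio:
            mu = prop
        # discriminator step: target proportional to
        # prod_i D(x_i; beta) * prod_j (1 - D(mu + z_j; beta))
        fake = mu + rng.standard_normal(n_fake)
        prop = beta + step * rng.standard_normal()
        log_ratio = (log_d(x, prop).sum() + log_one_minus_d(fake, prop).sum()
                     - log_d(x, beta).sum() - log_one_minus_d(fake, beta).sum())
        if np.log(rng.random()) < log_ratio:
            beta = prop
        mus[t], betas[t] = mu, beta
    return mus, betas

x = np.random.default_rng(1).normal(1.0, 1.0, size=50)
mus, betas = bayesian_gan_gibbs(x)
print(mus[-5:], betas[-5:])
```

Plotting `mus` and `betas` against the iteration index is the quickest way to check whether or not the chains settle under these assumptions.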

Too bad I missed the talk by Shakir Mohammed yesterday, being stuck on the Edinburgh by-pass at rush hour!, as I would have loved to hear his views about this rather essential issue…