## ABC variable selection

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , on July 18, 2018 by xi'an

Prior to the ISBA 2018 meeting, Yi Liu, Veronika Ročková, and Yuexi Wang arXived a paper on relying ABC for finding relevant variables, which is a very original approach in that ABC is not as much the object as it is a tool. And which Veronika considered during her Susie Bayarri lecture at ISBA 2018. In other words, it is not about selecting summary variables for running ABC but quite the opposite, selecting variables in a non-linear model through an ABC step. I was going to separate the two selections into algorithmic and statistical selections, but it is more like projections in the observation and covariate spaces. With ABC still providing an appealing approach to approximate the marginal likelihood. Now, one may wonder at the relevance of ABC for variable selection, aka model choice, given our warning call of a few years ago. But the current paper does not require low-dimension summary statistics, hence avoids the difficulty with the “other” Bayes factor.

In the paper, the authors consider a spike-and… forest prior!, where the Bayesian CART selection of active covariates proceeds through a regression tree, selected covariates appearing in the tree and others not appearing. With a sparsity prior on the tree partitions and this new ABC approach to select the subset of active covariates. A specific feature is in splitting the data, one part to learn about the regression function, simulating from this function and comparing with the remainder of the data. The paper further establishes that ABC Bayesian Forests are consistent for variable selection.

“…we observe a curious empirical connection between π(θ|x,ε), obtained with ABC Bayesian Forests  and rescaled variable importances obtained with Random Forests.”

The difference with our ABC-RF model choice paper is that we select summary statistics [for classification] rather than covariates. For instance, in the current paper, simulation of pseudo-data will depend on the selected subset of covariates, meaning simulating a model index, and then generating the pseudo-data, acceptance being a function of the L² distance between data and pseudo-data. And then relying on all ABC simulations to find which variables are in more often than not to derive the median probability model of Barbieri and Berger (2004). Which does not work very well if implemented naïvely. Because of the immense size of the model space, it is quite hard to find pseudo-data close to actual data, resulting in either very high tolerance or very low acceptance. The authors get over this difficulty by a neat device that reminds me of fractional or intrinsic (pseudo-)Bayes factors in that the dataset is split into two parts, one that learns about the posterior given the model index and another one that simulates from this posterior to compare with the left-over data. Bringing simulations closer to the data. I do not remember seeing this trick before in ABC settings, but it is very neat, assuming the small data posterior can be simulated (which may be a fundamental reason for the trick to remain unused!). Note that the split varies at each iteration, which means there is no impact of ordering the observations.

## go, Iron scots!

Posted in Statistics with tags , , , , , , , on June 30, 2018 by xi'an

## ABC in Ed’burgh

Posted in Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , on June 28, 2018 by xi'an

A glorious day for this new edition of the “ABC in…” workshops, in the capital City of Edinburgh! I enjoyed very much this ABC day for demonstrating ABC is still alive and kicking!, i.e., enjoying plenty of new developments and reinterpretations. With more talks and posters on the way during the main ISBA 2018 meeting. (All nine talks are available on the webpage of the conference.)

After Michael Gutmann’s tutorial on ABC, Gael Martin (Monash) presented her recent work with David Frazier, Ole Maneesoonthorn, and Brendan McCabe on ABC  for prediction. Maybe unsurprisingly, Bayesian consistency for the given summary statistics is a sufficient condition for concentration of the ABC predictor, but ABC seems to do better for the prediction problem than for parameter estimation, not losing to exact Bayesian inference, possibly because in essence the summary statistics there need not be of a large dimension to being consistent. The following talk by Guillaume Kon Kam King was also about prediction, for the specific problem of gas offer, with a latent Wright-Fisher point process in the model. He used a population ABC solution to handle this model.

Alexander Buchholz (CREST) introduced an ABC approach with quasi-Monte Carlo steps that helps in reducing the variability and hence improves the approximation in ABC. He also looked at a Negative Geometric variant of regular ABC by running a random number of proposals until reaching a given number of acceptances, which while being more costly produces more stability.

Other talks by Trevelyan McKinley, Marko Järvenpää, Matt Moores (Warwick), and Chris Drovandi (QUT) illustrated the urge of substitute models as a first step, and not solely via Gaussian processes. With for instance the new notion of a loss function to evaluate this approximation. Chris made a case in favour of synthetic vs ABC approaches, due to degradation of the performances of nonparametric density estimation with the dimension. But I remain a doubting Thomas [Bayes] on that point as high dimensions in the data or the summary statistics are not necessarily the issue, as also processed in the paper on ABC-CDE discussed on a recent post. While synthetic likelihood requires estimating a mean function and a covariance function of the parameter of the dimension of the summary statistic. Even though estimated by simulation.

Another neat feature of the day was a special session on cosmostatistics with talks by Emille Ishida and Jessica Cisewski, from explaining how ABC was starting to make an impact on cosmo- and astro-statistics, to the special example of the stellar initial mass distribution in clusters.

Call is now open for the next “ABC in”! Note that, while these workshops have been often formally sponsored by ISBA and its BayesComp section, they are not managed by a society or a board of administrators, and hence are not much contrived by a specific format. It would just be nice to keep the low fees as part of the tradition.

## from Arthur’s Seat [spot ISBA participants]

Posted in Mountains, pictures, Running, Travel with tags , , , , , , , , , , , , on June 27, 2018 by xi'an

## fast ε-free ABC

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , on June 8, 2017 by xi'an

Last Fall, George Papamakarios and Iain Murray from Edinburgh arXived an ABC paper on fast ε-free inference on simulation models with Bayesian conditional density estimation, paper that I missed. The idea there is to approximate the posterior density by maximising the likelihood associated with a parameterised family of distributions on θ, conditional on the associated x. The data being then the ABC reference table. The family chosen there is a mixture of K Gaussian components, which parameters are then estimated by a (Bayesian) neural network using x as input and θ as output. The parameter values are simulated from an adaptive proposal that aims at approximating the posterior better and better. As in population Monte Carlo, actually. Except for the neural network part, which I fail to understand why it makes a significant improvement when compared with EM solutions. The overall difficulty with this approach is that I do not see a way out of the curse of dimensionality: when the dimension of θ increases, the approximation to the posterior distribution of θ does deteriorate, even in the best of cases, as any other non-parametric resolution. It would have been of (further) interest to see a comparison with a most rudimentary approach, namely the one we proposed based on empirical likelihoods.

## ISBA 2018, Edinburgh, 24-28 June

Posted in Statistics with tags , , , , , , , , , , on March 1, 2017 by xi'an

The ISBA 2018 World Meeting will take place in Edinburgh, Scotland, on 24-29 June 2018. (Since there was some confusion about the date, it is worth stressing that these new dates are definitive!) Note also that there are other relevant conferences and workshops in the surrounding weeks:

• a possible ABC in Edinburgh the previous weekend, 23-24 June [to be confirmed!]
• the Young Bayesian Meeting (BaYSM) in Warwick, 2-3 July 2018
• a week-long school on fundamentals of simulation in Warwick, 9-13 July 2018 with courses given by Nicolas Chopin, Art Owen, Jeff Rosenthal and others
• MCqMC 2018 in Rennes, 1-6 July 208
• ICML 2018 in Stockholm, 10-15 July 2018
• the 2018 International Biometrics Conference in Barcelona, 8-13 July 2018

## asymptotically exact inference in likelihood-free models

Posted in Books, pictures, Statistics with tags , , , , , , , on November 29, 2016 by xi'an

“We use the intuition that inference corresponds to integrating a density across the manifold corresponding to the set of inputs consistent with the observed outputs.”

Following my earlier post on that paper by Matt Graham and Amos Storkey (University of Edinburgh), I now read through it. The beginning is somewhat unsettling, albeit mildly!, as it starts by mentioning notions like variational auto-encoders, generative adversial nets, and simulator models, by which they mean generative models represented by a (differentiable) function g that essentially turn basic variates with density p into the variates of interest (with intractable density). A setting similar to Meeds’ and Welling’s optimisation Monte Carlo. Another proximity pointed out in the paper is Meeds et al.’s Hamiltonian ABC.

“…the probability of generating simulated data exactly matching the observed data is zero.”

The section on the standard ABC algorithms mentions the fact that ABC MCMC can be (re-)interpreted as a pseudo-marginal MCMC, albeit one targeting the ABC posterior instead of the original posterior. The starting point of the paper is the above quote, which echoes a conversation I had with Gabriel Stolz a few weeks ago, when he presented me his free energy method and when I could not see how to connect it with ABC, because having an exact match seemed to cancel the appeal of ABC, all parameter simulations then producing an exact match under the right constraint. However, the paper maintains this can be done, by looking at the joint distribution of the parameters, latent variables, and observables. Under the implicit restriction imposed by keeping the observables constant. Which defines a manifold. The mathematical validation is achieved by designing the density over this manifold, which looks like

$p(u)\left|\frac{\partial g^0}{\partial u}\frac{\partial g^0}{\partial u}^\text{T}\right|^{-\textonehalf}$

if the constraint can be rewritten as g⁰(u)=0. (This actually follows from a 2013 paper by Diaconis, Holmes, and Shahshahani.) In the paper, the simulation is conducted by Hamiltonian Monte Carlo (HMC), the leapfrog steps consisting of an unconstrained move followed by a projection onto the manifold. This however sounds somewhat intense in that it involves a quasi-Newton resolution at each step. I also find it surprising that this projection step does not jeopardise the stationary distribution of the process, as the argument found therein about the approximation of the approximation is not particularly deep. But the main thing that remains unclear to me after reading the paper is how the constraint that the pseudo-data be equal to the observable data can be turned into a closed form condition like g⁰(u)=0. As mentioned above, the authors assume a generative model based on uniform (or other simple) random inputs but this representation seems impossible to achieve in reasonably complex settings.