Archive for BART


Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , on January 10, 2020 by xi'an

First, I really have to congratulate my friend Jim Hobert for a great organisation of the meeting adopting my favourite minimalist principles (no name tag, no “goodies” apart from the conference schedule, no official talks). Without any pretense at objectivity, I also appreciated very much the range of topics and the sweet frustration of having to choose between two or three sessions each time. Here are some notes taken during some talks (with no implicit implication for the talks no mentioned, re. above frustration! as well as very short nights making sudden lapse in concentration highly likely).

On Day 1, Paul Fearnhead’s inaugural plenary talk was on continuous time Monte Carlo methods, mostly bouncy particle and zig-zag samplers, with a detailed explanation on the simulation of the switching times which likely brought the audience up to speed even if they had never heard of them. And an opening on PDMPs used as equivalents to reversible jump MCMC, reminding me of the continuous time (point process) solutions of Matthew Stephens for mixture inference (and of Preston, Ripley, Møller).

The same morn I heard of highly efficient techniques to handle very large matrices and p>n variables selections by Akihiko Nishimura and Ruth Baker on a delayed acceptance ABC, using a cheap proxy model. Somewhat different from indirect inference. I found the reliance on ESS somewhat puzzling given the intractability of the likelihood (and the low reliability of the frequency estimate) and the lack of connection with the “real” posterior. At the same ABC session, Umberto Picchini spoke on a joint work with Richard Everitt (Warwick) on linking ABC and pseudo-marginal MCMC by bootstrap. Actually, the notion of ABC likelihood was already proposed as pseudo-marginal ABC by Anthony Lee, Christophe Andrieu and Arnaud Doucet in the discussion of Fearnhead and Prangle (2012) but I wonder at the focus of being unbiased when the quantity is not the truth, i.e. the “real” likelihood. It would seem more appropriate to attempt better kernel estimates on the distribution of the summary itself. The same session also involved David Frazier who linked our work on ABC for misspecified models and an on-going investigation of synthetic likelihood.

Later, there was a surprise occurrence of the Bernoulli factory in a talk by Radu Herbei on Gaussian process priors with accept-reject algorithms, leading to exact MCMC, although the computing implementation remains uncertain. And several discussions during the poster session, incl. one on the planning of a 2021 workshop in Oaxaca centred on objective Bayes advances as we received acceptance of our proposal by BIRS today!

On Day 2, David Blei gave a plenary introduction to variational Bayes inference and latent Dirichlet allocations, somewhat too introductory for my taste although other participants enjoyed this exposition. He also mentioned a recent JASA paper on the frequentist consistency of variational Bayes that I should check. Speaking later with PhD students, they really enjoyed this opening on an area they did not know that well.

A talk by Kengo Kamatani (whom I visited last summer) on improved ergodicity rates for heavy tailed targets and Crank-NIcholson modifications to the random walk proposal (which uses an AR(1) representation instead of the random walk). With the clever idea of adding the scale of the proposal as an extra parameter with a prior of its own. Gaining one order of magnitude in the convergence speed (i.e. from d to 1 and from d² to d, where d is the dimension), which is quite impressive (and just published in JAP).Veronica Rockova linked Bayesian variable selection and machine learning via ABC, with conditions on the prior for model consistency. And a novel approach using part of the data to learn an ABC partial posterior, which reminded me of the partial  Bayes factors of the 1990’s although it is presumably unrelated. And a replacement of the original rejection ABC via multi-armed bandits, where each variable is represented by an arm, called ABC Bayesian forests. Recalling the simulation trick behind Thompson’s approach, reproduced for the inclusion or exclusion of variates and producing a fixed estimate for the (marginal) inclusion probabilities, which makes it sound like a prior-feeback form of empirical Bayes. Followed by a talk of Gregor Kastner on MCMC handling of large time series with specific priors and a massive number of parameters.

The afternoon also had a wealth of exciting talks and missed opportunities (in the other sessions!). Which ended up with a strong if unintended French bias since I listened to Christophe Andrieu, Gabriel Stolz, Umut Simsekli, and Manon Michel on different continuous time processes, with Umut linking GANs, multidimensional optimal transport, sliced-Wasserstein, generative models, and new stochastic differential equations. Manon Michel gave a highly intuitive talk on creating non-reversibility, getting rid of refreshment rates in PDMPs to kill any form of reversibility.

ABC variable selection

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , on July 18, 2018 by xi'an

Prior to the ISBA 2018 meeting, Yi Liu, Veronika Ročková, and Yuexi Wang arXived a paper on relying ABC for finding relevant variables, which is a very original approach in that ABC is not as much the object as it is a tool. And which Veronika considered during her Susie Bayarri lecture at ISBA 2018. In other words, it is not about selecting summary variables for running ABC but quite the opposite, selecting variables in a non-linear model through an ABC step. I was going to separate the two selections into algorithmic and statistical selections, but it is more like projections in the observation and covariate spaces. With ABC still providing an appealing approach to approximate the marginal likelihood. Now, one may wonder at the relevance of ABC for variable selection, aka model choice, given our warning call of a few years ago. But the current paper does not require low-dimension summary statistics, hence avoids the difficulty with the “other” Bayes factor.

In the paper, the authors consider a spike-and… forest prior!, where the Bayesian CART selection of active covariates proceeds through a regression tree, selected covariates appearing in the tree and others not appearing. With a sparsity prior on the tree partitions and this new ABC approach to select the subset of active covariates. A specific feature is in splitting the data, one part to learn about the regression function, simulating from this function and comparing with the remainder of the data. The paper further establishes that ABC Bayesian Forests are consistent for variable selection.

“…we observe a curious empirical connection between π(θ|x,ε), obtained with ABC Bayesian Forests  and rescaled variable importances obtained with Random Forests.”

The difference with our ABC-RF model choice paper is that we select summary statistics [for classification] rather than covariates. For instance, in the current paper, simulation of pseudo-data will depend on the selected subset of covariates, meaning simulating a model index, and then generating the pseudo-data, acceptance being a function of the L² distance between data and pseudo-data. And then relying on all ABC simulations to find which variables are in more often than not to derive the median probability model of Barbieri and Berger (2004). Which does not work very well if implemented naïvely. Because of the immense size of the model space, it is quite hard to find pseudo-data close to actual data, resulting in either very high tolerance or very low acceptance. The authors get over this difficulty by a neat device that reminds me of fractional or intrinsic (pseudo-)Bayes factors in that the dataset is split into two parts, one that learns about the posterior given the model index and another one that simulates from this posterior to compare with the left-over data. Bringing simulations closer to the data. I do not remember seeing this trick before in ABC settings, but it is very neat, assuming the small data posterior can be simulated (which may be a fundamental reason for the trick to remain unused!). Note that the split varies at each iteration, which means there is no impact of ordering the observations.

likelihood inflating sampling algorithm

Posted in Books, Statistics, University life with tags , , , , , , , , on May 24, 2016 by xi'an

My friends from Toronto Radu Craiu and Jeff Rosenthal have arXived a paper along with Reihaneh Entezari on MCMC scaling for large datasets, in the spirit of Scott et al.’s (2013) consensus Monte Carlo. They devised an likelihood inflated algorithm that brings a novel perspective to the problem of large datasets. This question relates to earlier approaches like consensus Monte Carlo, but also kernel and Weierstrass subsampling, already discussed on this blog, as well as current research I am conducting with my PhD student Changye Wu. The approach by Entezari et al. is somewhat similar to consensus Monte Carlo and the other solutions in that they consider an inflated (i.e., one taken to the right power) likelihood based on a subsample, with the full sample being recovered by importance sampling. Somewhat unsurprisingly this approach leads to a less dispersed estimator than consensus Monte Carlo (Theorem 1). And the paper only draws a comparison with that sub-sampling method, rather than covering other approaches to the problem, maybe because this is the most natural connection, one approach being the k-th power of the other approach.

“…we will show that [importance sampling] is unnecessary in many instances…” (p.6)

An obvious question that stems from the approach is the call for importance sampling, since the numerator of the importance sampler involves the full likelihood which is unavailable in most instances when sub-sampled MCMC is required. I may have missed the part of the paper where the above statement is discussed, but the only realistic example discussed therein is the Bayesian regression tree (BART) of Chipman et al. (1998). Which indeed constitutes a challenging if one-dimensional example, but also one that requires delicate tuning that leads to cancelling importance weights but which may prove delicate to extrapolate to other models.