On Monday, James Johndrow, Aaron Smith, Natesh Pillai, and David Dunson arXived a paper on the diminishing benefits of using data augmentation for large and highly imbalanced categorical data. They reconsider the data augmentation scheme of Tanner and Wong (1987), surprisingly not mentioned, which was used in early occurrences of the Gibbs sampler like Albert and Chib's (1993) probit model or our mixture estimation paper with Jean Diebolt (1990). The central difficulty with data augmentation is that the distribution to be simulated operates on a space of order O(n), even when the original distribution covers a single parameter. As illustrated by the coalescent in population genetics (and the subsequent intrusion of the ABC methodology), there are well-known cases when the completion is nearly impossible and clearly inefficient (as again illustrated by the failure of importance sampling strategies on the coalescent).

The paper provides spectral gaps for the logistic and probit regression completions, which are of order a power of log(n) divided by √n, when all observations are equal to one. In a somewhat related paper with Jim Hobert and Vivek Roy, we studied the spectral gap for mixtures with a small number of observations: I wonder whether a similar result holds in this setting when all observations stem from one component of the mixture. The result in this paper is theoretically appealing, all the more because the posteriors associated with such models are highly regular and very close to Gaussian (and hence not that challenging, as argued by Chopin and Ridgway), and because the data augmentation algorithm is uniformly ergodic in this setting (as we established with Jean Diebolt and later explored with Richard Tweedie). The slow mixing is demonstrated in the experiment produced in the paper, which compares the algorithm with HMC and Metropolis-Hastings (at the same computing times?), both of which produce much higher effective sample sizes.
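For reference, Albert and Chib's completion for the probit model alternates between truncated-normal draws of the latent variables and a Gaussian update of the coefficients; here is a minimal Python sketch under a flat prior on β (the function, data, and settings are my own illustration, not taken from the paper under review):

```python
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(X, y, n_iter=1500, seed=None):
    """Albert & Chib (1993) data augmentation for probit regression,
    with a flat prior on beta (minimal sketch, no convergence checks)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        mu = X @ beta
        # z_i | beta, y_i ~ N(mu_i, 1) truncated to (0, inf) if y_i = 1
        # and to (-inf, 0) if y_i = 0
        lower = np.where(y == 1, -mu, -np.inf)
        upper = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lower, upper, random_state=rng)
        # beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1})
        beta = rng.multivariate_normal(XtX_inv @ (X.T @ z), XtX_inv)
        draws[t] = beta
    return draws

# toy balanced data, nothing like the paper's all-ones setting
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = (X @ np.array([0.5, 1.0]) + rng.normal(size=500) > 0).astype(int)
draws = probit_gibbs(X, y, seed=1)
post_mean = draws[500:].mean(axis=0)
```

On such a balanced toy dataset the chain mixes well; the paper's point is precisely that this mixing collapses when the data become large and highly imbalanced, e.g. with all yᵢ equal to one.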
Just as for the previous book, I found this travel book in a nice bookstore, Rue Mouffetard, after my talk at Agro, and bought it [in a French translation] in anticipation of my upcoming trip to Spain. And indeed read it while in Spain, finishing it a few minutes before touching ground in Paris.
“The hunters wolfed down chicken fried steaks or chewed cuds of Red Man, Beech-Nut, Levi Garrett, or Jackson’s Apple Jack”
The Snow Geese was written in 2002 by William Fiennes, a young Englishman recovering from a serious disease and embarking on a wild quest to overcome post-sickness depression. While the idea behind the trip is rather alluring, namely to follow Arctic geese from their wintering grounds in Texas to their summer nesting place on Baffin Island, the book itself is sort of a disaster, as the author's prose is very heavy, or even very very heavy, with an accumulation of descriptions that do not contribute to the story, a highly bizarre habit of mentioning brands in groups of three, and a fondness for heavy-duty analogies, as in “we were travelling across the middle of a page, with whiteness and black markings all around us, and geese lifting off the snow like letters becoming unstuck”. The reflections about the author's recovery from a bout of depression and the rise of homesickness and nostalgia are not in the least deep or challenging, while the trip of the geese does not get beyond the descriptive. Worse, the geese remain a mystery, a blur, and a collective, rather than coming any closer to the reader. If anything is worth mentioning there, it is instead the author's encounters with rather unique characters at every step of his road- and plane-trips. To the point of sounding too unique to be true… His hunting trip with a couple of Inuit hunters north of Iqaluit on Baffin Island is both a high and a low of the book, in that sharing a few days with them in the wild is exciting in a primeval sense, while witnessing them shoot down the very geese the author followed for 5,000 kilometres sort of negates the entire purpose of the trip. It then makes perfect sense to close the story with a feeling of urgency, for there is nothing worth adding.
Before leaving Helsinki, we arXived [from the Air France lounge!] the paper Jean-Michel presented on Monday at ABCruise in Helsinki. This paper summarises the experiments Louis conducted over the past months to assess the strong performance of a random forest regression approach to ABC parameter inference, thus validating in this experimental sense the use of random forests for conducting Bayesian inference by ABC. (And not ABC model choice as in the Bioinformatics paper with Pierre Pudlo and others.)
I think the major incentives for exploiting the (still mysterious) tool of random forests [against more traditional ABC approaches like Fearnhead and Prangle (2012) on summary selection] are that (a) forests do not require a preliminary selection of the summary statistics, since an arbitrary number of summaries can be used as input for the random forest, even when including a large number of useless white-noise variables; (b) there is no longer a tolerance level involved in the process, since the many trees in the random forest define a natural if rudimentary distance that corresponds to being or not being in the same leaf as the observed vector of summary statistics η(y); and (c) the size of the reference table simulated from the prior (predictive) distribution does not need to be as large as in usual ABC settings, hence this approach leads to significant gains in computing time, since the production of the reference table usually is the costly part! To the point that deriving a different forest for each univariate transform of interest is truly a minor drag in the overall computing cost of the approach.
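To make point (a) concrete, here is a toy sketch [my own construction, not taken from the paper, with scikit-learn standing in for the abcrf machinery]: a reference table is simulated from the prior predictive, useless white-noise summaries are appended, and the forest regression still recovers the parameter at the observed summaries without any tolerance level or summary selection:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
N = 2000                                   # size of the reference table
theta = rng.uniform(-2.0, 2.0, size=N)     # draws from the prior
ysim = rng.normal(theta[:, None], 1.0, size=(N, 20))  # model: 20 iid N(theta, 1)
# summaries: two informative ones plus five useless white-noise variables
eta = np.column_stack([ysim.mean(axis=1), ysim.std(axis=1),
                       rng.normal(size=(N, 5))])

rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=5,
                           oob_score=True, random_state=0).fit(eta, theta)

# observed sample with true theta = 1
yobs = rng.normal(1.0, 1.0, size=20)
eta_obs = np.concatenate([[yobs.mean(), yobs.std()], rng.normal(size=5)])
theta_hat = rf.predict(eta_obs.reshape(1, -1))[0]   # posterior-mean estimate
```

The forest splits almost exclusively on the informative summaries, so the noise columns barely degrade the estimate, which is the whole point of skipping summary selection.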
An intriguing point we uncovered through Louis' experiments is that an unusual version of the variance estimator is preferable to the standard estimator: we indeed observed better estimation performances when using a weighted version of the out-of-bag residuals (which are computed as the differences between the simulated values of the parameter transforms and their expectations obtained by removing the random trees involving each simulated value). Another intriguing feature [to me] is that the regression weights proposed by Meinshausen (2006) are obtained as an average over trees of the inverse of the number of terms in the leaf of interest. When estimating the posterior expectation of a transform h(θ) given the observed η(y), this summary statistic η(y) ends up in a given leaf for each tree in the forest, and all that matters for computing the weight is the number of points from the reference table ending up in this very leaf. I do find this difficult to explain when comparing the case when many simulated points share the leaf against the case when a single simulated point makes up the leaf: this single point ends up being much more influential than all the points in the other situation, while being an outlier of sorts against the prior simulation. But now that I think more about it (after an expensive Lapin Kulta beer at the Helsinki airport, while waiting for a change of tire on our airplane!), it somewhat makes sense that rare simulations that agree with the data should be weighted much more than values that stem from the prior simulations and hence do not convey much of the information brought by the observation. (If this sounds murky, blame the beer.) What I find great about this new approach is that it produces a non-parametric evaluation of the cdf of the quantity of interest h(θ) at no calibration cost, or hardly any. (An R package is in the making, to be added to the existing R functions of abcrf we developed for the ABC model choice paper.)
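The weights in question can be reconstructed from any fitted forest; below is a Python sketch of Meinshausen's (2006) scheme using scikit-learn [my own illustration: the trees are grown without bootstrap so that bag membership can be ignored, in which case the weighted mean reproduces the forest prediction exactly, and the cumulated sorted weights give the non-parametric cdf evaluation]:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def meinshausen_weights(rf, X_train, x_obs):
    """Meinshausen (2006) weights: in each tree, a training point falling in
    the same leaf as x_obs gets weight 1/(leaf size); average over trees.
    Sketch only: assumes the trees were grown on all of X_train (no bootstrap)."""
    leaves_train = rf.apply(X_train)                 # (n_train, n_trees) leaf ids
    leaves_obs = rf.apply(x_obs.reshape(1, -1))[0]   # (n_trees,)
    w = np.zeros(X_train.shape[0])
    for b in range(leaves_train.shape[1]):
        same = leaves_train[:, b] == leaves_obs[b]
        w[same] += 1.0 / same.sum()
    return w / leaves_train.shape[1]

rng = np.random.default_rng(0)
Xtr = rng.normal(size=(300, 2))
ytr = Xtr[:, 0] + 0.1 * rng.normal(size=300)
rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=5,
                           bootstrap=False, max_features=1,
                           random_state=0).fit(Xtr, ytr)
x0 = np.array([0.5, 0.0])
w = meinshausen_weights(rf, Xtr, x0)
# with no bootstrap, the weighted mean equals the forest prediction
post_mean = w @ ytr
# non-parametric cdf of the response at x0: cumulated sorted weights
order = np.argsort(ytr)
cdf = np.cumsum(w[order])
```

A single simulated point making up a leaf then receives the full 1/B weight from that tree, which is exactly the imbalance puzzled over above.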