## an avalanche of new arXivals

Posted in Books, Statistics, University life with tags on May 25, 2016 by xi'an

In the past few days, I spotted the following papers being arXived, far too many for me to comment them all!

Enjoy!

## likelihood inflating sampling algorithm

Posted in Books, Statistics, University life with tags , , , , , , , , on May 24, 2016 by xi'an

My friends from Toronto Radu Craiu and Jeff Rosenthal have arXived a paper along with Reihaneh Entezari on MCMC scaling for large datasets, in the spirit of Scott et al.’s (2013) consensus Monte Carlo. They devised an likelihood inflated algorithm that brings a novel perspective to the problem of large datasets. This question relates to earlier approaches like consensus Monte Carlo, but also kernel and Weierstrass subsampling, already discussed on this blog, as well as current research I am conducting with my PhD student Changye Wu. The approach by Entezari et al. is somewhat similar to consensus Monte Carlo and the other solutions in that they consider an inflated (i.e., one taken to the right power) likelihood based on a subsample, with the full sample being recovered by importance sampling. Somewhat unsurprisingly this approach leads to a less dispersed estimator than consensus Monte Carlo (Theorem 1). And the paper only draws a comparison with that sub-sampling method, rather than covering other approaches to the problem, maybe because this is the most natural connection, one approach being the k-th power of the other approach.

“…we will show that [importance sampling] is unnecessary in many instances…” (p.6)

An obvious question that stems from the approach is the call for importance sampling, since the numerator of the importance sampler involves the full likelihood which is unavailable in most instances when sub-sampled MCMC is required. I may have missed the part of the paper where the above statement is discussed, but the only realistic example discussed therein is the Bayesian regression tree (BART) of Chipman et al. (1998). Which indeed constitutes a challenging if one-dimensional example, but also one that requires delicate tuning that leads to cancelling importance weights but which may prove delicate to extrapolate to other models.

## the snow geese [book review]

Posted in Books, Kids, pictures, Travel with tags , , , , , on May 21, 2016 by xi'an

Just as for the previous book, I found this travel book in a nice bookstore, Rue Mouffetard, after my talk at Agro, and bought it [in a French translation] in prevision for my incoming trip to Spain. And indeed read it while in Spain, finishing it a few minutes before touching ground in Paris.

“The hunters wolfed down chicken fried steaks or wolfed down cuds of Red Man, Beech-Nut, Levi Garrett, or Jackson’s Apple Jack”

The Snow Geese was written in 2002 by William Fiennes, a young Englishman recovering from a serious disease and embarking on a wild quest to overcome post-sickness depression. While the idea behind the trip is rather alluring, namely to follow Arctic geese from their wintering grounds in Texas to their summer nesting place on Baffin Island, the book itself is sort of a disaster. As the prose of the author is very heavy, or even very very heavy, with an accumulation of descriptions that do not contribute to the story and a highly bizarre habit to mention brands by groups of three. And of using heavy duty analogies, as in “we were travelling across the middle of a page, with whiteness and black markings all around us, and geese lifting off the snow like letters becoming unstuck”. The reflections about the recovery of the author from a bout of depression and the rise of homesickness and nostalgia are not in the least deep or challenging, while the trip of the geese does not get beyond the descriptive. Worse, the geese remain a mystery, a blur, and a collective, rather than bringing the reader closer to them. If anything is worth mentioning there, it is instead the encounters of the author with rather unique characters, at every step of his road- and plane-trips. To the point of sounding too unique to be true…  His hunting trip with a couple of Inuit hunters north of Iqualit on Baffin Island is both a high and a down of the book in that sharing a few days with them in the wild is exciting in a primeval sense, while witnessing them shoot down the very geese the author followed for 5000 kilometres sort of negates the entire purpose of the trip. It then makes perfect sense to close the story with a feeling of urgency, for there is nothing worth adding.

## ABC random forests for Bayesian parameter inference

Posted in Books, Kids, R, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , , , on May 20, 2016 by xi'an

Before leaving Helsinki, we arXived [from the Air France lounge!] the paper Jean-Michel presented on Monday at ABCruise in Helsinki. This paper summarises the experiments Louis conducted over the past months to assess the great performances of a random forest regression approach to ABC parameter inference. Thus validating in this experimental sense the use of this new approach to conducting ABC for Bayesian inference by random forests. (And not ABC model choice as in the Bioinformatics paper with Pierre Pudlo and others.)

I think the major incentives in exploiting the (still mysterious) tool of random forests [against more traditional ABC approaches like Fearnhead and Prangle (2012) on summary selection] are that (i) forests do not require a preliminary selection of the summary statistics, since an arbitrary number of summaries can be used as input for the random forest, even when including a large number of useless white noise variables; (b) there is no longer a tolerance level involved in the process, since the many trees in the random forest define a natural if rudimentary distance that corresponds to being or not being in the same leaf as the observed vector of summary statistics η(y); (c) the size of the reference table simulated from the prior (predictive) distribution does not need to be as large as for in usual ABC settings and hence this approach leads to significant gains in computing time since the production of the reference table usually is the costly part! To the point that deriving a different forest for each univariate transform of interest is truly a minor drag in the overall computing cost of the approach.

An intriguing point we uncovered through Louis’ experiments is that an unusual version of the variance estimator is preferable to the standard estimator: we indeed exposed better estimation performances when using a weighted version of the out-of-bag residuals (which are computed as the differences between the simulated value of the parameter transforms and their expectation obtained by removing the random trees involving this simulated value). Another intriguing feature [to me] is that the regression weights as proposed by Meinshausen (2006) are obtained as an average of the inverse of the number of terms in the leaf of interest. When estimating the posterior expectation of a transform h(θ) given the observed η(y), this summary statistic η(y) ends up in a given leaf for each tree in the forest and all that matters for computing the weight is the number of points from the reference table ending up in this very leaf. I do find this difficult to explain when confronting the case when many simulated points are in the leaf against the case when a single simulated point makes the leaf. This single point ends up being much more influential that all the points in the other situation… While being an outlier of sorts against the prior simulation. But now that I think more about it (after an expensive Lapin Kulta beer in the Helsinki airport while waiting for a change of tire on our airplane!), it somewhat makes sense that rare simulations that agree with the data should be weighted much more than values that stem from the prior simulations and hence do not translate much of an information brought by the observation. (If this sounds murky, blame the beer.) What I found great about this new approach is that it produces a non-parametric evaluation of the cdf of the quantity of interest h(θ) at no calibration cost or hardly any. (An R package is in the making, to be added to the existing R functions of abcrf we developed for the ABC model choice paper.)

## Using MCMC output to efficiently estimate Bayes factors

Posted in Books, R, Statistics, University life with tags , , , , on May 19, 2016 by xi'an

As I was checking for software to answer a query on X validated about generic Bayes factor derivation, I came across an R software called BayesFactor, which only applies in regression settings and relies on the Savage-Dickey representation of the Bayes factor

$B_{01}=\dfrac{f(y|\theta^0)}{m(y)}=\dfrac{\pi(\theta^0|y)}{\pi(\theta^0)}$

when the null hypothesis writes as θ=θ⁰ (and possibly additional nuisance parameters with [roughly speaking] an independent prior). As we discussed in our paper with Jean-Michel Marin [which got ignored by large!], this representation of the Bayes factor is based on picking a very specific version of the prior, or more exactly of three prior densities. Assuming such versions are selected, I wonder at the performances of this approximation, given that it involves approximating the marginal posterior at θ⁰….

“To ensure that the Bayes factor we compute using the Savage–Dickey ratio is the the ratio of marginal densities that we intend, the condition (…) is easily met by models which specify priors in which the nuisance parameters are independent of the parameters of interest.” Morey et al. (2011)

First, when reading Morey at al. (2011), I realised (a wee bit late!) that Chib’s method is nothing but a version of the Savage-Dickey representation when the marginal posterior can be estimated in a parametric (Rao-Blackwellised) way. However, outside hierarchical models based on conjugate priors such parametric approximations are intractable and non-parametric versions must be invoked instead, which necessarily degrades the quality of the method. A degradation that escalates with the dimension of the parameter θ. In addition, I am somewhat perplexed by the use of a Rao-Blackwell argument in the setting of the Dickey-Savage representation. Indeed this representation assumes that

$\pi_1(\psi|\theta_0)=\pi_0(\psi) \ \ \text{or}\quad \pi_1(\theta_0,\psi)=\pi_1(\theta_0)\pi_0(\psi)$

which means that [the specific version of] the conditional density of θ⁰ given ψ should not depend on the nuisance parameter. But relying on a Rao-Blackwellisation leads to estimate the marginal posterior via full conditionals. Of course, θ given ψ and y may depend on ψ, but still… Morey at al. (2011) advocate the recourse to Chib’s formula as optimal but this obviously requires the full conditional to be available. They acknowledge this point as moot, since it is sufficient from their perspective to specify a conjugate prior. They consider this to be a slight modification of the model (p.377). However, I see the evaluation of an estimated density at a single (I repeat, single!) point as being the direst part of the method as it is clearly more sensitive to approximations that the evaluation of a whole integral, since the later incorporates an averaging effect by definition. Hence, even if this method was truly available for all models, I would be uncertain of its worth when compared with other methods, except the harmonic mean estimator of course!

On the side, Morey at al. (2011) study a simple one-sample t test where they use an improper prior on the nuisance parameter σ, under both models. While the Savage-Dickey representation is correct in this special case, I fail to see why the identity would apply in every case under an improper prior. In particular, independence does not make sense with improper priors. The authors also indicate the possible use of this Bayes factor approximation for encompassing models. At first, I thought this could be most useful in our testing by mixture framework where we define an encompassing model as a mixture. However, I quickly realised that using a Beta Be(a,a) prior on the weight α with a<1 leads to an infinite density value at both zero and one, hence cannot be compatible with a Savage-Dickey representation of the Bayes factor.

## reversible chain[saw] massacre

Posted in Books, pictures, R, Statistics, University life with tags , , , , , , , , on May 16, 2016 by xi'an

A paper in Nature this week that uses reversible-jump MCMC, phylogenetic trees, and Bayes factors. And that looks at institutionalised or ritual murders in Austronesian cultures. How better can it get?!

“by applying Bayesian phylogenetic methods (…) we find strong support for models in which human sacrifice stabilizes social stratification once stratification has arisen, and promotes a shift to strictly inherited class systems.” Joseph Watts et al.

The aim of the paper is to establish that societies with human sacrifices are more likely to have become stratified and stable than societies without such niceties. The hypothesis to be tested is then about the evolution towards more stratified societies rather the existence of a high level of stratification.

“The social control hypothesis predicts that human sacrifice (i) co-evolves with social stratification, (ii) increases the chance of a culture gaining social stratification, and (iii) reduces the chance of a culture losing social stratification once stratification has arisen.” Joseph Watts et al.

The methodological question is then how can this be tested when considering those are extinct societies about which little is known. Grouping together moderate and high stratification societies against egalitarian societies, the authors tested independence of both traits versus dependence, with a resulting Bayes factor of 3.78 in favour of the latest. Other hypotheses of a similar flavour led to Bayes factors in the same range. Which is thus not overwhelming. Actually, given that the models are quite simplistic, I do not agree that those Bayes factors prove anything of the magnitude of such anthropological conjectures. Even if the presence/absence of human sacrifices is confirmed in all of the 93 societies, and if the stratification of the cultures is free from uncertainties, the evolutionary part is rather involved, from my neophyte point of view: the evolutionary structure (reproduced above) is based on a sample of 4,200 trees based on Bayesian analysis of Austronesian basic vocabulary items, followed by a call to the BayesTrait software to infer about evolution patterns between stratification levels, concluding (with p-values!) at a phylogenetic structure of the data. BayesTrait was also instrumental in deriving MLEs for the various transition rates, “in order to inform our choice of priors” (!). BayesTrait has an MCMC function used by the authors “to test for correlated evolution between traits” and derive the above Bayes factors. Using a stepping-stone method I am unaware of. And 10⁹ iterations (repeated 3 times for checking consistency)… Reversible jump is apparently used to move between constrained and unconstrained models, leading to the pie charts at the inner nodes of the above picture. Again a by-product of BayesTrait. The trees on the left and the right are completely identical, the difference being in the inference about stratification evolution (right) and sacrifice evolution (left). While the overall hypothesis makes sense at my layman level (as a culture has to be stratified enough to impose sacrifices from its members), I am not convinced that this involved statistical analysis brings that strong a support. (But it would make a fantastic topic for an undergraduate or a Master thesis!)

## murals plus eggs

Posted in Books, Kids, pictures, Running, Travel with tags , , , , on May 15, 2016 by xi'an

When running along the Guadalquivir in Sevilla,I came across street art murals that were both fairly good and preserved from graffiti.I also came across this most horrendous sculpture of Cristoforo Colombo in his egg, a queer monstrosity of a few dozen meters… (Maybe appropriately in tune with a man who brought large scale genocide, slavery and European diseases to the Americas.)