ergodicity of approximate MCMC chains with applications to large datasets

Posted in pictures, Statistics, Travel, University life on August 31, 2015 by xi'an

Another arXived paper I read on my way to Warwick! And yet another paper written by my friend Natesh Pillai (and his co-author Aaron Smith, from Ottawa). The goal of the paper is to study the ergodicity of the approximate MCMC algorithms that recently flourished as an answer to “Big Data” issues, along with how closely they approximate the true posterior distribution… [Comments below are about the second version of this paper.] One of the most curious results in the paper is the fact that the approximation may prove better than the original kernel, in terms of computing costs! If only asymptotically in the computing cost. There also are acknowledged connections with the approximate MCMC kernel of Pierre Alquier, Nial Friel, Richard Everitt and A Boland, briefly mentioned in an earlier post.

The paper starts with a fairly theoretical part, followed by an application to austerity sampling [and, in the earlier version of the paper, to the Hoeffding bounds of Bardenet et al., both discussed earlier on the ‘Og, to exponential random graphs (the paper being rather terse on the description of the subsampling mechanism), to stochastic gradient Langevin dynamics (by Max Welling and Yee-Whye Teh), and to ABC-MCMC]. The assumptions are about the transition kernels of a reference Markov kernel and of one associated with the approximation, imposing some bounds on the Wasserstein distance between those kernels, K and K’. The results being generic, there is no constraint as to how K is chosen or on how K’ is derived from K, except in Lemma 3.6 and in the application section, where the same proposal kernel L is used for both Metropolis-Hastings algorithms K and K’. While I understand this makes for an easier coupling of the kernels, this also sounds like a restriction to me, in that modifying the target calls for a similar modification in the proposal, if only because the tails they are a-changin’…
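To illustrate why a uniform Wasserstein bound between K and K’ is a natural assumption, here is a minimal sketch under simplified conditions. These conditions are assumptions of this illustration and not the paper's exact hypotheses: K contracts the Wasserstein distance W at rate 1-α, K and K’ are uniformly ε-close, and both kernels have stationary distributions π and π’ with W(π,π’) finite.

```latex
\begin{align*}
&\text{Assume } W(\mu K,\nu K)\le(1-\alpha)\,W(\mu,\nu)
 \quad\text{and}\quad \sup_x W\big(K(x,\cdot),K'(x,\cdot)\big)\le\varepsilon .\\
&W(\pi,\pi') = W(\pi K,\pi' K')
 \le W(\pi K,\pi' K) + W(\pi' K,\pi' K')
 \le (1-\alpha)\,W(\pi,\pi') + \varepsilon ,\\
&\text{hence}\quad W(\pi,\pi')\le \varepsilon/\alpha .
\end{align*}
```

The first equality uses stationarity, the next step is the triangle inequality, and the last combines the contraction with the uniform bound on the kernels integrated against π’. In this simplified setting, the error on the stationary distribution is thus the one-step error divided by the contraction rate.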

In the case of subsampling the likelihood to gain computation time (as discussed by Korattikara et al. and by Bardenet et al.), the austerity algorithm as described in Algorithm 2 is surprising, as the average of the sampled data log-densities and the log-transform of the remainder of the Metropolis-Hastings probability, which seem unrelated, are compared until they are close enough. I also find it hard to derive, from the different approximation theorems bounding exceedance probabilities, a rule to decide on the subsampling rate as a function of the overall sample size and of the computing cost. (As a side, if general, remark, I remain somewhat reserved about the subsampling idea, given that it requires the entire dataset to be available at every iteration. This makes parallel implementations rather difficult to contemplate.)
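One way to read the comparison in the austerity test: the exact Metropolis-Hastings step accepts when the average log-likelihood difference over all N observations exceeds a threshold μ0 = (1/N) log[u π(θ) q(θ’|θ) / (π(θ’) q(θ|θ’))], with u uniform on (0,1), and the subsampled version tests this inequality sequentially on growing batches. The R code below is a minimal sketch of such a sequential t-test, written from the general description in Korattikara et al. and not claiming to reproduce Algorithm 2 of the paper; the function name austerity_accept and the parameters batch and eps are hypothetical.

```r
# Sketch of a subsampled Metropolis-Hastings decision (sequential t-test).
# Names (austerity_accept, batch, eps) are illustrative assumptions.
austerity_accept <- function(loglik_diff, mu0, batch = 100, eps = 0.05) {
  # loglik_diff: l_i = log p(x_i | theta') - log p(x_i | theta), i = 1..N
  #   (all terms are precomputed here only to keep the sketch short; an
  #    actual implementation would evaluate the sampled terms only)
  # mu0: threshold built from u, the prior ratio and the proposal ratio
  N <- length(loglik_diff)
  perm <- sample.int(N)              # subsampling without replacement
  n <- 0
  repeat {
    n <- min(n + batch, N)
    l <- loglik_diff[perm[1:n]]
    lbar <- mean(l)
    # with the whole dataset the decision is the exact MH decision
    if (n == N) return(list(accept = lbar > mu0, used = n))
    # standard error of the mean with finite population correction
    se <- sd(l) / sqrt(n) * sqrt(1 - (n - 1) / (N - 1))
    tstat <- (lbar - mu0) / se
    delta <- 1 - pt(abs(tstat), df = n - 1)
    # stop as soon as the sample mean differs significantly from mu0
    if (!is.na(delta) && delta < eps) {
      return(list(accept = lbar > mu0, used = n))
    }
  }
}

# Toy usage: N(theta, 1) model, flat prior, symmetric proposal
set.seed(1)
x <- rnorm(1e4, mean = 1)
theta <- 0.9
theta_new <- 1.0
ldiff <- dnorm(x, theta_new, 1, log = TRUE) - dnorm(x, theta, 1, log = TRUE)
mu0 <- log(runif(1)) / length(x)
res <- austerity_accept(ldiff, mu0)
res$accept   # decision taken
res$used     # number of observations actually consulted
```

The value of used shows how much of the dataset was needed for the decision, which is the quantity a rule on the subsampling rate would have to control.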

no country for ‘Og snaps?!

Posted in Mountains, pictures, Travel on August 30, 2015 by xi'an

A few days ago, I got an anonymous comment complaining about my tendency to post pictures “no one is interested in” on the ‘Og and suggesting I move them to another electronic medium like Twitter or Instagram, so as to spare readers from sorting through the blog entries for statistics-related ones, to separate the wheat from the chaff… While my first reaction was (unsurprisingly) one of irritation, a more constructive one is to point out to all (un)interested readers that they can always subscribe by RSS to the Statistics category (and skip the chaff), just like R bloggers only posts my R-related entries. Now, if more ‘Og’s readers find the presumably increasing flow of pictures a nuisance, just let me know and I will try to curb this avalanche of pixels… Not certain that I will succeed, though!

walking the PCT

Posted in Books, Kids, Mountains, pictures, Running, Travel on August 29, 2015 by xi'an

The last book I read in the hospital was Wild, by Cheryl Strayed, which is about walking the Pacific Crest Trail (PCT) as a regenerating experience. The book was turned into a movie this year. I did not like the book very much and did not try to watch the film, but when I realised my vacation rental would bring me within a dozen miles of the PCT, I planned a day hike along this mythical trail… Especially since my daughter had dreams of hiking the trail one day. (Not realising at the time that Cheryl Strayed had not come that far north, but had stopped at the border between Oregon and Washington.)

The hike was really great, staying on a high ridge for most of the time and offering 360° views of the Eastern North Cascades (as well as forest fire smoke clouds in the distance…) Walking on the trail was very smooth as it was wide enough, with a limited gradient and hardly anyone around. Actually, we felt like intruding tourists on the trail, with our light backpacks, since the few hikers we crossed were long-distance hikers, “doing” the trail, sometimes with backpacks that looked as heavy as Strayed’s original “Monster”. And sometimes with incredibly light ones. A striking feature of those hikers is that they all were more than ready to share their experiences and goals, with no complaint about the hardship of being on the trail for several months! And sounding more sorry than eager to reach the Canadian border and the end of the PCT in a few more dozen miles… For instance, a solitary female hiker told us of her plans to get back to the section near Lake Chelan she had missed the week before due to threatening forest fires. A great introduction to the PCT, with the dream of walking a larger portion in an undefined future…

beyond subjective and objective in Statistics

Posted in Books, Statistics, University life on August 28, 2015 by xi'an

“At the level of discourse, we would like to move beyond a subjective vs. objective shouting match.” (p.30)

This paper by Andrew Gelman and Christian Hennig calls for the abandonment of the terms objective and subjective in (not solely Bayesian) statistics. The authors argue that there is more than mere prior information and data to the construction of a statistical analysis. The paper is articulated as the authors’ proposal, followed by four application examples, then a survey of the philosophy of science perspectives on objectivity and subjectivity in statistics and other sciences, then a study of the subjective and objective aspects of the mainstream statistical streams, concluding with a discussion on the implementation of the proposed move.

abcrf 0.9-3

Posted in R, Statistics, University life on August 27, 2015 by xi'an

In conjunction with our reliable ABC model choice via random forest paper, about to be resubmitted to Bioinformatics, we have contributed an R package called abcrf that produces a most likely model and its posterior probability out of an ABC reference table. This also follows from the realisation that we could devise an approximation to the (ABC) posterior probability using a secondary random forest. “We” meaning Jean-Michel Marin and Pierre Pudlo, as I only acted as a beta tester!

The package abcrf consists of three functions:

  • abcrf, which constructs a random forest from a reference table and returns an object of class ‘abc-rf’;
  • plot.abcrf, which gives both the variable importance plot of a model choice abc-rf object and the projection of the reference table on the LDA axes;
  • predict.abcrf, which predicts the model for new data and evaluates the posterior probability of the MAP.

An illustration from the manual:

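library(abcrf)
data(snp)      # reference table (model index in column 1, as the call below suggests)
data(snp.obs)  # observed data
# build the random forest from the first 1,000 rows of the reference table
mc.rf <- abcrf(snp[1:1e3, 1], snp[1:1e3, -1])
# predict the model for the observed data, with the posterior probability of the MAP
predict(mc.rf, snp[1:1e3, -1], snp.obs)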

forest fires

Posted in Mountains, pictures, Travel on August 26, 2015 by xi'an

Wildfires rage through the US West, with currently 33 going in the Pacific Northwest, 29 in Northern California, and 18 in the northern Rockies, with more surface burned so far this year than in any of the past ten years. Drought, hot weather, high lightning frequency, and a shortage of firefighters across the US all are contributing factors…

Washington State is particularly stricken and when we drove to the North Cascades from Mt. Rainier, we came across at least two fires, one near Twisp and the other one around Chelan… The visibility was quite poor, due to the amount of smoke, and, while the road was open, we saw many burned areas with residual fumaroles and even a minor bush fire that was apparently left to die out by itself. The numerous orchards around had been spared, presumably thanks to their irrigation system.

The owner of a small café and fruit stand on Highway 20 told us about her employee, who had taken the day off to protect her home, near Chelan, that had already burned down last year. Among 300 or so houses. Later on our drive north, the air cleared up, but we saw many instances of past fires, like the one near Hart’s Pass, which occurred in 2003 and has not yet reached regeneration. Wildfires have always been a reality in this area, witness the first US smokejumpers being based (in 1939) at Winthrop, in the Methow valley, but this does not make it less of an objective danger. (Which made me somewhat worried as we were staying in a remote wooded area with no Internet or phone coverage to hear about evacuation orders. And a single evacuation route through a forest…)

Even when crossing the fabulous North Cascades Highway to the West and Seattle-Tacoma airport, we saw further smoke clouds, like one near Goodall, after Lake Ross, with closed side roads and campgrounds.

And, when flying back on Wednesday, along the Canadian border, more fire fronts and smoke clouds were visible from the plane.

Little did we know then that the town of Winthrop, near which we stayed, was being evacuated at the time, that the North Cascades Highway was about to be closed, and that three firefighters had died in nearby Twisp… Kudos to all firefighters involved in those wildfires! (And close call for us as we would still be “stuck” there!)

Blue Lake

Posted in Mountains, pictures, Running, Travel on August 25, 2015 by xi'an
