Archive for the Books Category

ABC for big data

Posted in Books, Statistics, University life on June 23, 2015 by xi'an

“The results in this paper suggest that ABC can scale to large data, at least for models with a fixed number of parameters, under the assumption that the summary statistics obey a central limit theorem.”

In a week rich with arXiv submissions about MCMC and “big data”, like the Variational consensus Monte Carlo of Rabinovich et al., or scalable Bayesian inference via particle mirror descent by Dai et al., Wentao Li and Paul Fearnhead contributed an impressive paper entitled Behaviour of ABC for big data. However, a word of warning: the title is somewhat misleading in that the paper does not address the issue of big or tall data per se, e.g., the impossibility of handling the whole data at once and of reproducing it by simulation, but rather the asymptotics of ABC. The setting is not dissimilar to the earlier Fearnhead and Prangle (2012) Read Paper. The central theme of this theoretical paper [with 24 pages of proofs!] is to study the connection between the number N of Monte Carlo simulations and the tolerance value ε when the number of observations n goes to infinity. A main result in the paper is that the ABC posterior mean can have the same asymptotic distribution as the MLE when $\varepsilon=o(n^{-1/4})$. This is however of no direct use in practice, as the second main result is that the Monte Carlo variance is well-controlled only when $\varepsilon=O(n^{-1/2})$. There is therefore a sort of contradiction in the conclusion, between the positive equivalence with the MLE and the much smaller tolerance required to keep the Monte Carlo variance under control.
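To make the role of the tolerance ε concrete, here is a toy rejection-ABC sketch. It is not Li and Fearnhead's importance-sampling scheme: it assumes a N(θ,1) model, a uniform prior on (-5,5), the sample mean as summary statistic (simulated directly from its N(θ,1/n) sampling distribution), and arbitrary choices of n, seeds and simulation budget. The helper `abc_rejection` is hypothetical.

```python
import math
import random
import statistics


def abc_rejection(s_obs, n, eps, n_sims, seed=0, prior=(-5.0, 5.0)):
    """Plain rejection ABC for the mean theta of N(theta, 1) observations.

    Toy assumption: the summary statistic is the sample mean, simulated
    directly from its exact sampling distribution N(theta, 1/n) rather than
    by generating n pseudo-observations.
    """
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(n)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(*prior)      # draw theta from the uniform prior
        s_sim = rng.gauss(theta, sd)     # simulate the summary statistic
        if abs(s_sim - s_obs) <= eps:    # keep theta if within tolerance eps
            accepted.append(theta)
    return accepted


n, n_sims = 400, 100_000
data_rng = random.Random(42)
s_obs = statistics.fmean(data_rng.gauss(1.0, 1.0) for _ in range(n))  # also the MLE

results = {}
for label, eps in (("n^-1/4", n ** -0.25), ("n^-1/2", n ** -0.5)):
    accepted = abc_rejection(s_obs, n, eps, n_sims)
    results[label] = (len(accepted) / n_sims, statistics.fmean(accepted))
    print(f"eps={eps:.3f}  acceptance rate={results[label][0]:.4f}  "
          f"ABC mean={results[label][1]:.3f}  MLE={s_obs:.3f}")
```

Both tolerances give an ABC mean close to the MLE here, but shrinking ε from $n^{-1/4}$ to $n^{-1/2}$ cuts the acceptance rate by roughly the same factor, so that a fixed N leaves far fewer accepted draws. With proposals from the prior this cost only worsens as n grows.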

Something I have (slight) trouble with is the construction of an importance sampling function of the form $f_{\mathrm{ABC}}(s\mid\theta)^{\alpha}$ when, obviously, this function cannot be used for simulation purposes. The authors point out this fact, but still build an argument about the optimal choice of α, namely away from 0 and 1, like ½. Actually, any value different from 0 and 1 is sensible, meaning that the range of acceptable importance functions is wide. Most interestingly (!), the paper constructs an iterative importance sampling ABC in a spirit similar to the ABC-PMC of Beaumont et al. (2009). Even more interestingly, the ½ factor amounts to setting the variance of the proposal to twice the variance of the target, just as in PMC.
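Why α = ½ doubles the variance can be checked directly if one assumes, in line with the central limit theorem assumption quoted at the top of this post, that $f_{\mathrm{ABC}}(s\mid\theta)$ behaves like a Gaussian density in θ with mean μ and variance σ² (μ and σ² are notation introduced here):

$$
\left\{\exp\left(-\frac{(\theta-\mu)^2}{2\sigma^2}\right)\right\}^{\alpha}
= \exp\left(-\frac{(\theta-\mu)^2}{2\sigma^2/\alpha}\right)
\propto \mathcal{N}\left(\theta;\,\mu,\,\sigma^2/\alpha\right),
\qquad
\alpha=\tfrac12 \;\Longrightarrow\; \operatorname{var}=2\sigma^2 .
$$

Under this approximation, α → 0 flattens the proposal completely while α = 1 gives it the same spread as the target, which is consistent with staying away from both endpoints.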

Another aspect of the analysis I do not grasp is the reason for keeping the Monte Carlo sample size fixed at a value N, while setting a sequence of acceptance probabilities (or of tolerances) along iterations. This is a very surprising result in that the Monte Carlo error does remain under control and does not dominate the overall error!

“Whilst our theoretical results suggest that point estimates based on the ABC posterior have good properties, they do not suggest that the ABC posterior is a good approximation to the true posterior, nor that the ABC posterior will accurately quantify the uncertainty in estimates.”

Overall, this is clearly a paper worth reading for understanding the convergence issues related to ABC, with more theoretical support than the earlier Fearnhead and Prangle (2012). However, it does not provide guidance on the construction of a sequence of Monte Carlo samples, nor does it discuss the selection of the summary statistic, which obviously has a major impact on the efficiency of the estimation. And, to relate to the earlier warning, it does not cope with “big data”, in that it requires reproducing by simulation the original sample of size n.

on Markov chain Monte Carlo methods for tall data

Posted in Books, Statistics, University life on June 22, 2015 by xi'an

Rémi Bardenet, Arnaud Doucet, and Chris Holmes arXived a long paper (with the above title) a month ago, a paper that I did not have time to read in detail until today. The paper is quite comprehensive in its analysis of the current literature on MCMC for huge, tall, or big data. Even including our delayed acceptance paper! Now, it is indeed the case that we are all still struggling with this size difficulty, making proposals in a wide range of directions in the hope of improving the efficiency of dealing with tall data. However, we are not there yet, in that the outcome is either about as costly as the original MCMC implementation or its degree of approximation is unknown, even when bounds are available.

Most of the paper's proposals aim at an unbiased estimator of the likelihood function, in a pseudo-marginal manner à la Andrieu and Roberts (2009), and rely on a random subsampling scheme that presumes (a) iid-ness and (b) a lower bound on each term in the likelihood. It seems to me slightly unrealistic to assume that a much cheaper yet tight lower bound on those terms could be available. Firmly set in the iid framework, the problem itself is unclear: do we need 10⁸ observations of a logistic model with a few parameters? The real challenge is rather in non-iid hierarchical models with random effects and complex dependence structures, for which subsampling gets much more delicate. None of the methods surveyed in the paper broaches such situations where the entire data cannot be explored at once.
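The gap between the two targets can be seen on a toy example, which is an illustration under my own assumptions rather than the paper's method: with iid N(θ,1) data, a uniform subsample of size m, rescaled by n/m, gives an unbiased estimate of the log-likelihood, but exponentiating it does not give an unbiased estimate of the likelihood that a pseudo-marginal algorithm needs. The values of n, m, θ and the seed are arbitrary, and the helper names are hypothetical.

```python
import math
import random
import statistics


def normal_loglik_terms(data, theta):
    """Per-observation log-densities of N(theta, 1)."""
    return [-0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2 for x in data]


def subsampled_loglik(terms, m, rng):
    """(n/m) times the sum of m terms drawn uniformly with replacement.

    Unbiased for sum(terms), i.e. for the full-data log-likelihood.
    """
    n = len(terms)
    return n / m * sum(rng.choice(terms) for _ in range(m))


rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(1_000)]
terms = normal_loglik_terms(data, theta=0.1)
exact = sum(terms)

estimates = [subsampled_loglik(terms, m=200, rng=rng) for _ in range(2_000)]
log_error = statistics.fmean(estimates) - exact
std_error = statistics.stdev(estimates) / math.sqrt(len(estimates))

# Exponentiating an unbiased log-likelihood estimate does NOT give an unbiased
# likelihood estimate: by Jensen's inequality E[exp(Z)] > exp(E[Z]).
likelihood_ratio = statistics.fmean(math.exp(e - exact) for e in estimates)

print(f"log-likelihood: exact={exact:.1f}, mean of estimates={exact + log_error:.1f}")
print(f"average of exp(estimate - exact) = {likelihood_ratio:.3g} (1 if unbiased)")
```

The average of the exponentiated estimates lands far above 1 because the log-scale estimator has a large variance even at m = 200. This is where extra structure, such as the lower bounds mentioned above, comes in.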

An interesting experiment therein, based on the Glynn and Rhee (2014) unbiased representation, shows that the approach does not work well. This could lead the community to reconsider the focus on unbiasedness by coming full circle to the opposition between bias and variance, and between intractable likelihood and representative subsample likelihood.
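The randomised-truncation idea behind the Glynn and Rhee (2014) representation can be sketched on a toy deterministic series rather than on a likelihood. This is the single-term variant, and my own simplified illustration: draw a random index N, then return the N-th increment divided by its probability. The geometric law with r = 0.5 and the series Σ 1/n! = e are arbitrary choices, and the function names are hypothetical.

```python
import math
import random


def single_term_estimator(term, r, rng):
    """Randomised-truncation estimator of sum_{n>=0} term(n).

    Draws N with P(N = n) = (1 - r) * r**n and returns term(N) divided by
    that probability. Unbiased when the series converges absolutely; the
    variance is finite when sum term(n)**2 / P(N = n) converges.
    """
    n = 0
    while rng.random() < r:   # move to the next index with probability r
        n += 1
    return term(n) / ((1 - r) * r ** n)


def inverse_factorial(n):
    """Toy series: the terms 1/n! sum to e."""
    return 1.0 / math.factorial(n)


rng = random.Random(7)
draws = [single_term_estimator(inverse_factorial, 0.5, rng) for _ in range(200_000)]
estimate = sum(draws) / len(draws)
print(f"estimate={estimate:.4f}  e={math.e:.4f}  "
      f"single draws range from {min(draws):.3g} to {max(draws):.3g}")
```

Unbiasedness is bought with extra variance: each draw can be far from e and only the average is right. In a likelihood setting the increments would be differences between successive approximations, and that variance can become unmanageable, which is the bias-variance trade-off noted above.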

Reading the (superb) coverage of earlier proposals made me reconsider the perceived appeal of the decomposition of Neiswanger et al. (2014), as I came to realise that the product of functions renormalised into densities has no immediate probabilistic connection with its components. As an extreme example, terms may fail to integrate. (Of course, there are many Monte Carlo features that exploit such a decomposition, from the pseudo-marginal to accept-reject algorithms. And more to come.) Sampling from the renormalised product is thus not directly related to sampling from each term, in contrast with the arithmetic mixture representation. I was first convinced by using a fraction of the prior in each term but now find it unappealing, because there is no reason the prior should change for a smaller sample and there is no equivalent to the prohibition of using the data several times. At this stage, I would be much more in favour of raising a random portion of the likelihood function to the right power, an approach that I suggested to a graduate student earlier this year and which is also discussed in the paper, where it is considered too naïve and a “very poor approach” (Section 6, p.18), even though there must be versions that do not run afoul of the non-Gaussian nature of the log likelihood ratio. I am certainly going to peruse this Section 6 of the paper more thoroughly.
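There is one case where the renormalised product does behave nicely, and it helps to see it. If every subposterior is Gaussian, their product is again Gaussian: the precisions add up, and the mean is the precision-weighted average. The sketch below assumes a normal mean with known σ and a flat prior, so no prior fraction has to be split across shards. The shard count, data and helper name are arbitrary and hypothetical.

```python
import random
import statistics


def combine_gaussian_subposteriors(means, variances):
    """Renormalised product of Gaussian densities N(means[k], variances[k]).

    The product is proportional to a Gaussian whose precision is the sum of
    the precisions and whose mean is the precision-weighted average of means.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    mean = sum(p * m for p, m in zip(precisions, means)) / total_precision
    return mean, 1.0 / total_precision


rng = random.Random(3)
sigma = 2.0                                   # known observation scale
data = [rng.gauss(1.5, sigma) for _ in range(1_200)]
shards = [data[k::4] for k in range(4)]       # K = 4 disjoint subsets

# Flat prior, known sigma: subposterior k is N(mean of shard k, sigma^2 / n_k).
sub_means = [statistics.fmean(s) for s in shards]
sub_vars = [sigma ** 2 / len(s) for s in shards]
comb_mean, comb_var = combine_gaussian_subposteriors(sub_means, sub_vars)

# Full-data posterior under the same assumptions, for comparison.
full_mean, full_var = statistics.fmean(data), sigma ** 2 / len(data)
print(f"combined: N({comb_mean:.4f}, {comb_var:.6f})  full: N({full_mean:.4f}, {full_var:.6f})")
```

Here the combination recovers the full-data posterior exactly. That agreement is a property of this conjugate Gaussian setting and does not carry over to general subposterior densities.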

Another interesting suggestion in this definitely rich paper is the foray into an alternative that bypasses the uniform draw in the Metropolis-Hastings step, using the subsampled likelihood ratio instead. The authors call this “exchanging acceptance noise for subsampling noise” (p.22). However, there is no indication about the resulting stationary distribution, and I find the notion of only moving to higher likelihoods (or estimates thereof) counter to the spirit of Metropolis-Hastings algorithms. (I have also eventually realised the meaning of the log-normal “difficult” benchmark that I missed in the earlier : it means log-normal data is modelled by a normal density.) There is yet another innovation along the lines of a control variate for the log likelihood ratio, even though it sounds somewhat surrealistic.
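A control variate for a subsampled log likelihood ratio can be sketched as follows. The assumptions are my own and this is not claimed to be the paper's exact construction: a Cauchy location model; as proxy, a second-order Taylor expansion of each log-likelihood term around a reference point (here the sample median); and uniform subsampling with replacement. Summed over all n points, the proxy costs O(1) once the summed derivatives G and H are precomputed, so only the residuals need subsampling. All names are hypothetical.

```python
import math
import random
import statistics


def cauchy_loglik(x, theta):
    """Log-density of a standard Cauchy centred at theta, up to a constant."""
    return -math.log1p((x - theta) ** 2)


def grad_hess(x, theta):
    """First and second derivatives of cauchy_loglik with respect to theta."""
    u = x - theta
    d = 1.0 + u * u
    return 2.0 * u / d, 2.0 * (u * u - 1.0) / (d * d)


def make_estimators(data, theta_ref):
    """Subsampled estimators of sum_i [l_i(theta_new) - l_i(theta)]."""
    n = len(data)
    gh = [grad_hess(x, theta_ref) for x in data]
    G = sum(g for g, _ in gh)   # summed gradients at theta_ref
    H = sum(h for _, h in gh)   # summed second derivatives at theta_ref

    def proxy(i, theta, theta_new):
        # Taylor-based approximation of l_i(theta_new) - l_i(theta).
        g, h = gh[i]
        return g * (theta_new - theta) + 0.5 * h * ((theta_new - theta_ref) ** 2 - (theta - theta_ref) ** 2)

    def plain(theta, theta_new, idx):
        return n / len(idx) * sum(cauchy_loglik(data[i], theta_new) - cauchy_loglik(data[i], theta) for i in idx)

    def with_control_variate(theta, theta_new, idx):
        # Exact sum of all n proxies, computed in O(1) from G and H.
        full_proxy = G * (theta_new - theta) + 0.5 * H * ((theta_new - theta_ref) ** 2 - (theta - theta_ref) ** 2)
        resid = sum(cauchy_loglik(data[i], theta_new) - cauchy_loglik(data[i], theta)
                    - proxy(i, theta, theta_new) for i in idx)
        return full_proxy + n / len(idx) * resid

    return plain, with_control_variate


rng = random.Random(11)
data = [1.0 + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(5_000)]
theta_ref = statistics.median(data)                 # cheap reference point
theta, theta_new = theta_ref + 0.02, theta_ref - 0.03
exact = sum(cauchy_loglik(x, theta_new) - cauchy_loglik(x, theta) for x in data)

plain, with_cv = make_estimators(data, theta_ref)
plain_draws, cv_draws = [], []
for _ in range(500):
    idx = [rng.randrange(len(data)) for _ in range(100)]   # m = 100, with replacement
    plain_draws.append(plain(theta, theta_new, idx))
    cv_draws.append(with_cv(theta, theta_new, idx))

print(f"exact={exact:.4f}")
print(f"plain:   mean={statistics.fmean(plain_draws):.4f} sd={statistics.stdev(plain_draws):.4f}")
print(f"control: mean={statistics.fmean(cv_draws):.4f} sd={statistics.stdev(cv_draws):.4f}")
```

Both estimators are unbiased. The control-variate version only subsamples third-order Taylor residuals, so its spread collapses when θ and θ' are close to the reference point. The reduction shrinks as the proposal moves away from it.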

Current trends in Bayesian methodology with applications

Posted in Books, Statistics, Travel, University life on June 20, 2015 by xi'an

When putting this volume together with Umesh Singh, Dipak Dey, and Appaia Loganathan, my friend Satyanshu Upadhyay from Varanasi, India, asked me for a foreword. The book is now out, with chapters written by a wide variety of Bayesians. And here is my foreword, for what it’s worth:

It is a great pleasure to see a new book published on current aspects of Bayesian Analysis and coming out of India. This wide-scope volume reflects very accurately the present role of Bayesian Analysis in scientific inference, be it by statisticians, computer scientists or data analysts. Indeed, we have witnessed in the past decade a massive adoption of Bayesian techniques by users in need of statistical analyses, partly because it became easier to implement such techniques, partly because both the inclusion of prior beliefs and the production of a posterior distribution that provides a single filter for all inferential questions constitute a natural and intuitive way to process the latter. As reflected so nicely by the subtitle of Sharon McGrayne’s The Theory that Would not Die, the Bayesian approach to inference “cracked the Enigma code, hunted down Russian submarines” and more generally contributed to solving many real-life or cognitive problems that did not seem to fit within the traditional patterns of a statistical model.
Two hundred and fifty years after Bayes published his note, the field is more diverse than ever, as reflected by the range of topics covered by this new book, from the foundations (with objective Bayes developments) to the implementation by filters and simulation devices, to the new Bayesian methodology (regression and small areas, non-ignorable response and factor analysis), to a fantastic array of applications. This display reflects very very well on the vitality and appeal of Bayesian Analysis. Furthermore, I note with great pleasure that the new book is edited by distinguished Indian Bayesians, India having always been a provider of fine and dedicated Bayesians. I thus warmly congratulate the editors for putting this exciting volume together and I offer my best wishes to readers about to appreciate the appeal and diversity of Bayesian Analysis.

Objective Bayesian hypothesis testing

Posted in Books, Statistics, University life on June 19, 2015 by xi'an

Our paper with Diego Salmerón and Juan Cano using integral priors for binomial regression and objective Bayesian hypothesis testing (one of my topics of interest, see yesterday’s talk!) eventually appeared in Statistica Sinica. This is Volume 25, Number 3, of July 2015, and the table of contents shows an impressively diverse range of topics.

philosophy at the 2015 Baccalauréat

Posted in Books, Kids on June 18, 2015 by xi'an

[Here is the pre-Bayesian quote from Hume that students had to analyse this year for the Baccalauréat:]

“The maxim, by which we commonly conduct ourselves in our reasonings, is, that the objects, of which we have no experience, resemble those, of which we have; that what we have found to be most usual is always most probable; and that where there is an opposition of arguments, we ought to give the preference to such as are founded on the greatest number of past observations. But though, in proceeding by this rule, we readily reject any fact which is unusual and incredible in an ordinary degree; yet in advancing farther, the mind observes not always the same rule; but when anything is affirmed utterly absurd and miraculous, it rather the more readily admits of such a fact, upon account of that very circumstance, which ought to destroy all its authority. The passion of surprise and wonder, arising from miracles, being an agreeable emotion, gives a sensible tendency towards the belief of those events, from which it is derived.” David Hume, An Enquiry Concerning Human Understanding,

Paris Machine Learning Meeting #10 Season 2

Posted in Books, Kids, pictures, Statistics, University life on June 17, 2015 by xi'an

Invalides, Paris, May 8, 2012

Tonight, I am invited to give a speed-presenting talk at the last meeting of Season 2 of Paris Machine Learning, with the themes of DL, Recovering Robots, Vowpal Wabbit, Predcsis, Matlab, and Bayesian test [by yours truly!]. The meeting will take place in Jussieu, Amphi 25. Here are my slides for the meeting:

As it happened, the meeting was quite crowded with talks and plagued with technical difficulties in transmitting talks from Berlin and Toronto, so I got to talk about three hours after the beginning, which was less than optimal for the most technical presentation of the evening. I actually wonder if I even managed to convey the main idea of replacing Bayes factors with posteriors of the mixture weight! [I had plenty of time to reflect upon this on my way back home, as I had to wait for several rare and crowded RER trains until one had enough room for me and my bike!]

Statistics and Computing special issue on BNP

Posted in Books, Statistics, University life on June 16, 2015 by xi'an

[verbatim from the call for papers:]

Statistics and Computing is preparing a special issue on Bayesian Nonparametrics, for publication by early 2016. We invite researchers to submit manuscripts for publication in the special issue. We expect that the focus theme will increase the visibility and impact of papers in the volume.

By making use of infinite-dimensional mathematical structures, Bayesian nonparametric statistics allows the complexity of a learned model to grow as the size of a data set grows. This flexibility can be particularly suited to modern data sets but can also present a number of computational and modelling challenges. In this special issue, we will showcase novel applications of Bayesian nonparametric models, new computational tools and algorithms for learning these models, and new models for the diverse structures and relations that may be present in data.

To submit to the special issue, please use the Statistics and Computing online submission system. To indicate consideration for the special issue, choose “Special Issue: Bayesian Nonparametrics” as the article type. Papers must be prepared in accordance with the Statistics and Computing journal guidelines.

Papers will go through the usual peer review process. The special issue website will be updated with any relevant deadlines and information.

Deadline for manuscript submission: August 20, 2015

Guest editors:
Tamara Broderick (MIT)
Katherine Heller (Duke)
Peter Mueller (UT Austin)
