Anoop Korattikara, Yutian Chen and Max Welling recently arXived a paper on the appeal of using only part of the data to speed up MCMC. This is different from the growing literature on unbiased estimators of the likelihood exemplified by Andrieu & Roberts (2009). Here, the approximation to the true target is akin to the approximation in ABC algorithms, in that a value of the parameter is accepted if the difference in the log-likelihoods is larger than a given bound. Expressing this perspective as a test on the mean of the log-likelihood leads the authors to use instead a subsample from the whole sample. (The approximation level ε is then a bound on the p-value.) While this idea only applies to iid settings, it is quite interesting and sounds a wee bit like a bootstrapped version of MCMC. Especially since it sounds as if it could provide an auto-evaluation of its error.
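The sequential test underlying the paper can be sketched as follows. This is my own illustration, not the authors' exact recipe: it uses a normal approximation in place of their t-test, the batch size and the finite-population correction are my choices, and mu0 stands for the term collecting the prior ratio and the log-uniform draw of the Metropolis-Hastings step.

```python
import math
import numpy as np

def approx_mh_accept(loglik_diffs, mu0, eps=0.05, batch=50, rng=None):
    """Approximate Metropolis-Hastings decision from a growing random
    subsample of per-observation log-likelihood differences
    l_i = log p(x_i|theta') - log p(x_i|theta): stop and decide as soon
    as a test of H0: mean(l) = mu0 is significant at level eps."""
    rng = np.random.default_rng() if rng is None else rng
    diffs = np.asarray(loglik_diffs, dtype=float)
    N = len(diffs)
    perm = rng.permutation(N)
    n = 0
    while True:
        n = min(n + batch, N)
        sub = diffs[perm[:n]]
        mean = sub.mean()
        if n == N:                 # used the whole sample: exact decision
            return bool(mean > mu0)
        # standard error with a finite-population correction for
        # sampling without replacement from the N observations
        se = sub.std(ddof=1) / math.sqrt(n) * math.sqrt(1.0 - n / N)
        if se == 0.0:
            return bool(mean > mu0)
        z = (mean - mu0) / se
        p_value = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
        if p_value < eps:          # confident enough: decide early
            return bool(mean > mu0)
```

When the per-observation differences are well separated from mu0, the decision is reached after a single batch, which is where the computational savings come from.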
Archive for bootstrap
austerity in MCMC land
Posted in Statistics with tags ABC, Approximate Bayesian computation, bias vs. variance, bootstrap, Gibbs sampling, MCMC on April 28, 2013 by xi'an

Michael Jordan’s course at CREST
Posted in Statistics, University life with tags Bayesian nonparametrics, bootstrap, CREST, ENSAE, graduate course, Michael Jordan, Paris, PhD course, Stein's method on March 26, 2013 by xi'an

Next month, Michael Jordan will give an advanced course at CREST-ENSAE, Paris, on Recent Advances at the Interface of Computation and Statistics. The course will take place on April 4 (14:00, ENSAE, Room #11), 11 (14:00, ENSAE, Room #11), 15 (11:00, ENSAE, Room #11) and 18 (14:00, ENSAE, Room #11). It is open to everyone and attendance is free. The only constraint is a compulsory registration with Nadine Guedj (email: guedj[AT]ensae.fr) for security reasons. I strongly advise all graduate students who can take advantage of this fantastic opportunity to do so! Here is the abstract of the course:
“I will discuss several recent developments in areas where statistical science meets computational science, with particular concern for bringing statistical inference into contact with distributed computing architectures and with recursive data structures:

How does one obtain confidence intervals in massive data sets? The bootstrap principle suggests resampling data to obtain fluctuations in the values of estimators, and thereby confidence intervals, but this is infeasible computationally with massive data. Subsampling the data yields fluctuations on the wrong scale, which have to be corrected to provide calibrated statistical inferences. I present a new procedure, the “bag of little bootstraps,” which circumvents this problem, inheriting the favorable theoretical properties of the bootstrap but also having a much more favorable computational profile.

The problem of matrix completion has been the focus of much recent work, both theoretical and practical. To take advantage of distributed computing architectures in this setting, it is natural to consider divide-and-conquer algorithms for matrix completion. I show that these work well in practice, but also note that new theoretical problems arise when attempting to characterize the statistical performance of these algorithms. Here the theoretical support is provided by concentration theorems for random matrices, and I present a new approach to matrix concentration based on Stein’s method.

Bayesian nonparametrics involves replacing the “prior distributions” of classical Bayesian analysis with “prior stochastic processes.” Of particular value are the class of “combinatorial stochastic processes,” which make it possible to express uncertainty (and perform inference) over combinatorial objects that are familiar as data structures in computer science.”
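A canonical example of the combinatorial stochastic processes mentioned in the last point is the Chinese restaurant process, which induces a random partition of the data. A minimal sampler (my own illustration, not part of the course material):

```python
import numpy as np

def chinese_restaurant_process(n, alpha, rng=None):
    """Draw a random partition of n items from the Chinese restaurant
    process with concentration alpha: each customer sits at an existing
    table with probability proportional to its occupancy, or opens a
    new table with probability proportional to alpha."""
    rng = np.random.default_rng() if rng is None else rng
    counts = []                      # counts[k] = customers at table k
    seating = []                     # seating[t] = table of customer t
    for _ in range(n):
        weights = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(counts):
            counts.append(1)         # a brand-new table
        else:
            counts[k] += 1
        seating.append(k)
    return seating, counts
```

The number of occupied tables grows like alpha·log(n), which is the "rich get richer" clustering behaviour exploited in Dirichlet process mixtures.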
References are available on Michael’s homepage.
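The "bag of little bootstraps" from the first point of the abstract can be sketched as follows. This is a sketch under illustrative choices: the tuning defaults (gamma, s, r) are my own, not prescriptions from the papers, and the default estimator is the sample mean.

```python
import numpy as np

def blb_confidence_interval(x, estimator=np.mean, gamma=0.6, s=20, r=100,
                            alpha=0.05, rng=None):
    """Bag of little bootstraps: draw s subsets of size b = n^gamma and,
    on each subset, mimic full-size-n resampling with multinomial
    weights, so every resampled estimate fluctuates on the scale of an
    n-point sample while only b distinct points are ever touched."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x)
    n = len(x)
    b = int(n ** gamma)
    lowers, uppers = [], []
    for _ in range(s):
        sub = rng.choice(x, size=b, replace=False)       # one little subset
        boot = []
        for _ in range(r):
            # full-size resample supported on the b subset points
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            boot.append(estimator(np.repeat(sub, counts)))
        lowers.append(np.quantile(boot, alpha / 2))
        uppers.append(np.quantile(boot, 1 - alpha / 2))
    # average the per-subset interval endpoints
    return float(np.mean(lowers)), float(np.mean(uppers))
```

The key point is in the inner loop: naive resampling of the size-b subset would give fluctuations on the wrong (too large) scale, while the multinomial reweighting restores the n-scale fluctuations at the cost of handling only b distinct data points.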
intrinsic quantity for a Markov chain?
Posted in Statistics with tags þ, bootstrap, convergence assessment, CREST, effective sample size, Gainesville, Markov chain Monte Carlo, renewal process on February 6, 2013 by xi'an

I was attending a lecture this morning at CREST by Patrice Bertail, where he was using estimated renewal parameters on a Markov chain to build (asymptotically) convergent bootstrap procedures. Estimating renewal parameters is obviously of interest in MCMC algorithms, as they can be used to assess the convergence of the associated Markov chain, that is, provided the estimation does not induce a significant bias. Another question that came to me during the talk: since those convergence assessment techniques formally hold for any small set, choosing the small set so as to maximise the renewal rate also maximises the number of renewal events and hence the number of terms in the control sequence. Thus, the maximal renewal rate þ is definitely a quantity of interest. Now, is this quantity þ an intrinsic parameter of the chain, i.e. a quantity that drives its mixing and/or converging behaviour(s)? For instance, an iid sequence has a renewal rate of 1, because the whole space is a “small” set. Informally, the time between two consecutive renewal events is akin to the time between two simulations from the target and stationary distribution, according to Kac's representation we used in our AAP paper with Jim Hobert. So it could be that þ is directly related to the effective sample size of the chain, and hence to its autocorrelation. (A quick web search did not produce anything relevant.) Too bad this question did not pop up last week when I had the opportunity to discuss it with Sean Meyn in Gainesville!
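For reference, one standard way to compute the effective sample size that þ might be compared against is through the integrated autocorrelation time. A minimal sketch (the truncation rule at the first non-positive lag is one common convention among several; whether þ matches this notion is exactly the open question above):

```python
import numpy as np

def effective_sample_size(chain):
    """Effective sample size of a scalar chain, n / (1 + 2 Σ ρ_k),
    truncating the autocorrelation sum at the first non-positive
    empirical lag."""
    x = np.asarray(chain, dtype=float)
    n = len(x)
    x = x - x.mean()
    denom = np.dot(x, x)
    tau = 1.0                         # integrated autocorrelation time
    for k in range(1, n):
        rho = np.dot(x[:n - k], x[k:]) / denom
        if rho <= 0:                  # truncate once noise dominates
            break
        tau += 2.0 * rho
    return n / tau
```

For an AR(1) chain with autocorrelation ρ the true value is n(1−ρ)/(1+ρ), so a chain with ρ = 0.9 carries roughly one-nineteenth the information of an iid sample of the same length.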
R finals
Posted in R, Statistics, University life with tags accept-reject algorithm, bootstrap, India, ISBA, ks.test(), normalising constant, R exam, Université Paris-Sud, Varanasi on January 31, 2013 by xi'an

On the morning I returned from Varanasi and the ISBA meeting there, I had to give my R final exam (along with three of my colleagues in Paris-Dauphine). This year, the R course was completely in English, exam included, which means I can post it here as it may attract more interest than the French examens of past years…
I just completed grading my 32 copies, all from exam A, which takes a while as I have to check (and sometimes recover) the R code, and often to correct the obvious mistakes to see if the deeper understanding of the concepts is there. This year's student cohort is surprisingly homogeneous: I did not spot any of the horrors I may have mentioned in previous posts.
I must alas acknowledge a grievous typo in the version of Exam B that was used the day of the final: cutting-and-pasting from A to B, I forgot to change the parameters in Exercise 2, asking them to simulate a Gamma(0,1). It is only after half an hour that a bright student pointed out the impossibility… We had tested the exams prior to printing them but this somehow escaped the four of us!
Now, as I was entering my grades into the global spreadsheet, I noticed a perfect… lack of correlation between those and the grades at the midterm exam. I wonder what that means: I could be grading at random, the levels in November and in January could be uncorrelated, some students could have cheated in November and others in January, students' names or file names could have been mixed up, …? A rather surprising outcome!
adjustment of bias and coverage for confidence intervals
Posted in Statistics with tags ABC, ACS 2012, Adelaide, arXiv, bootstrap, credible intervals, frequentist coverage, indirect inference, parametric bootstrap on October 18, 2012 by xi'an

Menéndez, Fan, Garthwaite, and Sisson—whom I heard in Adelaide on that subject—posted yesterday a paper on arXiv about correcting the frequentist coverage of default intervals toward their nominal level. Given such an interval [L(x),U(x)], the correction for proper frequentist coverage is done by parametric bootstrap, i.e. by simulating n replicas of the original sample from the plug-in density f(·|θ*) and deriving the empirical cdf of L(y) given θ*, and similarly that of U(y) given θ*. Under the assumption of consistency of the estimate θ*, this ensures convergence (in the original sample size) of the corrected bounds.
Since ABC is based on the idea that pseudo-data can be simulated from f(·|θ) for any value of θ, the concept “naturally” applies to ABC outcomes, as illustrated in the paper by an MA(1) model with g-and-k noise. (As noted by the authors, there always is some uncertainty about the consistency of the ABC estimator.) However, there are a few caveats:
– ABC usually aims at approximating the posterior distribution (given the summary statistics), of which the credible intervals are an inherent constituent. Hence, attempts at recovering a frequentist coverage seem at odds with the original purpose of the method. Obviously, if ABC is instead seen as an inference method per se, like indirect inference, this objection does not hold.
– Then, once the (umbilical) link with Bayesian inference is partly severed, there is no particular reason to stick to credible sets for [L(x),U(x)]. A more standard parametric bootstrap approach, based on the bootstrap distribution of θ*, should work as well. This means that a comparison with other frequentist methods like indirect inference could be relevant.
– Lastly, and this is also noted by the authors, the method may prove extremely expensive. If the bounds L(x) and U(x) are obtained empirically from an ABC sample, a new ABC computation must be associated with each of the n replicas of the original sample. It would be interesting to compare the actual coverage of this ABC-corrected method with a more direct parametric bootstrap approach.
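The correction idea can be sketched in a simplified form. This is a level-calibration sketch in the spirit of the paper, not its exact quantile-adjustment algorithm: the `interval`, `simulate` and `estimate` callables and the bisection on the nominal level are my own illustrative framing.

```python
import numpy as np

def calibrate_level(sample, interval, simulate, estimate,
                    target=0.95, n_boot=500, rng=None):
    """Parametric-bootstrap calibration of an interval procedure:
    simulate replicas from the plug-in fit θ*, estimate how often
    interval(replica, level) covers θ*, and adjust the nominal level
    by bisection until the estimated coverage matches the target."""
    rng = np.random.default_rng() if rng is None else rng
    theta_star = estimate(sample)
    replicas = [simulate(theta_star, len(sample), rng) for _ in range(n_boot)]

    def coverage(level):
        hits = 0
        for y in replicas:
            lo, hi = interval(y, level)
            hits += lo <= theta_star <= hi
        return hits / n_boot

    lo_lvl, hi_lvl = 0.5, 0.9999      # bisect on the nominal level
    for _ in range(30):
        mid = 0.5 * (lo_lvl + hi_lvl)
        if coverage(mid) < target:
            lo_lvl = mid
        else:
            hi_lvl = mid
    return hi_lvl
```

An interval that systematically undercovers (e.g. one built with a too-small scale) is thus repaired by demanding a higher nominal level, which is also where the cost warning above bites: every replica requires a full run of the interval procedure, i.e. a full ABC computation in the ABC case.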
Michael Jordan at École Normale tomorrow
Posted in Statistics, University life with tags École Normale Supérieure, bag of little bootstraps, big data, bootstrap, Michael Jordan, Paris on October 1, 2012 by xi'anTomorrow (Tuesday, Oct. 2), Michael Jordan (Berkeley) will give a seminar at École Normale Supérieure, in French:
Tuesday, October 2, 2012, at 11:15am. (The seminar will be preceded by coffee at 11:00.)
Salle W, Toits du DMA, École Normale Supérieure, 45 rue d’Ulm, Paris 5e.

Divide-and-Conquer and Statistical Inference for “Big Data”
Michael I. Jordan
Fondation Sciences Mathématiques de Paris & University of California, Berkeley

In this talk we present some recent results on inference for “Big Data”. Divide-and-conquer is an essential computational tool for tackling large-scale data-processing problems, especially given the recent growth of distributed systems, but this paradigm raises difficulties for statistical inference. Consider, for instance, the fundamental problem of obtaining confidence intervals for estimators. The bootstrap principle suggests resampling the data repeatedly to obtain fluctuations, and hence confidence intervals, but this is infeasible at very large scale. Resorting to subsamples instead yields fluctuations that are not on the correct scale. We present a new approach, the “bag of little bootstraps,” which circumvents this problem and can be applied to very large data sets. We will also discuss the problem of large-scale matrix completion, where the divide-and-conquer approach is a practical tool that nevertheless raises theoretical issues. The theoretical support is provided by concentration theorems for random matrices, and in this connection we present a new approach to the concentration of random matrices based on Stein’s method. [Joint work with Ariel Kleiner, Lester Mackey, Purna Sarkar, Ameet Talwalkar, Richard Chen, Brendan Farrell and Joel Tropp.]
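The divide-and-conquer matrix completion mentioned in the abstract can be sketched as follows. This is only an illustration of the divide-factor-combine idea: the base completer is a simple hard-thresholded impute of my own choosing, and the combine step here is plain stitching, whereas the actual algorithms project the sub-factors onto a common column space.

```python
import numpy as np

def hard_impute(M, mask, rank, n_iter=200):
    """Base completer: alternate a best rank-r approximation (via SVD
    truncation) with resetting the observed entries of M."""
    X = np.where(mask, M, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # rank-r projection
        X = np.where(mask, M, X)                   # keep observed entries
    return X

def divide_and_conquer_complete(M, mask, rank, n_blocks):
    """Divide-and-conquer completion: split the columns into blocks,
    complete each block independently (embarrassingly parallel), and
    stitch the completed blocks back together."""
    col_blocks = np.array_split(np.arange(M.shape[1]), n_blocks)
    completed = [hard_impute(M[:, c], mask[:, c], rank) for c in col_blocks]
    return np.hstack(completed)
```

The statistical subtlety the talk alludes to is visible even here: each column block sees fewer observed entries per latent factor, so characterizing when the stitched estimate matches the full-matrix solve requires matrix concentration arguments.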