Archive for the Statistics Category

life and death along the RER B, minus approximations

Posted in Statistics, Travel with tags , , , , , , , , , , , , , , , on June 30, 2015 by xi'an

viemortrerbWhile cooking for a late Sunday lunch today, I was listening as usual to the French Public Radio (France Inter) and at some point head the short [10mn] Périphéries that gives every weekend an insight on the suburbs [on the “other side’ of the Parisian Périphérique boulevard]. The idea proposed by a geographer from Montpellier, Emmanuel Vigneron, was to point out the health inequalities between the wealthy 5th arrondissement of Paris and the not-so-far-away suburbs, by following the RER B train line from Luxembourg to La Plaine-Stade de France…

The disparities between the heart of Paris and some suburbs are numerous and massive, actually the more one gets away from the lifeline represented by the RER A and RER B train lines, so far from me the idea of negating this opposition, but the presentation made during those 10 minutes of Périphéries was quite approximative in statistical terms. For instance, the mortality rate in La Plaine is 30% higher than the mortality rate in Luxembourg and this was translated into the chances for a given individual from La Plaine to die in the coming year are 30% higher than if he [or she] lives in Luxembourg. Then a few minutes later the chances for a given individual from Luxembourg to die are 30% lower than he [or she] lives in La Plaine…. Reading from the above map, it appears that the reference is the mortality rate for the Greater Paris. (Those are 2010 figures.) This opposition that Vigneron attributes to a different access to health facilities, like the number of medical general practitioners per inhabitant, does not account for the huge socio-demographic differences between both places, for instance the much younger and maybe larger population in suburbs like La Plaine. And for other confounding factors: see, e.g., the equally large difference between the neighbouring stations of Luxembourg and Saint-Michel. There is no socio-demographic difference and the accessibility of health services is about the same. Or the similar opposition between the southern suburban stops of Bagneux and [my local] Bourg-la-Reine, with the same access to health services… Or yet again the massive decrease in the Yvette valley near Orsay. The analysis is thus statistically poor and somewhat ideologically biased in that I am unsure the data discussed during this radio show tells us much more than the sad fact that suburbs with less favoured populations show a higher mortality rate.

Kamiltonian Monte Carlo [no typo]

Posted in Books, Statistics, University life with tags , , , , , , , , , , on June 29, 2015 by xi'an

kamilHeiko Strathmann, Dino Sejdinovic, Samuel Livingstone, Zoltán Szabó, and Arthur Gretton arXived a paper last week about Kamiltonian MCMC, the K being related with RKHS. (RKHS as in another KAMH paper for adaptive Metropolis-Hastings by essentially the same authors, plus Maria Lomeli and Christophe Andrieu. And another paper by some of the authors on density estimation via infinite exponential family models.) The goal here is to bypass the computation of the derivatives in the moves of the Hamiltonian MCMC algorithm by using a kernel surrogate. While the genuine RKHS approach operates within an infinite exponential family model, two versions are proposed, KMC lite with an increasing sequence of RKHS subspaces, and KMC finite, with a finite dimensional space. In practice, this means using a leapfrog integrator with a different potential function, hence with a different dynamics.

The estimation of the infinite exponential family model is somewhat of an issue, as it is estimated from the past history of the Markov chain, simplified into a random subsample from this history [presumably without replacement, meaning the Markovian structure is lost on the subsample]. This is puzzling because there is dependence on the whole past, which cancels ergodicity guarantees… For instance, we gave an illustration in Introducing Monte Carlo Methods with R [Chapter 8] of the poor impact of approximating the target by non-parametric kernel estimates. I would thus lean towards the requirement of a secondary Markov chain to build this kernel estimate. The authors are obviously aware of this difficulty and advocate an attenuation scheme. There is also the issue of the cost of a kernel estimate, in O(n³) for a subsample of size n. If, instead, a fixed dimension m for the RKHS is selected, the cost is in O(tm²+m³), with the advantage of a feasible on-line update, making it an O(m³) cost in fine. But again the worry of using the whole past of the Markov chain to set its future path…

Among the experiments, a KMC for ABC that follows the recent proposal of Hamiltonian ABC by Meeds et al. The arguments  are interesting albeit sketchy: KMC-ABC does not require simulations at each leapfrog step, is it because the kernel approximation does not get updated at each step? Puzzling.

I also discussed the paper with Michael Betancourt (Warwick) and here his comments:

“I’m hesitant for the same reason I’ve been hesitant about algorithms like Bayesian quadrature and GP emulators in general. Outside of a few dimensions I’m not convinced that GP priors have enough regularization to really specify the interpolation between the available samples, so any algorithm that uses a single interpolation will be fundamentally limited (as I believe is born out in non-trivial scaling examples) and trying to marginalize over interpolations will be too awkward.

They’re really using kernel methods to model the target density which then gives the gradient analytically. RKHS/kernel methods/ Gaussian processes are all the same math — they’re putting prior measures over functions. My hesitancy is that these measures are at once more diffuse than people think (there are lots of functions satisfying a given smoothness criterion) and more rigid than people think (perturb any of the smoothness hyper-parameters and you get an entirely new space of functions).

When using these methods as an emulator you have to set the values of the hyper-parameters which locks in a very singular definition of smoothness and neglects all others. But even within this singular definition there are a huge number of possible functions. So when you only have a few points to constrain the emulation surface, how accurate can you expect the emulator to be between the points?

In most cases where the gradient is unavailable it’s either because (a) people are using decades-old Fortran black boxes that no one understands, in which case there are bigger problems than trying to improve statistical methods or (b) there’s a marginalization, in which case the gradients are given by integrals which can be approximated with more MCMC. Lots of options.”

Introduction to Monte Carlo methods with R and Bayesian Essentials with R

Posted in Books, R, Statistics, University life with tags , , , , , , on June 26, 2015 by xi'an

sales1Here are the  download figures for my e-book with George as sent to me last week by my publisher Springer-Verlag.  With an interesting surge in the past year. Maybe simply due to new selling strategies of the published rather to a wider interest in the book. (My royalties have certainly not increased!) Anyway thanks to all readers. As an aside for wordpress wannabe bloggers, I realised it is now almost impossible to write tables with WordPress, another illustration of the move towards small-device-supported blogs. Along with a new annoying “simpler” (or more accurately dumber) interface and a default font far too small for my eyesight. So I advise alternatives to wordpress that are more sympathetic to maths contents (e.g., using MathJax) and comfortable editing.

salesBessAnd the same for the e-book with Jean-Michel, which only appeared in late 2013. And contains more chapters than Introduction to Monte Carlo methods with R. Incidentally, a reader recently pointed out to me the availability of a pirated version of The Bayesian Choice on a Saudi (religious) university website. And of a pirated version of Introducing Monte Carlo with R on a Saõ Paulo (Brazil) university website. This may be alas inevitable, given the diffusion by publishers of e-chapters that can be copied with no limitations…

Bayesian computation: a summary of the current state, and samples backwards and forwards

Posted in Books, Statistics, University life with tags , , , , , , , , on June 25, 2015 by xi'an

“The Statistics and Computing journal gratefully acknowledges the contributions for this special issue, celebrating 25 years of publication. In the past 25 years, the journal has published innovative, distinguished research by leading scholars and professionals. Papers have been read by thousands of researchers world-wide, demonstrating the global importance of this field. The Statistics and Computing journal looks forward to many more years of exciting research as the field continues to expand.” Mark Girolami, Editor in Chief for The Statistics and Computing journal

Our joint [Peter Green, Krzysztof Łatuszyński, Marcelo Pereyra, and myself] review [open access!] on the important features of Bayesian computation has already appeared in the special 25th anniversary issue of Statistics & Computing! Along with the following papers

which means very good company, indeed! And happy B’day to Statistics & Computing!

Statistics month in Marseilles (CIRM)

Posted in Books, Kids, Mountains, pictures, Running, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , , , on June 24, 2015 by xi'an

Calanque de Morgiou, Marseille, July 7, 2010Next February, the fabulous Centre International de Recherche en Mathématiques (CIRM) in Marseilles, France, will hold a Statistics month, with the following programme over five weeks

Each week will see minicourses of a few hours (2-3) and advanced talks, leaving time for interactions and collaborations. (I will give one of those minicourses on Bayesian foundations.) The scientific organisers of the B’ week are Gilles Celeux and Nicolas Chopin.

The CIRM is a wonderful meeting place, in the mountains between Marseilles and Cassis, with many trails to walk and run, and hundreds of fantastic climbing routes in the Calanques at all levels. (In February, the sea is too cold to contemplate swimming. The good side is that it is not too warm to climb and the risk of bush fire is very low!) We stayed there with Jean-Michel Marin a few years ago when preparing Bayesian Essentials. The maths and stats library is well-provided, with permanent access for quiet working sessions. This is the French version of the equally fantastic German Mathematik Forschungsinstitut Oberwolfach. There will be financial support available from the supporting societies and research bodies, at least for young participants and the costs if any are low, for excellent food and excellent lodging. Definitely not a scam conference!

arXiv frenzy

Posted in R, Statistics, University life with tags , , , , , , on June 23, 2015 by xi'an

In the few past days, there has been so many arXiv postings of interest—presumably the NIPS submission effect!—that I cannot hope to cover them in the coming weeks! Hopefully, some will still come out on the ‘Og in a near future:

  • arXiv:1506.06629: Scalable Approximations of Marginal Posteriors in Variable Selection by Willem van den Boom, Galen Reeves, David B. Dunson
  • arXiv:1506.06285: The MCMC split sampler: A block Gibbs sampling scheme for latent Gaussian models by Óli Páll Geirsson, Birgir Hrafnkelsson, Daniel Simpson, Helgi Sigurðarson [also deserves a special mention for gathering only ***son authors!]
  • arXiv:1506.06268: Bayesian Nonparametric Modeling of Higher Order Markov Chains by Abhra Sarkar, David B. Dunson
  • arXiv:1506.06117: Convergence of Sequential Quasi-Monte Carlo Smoothing Algorithms by Mathieu Gerber, Nicolas Chopin
  • arXiv:1506.06101: Robust Bayesian inference via coarsening by Jeffrey W. Miller, David B. Dunson
  • arXiv:1506.05934: Expectation Particle Belief Propagation by Thibaut Lienart, Yee Whye Teh, Arnaud Doucet
  • arXiv:1506.05860: Variational Gaussian Copula Inference by Shaobo Han, Xuejun Liao, David B. Dunson, Lawrence Carin
  • arXiv:1506.05855: The Frequentist Information Criterion (FIC): The unification of information-based and frequentist inference by Colin H. LaMont, Paul A. Wiggins
  • arXiv:1506.05757: Bayesian Inference for the Multivariate Extended-Skew Normal Distribution by Mathieu Gerber, Florian Pelgrin
  • arXiv:1506.05741: Accelerated dimension-independent adaptive Metropolis by Yuxin Chen, David Keyes, Kody J.H. Law, Hatem Ltaief
  • arXiv:1506.05269: Bayesian Survival Model based on Moment Characterization by Julyan Arbel, Antonio Lijoi, Bernardo Nipoti
  • arXiv:1506.04778: Fast sampling with Gaussian scale-mixture priors in high-dimensional regression by Anirban Bhattacharya, Antik Chakraborty, Bani K. Mallick
  • arXiv:1506.04416: Bayesian Dark Knowledge by Anoop Korattikara, Vivek Rathod, Kevin Murphy, Max Welling [a special mention for this title!]
  • arXiv:1506.03693: Optimization Monte Carlo: Efficient and Embarrassingly Parallel Likelihood-Free Inference by Edward Meeds, Max Welling
  • arXiv:1506.03074: Variational consensus Monte Carlo by Maxim Rabinovich, Elaine Angelino, Michael I. Jordan
  • arXiv:1506.02564: Gradient-free Hamiltonian Monte Carlo with Efficient Kernel Exponential Families by Heiko Strathmann, Dino Sejdinovic, Samuel Livingstone, Zoltan Szabo, Arthur Gretton [comments coming soon!]

ABC for big data

Posted in Books, Statistics, University life with tags , , , , , , , on June 23, 2015 by xi'an

abcpestou“The results in this paper suggest that ABC can scale to large data, at least for models with a xed number of parameters, under the assumption that the summary statistics obey a central limit theorem.”

In a week rich with arXiv submissions about MCMC and “big data”, like the Variational consensus Monte Carlo of Rabinovich et al., or scalable Bayesian inference via particle mirror descent by Dai et al., Wentao Li and Paul Fearnhead contributed an impressive paper entitled Behaviour of ABC for big data. However, a word of warning: the title is somewhat misleading in that the paper does not address the issue of big or tall data per se, e.g., the impossibility to handle the whole data at once and to reproduce it by simulation, but rather the asymptotics of ABC. The setting is not dissimilar to the earlier Fearnhead and Prangle (2012) Read Paper. The central theme of this theoretical paper [with 24 pages of proofs!] is to study the connection between the number N of Monte Carlo simulations and the tolerance value ε when the number of observations n goes to infinity. A main result in the paper is that the ABC posterior mean can have the same asymptotic distribution as the MLE when ε=o(n-1/4). This is however in opposition with of no direct use in practice as the second main result that the Monte Carlo variance is well-controlled only when ε=O(n-1/2). There is therefore a sort of contradiction in the conclusion, between the positive equivalence with the MLE and

Something I have (slight) trouble with is the construction of an importance sampling function of the fABC(s|θ)α when, obviously, this function cannot be used for simulation purposes. The authors point out this fact, but still build an argument about the optimal choice of α, namely away from 0 and 1, like ½. Actually, any value different from 0,1, is sensible, meaning that the range of acceptable importance functions is wide. Most interestingly (!), the paper constructs an iterative importance sampling ABC in a spirit similar to Beaumont et al. (2009) ABC-PMC. Even more interestingly, the ½ factor amounts to updating the scale of the proposal as twice the scale of the target, just as in PMC.

Another aspect of the analysis I do not catch is the reason for keeping the Monte Carlo sample size to a fixed value N, while setting a sequence of acceptance probabilities (or of tolerances) along iterations. This is a very surprising result in that the Monte Carlo error does remain under control and does not dominate the overall error!

“Whilst our theoretical results suggest that point estimates based on the ABC posterior have good properties, they do not suggest that the ABC posterior is a good approximation to the true posterior, nor that the ABC posterior will accurately quantify the uncertainty in estimates.”

Overall, this is clearly a paper worth reading for understanding the convergence issues related with ABC. With more theoretical support than the earlier Fearnhead and Prangle (2012). However, it does not provide guidance into the construction of a sequence of Monte Carlo samples nor does it discuss the selection of the summary statistic, which has obviously a major impact on the efficiency of the estimation. And to relate to the earlier warning, it does not cope with “big data” in that it reproduces the original simulation of the n sized sample.

Follow

Get every new post delivered to your Inbox.

Join 875 other followers