## Archive for Introducing Monte Carlo Methods with R

## done! [#2]

Posted in Kids, Statistics, University life with tags computational statistics, ENSAE, exercises, grading, homework, Introducing Monte Carlo Methods with R, MASH, Monte Carlo Statistical Methods, R, Université Paris Dauphine on January 21, 2016 by xi'an

**P**hew! I just finished my enormous pile of homework for the computational statistics course… This massive pile is due to an unexpected number of students registering for the Data Science Master at ENSAE and Paris-Dauphine. As I was not aware of this surge, I kept to my practice of asking students to hand back solved exercises from Monte Carlo Statistical Methods at the beginning of each class, and I could not change the rules of the game once the course had started! Next year, I'll make sure to get some backup for grading those exercises. Or go for group projects instead…

## Kamiltonian Monte Carlo [no typo]

Posted in Books, Statistics, University life with tags adaptive MCMC methods, Bayesian quadrature, Gatsby, Hamiltonian Monte Carlo, Introducing Monte Carlo Methods with R, London, Markov chain, non-parametric kernel estimation, reproducing kernel Hilbert space, RKHS, smoothness on June 29, 2015 by xi'an

**H**eiko Strathmann, Dino Sejdinovic, Samuel Livingstone, Zoltán Szabó, and Arthur Gretton arXived a paper last week about Kamiltonian MCMC, the K being related to RKHS. (RKHS as in another KAMH paper for adaptive Metropolis-Hastings by essentially the same authors, plus Maria Lomeli and Christophe Andrieu. And another paper by some of the authors on density estimation via infinite exponential family models.) The goal here is to bypass the computation of the derivatives in the moves of the Hamiltonian MCMC algorithm by using a kernel surrogate. While the genuine RKHS approach operates within an infinite exponential family model, two versions are proposed: KMC lite, with an increasing sequence of RKHS subspaces, and KMC finite, with a finite-dimensional space. In practice, this means using a leapfrog integrator with a different potential function, hence with a different dynamics.
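To make the surrogate idea concrete, here is a minimal R sketch of a leapfrog trajectory driven by an arbitrary gradient function; this is a toy illustration of the mechanism rather than the authors' algorithm, and `grad_surrogate` is a hypothetical stand-in for whatever kernel-based gradient estimate gets plugged in:

```r
# one leapfrog trajectory driven by an arbitrary (surrogate) gradient of the
# log-target; grad_surrogate is a placeholder for the kernel estimate that
# KMC builds from the chain history
leapfrog <- function(theta, p, grad_surrogate, eps = 0.1, L = 10) {
  p <- p + 0.5 * eps * grad_surrogate(theta)    # initial half step on momentum
  for (l in 1:L) {
    theta <- theta + eps * p                    # full step on position
    if (l < L) p <- p + eps * grad_surrogate(theta)  # inner full momentum steps
  }
  p <- p + 0.5 * eps * grad_surrogate(theta)    # final half step on momentum
  list(theta = theta, p = -p)                   # negate momentum for reversibility
}

# example: exact gradient of a standard normal log-density as a stand-in
out <- leapfrog(theta = 1.5, p = 0.3, grad_surrogate = function(x) -x)
```

Replacing the exact gradient of the log-target by such a surrogate changes the Hamiltonian dynamics, but the accept-reject step of the resulting Metropolis-Hastings move can still evaluate the exact target, which is what preserves the stationary distribution.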

The estimation of the infinite exponential family model is somewhat of an issue, as it is estimated from the past history of the Markov chain, simplified into a random subsample from this history [presumably without replacement, meaning the Markovian structure is lost on the subsample]. This is puzzling because the dependence on the whole past voids the usual ergodicity guarantees… For instance, we gave an illustration in Introducing Monte Carlo Methods with R [Chapter 8] of the poor impact of approximating the target by non-parametric kernel estimates. I would thus lean towards requiring a secondary Markov chain to build this kernel estimate. The authors are obviously aware of this difficulty and advocate an attenuation scheme. There is also the issue of the cost of a kernel estimate, in O(n³) for a subsample of size n. If, instead, a fixed dimension m is selected for the RKHS, the cost is in O(tm²+m³), with the advantage of a feasible on-line update, making it an O(m³) cost in the end. But again there is the worry of using the whole past of the Markov chain to set its future path…

Among the experiments is a KMC for ABC that follows the recent proposal of Hamiltonian ABC by Meeds et al. The arguments are interesting albeit sketchy: KMC-ABC does not require simulations at each leapfrog step; is this because the kernel approximation does not get updated at each step? Puzzling.

I also discussed the paper with Michael Betancourt (Warwick) and here his comments:

“I’m hesitant for the same reason I’ve been hesitant about algorithms like Bayesian quadrature and GP emulators in general. Outside of a few dimensions I’m not convinced that GP priors have enough regularization to really specify the interpolation between the available samples, so any algorithm that uses a single interpolation will be fundamentally limited (as I believe is borne out in non-trivial scaling examples) and trying to marginalize over interpolations will be too awkward.

They’re really using kernel methods to model the target density, which then gives the gradient analytically. RKHS/kernel methods/Gaussian processes are all the same math — they’re putting prior measures over functions. My hesitancy is that these measures are at once more diffuse than people think (there are lots of functions satisfying a given smoothness criterion) and more rigid than people think (perturb any of the smoothness hyper-parameters and you get an entirely new space of functions).

When using these methods as an emulator you have to set the values of the hyper-parameters, which locks in a very singular definition of smoothness and neglects all others. But even within this singular definition there are a huge number of possible functions. So when you only have a few points to constrain the emulation surface, how accurate can you expect the emulator to be between the points?

In most cases where the gradient is unavailable it’s either because (a) people are using decades-old Fortran black boxes that no one understands, in which case there are bigger problems than trying to improve statistical methods or (b) there’s a marginalization, in which case the gradients are given by integrals which can be approximated with more MCMC. Lots of options.”

## a vignette on Metropolis

Posted in Books, Kids, R, Statistics, Travel, University life with tags Columbia University, Introducing Monte Carlo Methods with R, Metropolis-Hastings algorithm, mixture, New York city, testing as mixture estimation, vignette on April 13, 2015 by xi'an

**O**ver the past week, I wrote a short introduction to the Metropolis-Hastings algorithm, mostly in the style of our Introducing Monte Carlo Methods with R book, that is, with very little theory and worked-out illustrations on simple examples. (And partly over the Atlantic, on my flight to New York and Columbia.) This vignette is intended for the Wiley StatsRef: Statistics Reference Online series, modulo possible revision. Again, nothing novel therein, except for new examples.

## an email exchange about integral representations

Posted in Books, R, Statistics, University life with tags accept-reject algorithm, George Casella, Introducing Monte Carlo Methods with R, Lebesgue integration, Riemann integration on April 8, 2015 by xi'an

**I** had an interesting email exchange [or rather an exchange of emails] with a (German) reader of Introducing Monte Carlo Methods with R over the past few days, as he had difficulties with the validation of the accept-reject algorithm via the integral

in that it took me several iterations [as shown in the above] to realise the issue was with the notation

which seemed to be missing a density term or, in other words, be different from

What is surprising for me is that the integral

has a clear meaning as a Riemann integral, hence should be more intuitive…
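For context, the standard validation of the accept-reject algorithm proceeds through an integral of the following type; this is a generic reconstruction of the textbook argument, not necessarily the exact notation at stake in the email exchange:

\[
P(X \le x \mid \text{accepted})
 = \frac{\displaystyle\int_{-\infty}^{x}\int_{0}^{f(t)/Mg(t)} g(t)\,\mathrm{d}u\,\mathrm{d}t}
        {\displaystyle\int_{-\infty}^{+\infty}\int_{0}^{f(t)/Mg(t)} g(t)\,\mathrm{d}u\,\mathrm{d}t}
 = \frac{\int_{-\infty}^{x} f(t)\,\mathrm{d}t\,/\,M}{1/M}
 = \int_{-\infty}^{x} f(t)\,\mathrm{d}t,
\]

where \(f\) is the target density, \(g\) the proposal density, \(M\) an upper bound on \(f/g\), and a draw \(t\) from \(g\) is accepted when a uniform \(u\) falls below \(f(t)/Mg(t)\); the inner integral collapses to \(f(t)/M\), so the accepted draws are indeed distributed from \(f\).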

## another R new trick [new for me!]

Posted in Books, Kids, R, Statistics, University life with tags C code, importance sampling, Introducing Monte Carlo Methods with R, kolmim, Kolmogorov-Smirnov distance, R, stackoverflow, Université Paris Dauphine on July 16, 2014 by xi'an

**W**hile working with Andrew and a student from Dauphine on importance sampling, we wanted to assess the distribution of the resulting sample via the Kolmogorov-Smirnov measure

$$D_n = \sup_x \,\bigl|\hat F_n(x) - F(x)\bigr|$$

where F is the target cdf. This distance (times √n) has an asymptotic distribution that does not depend on n, called the Kolmogorov distribution. After searching for a little while, we could not figure out where this distribution was available in R. It had to be somewhere, since ks.test returns a p-value. Hopefully a correct one! So I looked into the ks.test function, which happens not to be entirely programmed in C, and found the line

PVAL <- 1 - if (alternative == "two.sided") .Call(C_pKolmogorov2x, STATISTIC, n)

which means that the Kolmogorov distribution is coded as a C function C_pKolmogorov2x in R. However, I could not call the function myself.

> .Call(C_pKolmogorov2x,.3,4)
Error: object 'C_pKolmogorov2x' not found

Hence, as I did not want to recode this distribution's cdf, I posted the question on stackoverflow (long time no see!) and got a reply almost immediately suggesting the package kolmim. This was followed by an extra comment from the same person that calling the C code only required adding the namespace path to its name, as in

> .Call(stats:::C_pKolmogorov2x,STAT=.3,n=4)
[1] 0.2292
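For completeness, and to avoid both the extra package and the unexported C routine, the limiting Kolmogorov cdf has a simple alternating-series representation that can be coded in a few lines of R. Note this is the asymptotic distribution of √n·Dₙ, so it will not reproduce the exact finite-n probability 0.2292 returned above for n=4:

```r
# cdf of the limiting Kolmogorov distribution,
#   K(x) = 1 - 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2),
# i.e. the limit of P(sqrt(n) * D_n <= x) as n grows;
# the series converges very fast, kmax = 100 is far more than enough
pkolmogorov <- function(x, kmax = 100) {
  k <- 1:kmax
  sapply(x, function(xi) {
    if (xi <= 0) return(0)
    1 - 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * xi^2))
  })
}

# asymptotic p-value for an observed KS statistic d with sample size n
ks_pvalue_asym <- function(d, n) 1 - pkolmogorov(sqrt(n) * d)
```

For small n the exact C routine (or kolmim) remains preferable, the asymptotic approximation only becoming reliable as n grows.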

## Ｒによるモンテカルロ法入門 [Introducing Monte Carlo Methods with R, in Japanese]

Posted in Books, R, Statistics with tags George Casella, Introducing Monte Carlo Methods with R, Japanese translation on May 14, 2013 by xi'an

**H**ere is the cover of the Japanese translation of our Introducing Monte Carlo Methods with R book. A few years after the French translation. It actually appeared in August of last year, but I was not informed of this till a few weeks ago. The publisher is Maruzen, with an associated webpage if you want to order… Unless I am confused, the translators are Hiro Ishida and Kazue Ishida; they deserve a major ありがとう! And too bad George is no longer with us: this must have been the first translation of one of his books into Japanese…

## CHANCE: special issue on George Casella’s books

Posted in Books, R, Statistics, University life with tags CHANCE, George Casella, Introducing Monte Carlo Methods with R, Monte Carlo Statistical Methods, Sam Behesta, statistical inference, Theory of Point Estimation, Variance Components on February 10, 2013 by xi'an

**T**he special issue of CHANCE on George Casella’s books has now appeared and it contains both my earlier post on George’s passing and reviews of several of his books, as follows:

- Andrew Gelman on Introducing Monte Carlo Methods with R
- Bill Strawderman on Statistical Inference
- Jean-Louis Foulley on Variance Components
- Larry Wasserman on Theory of Point Estimation
- Xiao-Li Meng on Monte Carlo Statistical Methods

Although those books appeared between five and twenty years ago, the reviews are definitely worth reading! *(Disclaimer: I am the editor of the Book Reviews section who contacted friends of George to write the reviews, as well as the co-author of two of those books!)* They bring, in my *(therefore biased)* opinion, a worthy evaluation of the depth and impact of those major books, and they also reveal why George was a great teacher, bringing much into the classroom and to his students… *(Unless I am confused, the whole series of reviews is available to all, and not only to CHANCE subscribers. Thanks, Sam!)*