R brut

Posted in Kids, pictures, R, Statistics, University life with tags , , , on July 2, 2015 by xi'an

Introduction to Monte Carlo methods with R and Bayesian Essentials with R

Posted in Books, R, Statistics, University life with tags , , , , , , on June 26, 2015 by xi'an

Here are the  download figures for my e-book with George as sent to me last week by my publisher Springer-Verlag.  With an interesting surge in the past year. Maybe simply due to new selling strategies of the published rather to a wider interest in the book. (My royalties have certainly not increased!) Anyway thanks to all readers. As an aside for wordpress wannabe bloggers, I realised it is now almost impossible to write tables with WordPress, another illustration of the move towards small-device-supported blogs. Along with a new annoying “simpler” (or more accurately dumber) interface and a default font far too small for my eyesight. So I advise alternatives to wordpress that are more sympathetic to maths contents (e.g., using MathJax) and comfortable editing.

And the same for the e-book with Jean-Michel, which only appeared in late 2013. And contains more chapters than Introduction to Monte Carlo methods with R. Incidentally, a reader recently pointed out to me the availability of a pirated version of The Bayesian Choice on a Saudi (religious) university website. And of a pirated version of Introducing Monte Carlo with R on a Saõ Paulo (Brazil) university website. This may be alas inevitable, given the diffusion by publishers of e-chapters that can be copied with no limitations…

arXiv frenzy

Posted in R, Statistics, University life with tags , , , , , , on June 23, 2015 by xi'an

In the few past days, there has been so many arXiv postings of interest—presumably the NIPS submission effect!—that I cannot hope to cover them in the coming weeks! Hopefully, some will still come out on the ‘Og in a near future:

• Scalable Approximations of Marginal Posteriors in Variable Selection by Willem van den Boom, Galen Reeves, David B. Dunson
• The MCMC split sampler: A block Gibbs sampling scheme for latent Gaussian models by Óli Páll Geirsson, Birgir Hrafnkelsson, Daniel Simpson, Helgi Sigurðarson [also deserves a special mention for gathering only ***son authors!]
• Bayesian Nonparametric Modeling of Higher Order Markov Chains by Abhra Sarkar, David B. Dunson
• Convergence of Sequential Quasi-Monte Carlo Smoothing Algorithms by Mathieu Gerber, Nicolas Chopin
• Robust Bayesian inference via coarsening by Jeffrey W. Miller, David B. Dunson
• Expectation Particle Belief Propagation by Thibaut Lienart, Yee Whye Teh, Arnaud Doucet
• arXiv:1506.05860: Variational Gaussian Copula Inference by Shaobo Han, Xuejun Liao, David B. Dunson, Lawrence Carin
• arXiv:1506.05855: The Frequentist Information Criterion (FIC): The unification of information-based and frequentist inference by Colin H. LaMont, Paul A. Wiggins
• arXiv:1506.05757: Bayesian Inference for the Multivariate Extended-Skew Normal Distribution by Mathieu Gerber, Florian Pelgrin
• arXiv:1506.05741: Accelerated dimension-independent adaptive Metropolis by Yuxin Chen, David Keyes, Kody J.H. Law, Hatem Ltaief
• arXiv:1506.05269: Bayesian Survival Model based on Moment Characterization by Julyan Arbel, Antonio Lijoi, Bernardo Nipoti
• arXiv:1506.04778: Fast sampling with Gaussian scale-mixture priors in high-dimensional regression by Anirban Bhattacharya, Antik Chakraborty, Bani K. Mallick
• arXiv:1506.04416: Bayesian Dark Knowledge by Anoop Korattikara, Vivek Rathod, Kevin Murphy, Max Welling [a special mention for this title!]
• arXiv:1506.03693: Optimization Monte Carlo: Efficient and Embarrassingly Parallel Likelihood-Free Inference by Edward Meeds, Max Welling
• arXiv:1506.03074: Variational consensus Monte Carlo by Maxim Rabinovich, Elaine Angelino, Michael I. Jordan
• arXiv:1506.02564: Gradient-free Hamiltonian Monte Carlo with Efficient Kernel Exponential Families by Heiko Strathmann, Dino Sejdinovic, Samuel Livingstone, Zoltan Szabo, Arthur Gretton [comments coming soon!]

dynamic mixtures [at NBBC15]

Posted in R, Statistics with tags , , , , , , , , , , , , on June 18, 2015 by xi'an

A funny coincidence: as I was sitting next to Arnoldo Frigessi at the NBBC15 conference, I came upon a new question on Cross Validated about a dynamic mixture model he had developed in 2002 with Olga Haug and Håvård Rue [whom I also saw last week in Valencià]. The dynamic mixture model they proposed replaces the standard weights in the mixture with cumulative distribution functions, hence the term dynamic. Here is the version used in their paper (x>0)

$(1-w_{\mu,\tau}(x))f_{\beta,\lambda}(x)+w_{\mu,\tau}(x)g_{\epsilon,\sigma}(x)$

where f is a Weibull density, g a generalised Pareto density, and w is the cdf of a Cauchy distribution [all distributions being endowed with standard parameters]. While the above object is not a mixture of a generalised Pareto and of a Weibull distributions (instead, it is a mixture of two non-standard distributions with unknown weights), it is close to the Weibull when x is near zero and ends up with the Pareto tail (when x is large). The question was about simulating from this distribution and, while an answer was in the paper, I replied on Cross Validated with an alternative accept-reject proposal and with a somewhat (if mildly) non-standard MCMC implementation enjoying a much higher acceptance rate and the same fit.

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , , , , on May 27, 2015 by xi'an

Another trip in the métro today (to work with Pierre Jacob and Lawrence Murray in a Paris Anticafé!, as the University was closed) led me to infer—warning!, this is not the exact distribution!—the distribution of x, namely

$f(x|N) = \frac{4^p}{4^{\ell+2p}} {\ell+p \choose p}\,\mathbb{I}_{N=\ell+2p}$

since a path x of length l(x) will corresponds to N draws if N-l(x) is an even integer 2p and p undistinguishable annihilations in 4 possible directions have to be distributed over l(x)+1 possible locations, with Feller’s number of distinguishable distributions as a result. With a prior π(N)=1/N on N, hence on p, the posterior on p is given by

$\pi(p|x) \propto 4^{-p} {\ell+p \choose p} \frac{1}{\ell+2p}$

Now, given N and  x, the probability of no annihilation on the last round is 1 when l(x)=N and in general

$\frac{4^p}{4^{\ell+2p}}{\ell-1+p \choose p}\big/\frac{4^p}{4^{\ell+2p}}{\ell+p \choose p}=\frac{\ell}{\ell+p}=\frac{2\ell}{N+\ell}$

which can be integrated against the posterior. The numerical expectation is represented for a range of values of l(x) in the above graph. Interestingly, the posterior probability is constant for l(x) large  and equal to 0.8125 under a flat prior over N.

Getting back to Pierre Druilhet’s approach, he sets a flat prior on the length of the path θ and from there derives that the probability of annihilation is about 3/4. However, “the uniform prior on the paths of lengths lower or equal to M” used for this derivation which gives a probability of length l proportional to 3l is quite different from the distribution of l(θ) given a number of draws N. Which as shown above looks much more like a Binomial B(N,1/2).

However, being not quite certain about the reasoning involving Fieller’s trick, I ran an ABC experiment under a flat prior restricted to (l(x),4l(x)) and got the above, where the histogram is for a posterior sample associated with l(x)=195 and the gold curve is the potential posterior. Since ABC is exact in this case (i.e., I only picked N’s for which l(x)=195), ABC is not to blame for the discrepancy! I asked about the distribution on Stack Exchange maths forum (and a few colleagues here as well) but got no reply so far… Here is the R code that goes with the ABC implementation:

#observation:
elo=195
#ABC version
T=1e6
el=rep(NA,T)
N=sample(elo:(4*elo),T,rep=TRUE)
for (t in 1:T){
#generate a path
paz=sample(c(-(1:2),1:2),N[t],rep=TRUE)
#eliminate U-turns
uturn=paz[-N[t]]==-paz[-1]
while (sum(uturn>0)){
uturn[-1]=uturn[-1]*(1-
uturn[-(length(paz)-1)])
uturn=c((1:(length(paz)-1))[uturn==1],
(2:length(paz))[uturn==1])
paz=paz[-uturn]
uturn=paz[-length(paz)]==-paz[-1]
}
el[t]=length(paz)}
#subsample to get exact posterior
poster=N[abs(el-elo)==0]


another viral math puzzle

Posted in Books, Kids, R, University life with tags , , , , , , , on May 25, 2015 by xi'an

After the Singapore Maths Olympiad birthday problem that went viral, here is a Vietnamese primary school puzzle that made the frontline in The Guardian. The question is: Fill the empty slots with all integers from 1 to 9 for the equality to hold. In other words, find a,b,c,d,e,f,g,h,i such that

a+13xb:c+d+12xef-11+gxh:i-10=66.

With presumably the operation ordering corresponding to

a+(13xb:c)+d+(12xe)f-11+(gxh:i)-10=66

although this is not specified in the question. Which amounts to

a+(13xb:c)+d+(12xe)f+(gxh:i)=87

and implies that c divides b and i divides gxh. Rather than pursing this analytical quest further, I resorted to R coding, checking by brute force whether or not a given sequence was working.

baoloc=function(ord=sample(1:9)){
if (ord[1]+(13*ord[2]/ord[3])+ord[4]+
12*ord[5]-ord[6]-11+(ord[7]*ord[8]/
ord[9])-10==66) return(ord)}


I then applied this function to all permutations of {1,…,9} [with the help of the perm(combinat) R function] and found the 128 distinct solutions. Including some for which b:c is not an integer. (Not of this obviously gives a hint as to how a 8-year old could solve the puzzle.)

As pointed out in a comment below, using the test == on scalars is a bad idea—once realising some fractions may be other than integers—and I should thus replace the equality with an alternative that bypasses divisions,

baoloc=function(ord=sample(1:9)){
return(((ord[1]+ord[4]+12*ord[5]-ord[6]-87)*
ord[3]*ord[9]+13*ord[2]*ord[9]+
ord[3]*ord[7]*ord[8]==0)*ord)}


leading to the overall R code

sol=NULL
perms=as.matrix(data.frame(permutations(9)),ncol=9,byrow=TRUE)
for (t in 1:factorial(9)){
a=baoloc(perms[t,])
if (a[1]>0) sol=rbind(sol,a)}
sol=sol[do.call(order, as.data.frame(sol)),]


and returning the 136 different solutions…

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , on May 13, 2015 by xi'an

Pierre Druilhet arXived a note a few days ago about the Flatland paradox (due to Stone, 1976) and his arguments against the flat prior. The paradox in this highly artificial setting is as follows:  Consider a sequence θ of N independent draws from {a,b,1/a,1/b} such that

1. N and θ are unknown;
2. a draw followed by its inverse and this inverse are removed from θ;
3. the successor x of θ is observed, meaning an extra draw is made and the above rule applied.

Then the frequentist probability that x is longer than θ given θ is at least 3/4—at least because θ could be zero—while the posterior probability that x is longer than θ given x is 1/4 under the flat prior over θ. Paradox that 3/4 and 1/4 clash. Not so much of a paradox because there is no joint probability distribution over (x,θ).

The paradox was actually discussed at length in Larry Wasserman’s now defunct Normal Variate. From which I borrowed Larry’s graphical representation of the four possible values of θ given the (green) endpoint of x. Larry uses the Flatland paradox hammer to fix another nail on the coffin he contemplates for improper priors. And all things Bayes. Pierre (like others before him) argues against the flat prior on θ and shows that a flat prior on the length of θ leads to recover 3/4 as the posterior probability that x is longer than θ.

As I was reading the paper in the métro yesterday morning, I became less and less satisfied with the whole analysis of the problem in that I could not perceive θ as a parameter of the model. While this may sound a pedantic distinction, θ is a latent variable (or a random effect) associated with x in a model where the only unknown parameter is N, the total number of draws used to produce θ and x. The distributions of both θ and x are entirely determined by N. (In that sense, the flatland paradox can be seen as a marginalisation paradox in that an improper prior on N cannot be interpreted as projecting a prior on θ.) Given N, the distribution of x of length l(x) is then 1/4N times the number of ways of picking (N-l(x)) annihilation steps among N. Using a prior on N like 1/N , which is improper, then leads to favour the shortest path as well. (After discussing the issue with Pierre Druilhet, I realised he had a similar perspective on the issue. Except that he puts a flat prior on the length l(x).) Looking a wee bit further for references, I also found that Bruce Hill had adopted the same perspective of a prior on N.