## the most patronizing start to an answer I have ever received

Posted in Books, Kids, R, Statistics, University life on April 30, 2015 by xi'an

Another occurrence [out of many!] of a question on X validated where the originator (primitivus petitor) was trying to get an explanation without the proper background. On either Bayesian statistics or simulation. The introductory sentence to the question was about “trying to understand how the choice of priors affects a Bayesian model estimated using MCMC” but the bulk of the question was in fact about failing to understand an R code for a random-walk Metropolis-Hastings algorithm for a simple regression model, provided in an introductory blog by Florian Hartig. And even more precisely about confusing the R code dnorm(b, sd = 5, log = T) in the prior with rnorm(1, mean = b, sd = 5) in the proposal…
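For what it is worth, the distinction at stake can be sketched in a few lines of R: dnorm evaluates a (log-)density, which is what the prior term requires, while rnorm produces a random draw, which is what the random-walk proposal requires. The value of b below is arbitrary and the standard deviation of 5 merely mirrors the numbers quoted above; this is an illustration, not Florian Hartig's code.

```r
b <- 0.7                                    # arbitrary current value of the parameter

# prior term: a deterministic number, the log-density of N(0, 5^2) at b
log_prior <- dnorm(b, mean = 0, sd = 5, log = TRUE)

# proposal: a random draw centred at the current value b
# (rnorm has no log argument, since it returns a simulation, not a density)
set.seed(1)
b_prop <- rnorm(1, mean = b, sd = 5)
```

The first quantity enters the Metropolis-Hastings acceptance ratio, the second one is the candidate value that this ratio accepts or rejects.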

“You should definitely invest some time in learning the bases of Bayesian statistics and MCMC methods from textbooks or on-line courses.” X

So I started my answer with the above warning. Which sums up my feelings about many of those X validated questions, namely that primitivi petitores lack the most basic background to consider such questions. Obviously, I should not have bothered with an answer, but it was late at night after a long day, a good meal at the pub in Kenilworth, and a broken toe still bothering me. So I got this reply from the primitivus petitor that it was a patronizing piece of advice and he prefers to learn from R code than from textbooks and on-line courses, having “looked through a number of textbooks”. Good luck with this endeavour then!

## scale acceleration

Posted in pictures, R, Statistics, Travel, University life on April 24, 2015 by xi'an

Kate Lee pointed me to a rather surprising inefficiency in matlab, exploited in Sylvia Frühwirth-Schnatter’s bayesf package: running a gamma simulation by rgamma(n,a,b) takes longer and sometimes much longer than rgamma(n,a,1)/b, the latter taking advantage of the scale nature of b. I wanted to check on my own whether or not R faced the same difficulty, so I ran an experiment [while stuck in a Thalys train at Brussels, between Amsterdam and Paris…] Using different values for a [click on the graph] and a range of values of b. To no visible difference between both implementations, at least when using system.time for checking.

a=seq(.1,4,le=25)
for (t in 1:25) a[t]=system.time(
rgamma(10^7,.3,a[t]))[3]
a=a/system.time(rgamma(10^7,.3,1))[3]


Once arrived home, I wondered about the relevance of the above comparison, since rgamma(10^7,.3,1) forces R to use 1 as a scale, which may differ from using rgamma(10^7,.3), where 1 is known to be the scale [does this sentence make sense?!]. So I reran an even bigger experiment as

a=seq(.1,4,le=25)
for (t in 1:25) a[t]=system.time(
rgamma(10^8,.3,a[t]))[3]
a=a/system.time(rgamma(10^8,.3))[3]


and got the graph below. Which is much more interesting because it shows that some values of a are leading to a loss of efficiency of 50%. Indeed. (The most extreme cases correspond to a=0.3, 1.1, 5.8. No clear pattern emerging.)

Update: As pointed out by Martyn Plummer in his comment, the C function behind the R rgamma function and Gamma generator does take into account the scale nature of the second parameter, so the above time differences are not due to this function but rather to whatever my computer was running at the same time…! Apologies to anyone I scared with this void warning!
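Martyn Plummer's point can be checked directly: if the generator draws a standard gamma variate and only rescales it at the end, then, under the same seed, rgamma(n,a,b) and rgamma(n,a,1)/b should agree up to floating-point rounding. The sketch below is only an illustration, with an arbitrary seed, sample size and rate; the helper med_time is a hypothetical name of mine, and taking the median of repeated timings is my choice (not part of the original experiment) to damp the effect of whatever else the computer is running.

```r
n <- 1e5; shape <- 0.3; rate <- 2.5        # arbitrary illustrative values

set.seed(101)
x_direct <- rgamma(n, shape, rate)         # rate handed over to rgamma
set.seed(101)
x_scaled <- rgamma(n, shape, 1) / rate     # unit rate, rescaled by hand

# same random stream, same draws up to rounding if only the final scaling differs
same_draws <- isTRUE(all.equal(x_direct, x_scaled))

# median elapsed time over repeated runs, less sensitive to background load
med_time <- function(sim, reps = 5)
  median(replicate(reps, system.time(sim())[["elapsed"]]))

t_direct <- med_time(function() rgamma(n, shape, rate))
t_scaled <- med_time(function() rgamma(n, shape, 1) / rate)
```

Comparing t_direct and t_scaled over a grid of rates, rather than single system.time calls, would have been a safer version of the experiment above.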

## simulating correlated Binomials [another Bernoulli factory]

Posted in Books, Kids, pictures, R, Running, Statistics, University life on April 21, 2015 by xi'an

This early morning, just before going out for my daily run around The Parc, I checked X validated for new questions and came upon that one. Namely, how to simulate X a Bin(8,2/3) variate and Y a Bin(18,2/3) such that corr(X,Y)=0.5. (No reason or motivation provided for this constraint.) And I thought of the following (presumably well-known) resolution, namely to break the two binomials as sums of 8 and 18 Bernoulli variates, respectively, and to use some of those Bernoulli variates as being common to both sums. For this specific set of values (8,18,0.5), since 8×18=12², the solution is 0.5×12=6 common variates. (The probability of success does not matter.) While running, I first thought this was a very artificial problem because of this occurrence of 8×18 being a perfect square, 12², and cor(X,Y)×12 an integer. A wee bit later I realised that all positive values of cor(X,Y) could be achieved by randomisation, i.e., by deciding the identity of a Bernoulli variate in X with a Bernoulli variate in Y with a certain probability ϖ. For negative correlations, one can use the (U,1-U) trick, namely to write both Bernoulli variates as

$X_1=\mathbb{I}(U\le p)\quad Y_1=\mathbb{I}(U\ge 1-p)$

in order to minimise the probability they coincide.

I also checked this result with an R simulation

> z=rbinom(10^8,6,.66)
> y=z+rbinom(10^8,12,.66)
> x=z+rbinom(10^8,2,.66)
> cor(x,y)
[1] 0.5000539


Searching on Google gave me immediately a link to Stack Overflow with an earlier solution with the same idea. And a smarter R code.
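The argument behind the 6 common variates is that, with k Bernoulli(p) variates shared by both sums, cov(X,Y)=kp(1-p) while var(X)=8p(1-p) and var(Y)=18p(1-p), hence cor(X,Y)=k/√(8×18)=k/12. Below is a minimal sketch of one possible randomisation for a non-integer target, where the number of shared variates is itself random with the required expectation. The function name rcorbinom and this particular randomisation scheme are assumptions of mine, not the Stack Overflow solution.

```r
# simulate nsim pairs (X,Y) with X ~ Bin(n1,p), Y ~ Bin(n2,p) and cor(X,Y) = rho >= 0
rcorbinom <- function(nsim, n1, n2, p, rho) {
  target <- rho * sqrt(n1 * n2)             # required expected number of shared Bernoullis
  stopifnot(rho >= 0, target <= min(n1, n2))
  # random number of shared variates, floor(target) or floor(target)+1, with mean target
  k <- floor(target) + rbinom(nsim, 1, target - floor(target))
  z <- rbinom(nsim, k, p)                   # shared part
  x <- z + rbinom(nsim, n1 - k, p)          # plus independent remainders
  y <- z + rbinom(nsim, n2 - k, p)
  cbind(x = x, y = y)
}

set.seed(2015)
xy <- rcorbinom(1e5, 8, 18, 2/3, 0.4)       # 0.4 x 12 = 4.8 is not an integer
rho_hat <- cor(xy[, "x"], xy[, "y"])
```

Since the number of shared variates is drawn independently of the Bernoulli variates, both marginals remain binomial and the covariance is E[k]p(1-p), so rho_hat should be close to the target 0.4.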

## reis naar Amsterdam

Posted in Books, Kids, pictures, Running, Statistics, Travel, University life, Wines on April 16, 2015 by xi'an

On Monday, I went to Amsterdam to give a seminar at the University of Amsterdam, in the department of psychology. And to visit Eric-Jan Wagenmakers and his group there. And I had a fantastic time! I talked about our mixture proposal for Bayesian testing and model choice without getting hostile or adverse reactions from the audience, quite the opposite as we later discussed this new notion for several hours in the café across the street. I also had the opportunity to meet with Peter Grünwald [who authored a book on the minimum description length principle], who pointed out a minor inconsistency of the common parameter approach, namely that the Jeffreys prior on the first model did not have to coincide with the Jeffreys prior on the second model. (The Jeffreys prior for the mixture being unavailable.) He also wondered about a more conservative property of the approach, compared with the Bayes factor, in the sense that the non-null parameter could get closer to the null-parameter while still being identifiable.

Among the many persons I met in the department, Maarten Marsman talked to me about his thesis research, Plausible values in statistical inference, which involved handling the Ising model [a non-sparse Ising model with O(p²) parameters] by an auxiliary representation due to Marc Kac and getting rid of the normalising (partition) constant by the way. (Warning, some approximations involved!) And who showed me a simple probit example of the Gibbs sampler getting stuck as the sample size n grows. Simply because the uniform conditional distribution on the parameter concentrates faster (in 1/n) than the posterior (in 1/√n). This does not come as a complete surprise as data augmentation operates in an n-dimensional space. Hence it requires more time to get around. As a side remark [still worth printing!], Maarten dedicated his thesis as “To my favourite random variables, Siem en Fem, and to my normalizing constant, Esther”, from which I hope you can spot the influence of at least two of my book dedications! As I left Amsterdam on Tuesday, I had time for an enjoyable dinner with E-J’s group, an equally enjoyable early morning run [with perfect skies for sunrise pictures!], and more discussions in the department. Including a presentation of the new (delicious?!) Bayesian software developed there, JASP, which aims at non-specialists [i.e., researchers unable to code in R, BUGS, or, God forbid!, STAN]. And about the consequences of mixture testing in some psychological experiments. Once again, a fantastic time discussing Bayesian statistics and their applications, with a group of dedicated and enthusiastic Bayesians!

## Le Monde puzzle [#905]

Posted in Books, Kids, R, Statistics, University life on April 1, 2015 by xi'an

A recursive programming Le Monde mathematical puzzle:

Given n tokens with 10≤n≤25, Alice and Bob play the following game: the first player draws an integer 1≤m≤6 at random. This player can then take 1≤r≤min(2m,n) tokens. The next player is then free to take 1≤s≤min(2r,n-r) tokens. The player taking the last tokens is the winner. There is a winning strategy for Alice if she starts with m=3 and if Bob starts with m=2. Deduce the value of n.

Although I first wrote a brute force version of the following code, a moderate amount of thinking leads to the conclusion that the player facing n remaining tokens after an adversary choice of m tokens such that 2m≥n always wins by taking the n remaining tokens:

optim=function(n,m){

outcome=(n<2*m+1)
if (n>2*m){
for (i in 1:(2*m))
outcome=max(outcome,1-optim(n-i,i))
}
return(outcome)
}


eliminating solutions whose divisors are not solutions themselves:

sol=lowa=plura[plura<100]
for (i in 3:6){
sli=plura[(plura>10^(i-1))&(plura<10^i)]
ace=sli-10^(i-1)*(sli%/%10^(i-1))
lowa=sli[apply(outer(ace,lowa,FUN="=="),
1,max)==1]
lowa=sort(unique(lowa))
sol=c(sol,lowa)}


> subs=rep(0,16)
> for (n in 10:25) subs[n-9]=optim(n,3)
> for (n in 10:25) if (subs[n-9]==1) subs[n-9]=1-optim(n,2)
> subs
[1] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
> (10:25)[subs==1]
[1] 18


Ergo, the number of tokens is 18!

## Le Monde puzzle [#902]

Posted in Books, Kids, Statistics, University life on March 8, 2015 by xi'an

Another arithmetic Le Monde mathematical puzzle:

From the set of the integers between 1 and 15, is it possible to partition it in such a way that the product of the terms in the first set is equal to the sum of the members of the second set? Can this be generalised to an arbitrary set {1,2,..,n}? What happens if instead we only consider the odd integers in those sets?

I used brute force by looking at random for a solution,

pb <- txtProgressBar(min = 0, max = 100, style = 3)
for (N in 5:100){
sol=FALSE
while (!sol){
k=sample(1:N,1,prob=(1:N)*(N-(1:N)))
pro=sample(1:N,k)
sol=(prod(pro)==sum((1:N)[-pro]))
}
setTxtProgressBar(pb, N)}
close(pb)


and while it took a while to run the R code, it eventually got out of the loop, meaning there was at least one solution for all n’s between 5 and 100. (It does not work for n=1,2,3,4, for obvious reasons.) For instance, when n=15, the integers in the product part are either (3,5,7), (1,7,14), or (1,9,11). Jean-Louis Fouley sent me an explanation: when n is odd, n=2p+1, one solution is (1,p,2p), while when n is even, n=2p, one solution is (1,p-1,2p).
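This explicit solution is easy to confirm without any random search. In the odd case, the product is 2p² and the sum of the remaining integers is (2p+1)(p+1)-(3p+1)=2p²; in the even case, the product is 2p(p-1) and the remaining sum is p(2p+1)-3p=2p(p-1). A short R check over the same range as above, where the function name check_partition is mine:

```r
# does the explicit triplet solve the product = sum partition problem for {1,...,n}?
check_partition <- function(n) {
  p <- n %/% 2
  pro <- if (n %% 2 == 1) c(1, p, 2 * p) else c(1, p - 1, 2 * p)
  prod(pro) == sum((1:n)[-pro])             # product part versus sum of the rest
}

ok <- sapply(5:100, check_partition)
all(ok)
```

For n=15, that is p=7, the triplet is (1,7,14), one of the three solutions found by the random search.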

A side remark on the R code: thanks to a Cross Validated question by Paulo Marques, on which I thought I had commented on this blog, I learned about the progress bar function in R, setTxtProgressBar(), which makes running R code with loops much nicer!

For the second question, I just adapted the R code to exclude even integers:

while (!sol){
k=1+trunc(sample(1:N,1)/2)
pro=sample(seq(1,N,by=2),k)
cum=(1:N)[-pro]
sol=(prod(pro)==sum(cum[cum%%2==1]))
}


and found a solution for n=15, namely 1,3,15 versus 5,7,9,11,13. However, there does not seem to be a solution for all n’s: I found solutions for n=15,21,23,31,39,41,47,49,55,59,63,71,75,79,87,95…
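Since random search cannot establish the absence of a solution, an exhaustive enumeration over all subsets of the odd integers is a possible alternative for moderate n. The following is only a sketch, with odd_solutions a name of my own, and it becomes expensive quickly since the number of subsets doubles with each additional odd integer.

```r
# all subsets of the odd integers up to n (n >= 3) whose product equals
# the sum of the remaining odd integers
odd_solutions <- function(n) {
  odds <- seq(1, n, by = 2)
  total <- sum(odds)
  sols <- list()
  for (k in 1:(length(odds) - 1)) {         # size of the product part
    cand <- combn(odds, k)                  # one candidate subset per column
    hit <- apply(cand, 2, function(s) prod(s) == total - sum(s))
    for (j in which(hit)) sols[[length(sols) + 1]] <- cand[, j]
  }
  sols
}

sols15 <- odd_solutions(15)
```

The list sols15 contains the solution (1,3,15) found above, since 1×3×15=45=5+7+9+11+13, and an empty list returned for a given n would mean there is no solution for that n.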

## amazing Gibbs sampler

Posted in Books, pictures, R, Statistics, University life on February 19, 2015 by xi'an

When playing with Peter Rossi’s bayesm R package during a visit of Jean-Michel Marin to Paris, last week, we came up with the above Gibbs outcome. The setting is a Gaussian mixture model with three components in dimension 5 and the prior distributions are standard conjugate. In this case, with 500 observations and 5000 Gibbs iterations, the Markov chain (for one component of one mean of the mixture) has two highly distinct regimes: one that revolves around the true value of the parameter, 2.5, and one that explores a much broader area (which is associated with a much smaller value of the component weight). What we found amazing is the Gibbs ability to entertain both regimes, simultaneously.