## scale acceleration

Posted in pictures, R, Statistics, Travel, University life with tags , , , , , , , , on April 24, 2015 by xi'an

Kate Lee pointed me to a rather surprising inefficiency in matlab, exploited in Sylvia Früwirth-Schnatter’s bayesf package: running a gamma simulation by rgamma(n,a,b) takes longer and sometimes much longer than rgamma(n,a,1)/b, the latter taking advantage of the scale nature of b. I wanted to check on my own whether or not R faced the same difficulty, so I ran an experiment [while stuck in a Thalys train at Brussels, between Amsterdam and Paris…] Using different values for a [click on the graph] and a range of values of b. To no visible difference between both implementations, at least when using system.time for checking.

a=seq(.1,4,le=25)
for (t in 1:25) a[t]=system.time(
rgamma(10^7,.3,a[t]))[3]
a=a/system.time(rgamma(10^7,.3,1))[3]


Once arrived home, I wondered about the relevance of the above comparison, since rgamma(10^7,.3,1) forces R to use 1 as a scale, which may differ from using rgamma(10^7,.3), where 1 is known to be the scale [does this sentence make sense?!]. So I rerun an even bigger experiment as

a=seq(.1,4,le=25)
for (t in 1:25) a[t]=system.time(
rgamma(10^8,.3,a[t]))[3]
a=a/system.time(rgamma(10^7,.3))[3]


and got the graph below. Which is much more interesting because it shows that some values of a are leading to a loss of efficiency of 50%. Indeed. (The most extreme cases correspond to a=0.3, 1.1., 5.8. No clear pattern emerging.)Update

As pointed out by Martyn Plummer in his comment, the C function behind the R rgamma function and Gamma generator does take into account the scale nature of the second parameter, so the above time differences are not due to this function but rather to whatever my computer was running at the same time…! Apologies to anyone I scared with this void warning!

## simulating correlated Binomials [another Bernoulli factory]

Posted in Books, Kids, pictures, R, Running, Statistics, University life with tags , , , , , , , on April 21, 2015 by xi'an

This early morning, just before going out for my daily run around The Parc, I checked X validated for new questions and came upon that one. Namely, how to simulate X a Bin(8,2/3) variate and Y a Bin(18,2/3) such that corr(X,Y)=0.5. (No reason or motivation provided for this constraint.) And I thought the following (presumably well-known) resolution, namely to break the two binomials as sums of 8 and 18 Bernoulli variates, respectively, and to use some of those Bernoulli variates as being common to both sums. For this specific set of values (8,18,0.5), since 8×18=12², the solution is 0.5×12=6 common variates. (The probability of success does not matter.) While running, I first thought this was a very artificial problem because of this occurrence of 8×18 being a perfect square, 12², and cor(X,Y)x12 an integer. A wee bit later I realised that all positive values of cor(X,Y) could be achieved by randomisation, i.e., by deciding the identity of a Bernoulli variate in X with a Bernoulli variate in Y with a certain probability ϖ. For negative correlations, one can use the (U,1-U) trick, namely to write both Bernoulli variates as

$X_1=\mathbb{I}(U\le p)\quad Y_1=\mathbb{I}(U\ge 1-p)$

in order to minimise the probability they coincide.

I also checked this result with an R simulation

> z=rbinom(10^8,6,.66)
> y=z+rbinom(10^8,12,.66)
> x=z+rbinom(10^8,2,.66)
cor(x,y)
> cor(x,y)
[1] 0.5000539


Searching on Google gave me immediately a link to Stack Overflow with an earlier solution with the same idea. And a smarter R code.

## Bernoulli, Montmort and Waldegrave

Posted in Books, Kids, R, Statistics on April 15, 2015 by xi'an

In the last issue of Statistical Science, David Belhouse [author of De Moivre’s biography]  and Nicolas Fillion published an accounting of a discussion between Pierre Rémond de Montmort, Nicolaus Bernoulli—”the” Bernoulli associated with the St. Petersburg paradox—, and Francis Waldegrave, about the card game of Le Her (or Hère, for wretch). Here is the abridged description from the paper:

“Le Her is a game (…) played with a standard deck of fifty-two playing cards. The simplest situation is when two players [Pierre and Paul] play the game, and the solution is not simply determined  even in that situation (…) Pierre deals a card from the deck to Paul and then one to himself. Paul has the option of switching his card for Pierre’s card. Pierre can only refuse the switch if he holds a king (the highest valued card). After Paul makes his decision to hold or switch, Pierre now has the option to hold whatever card he now has or to switch it with a card drawn from the deck. However, if he draws a king, he must retain his original card. The player with the highest card wins the pot, with ties going to the dealer Pierre (…) What are the chances of each player (…) ?” (p.2)

As the paper focus on the various and conflicting resolutions by those 18th Century probabilists, reaching the solution [for Paul to win]

$\dfrac{2828ac+2834bc+2838ad+2828bd}{13\cdot 17\cdot 25 \cdot(a+b+c+d)}$

“where a is Paul’s probability of switching with seven, b is Paul’s probability of holding the seven, c is Pierre’s probability of switching with an eight, and d is Pierre’s probability of holding on to an eight”

[which sounds amazing for the time, circa 1713!], where I do not see how a+b or c+d are different from 1,  I ran a small R code to check the probability that Paul wins if he switches when there are more larger than smaller values in the remaining cards and Pierre adopts the same strategy if Paul did not switch:

cards=rep(1:13,4)
win=0
T=10^6
for (t in 1:T){
deal=sample(cards,2)
#Alice has deal[1]
switch=0
rest=cards[-deal[1]]
if ((deal[2]<13)&amp;(sum(rest<=deal[1])<sum(rest>=deal[1]))){
switch=deal[2];deal[2]=deal[1];deal[1]=switch}
#Bob's turn
if (switch>0){
rest=cards[-deal]
if (deal[2]<deal[1]){ #sure loss worse than random one
draw=sample(rest,1)
if (draw<13) deal[2]=draw}
}else{
rest=cards[-deal[2]]
if (sum(rest<=deal[2])<sum(rest>=deal[2])){
draw=sample(rest,1)
if (draw<13) deal[2]=draw}}
win=win+(deal[2]>=deal[1])
}
1-win/T


Returning a winning probability of 0.5128 [at the first try] for Paul. However, this is not the optimal strategy for either Paul or Pierre, since randomisation for card values of 7 and 8 push Paul’s odds slightly higher!

## failures and uses of Jaynes’ principle of transformation groups

Posted in Books, Kids, R, Statistics, University life with tags , , , , on April 14, 2015 by xi'an

This paper by Alon Drory was arXived last week when I was at Columbia. It reassesses Jaynes’ resolution of Bertrand’s paradox, which finds three different probabilities for a given geometric event depending on the underlying σ-algebra (or definition of randomness!). Both Poincaré and Jaynes argued against Bertrand that there was only one acceptable solution under symmetry properties. The author of this paper, Alon Drory, argues this is not the case!

“…contrary to Jaynes’ assertion, each of the classical three solutions of Bertrand’s problem (and additional ones as well!) can be derived by the principle of transformation groups, using the exact same symmetries, namely rotational, scaling and translational invariance.”

Drory rephrases as follows:  “In a circle, select at random a chord that is not a diameter. What is the probability that its length is greater than the side of the equilateral triangle inscribed in the circle?”.  Jaynes’ solution is indifferent to the orientation of one observer wrt the circle, to the radius of the circle, and to the location of the centre. The later is the one most discussed by Drory, as he argued that it does not involve an observer but the random experiment itself and relies on a specific version of straw throws in Jaynes’ argument. Meaning other versions are also available. This reminded me of an earlier post on Buffon’s needle and on the different versions of the needle being thrown over the floor. Therein reflecting on the connection with Bertrand’s paradox. And running some further R experiments. Drory’s alternative to Jaynes’ manner of throwing straws is to impale them on darts and throw the darts first! (Which is the same as one of my needle solutions.)

“…the principle of transformation groups does not make the problem well-posed, and well-posing strategies that rely on such symmetry considerations ought therefore to be rejected.”

In short, the conclusion of the paper is that there is an indeterminacy in Bertrand’s problem that allows several resolutions under the principle of indifference that end up with a large range of probabilities, thus siding with Bertrand rather than Jaynes.

## a vignette on Metropolis

Posted in Books, Kids, R, Statistics, Travel, University life with tags , , , , , , on April 13, 2015 by xi'an

Over the past week, I wrote a short introduction to the Metropolis-Hastings algorithm, mostly in the style of our Introduction to Monte Carlo with R book, that is, with very little theory and worked-out illustrations on simple examples. (And partly over the Atlantic on my flight to New York and Columbia.) This vignette is intended for the Wiley StatsRef: Statistics Reference Online Series, modulo possible revision. Again, nothing novel therein, except for new examples.

## an email exchange about integral representations

Posted in Books, R, Statistics, University life with tags , , , , on April 8, 2015 by xi'an

I had an interesting email exchange [or rather exchange of emails] with a (German) reader of Introducing Monte Carlo Methods with R in the past days, as he had difficulties with the validation of the accept-reject algorithm via the integral

$\mathbb{P}(Y\in \mathcal{A},U\le f(Y)/Mg(Y)) = \int_\mathcal{A} \int_0^{f(y)/Mg(y)}\,\text{d}u\,g(y)\,\text{d}y\,,$

in that it took me several iterations [as shown in the above] to realise the issue was with the notation

$\int_0^a \,\text{d}u\,,$

which seemed to be missing a density term or, in other words, be different from

$\int_0^1 \,\mathbb{I}_{(0,a)}(u)\,\text{d}u\,,$

What is surprising for me is that the integral

$\int_0^a \,\text{d}u$

has a clear meaning as a Riemann integral, hence should be more intuitive….

## scalable Bayesian inference for the inverse temperature of a hidden Potts model

Posted in Books, R, Statistics, University life with tags , , , , , , , , , , , on April 7, 2015 by xi'an

Matt Moores, Tony Pettitt, and Kerrie Mengersen arXived a paper yesterday comparing different computational approaches to the processing of hidden Potts models and of the intractable normalising constant in the Potts model. This is a very interesting paper, first because it provides a comprehensive survey of the main methods used in handling this annoying normalising constant Z(β), namely pseudo-likelihood, the exchange algorithm, path sampling (a.k.a., thermal integration), and ABC. A massive simulation experiment with individual simulation times up to 400 hours leads to select path sampling (what else?!) as the (XL) method of choice. Thanks to a pre-computation of the expectation of the sufficient statistic E[S(Z)|β].  I just wonder why the same was not done for ABC, as in the recent Statistics and Computing paper we wrote with Matt and Kerrie. As it happens, I was actually discussing yesterday in Columbia of potential if huge improvements in processing Ising and Potts models by approximating first the distribution of S(X) for some or all β before launching ABC or the exchange algorithm. (In fact, this is a more generic desiderata for all ABC methods that simulating directly if approximately the summary statistics would being huge gains in computing time, thus possible in final precision.) Simulating the distribution of the summary and sufficient Potts statistic S(X) reduces to simulating this distribution with a null correlation, as exploited in Cucala and Marin (2013, JCGS, Special ICMS issue). However, there does not seem to be an efficient way to do so, i.e. without reverting to simulating the entire grid X…