Archive for the Books Category

simulating correlated Binomials [another Bernoulli factory]

Posted in Books, Kids, pictures, R, Running, Statistics, University life on April 21, 2015 by xi'an

This early morning, just before going out for my daily run around The Parc, I checked X validated for new questions and came upon that one. Namely, how to simulate a Bin(8,2/3) variate X and a Bin(18,2/3) variate Y such that corr(X,Y)=0.5. (No reason or motivation was provided for this constraint.) And I thought of the following (presumably well-known) resolution, namely to break the two binomials into sums of 8 and 18 Bernoulli variates, respectively, and to make some of those Bernoulli variates common to both sums. For this specific set of values (8,18,0.5), since 8×18=12², the solution is 0.5×12=6 common variates. (The probability of success does not matter.) While running, I first thought this was a very artificial problem because of 8×18 being a perfect square, 12², and cor(X,Y)×12 an integer. A wee bit later I realised that all positive values of cor(X,Y) could be achieved by randomisation, i.e., by deciding whether a Bernoulli variate in X is identical to a Bernoulli variate in Y with a certain probability ϖ. For negative correlations, one can use the (U,1-U) trick, namely to write both Bernoulli variates as

X_1=\mathbb{I}(U\le p)\quad Y_1=\mathbb{I}(U\ge 1-p)

in order to minimise the probability they coincide.

I also checked this result with an R simulation

> z=rbinom(10^8,6,.66)
> y=z+rbinom(10^8,12,.66)
> x=z+rbinom(10^8,2,.66)
> cor(x,y)
[1] 0.5000539
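
As a side note, here is a minimal R sketch (not part of the original question or answer) of the (U,1-U) trick for negative correlations, pairing six of the Bernoulli terms of X and Y through a common uniform:

p=2/3; k=6; N=10^6
U=matrix(runif(N*k),N,k)
x=rowSums(U<=p)+rbinom(N,8-k,p)     #still a Bin(8,p) marginal
y=rowSums(U>=1-p)+rbinom(N,18-k,p)  #still a Bin(18,p) marginal
cor(x,y) #about -k(1-p)^2/{12p(1-p)}, i.e. -0.25 here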

Searching on Google immediately gave me a link to Stack Overflow, with an earlier solution based on the same idea. And smarter R code.

Bayesian propaganda?

Posted in Books, Kids, pictures, Statistics, University life on April 20, 2015 by xi'an

“The question is about frequentist approach. Bayesian is admissable [sic] only by wrong definition as it starts with the assumption that the prior is the correct pre-information. James-Stein beats OLS without assumptions. If there is an admissable [sic] frequentist estimator then it will correspond to a true objective prior.”

I had a wee bit of a (minor, very minor!) communication problem on X validated, about a question on the existence of admissible estimators of the linear regression coefficient in multiple dimensions, under squared error loss. When I first replied that all Bayes estimators with finite risk were de facto admissible, I got the above reply, which clearly misses the point, and as I had edited the OP question to include more tags, the edited version was reverted with a comment about Bayesian propaganda! This is rather funny, if not hilarious, as (a) Bayes estimators are indeed admissible in the classical or frequentist sense—I actually fail to see a definition of admissibility in the Bayesian sense—and (b) the complete class theorems of Wald, Stein, and others (like Jack Kiefer, Larry Brown, and Jim Berger) come from the frequentist quest for best estimator(s). To make my point clearer, I also reproduced in my answer Stein's necessary and sufficient condition for admissibility from my book, but it did not help, as the theorem was “too complex for [the OP] to understand”, which shows in fine the point of reading textbooks!
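
For the record, a quick sketch of the classical argument (with the usual uniqueness caveat, which I am adding here): if δ is the unique Bayes estimator associated with the prior π and has finite Bayes risk, and if a competitor δ′ satisfies R(θ,δ′)≤R(θ,δ) for all θ, then integrating both risk functions against π gives

r(\pi,\delta')=\int R(\theta,\delta')\,\text{d}\pi(\theta)\le\int R(\theta,\delta)\,\text{d}\pi(\theta)=r(\pi,\delta)

so δ′ is Bayes against π as well and, by uniqueness, coincides with δ: no estimator can dominate δ, which is therefore admissible in the classical, frequentist, sense.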

the luminaries [book review]

Posted in Books, Kids, Mountains, Travel on April 18, 2015 by xi'an

I bought this book by Eleanor Catton on my trip to Pittsburgh and Toronto in 2013 (thanks to Amazon associates' gains!), mostly by chance (and also because it was the most recent Man Booker Prize winner). After a few sleepless nights last week (when I should not have been suffering from New York jet lag, given my sleeping pattern when abroad!), I went through this rather intellectual and somewhat contrived mystery. To keep with tradition (!), the cover puzzled me until I realised those were phases of the moon, in line with [spoiler!] the zodiacal pattern underlying the novel, a pattern I did not even try to follow since it sounded so artificial. And presumably restricted the flow of the story by imposing further constraints on the characters' interactions.

The novel has redeeming features, even though I am rather bemused at it getting a Man Booker Prize. (When compared with, say, The Remains of the Day…) For one thing, while a gold rush story of the 1860s, it takes place on the South Island of New Zealand instead of the Klondike, around the Hokitika gold-field on the West Coast, with mentions of places that bring back memories of our summer (well, winter!) visit to Christchurch in 2006… The mix of cultures between English settlers, Maoris, and Chinese migrants is well-documented and informative, if rather heavy at times, bordering on the info-dump, and a central character like the Maori Te Rau Tauwhare sounds caricatural. The fact that the story takes place in Victorian times calls Dickens to mind, but I find very little connection in either style or structure, nor with Victorian contemporaries like Wilkie Collins, or with Victorian pastiches like Charles Palliser's Quincunx… Nothing of the sanctimonious moral elevation and subtle irony one could expect from a Victorian novel!

While a murder mystery, the plot is fairly upside down (or down under?!): the (spoiler!) assumed victim is missing for most of the novel, the (spoiler!) extracted gold is not apparently stolen but rather lacks owner(s), and the most moral character of the story ends up being the local prostitute. The central notion of the twelve men in a council each bringing a new light on the disappearance of Emery Staines is a neat if not particularly innovative literary trick, but twelve is a large number, which means following many threads, some of them dead-ends, to gather a semblance of a view of the whole story. As in Rashomon, one finishes the story with deep misgivings as to who did what, after so many incomplete and biased accounts. Unlike Rashomon, it alas takes forever to reach this point!

vertical likelihood Monte Carlo integration

Posted in Books, pictures, Running, Statistics, Travel, University life on April 17, 2015 by xi'an

A few months ago, Nick Polson and James Scott arXived a paper on one of my favourite problems, namely the approximation of normalising constants (and it went way under my radar, as I only became aware of it quite recently! It then remained in my travel bag for an extra few weeks…). The method for approximating the constant Z draws on an analogy with the energy level sampling methods found in physics, like the Wang-Landau algorithm. The authors rely on a one-dimensional slice sampling representation of the posterior distribution and [main innovation in the paper] add a weight function on the auxiliary uniform. The choice of the weight function links the approach with the dreaded harmonic estimator (!), but also with power-posterior and bridge sampling. The paper recommends a specific weighting function, based on a “score-function heuristic” I do not get. Further, the optimal weight depends on intractable cumulative functions as in nested sampling. It would be fantastic if one could draw directly from the prior distribution of the likelihood function—rather than draw an x [from the prior or from something better, as suggested in our 2009 Biometrika paper] and transform it into L(x)—but as in all existing alternatives this alas is not the case. (Which is why I find the recommendations in the paper for practical implementation rather impractical, since, were the prior cdf of L(X) available, direct simulation of L(X) would be feasible. Maybe not the optimal choice though.)
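
As a toy illustration of the vertical (or slice) identity Z = E_π[L(X)] = ∫ P_π{L(X)>u} du that underpins this representation, here is my own minimal R sketch on an assumed conjugate normal model (not one of the paper's examples):

y=1; N=10^6
x=rnorm(N)             #draws from the N(0,1) prior
L=dnorm(y,mean=x)      #likelihood ordinates L(x) for a single N(x,1) observation
dnorm(y,0,sqrt(2))     #exact evidence Z for this conjugate model
mean(L)                #vanilla prior Monte Carlo estimate of Z
u=runif(N,0,max(L))    #uniform heights on (0,max L)
max(L)*mean(1-ecdf(L)(u)) #vertical estimate, integrating P(L>u) over u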

“What is the distribution of the likelihood ordinates calculated via nested sampling? The answer is surprising: it is essentially the same as the distribution of likelihood ordinates by recommended weight function from Section 4.”

The approach is thus very much related to nested sampling, at least in spirit. As the authors later demonstrate, nested sampling is another case of weighting. Both versions require simulations under truncated likelihood values, albeit with the possibility of going down [in likelihood values] in the current version. Actually, more weighting could prove [even more] efficient, as both the original nested sampling and this vertical sampling simulate from the prior under the likelihood constraint. Getting away from the prior should help. (I am quite curious to see how the method is received and applied.)

reis naar Amsterdam

Posted in Books, Kids, pictures, Running, Statistics, Travel, University life, Wines on April 16, 2015 by xi'an

On Monday, I went to Amsterdam to give a seminar at the University of Amsterdam, in the department of psychology. And to visit Eric-Jan Wagenmakers and his group there. And I had a fantastic time! I talked about our mixture proposal for Bayesian testing and model choice without getting hostile or adverse reactions from the audience, quite the opposite, as we later discussed this new notion for several hours in the café across the street. I also had the opportunity to meet with Peter Grünwald [who authored a book on the minimum description length principle], who pointed out a minor inconsistency of the common parameter approach, namely that the Jeffreys prior on the first model does not have to coincide with the Jeffreys prior on the second model. (The Jeffreys prior for the mixture being unavailable.) He also wondered about a more conservative property of the approach, compared with the Bayes factor, in the sense that the non-null parameter could get closer to the null parameter while still being identifiable.

Among the many persons I met in the department, Maarten Marsman talked to me about his thesis research, Plausible values in statistical inference, which involved handling the Ising model [a non-sparse Ising model with O(p²) parameters] through an auxiliary representation due to Marc Kac, getting rid of the normalising (partition) constant along the way. (Warning, some approximations involved!) He also showed me a simple probit example of the Gibbs sampler getting stuck as the sample size n grows, simply because the uniform conditional distribution on the parameter concentrates faster (in 1/n) than the posterior (in 1/√n). This does not come as a complete surprise, as data augmentation operates in an n-dimensional space and hence requires more time to get around. As a side remark [still worth printing!], Maarten dedicated his thesis “To my favourite random variables, Siem en Fem, and to my normalizing constant, Esther”, from which I hope you can spot the influence of at least two of my book dedications! As I left Amsterdam on Tuesday, I had time for an enjoyable dinner with E-J's group, an equally enjoyable early morning run [with perfect skies for sunrise pictures!], and more discussions in the department, including a presentation of the new (delicious?!) Bayesian software developed there, JASP, which aims at non-specialists [i.e., researchers unable to code in R, BUGS, or, God forbid!, STAN], and discussions about the consequences of mixture testing in some psychological experiments. Once again, a fantastic time discussing Bayesian statistics and their applications with a group of dedicated and enthusiastic Bayesians!

Bernoulli, Montmort and Waldegrave

Posted in Books, Kids, R, Statistics on April 15, 2015 by xi'an

In the last issue of Statistical Science, David Belhouse [author of De Moivre's biography] and Nicolas Fillion published an account of a discussion between Pierre Rémond de Montmort, Nicolaus Bernoulli [“the” Bernoulli associated with the St. Petersburg paradox], and Francis Waldegrave, about the card game of Le Her (or Hère, for wretch). Here is the abridged description from the paper:

“Le Her is a game (…) played with a standard deck of fifty-two playing cards. The simplest situation is when two players [Pierre and Paul] play the game, and the solution is not simply determined  even in that situation (…) Pierre deals a card from the deck to Paul and then one to himself. Paul has the option of switching his card for Pierre’s card. Pierre can only refuse the switch if he holds a king (the highest valued card). After Paul makes his decision to hold or switch, Pierre now has the option to hold whatever card he now has or to switch it with a card drawn from the deck. However, if he draws a king, he must retain his original card. The player with the highest card wins the pot, with ties going to the dealer Pierre (…) What are the chances of each player (…) ?” (p.2)

As the paper focuses on the various and conflicting resolutions by those 18th Century probabilists, reaching the solution [for Paul to win]

\dfrac{2828ac+2834bc+2838ad+2828bd}{13\cdot 17\cdot 25 \cdot(a+b+c+d)}

“where a is Paul’s probability of switching with seven, b is Paul’s probability of holding the seven, c is Pierre’s probability of switching with an eight, and d is Pierre’s probability of holding on to an eight”

[which sounds amazing for the time, circa 1713!], where I do not see how a+b or c+d differ from 1, I ran a small R code to check the probability that Paul wins if he switches when there are more larger than smaller values among the remaining cards, and Pierre adopts the same strategy if Paul did not switch:

cards=rep(1:13,4)
win=0
T=10^6
for (t in 1:T){
deal=sample(cards,2)
#Paul holds deal[1], Pierre holds deal[2]
switch=0
rest=cards[-deal[1]]
if ((deal[2]<13)&(sum(rest<=deal[1])<sum(rest>=deal[1]))){
 switch=deal[2];deal[2]=deal[1];deal[1]=switch}
#Pierre's turn
if (switch>0){
  rest=cards[-deal]
  if (deal[2]<deal[1]){ #sure loss worse than random one
    draw=sample(rest,1)
    if (draw<13) deal[2]=draw}
}else{
  rest=cards[-deal[2]]
  if (sum(rest<=deal[2])<sum(rest>=deal[2])){
   draw=sample(rest,1)
   if (draw<13) deal[2]=draw}}
win=win+(deal[2]>=deal[1])
}
1-win/T

This returned a winning probability of 0.5128 [at the first try] for Paul. However, this is not the optimal strategy for either Paul or Pierre, since randomisation over the card values of 7 and 8 pushes Paul's odds slightly higher!

failures and uses of Jaynes’ principle of transformation groups

Posted in Books, Kids, R, Statistics, University life on April 14, 2015 by xi'an

This paper by Alon Drory was arXived last week when I was at Columbia. It reassesses Jaynes' resolution of Bertrand's paradox, which finds three different probabilities for the same geometric event depending on the underlying σ-algebra (or definition of randomness!). Both Poincaré and Jaynes argued against Bertrand that there was only one acceptable solution under symmetry properties. Drory argues this is not the case!

“…contrary to Jaynes’ assertion, each of the classical three solutions of Bertrand’s problem (and additional ones as well!) can be derived by the principle of transformation groups, using the exact same symmetries, namely rotational, scaling and translational invariance.”

Drory rephrases the problem as follows: “In a circle, select at random a chord that is not a diameter. What is the probability that its length is greater than the side of the equilateral triangle inscribed in the circle?”. Jaynes' solution is indifferent to the orientation of an observer wrt the circle, to the radius of the circle, and to the location of the centre. The latter invariance is the one most discussed by Drory, as he argues that it does not involve an observer but the random experiment itself, and relies on a specific version of straw throwing in Jaynes' argument, meaning other versions are also available. This reminded me of an earlier post on Buffon's needle and on the different versions of the needle being thrown over the floor, therein reflecting on the connection with Bertrand's paradox and running some further R experiments. Drory's alternative to Jaynes' manner of throwing straws is to impale them on darts and throw the darts first! (Which is the same as one of my needle solutions.)
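
For illustration, here is a minimal R sketch (mine, not Drory's) of the three classical chord constructions, each leading to a different answer for a unit circle, where the inscribed equilateral triangle has side √3:

N=10^6; side=sqrt(3)
#1. random endpoints: two uniform angles on the circle
a=runif(N,0,2*pi); b=runif(N,0,2*pi)
mean(2*abs(sin((a-b)/2))>side) #about 1/3
#2. random radius: uniform distance of the chord to the centre
d=runif(N)
mean(2*sqrt(1-d^2)>side) #about 1/2
#3. random midpoint: chord midpoint uniform over the disk (by rejection)
x=runif(N,-1,1); y=runif(N,-1,1)
r=sqrt(x^2+y^2)[x^2+y^2<1]
mean(2*sqrt(1-r^2)>side) #about 1/4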

“…the principle of transformation groups does not make the problem well-posed, and well-posing strategies that rely on such symmetry considerations ought therefore to be rejected.”

In short, the conclusion of the paper is that there is an indeterminacy in Bertrand's problem that allows for several resolutions under the principle of indifference, ending up with a large range of probabilities, and thus siding with Bertrand rather than Jaynes.
